# Causal Inference in Python

Causal Inference in Python, or Causalinference for short, is a software package that implements various statistical and econometric methods used in the field variously known as Causal Inference, Program Evaluation, or Treatment Effect Analysis.

Through a series of blog posts on this page, I will illustrate the use of Causalinference, as well as provide high-level summaries of the underlying econometric theory with the non-specialist audience in mind. Source code for the package can be found at its GitHub page, and detailed documentation is available at causalinferenceinpython.org.

# Notation

In a nutshell, a causal effect is simply the difference between what happened and what would have happened. This notion is articulated more precisely by the so-called potential outcome framework, developed by Donald Rubin in the seventies. It goes as follows.

Suppose associated with each subject is a pair of potential outcomes, $$Y(0)$$ and $$Y(1)$$. Here $$Y(0)$$ denotes the outcome that would result if the subject does not receive the treatment, and $$Y(1)$$ denotes the outcome that would result if the subject does receive the treatment. Defined this way, the treatment effect for the subject is simply $$Y(1)-Y(0)$$.

For each subject, let their treatment status be denoted by the binary variable $$D$$. That is, if the subject received treatment, put $$D=1$$; otherwise put $$D=0$$. This allows us to write the observed outcome — denote it by $$Y$$ — as $$Y = (1-D)Y(0) + DY(1).$$
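The identity $$Y = (1-D)Y(0) + DY(1)$$ can be made concrete with a small simulation. The sketch below is illustrative only: the potential outcomes are generated artificially, since in practice both are never observed for the same subject.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical potential outcomes for n subjects (simulated
# purely for illustration; real data never reveals both).
Y0 = rng.normal(size=n)           # outcome if untreated
Y1 = Y0 + 2.0                     # outcome if treated
D = rng.integers(0, 2, size=n)    # binary treatment status

# The observed outcome picks out Y1 where D == 1 and Y0 where D == 0.
Y = (1 - D) * Y0 + D * Y1
```

Note that indexing `Y` by `D` recovers exactly the treated subjects' `Y1` values and the untreated subjects' `Y0` values, which is all the data ever shows us.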

$$Y(0)$$ and $$Y(1)$$ are called potential outcomes because only one of them will be realized and observed as $$Y$$, since subjects cannot both receive and not receive treatment. This is the fundamental problem of causal inference (Holland, 1986): for every subject, one of the potential outcomes will always be missing; yet causal effects are necessarily defined in terms of both.

Partly because of this problem, we will have better luck if we focus not on the subject-level treatment effect $$Y(1)-Y(0)$$, but instead on the average treatment effect, $$\mathrm{E}[Y(1)-Y(0)]$$, or other similar aggregate quantities.
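To see why the average treatment effect is a more tractable target, consider a simulation in which treatment is randomly assigned. The true ATE below is an assumed value (2.0) baked into the simulated data; under random assignment, the difference in observed group means estimates it well even though no subject's individual effect is observable.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Simulated potential outcomes with an assumed true ATE of 2.0.
Y0 = rng.normal(loc=0.0, scale=1.0, size=n)
Y1 = Y0 + 2.0
true_ate = np.mean(Y1 - Y0)  # computable only because we simulated both

# Randomly assign treatment and form the observed outcome.
D = rng.integers(0, 2, size=n)
Y = (1 - D) * Y0 + D * Y1

# Under random assignment, the difference in group means is an
# unbiased estimator of E[Y(1) - Y(0)].
est_ate = Y[D == 1].mean() - Y[D == 0].mean()
```

With observational data, where $$D$$ is not randomly assigned, this naive difference in means is generally biased, which is what motivates the methods the package implements.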

In most applications, we will also observe a vector of individual characteristics, or covariates, for each subject. Let us denote this by $$X$$. The vector of observables for each subject is therefore the triple $$(Y, D, X)$$.

Finally, denote the probability of receiving treatment conditional on $$X$$, also known as the propensity score, by $$p(X)$$. That is, let $$p(X) = \mathrm{P}(D=1|X)$$. As we shall see, this quantity will play a central role in most of what follows.
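When $$X$$ is discrete, the propensity score $$p(X) = \mathrm{P}(D=1|X)$$ can be estimated simply as the within-group frequency of treatment. The sketch below uses a single binary covariate with assumed treatment probabilities (0.3 and 0.7, chosen only for illustration); with continuous covariates one would typically fit a logit or probit model instead.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# One binary covariate; treatment probability depends on X
# (values 0.3 and 0.7 are illustrative assumptions).
X = rng.integers(0, 2, size=n)
true_p = np.where(X == 1, 0.7, 0.3)      # p(X) = P(D=1|X)
D = (rng.random(n) < true_p).astype(int)

# With discrete X, estimate the propensity score by the
# frequency of treatment within each covariate group.
p_hat = {x: D[X == x].mean() for x in (0, 1)}
```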

### References

Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945-960.