Causal Inference in Python

Causal Inference in Python, or Causalinference for short, is a software package that implements statistical and econometric methods used in the field variously known as Causal Inference, Program Evaluation, or Treatment Effect Analysis.

Through a series of blog posts on this page, I will illustrate the use of Causalinference, as well as provide high-level summaries of the underlying econometric theory with a non-specialist audience in mind. Source code for the package can be found at its GitHub page, and detailed documentation is available online.


In a nutshell, a causal effect is simply the difference between what happened and what would have happened. This notion is articulated more precisely by the so-called potential outcome framework, developed by Donald Rubin in the seventies. It goes as follows.

Suppose associated with each subject is a pair of potential outcomes, \(Y(0)\) and \(Y(1)\). Here \(Y(0)\) denotes the outcome that would result if the subject does not receive the treatment, and \(Y(1)\) denotes the outcome that would result if the subject does receive the treatment. Defined this way, the treatment effect for the subject is simply \(Y(1)-Y(0)\).

For each subject, let their treatment status be denoted by the binary variable \(D\). That is, if the subject received treatment, put \(D=1\); otherwise put \(D=0\). This allows us to write the observed outcome — denote it by \(Y\) — as $$Y = (1-D)Y(0) + DY(1).$$
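The identity above can be verified numerically. The sketch below uses made-up potential outcomes for five subjects (all numbers are purely illustrative) and shows that the observed outcome \(Y\) picks out \(Y(1)\) when \(D=1\) and \(Y(0)\) when \(D=0\):

```python
import numpy as np

# Hypothetical potential outcomes for five subjects (illustrative numbers)
Y0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # outcome without treatment
Y1 = np.array([2.5, 2.0, 4.5, 4.0, 7.0])  # outcome with treatment
D = np.array([0, 1, 1, 0, 1])             # treatment status

# The observed outcome: Y = (1-D)*Y(0) + D*Y(1)
Y = (1 - D) * Y0 + D * Y1
print(Y)  # [1.  2.  4.5 4.  7. ]
```

Note that the formula simply selects one element of each pair; the unselected potential outcome is never observed.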

\(Y(0)\) and \(Y(1)\) are called potential outcomes because only one of them will be realized and observed as \(Y\), since subjects cannot both receive and not receive treatment. This is the fundamental problem of causal inference (Holland, 1986): for every subject, one of the potential outcomes will always be missing; yet causal effects are necessarily defined in terms of both.

Partly because of this problem, we will have better luck if we focus not on the subject-level treatment effect \(Y(1)-Y(0)\), but instead on the average treatment effect, \(\mathrm{E}[Y(1)-Y(0)]\), or other similar aggregate quantities.
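To see why the average is more tractable, consider a simulation in which treatment is randomly assigned (the true effect of 2 and all distributional choices below are assumptions made for illustration). Even though no subject's individual effect \(Y(1)-Y(0)\) is observable from \((Y, D)\) alone, a simple difference in group means recovers the average treatment effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated potential outcomes with a true average treatment effect of 2
Y0 = rng.normal(0, 1, n)
Y1 = Y0 + 2 + rng.normal(0, 1, n)

# Randomized assignment, so D is independent of (Y(0), Y(1))
D = rng.integers(0, 2, n)
Y = (1 - D) * Y0 + D * Y1

# Difference in observed group means estimates E[Y(1) - Y(0)]
ate_hat = Y[D == 1].mean() - Y[D == 0].mean()
print(round(ate_hat, 2))  # close to 2
```

Randomization is what licenses this comparison; with observational data, more careful adjustment is needed, which is where the methods in Causalinference come in.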

In most applications, we will also observe a vector of individual characteristics, or covariates, for each subject. Let us denote this by \(X\). The vector of observables for each subject is therefore the triple \((Y, D, X)\).

Finally, denote the probability of receiving treatment conditional on \(X\), also known as the propensity score, by \(p(X)\). That is, let \(p(X) = \mathrm{P}(D=1|X)\). As we shall see, this quantity will play a central role in most of what follows.
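When \(X\) is discrete, the propensity score has a particularly concrete interpretation: it is just the treatment frequency within each covariate group. The sketch below assumes a single binary covariate and treatment probabilities of 0.3 and 0.7 (both assumptions are for illustration only) and estimates \(p(X)\) by group frequencies:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# A single binary covariate; treatment is more likely when X = 1
X = rng.integers(0, 2, n)
true_p = np.where(X == 1, 0.7, 0.3)  # assumed p(X) = P(D=1|X)
D = (rng.random(n) < true_p).astype(int)

# With discrete X, the propensity score is the within-group treatment rate
for x in (0, 1):
    p_hat = D[X == x].mean()
    print(f"p({x}) is approximately {p_hat:.2f}")
```

With continuous covariates, \(p(X)\) is typically estimated by a parametric model such as logistic regression, but the group-frequency picture above is the intuition behind it.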


Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945-960.