Causal Inference in Python

Causal Inference in Python, or Causalinference for short, is a software package that implements various statistical and econometric methods used in the field variously known as Causal Inference, Program Evaluation, or Treatment Effect Analysis.

Through a series of blog posts on this page, I will illustrate the use of Causalinference, as well as provide high-level summaries of the underlying econometric theory with a non-specialist audience in mind. Source code for the package can be found at its GitHub page, and detailed documentation is available at

Least Squares

One of the simplest treatment effect estimators is the ordinary least squares (OLS) estimator. Below we illustrate several common specifications that can be computed by Causalinference, and describe why least squares can behave poorly when there is not enough covariate overlap.

To estimate treatment effects via OLS, we simply call the method est_via_ols, which by default runs the following regression: $$Y_i = \alpha + \beta D_i + \gamma' (X_i - \bar{X}) + \delta' D_i (X_i - \bar{X}) + \varepsilon_i.$$

The resulting treatment effect estimates are stored in the attribute estimates, and can be displayed as follows:

>>> causal.est_via_ols()
>>> print(causal.estimates)

Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
           ATE      3.672      0.906      4.051      0.000      1.895      5.449
           ATC     -0.227      0.930     -0.244      0.807     -2.050      1.596
           ATT      6.186      1.067      5.799      0.000      4.095      8.277

Here ATE, ATC, and ATT stand for, respectively, average treatment effect, average treatment effect for the controls, and average treatment effect for the treated. Like summary_stats, the attribute estimates is a dictionary-like object that contains the estimation results.

Numerically, an equivalent way of running the aforementioned regression involves the following steps:

  1. Using only control units, regress the observed outcome on the covariates, and collect the regression coefficients. Using these coefficients and the covariates of each treated subject \(i\), compute the least squares prediction and call it \(\hat{Y}_i(0)\).
  2. Using only treated units, regress the observed outcome on the covariates, and collect the regression coefficients. Using these coefficients and the covariates of each control subject \(i\), compute the least squares prediction and call it \(\hat{Y}_i(1)\).
  3. Estimate the individual-level treatment effect by computing \(\hat{\tau}_i = Y_i-\hat{Y}_i(0)\) for treated subjects and \(\hat{\tau}_i = \hat{Y}_i(1)-Y_i\) for control subjects.
  4. Estimate the overall average treatment effect by \(\hat{\tau} = N^{-1} \sum_{i=1}^N \hat{\tau}_i\).

It turns out that \(\hat{\tau}\) computed above is numerically identical to the ATE estimate obtained from running the regression displayed at the beginning of this post.
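The equivalence can be checked numerically. The following is a minimal sketch using plain NumPy rather than Causalinference itself; the simulated data, the helper ols_predict, and all variable names are assumptions made for illustration only.

```python
import numpy as np

# Simulated data; this data-generating process is purely illustrative.
rng = np.random.default_rng(0)
N, K = 500, 3
X = rng.normal(size=(N, K))
D = (rng.uniform(size=N) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(int)
Y = 1.0 + X @ np.array([1.0, -0.5, 0.25]) + D * (2.0 + X[:, 1]) + rng.normal(size=N)


def ols_predict(X_fit, Y_fit, X_new):
    """Fit OLS with an intercept on (X_fit, Y_fit) and predict at X_new."""
    Z_fit = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(Z_fit, Y_fit, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


treated = D == 1
control = ~treated

# Steps 1-3: impute the missing potential outcome of each unit using the
# regression fitted on the opposite group.
tau = np.empty(N)
tau[treated] = Y[treated] - ols_predict(X[control], Y[control], X[treated])
tau[control] = ols_predict(X[treated], Y[treated], X[control]) - Y[control]

# Step 4: average the individual-level estimates.
ate_imputed = tau.mean()

# Single regression with demeaned covariates and full interactions;
# the coefficient on D is the ATE estimate.
Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(N), D, Xc, D[:, None] * Xc])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
ate_regression = coef[1]

print(ate_imputed, ate_regression)
```

The two printed numbers should agree up to floating-point error. The reason is that within-group OLS residuals sum to zero, so averaging the imputed effects amounts to evaluating the difference between the two fitted lines at the overall covariate mean, which is exactly what the coefficient on \(D_i\) measures once the covariates are demeaned.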

This result shows that OLS is essentially imputing the missing potential outcomes of a given group by extrapolating linearly from the observations of the other group. It follows that the less covariate overlap there is between the two groups, the more hopelessly heroic the extrapolation, especially if the underlying relationship between outcomes and covariates is nonlinear to begin with. This explains why the estimated ATE of 3.672 shown above is so far away from the true ATE of 10. Overcoming this problem of OLS motivates many of the remaining estimators we will consider.
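To make the extrapolation problem concrete, here is a small simulation sketch in NumPy. It is not the data set behind the estimates shown above; the single covariate, the quadratic outcome, the disjoint covariate ranges, and the group sizes are all assumptions chosen to exaggerate the lack of overlap.

```python
import numpy as np

rng = np.random.default_rng(1)
n_control, n_treated = 300, 100
true_effect = 10.0  # constant treatment effect assumed in this toy example

# No overlap at all: controls have x in [0, 1], treated have x in [2, 3].
x_c = rng.uniform(0.0, 1.0, size=n_control)
x_t = rng.uniform(2.0, 3.0, size=n_treated)

# The outcome is quadratic in x, so a linear fit is misspecified.
y_c = x_c**2 + rng.normal(scale=0.1, size=n_control)
y_t = x_t**2 + true_effect + rng.normal(scale=0.1, size=n_treated)

# Fit a separate line within each group; np.polyfit returns (slope, intercept).
slope_c, intercept_c = np.polyfit(x_c, y_c, 1)
slope_t, intercept_t = np.polyfit(x_t, y_t, 1)

# Impute each unit's missing potential outcome by extrapolating the other
# group's line, then average the implied individual effects.
att_hat = np.mean(y_t - (intercept_c + slope_c * x_t))
atc_hat = np.mean((intercept_t + slope_t * x_c) - y_c)
ate_hat = (n_treated * att_hat + n_control * atc_hat) / (n_control + n_treated)

print("ATT:", att_hat, "ATC:", atc_hat, "ATE:", ate_hat, "truth:", true_effect)
```

Because the outcome is convex in the covariate, each group's fitted line falls below the true curve once it is extended into the other group's covariate range. The imputed \(\hat{Y}_i(0)\) for treated units is therefore too low, which inflates the ATT estimate, and the imputed \(\hat{Y}_i(1)\) for control units is too low as well, which deflates the ATC estimate.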

Returning to the method est_via_ols, it is possible to use it to run two even more restrictive linear specifications. The first excludes the interaction terms and runs $$Y_i = \alpha + \beta D_i + \gamma' (X_i - \bar{X}) + \varepsilon_i.$$

Unlike the previous specification, this one assumes a constant treatment effect. If that assumption holds, the restricted regression can yield more precise estimates. To run it, simply supply a value of 1 to the optional parameter adj:

>>> causal.est_via_ols(adj=1)
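
For readers who want to see what the no-interaction specification amounts to, the following NumPy sketch builds the corresponding design matrix by hand. It does not call Causalinference; the simulated data, with a constant treatment effect of 2 and randomly assigned treatment, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
X = rng.normal(size=(N, 2))
D = rng.integers(0, 2, size=N)  # randomly assigned treatment indicator
Y = 1.0 + X @ np.array([0.5, -1.0]) + 2.0 * D + rng.normal(size=N)

# Design matrix: intercept, treatment indicator, demeaned covariates.
# There are no D * (X - X_bar) interaction columns in this specification.
Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(N), D, Xc])

coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
beta_hat = coef[1]  # estimate of the constant treatment effect

print(beta_hat)
```

Since the simulated effect really is constant here, the coefficient on \(D_i\) should land close to 2.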

Setting adj=0, on the other hand, specifies that est_via_ols run the following no-covariates regression: $$Y_i = \alpha + \beta D_i + \varepsilon_i.$$

Of course, this gives nothing other than the raw difference between the sample means of the treatment and control groups, and could just as well be obtained from summary_stats, as we saw previously.
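
The claim that the no-covariates regression reproduces the raw difference in means is easy to verify. Below is a short NumPy sketch on simulated data; the data and variable names are illustrative assumptions, not part of Causalinference.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
D = rng.integers(0, 2, size=N)
Y = 5.0 + 3.0 * D + rng.normal(size=N)

# Regress Y on an intercept and the treatment indicator only.
Z = np.column_stack([np.ones(N), D])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
beta_hat = coef[1]

# Raw difference between the treated and control sample means.
raw_diff = Y[D == 1].mean() - Y[D == 0].mean()

print(beta_hat, raw_diff)
```

The two printed values should coincide up to floating-point error.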