Causal Inference in Python

Causal Inference in Python, or Causalinference in short, is a software package that implements various statistical and econometric methods used in the field variously known as Causal Inference, Program Evaluation, or Treatment Effect Analysis.

Through a series of blog posts on this page, I will illustrate the use of Causalinference, as well as provide high-level summaries of the underlying econometric theory with the non-specialist audience in mind. Source code for the package can be found at its GitHub page, and detailed documentation is available at causalinferenceinpython.org.

Stratification

Unconfoundedness implies that matching on propensity scores also provides a valid way of constructing treatment effect estimators. In this post we look at a few ways of stratifying the sample into blocks that contain units with similar propensity scores.

Causalinference provides two methods for subclassification based on propensity score. The first, stratify, splits the sample based on what is specified in the attribute blocks. This attribute is created and assigned a default value of 5 whenever est_propensity or est_propensity_s is run:

>>> Y, D, X = vignette_data()
>>> causal = CausalModel(Y, D, X)
>>> causal.est_propensity_s()
>>> causal.blocks
5

Running stratify at this point will split the sample into 5 equal-sized bins, all of which will be contained in a new attribute strata. This attribute behaves like a list that contains 5 CausalModel instances, but with some bells and whistles added, including the ability to be printed:

>>> print(causal.strata)

Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.064     0.267       170        30     0.183     0.185     9.282
         2     0.268     0.498       115        85     0.368     0.388    12.928
         3     0.499     0.777        74       126     0.629     0.647    11.131
         4     0.778     0.937        27       173     0.844     0.872    12.778
         5     0.937     0.996         6       194     0.949     0.973    26.964

Since strata works like a Python list, you can, for instance, access specific stratum and work on it like a regular CausalModel instance:

>>> print(causal.strata[0].summary_stats)

Summary Statistics

                       Controls (N_c=170)          Treated (N_t=30)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y       35.495       32.329       44.777       33.344        9.282

                       Controls (N_c=170)          Treated (N_t=30)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0        3.336        3.287        2.912        3.281       -0.129
             X1        2.329        2.886        2.693        2.998        0.124

We can also loop over it like any Python list:

>>> for stratum in causal.strata:
...     print(stratum.summary_stats['ndiff'])
... 
[-0.12909363  0.12373693]
[ 0.13403878 -0.00388046]
[ 0.02754279  0.0482487 ]
[ 0.14361112  0.02104157]
[ 0.48662652  0.30347525]

Instead of splitting the sample into equal-sized bins, it is possible to specify exactly the propensity bin boundary values and ask stratify to split the sample accordingly. The following illustrates how we can split the sample into 3 bins, with boundaries at 0.3 and 0.7:

>>> causal.reset()
>>> causal.est_propensity_s()
>>> causal.blocks = [0, 0.3, 0.7, 1]
>>> causal.stratify()
>>> print(causal.strata)

Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.064     0.300       193        47     0.195     0.220    11.569
         2     0.300     0.697       150       159     0.470     0.526    13.864
         3     0.703     0.996        49       402     0.823     0.910    33.862

Finally, it is also possible to split the sample in a more data-driven way. The intuition for this is that, as the sample size increases, we should be able to more finely divide the sample into more bins so that we increase the degree of propensity score similarity within each bin.

In particular, the second method for subclassification, stratify_s, will use a data-driven procedure for selecting both the number of blocks and their boundaries, with the expectation that the number of blocks should increase with the sample size. At a high level, this algorithm, suggested by Imbens and Rubin (2015), works as follows:

  1. Divide the sample into two using the median propensity score as the split point.
  2. Run a two-sample t-test to test whether the average linearized propensity scores of the two split strata are significantly different. If so, keep the split. Otherwise, revert and stop.
  3. Recursively repeat Steps 1 and 2 on the left and right strata.

It is not difficult to show that this divide-and-conquer algorithm runs in \(O(N \log N)\) time, and so is very inexpensive to perform. To implement this in Causalinference, simply run stratify_s:

>>> causal.reset()
>>> causal.est_propensity_s()
>>> causal.stratify_s()
>>> print(causal.strata)

Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.064     0.204       106        20     0.149     0.163     4.430
         2     0.205     0.262        55         8     0.235     0.221    21.904
         3     0.262     0.312        39        23     0.283     0.284    19.669
         4     0.312     0.469        78        47     0.388     0.399     8.866
         5     0.471     0.646        51        74     0.563     0.551    15.079
         6     0.648     0.805        37        88     0.722     0.727     8.637
         7     0.807     0.862        11        52     0.831     0.837    -0.457
         8     0.862     0.910         5        57     0.883     0.889     6.973
         9     0.911     0.996        10       239     0.940     0.964    28.056

If we inspect blocks at this point, we can see the boundary points that were selected by the algorithm:

>>> causal.blocks
[0.064452633756050537, 0.20407017570006966, 0.26211924475391141, 0.31210473124612848, 0.4694140664640063, 0.64589266254404365, 0.80512142741982795, 0.86163972091018581, 0.90955041887994104, 0.99635034430606662]

As we can see above, the number of bins selected by the algorithm is quite a bit larger than the default number of 5, with each bin containing units that have very similar average estimated propensity scores.

References

Imbens, G. & Rubin, D. (2015). Causal inference in statistics, social, and biomedical sciences: An introduction. Cambridge University Press.