Causal Inference in Python

Causal Inference in Python, or Causalinference for short, is a software package that implements various statistical and econometric methods used in the field variously known as Causal Inference, Program Evaluation, or Treatment Effect Analysis.

Through a series of blog posts on this page, I will illustrate the use of Causalinference and provide high-level summaries of the underlying econometric theory, written with a non-specialist audience in mind. Source code for the package can be found at its GitHub page, and detailed documentation is available at causalinferenceinpython.org.

Initialization

In this post we will go over the installation of Causalinference, as well as how to initialize the main class CausalModel.

To install Causalinference, simply type the following into the terminal:

$ pip install causalinference

This assumes that pip is properly set up and that the dependencies NumPy and SciPy are installed. If any of these poses an issue, I recommend consulting this excellent guide.
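If you are unsure whether everything is in place, a quick check from Python can help. The snippet below is a minimal sketch using only the standard library; the helper name missing_packages is my own and is not part of Causalinference.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means all three packages are importable in this environment.
print(missing_packages(["numpy", "scipy", "causalinference"]))
```

Any name that appears in the printed list still needs to be installed, for example with pip.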

The main object of interest in Causalinference is the class CausalModel, which we can import with

>>> from causalinference import CausalModel

CausalModel takes as inputs three NumPy arrays: Y, an \(N\)-vector of observed outcomes; D, an \(N\)-vector of treatment status indicators; and X, an \(N\)-by-\(K\) matrix of covariates.
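To make the expected shapes concrete, here is a small sketch that builds three such arrays by hand with NumPy. The sample size, the number of covariates, and the outcome equation (including the effect of 2.0) are arbitrary assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the example is reproducible
N, K = 500, 3                   # illustrative sample size and covariate count

X = rng.normal(size=(N, K))     # N-by-K matrix of covariates
D = rng.integers(0, 2, size=N)  # N-vector of 0/1 treatment indicators
# N-vector of outcomes; the functional form here is an assumption.
Y = X.sum(axis=1) + 2.0 * D + rng.normal(size=N)

print(Y.shape, D.shape, X.shape)
```

The point is only that Y and D are one-dimensional with the same length N, and that X has one row per unit.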

To initialize a CausalModel instance called, say, causal, simply run

>>> causal = CausalModel(Y, D, X)

If you don't have a data set handy yet, you can generate a random one with

>>> from causalinference.utils import random_data
>>> Y, D, X = random_data()

In any case, once an instance of the class CausalModel has been created, it will contain a number of attributes and methods that are relevant for conducting causal analyses. As is the tradition in Python, these can be listed out by invoking dir on the object:

>>> dir(causal)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_post_pscore_init', 'blocks', 'cutoff', 'est_propensity', 'est_propensity_s', 'est_via_blocking', 'est_via_matching', 'est_via_ols', 'est_via_weighting', 'estimates', 'old_data', 'propensity', 'raw_data', 'reset', 'strata', 'stratify', 'stratify_s', 'summary_stats', 'trim', 'trim_s']
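Most of that list consists of Python's built-in double-underscore names. One way to make it easier to scan is to filter out everything that starts with an underscore. The helper public_names below is a sketch of my own, and the Example class is a hypothetical stand-in so that the snippet runs without the package installed.

```python
def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Example:
    """Hypothetical stand-in object; not part of Causalinference."""

    def __init__(self):
        self.estimates = {}
        self._cache = None  # private name, filtered out below

    def reset(self):
        pass


print(public_names(Example()))
```

Applied to the instance above, public_names(causal) would keep only the entries from 'blocks' through 'trim_s'.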

Detailed documentation for the methods, on the other hand, can be displayed with

>>> help(causal)

For easy reading, this information is also available on the package's website.

CausalModel is stateful. This means that as we employ the methods we saw listed above, the instance causal will mutate, with new data being added or existing data being modified or dropped. To return causal to its initial state, simply run

>>> causal.reset()
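
To make the idea of statefulness concrete, here is a toy class that follows the same mutate-then-reset pattern. It is a hypothetical stand-in written for this illustration and says nothing about how CausalModel is implemented internally.

```python
class ToyStatefulModel:
    """Hypothetical stand-in; not Causalinference's implementation."""

    def __init__(self, values):
        self._original = list(values)  # untouched copy kept for reset()
        self.data = list(values)       # working copy that methods mutate

    def trim(self, lo, hi):
        # Mutates the instance in place by dropping values outside [lo, hi].
        self.data = [v for v in self.data if lo <= v <= hi]

    def reset(self):
        # Restores the working copy from the stored original.
        self.data = list(self._original)


model = ToyStatefulModel([0.05, 0.4, 0.6, 0.97])
model.trim(0.1, 0.9)
trimmed = list(model.data)   # state after the mutating call
model.reset()
restored = list(model.data)  # state after returning to the initial values
print(trimmed, restored)
```

The practical consequence is the same as for causal: the order in which methods are called matters, and reset is the way back to a clean slate.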