In this post we will look at one additional treatment effect estimator — the so-called doubly-robust weighting estimator.

First, it turns out that under unconfoundedness, the following two equalities are true: $$\mathrm{E}\left[\frac{D Y}{p(X)}\right] = \mathrm{E}[Y(1)] \quad \mbox{and} \quad \mathrm{E}\left[\frac{(1-D)Y}{1-p(X)}\right] = \mathrm{E}[Y(0)].$$
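To see why the first equality holds, one can condition on \(X\) and use the fact that \(DY = DY(1)\). The sketch below assumes the usual definition \(p(X) = \mathrm{E}[D \mid X]\) with \(0 < p(X) < 1\), and uses unconfoundedness to separate \(D\) from \(Y(1)\) given \(X\): $$\mathrm{E}\left[\frac{D Y}{p(X)}\right] = \mathrm{E}\left[\frac{\mathrm{E}[D Y(1) \mid X]}{p(X)}\right] = \mathrm{E}\left[\frac{\mathrm{E}[D \mid X] \, \mathrm{E}[Y(1) \mid X]}{p(X)}\right] = \mathrm{E}\big[\mathrm{E}[Y(1) \mid X]\big] = \mathrm{E}[Y(1)].$$

The second equality follows from the same argument with \(1-D\) and \(1-p(X)\) in place of \(D\) and \(p(X)\).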

This in turn suggests that the expectation of the potential outcomes can be estimated using $$\hat{\mathrm{E}}[Y(1)] = \frac{1}{N} \sum_{i=1}^N \frac{D_i Y_i}{p(X_i)} \quad \mbox{and} \quad \hat{\mathrm{E}}[Y(0)] = \frac{1}{N} \sum_{i=1}^N \frac{(1-D_i) Y_i}{1-p(X_i)}.$$

The difference between these two averages is thus a valid estimator of the average treatment effect \(\mathrm{E}[Y(1)-Y(0)]\). This estimator, also known as the Horvitz-Thompson estimator, is closely related to the inverse probability weighted estimators one might see in the missing data literature. In general, inverse probability weighting is used to inflate the weights of subjects who are underrepresented, thereby eliminating the bias that missing data might introduce. In our case, because the propensity score \(p(X)\) represents the probability of observing \(Y(1)\), inverse weighting by \(p(X)\) gives us exactly the right adjustment for eliminating the bias from selection.

Since the true propensity score is rarely known in practice, we typically use the estimated propensity score \(\hat{p}\) and the modified estimator, in which the weights are normalized to sum to one within each treatment group: $$\hat{\mathrm{E}}[Y(1)-Y(0)] = \left(\sum_{i=1}^N \frac{D_i}{\hat{p}(X_i)}\right)^{-1} \sum_{i=1}^N \frac{D_i Y_i}{\hat{p}(X_i)} - \left(\sum_{i=1}^N \frac{1-D_i}{1-\hat{p}(X_i)}\right)^{-1} \sum_{i=1}^N \frac{(1-D_i) Y_i}{1-\hat{p}(X_i)}.$$
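As a rough illustration, both the plain and the normalized versions of the weighting estimator can be sketched in a few lines of NumPy. The helper name `ipw_ate`, the simulated data, and the true effect of 3 are assumptions made for this example only; none of them come from *Causalinference*, and the true propensity score stands in for an estimate to keep the sketch short.

```python
import numpy as np

def ipw_ate(Y, D, p, normalize=True):
    """Inverse probability weighted estimate of E[Y(1) - Y(0)].

    Hypothetical helper for illustration; p holds propensity scores.
    """
    Y, D, p = (np.asarray(a, dtype=float) for a in (Y, D, p))
    w1 = D / p              # weights for treated units
    w0 = (1 - D) / (1 - p)  # weights for control units
    if normalize:
        # Weights rescaled to sum to one within each treatment group
        return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)
    # Plain sample averages, as in the Horvitz-Thompson form
    return np.mean(w1 * Y) - np.mean(w0 * Y)

# Simulated data (an assumption of this sketch): the true effect is 3
rng = np.random.default_rng(0)
N = 20000
X = rng.normal(size=N)
p = 1 / (1 + np.exp(-X))                     # true propensity score
D = rng.binomial(1, p)
Y = 1 + 2 * X + 3 * D + rng.normal(size=N)

naive = Y[D == 1].mean() - Y[D == 0].mean()  # ignores selection on X
ht = ipw_ate(Y, D, p, normalize=False)
normalized = ipw_ate(Y, D, p, normalize=True)
print(naive, ht, normalized)
```

In this simulated setting the naive difference in means overstates the effect because treated units tend to have larger \(X\), while both weighted versions should land close to 3.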

Alternatively, it is possible to compute the above estimator by running weighted least squares with the regression function $$Y_i = \alpha + \beta D_i + \varepsilon_i,$$

with weights given by $$\hat{\lambda}_i = \frac{1}{(1-\hat{p}(X_i))^{1-D_i} \hat{p}(X_i)^{D_i}}.$$

Expressed this way, it is easy to modify the estimator to further control for covariates, by including them into the regression function: $$Y_i = \alpha + \beta D_i + \gamma' X_i + \varepsilon_i.$$

Running weighted least squares on the above regression function with weights \(\hat{\lambda}\) yields the so-called doubly-robust estimator. This estimator has the property that as long as either the specification of the propensity score or the specification of the regression function is correct, it will be consistent for the true average treatment effect.
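The double robustness property can be illustrated with a small simulation. The following is a minimal sketch, not the implementation used by *Causalinference*: the helper `wls_ate`, the data-generating process, and the deliberately wrong specifications (a constant propensity score of 0.5, or a regression that omits \(X\)) are all assumptions introduced for this example.

```python
import numpy as np

def wls_ate(Y, D, p_hat, X=None):
    """Coefficient on D from weighted least squares of Y on (1, D, X).

    Hypothetical helper for illustration. Pass X=None to omit covariates.
    """
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    # lambda_i = 1 / (p^D * (1 - p)^(1 - D))
    lam = 1.0 / np.where(D == 1, p_hat, 1.0 - p_hat)
    cols = [np.ones_like(Y), D]
    if X is not None:
        cols.append(np.asarray(X, dtype=float))
    Z = np.column_stack(cols)
    # WLS is OLS after scaling each row by the square root of its weight
    s = np.sqrt(lam)
    coef, *_ = np.linalg.lstsq(Z * s[:, None], Y * s, rcond=None)
    return coef[1]

# Simulated data (an assumption of this sketch): the true effect is 3
rng = np.random.default_rng(0)
N = 20000
X = rng.normal(size=N)
p = 1 / (1 + np.exp(-X))                     # true propensity score
D = rng.binomial(1, p)
Y = 1 + 2 * X + 3 * D + rng.normal(size=N)

wrong_p = np.full(N, 0.5)                    # misspecified propensity score
right_reg_wrong_p = wls_ate(Y, D, wrong_p, X)
wrong_reg_right_p = wls_ate(Y, D, p)         # regression omits X
both_wrong = wls_ate(Y, D, wrong_p)
print(right_reg_wrong_p, wrong_reg_right_p, both_wrong)
```

In this setup the first two estimates should be close to 3, since one of the two specifications is correct in each case, while the third reduces to the unweighted difference in means and inherits its selection bias.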

To compute this estimator using *Causalinference*, we simply run `est_via_weighting`, as follows:

```python
>>> from causalinference import CausalModel
>>> from causalinference.utils import vignette_data
>>> Y, D, X = vignette_data()
>>> causal = CausalModel(Y, D, X)
>>> causal.est_propensity_s()
>>> causal.est_via_weighting()
>>> print(causal.estimates)

Treatment Effect Estimates: Weighting

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     17.989      1.443     12.469      0.000     15.161     20.816
```

Unfortunately, despite the apparent desirability of the double robustness property, it does not apply in this case because we have misspecified both the propensity score and the regression function. Furthermore, because the estimated propensity scores enter in the denominator, even modest noise in them can generate considerable bias. As a result, the estimated average treatment effect above, 17.989, turns out to be quite far from the true value of 10. For a more detailed discussion of the relative merits of weighting estimators, see Imbens and Rubin (2015).
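One way to see how noise in the denominator can translate into bias is to look at the weights directly. In the hypothetical sketch below, the estimated propensity score is unbiased for a true value of 0.05, with uniform estimation error of at most 0.03; these numbers are assumptions chosen for illustration and are unrelated to the vignette data.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.05
# Hypothetical estimation error: mean zero, uniform on [-0.03, 0.03]
p_hat = p_true + rng.uniform(-0.03, 0.03, size=100_000)

true_weight = 1 / p_true              # weight under the true propensity score
avg_est_weight = np.mean(1 / p_hat)   # average weight under noisy estimates
print(np.mean(p_hat), true_weight, avg_est_weight)
```

Although the estimates are centered on the true propensity score, the average estimated weight exceeds the true weight, because \(1/p\) is convex in \(p\). The distortion grows as propensity scores approach zero or one.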

### References

Imbens, G. & Rubin, D. (2015). *Causal inference for statistics, social, and biomedical sciences: An introduction*. Cambridge University Press.