Paper reading: "Estimating individual treatment effect: generalization bounds and algorithms"
Uri Shalit, Fredrik D. Johansson, David Sontag
Causal inference tasks are often focused on estimating the average effect of a treatment across a population—the ATE (average treatment effect) and the ATT (average treatment effect on the treated). In this paper, the researchers instead focus on the Individual Treatment Effect (ITE). In reality many decisions are made on an individual level—e.g., how should a doctor treat an individual patient with symptoms far from the average case?—making bounding ITE error a highly desirable goal.
The bounds in this paper are proven under an assumption known as strong ignorability, which means (1.) we assume there are no hidden confounders, i.e., every feature that has a causal impact on both the treatment assignment and the outcome is observed in the data; and (2.) overlap: every individual has a strictly positive probability of receiving either treatment.
Intuition
How should we think about the task of bounding ITE error? Here is one way: we can fit one model to our data that estimates the outcome of the treated group (t = 1) and a second model that estimates the outcome of the control group (t = 0); the difference between their predictions for an individual is our ITE estimate. Each model contributes an ordinary supervised ("factual") prediction error.
But there is an additional source of error we haven't yet accounted for: the samples that each model must predict counterfactual outcomes for are not drawn from the distribution it was trained on. Because of selection bias, the treated and control populations can differ systematically, so counterfactual estimation means generalizing across a distribution shift.
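As a minimal sketch of this two-model approach (sometimes called a T-learner), here's a toy example in numpy. The data-generating process, the biased treatment assignment, and the linear models are my own illustration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: outcome y = x + 2*t*x, so the true ITE is 2*x.
# Selection bias: units with larger x are more likely to be treated.
n = 2000
x = rng.normal(size=(n, 1))
propensity = 1 / (1 + np.exp(-2 * x[:, 0]))
t = rng.random(n) < propensity
y = x[:, 0] + 2 * t * x[:, 0] + 0.1 * rng.normal(size=n)

def fit_linear(X, y):
    """Least-squares fit with an intercept term."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# One model per arm ("two-model" approach).
w1 = fit_linear(x[t], y[t])      # treated-outcome model
w0 = fit_linear(x[~t], y[~t])    # control-outcome model

ite_hat = predict(w1, x) - predict(w0, x)   # estimated ITE per individual
true_ite = 2 * x[:, 0]
pehe = np.sqrt(np.mean((ite_hat - true_ite) ** 2))
print(pehe)
```

Here the model class happens to be well-specified, so the estimate is accurate despite the biased assignment; with flexible models and a real distribution mismatch, the shift itself becomes a source of error, which is what the paper's bound captures.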
Results
The researchers show that this additional error is bounded by a distance between probability distributions called an Integral Probability Metric (IPM):
(Recall, or look up on Wikipedia like me, that a metric is just a function that defines a non-negative distance between every two elements of a set, and satisfies a few other conditions like the triangle inequality.)
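For reference, given a family of real-valued functions G, the IPM between two distributions p and q over a space S is defined as:

```latex
\mathrm{IPM}_G(p, q) \;=\; \sup_{g \in G} \left| \int_{\mathcal{S}} g(s)\,\bigl(p(s) - q(s)\bigr)\, ds \right|
```

Two choices of G are especially relevant here: the 1-Lipschitz functions, for which the IPM is the Wasserstein distance, and the unit ball of a reproducing kernel Hilbert space, for which it is the Maximum Mean Discrepancy (MMD).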
The bound—expressed as the expected Precision in Estimation of Heterogeneous Effect (PEHE), the expected squared error of the ITE estimate—is stated in terms of a representation function Φ that maps the covariates into a representation space, plus a hypothesis h trained on top of that representation: the treated and control inputs, mapped through Φ, induce two distributions whose IPM distance appears in the bound alongside the factual errors.
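Schematically (suppressing constants and a variance term from the paper's theorem), the bound has the form:

```latex
\epsilon_{\mathrm{PEHE}}(h, \Phi) \;\lesssim\; \epsilon_F^{t=0}(h, \Phi) \;+\; \epsilon_F^{t=1}(h, \Phi) \;+\; B_\Phi \cdot \mathrm{IPM}_G\!\left(p_\Phi^{t=1},\, p_\Phi^{t=0}\right)
```

where the ε_F terms are the factual (standard supervised) losses on the control and treated arms, p_Φ^t are the distributions of Φ(x) in each arm, and B_Φ is a constant depending on Φ and the loss.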
We can even incorporate the IPM term into our loss function to encourage the neural network to find representations that minimize the distribution distance!
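To make that concrete, here is a sketch of such a loss using the simplest IPM choice, a linear MMD between group means in representation space. The function names and the exact form of the penalty are my own illustration; the actual CFRNet objective also reweights samples by treatment proportion and supports Wasserstein and kernel-MMD penalties:

```python
import numpy as np

def linear_mmd(phi_treated, phi_control):
    """Linear MMD: Euclidean distance between the two groups' mean
    representations -- the simplest member of the IPM family."""
    return np.linalg.norm(phi_treated.mean(axis=0) - phi_control.mean(axis=0))

def cfr_style_loss(y, y_hat, t, phi, alpha=1.0):
    """Factual squared error plus an IPM penalty that encourages the
    treated and control representation distributions to overlap."""
    factual = np.mean((y - y_hat) ** 2)
    ipm = linear_mmd(phi[t == 1], phi[t == 0])
    return factual + alpha * ipm
```

The weight alpha trades off factual accuracy against representation balance; setting alpha = 0 removes the IPM term and recovers the TARNet variant described below.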
Shalit et al. call their algorithm Counterfactual Regression (CFR), and the version without the IPM regularization the Treatment-Agnostic Representation Network (TARNet).
Experiments
As usual, evaluating causal inference algorithms is hard because real-world datasets have no ground truth! In this case the researchers evaluate their new algorithm on the "semi-synthetic" Infant Health and Development Program (IHDP) dataset and the ubiquitous LaLonde Jobs dataset. The Jobs dataset consists of both a randomized and a non-randomized component, making it a popular choice for causal inference model evaluation.
Models are evaluated either within-sample, meaning ITE is estimated for units whose factual outcome was observed during training; or out-of-sample, meaning ITE is estimated for entirely new units with no observed outcome.
Further reading:
"Learning representations for counterfactual inference" https://arxiv.org/pdf/1605.03661.pdf (the prequel to this work.
"Bayesian nonparametric modeling for causal inference" https://nyuscholars.nyu.edu/en/publications/bayesian-nonparametric-modeling-for-causal-inference
Git repo for CFRnet: https://github.com/clinicalml/cfrnet