Posts

Paper reading: "Predicting the Present with Bayesian Structural Time Series"

https://ai.google/research/pubs/pub41335/
Steven L. Scott, Hal Varian

This 2013 paper from Google economists is a good read for building a mental foundation for the well-known 2014 paper, Inferring Causal Impact Using Bayesian Structural Time-series Models, which accompanied the release of the R CausalImpact package. We'll read that one in a later post on this blog. Predicting the Present lays out a system for "nowcasting": predicting the present value of some time-series statistic using past values of that time series along with past and present values of a number of other time series. A motivating example from the paper: say we'd like to know the number of weekly unemployment claims in the US for the purpose of understanding macroeconomic trends. This statistic will eventually be released by a government agency, but we want it now. We can nowcast this statistic with its history and a bunch of correlated time series whose current values are more easily attai…
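To make the nowcasting idea concrete, here is a minimal sketch of a much simpler model than the paper's: one local-level (random-walk) component plus a single regression term on a contemporaneous correlate, filtered with a scalar Kalman filter. The paper's model is richer (it combines trend and seasonal state components with spike-and-slab selection over many regressors). The function name `nowcast_local_level`, the fixed and known coefficient `beta`, and the known noise scales are assumptions for illustration only.

```python
from typing import List, Tuple


def nowcast_local_level(
    y_history: List[float],
    x_history: List[float],
    x_now: float,
    beta: float,
    sigma_obs: float,
    sigma_level: float,
) -> Tuple[float, float]:
    """Hypothetical sketch: predictive (mean, variance) of y at the present time.

    Model (assumed): y_t = level_t + beta * x_t + noise,  level_t = level_{t-1} + drift.
    Uses past y values and past-and-present x values, as in nowcasting.
    """
    if len(y_history) != len(x_history):
        raise ValueError("y_history and x_history must have the same length")
    level_mean, level_var = 0.0, 1e6  # vague prior on the initial level
    for y, x in zip(y_history, x_history):
        # Predict step: the level follows a random walk, so its variance grows.
        pred_var = level_var + sigma_level ** 2
        # Update step: compare the observation with (level + regression term).
        innovation_var = pred_var + sigma_obs ** 2
        gain = pred_var / innovation_var
        level_mean = level_mean + gain * (y - beta * x - level_mean)
        level_var = (1.0 - gain) * pred_var
    # Nowcast: push the level one step forward and add the present regression term.
    pred_var = level_var + sigma_level ** 2
    return level_mean + beta * x_now, pred_var + sigma_obs ** 2


# Observations lying exactly on y = 10 + 2x; the correlate's present value is known.
mean, var = nowcast_local_level(
    [12.0, 14.0, 16.0, 18.0, 20.0], [1.0, 2.0, 3.0, 4.0, 5.0], 6.0,
    beta=2.0, sigma_obs=0.1, sigma_level=0.1,
)
print(round(mean, 3))
```

With these inputs the filtered level settles near 10, so the nowcast for `x_now=6.0` comes out close to 22. The design point is that the present value of the correlate `x_now` enters the prediction even though the present `y` has not been released yet.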

Paper reading: "Improve User Retention with Causal Learning"

http://proceedings.mlr.press/v104/du19a/du19a.pdf
Shuyang Du, James Lee, Farzin Ghaffarizadeh

Here's the situation, handed to us in this paper from researchers at Uber: we'd like to offer a marketing promotion to our users in order to get them to retain better, that is, to use our product more consistently over time. One way we can model this problem is with a binary random variable $Y^r$ (the $r$ stands for "retention"), where $Y_i^r = 1$ if and only if user $i$ used our product in a given time period. We could assume our marketing promotion has some fixed cost per user, then target the users with the highest treatment effect $\tau^r(x_i) = \mathbb{E}[Y_i^r(1) - Y_i^r(0) \mid X = x_i]$, where user $i$ has covariates $x_i$. This might be a valid framing for some marketing promotions. For others, it is incomplete. Let's say, for example, you run, I don't know, a ride-sharing company; maybe we can call you Uber. One promotion you might run is to offer everyone free r…
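One simple way to estimate the retention effect $\tau^r(x)$ from a randomized test and then apply the fixed-cost targeting rule described above is to take a difference in retention rates within each covariate segment, rank segments by that estimated uplift, and keep those whose expected incremental value exceeds the per-user cost. This is a minimal sketch, not the paper's method: the discrete string segments, the function names, and the `value_per_retained_user` parameter (a hypothetical value of one additional retained user) are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One user in a randomized test: (segment, treated flag 0/1, retained flag 0/1)
Record = Tuple[str, int, int]


def segment_uplift(records: List[Record]) -> Dict[str, float]:
    """Estimate tau^r(x) per segment as mean(Y | treated) - mean(Y | control).

    Assumes random assignment, so within a segment the difference in means
    estimates the conditional treatment effect for that segment.
    """
    # Per segment: [treated retained, treated count, control retained, control count]
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for segment, treated, retained in records:
        c = counts[segment]
        if treated:
            c[0] += retained
            c[1] += 1
        else:
            c[2] += retained
            c[3] += 1
    uplift: Dict[str, float] = {}
    for segment, (t_ret, t_n, c_ret, c_n) in counts.items():
        if t_n > 0 and c_n > 0:  # skip segments missing either arm
            uplift[segment] = t_ret / t_n - c_ret / c_n
    return uplift


def pick_targets(uplift: Dict[str, float], cost_per_user: float,
                 value_per_retained_user: float) -> List[str]:
    """Segments whose expected incremental value beats the fixed cost, best first."""
    worth_it = [s for s, u in uplift.items() if u * value_per_retained_user > cost_per_user]
    return sorted(worth_it, key=lambda s: uplift[s], reverse=True)


records = (
    [("new", 1, y) for y in (1, 1, 1, 0)] + [("new", 0, y) for y in (1, 0, 0, 0)]
    + [("loyal", 1, 1)] * 4 + [("loyal", 0, 1)] * 4
    + [("lapsed", 1, y) for y in (1, 1, 0, 0)] + [("lapsed", 0, y) for y in (1, 0, 0, 0)]
)
uplift = segment_uplift(records)  # {'new': 0.5, 'loyal': 0.0, 'lapsed': 0.25}
print(pick_targets(uplift, cost_per_user=1.0, value_per_retained_user=3.0))  # ['new']
```

Loyal users retain whether or not they get the promotion, so their estimated uplift is zero and they are not worth targeting even though their retention rate is the highest.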

Blog reading: "Using Causal Inference to Improve the Uber User Experience"

https://eng.uber.com/causal-inference-at-uber/

In this post, two data scientists survey the internal landscape of causal inference techniques at Uber, providing some valuable insight into the state of causal data science in Silicon Valley. With some great flowcharts to boot! The post is divided into two sections: one with techniques for use in the context of randomized experiments and another for observational contexts. There's a lot of ground covered in these flowcharts, so we'll have to go into more depth on some of these topics in later posts. (Uber has published more details in blogs and papers on a number of these topics.) Still, it's a great 5,000-foot view of how data scientists at tech companies are thinking about causal inference techniques today.

Randomized experiments

Why would we need causal inference techniques when we've already run a randomized experiment? Isn't causal inference supposed to be for situations where we don't have access to the gold st…

Paper reading: "Estimating individual treatment effect: generalization bounds and algorithms"

Uri Shalit, Fredrik D. Johansson, David Sontag
https://arxiv.org/pdf/1606.03976v5.pdf

I Want My ITE

Causal inference tasks are often focused on estimating the average effect of a treatment across a population: the ATE (average treatment effect) and the ATT (average treatment effect on the treated). In this paper, the researchers instead focus on the Individual Treatment Effect (ITE). In reality, many decisions are made on an individual level (e.g., how should a doctor treat an individual patient whose symptoms are far from the average case?), which makes bounding ITE error a highly desirable goal. The bounds in this paper are proven under an assumption known as strong ignorability, which means (1.) we're assuming there are no hidden confounders: every variable that influences both the treatment $t$ and the outcome $Y$ is among the observed covariates $x$, and (2.) $0 < p(t = 1 \mid x) < 1$ for every $x$ in the distribution, meaning we don't have…
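To see how the ATE, the ATT, and a per-covariate effect differ, here is a toy sketch on simulated data where, unlike in any real study, both potential outcomes $Y_0$ and $Y_1$ are known for every unit. Following the paper's usage, "ITE" here means the conditional effect $\tau(x) = \mathbb{E}[Y_1 - Y_0 \mid x]$, computed by averaging within each covariate value. The `Unit` type and all the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Unit(NamedTuple):
    x: str      # covariate value (e.g. a symptom-severity category)
    t: int      # 1 if treated, 0 otherwise
    y0: float   # potential outcome without treatment
    y1: float   # potential outcome with treatment


def ate(units: List[Unit]) -> float:
    """Average treatment effect: E[Y1 - Y0] over the whole population."""
    return mean(u.y1 - u.y0 for u in units)


def att(units: List[Unit]) -> float:
    """Average treatment effect on the treated: E[Y1 - Y0 | t = 1]."""
    return mean(u.y1 - u.y0 for u in units if u.t == 1)


def cate(units: List[Unit]) -> Dict[str, float]:
    """tau(x) = E[Y1 - Y0 | x], averaged within each covariate value."""
    by_x: Dict[str, List[float]] = defaultdict(list)
    for u in units:
        by_x[u.x].append(u.y1 - u.y0)
    return {x: mean(effects) for x, effects in by_x.items()}


# Toy simulation: both potential outcomes are known, which never happens with real data.
units = [
    Unit("mild", 0, 5.0, 6.0), Unit("mild", 0, 4.0, 5.0), Unit("mild", 1, 5.0, 6.0),
    Unit("severe", 1, 1.0, 5.0), Unit("severe", 1, 2.0, 6.0), Unit("severe", 0, 1.0, 5.0),
]
print(ate(units))   # 2.5
print(att(units))   # 3.0
print(cate(units))  # {'mild': 1.0, 'severe': 4.0}
```

Because the treated units skew toward the severe group, the ATT (3.0) exceeds the ATE (2.5), and neither tells the doctor of a mild patient that the expected benefit for that patient is only 1.0. Every covariate value also appears in both arms here, which is the overlap condition (2.) in miniature.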

Paper reading: "Characterization of Overlap in Observational Studies"

Fredrik D. Johansson, Dennis Wei, Michael Oberst, Tian Gao, Gabriel Brat, David Sontag, Kush R. Varshney
https://arxiv.org/pdf/1907.04138v2.pdf

Doing causal inference requires overlap between the treated and untreated groups. Suppose we were testing (from observational data) a pill meant to improve a patient's memory; if all the pill-takers were old and the abstainers were all young, there would be little we could infer from their results on a memory test, due to the lack of overlap in the "age" covariate. Johansson et al. describe an algorithm for finding and describing the region of overlap in an observational study in a form that is succinct and easily interpretable by a human being who is not a machine learning expert. A primary goal is to build guardrails for the application of learnings from observational studies: When researchers publish the findings of a clinical trial, they also share…
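A crude way to see where overlap holds for a single covariate, using the memory-pill example above, is to bin the covariate and keep only the bins where the empirical probability of treatment is bounded away from 0 and 1. This is a deliberately simple stand-in, not the algorithm from the paper, which aims to describe the overlap region across many covariates in an interpretable form; the function name, the 10-year bins, and the threshold `eps = 0.1` are illustrative assumptions.

```python
from collections import Counter
from typing import List


def overlapping_age_bins(ages: List[int], treated: List[int],
                         bin_width: int = 10, eps: float = 0.1) -> List[int]:
    """Return lower edges of age bins whose empirical propensity
    P(treated | bin) lies in [eps, 1 - eps], i.e. both groups are well represented."""
    if len(ages) != len(treated):
        raise ValueError("ages and treated must have the same length")
    total: Counter = Counter()
    n_treated: Counter = Counter()
    for age, t in zip(ages, treated):
        lower_edge = (age // bin_width) * bin_width
        total[lower_edge] += 1
        n_treated[lower_edge] += t
    return sorted(b for b in total if eps <= n_treated[b] / total[b] <= 1 - eps)


# Memory-pill example: young abstainers, old pill-takers, and a mixed middle-aged group.
ages = [22, 25, 28, 45, 47, 48, 71, 74, 76]
took_pill = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(overlapping_age_bins(ages, took_pill))  # [40]
```

Only the 40s bin contains both pill-takers and abstainers in reasonable proportion, so any effect estimate from this data would only be supported for that age range. Binning one covariate at a time breaks down quickly as the number of covariates grows, which is part of why a more general description of the overlap region is useful.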