Paper reading: "Predicting the Present with Bayesian Structural Time Series"

https://ai.google/research/pubs/pub41335/
Steven L. Scott, Hal Varian

This 2013 paper from Google researchers Steven Scott and Hal Varian is a good read for building a mental foundation for the well-known 2014 paper, "Inferring Causal Impact Using Bayesian Structural Time-Series Models," which accompanied the release of the R CausalImpact package. We'll read that one in a later post on this blog.

Predicting the Present lays out a system for "nowcasting"—predicting the present value of some time-series statistic using past values of that time-series and past and present values of a bunch of other time-series. A motivating example from the paper: say we'd like to know the number of weekly unemployment claims in the US for the purpose of understanding macroeconomic trends. This statistic will eventually be released by a government agency, but we want it now. We can nowcast this statistic with its own history and a bunch of correlated time-series whose current values are more easily attainable. How about Google searches?

Model

Our goal is to estimate $y^t$, the value of the target variable $y$ at time $t$. We have a vector of other observations at time $t$, $\mathbf{x}^t$. We can think about estimating $y^t$ as a linear regression on the $\mathbf{x}^t$, plus a seasonal component $\tau^t$, plus a latent value $\mu^t$ whose non-random growth comes from another latent value $\delta^t$. Put it all together and you get the following:

$$y^t = \beta^\top \mathbf{x}^t + \tau^t + \mu^t + v^t$$
$$\mu^t = \mu^{t-1} + \delta^{t-1} + u^t$$
$$\delta^t = \delta^{t-1} + w^t$$

Where $u^t$, $w^t$, and $v^t$ are random Gaussian noise.
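To make the pieces concrete, here is a minimal simulation of this generative model in Python. The noise scales, the seasonal pattern, and the sparse $\beta$ are made-up values for illustration, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 200, 5                    # time steps, number of external regressors

# Hypothetical noise scales, seasonal pattern, and sparse weights -- purely
# illustrative, not values from the paper.
sigma_v, sigma_u, sigma_w = 0.5, 0.1, 0.01
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.0])            # most weights are zero
tau = np.tile([1.0, 0.0, -1.0, 0.0], T // 4 + 1)[:T]   # fixed seasonal component

x = rng.normal(size=(T, K))      # the other observed time-series at each t
mu = np.zeros(T)                 # latent level
delta = np.zeros(T)              # latent slope (growth of the level)
y = np.zeros(T)

for t in range(1, T):
    delta[t] = delta[t - 1] + rng.normal(scale=sigma_w)             # delta^t = delta^{t-1} + w^t
    mu[t] = mu[t - 1] + delta[t - 1] + rng.normal(scale=sigma_u)    # mu^t = mu^{t-1} + delta^{t-1} + u^t
    y[t] = beta @ x[t] + tau[t] + mu[t] + rng.normal(scale=sigma_v)  # observation equation
```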

This is an instance of a structural time-series model in state-space form; state-space models are a broad class that also encompasses the popular ARIMA and VARMA families. We'll compute the state components of our model using Kalman filters and Kalman smoothers, the standard tools for working with state-space models.
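For a rough sense of what fitting the state components looks like in practice, here is a sketch using statsmodels' `UnobservedComponents`, which estimates a local linear trend plus regression by maximum likelihood via the Kalman filter (so it is the non-Bayesian cousin of the paper's model, not the paper's actual sampler). It assumes the `y` and `x` arrays from the simulation above:

```python
import statsmodels.api as sm

# Local linear trend (mu^t, delta^t) plus a regression on x, fit by maximum
# likelihood via the Kalman filter. This uses the simulated y and x from the
# sketch above; a full BSTS fit would instead sample these components with MCMC.
model = sm.tsa.UnobservedComponents(
    y,
    level="local linear trend",
    exog=x,
)
results = model.fit(disp=False)
print(results.summary())

# Kalman-filtered estimate of the latent level mu^t at each time step.
level_estimate = results.level.filtered
```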

We also need to estimate the regression weights $\beta$, which come from a regression with a spike-and-slab prior. Spike-and-slab priors are popular when our model should be sparse: most regression weights should equal 0. In the context of nowcasting with Google searches, there are many candidate time-series (meaning the vector $\mathbf{x}^t$ is large), but most of them are just noise. We can encode this in a prior distribution as follows: let a given regressor have non-zero weight with probability $p$ (where small $p$ yields a "spike" at 0); then, conditional on inclusion of the regressor, place a wide normal distribution on its value (the "slab"). The parameters of both the regression and state-space components can be estimated using Markov chain Monte Carlo methods.
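As a sketch of the prior itself (the paper's actual inference samples inclusion indicators and weights via MCMC, a more involved step), here is what drawing weight vectors from a spike-and-slab prior looks like. The hyperparameters `p` and `slab_sd` are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_spike_and_slab(n_draws, K, p=0.1, slab_sd=5.0):
    """Draw weight vectors from a simple spike-and-slab prior.

    Each of the K weights is non-zero with probability p (the "spike" at zero
    has mass 1 - p); conditional on inclusion, the weight is drawn from a wide
    N(0, slab_sd^2) "slab". p and slab_sd are illustrative values only.
    """
    include = rng.random((n_draws, K)) < p
    slab = rng.normal(scale=slab_sd, size=(n_draws, K))
    return include * slab

draws = sample_spike_and_slab(n_draws=1000, K=100)
print("average fraction of non-zero weights:", (draws != 0).mean())  # about p = 0.1
```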

Results

The researchers evaluate their "nowcasts" using data on weekly unemployment claims and monthly retail sales, benchmarking the two-component system against a pure time-series model. The $\mathbf{x}^t$ time-series used in the regression component come from Google Trends and Google Correlate. The nowcasting system exhibited generally lower errors than the pure time-series benchmark, with the differences most pronounced at the beginning of the 2007 recession. This is significant, as anticipating "turning points" is one of the most difficult challenges in economic forecasting.

Further Reading
Canonical textbook all the blogs bring up: Time Series Analysis by State Space Methods
State Space Modeling in Python
Chapter 6, Probabilistic Graphical Models: Principles and Techniques
An Introduction to the Kalman Filter
Sampling Methods: a good source for reviewing the MCMC/Gibbs sampling methods glossed over in this post.
