What Bayesian Statistics Can Do For You

Here’s our estimate of public support for vouchers, broken down by religion/ethnicity, income, and state:


We’re mapping estimates from a hierarchical Bayes model fit to data from the 2000 Annenberg survey (approximately 50,000 respondents).
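For readers who haven't seen what a hierarchical model does to small subgroups, here is a toy sketch of partial pooling. It is not the model behind the maps: it uses a simple beta-binomial formula, the cell counts are made up, and the names `pooled_estimate` and `PRIOR_STRENGTH` are just for illustration. A real hierarchical model estimates the amount of pooling from the data and pools toward a regression prediction rather than a single overall mean.

```python
# Toy illustration of partial pooling; all counts below are made up.

def raw_estimate(yes, n):
    """Raw proportion of 'yes' answers in one cell."""
    return yes / n

def pooled_estimate(yes, n, prior_mean, prior_strength):
    """Posterior mean of a proportion under a Beta prior.

    The prior is Beta(prior_mean * k, (1 - prior_mean) * k) with
    k = prior_strength, so k acts like a number of pseudo-respondents.
    """
    return (yes + prior_mean * prior_strength) / (n + prior_strength)

# Hypothetical cells: (number supporting, number surveyed).
cells = {"small_cell": (4, 5), "large_cell": (450, 1000)}

# Overall support across all cells, used here as the prior mean.
total_yes = sum(yes for yes, n in cells.values())
total_n = sum(n for yes, n in cells.values())
overall = total_yes / total_n

# Assumed value; a real hierarchical model would estimate this from the data.
PRIOR_STRENGTH = 20

results = {
    name: (raw_estimate(yes, n), pooled_estimate(yes, n, overall, PRIOR_STRENGTH))
    for name, (yes, n) in cells.items()
}

for name, (raw, pooled) in results.items():
    print(f"{name}: raw={raw:.3f} pooled={pooled:.3f}")
```

The five-person cell gets pulled most of the way toward the overall average, while the thousand-person cell barely moves. That is the basic reason the model-based maps look so much less noisy than the raw ones.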

In case you’re wondering what Bayesian modeling did for us, here are the corresponding maps from the raw data (weighted to adjust for voter turnout, but that doesn’t actually do that much anyway):


OK, so Bayes gives you a lot. The costs?

– Effort. It took me a couple weeks to make the first set of maps. Some of this was the modeling–I tried several different versions of the model and also had to come up with a quick-and-dirty way of adjusting for the turnout weights amid the regression modeling and poststratification.

(I also put in a lot of work to make the maps look just right, but it’s not really fair to count this as a cost: if we were only able to look at the raw data, we wouldn’t even be trying to make such maps in the first place.)

– Model dependence. Changing the model will change the estimates and change the maps. I don’t feel so bad about this, first because the raw estimates are so noisy, second because so-called raw estimates are themselves highly model dependent.
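The poststratification step mentioned above is, at its core, a population-weighted average of cell estimates. Here is a minimal sketch, assuming you already have an estimate for each cell and a population count for each cell (say, from the census). The cell names and numbers are hypothetical, and the sketch leaves out the turnout adjustment entirely.

```python
def poststratify(cell_estimates, cell_counts):
    """Combine cell-level estimates into one population-level estimate.

    cell_estimates: dict mapping cell name -> estimated proportion
    cell_counts:    dict mapping cell name -> population count for that cell
    """
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cell_estimates)
    return weighted / total

# Hypothetical cells within one state (made-up numbers).
estimates = {"low_income": 0.6, "high_income": 0.4}
counts = {"low_income": 100, "high_income": 300}

state_estimate = poststratify(estimates, counts)
print(f"state-level estimate: {state_estimate:.3f}")
```

The weights come from the population, not from who happened to answer the survey, which is why the quality of the cell estimates matters so much.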

Everything depends on models, so let’s take them seriously

This discussion relates to my disagreement with Kos over the maps of the Obama and McCain vote. I graphed model-based estimates constructed using the Pew pre-election polls; Kos didn’t trust these where they disagreed with published exit poll results.

The problems with Kos’s argument?

1. Exit polls are far from perfect: I’ve heard that in 2008 the raw exit poll data weren’t close to the actual election outcome.

2. Exit poll estimates depend strongly on the model used to select polling locations, assumptions about who responds to the poll, and the models used to adjust for sampling error, nonresponse, and unexpected contingencies.

I’m not trying to pick a fight with Kos here–he had lots of useful comments on my original maps, which motivated me to go back and fix some problems I’d had. I think that the insights of Kos and other people who are closely involved with day-to-day politics, combined with some of our modeling tools, make a good combination.

P.S. I expect that other, non-Bayesian methods could also work well, and I’d love to see how they do on this and similar examples. As we always say, what’s important isn’t the method, it’s the information included in the estimate. A strength of Bayesian hierarchical modeling is that it allows inclusion of diverse sources of information, but I’m sure other methods could do fine also, if set up appropriately.

P.P.S. People have given me a lot of flak on the map colors–to start with, they’re not so great for color-blind people. Also I need some better labeling of what the colors mean. More work to be done, and it’s good to get these graphs out there to get that kind of useful feedback.

In any case, my focus here is not on the pretty maps but rather on the modeling technology–the impressive ability of Bayesian data analysis to give reasonable estimates for all these subgroups.

P.P.P.S. Brendan O’Connor has made a webpage allowing you to click back and forth between the two maps shown above.
