Frequently Asked Questions, Last Revised 8/7/08

FAQ and Statement of Methodology
FiveThirtyEight.com
Revised 8/7/2008

Site/Meta

Who are you? My name is Nate Silver and I live in Chicago. For additional background, please see here or here. The other contributor to this website, Sean Quinn, lives in Washington, DC.

What is the significance of the number 538? 538 is the number of electors in the electoral college.

What is the mission of this website? Most broadly, to accumulate and analyze polling and political data in a way that is informed, accurate and attractive. Most narrowly, to give you the best possible objective assessment of the likely outcome of upcoming elections.

How is this site different from other compilations of polls like Real Clear Politics? There are several principal ways that the FiveThirtyEight methodology differs from other poll compilations:

Firstly, we assign each poll a weighting based on that pollster’s historical track record, the poll’s sample size, and the recentness of the poll. More reliable polls are weighted more heavily in our averages.

Secondly, we include a regression estimate based on the demographics in each state among our ‘polls’, which helps to account for outlier polls and to keep the polling in its proper context.

Thirdly, we use an inferential process to compute a rolling trendline that allows us to adjust results in states that have not been polled recently and make them ‘current’.

Fourthly, we simulate the election 10,000 times for each site update in order to provide a probabilistic assessment of electoral outcomes based on a historical analysis of polling data since 1952. The simulation further accounts for the fact that similar states are likely to move together, e.g. future polling movement in states like Michigan and Ohio, or North and South Carolina, is likely to be in the same direction.

How often is the site updated? Generally, the charts, graphs and polling averages on the site are refreshed once per day to reflect any new polls. Sometimes, there might not be any polling on a given day, and so an update will not take place. Other times, volume may be so heavy that multiple updates are necessary.

You can tell that the charts and graphs on the site have been updated any time you see the “Today’s Polls” tag in the footer.

Senate polls are updated less frequently: generally once per week, on Mondays.

What is your political affiliation? My state has non-partisan registration, so I am not registered as anything. I vote for Democratic candidates the majority of the time (though by no means always). This year, I have been a supporter of Barack Obama. The other contributor to this website, Sean Quinn, has also been a supporter of Barack Obama.

Are your results biased toward your preferred candidates? I hope not, but that is for you to decide. I have tried to disclose as much about my methodology as possible.

Does this site accept advertising? FiveThirtyEight.com is a commercial site and accepts advertising. Our preferred advertiser is BlogAds. To run an ad at FiveThirtyEight.com, please click here. If you wish to purchase an ad that doesn’t fit into the template provided by BlogAds, you can contact me directly at 538dotcom@gmail.com.

Why do you run ads for [insert name of candidate you don’t like]? I believe in the right of free speech. Blogging is one form of free speech, and political advertising is another. If I believe an ad is particularly misleading, I will seek to block it, but otherwise, this site takes a non-partisan position toward which advertising it accepts. Ads for John McCain, Barack Obama and Hillary Clinton have each appeared on this website at various times.

How was the site designed? FiveThirtyEight.com is based on a Blogger.com template. The graphs are designed in MS-EXCEL 2007. I also use a statistical package (STATA) for some of the more complicated number-crunching. Thanks to Robert Gauldin for his design assistance.

The site isn’t showing up properly in my browser. FiveThirtyEight.com should render reasonably well in the latest versions of Firefox and Internet Explorer. Older versions of Internet Explorer have pervasive problems with Blogger.com templates and are not recommended.

How do I contact you? Nate can be reached at 538dotcom@gmail.com. Sean can be reached at pocket99s@gmail.com.

Why haven’t you responded to my e-mail? Between my various jobs and projects, I receive more e-mail each day than I’m able to respond to in full. However, I read each e-mail and very much appreciate both compliments and constructive criticism. Many of the new ideas and new features on the blog are a direct result of reader feedback. I appreciate your patience. Some e-mails are answered days or even weeks after they are received.

Are you hiring? Not really, but if you think there may be an exceptionally good fit, it never hurts to get in touch.

Are you available to do media appearances? Yes. I enjoy doing media and have done a fair amount of it in the past. If your request is pressing, please include the phrase “MEDIA REQUEST” in the subject heading of your e-mail.

Are you available to do consulting or speaking engagements? Theoretically yes, but practically speaking it will be very difficult in the midst of a Presidential election cycle.

Process Overview

The basic process for computing our Presidential projections consists of six steps:

1. Polling Average: Aggregate polling data, and weight it according to our reliability scores.

2. Trend Adjustment: Adjust the polling data for current trends.

3. Regression: Analyze demographic data in each state by means of regression analysis.

4. Snapshot: Combine the polling data with the regression analysis to produce an electoral snapshot. This is our estimate of what would happen if the election were held today.

5. Projection: Translate the snapshot into a projection of what will happen in November, by allocating out undecided voters and applying a discount to current polling leads based on historical trends.

6. Simulation: Simulate our results 10,000 times based on the results of the projection to account for the uncertainty in our estimates. The end result is a robust probabilistic assessment of what will happen in each state as well as in the nation as a whole.

Step 1. Polls, the Polling Average, and the Reliability Rating.

What is the reliability rating? It is a weight assigned to each poll based on three factors: the pollster’s accuracy in predicting recent election outcomes, the poll’s sample size, and the recentness of the poll.

How do you determine a pollster’s reliability? For a very thorough explanation, see here.

OK, so just who are the most reliable pollsters? Pollsters are rated by their long-term pollster-introduced error (PIE). This is the amount of error that a pollster introduces to its results because of methodological imperfections, rather than the inherent limitations associated with limited sample sizes and conducting polls far in advance of the election.

Current pollster ratings can be found here.

How do you assess the reliability of other polling firms not included in the table above? These polls are treated as being slightly below average and assigned a PIE of +2.11.

Are polls weighted by the number of respondents? Yes, although the methodology is a little involved. For a fuller explanation, see here.

How do you adjust for the recentness of a poll? For Presidential polling, polls are treated as having a half-life of 30 days. Specifically, the weight assigned to each poll is…

0.5^(P/30)

…where ‘P’ is the number of days transpired since the median date that the poll was in the field.
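As a quick illustration of the half-life formula above, here is a minimal Python sketch. The function name and the `half_life_days` parameter are illustrative choices, not part of the site's actual code; only the formula 0.5^(P/30) comes from the text.

```python
def recency_weight(days_since_median_field_date: float,
                   half_life_days: float = 30.0) -> float:
    """Recency weight 0.5 ** (P / 30), where P is days since the poll's median field date."""
    if days_since_median_field_date < 0:
        raise ValueError("a poll cannot have a median field date in the future")
    return 0.5 ** (days_since_median_field_date / half_life_days)


# A poll loses half of its recency weight every 30 days.
for p in (0, 15, 30, 60, 90):
    print(p, round(recency_weight(p), 3))
```

A poll whose median field date was 30 days ago counts half as much as a brand-new one, and a 60-day-old poll counts one quarter as much, before the other reliability factors are applied.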

How did you derive this recentness formula? It is based on an analysis of 2000, 2004, and 2006 state-by-state polling data. Previously, this formula varied based on the number of days until the general election, with the half-life becoming shorter as we got closer to the general election. After further investigation into the data, I discovered that there was really no empirically valid reason for doing this. The 30-day half life did an optimal job, or very close to optimal, across a broad range of time frames, ranging from the evening before the election to 250 days before the election. Note that this is not true for Senate data, for which a different formula is applied.

Well, I still think you’re making a mistake by using ‘old’ polls. The recentness formula is just one of the mechanisms we use to keep the data fresh. All polls are also adjusted based on a trendline adjustment (see Step 2).

What do you do when you have multiple polls from the same polling firm? When a specific polling agency comes out with a new poll, we do not drop their previous poll. Instead, its sample sizes are aggregated for purposes of calculating the weight assigned to the poll, which has the effect of penalizing redundant polling data from the same firm. See the bottom one-third of this post for further discussion.

Are national polls accounted for? Yes, but only insofar as they are used to inform the trendline adjustment. See Step 2.

How do you handle tracking polls? Tracking polls are treated as any other poll, except that the number of respondents is taken to be the number of interviews conducted per day. So a tracking poll that consists of a rolling three-day sample of 900 voters will be counted as a separate data point each day, but as a data point at 300 voters per day.
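The arithmetic in the tracking-poll example can be written out as a tiny Python sketch. The function name is hypothetical, and the sketch assumes interviews are spread evenly across the rolling window, which the text implies but does not state outright.

```python
def tracking_poll_daily_sample(rolling_sample_size: int, window_days: int) -> float:
    """Per-day sample size credited to a tracking poll with a rolling window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    # Assumption: interviews are spread evenly over the window.
    return rolling_sample_size / window_days


# The example from the text: a rolling three-day sample of 900 voters.
print(tracking_poll_daily_sample(900, 3))  # 300.0
```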

Does a poll ever become so old that you drop it entirely? Yes. Once a poll’s weight falls below 0.05, it is dropped from the model for the sake of simplification and aesthetics. Exception: the highest-rated poll (not necessarily the most recent) in any given state is guaranteed a minimum weight of 0.25. For further discussion, please see here.
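The drop rule and its exception can be sketched in Python as follows. The function and constant names are hypothetical; the 0.05 and 0.25 thresholds come from the text. How the overall weight is computed from the three reliability factors is not shown here, so it is simply passed in.

```python
from typing import Optional

DROP_THRESHOLD = 0.05        # polls below this weight are dropped
MIN_WEIGHT_TOP_POLL = 0.25   # floor for the highest-rated poll in a state


def effective_weight(weight: float, is_highest_rated_in_state: bool) -> Optional[float]:
    """Return the weight to use for a poll, or None if the poll is dropped."""
    if is_highest_rated_in_state:
        # The highest-rated poll in a state is never dropped and never falls below 0.25.
        return max(weight, MIN_WEIGHT_TOP_POLL)
    if weight < DROP_THRESHOLD:
        return None
    return weight


print(effective_weight(0.03, False))  # None (dropped)
print(effective_weight(0.03, True))   # 0.25
```

If the recency factor were the only thing shrinking a poll's weight, it would cross 0.05 after roughly 130 days, since 0.5^(130/30) is just under 0.05.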

How do you find the polls you include in the analysis? I periodically scan the links you see on the left-hand side of the page. If you’ve come across a poll that is not included in the analysis, please give it a shout-out in the comments in the daily polling thread, and we will get it included in the next update. Occasionally, pollsters also e-mail me their results directly. This is very helpful.

Are there any polls you don’t include? All scientifically-conducted polls are included provided that they meet our reporting requirements and the internal poll rule (see below).

What are the reporting requirements for a poll? At a minimum, the poll must list (1) the percentage of the vote for each major candidate, not simply the margin; (2) the sample size; and (3) the dates that the poll was in the field. We may temporarily list a “BREAKING” poll that is missing some of this information, but if it does not become available promptly, it will be de-listed.

Do you list internal polls that are leaked by the campaigns? This site has a ban on listing internal polls. The logic behind this is that when an interested party conducts a poll, it is liable to leak the results to the public only if they contain good news for its candidate, thereby encouraging donors, press persons, etc. This does not mean per se that the poll is “biased”; many pollsters do very good and thorough work on behalf of campaigns and affiliated interest groups. But it does mean that there may be a bias in which information becomes part of the public record: we learn about a poll that has a candidate ahead by 10 points in a state, but not one where he is down by 2.

For this reason, such polls are excluded. More specifically, a poll is excluded if it was conducted by any current candidate for office, a registered campaign committee, a Political Action Committee, or a 527 group, unless (i) the poll has a bipartisan partner (partisan polling groups will sometimes pair with one another to reduce the perception of bias), or (ii) the organization has a long and demonstrable track record of releasing all its data to the public.

Polls are not excluded simply because the pollster has conducted work on behalf of Republican or Democratic candidates, provided that the particular poll in question was intended for public consumption.

What precisely is indicated by the ‘date’ reported in association with the poll? It will indicate the median date of interviewing for that poll — not when that poll was reported or posted to the site. For example, a poll which conducted interviews on July 1, July 2 and July 3, and was reported to the media on July 5, would be listed with a date of July 2.
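For readers who want to reproduce the listed date, here is a minimal Python sketch. It assumes interviews were conducted on every day between the first and last field dates. For a field period with an even number of days the text does not say which of the two middle days is used, so the sketch arbitrarily picks the earlier one. The function name is hypothetical.

```python
from datetime import date, timedelta


def median_field_date(first_day: date, last_day: date) -> date:
    """Median interviewing date for a poll fielded on consecutive days."""
    if last_day < first_day:
        raise ValueError("last_day must not precede first_day")
    span_days = (last_day - first_day).days
    # Assumption: with an even number of field days, use the earlier middle day.
    return first_day + timedelta(days=span_days // 2)


# The example from the text: interviews on July 1, 2 and 3 are listed as July 2.
print(median_field_date(date(2008, 7, 1), date(2008, 7, 3)))  # 2008-07-02
```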

What if a pollster provides multiple versions of their poll — e.g. with or without third party candidates included, or different versions for registered and likely voters? When these situations arise:

(i) I will use the registered voter version until the first Presidential debate. After that, I will use the likely voter version;

(ii) I use the version with third-party candidates included if (a) they have officially announced their candidacy, and (b) they are on the ballot in that state;

(iii) If a pollster lists separate results with and without ‘leaners’ (people who are initially uncommitted but pick a candidate after prompting), I use the version with leaners.

Step 2. The Trendline Adjustment.

What is the purpose of the trendline adjustment? Polling data comes out in different increments in different states. Some states are polled frequently, while others are polled only occasionally. The trendline adjustment is an effort to correct for this problem by using polling movement in states that have been polled recently to adjust the data in states that have not been.

In addition, the trendline adjustment can account for what I refer to as ‘timing bias’. If a particular state is polled in the midst of a bounce caused by something like the conventions, such polling may reflect only a temporary, near-term fluctuation rather than the longer-term demographic reality.

To take a more concrete example, suppose that Virginia was last polled in the weekend prior to the Democratic convention, and that poll showed John McCain ahead by 2 points. Suppose also that North Carolina was last polled in the weekend following the Democratic convention, and that poll showed Barack Obama ahead by 4 points. Looking at these two polls might give the impression that North Carolina is a better state for Barack Obama than Virginia. But depending on the size of Obama’s convention bounce, this could entirely be an artifact of when the respective polls were conducted. The trendline adjustment attempts to correct for this.

How does the trendline adjustment work? In plain English, we look at movement in the polling in recently-polled states and in national polls to predict movement in other states. For example, if there are new polls conducted in Massachusetts and Connecticut showing the Democratic candidate gaining 5 points, we can probably also infer that the candidate’s numbers have improved by about 5 points in Rhode Island.

For the original methodology behind the trendline adjustment, please see here. For subsequent refinements to the methodology, please see here, here, here and here.

Does the trendline adjustment assume that polling movement is uniform between different states? No, it does not. The adjustment attempts to account for which particular demographic groups are responsible for the polling movement, and those groups may produce differing results in different states. See here and here for discussion.

Does the trendline adjustment account for the convention bounce? We will have a special set of procedures in place on and around the time of the conventions to account for the convention bounce, but they have not yet been fully developed.

Are the polls weighted for purposes of calculating the trendline adjustment? Yes. More reliable polls have more influence in the computation of the trendline.

Step 3. The 538 Regression Estimate

What is the regression estimate? It is an analysis of what the polling data “should” be in each state based on its underlying demographics. Put differently, it is a way not to be held hostage by the results of individual polls that might defy common sense, particularly where polling data in a state is sparse.

Polls are an imperfect measure of voter sentiment, subject to the vagaries of small sample size, poor methodology, and transient blips and trends in the numbers. For example, the late February SurveyUSA polls had Barack Obama four points ahead of John McCain in North Dakota, but behind by four points in South Dakota. Since North Dakota and South Dakota are very similar, it is unlikely that there is a true eight-point differential in the polling in these states. The regression estimate is able to sniff out such discrepancies.

For general background on the process of regression analysis, see here.

What is the dependent variable in the regression analysis? Technically speaking, there are two regressions that are computed in each state. The first regression is a regression on the share of the two-way (Democrat + Republican) vote held by the Democratic candidate in that state based on our current polling averages after adjustment for present trendlines. The second is a regression on the total committed vote held by either of the major-party candidates.

What independent variables are included in the regression estimate? The regression models evaluate a total of 16 candidate variables. Variables are dropped via a stepwise process, until such time as each remaining variable is statistically significant at the 85% level or higher.

The 16 variables presently considered by the model are as follows:

Political

1. Kerry. John Kerry’s vote share in 2004. Note that an adjustment is made in Massachusetts and Texas, the home states of Kerry and George W. Bush respectively, based on Al Gore’s results in Massachusetts in 2000, and Bob Dole’s results in Texas in 1996.
2. Fundraising Share. The total share of funds raised in that state by each candidate (expressed specifically as the percentage of all funds raised that were raised by the Democratic candidate).
3. Clinton. The percentage of the two-way (Obama + Clinton) Democratic primary vote received by Hillary Clinton in that state. An adjustment is made to caucus states to account for their higher proclivity to vote for Barack Obama. In Michigan, the variable is based on the results of exit polling, which indicated who voters would have selected if all candidates were on the ballot.
4. Liberal-Conservative (Likert) Score. Per 2004 exit polls, a state’s liberal-conservative orientation, wherein each liberal voter is given a score of 10, each moderate a score of 5, and each conservative a score of 0. The most liberal state, Massachusetts, has a Likert score of 5.65. The most conservative, Utah, has a score of 3.30.

Religious Identity

5. Evangelical. The proportion of white evangelical protestants in each state.
6. Catholic. The proportion of Catholics in each state.
7. Mormon. The proportion of LDS voters in each state.

Ethnic and Racial Identity

8. African-American. The proportion of African-Americans in each state.
9. Hispanic. The number of Latino voters in each state as a proportion of overall voter turnout in 2004, as estimated by the Census Bureau. The reason I use data based on turnout rather than data based on the underlying population of Latinos is because Latino registration and turnout varies significantly from state to state. It is much higher in New Mexico, for instance, which has many Hispanics who have been in the country for generations, than it is in Nevada, where many Hispanics are new migrants and are not yet registered.
10. “American”. The proportion of residents who report their ancestry as “American” in each state, which tends to be highest in the Appalachians. See discussion here.

Economic

11. PCI. Per capita income in each state.
12. Manufacturing. The proportion of jobs in each state that are in the manufacturing sector.

Demographic

13. Senior. The proportion of the white population aged 65 or older in each state. Because life expectancy varies significantly among different ethnic groups, this version has more explanatory significance than when looking at the entire (white and non-white) population.
14. Twenty. The proportion of residents aged 18-29 in each state, as a fraction of the overall adult population.
15. Education. Average number of years of schooling completed for adults aged 25 and older in each state.
16. Suburban. The proportion of voters in each state that live in suburban environments, per 2004 exit polls.
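To make variable 4 in the list above concrete, the Liberal-Conservative (Likert) Score can be computed as in this Python sketch. The 10/5/0 scoring comes from the text; the function name and the example shares are made up for illustration and are not exit-poll figures for any real state.

```python
def likert_score(liberal_share: float, moderate_share: float,
                 conservative_share: float) -> float:
    """Average ideology score: liberal = 10, moderate = 5, conservative = 0."""
    total = liberal_share + moderate_share + conservative_share
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return (10 * liberal_share + 5 * moderate_share + 0 * conservative_share) / total


# Hypothetical state: 21% liberal, 45% moderate, 34% conservative.
print(round(likert_score(0.21, 0.45, 0.34), 2))  # 4.35
```

A score of 5.0 would mean liberals and conservatives are evenly balanced, which puts the quoted range of 3.30 (Utah) to 5.65 (Massachusetts) in context.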

How often is the regression updated? The regression updates automatically based on the latest polling data. Periodically, I will also test out new variables for potential inclusion in the model.

Step 4. The Snapshot.

What is the snapshot? It is the combination of the trend-adjusted polling average (Step 2) with our regression estimate (Step 3). This represents our best estimate of what would happen if the election were held today.

How much weight is given to the regression estimate? The regression results are treated as a single, recent poll of average reliability (see here for how I define ‘average’ in this context). Therefore, the regression estimate will have comparatively substantial weight in states with little polling data, but very little weight in states with robust polling data.
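The effect described above, where the regression matters a lot in thinly polled states and very little in heavily polled ones, falls out of an ordinary weighted average. The Python sketch below treats the regression estimate as one more entry in that average. The function names, the example margins, and the `regression_weight` of 1.0 (standing in for "a recent poll of average reliability") are all assumptions for illustration.

```python
def weighted_average(estimates):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in estimates)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(value * weight for value, weight in estimates) / total_weight


def snapshot(poll_estimates, regression_estimate, regression_weight=1.0):
    """Combine trend-adjusted polls with the regression, treated as one extra poll."""
    # Assumption: regression_weight=1.0 is a placeholder for an average poll's weight.
    combined = list(poll_estimates) + [(regression_estimate, regression_weight)]
    return weighted_average(combined)


# Hypothetical margins: polls say +6, regression says +2.
print(round(snapshot([(6.0, 0.5)], 2.0), 2))   # sparse polling: pulled toward +2
print(round(snapshot([(6.0, 10.0)], 2.0), 2))  # heavy polling: stays near +6
```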

Step 5. The Projection.

What is the projection? It is our best estimate of what will happen when the election is actually held in November.

How does the projection differ from the snapshot? It differs in two important ways. Firstly, current polling leads are mean-reverted. Secondly, undecided voters are allocated to the two major-party candidates.

How does the mean-reversion adjustment work? There has been an extremely robust tendency in Presidential elections for national polling numbers to revert to the mean as the election approaches; that is, for the trailing candidate to gain ground. The further we are out from the election, the more tightening in the polls we can expect. For example, a 20-point national lead held 200 days before the election projects, on average, to only about an 8-point victory on Election Day, whereas a 5-point lead held 60 days before the election projects to about a 4-point victory. This adjustment is described in much more detail here.

Is the mean-reversion adjustment applied uniformly across all states? No. The mean-reversion adjustment is based on the notion that national polling data will tighten as the election nears. This does not necessarily imply that polling in any particular state will tighten. Instead, we first calculate the overall degree of mean-reversion expected in the national popular vote, and then imprint it on individual states through the process described here. States that have been more sensitive to movement in the national numbers will receive a larger degree of mean-reversion.

How are undecided voters allocated? This process may seem to work slightly backward. Firstly, we determine how much of the vote is likely to go to third-party candidates in each state based on a regression of the current undecided and ‘other’ vote in each state against historical trends. Then, having created an allocation for third-party candidates, we allocate the remaining undecided vote 50:50 between the major-party candidates.
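The second half of this process, the 50:50 split, is simple enough to sketch in Python. The third-party allocation comes from a regression that is not reproduced here, so the sketch takes it as an input. The function name and the example numbers are hypothetical.

```python
def allocate_undecided(dem_pct: float, rep_pct: float, third_party_pct: float):
    """Split the remaining undecided vote 50:50 between the major-party candidates.

    third_party_pct is the projected third-party share, assumed to be supplied
    by a separate model that this sketch does not implement.
    """
    undecided = 100.0 - dem_pct - rep_pct - third_party_pct
    if undecided < 0:
        raise ValueError("shares add up to more than 100 percent")
    return dem_pct + undecided / 2, rep_pct + undecided / 2


# Hypothetical state: 46-44 in the polls, with 2 points projected for third parties.
print(allocate_undecided(46.0, 44.0, 2.0))  # (50.0, 48.0)
```

Note that the margin between the two candidates is unchanged by a 50:50 split; only the vote shares move.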

Are you sure that allocating the undecided vote 50:50 is the best approach? I am fairly certain that the most obvious alternative – allocating the undecided vote based on each candidate’s proportion of the vote in each state – is not superior to this approach when evaluating presidential election data. Such an approach would imply that most of the undecided voters should be given to the leading candidate, but under certain circumstances – such as when there are a high number of undecideds a long way before the election – there is some tendency for undecided voters to break for the trailing candidate.

Step 6. Simulations and Win Probabilities

What is Win % or Win Probability? Simply, the percentage of simulations in which a candidate wins a given state, or wins the general election, out of 10,000 daily simulation runs.

How is Win Probability determined? By simulating the election 10,000 times each day by means of a Monte Carlo analysis, based on the current Projection in each state. The simulation accounts for the following properties:

(i) That the true margin of error of a poll is much higher than the sampling error, especially when the poll is taken long before the election.

(ii) That polling movement between different states tends to be correlated based on the demographics in those states.

What is the purpose of the simulation runs? To account for three types of uncertainty in interpreting polling data: sampling error, state-specific movement, and national movement. Please see my discussion here.

The most important concept is that the error in predicting electoral outcomes is much larger than would be implied by the margins of errors from the polls alone, especially early in the election cycle. That is, the election may ‘break’ in any number of different and unpredictable directions, both at a state-by-state and at a national level.
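To give a feel for how a Monte Carlo run of this kind turns projected margins into win probabilities, here is a deliberately simplified Python sketch. It is not the site's model: the three states, their margins, and both standard deviations are invented, sampling error is folded into the state-level noise, and states are correlated only through one shared national shift rather than through demographic similarity.

```python
import random


def simulate_win_probability(projected_margins, electoral_votes, n_sims=10_000,
                             national_sd=3.0, state_sd=3.0, seed=538):
    """Toy Monte Carlo: each run adds a shared national shift plus state noise."""
    rng = random.Random(seed)
    votes_needed = sum(electoral_votes.values()) / 2  # majority of this toy map
    overall_wins = 0
    state_wins = {state: 0 for state in projected_margins}
    for _ in range(n_sims):
        national_shift = rng.gauss(0.0, national_sd)  # moves every state together
        votes_won = 0
        for state, margin in projected_margins.items():
            simulated_margin = margin + national_shift + rng.gauss(0.0, state_sd)
            if simulated_margin > 0:
                votes_won += electoral_votes[state]
                state_wins[state] += 1
        if votes_won > votes_needed:
            overall_wins += 1
    return overall_wins / n_sims, {s: w / n_sims for s, w in state_wins.items()}


# Hypothetical three-state map; margins are Democratic lead in points.
margins = {"State A": 4.0, "State B": -1.0, "State C": 0.5}
votes = {"State A": 10, "State B": 20, "State C": 15}
overall, by_state = simulate_win_probability(margins, votes)
print(overall, by_state)
```

Because the national shift is shared, a run that is good for a candidate in one state tends to be good in the others too, which widens the spread of electoral vote totals compared with treating the states as independent.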

As we get closer to November 4, the potential for such shifts will diminish, and therefore the error assumed by the simulation will become progressively smaller. However, even on election eve, the errors in predicting electoral outcomes are larger than those implied by each pollster’s reported margin of error calculation. Combining different polls together may reduce the problem, but it will not eliminate it, as polling errors tend to be correlated (that is, many pollsters miss in the same direction).

How reliable are polls conducted X days before the election? If ‘X’ is a number larger than about 30, the answer is ‘not very reliable’. Many voters do not begin paying attention to the campaign until mere days or weeks before election day. As such, polling conducted before this period is tenuous. The specific amount of variance we apply to each state is determined based on an analysis of historical polling data since 1952 and is described here.

Is the polling in some states more volatile than in others? There is good reason to think that it is. Some states contain more true swing voters than other states. For instance, in 2008, the amount of volatility in the polling data in a given state has been positively correlated with the number of independent voters in that state, but inversely correlated with the number of African-American voters. Our process accounts for these tendencies, as described here.

What is the interrelationship between polling movement in different states? In reality, there is no such thing as national polling movement. Instead, you have millions of individual voters making up their minds in 50 individual states and the District of Columbia.

Like-minded voters, however, can be presumed to change their candidate preferences in similar ways. For instance, relative to national trends, election results in Massachusetts have historically been 90 percent correlated with election results in Rhode Island.

Our simulation accounts for this tendency by applying a similarity matrix, which evaluates the demographic relationships between different states by means of a nearest-neighbor analysis as described here. Our process recognizes, for instance, that as the polling in Ohio moves, the polling in a similar state like Michigan is liable to move in the same direction. On the other hand, there may be little relationship between the polling movement in Ohio and that in a dissimilar state like New Mexico.

In our simulation runs, the state-by-state polling movement is architected so as to preserve (i) the interstate correlations described above; (ii) the historical relationship between the degree of national polling movement and that in different states – the more the polls move in the aggregate, the more volatile the polling movement in different states, and (iii) the empirical degree of volatility in the polling numbers within any one particular state (see question above).

Is there an empirical basis for this adjustment? Not as much of one as I’d like. State-by-state polling data is hard to come by in years before 2000, and the 2000 and 2004 cycles may not be representative as they were unusually stable elections. Therefore, there is a little bit of guesswork involved in calibrating the model and determining the appropriate amount of interdependence in polling movement between different states. But I am convinced that we have a substantially better model with this adjustment than without it.

What is the margin of error in the simulation runs? In terms of predicting the winner of the national electoral vote, there appears to be a margin of error of somewhere around +/- 1 percentage point over our 10,000 daily simulation runs.
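A figure of this size is what the standard binomial formula gives for 10,000 runs, as the Python sketch below shows. This is offered only as a plausibility check, not as the derivation actually used for the site. It assumes the runs are independent draws and evaluates the worst case, a win probability of 50%, at a 95% confidence level.

```python
import math


def simulation_margin_of_error(win_probability: float, n_sims: int = 10_000,
                               z: float = 1.96) -> float:
    """Approximate 95% margin of error on a simulated win probability."""
    standard_error = math.sqrt(win_probability * (1 - win_probability) / n_sims)
    return z * standard_error


# Worst case (a 50/50 race), expressed in percentage points.
print(round(100 * simulation_margin_of_error(0.5), 2))  # 0.98
```

The margin shrinks as the win probability moves away from 50%, so lopsided races are estimated more tightly than close ones.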

Charts and Graphs

National Summary Chart

How can there be fractional numbers in the electoral vote counts? For example, Obama winning 293.4 electoral votes? We are not predicting any one particular outcome in the election – Obama winning states A, B, and C, and McCain states X, Y, and Z. Rather, we are predicting a probability distribution – the relative likelihood of different outcomes occurring. The electoral vote counts represent an average of thousands of individual simulations, and the average may produce a fractional number of electoral votes.
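A four-run toy example in Python shows how whole-number outcomes average to a fractional total. The electoral vote counts below are invented for illustration; a real update averages 10,000 runs.

```python
from statistics import mean

# Hypothetical electoral vote totals for one candidate in four simulation runs.
simulated_electoral_votes = [311, 286, 273, 304]

expected_electoral_votes = mean(simulated_electoral_votes)
print(expected_electoral_votes)  # 293.5
```

No single run produces 293.5 electoral votes; the number only summarizes the whole distribution.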

State Summary Chart

What do the percentages mean next to each individual state? They are our estimate of the chances that Barack Obama and John McCain will win that state, respectively.

What is the significance of the color of the state (red-blue-purple) in the state summary chart? The color reflects the result in that state in 2004. States are rendered in purple if the Bush-Kerry margin in that state was within 7.5 points.

What is the significance of the ‘regions’ as defined on the state-by-state summary charts? There isn’t any, other than as a way to present and organize the data. For additional discussion, see here.

Electoral Projection Map

How many colors are used in the electoral projection map? There is no specific limit. Rather, states are colored on a red-white-blue gradient based on the current win percentage in each state.

Electoral Vote Distribution

What do the individual spikes / data points represent? The number of simulations, out of 10,000, in which Barack Obama finishes with some precise number of electoral votes (such as 290). Simulations that result in a McCain electoral win are colored red, and those that result in an Obama win are colored blue.

Is the distribution normal (i.e. a bell curve)? Not necessarily. Because the polling movement between different states is assumed to be correlated, the distribution can take on a variety of different shapes, with multiple peaks and so on. The distribution will also clearly not be normal in the event that one candidate is headed for a landslide, as there is an upper bound in his number of electoral votes.

Super Tracker

What do the individual, blue data points represent in the Super Tracker chart? They represent the inferred popular vote outcome based on all polling (state and national) conducted on that particular day, as determined by analyzing the degree of movement between previous iterations of that poll. This is not the same as simply averaging the polls, although the Super Tracker usually resembles the Pollster.com and RealClearPolitics.com national averages closely. While the individual data points can be interesting to look at, we advise against overinterpreting them – there is a lot of noise in any one particular day’s data. Instead, the red trendline curve represents our best estimate of the current state of the election. For further background on the Super Tracker chart, please see Step 2 above.

Tipping Point States and Return on Investment Index

What are Tipping Point States? A Tipping Point State is defined as a state that would alter the outcome of a close election if it were decided differently. For a thorough discussion, see here.

What is the Return on Investment Index? The ratio of a state’s Tipping Point percentage to the number of eligible voters in that state, calibrated such that an average state has a Return on Investment Index of 1.0. This is intended to represent the marginal return from spending one additional dollar (or other type of campaign resource) in that state. For further discussion, see here.
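The calculation can be sketched as follows. This assumes that "an average state" means the unweighted mean of the per-state ratios, which the text does not specify; the state names and figures are invented for illustration.

```python
def roi_index(tipping_point_pct, eligible_voters):
    """Return {state: ROI index}, scaled so the mean index is 1.0.

    Assumption: calibration divides by the unweighted mean ratio.
    """
    ratios = {
        state: tipping_point_pct[state] / eligible_voters[state]
        for state in tipping_point_pct
    }
    mean_ratio = sum(ratios.values()) / len(ratios)
    return {state: ratio / mean_ratio for state, ratio in ratios.items()}

# Hypothetical inputs: tipping point share and eligible voters per state.
tipping = {"A": 0.20, "B": 0.10, "C": 0.05}
voters = {"A": 8_000_000, "B": 2_000_000, "C": 5_000_000}
index = roi_index(tipping, voters)
```

In this made-up example, state B has a lower Tipping Point percentage than state A but a much smaller electorate, so it ends up with the higher index.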

Poll Detail

How are states classified as ‘Lean’, ‘Likely’ and ‘Safe’? States are classified as follows, based on the Win Probability of the Democratic candidate in each state:

Win %        Classification
0%-5%        Safe GOP
5%-20%       Likely GOP
20%-40%      Lean GOP
40%-60%      Toss-Up
60%-80%      Lean Democrat
80%-95%      Likely Democrat
95%-100%     Safe Democrat
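The classification can be expressed as a simple lookup. Because the listed ranges share their endpoints, this sketch assumes that a value falling exactly on a boundary belongs to the higher band; that choice, like the function name, is an assumption rather than something stated here.

```python
# (upper bound of band, label), ordered by Democratic win probability.
BANDS = [
    (0.05, "Safe GOP"),
    (0.20, "Likely GOP"),
    (0.40, "Lean GOP"),
    (0.60, "Toss-Up"),
    (0.80, "Lean Democrat"),
    (0.95, "Likely Democrat"),
]

def classify(dem_win_prob):
    """Classify a state from the Democratic win probability (0.0 to 1.0)."""
    if not 0.0 <= dem_win_prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, label in BANDS:
        # Assumption: a value exactly on a boundary goes to the higher band.
        if dem_win_prob < upper:
            return label
    return "Safe Democrat"
```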

What does it mean when a polling result is highlighted in yellow? It means that the poll was conducted within the past 10 days.

Senate Polls

Do you assume that senate races move independently of one another? Or is the movement correlated, as in the presidential simulations? We assume a small amount of national (correlated) movement in senate races, as determined from an analysis of historical trends in the Generic Congressional Ballot. However, compared with the Presidential contest, individual senate races move largely independently of one another.

What variables are included in the regression analysis for senate races? Five variables are included:

(1) A dummy variable to indicate the presence of an incumbent in the race;

(2) Where there is an incumbent, the approval ratings for that incumbent;

(3) The share of fundraising obtained by each candidate;

(4) The highest elected office held by each candidate, expressed on a proportional basis;

(5) The partisan ID index of each state (the number of self-identified Democrats less the number of self-identified Republicans), based on 2004 exit polling data.

For a more complete discussion, see here.

Is senate polling less reliable than presidential polling? This is debatable. However, senate races tend to break later than presidential races. Therefore, the degree of uncertainty tends to be higher at a given date before the election; a 10-point lead in the presidential polls in a state tends to be more meaningful than a 10-point lead in the senate polls. The win percentages for senate races are determined based on a historical analysis of senate race data and senate race data only, and apply different parameters than are used in the presidential estimates.

What other methodological differences are there between the senate numbers and the presidential numbers? There are several differences:

(1) For senate races, the half-life assigned to each poll shortens as we get closer to the election. That is, we place progressively more of a premium on the recentness of a poll as we near the presidential election. This is not true for our presidential numbers.

(2) However, there is no timeline adjustment applied to senate races, as there is in the presidential contest.

(3) In senate races, our allocation of undecided voters depends in part on the number of undecided voters. Higher numbers of undecideds indicate more uncertainty in the race and a greater likelihood of these undecideds breaking to the trailing candidate (usually the incumbent). We do not directly evaluate the number of undecided voters in our presidential polling.
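To illustrate the first point, the sketch below shows exponential half-life weighting with a half-life that shrinks as the election approaches. The linear schedule and every number in it (a 30-day maximum, a 7-day minimum, a 120-day horizon) are hypothetical values chosen for illustration, not the parameters used on the site.

```python
def poll_weight(age_days, half_life_days):
    """Recency weight: a poll loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)

def senate_half_life(days_to_election, max_half_life=30.0,
                     min_half_life=7.0, horizon=120.0):
    """Hypothetical schedule: the half-life shrinks linearly from
    max_half_life (at `horizon` days out or more) to min_half_life
    (on election day)."""
    frac = min(max(days_to_election / horizon, 0.0), 1.0)
    return min_half_life + frac * (max_half_life - min_half_life)

# The same two-week-old poll counts for less when the election is close.
weight_far = poll_weight(14, senate_half_life(100))
weight_near = poll_weight(14, senate_half_life(10))
```

With a fixed half-life, as in the presidential numbers, `poll_weight` would be called with the same `half_life_days` regardless of the date.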

Why do you count Joe Lieberman as a Democrat? Good question.

Miscellaneous Wonkery

How are ties broken? Ties (269 electors each for the Republican and Democratic candidates) are assigned to the Democrat based on the assumption that the Democrat would likely carry the day in the incoming House of Representatives. For additional discussion, see here.

Do you account for the potential for split electoral votes in Nebraska and Maine? Nebraska and Maine assign some of their electors based on the election results in individual congressional districts. The win probability and electoral vote averages do in fact account for these contingencies. This is somewhat relevant in this election, as Barack Obama looks to be competitive in both NE-1 and NE-2, while he will probably lose NE-3 (Western Nebraska) badly.

Do you account for home state effects, like in Arizona and Illinois? Directly, no, but indirectly yes. There is a very strong relationship between a candidate’s home state and the amount of fundraising that they’ve received from that state. Since fundraising is one of the variables in our regression model, these effects will in turn show up in our weighted average for that state.

What if any assumptions do you make about turnout? I don’t make any assumptions about turnout. The pollsters make various sorts of assumptions about turnout, and I rely on the pollsters. The only exception is in calculating the popular vote percentage shares for each candidate. For this purpose, I assume that the same proportion of the electorate will turn out in each state as turned out in 2004. However, the turnout figures are adjusted based on changes in the eligible voter population in each state since 2004.
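The popular vote calculation described above can be sketched roughly as follows. The dictionary keys, function names and all figures are invented for illustration, and the sketch ignores third-party votes for simplicity.

```python
def projected_turnout(votes_2004, eligible_2004, eligible_now):
    """Apply the 2004 turnout rate to the current eligible population."""
    rate_2004 = votes_2004 / eligible_2004
    return rate_2004 * eligible_now

def national_popular_vote(states):
    """Turnout-weighted national vote shares from state-level shares."""
    total = dem = rep = 0.0
    for s in states:
        turnout = projected_turnout(
            s["votes_2004"], s["eligible_2004"], s["eligible_now"]
        )
        total += turnout
        dem += turnout * s["dem_share"]
        rep += turnout * s["rep_share"]
    return dem / total, rep / total

# Two hypothetical states with made-up figures.
states = [
    {"votes_2004": 1_000_000, "eligible_2004": 2_000_000,
     "eligible_now": 2_200_000, "dem_share": 0.55, "rep_share": 0.45},
    {"votes_2004": 500_000, "eligible_2004": 1_000_000,
     "eligible_now": 900_000, "dem_share": 0.40, "rep_share": 0.60},
]
dem_share, rep_share = national_popular_vote(states)
```

A state whose eligible population has grown since 2004 therefore counts for proportionally more in the national total, even though its turnout rate is held constant.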

So is this your prediction about what will happen in the election? Not necessarily. The goal of the model is to do absolutely as much as it can with current state-by-state polling data. That is not exactly the same thing as accounting for external contingencies that might move the polling data (and, more importantly, the actual election result) in the future.

Do you have any plans to introduce polling averages for House and Governor’s races? Unlikely in this cycle, but almost certainly in 2010.

What will you do after the election is over? Sleep. But FiveThirtyEight will continue to exist. There are all sorts of political data to sort through even when an election is not going on, particularly as it concerns the legislative process. We hope to continue presenting this data to you in new and exciting ways.
