In attempting to forecast the outcome of the six major categories in the Academy Awards, my computer model had four hits and two misses. The misses were in the categories of Best Supporting Actress, where Penelope Cruz beat the computer’s pick of Taraji P. Henson, and Best Actor, where Sean Penn beat Mickey Rourke.
What to make of this performance? Heath Ledger’s award for Best Supporting Actor was a virtual lock; it’s hard to take any credit at all for that one. The awards for Slumdog Millionaire and its director Danny Boyle were not quite in the same category — both were trading at around 80 percent on Intrade at the time I issued my forecasts. But still, Slumdog winning those categories was by far the most likely outcome. Of the three awards that were in more genuine doubt, the model got one right (Best Actress) and missed the other two.
I don’t know, however, that this is a terrific way to go about evaluating the model’s validity. There is uncertainty — as the model happily acknowledges — in any sort of human endeavor. One year’s worth of results is nowhere near enough to estimate the effects of this uncertainty.
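To make that concrete, here is one way the forecasts could be graded once several years of results have piled up. This is just a sketch I'm offering; the Brier score is my suggestion rather than anything the model itself reports, and the forecast/outcome pairs are invented placeholders:

```python
# Hypothetical sketch: scoring probabilistic forecasts with the Brier score
# (lower is better). The forecast/outcome pairs are invented placeholders,
# not the model's actual output.

def brier_score(records):
    """Mean squared difference between the forecast probability and the 0/1 outcome."""
    return sum((prob - won) ** 2 for prob, won in records) / len(records)

# (probability assigned to the predicted winner, 1 if they won, 0 if not)
records = [
    (0.95, 1),   # a near-lock, like Ledger
    (0.80, 1),
    (0.80, 1),
    (0.70, 0),   # a Rourke-style miss
    (0.60, 1),
    (0.50, 0),
]

print(f"Brier score over {len(records)} forecasts: {brier_score(records):.3f}")
```

Even then, you would need many years' worth of categories before a number like this told you much of anything.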
Instead, whenever we make an incorrect prediction, we are probably better off asking questions along these lines:
What, if anything, did the incorrect prediction reveal to us about the model’s flaws?
Was the model wrong for the wrong reasons? Or was it wrong for the right reasons?
What, if any, improvements should we make to the model given these results?
In the miss on the Best Supporting Actress category, the model was a bit confused. If I actually had to put money on one of the candidates, it would have been on Penelope Cruz, not the model’s pick of Taraji P. Henson. The model got “confused” because of an unusual circumstance surrounding the Best Supporting Actress award: three of the four major precursor awards that I tracked in this category (the Golden Globes, the Screen Actors Guild Awards and the Critics’ Choice Awards) were won by Kate Winslet, who was not on the ballot in this category at the Oscars. (Instead, the Academy considered her performance in The Reader to be a lead role.) Since the recipients of these non-Oscar awards are the single most important factor in predicting the Oscars, this deprived the model of much of the information it would ordinarily use to make its forecasts.
However, I’m not sure this is such a good “excuse.” The one major award that wasn’t won by Winslet, the BAFTA, was instead won by Cruz. What the model should probably have done was to throw out the results of the Globes, the SAGs and the Critics’ Choice in making its forecasts and treat them as missing variables. (There is a big difference between ‘missing’ and ‘zero.’) This would have placed more emphasis on the BAFTA, the only award that gave us useful information about how Cruz stacked up against the other candidates.
If I had done this, it turns out, the model would have made Cruz the favorite, assigning her about a 60 percent chance of victory. This is something we could and probably should have thought about in advance. Nevertheless, failures sometimes have a way of focusing the mind and pointing the way forward.
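Here is a minimal sketch of the missing-versus-zero distinction. The award weights and the softmax step below are assumptions made up for illustration, not the actual model, and they won't reproduce the 60 percent figure exactly; the point is simply that an uninformative award gets dropped and its weight redistributed, rather than being scored as a loss:

```python
# Illustrative sketch only: the weights and the softmax mapping are
# assumptions, not the actual model.
import math

# 1 = nominee won the precursor award, 0 = lost, None = the award carries
# no information for this race (its winner was not on the Oscar ballot).
precursors = {
    "Penelope Cruz":    {"Globes": None, "SAG": None, "Critics": None, "BAFTA": 1},
    "Taraji P. Henson": {"Globes": None, "SAG": None, "Critics": None, "BAFTA": 0},
    "Viola Davis":      {"Globes": None, "SAG": None, "Critics": None, "BAFTA": 0},
    "Amy Adams":        {"Globes": None, "SAG": None, "Critics": None, "BAFTA": 0},
    "Marisa Tomei":     {"Globes": None, "SAG": None, "Critics": None, "BAFTA": 0},
}

weights = {"Globes": 2.0, "SAG": 3.0, "Critics": 1.0, "BAFTA": 1.5}  # made up

def score(awards):
    # Treat None as missing: use only the informative awards and renormalize
    # the weights over that subset, instead of counting a missing award as a
    # loss (i.e., as zero).
    usable = {award: won for award, won in awards.items() if won is not None}
    if not usable:
        return 0.0
    total_weight = sum(weights[award] for award in usable)
    return sum(weights[award] * won for award, won in usable.items()) / total_weight

scores = {nominee: score(awards) for nominee, awards in precursors.items()}

# Convert scores to rough win probabilities with a softmax (also illustrative).
expd = {nominee: math.exp(2.0 * s) for nominee, s in scores.items()}
total = sum(expd.values())
for nominee, value in sorted(expd.items(), key=lambda kv: -kv[1]):
    print(f"{nominee}: {value / total:.0%}")
```

Because the three Winslet-won awards are dropped for every nominee, the BAFTA is the only thing left to separate the candidates, and Cruz comes out as the clear favorite.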
In the Best Actor category, we might also have learned a thing or two last night. Namely, it probably doesn’t help to be a huge jackass (like Mickey Rourke) to all of your peers when those peers are responsible for deciding whether you receive a major, life-altering award.
But is this information helpful for model-building? Probably not. (Unless perhaps we had some way to quantify someone’s jackassedness: days spent at the Betty Ford Center?) Rather, it’s the sort of information that was unique to this particular candidate in this particular year. The way the model accounts for that type of information is to build in uncertainty, which it did: it gave Rourke a roughly 70 percent chance of victory, not a 100 percent one.
Arguably, since Rourke’s behavior was a known unknown rather than an unknown unknown, we could have gone a step further and noted that the model’s estimate of his chances of victory was probably on the high side. Then again, suppose that Rourke had won. We’d be saying, “See, Hollywood loves a comeback story,” feeling very satisfied with ourselves, and perhaps wondering why the program had given him only a 70 percent chance of winning when it “seemed so obvious in retrospect.”
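One antidote to that kind of hindsight reasoning is to stop judging individual forecasts and check the model's calibration instead: over enough years, the candidates given roughly 70 percent chances should win roughly 70 percent of the time. A hypothetical sketch of that check, again with invented records:

```python
# Hypothetical calibration check: group forecasts into probability buckets
# and compare the average forecast with the actual win rate. The records
# are invented placeholders.
from collections import defaultdict

# (forecast probability, 1 if the candidate won, 0 if not)
records = [(0.72, 1), (0.68, 0), (0.71, 1), (0.90, 1), (0.55, 0), (0.93, 1)]

buckets = defaultdict(list)
for prob, won in records:
    buckets[round(prob, 1)].append((prob, won))  # bucket to the nearest 10 percent

for bucket in sorted(buckets):
    entries = buckets[bucket]
    avg_forecast = sum(prob for prob, _ in entries) / len(entries)
    win_rate = sum(won for _, won in entries) / len(entries)
    print(f"~{bucket:.0%} bucket: forecast {avg_forecast:.0%}, actual {win_rate:.0%}, n={len(entries)}")
```

A 70 percent forecast that loses 30 percent of the time is doing its job; the trouble is that it takes a lot of Oscar nights to know whether that is what's happening.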
Ultimately, this is not about humans versus computers. The computer I used to forecast the Oscars didn’t think for itself — it merely followed a set of instructions that I provided to it. Rather, it is a question of heuristics: when and whether subjective (but flexible) judgments, such as those a film critic might make, are better than objective (but inflexible) rulesets.
The advantage of making a subjective judgment is that you may be able to account for information that is hard to quantify: Rourke’s behavioral problems, for example, or the politics of Sean Penn playing a gay icon in a year when Hollywood felt very guilty about the passage of Proposition 8. The disadvantage is that human beings have all sorts of cognitive biases, and it’s easy to allow these biases to color one’s thinking. I would guess, for instance, that most critics would have trouble decoupling the question of who they thought should win the Oscars (the performances they liked best personally) from who they thought actually would win them.
In the case of something like the Oscars, where the ratio of subjective/qualitative to objective/quantitative information is relatively high, I’m pretty certain that the limitations of hewing to a rule-based approach (like a computer program) outweigh the advantages. But I was pretty certain about that long before last night. And I’m also pretty certain that the gap can be closed with better model-building.