Can The Internet Predict The Oscars?

With the Academy Award nominees announced and the Oscars themselves mere weeks away, it’s time once again for America’s award whisperers, Hollywood’s dedicated reporters and Ireland’s profligate gamblers to take on a seemingly impossible task: figure out who 7,000 Hollywood insiders think should win a statue.

FiveThirtyEight has, for several years now, attempted to use simple stats to track the Oscar race. And although we’ve gotten pretty good at it, we made it a mission this year to find other folks who think they have developed a new way to predict the Oscars, a notoriously difficult forecast. We put out a call for innovative approaches to modeling the Oscars, and y’all delivered.

We’ve selected eight reader models to follow throughout the Oscar campaign. This week, we’re looking at two models that measure Internet buzz as a way of approximating the Oscar voter’s mindset. I spoke with the creators about how the models work and their initial picks. (The interviews below have been lightly edited for clarity.)

The Google News model

The first comes from Burak Tekin, who thinks he’s found a way to hack the headlines and determine Oscar-worthiness by monitoring Google News. The idea is that by gauging how much the press is talking about a film or a performance, we can get a look into how much buzz a film has managed to garner. And given that it’s doubtful that every Academy member is a high-information voter who’s personally seen each and every film and performance, charting that buzz could be enough. Burak started out by simply searching “[nominee name]” and “Oscars,” but to fine-tune the model for performers — who often generate headlines about things besides their Academy Award-nominated roles — he had to go one step further and find a way to make sure the articles that his model was crawling were about the nominated performances.

Walt Hickey: Can you tell me a little bit about yourself?

Burak Tekin: I studied economics at Harvard, and right now I’m doing freelance research. My passion projects are related to Turkey. I established a fact-checking website for Turkish news, and I put out a contest for election results for Turkish elections. I asked people to guess what percentage each party’s going to get, I ran regressions, and then I saw how people did. Basically, my passion is modeling, whenever I can.

WH: What’s the idea behind your Oscar model for this year?

BT: If you look at the news, you will see that if there’s certain buzz about a certain film or a certain actor, there could be a correlation. I use Google News. There may be better ways to check the buzz, but that’s what came to my mind. I checked it for 2015, and it worked fine, so I back-tested it for three or four more years. The [eventual Oscar] winner was in the top two in news volume for 22 of 24 categories for the past six years.

WH: What kinds of issues do you run into?

BT: I came up with a correcting variable [for actors] — I searched the actor’s name with the character he or she plays. Basically, if the model says in 2014 that Jared Leto is highest, and the search for the character he plays is also the highest, that means Jared Leto definitely wins. In 2014, best supporting actress buzz for Jennifer Lawrence was really high, but Lupita Nyong’o won that year. If I Googled Lupita Nyong’o and the character she played, she was the favorite.
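Burak's correcting variable can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not his actual code: the article counts are invented, the nominee names are placeholders, and the helper names `shares` and `leaders` are hypothetical. It compares who leads when searching by name alone with who leads when the character's name is added to the query.

```python
def shares(counts):
    """Convert raw article counts into each nominee's share of the total."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


def leaders(name_counts, role_counts):
    """Return the leader by name-only search, the leader by name-plus-character
    search, and whether the two searches agree."""
    name_share = shares(name_counts)
    role_share = shares(role_counts)
    by_name = max(name_share, key=name_share.get)
    by_role = max(role_share, key=role_share.get)
    return by_name, by_role, by_name == by_role


# Invented counts for illustration only; these are not real Google News figures.
name_counts = {"Nominee A": 900, "Nominee B": 400, "Nominee C": 200}
role_counts = {"Nominee A": 120, "Nominee B": 310, "Nominee C": 70}

by_name, by_role, agree = leaders(name_counts, role_counts)
print(by_name, by_role, agree)
```

With these made-up numbers the two searches disagree, which is the situation Burak describes for the 2014 supporting actress race: general buzz favored one nominee, while role-specific coverage favored another.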

Initial picks

Burak expects that his model will get increasingly accurate the closer it gets to the day of the awards (Feb. 28), and the final version will monitor headlines only from Feb. 6 through Feb. 17, during the final phase of voting.

Although it’s still early in the cycle, Burak’s model points to a tight best picture race between “The Revenant,” “The Martian” and “Room” — the win probabilities for the films are 19 percent, 17 percent and 17 percent, respectively. Based on the headlines, George Miller (“Mad Max: Fury Road”) is an early leader for best director, with a 40 percent win probability at this point. Leonardo DiCaprio leads the best actor category with a 36 percent chance of winning, while Jennifer Lawrence has a 40 percent chance of winning best actress, Sylvester Stallone has a 46 percent chance of winning best supporting actor and Kate Winslet has a 40 percent chance of winning best supporting actress. Still, Burak stressed that it’s way too early to consider these the end-game predictions — they’re just the opening salvo.

The Twitter model

While Burak is looking at what the press has to say, Paul Singman, an actuary from New Jersey who is training to be a data scientist, wants to tap into the predictions of the tweeting crowd. His model monitors Twitter to find tweets in which people make predictions about the awards — it searches for phrases like “I think,” “will win” or “should win,” for instance — and then aggregates the wisdom of the birds.
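The phrase matching described above might look something like the following Python sketch. It is a guess at the general shape rather than Paul's code: the three phrases come from the description, but the function names `is_prediction` and `tally_predictions`, the sample tweets and the placeholder nominees are all invented for illustration.

```python
from collections import Counter

# Phrases taken from the description of the model; the real list may be longer.
PREDICTION_PHRASES = ("i think", "will win", "should win")


def is_prediction(tweet):
    """Return True if the tweet contains any of the prediction phrases."""
    text = tweet.lower()
    return any(phrase in text for phrase in PREDICTION_PHRASES)


def tally_predictions(tweets, nominees):
    """Count, per nominee, the prediction tweets that mention that nominee."""
    tally = Counter()
    for tweet in tweets:
        if not is_prediction(tweet):
            continue
        lowered = tweet.lower()
        for nominee in nominees:
            if nominee.lower() in lowered:
                tally[nominee] += 1
    return tally


# Invented sample tweets for illustration only.
sample_tweets = [
    "I think Film A will win best picture",
    "Film B should win, no contest",
    "Just saw Film A, the popcorn was stale",
    "Film A will win everything",
]

tally = tally_predictions(sample_tweets, ["Film A", "Film B"])
print(tally)
```

Plain substring matching is a deliberately crude choice here; it would miss misspellings and nicknames, and it counts a tweet for every nominee it mentions.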

Walt Hickey: How’d you get interested in stats?

Paul Singman: Probably my first foray into statistics stuff was for baseball. I started in high school, but for a year in college, I wrote for Baseball Prospectus in their fantasy section. That was my big hobby for a number of years; then I felt like I wasn’t growing writing fantasy baseball articles. Last season, I decided to write an algorithm or script about an ideal fantasy baseball lineup, and that got me really into the coding and technical aspects that open doors to something like this.

WH: What got you interested in the Oscars?

PS: I’ve always considered myself a movie buff. In college, I watched a ton of movies, and I’m a huge Roger Ebert fan. I guess I’d call it the art medium I enjoy the most. When I watch movies, I almost find it hard to separate enjoying them from critiquing them and analyzing them. I’m just someone who watches a ton of movies.

WH: How’d you come up with your approach to modeling the Oscars?

PS: The crowd has been shown to be more accurate than the experts in a lot of areas, although the Oscars are tough because it’s a very select group of individuals [who vote], and people aren’t very sure exactly who’s voting. It’s hard to get anything representative of what they think.

The model scans Twitter. Mine executes every five minutes and retweets every tweet pertaining to the Oscars based on certain hashtags I’m searching for, like #Oscars2016 or #OscarsPrediction. Then I download all the text from the tweets, remove links and pictures, and make the data usable. I’m researching different natural language processing libraries (the most common one is NLTK; there’s one called TextBlob that I’m looking at), which can add an element other than just how many times a movie or actor is mentioned. It can give you a sentiment analysis of whether the tweet was positive or negative or whether the tweeter strongly liked it or sort of liked it.
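The cleanup step Paul describes (keep tweets with the tracked hashtags, then strip links and pictures) could be sketched roughly as follows. This is an assumption-laden illustration, not his script: the regular expression, the function names `clean_tweet` and `has_tracked_hashtag`, and the sample tweet are all hypothetical, and only the two hashtags come from the interview.

```python
import re

# Hashtags named in the interview, lowercased for comparison.
TRACKED_HASHTAGS = {"#oscars2016", "#oscarsprediction"}

# Assumed pattern: web links plus pic.twitter.com image links.
LINK_PATTERN = re.compile(r"(https?://\S+|pic\.twitter\.com/\S+)")


def has_tracked_hashtag(text):
    """Return True if the tweet carries at least one tracked hashtag."""
    hashtags = set(re.findall(r"#\w+", text.lower()))
    return bool(hashtags & TRACKED_HASHTAGS)


def clean_tweet(text):
    """Remove links and picture links, then collapse leftover whitespace."""
    without_links = LINK_PATTERN.sub("", text)
    return " ".join(without_links.split())


# Invented tweet for illustration only.
raw = "Film A for the win! https://t.co/abc123 pic.twitter.com/xyz #Oscars2016"
if has_tracked_hashtag(raw):
    print(clean_tweet(raw))
```

Sentiment scoring with a library such as NLTK or TextBlob would then run on the cleaned text; that step is left out here because the interview does not say which library or settings were ultimately used.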

WH: So how are you going to get from that to a prediction?

PS: What’s going to be most important is raw number of mentions, and then I’d like to weight them by positivity. I’m still working on the exact formula and testing things out. I do foresee an issue with so many Leo tweets.
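Paul says the exact formula is still in flux, so the following is only one plausible way to combine raw mentions with positivity. The weighting rule (each mention counts as 1 plus its polarity, with polarity assumed to run from -1 to 1) is an assumption made for this sketch, as are the function name `weighted_scores` and the sample data.

```python
from collections import defaultdict


def weighted_scores(mentions):
    """Sum positivity-weighted mentions per nominee.

    `mentions` is a list of (nominee, polarity) pairs, where polarity is
    assumed to lie in [-1, 1]. The weight 1 + polarity is a hypothetical
    choice: a neutral mention counts as 1, a fully positive one as 2 and
    a fully negative one as 0.
    """
    scores = defaultdict(float)
    for nominee, polarity in mentions:
        clipped = max(-1.0, min(1.0, polarity))
        scores[nominee] += 1.0 + clipped
    return dict(scores)


# Invented mentions for illustration only.
mentions = [("Nominee A", 0.5), ("Nominee A", 0.0), ("Nominee B", 1.0)]
scores = weighted_scores(mentions)
print(scores)
```

Under this rule a nominee with many lukewarm mentions can still outscore one with a few glowing ones, which is one way the flood of Leo tweets Paul anticipates could dominate a category.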

WH: So what would be a good outcome for you?

PS: I wish I had done this in previous years because it would be easier to judge how the model is doing. I’d love to predict everything right, but it’s going to be difficult to know beforehand how confident I am in the predictions, as you don’t get a lot of data points from the Oscars.

Initial picks

Paul is categorizing tweets for best picture, actor and actress based on whether they’re positive or negative. He’s counting a positive tweet as a vote and then fitting the vote outcome to a sigmoid curve to figure out the probability of a win.
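The article does not spell out how the vote counts are mapped to probabilities, so the sketch below is one possible reading rather than Paul's method. It assumes each nominee's vote share is compared with an even split, passed through a logistic (sigmoid) function, and then rescaled so the category sums to 1. The `steepness` parameter, the even-split baseline, the normalization step and the vote counts are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def win_probabilities(votes, steepness=10.0):
    """Turn positive-tweet counts into win probabilities for one category.

    Assumptions: the sigmoid is centered on an even split (1 / number of
    nominees), `steepness` is a hypothetical tuning knob, and the results
    are normalized to sum to 1.
    """
    total = sum(votes.values())
    if not votes or total == 0:
        return {name: 0.0 for name in votes}
    baseline = 1.0 / len(votes)
    raw = {
        name: sigmoid(steepness * (count / total - baseline))
        for name, count in votes.items()
    }
    norm = sum(raw.values())
    return {name: score / norm for name, score in raw.items()}


# Invented vote counts for illustration only.
votes = {"Film A": 540, "Film B": 340, "Film C": 120}
probabilities = win_probabilities(votes)
for name, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{name}: {p:.2f}")
```

A larger `steepness` pushes the leader's probability toward 1, so the choice of that value matters as much as the tweet counts themselves.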

Just like Burak’s model, Paul’s is likely to get better the closer we get to the award ceremony. But based on the early results of the Twitter monitoring, Paul gives “The Revenant” a 54 percent shot at winning best picture, closely followed by “Spotlight” at 34 percent. Paul’s worry about Leo turned out to be warranted: the early buzz has him with a 98 percent shot at best actor. Brie Larson has an 89 percent chance to win best actress.

As the cycle continues, Paul will start monitoring other categories as well; the social network’s volume of tweets could help predict down-ballot races like cinematography or animated feature. “I’m sure it will be a lot less traffic,” he said, “but people will tweet about every category, so the model should be able to have a prediction for each category.”

We hope you’ll follow along as we introduce the rest of the modelers trying to predict the Oscars. And stay tuned for an update on the FiveThirtyEight Oscar model later this week.
