Welcome back to our series highlighting the dedicated amateur modelers trying to use data to predict the Oscars. We’ve previously talked to folks trying to pick the winners by looking at what people are talking about on Twitter and Google News, as well as folks using a big model with lots of variables and others using a simple model with only one or two. This week, we’re talking to two people trying to fake an Academy: One draws from his experience ranking college football teams to rank and pick Oscar winners, and the other is trying to hunt down online movie raters with the same taste as Academy members.
But before we get into that, a quick update on the FiveThirtyEight model!
An update to our Oscar predictions
This past weekend saw two awards ceremonies: the Eddie Awards from the American Cinema Editors guild and the Screen Actors Guild awards. The front-runners in three Oscar categories — Leonardo DiCaprio, Brie Larson and Alicia Vikander — were given a huge boost at the SAG awards; they are now the prohibitive favorites to win an Academy Award. That won’t change much over the next several weeks — perhaps the British Academy will give other contenders a boost at the BAFTAs — but based on the scoring so far, those three will likely go into Oscar night with big leads.
The best picture race is now a three-picture contest. “The Big Short” won the Eddie for comedy, “Mad Max: Fury Road” won the Eddie for best drama and “Spotlight” won the coveted Screen Actors Guild award for outstanding cast.
As a result, “Spotlight” almost recaptured the lead in the Oscar race from “The Big Short,” but it’s effectively a tie. With the BAFTAs and Writers Guild awards in two weeks, the race is still up for grabs, but whichever film wins the top prize at the Directors Guild awards this weekend will likely go into Oscar night as the favorite.
This leads us to the best supporting actor award, which is becoming a nightmare to forecast. The winner of the relatively predictive Screen Actors Guild award was Idris Elba (“Beasts of No Nation”), but he got snubbed for the Oscar nomination. This means the supporting actor Oscar race is somewhat bereft of data. In the absence of input from the SAG award, the best we’ve got is the Golden Globe, which went to Sylvester Stallone. With only the BAFTA and Satellite awards left in this category, we’re not going to get a decisive leader. So we’re kind of screwed there.
Luckily, we’ve got eight modelers tearing the data apart to try to find this information, too. Let’s meet two of them. (These interviews have been edited and condensed for clarity.)
The college football model
First up we have James England, a college football geek who has overhauled a system used to rank football teams to instead predict the Oscars.
Walt Hickey: Can you tell me a little bit about yourself?
James England: In my day job I’m the chief technology officer for a company called Dynamic Screening Solutions Inc. As far as what I do outside of work, my main hobby has been doing college football rankings the last couple of years. That spurred the Oscar ratings.
WH: You work on the Massey ratings, right?
JE: I graduated from Utah State University in 2008. As a pretty big football fan, it’s kind of unfortunate that our team wasn’t very good at the time. I noticed that the computer rankings were always a little bit nicer to Utah State than the human polls [were]. I looked to see how these computer rankings actually worked. I came across Kenneth Massey’s senior project when he was in college — he wrote out how you actually do ranking, and it’s this whole linear regression algorithm. I spent a lot of time over the next few months researching how that works. By the end of the year I sort of had a model that worked. Then I sent Kenneth an email and thanked him for having that paper out there, and he threw me on the composite rankings [for college football] and for the last couple of years I’ve been doing that.
I’ve been using that project as a way to keep my mind sharp as far as learning new things, and the code behind everything has been reworked a few times to learn new programming languages and new frameworks.
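Massey’s approach treats every game as an equation: the winner’s rating minus the loser’s rating should equal the margin of victory, and least squares finds the set of ratings that best fits all the games at once. A minimal Python sketch of that idea follows, assuming each head-to-head result is a plain win or loss counted as a margin of 1. It is not Kenneth Massey’s or James’s actual code; the function names and the three votes in the example are made up for illustration.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def massey_ratings(votes):
    """Least-squares (Massey-style) ratings from (winner, loser) pairs.

    Hypothetical helper: every vote counts as a game won by a margin of 1.
    """
    films = sorted({film for vote in votes for film in vote})
    idx = {film: i for i, film in enumerate(films)}
    n = len(films)
    # Normal equations: games played on the diagonal, minus head-to-head counts off it.
    m = [[0.0] * n for _ in range(n)]
    # Right-hand side: cumulative margin (wins minus losses) for each film.
    p = [0.0] * n
    for winner, loser in votes:
        wi, li = idx[winner], idx[loser]
        m[wi][wi] += 1
        m[li][li] += 1
        m[wi][li] -= 1
        m[li][wi] -= 1
        p[wi] += 1
        p[li] -= 1
    # Ratings are only defined up to a constant, so the system is singular;
    # replace the last equation with "all ratings sum to zero".
    m[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(films, solve_linear(m, p)))


votes = [("Spotlight", "Room"), ("Spotlight", "The Martian"), ("Room", "The Martian")]
print(massey_ratings(votes))
```

With those three made-up votes, the sketch rates “Spotlight” at about 0.67, “Room” at 0 and “The Martian” at about -0.67. Every film has to be linked to the others through some chain of matchups; otherwise the system has no unique solution.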
WH: So where do the Oscars come into it?
JE: Last year I had the idea of open-sourcing the algorithm that does the football rankings and I pulled all the football out of it. Once I did that, I wondered what else people would want to use it for. It was right around the time of the Oscars, so as an example I ran the whole thing with Oscar ratings. The idea is you take these movies, you make them like teams in a football season, and you pit them against each other. I get people to go on my website and ask them how many of the [nominated films] they saw. If they saw three or four of them, the site pits the movies against each other [and asks users to vote]. Rather than someone making a [ranked] list of all movies, they say, “I watched these two movies and this one’s better.” It pairs them all up until you run out of options. Winning one of those votes is like [a film] winning a game of football.
It’s like how Elo ratings were used for chess but now we use Elo everywhere for predicting. You realize the same algorithm can be used in lots of places and figure out where to massage it to make it work.
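For comparison, the standard chess Elo update fits in a few lines of Python. This is the textbook formula on a 400-point logistic scale with an assumed K-factor of 32 and a starting rating of 1500, not necessarily anything James uses; the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model (400-point logistic scale)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, a_won, k=32):
    """Return both ratings after one result; the winner gains what the loser drops."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rate_votes(votes, start=1500.0, k=32):
    """Replay (winner, loser) votes in order, starting every film at `start`."""
    ratings = {}
    for winner, loser in votes:
        new_w, new_l = elo_update(ratings.get(winner, start), ratings.get(loser, start), True, k)
        ratings[winner], ratings[loser] = new_w, new_l
    return ratings


print(elo_update(1500, 1500, True))  # (1516.0, 1484.0)
```

The design difference matters for a vote-based site: Elo adjusts ratings one result at a time, so the order of votes changes the outcome, while a least-squares fit like Massey’s weighs all results at once.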
WH: How’s it going?
JE: This year’s going to be an interesting study. For instance, “Selma” had the highest rating last year, but it obviously didn’t win the Oscar. So I looked at the number of games played. “Birdman” was in third place, but from 36 games versus 15 for “Selma.” What I’m going to do this year is take a film’s rating and multiply it by the number of wins it had, or the number of games played.
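The adjustment James describes is simple arithmetic, but it can reorder the field. A minimal sketch, assuming the ratings are positive; the two ratings below are made up, and only the game counts of 36 and 15 come from the interview. The helper names are hypothetical.

```python
def volume_adjusted(ratings, counts):
    """Multiply each film's rating by a count (wins or games played).

    Assumes positive ratings: a zero-centered rating (such as a least-squares
    fit) would need shifting first, or negative ratings would get worse with volume.
    """
    return {film: ratings[film] * counts[film] for film in ratings}


def rank(scores):
    """Titles sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


ratings = {"Selma": 1.25, "Birdman": 0.75}  # hypothetical ratings
games = {"Selma": 15, "Birdman": 36}        # game counts quoted in the interview
adjusted = volume_adjusted(ratings, games)
print(rank(ratings))   # ['Selma', 'Birdman']
print(rank(adjusted))  # ['Birdman', 'Selma']
```

Multiplying by games played rewards films that more voters have seen, which is the intuition behind the change: a high rating built on 15 matchups is weaker evidence than a slightly lower one built on 36.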
WH: Who’s voting on it?
JE: My assumption here is that the people who have watched Oscar movies are similar enough to the people voting in the Academy. People who are watching “Boyhood” or “The Grand Budapest Hotel,” they had the idea to go and watch those movies, so their opinions are no less valid than someone who has an actual vote in the Academy. It’s not a totally scientifically stable method, but you’re probably going to find out that they’ll mirror the Academy as well as you can expect.
If you’ve seen a few Oscar-nominated films and want to help with James’s research, he’d love for you to participate in the survey.
Based on 1,299 matchups so far, James has “The Martian” with a 25 percent chance to win best picture, followed by “Mad Max: Fury Road” with 21 percent, “The Revenant” with 18 percent and “The Big Short” with 13 percent. In the best director category, it’s between Alejandro G. Iñárritu (“The Revenant,” 37 percent chance) and George Miller (“Mad Max: Fury Road,” 33 percent).
DiCaprio (“The Revenant”) is leading in the best actor category with a decisive 52 percent chance of winning, followed by Matt Damon (“The Martian”) with 38 percent. Larson (“Room”) is dominating in the best actress category with a 77 percent chance of victory. Mark Ruffalo (“Spotlight”) leads the best supporting actor category (35 percent chance of winning), followed by Christian Bale (“The Big Short,” 25 percent) and Tom Hardy (“The Revenant,” 23 percent). It’s a similarly tight race in the best supporting actress category, with Vikander (“The Danish Girl,” 26 percent) and Kate Winslet (“Steve Jobs,” 25 percent) leading the field.
The Academy imitation game model
Next up we have Nigel Henry, who is leading a team of data scientists plowing through crowdsourced ratings data from MovieLens to find fans who have had similar taste to the Academy in the past in the hope they’ll be able to pick this year’s winners.
Walt Hickey: Tell me a little bit about your team.
Nigel Henry: My name’s Nigel. I’m the principal consultant at Solution by Simulation. It’s an analytics firm; we do a lot of different kinds of analytics, traditional market research, marketing analytics, data journalism, some election forecasting as well. Alongside me is Kimberly Gonzales, who is one of our data scientists. We also have Nigel Wallen, who’s one of our data engineers, and Kevin Weekes, who’s our project manager.
WH: How’d you get interested in the Oscars?
NH: I’m a longtime fan of FiveThirtyEight, from all the way back in 2008 when I was on the Obama campaign. My main source of news, believe it or not. So when you threw out the challenge [to predict the Oscars], it was an opportunity. It’s the first time we’re doing anything Oscars-specific, just taking a page out of the old FiveThirtyEight book of trying to use common-sense algorithms to solve tough prediction problems.
WH: How’d you design the model?
NH: Basically, it’s a prediction algorithm, a recommendation algorithm. We can’t poll the members of the Academy; the only thing we can do is observe their past behavior. We envisioned the problem as not that different from a user going onto Netflix and rating a bunch of movies. When the Oscar nominees came out on Netflix or Amazon, [those services’ recommendation engines] would have to say which ones they thought the user would like the most. We’re looking for the users who have rating histories that are most similar to the Academy’s.
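One simple way to turn “users whose rating histories are most similar to the Academy’s” into code is sketched below in Python. It is an assumption, not the team’s actual method: each user is scored by how often their top-rated nominee in past races was the eventual winner, the best-matching users form a panel, and the panel’s average rating picks this year’s winner. The function names, the `top_n` panel size and the toy ratings are all hypothetical, not MovieLens data.

```python
from statistics import mean


def academy_agreement(user_ratings, past_races):
    """Share of past races where the user's top-rated nominee actually won.

    user_ratings: movie -> rating. past_races: list of (nominees, winner).
    Only races where the user rated at least two nominees are counted.
    """
    hits = total = 0
    for nominees, winner in past_races:
        rated = [m for m in nominees if m in user_ratings]
        if len(rated) < 2:
            continue
        total += 1
        favorite = max(rated, key=lambda m: user_ratings[m])
        hits += favorite == winner
    return hits / total if total else 0.0


def predict_winner(users, past_races, nominees, top_n=2):
    """Average this year's ratings across the users who best matched past winners."""
    ranked = sorted(users, key=lambda u: academy_agreement(users[u], past_races), reverse=True)
    panel = ranked[:top_n]
    averages = {}
    for movie in nominees:
        scores = [users[u][movie] for u in panel if movie in users[u]]
        if scores:
            averages[movie] = mean(scores)
    return max(averages, key=averages.get) if averages else None


past = [
    (["Birdman", "Boyhood", "Selma"], "Birdman"),
    (["12 Years a Slave", "Gravity"], "12 Years a Slave"),
]
users = {
    "a": {"Birdman": 5, "Boyhood": 4, "12 Years a Slave": 5, "Gravity": 3, "Room": 5, "Spotlight": 4},
    "b": {"Birdman": 3, "Boyhood": 5, "Gravity": 5, "12 Years a Slave": 2, "Room": 2, "Spotlight": 5},
    "c": {"Birdman": 4, "Selma": 3, "Room": 4, "Spotlight": 3.5},
}
print(predict_winner(users, past, ["Room", "Spotlight"]))
```

In this toy data, users “a” and “c” picked past winners and user “b” did not. The two-user panel favors “Room,” while averaging all three users (`top_n=3`) would favor “Spotlight,” which is why filtering for Academy-like raters changes the answer.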
WH: What data are you working with?
NH: We’re using the MovieLens data set. We had no previous experience with movie rating data, and we did some research on what was out there. Right now, there are about 21 million ratings from about 200,000 users. A lot of it isn’t directly usable, because if users haven’t rated any current nominees we can’t do anything with their ratings. From those 21 million ratings, we get it down to fewer than 50,000 users: tens of thousands of people who have rated at least one movie [nominated] this awards season.
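Cutting the full ratings file down to users who rated at least one current nominee might look like the following Python sketch. It assumes a MovieLens-style CSV with `userId`, `movieId` and `rating` columns; the function names are hypothetical, and the nominee IDs in any call are placeholders, not real MovieLens IDs.

```python
import csv
from collections import defaultdict


def read_ratings(path):
    """Yield one dict per row of a MovieLens-style ratings CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


def nominee_raters(rows, nominee_ids):
    """Return full rating histories for users who rated at least one nominee.

    rows: dicts with "userId", "movieId" and "rating" keys (strings, as csv yields them).
    nominee_ids: set of movieId strings for this season's nominees.
    """
    histories = defaultdict(dict)
    for row in rows:
        histories[row["userId"]][row["movieId"]] = float(row["rating"])
    # Users who rated no current nominee carry no signal about this year's race.
    return {
        user: history
        for user, history in histories.items()
        if not nominee_ids.isdisjoint(history)
    }
```

This single pass holds every history in memory, which is simple but heavy for about 21 million rows; a two-pass read (first collect the qualifying user IDs, then load only their histories) would trade a second scan of the file for much less memory.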
WH: How’d you quantify success?
NH: Because of how our model is built, our big wins are in predicting the overall movie categories: best picture, best documentary, best short film, best animated. I think we can call those. We’re tweaking our model slightly for the other categories (directors and actors), so if our model totally fails there, I’ll be fine with that. We’re really going for the movies.
For best picture, the team has “Room” as an early leader followed by “Spotlight,” based on the ratings of a pack of predictive superusers from the MovieLens set. “I know we’re going against all the betting markets and what everyone is saying, but ‘Room’ is our story and we’re sticking to it,” Nigel said. According to the team’s analysis of MovieLens users, DiCaprio leads in the best actor category but is very closely followed by Damon. Charlotte Rampling (“45 Years”) leads the best actress category, followed by Larson. Lenny Abrahamson (“Room”) is ahead in the best director category.
For the two supporting acting categories, “Spotlight” performances are ahead: Ruffalo and Rachel McAdams lead in their respective categories, with Bale and Jennifer Jason Leigh (“The Hateful Eight”) in second.