Can You Read Between The Lines To Pick The Oscar Winners?

Welcome to the final set of interviews with amateur Oscar modelers in our series on how to predict the Oscars! So far, we’ve talked to people trying to pick the winners by looking at data on what people are talking about on Twitter and Google News, folks who use a big model with lots of variables and a simple model that uses only one or two, and contenders who are trying to simulate the Academy.

Today, we’re looking at two models that try to analyze film reviews and the press to find movies that could win the Oscar. One looks at reviews to figure out words that have indicated past Oscar winners. The second analyzes a set of publications read by people in Hollywood to determine what those readers might care about and then uses text analysis of movie reviews to find which nominees hew closest to the Academy’s worldview.

But first, another update on the FiveThirtyEight Oscar race tracker.

An update to our Oscar predictions

The Directors Guild awards were handed out over the weekend, and a win by Alejandro González Iñárritu has put the best picture Oscar race in flux. Iñárritu directed “Leonardo DiCaprio Got A New Fur Coat But, Gosh, Was It Worth The Consequences?” (released domestically as “The Revenant”). For now, that film is the favorite to win the best picture Oscar, but “The Big Short” and “Spotlight” are hot on its heels. In the race for the best directing Oscar, Iñárritu has leaped over George Miller (“Mad Max: Fury Road”) to take the lead and, based on the math, will remain the front-runner until the Oscars.

The next award show to look forward to is the BAFTAs, which take place this weekend. The awards from the British Academy are the last highly predictive event that we track. A win for “The Big Short” or “Spotlight” there would put either of those films neck and neck with “The Revenant,” but a win for the front-runner would push it definitively ahead of the pack. Also, keep an eye on the best supporting actor category to see if there’s a chance for a challenge to Sylvester Stallone (“Creed”).

Now on to our modelers. (These interviews have been edited and condensed for clarity.)

The reviews-based model

Allison Walker is a Yale chemist who thinks she can judge the Oscar race by reading movie reviews and looking for patterns in the words used to describe past Oscar winners. Although she acknowledges that her method is probably not as predictive as an approach like ours that looks directly at past winners, she’s hoping to develop a great supplemental model.

Walt Hickey: Tell me a little bit about yourself!

Allison Walker: I study mostly biochemistry problems. The two things I’m working on right now for my Ph.D. are trying to figure out the application of this dye that labels proteins and using a statistical technique developed by this lab in Texas to study long-range interactions in the ribosome, which is the machine that synthesizes all proteins in the cell. The ultimate goal is to engineer the ribosome to make new types of molecules.

The lab that I work in doesn’t do that much computational or statistical work, but my adviser has been really supportive of me bringing statistical techniques to the lab.

WH: What attracted you to predicting the Oscars?

AW: I don’t have that much interest in the Oscar movies. But I saw the article [asking for Oscar prediction model submissions], and I had just taken a machine-learning class. I thought it would be a cool problem.

WH: How would you describe your approach?

AW: The general idea is to use word counts of different words in reviews, with common words taken out [like “the,” “for” and “and”], to see if there’s any association between certain words and winning. The math behind it isn’t all that important, but it looks at which words are more associated with winning and losing; if you take the word counts in reviews for a given movie, you can assign a probability that the movie will be a winner.
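For readers who want to see the shape of that idea, here is a minimal sketch. It assumes a naive Bayes classifier with add-one smoothing, which is our stand-in and not necessarily the method Allison uses. The stop-word list, the function names and the toy reviews in the usage note are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "for", "and", "a", "of", "is", "in", "to", "it", "with"}


def tokenize(text):
    """Lowercase the text, split it into words and drop common words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def train(labeled_reviews):
    """Count words separately for winners and losers.

    labeled_reviews is a list of (review_text, won) pairs, where won is a bool.
    Assumes at least one winner and one loser are present.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for text, won in labeled_reviews:
        word_counts[won].update(tokenize(text))
        doc_counts[won] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def win_probability(text, model):
    """Return P(winner | words in text) under the naive Bayes assumption."""
    word_counts, doc_counts, vocab = model
    total_docs = doc_counts[True] + doc_counts[False]
    log_scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # prior
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Convert the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])
```

Trained on two made-up reviews, say `train([("a sweeping masterpiece with stunning direction", True), ("a dull and forgettable sequel", False)])`, the sketch gives a new review containing “stunning masterpiece” a win probability above one half and one containing “dull sequel” a probability below it. A real model would need many years of reviews to say anything meaningful.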

WH: What about best actor and best actress?

AW: What I’m going to do is break down the reviews by sentences and only count words used in the same sentence as the actor’s name or their character’s. This could be a problem for movies where the movie’s name is the same as the character’s name, because there’s no way to tell if the sentence is referring to the movie or the character.
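The sentence-level filtering Allison describes could look roughly like the sketch below. The sentence splitter is a simple regular expression, the name matching is a plain substring check, and the function name and stop-word list are hypothetical; none of it is taken from her code.

```python
import re
from collections import Counter


def words_near_names(review, names, stop_words=frozenset({"the", "for", "and"})):
    """Count words only in sentences that mention one of the given names.

    names holds the actor's name and the character's name, e.g. ["DiCaprio", "Glass"].
    """
    lowered_names = [name.lower() for name in names]
    # Words that are part of a name are excluded from the counts themselves.
    name_tokens = {tok for name in lowered_names for tok in re.findall(r"[a-z']+", name)}
    counts = Counter()
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", review):
        lowered = sentence.lower()
        # Substring match, so "glass" would also match "glasses".
        if any(name in lowered for name in lowered_names):
            for word in re.findall(r"[a-z']+", lowered):
                if word not in stop_words and word not in name_tokens:
                    counts[word] += 1
    return counts
```

The sketch makes no attempt to resolve the title-versus-character ambiguity: if a film shares its name with a character, every sentence naming the film is counted as if it were about the performance.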

WH: What are your hopes for the model?

AW: For best picture and directing, it’s better than guessing but not actually great. I would like to see it go above 50 percent accuracy to something closer to 70 percent or 80 percent, but I’m not sure if that’s possible!

INITIAL PREDICTIONS

Allison has “The Revenant” in the lead for best picture, with “Mad Max: Fury Road” a distant second. She said her model’s best picture predictions have 71.4 percent historical accuracy. For best director, she’s picking Miller to start, with Iñárritu a far second. In the best actor and best actress categories, she’s picking DiCaprio and Brie Larson (“Room”).

In the best supporting actor category — probably the toughest to call this year, other than best picture — Allison’s model has Mark Rylance (“Bridge of Spies”) in the lead. Her model also found a strong showing for Tom Hardy (“The Revenant”), but Allison believes that may be because a lot of reviewers are discussing Hardy and DiCaprio in the same sentence. Kate Winslet (“Steve Jobs”) leads in the best supporting actress category, with Alicia Vikander (“The Danish Girl”) close behind.

The worldview simulator

Gary Angel of Ernst & Young has assembled a team of folks to try to crack the Oscars. The gist: Use data to find out what the people who vote on the Oscars care about by analyzing what appears in the Hollywood press (in this case, one year of Vanity Fair and the Los Angeles Times culture and opinion sections) and then do a text analysis of movie reviews to find the nominated films that appeal to that worldview. For example, if back in 2013 there were lots of stories about “Iran” or “rescues” or “heists” in the press, then “Argo” might be favored to win because the same themes would appear in reviews of the film. For the acting categories, Gary’s team applies the same methodology to reviews of the performances.
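One simple way to picture the matching step is to turn the press corpus and each film’s reviews into word-count vectors and rank the films by cosine similarity. That is a deliberately basic stand-in chosen for illustration, not a description of what Gary’s team built. The stop-word list and function names are hypothetical, and the film titles in the usage note are placeholders.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "for", "and", "a", "of", "in", "to"}


def term_vector(text):
    """Turn text into a bag-of-words count vector, minus common words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_fit(press_articles, reviews_by_film):
    """Rank films by how closely their reviews resemble the press corpus.

    press_articles is a list of article texts standing in for the "worldview";
    reviews_by_film maps a film title to a list of review texts.
    """
    worldview = term_vector(" ".join(press_articles))
    scores = {
        film: cosine_similarity(worldview, term_vector(" ".join(reviews)))
        for film, reviews in reviews_by_film.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Given press articles about a rescue in Iran, a placeholder “Film A” whose reviews mention a rescue in Iran would rank above a placeholder “Film B” whose reviews describe a quiet romance. Raw word overlap is a crude proxy for a worldview; sentiment and topic modeling would be natural refinements.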

Walt Hickey: Tell me about your squad.

Gary Angel: We are the digital analytics/advanced analytics team at Ernst & Young. We handle a lot of analytics related to understanding how people behave. That covers everything from website behavior to mobile and social behaviors; we do a lot of work understanding who people are, why they do what they’re doing, and how people can optimize their experiences.

WH: Who else is working on your Oscar model?

GA: I’m leading what we call a counseling family — it’s a large group of people, all of them are analysts, a lot of them are West Coast-based. We get together socially, sometimes to go over business stuff, sometimes to do shared projects together. This seemed like something my team would enjoy — we all love getting our hands on the data and doing this kind of stuff.

Not everybody has time for this, but I think we’re going to have eight different people spending at least a little bit of time on the project. We’re going to have the whole team at least brainstorm parts of the model.

WH: What’s the approach?

GA: I have to admit the approach you guys take when it comes to building the core predictive model is probably going to beat out anything we’re likely to do with this, but I was really thinking about alternative approaches that might supplement that model.

It sort of seemed to us that when you have a small set of movies, all of which presumably are pretty high-quality, you’re going to have a set of concerns that are political and personal that are going to drive a lot of [the Academy’s] voting. But that stuff is really hard to get at. It seemed like sometimes movies get picked because they fit a certain kind of worldview. They fit what people are thinking about right now, either the topical interest or just the way the Hollywood community is thinking about the world.

WH: How would you explain the core of your model?

GA: First of all, we have to find a corpus that adequately represents the Hollywood worldview, because that’s what we’re matching to. That’s an interesting problem in and of itself: We don’t know the exact sample [of Academy voters], we don’t know who’s behind the voting. We’re going to have to pick something we think is reasonably reflective.

The second thing is how do you capture a worldview? Once we’ve got this corpus, what we’re going to see is that they talk about things in different ways. Some topics emerge very strongly, some don’t emerge that would in the broader press. What we’ll probably do is take these initial corpuses and do some initial analysis about sentiment, topicality, what kind of language is being used.

The hope is what we’re eventually going to do is something like a cluster analysis. We’re going to take all these cultural dimensions, we’re going to group the movies, and we’re going to group them against how the Hollywood community falls out on that and see what’s closest. The essence of the model is saying, “Can we take some movie and match it up to a community and say it fits fairly well with their worldview?” Maybe it leads to some additional predictive value, but it also just seems like something inherently interesting to know.

WH: So to figure out how a film would be received among a group of people, you’re going to analyze what they read to figure out a “worldview,” then you’re going to find out what is said about the film [in reviews], and then you’re going to figure out which film is most similar to the worldview.

GA: That’s it! That’s the core of the process. And I laugh a little bit, because it’s highly speculative, obviously. We’re not sure how this is going to work at all. We do a fair amount of traditional language processing and social media culling, but this certainly pushes the boundaries.

WH: We love the models that are like, “We have no idea if this will work, but it’s cool to think about.”

GA: We fall into that group, yes.

INITIAL PREDICTIONS

Based on the model, Gary’s team thinks “The Big Short” is the best cultural fit for the current mood in Hollywood, followed by “Spotlight” and “Brooklyn.” The team is still working on refining its predictions for the acting awards: Considering that Bryan Cranston (“Trumbo”) and Jennifer Lawrence (“Joy”) are in the lead, Gary said, the team is probably “going down in flames on this one” if it can’t tweak the model.
