The Herculean Effort Taken By One Group To Show Hollywood Is Sexist

Anecdotally, many believe that Hollywood is sexist. For example, can you name five female-led action movies made in the past decade off the top of your head? How about a movie this year that was directed by a woman? A feature-length film about D-list comic-book characters, “Guardians of the Galaxy,” was released before a movie about an A-list female character, “Wonder Woman,” went into production. Think about that.

But anecdotes can be countered with other anecdotes. What about “The Hunger Games” or “Divergent” franchises, which feature female leads? What about Sandra Bullock, Angelina Jolie and Scarlett Johansson, who are huge audience draws? What about “Frozen”? Let it go already, one might respond.

So, how would you build an ironclad case that women are disregarded on-screen and behind the camera? You’d have to look at the most popular movies, rigorously code each one, and inventory scores of writers and directors, hundreds of producers and thousands of characters — all the way from James Bond down to Barista No. 4. It would be an extraordinary effort just to prove what many already intuit. And even after investing hundreds of hours, there’s no guarantee that people with the power to change anything in Hollywood would take notice.

That’s the mission and modus operandi of Stacy Smith and her team of researchers at the Media Diversity and Social Change Initiative, a think tank at the University of Southern California’s Annenberg School for Communication and Journalism. Every year, Smith compiles the single most comprehensive set of statistics about women and minorities in film — in front of and behind the camera — through a rigorous and borderline masochistic process. She and her team are trying to do the impossible: convince gatekeepers that women are people, too, and should probably, you know, be in movies.

In dissecting the top 100 grossing films each year, Smith and her team have analyzed a total of 26,225 characters in 600 films for gender, body type, age, race and more. In their most recent annual review, released in July, they found that in 2013, only 29 percent of characters were female, and a mere 28 percent of the films had a female lead or co-lead.

Similarly, in the five previous years the study was conducted — 2007, 2008, 2009, 2010 and 2012 — women were underrepresented on-screen.1


The underrepresentation was more severe in movies catering to younger audiences. About 1 in 4 characters were female in PG-rated films, while just over 3 in 10 characters were female in R-rated films. Only 16 films had rough gender parity — in which women made up 45 to 54.9 percent of named or speaking roles — and only two films had more women than men.2
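The study’s cutoffs can be expressed as a simple classifier. This is a minimal sketch — the function and label names are mine — but the 45-to-54.9-percent parity band is the report’s:

```python
def classify_gender_balance(pct_female: float) -> str:
    """Classify a film by the share of its named or speaking roles
    that are female, using the study's bands."""
    if pct_female < 45.0:
        return "underrepresented"   # below the parity band
    if pct_female < 55.0:
        return "rough parity"       # 45 to 54.9 percent female
    return "more women than men"    # 55 percent or above
```

By this definition, a typical 2013 film, at 29 percent female characters, falls well below the parity band.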


When it came to the people behind the camera — in the roles of director, writer and producer — only 16 percent were women. This statistic has been stagnant over time.


When it came to race, 74.1 percent of characters in last year’s top 100 grossing films were non-Hispanic white, a group that makes up 63.7 percent of the U.S. population. This hasn’t changed meaningfully in the six years of the study either.

Smith and her team also found that in film and television, women were “sexualized” — partially or fully nude and wearing sexually revealing clothing — substantially more often than men.

And last week, the group released a study that found that the disproportionate representation of men in film also extended to other top film-producing nations.

To come up with these statistics, Smith and her team developed a methodology for tracking all sorts of things in film, including sexualization, anthropomorphized animal characters, age and even key plot elements like “Truman Show”-style character dualities.

“When we started out, it was tricky. It was a challenge,” said Marc Choueiti, project administrator of the Media Diversity and Social Change Initiative. Building the methodological system from scratch wasn’t easy. “We figured out a system that we’re very confident in. We capture every speaking character on-screen,” he said. Uttering one word means the character speaks — so that’s a lot of characters.

“We’re a factory of producing 5,000 characters out of 100 films every year,” said Choueiti, who trains the students coding the films.

The work is demanding. You can’t just watch a movie straight through and code it correctly — the process involves stopping, rewinding and teasing out voices from a crowd to figure out not only who is talking, but what she looks like and more. Coders cross-check on IMDb when possible, but just because someone speaks doesn’t mean she scores a credit, so the coders have to drill deeper. And the same film has to be reviewed independently by at least three coders, and cross-checked by a fourth, to make sure everything is right.

“I work about 10 hours per week,” veteran coder Nathalia Taveres said, “so I watch about three movies per week.”

The coders are armed with a manual that’s been used since 2008. Once a character is named or speaks, they fill out a suite of stats on her. And if you want to find out how Hollywood objectifies women, you’ve got to be as objective as possible.

Some things are easy to determine. Characters are divided into form — human, animal, supernatural creature, anthropomorphized supernatural creature or anthropomorphized animal — and the study also notes the character’s relationship status and parental status if they’re explicitly provided.

Other characteristics can get more complicated. For example, the researchers are interested in the portrayal of unrealistic body images, so the coders have to classify each character’s build, graded from A (emaciated) through G (morbidly obese). They’re given reference pictures of each body type to help them gauge the grade accurately.
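The variables described so far could be collected in a record like the following — a sketch only, with field and class names of my own invention, while the categories themselves come from the study:

```python
from dataclasses import dataclass
from typing import Optional

# The five forms a character can take, per the study's scheme.
FORMS = (
    "human",
    "animal",
    "supernatural creature",
    "anthropomorphized supernatural creature",
    "anthropomorphized animal",
)

BUILD_GRADES = "ABCDEFG"  # A = emaciated ... G = morbidly obese

@dataclass
class CodedCharacter:
    name: str
    form: str                                  # one of FORMS
    build: Optional[str] = None                # A-G, or None for "can't tell"
    relationship_status: Optional[str] = None  # only if explicitly provided
    parental_status: Optional[str] = None      # only if explicitly provided
```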

They also code gender and race. They’re trained to make judgments based not only on the gender or race of the actor, but how the actor is presented in the film. This can sometimes be tricky, but they have a lot of procedures to get it right. In “Argo,” for example, Ben Affleck, who is white, plays Tony Mendez, who is Hispanic, so he’s coded as Hispanic. And in “Hairspray,” John Travolta plays someone’s mother, so he’s coded as female. When the actor’s reality differs from the story’s, the story’s version is preferred.

There are also lines that are drawn when considering which characters are worth grading for human attributes. Think of the difference between Goofy and Pluto — they’re both dogs, but it’s only worth analyzing Goofy for human attributes like body size or states of dress. Jabba the Hutt isn’t worth grading against human attributes, but his Twi’lek performers are.

Even something as simple as perceived age can be thorny. Characters are tagged depending on which age bucket they fall into — 5 or younger, 6 to 12, 13 to 20, 21 to 39, 40 to 64 and 65 or older. Most people are good at sorting characters into the three younger groups, but it can be hard, particularly for coders in their late teens or early 20s, to gauge age over 30. When they can’t cross-check IMDb for the age of the actor — as in the case of an uncredited background character who speaks — the coders are trained to reach a consensus based on cues like laugh lines, hair color or the age of that character’s children (if any).
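The bucketing itself is mechanical once an age is agreed on. Here is a sketch (the names are mine, and I take the adult band as 40 to 64 so the ranges don’t overlap):

```python
# Upper bound (inclusive) for each bucket, in ascending order.
AGE_BUCKETS = [
    (5, "5 or younger"),
    (12, "6 to 12"),
    (20, "13 to 20"),
    (39, "21 to 39"),
    (64, "40 to 64"),
]

def age_bucket(apparent_age: int) -> str:
    """Map a coder's consensus age estimate to the study's age bands."""
    for upper, label in AGE_BUCKETS:
        if apparent_age <= upper:
            return label
    return "65 or older"
```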

When you get to the variables describing sexualization of different characters, it gets more complicated. Characters are coded on their attire — if they wear sexually revealing clothing or if there’s nudity, which describes the “amount of exposed skin between the high upper thigh and mid chest regions,” according to the latest report. These two variables are only counted if the character has a humanlike body.3

Then there’s the three-point scale gauging a character’s attractiveness. This is a huge question for objective researchers: How do you code for something that appears to be an entirely subjective feature? And how do you make that rigorously scientific? Smith and her team have come up with a solution: If someone makes an explicit reference to how a character looks, coders count that. So “0” means no reference to physical beauty, “1” means one reference and “2” means two or more. It could be a verbal comment or compliment, or a nonverbal cue like wolf whistling or elevator eyes. Coders have to watch the film closely to tease out whether this happens to a single character once or more than once and record that correctly.
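Because the scale tops out at “two or more,” the raw tally collapses to a capped count. A one-line sketch (the function name is mine):

```python
def attractiveness_score(reference_count: int) -> int:
    """Collapse the number of explicit references to a character's looks
    (comments, wolf whistles, elevator eyes) onto the study's 0/1/2 scale,
    where 2 means two or more references."""
    return min(reference_count, 2)
```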

Keeping in mind that all of these characteristics need to be coded for every speaking character in a film — with about 40 characters per film, give or take — you can see how coding three films in 10 hours takes skill.

“Once you learn everything, by your second month you’re just going at it,” coder Gabe Rocha said. “I kind of found myself — when watching movies or TV shows at home — just thinking numbers in my head, like I’d tell my friends, ‘Oh, that person’s a two.’ I started doing that even in real life, just walking around campus.”

And all of them have stories of the worst films to code. Choueiti still shivers when remembering “Despicable Me 2.” He had to sit down with a student coder, Angel, to figure out all the minions. “I was like, ‘Angel, we need to track the eye size, the width and height of every single one of these.’ ”

Everyone has That Movie. For Artur Tofan, it was “The Lego Movie” with its eclectic, animated and particularly talkative cast. For Yoobin Cha, it was “The Hobbit” with a dozen nearly identical dwarves. For Rocha, it was “Free Birds” with its seemingly countless — but in the end, countable — computer-generated turkeys.

“I did ‘The Smurfs,’ ” Taveres said. “It was terrible. … Of course, some of them have like a hat, different clothing. But they all looked the same.”

Sports movies also drew ire, because any name that appears on a jersey means the character is named and must be coded, which means a lot of squinting at extras.

It’s not all tedious, though. “On the other hand,” Rocha said, “I watched ‘Gravity.’ ”

The coders might eventually get some reprieve. Several are now teaming up with engineers on what they call “the Google project.” The Geena Davis Institute, which funds research and advocacy on women in film, won a Google Impact Award worth $1.2 million that will be used to find a way to automate a lot of the manual effort involved in coding a film.

In the seven years since its first report, the team hasn’t noticed significant changes in women’s representation in American film. If anything, its data is remarkable in that it’s so consistent. But there are a few bright spots.

“Every now and again, we find something that’s really interesting. In the last top grossing report, we started looking at genre,” Smith said. “One thing that surprised all of us was that women were doing very well in comedy.” From 2007 to 2013, that rate held steady at 36 percent — slightly above the share of female characters in film overall. Women are also thriving in documentaries, Smith said.

The greatest deficiency is in action and adventure films.


Katherine Pieper, a research scientist and a co-leader of the team, said a recurring theme is that when money moves in, women move out. A core finding of the group’s work for the Sundance Institute was that women are disconnected from the existing financial structures for making movies.

That business reality isn’t going to disappear overnight. But Smith thinks there’s another way to gauge the success of the group’s work. For 10 years, she’s been teaching a class of 250 students each semester about on-screen gender imbalances, and Choueiti’s been training 60 to 100 students a semester in how to code films. That’s thousands of students who go into the movie industry after graduation.

“That’s how change is going to happen,” Smith said.

“There’s so many of them. They work at Sony, they work at Disney, and that’s what you want. You want infiltration of all levels of these multinational companies with people who realize, ‘Wow, there can be another way.’ ”

So, even if we can’t yet see evidence of increased awareness among Hollywood’s financial gatekeepers, the project’s having an effect on the young people who meticulously code the data.

“I find myself on Netflix looking for films that are directed by women,” said Ariana Case, a coder who’s worked with the project since 2011. “And I want to watch them, and I want them to do well.”


  1. Teams at San Diego State University and UCLA studied films in 2011, so the group skipped that year.

  2. That is, at least 55 percent of the cast was female, because the group defined gender parity as between 45 and 54.9 percent female.

  3. Every variable can also be coded as “not applicable” or “can’t tell.”

Walt Hickey was FiveThirtyEight’s chief culture writer.