I really liked the “Rich Data, Poor Data” story you wrote for ESPN The Magazine’s March 2 Analytics Issue, but I couldn’t help thinking that your first point, “sports has awesome data,” was missing one crucial word: “men’s.”
Men’s sports have awesome data.
Unfortunately, the beauty and breadth of sports data don’t yet extend to women. There are other ways to cover women’s sports intelligently, but the lack of accessible and complete data is incredibly limiting. We’ve struggled with this at FiveThirtyEight, where our job is to tell compelling stories with data, because it is so much harder to find data that is “accurate, precise and subjected to rigorous quality control” of the kind we’ve come to enjoy in men’s sports.
Take the recent news about Diana Taurasi’s decision to leave the WNBA for a salary of $1.5 million in the Russian Premier League. Neil Paine and I wanted to look at the distribution of WNBA player salaries, but as far as we can tell, that data doesn’t exist. There are league averages, but no player-by-player data.
It’s pretty crazy that we know how much money Chris Bosh will make in 2018, but we don’t know exactly how much money the former No. 1 draft pick Brittney Griner makes in the WNBA now. (We could speculate based on her rookie salary.) Imagine what this would be like in men’s sports! “We’re not sure how much money Anthony Bennett is making in the NBA right now.”
And while you can easily look up all 14,260,129 at-bats in the history of Major League Baseball, I have no idea how many at-bats were taken during the five years of the Women’s Professional Softball League. That league folded — along with any of the data it recorded, presumably — and now the new National Pro Fastpitch league has archives that only go back to 2004. (And it appears that they haven’t been updated since 2009.)
With such incomplete data, it’s hard to draw equally rich conclusions about how women play professional softball (better, worse, faster or slower than before?). You can glean a lot more from 85 years of data than from five. Men’s sports don’t just have better historical data; far more data is recorded for them, too. The PGA Tour site, for instance, lists hundreds of performance stats for each player. On the LPGA site, there are only eight.
Here’s another example: You know how our colleague Carl Bialik often writes about women’s tennis? He told me how tough it is to find good data. The ATP World Tour has data available for hundreds of men — by year and by surface — as well as individual match stats by set. The WTA, in comparison, only posts data online for the top 10 women in the current year. (Bonus hurdle: It comes locked in a PDF!)
There doesn’t seem to be much more parity in college sports data. Asked for NCAA women’s basketball tournament rankings, an NCAA official told me: “The women’s championship does not provide publicly the 1-64 seed list that is then flowed into the S Curve.” But that data is publicly available for the men’s NCAA basketball tournament.
We know Barcelona soccer messiah Lionel Messi has a below-average heading percentage, based on a set of data that includes 16,574 players and 24,904 games in both league and international play. But what about Abby Wambach? Does she have the greatest international heading percentage of all time? All I have to work with is data from one World Cup and a few U.S. women’s national soccer team games. I don’t have anything for her club games.
If I sound discouraged, I don’t mean to! Like you, I am psyched about the data stories that we haven’t been able to tell in women’s sports but soon will. Just because the data is shittier and more difficult to find doesn’t mean that it’s not out there on random blogs or passionate Twitter feeds. (If you’re compiling women’s sports data or know of good resources, drop us a note in the comments.)
And just because the data doesn’t exist doesn’t mean we can’t compile it ourselves or make estimates based on what is available. I just think that in addition to praising the virtues of men’s sports data, we need to acknowledge that good women’s sports data is severely lacking.