## DataLab

This is the final part of my four-part response to questions and comments stemming from my article “The Hidden Value of the NBA Steal.” Here are Part 1, Part 2 and Part 3.

Near the beginning of my article on steals, I made the following claim:

> If you had to pick one statistic from the common box score to tell you as much as possible about whether a player helps or hurts his team, it isn’t how many points he scores. Nor how many rebounds he grabs. Nor how many assists he dishes out.
>
> It’s how many steals he gets.

My argument went like this: Steals are super-valuable predictors relative to other box score stats. They are “worth” — predictively — as much as nine points because they’re more difficult to replace than other stats.

But a number of astute readers noticed something missing. Here’s commenter Mike Schloat:

> I struggle with the real life value of steals when looked at in this way since there are SO SO few of them. Averaging 2.5 steals — finishing a game with 2 as often as you finish with 3 — is such a minuscule part of the game, and frighteningly random when you actually look at what sometimes constitutes a steal.

It’s a fair point. Because steals are so rare, they could be much more predictive than other box score stats on the margins and still be less important overall. And the original article didn’t show that steals are predictive enough on the margin to remain the most valuable predictor overall, despite being so rare.

So let me address that concern. There are two levels we need to consider: The first is how rare steals are relative to other events recorded in the box score, and the other is how much steals vary from player to player, relative to how much other stats vary from player to player.

For example, in my dataset, players who played more than 20 minutes averaged .92 steals and .55 blocks per game. But the standard deviations — the typical amount that any particular player is likely to differ from an average player — were .43 steals and .59 blocks.

One way to judge how skilled a player is at a particular thing is to measure how many standard deviations they are above average. These values fluctuate, but the difference between Ricky Rubio (the league leader this year) and an average player is about a steal and a half, making him a little over three standard deviations above average for the steals per game stat.
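As a rough check on that arithmetic, here is a minimal sketch using the steals mean and standard deviation quoted above. The function name `z_score` is made up for illustration, and "about a steal and a half" is assumed to be exactly 1.5.

```python
# Figures quoted in the article for players averaging more than 20 minutes.
MEAN_STEALS = 0.92  # average steals per game
SD_STEALS = 0.43    # standard deviation of steals per game


def z_score(value, mean, sd):
    """Number of standard deviations `value` sits above (+) or below (-) the mean."""
    return (value - mean) / sd


# Assumption: "about a steal and a half" above average is taken as exactly 1.5.
leader_steals = MEAN_STEALS + 1.5
leader_z = z_score(leader_steals, MEAN_STEALS, SD_STEALS)
print(f"{leader_z:.2f} standard deviations above average")
```

Under that assumption the gap works out to roughly 3.5 standard deviations, in line with the "a little over three" figure.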

To judge a stat’s overall predictivity, what we want to know is the extent to which a player’s skill in that stat predicts his overall value (measured by how his team performs when he plays versus when he doesn’t). For example, if a player is two standard deviations above average in steals per game but only one standard deviation above average in points per game, how does his value compare to a player who is the reverse?

To figure this out, we can run a regression similar to the one in the original article. But instead of using a player’s raw box score stats as our variables, we use his standardized stats; that is, the number of standard deviations the player is above or below the mean for each. The relative size of the coefficients (how much a stat should be weighted) that this type of regression spits out tells us the relative predictive importance of each stat overall.
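To make the mechanics concrete, here is a minimal sketch of a regression on standardized stats. It uses simulated players, not my dataset: apart from the steals and blocks figures quoted above, every mean, standard deviation and "true" weight below is invented, and the outcome `y` is a stand-in for impact on team performance. It shows how the method works and says nothing about the actual results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-game stats for 500 made-up players (not the article's dataset).
# Only the steals and blocks means/SDs echo figures quoted in the article.
stat_names = ["points", "rebounds", "assists", "steals", "blocks", "turnovers"]
means = np.array([12.0, 5.0, 3.0, 0.92, 0.55, 1.8])
sds = np.array([5.0, 2.5, 2.0, 0.43, 0.59, 0.8])
n_players = 500
X_raw = means + sds * rng.standard_normal((n_players, len(stat_names)))

# Standardize: each column becomes "standard deviations above or below the mean".
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Invented "true" weights, used only to simulate an outcome to regress on.
true_weights = np.array([0.8, 0.6, 0.5, 1.0, 0.4, -0.7])
y = X_std @ true_weights + 0.5 * rng.standard_normal(n_players)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n_players), X_std])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
weights = coefs[1:]  # drop the intercept

# Relative importance: each absolute coefficient as a share of the total.
# Taking absolute values treats the negative turnover weight as a positive skill.
shares = np.abs(weights) / np.abs(weights).sum()
for name, share in zip(stat_names, shares):
    print(f"{name:>10}: {share:6.1%}")

# Two hypothetical players in standardized units, average in everything else.
player_a = np.array([1.0, 0, 0, 2.0, 0, 0])  # +1 SD points, +2 SD steals
player_b = np.array([2.0, 0, 0, 1.0, 0, 0])  # the reverse
print("Predicted value, A minus B:", (player_a - player_b) @ weights)
```

Because every input is on the same scale, the fitted weights can be compared directly. In this simulation player A comes out ahead only because the invented steals weight is larger than the invented points weight.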

Here are the results of such a regression, from the player’s standardized box score stats to his impact on team win percentage. I’ve listed the relative size of each stat’s coefficient (weight) as a percentage of the whole — reflecting the percentage of information about a player’s value that comes from each (note that turnover value is negative; I’ve converted it to a positive “skill at not giving up turnovers” for purposes of comparison):

This was the finding behind the claim that of all the basic box score totals, steals are the most predictive. It may be less sexy than nine points, but it’s pretty remarkable that a skill that comes up so infrequently can be so important.

Of course, there are a lot of different ways to structure this kind of regression: You have to decide which types of variables to use, how advanced they should be, whether to use game-based or play-based data, and what specific difference to predict.

So, why am I analyzing this particular group of stats at all?

I made a list of all the people who use points, rebounds and assists per game in their analysis and reporting more often than steals per game:

• Almost all sports reporters
• Almost all sports commentators
• Almost all sports columnists
• Almost all sports fans

Establishing the predictive ability of box score stats is only a tiny step toward improving our understanding of the dynamics of basketball. But, like the steal itself, it has outsize importance.

Benjamin Morris researches and writes about sports for FiveThirtyEight.
