Scoring in professional basketball is one of the most beautiful things in sports. With only moments to set up his shot, a player tosses a ball into a soaring arc, and it drops through a hoop only slightly larger than the ball. That or he flies to the hoop and deposits the ball directly.
It’s no wonder, then, that individual players’ scoring abilities get the most attention. But basketball is a complex and dynamic sport, and this skill is only one of many that determine what kind of impact a particular player has on the bottom line.
In fact, if you had to pick one statistic from the common box score to tell you as much as possible about whether a player helps or hurts his team, it isn’t how many points he scores. Nor how many rebounds he grabs. Nor how many assists he dishes out.
It’s how many steals he gets.
This phenomenon — that steals is one of the most informative stats in basketball — has important implications for how we think about sports data. But it can also help us investigate real-life basketball mysteries, such as “What the heck is going on in Minnesota?”
Consider the curious case of Ricky Rubio. A professional basketball player since the age of 14, he won a silver medal in the 2008 Beijing Olympics (leading a strong Spanish team in assists, steals and even defensive rebounds during the knockout rounds). The Minnesota Timberwolves drafted him in 2009 with the fifth overall pick (age: 18), but he initially stayed in Spain, not making his NBA debut until 2011.
During the two years Rubio spent at FC Barcelona, his eventual Minnesota teammate Kevin Love ascended into the ranks of the NBA’s statistical elite. This left many to expect (or hope) that adding Rubio would finally make the Timberwolves a contender. But in his first two seasons, the Timberwolves failed to make the playoffs. Going into the 2013-14 season, ESPN’s TrueHoop Network ranked Rubio as the 49th best player in the league (only slightly ahead of teammate Nikola Pekovic). He has struggled with injuries and is considered a terrible, “makes Rajon Rondo look like Reggie Miller”-type shooter.1
Since entering the NBA, Rubio has been dominant in two major statistical categories: not scoring and steals. Of all players averaging 30-plus minutes, Rubio’s 10 points per game is the third-fewest overall, and the worst of all guards by more than a point.2
His 2.4 steals per game, on the other hand, is the second most. It’s only .1 steals behind five-time NBA steals champion Chris Paul (and Rubio edges Paul in steals per minute and steal percentage).
What do you do when you have highly divergent indicators such as these? NBA stat geeks have been trying to mash up box score stats for decades. The most famous attempt is John Hollinger’s player efficiency rating, which ostensibly includes steals in its calculation but values them about as much as two-point baskets.3 In other words, steals have only a small effect on a player’s PER. Despite his stealing prowess, Rubio has a career PER of 15.6, ranking 82nd in the league for the period. Meanwhile, Love has a PER of 25.7 (fourth in the league) over that same time.
Hollinger weights each stat in his formula based on his informed estimation of its intrinsic value. Although this is intuitively neat, empiricists like to test these sorts of things. One way to do it is to compare how teams have performed with and without individual players, using the results to examine what kinds of player statistics most accurately predict the differences.4 In particular, we’re interested in which player stats best predict whether a team will win or lose more often without him.
By this measure, PER vastly undervalues steals. Because steals and baskets seem to be similarly valuable, and there are so many more baskets than steals in a game, it’s hard to see how steals can be all that important. But those steals hold additional value when we predict the impact of the players who get them. A lot more value. So much so that a player’s steals per game is more important to evaluating his worth than his ability to score points, even though steals are so much rarer.
To illustrate this, I created a regression using each player’s box score stats (points, rebounds, assists, blocks, steals and turnovers) to predict how much teams would suffer when someone couldn’t play.5 The results:
Yes, this pretty much means a steal is “worth” as much as nine points. To put it more precisely: A marginal steal is weighted nine times more heavily when predicting a player’s impact than a marginal point.6
For example, a player who averages 16 points and two steals per game is predicted (assuming all else is equal) to have a similar impact on his team’s success as one who averages 25 points but only one steal. If these players were on different teams and were both injured at the same time, we would expect their teams to have similar decreases in performance (on average).
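The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the 9-to-1 ratio comes from the regression described above, while the function name and the decision to ignore every other stat (the "all else equal" assumption) are my own simplifications.

```python
# Relative weights from the article: a marginal steal counts about nine
# times as much as a marginal point. Other box score stats are left out
# here, which is the "all else equal" assumption.
POINT_WEIGHT = 1.0
STEAL_WEIGHT = 9.0


def relative_impact(points_per_game, steals_per_game):
    """Return a unitless score for comparing players, not a points margin."""
    return POINT_WEIGHT * points_per_game + STEAL_WEIGHT * steals_per_game


scorer = relative_impact(points_per_game=25, steals_per_game=1)
thief = relative_impact(points_per_game=16, steals_per_game=2)
print(scorer, thief)  # both come out to 34.0
```

The score is useful only for ranking one player against another; it says nothing by itself about how many points a team would actually lose.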
Steals have considerable intrinsic value. Not only do they kill an opponent’s possession, but a team’s ensuing possession — the one that started with the steal — often leads to fast-break scoring opportunities. But though this explains how a steal can be more valuable than a two-point basket, it doesn’t come close to explaining how we get from that to nine points.
I’ve heard a lot of different theories about how steals can be so much more predictively valuable than they seem: Steals “cost” less than other stats,7 or players who get more steals might also play better defense, or maybe steals are just a product of, as pundits like to call it, high basketball IQ. These are all worth considering and may be true to various degrees, but I think there’s a subtler — yet extremely important — explanation.
Think about all that occurs in a basketball game — no matter who is playing, there will be plenty of points, rebounds and assists to go around. But some things only happen because somebody makes them happen. If you replaced a player with someone less skilled at that particular thing, it wouldn’t just go to somebody else. It wouldn’t occur at all. Steals are disproportionately those kinds of things.
Most people vastly underestimate how much a player’s box score stats are a function of that player’s role and style of play, as opposed to his tangible contribution to his team’s performance. A player averaging one more point per game than another doesn’t actually mean his team scores one more point per game as a result of his presence. He may be shooting more than he should and hurting his team’s offense. Similarly, one player getting a lot of rebounds doesn’t make his team a good rebounding team: He may be getting rebounds that his team could have gotten without him.
What we are looking for is a kind of statistical “irreplaceability.” If a player produces one more X (point, rebound, steal, etc.) for his team, and is then taken from the team (by injury, suspension, trade, etc.), how much of that stat does his team really lose? How much of it can be replaced?
I tested for this by running a series of regressions using each player’s box score stats (points, rebounds, assists, etc.) to predict how much teams would suffer without a player in each particular area. In other words, for a player who averages X points, Y rebounds, Z assists, etc., how much does his team’s scoring decrease when he’s out? How much does its rebounding decrease? The way I’ve set it up, a stat’s irreplaceability will roughly run from zero (completely replaceable) to one (completely irreplaceable).8 Let’s visualize it like so:9
So, look at the points-per-game column. Suppose a player averages one more point per game than another player. His team is likely to average only an additional .17 points with him on the floor because points are 83 percent replaceable. It would take almost six points of his scoring to add one additional point to his team’s tally.
For steals, the picture is much different. If a player averages one more steal than another player (say 2.5 steals per game instead of 1.5) his team is likely to average .96 more steals than it would without him (if all else stayed equal). That’s why, as an individual player action, steals are much more irreplaceable than points.
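To make the two cases concrete, here is a minimal Python sketch of the same arithmetic. The .17 and .96 values are the ones quoted above; the dictionary and function names are hypothetical, and the calculation treats irreplaceability as a simple multiplier, which is a simplification of the underlying regressions.

```python
# Irreplaceability values quoted in the article (0 = fully replaceable,
# 1 = fully irreplaceable).
IRREPLACEABILITY = {"points": 0.17, "steals": 0.96}


def team_gain(stat, extra_per_game):
    """Expected change in the team's total for a stat when a player
    averages `extra_per_game` more of it, all else equal."""
    return IRREPLACEABILITY[stat] * extra_per_game


def player_amount_for_one_team_unit(stat):
    """How much of a stat a player must add to lift his team's total by one."""
    return 1.0 / IRREPLACEABILITY[stat]


print(round(team_gain("points", 1), 2))                     # 0.17
print(round(team_gain("steals", 1), 2))                     # 0.96
print(round(player_amount_for_one_team_unit("points"), 2))  # 5.88
```

The last figure is where "almost six points" comes from: one divided by .17 is roughly 5.9.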
Basketball is a game of high scores and small margins. The best team ever — the 1995-96 Chicago Bulls — only won by an average of 12 points per game, and I’d be surprised if more than a handful of players have ever been worth half that on their own (maybe Michael Jordan, probably LeBron James). With steals 96 percent “irreplaceable,” and each worth a couple of points, one extra steal per game puts a good player well on his way to being an excellent one.
With this in mind, it’s worth taking another look at Rubio, the quirky sidekick to MVP candidate Love. Rubio seems deficient at the game’s central skill (putting the ball in the hoop) but is gifted at the one that matters to my model (thievery).
It’s our good fortune that Rubio and Love have missed a number of games at different times, so we can check whether there’s anything to be gleaned by comparing team performance with and without them. Here are the Timberwolves’ win percentage and average margin of victory with Rubio and Love together and apart since 2011-12:
In other words, the Timberwolves have struggled to win games when either member of the duo is out, and they’ve lost quite badly with both gone. Despite being an elite scorer and rebounder who is routinely ranked as one of the league’s top players, Love has had an observable impact only marginally better than Rubio’s.10 So far, both are putting up elite numbers. The Timberwolves have played nearly seven points per game worse without Rubio in their lineup. That’s absurdly high. So high that I’d be surprised if either player’s numbers held up in the long run. But it’s worth noting that, contrary to conventional wisdom, Rubio may be exceeding expectations.
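For readers who want to run this kind of with-or-without split on their own game logs, a rough Python sketch follows. The function name and the input format are assumptions of mine, and the margins in the example are invented purely for illustration; they are not Timberwolves results.

```python
from statistics import mean


def with_without_split(games):
    """Compare a team's average scoring margin with and without a player.

    `games` is a list of (player_played, team_margin) pairs.
    Returns (average margin with, average margin without, difference).
    """
    with_player = [margin for played, margin in games if played]
    without_player = [margin for played, margin in games if not played]
    if not with_player or not without_player:
        raise ValueError("need at least one game in each group")
    avg_with = mean(with_player)
    avg_without = mean(without_player)
    return avg_with, avg_without, avg_with - avg_without


# Invented margins for illustration only.
toy_games = [(True, 6), (True, -2), (True, 5), (False, -4), (False, -8)]
avg_with, avg_without, diff = with_without_split(toy_games)
print(avg_with, avg_without, diff)
```

A raw split like this ignores opponent strength, home court and who else was missing, so small samples can swing wildly.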
Taken alone, this comparison doesn’t answer the question of Rubio’s value, and it doesn’t prove that steals are as valuable as I think they are. But it’s powerfully consistent with that claim. More important, it’s a perfect example of how, even in a storm of complex, causally dynamic, massively intertwined data and information, sometimes odd little things that are known to be reliable and predictable are the most valuable.
Editor’s note: A table in this article has been updated to include additional data from the past week.