We’re Predicting The Career Of Every NBA Player. Here’s How.

Congratulations! You, along with the other 3.2 billion people on the Internet, are now the proud owner of CARMELO, an algorithm that forecasts the future performance of NBA basketball players.

The basic premise of CARMELO is simple. For each current NBA player, CARMELO identifies similar players throughout modern NBA history¹ and uses their careers to forecast the current player’s future.

According to CARMELO, for example, Washington Wizards point guard John Wall, through this point in his career, is similar to former NBA players Isiah Thomas, Jason Kidd, Steve Francis and Kenny Anderson. Kidd continued to improve as a player through his mid-to-late 20s, while Thomas had a long peak and led the Detroit Pistons to two championships. So both are favorable comps for Wall. Francis and Anderson are less favorable. So while Wall has the potential to develop into a superstar, he’s not a sure thing.

CARMELO originated out of work I did for a 2014 article about the New York Knicks’ Carmelo Anthony. Hence the name, CARMELO, which my colleague Neil Paine (our senior sports writer) and I later developed into a silly backronym (Career-Arc Regression Model Estimator with Local Optimization). But the real inspiration for CARMELO is PECOTA, a system I built for Baseball Prospectus in 2003 to forecast the careers of baseball players. I’ve been thinking about developing a “PECOTA for basketball” for more than a decade and, thanks to help from Neil, Allison McCann (one of our visual journalists) and the rest of my FiveThirtyEight colleagues, we’ve finally gotten around to it.

INTERACTIVE: Check out CARMELO projections for every player in the NBA.

CARMELO is considerably simpler than PECOTA, however. It has fewer bells and whistles. It projects each player’s playing time and overall value on offense and defense, but not his component statistics.² The simplicity is partly by design. We think CARMELO gets the basics right and will be a fun and revealing way to explore the NBA. But we’d like to see how it does before complicating the model further.

Let’s take a quick tour of how CARMELO makes its projections, using Wall as our guinea pig. One warning: The descriptions in the next few sections explain how CARMELO works for veteran players who have completed at least one NBA season; projections for rookies are similar in spirit, but there are a few differences that I’ll explain later on.

Step 1. Define the player’s skills

Before CARMELO can identify comparable players, it needs to define each player’s skills and attributes statistically.

It starts with some basic biographical information for each player. The most important attribute of all, in terms of determining a player’s future career trajectory, is his age. NBA players, like MLB players, improve on average through about age 27 and then begin to decline. The age listed on a player’s CARMELO card reflects his age as of Feb. 1, 2016, the rough midpoint of the upcoming NBA season.

Next, we list a player’s vitals: his height, weight and draft position. It’s almost always better for a player to be taller and bigger, other things held equal. Players chosen with an earlier draft pick tend to have a higher ceiling, meanwhile, even once we control for other variables.³

Below a player’s vitals, you’ll see a number of statistics listed. Note that these categories are not projected statistics; instead, they reflect the weighted average of a player’s performance over his past three NBA seasons, with the most recent season weighted more heavily.⁴

We start with a few statistics related to his scoring and shooting ability. (For more precise definitions of these, see Basketball-Reference.com’s glossary page.) Usage rate reflects what percentage of a team’s possessions were “used” by a player in the form of a shot, turnover or trip to the free-throw line. Because there are five NBA players in a team’s lineup at one time, the average usage rate is 20 percent.

True shooting percentage is an “enhanced” version of shooting percentage that reflects the value of 3-pointers and made free throws, in addition to 2-point shots. Players like LeBron James and James Harden, who rank highly in both usage rate and true shooting percentage, are the best scorers in the game, providing both volume and efficiency. We also list a player’s free-throw percentage. Although less important to his overall value, it provides a purer gauge of shooting ability than true shooting percentage, which reflects both shooting ability and shot selection. In fact, it’s best to look at these categories in tandem with one another. The Clippers’ DeAndre Jordan has one of the best true shooting percentages in the league despite being an incompetent free-throw shooter because most of his shots are high-percentage dunks and layups near the rim.

The next two categories, 3-point frequency and free-throw frequency, reflect what shots a player is taking rather than how often he’s making them. (Three-point frequency is the percentage of a player’s field goal attempts that are 3-pointers; free-throw frequency is his ratio of free-throw attempts to field-goal attempts.) It’s usually desirable to rank highly in both departments. Free throws — unless you’re DeAndre Jordan — are generally the most efficient shots in the NBA and are a reward for a player’s ability to work effectively in the paint. Three-pointers, meanwhile, remain more efficient than 2-pointers, on average. Furthermore, ranking highly in one or (especially) both categories can reflect a player’s ability to stretch the court and provide for better floor spacing, which may have a favorable effect on his teammates.

Next are two familiar attributes related to a player’s ball-handling: his assist rate (what percentage of his teammates’ field goals are assisted by him, while he’s on the court) and turnover rate (the share of team possessions that result in a turnover by the player). For CARMELO purposes, a high turnover rate is considered bad, just as it is in the NBA. Wall’s high turnover rate is one of the few major negatives with his game, for example.

Finally, there is a set of categories related to a player’s rebounding and defense. His rebound rate is the share of rebounds he grabs while on the floor (10 percent is average). His block rate is the share of opponents’ 2-point field-goal attempts that result in his blocking a shot, and his steal rate is the share of opponents’ possessions that end in his stealing the ball. Last is a player’s defensive plus-minus rating. CARMELO’s plus-minus ratings reflect a 50-50 blend of Box Plus/Minus (BPM) and Real Plus-Minus (RPM). I’ll have more to say about plus-minus ratings in the “Fine Print” section below; the important thing to know for now is that a rating of zero reflects an average defender, rather than a poor one.

Step 2. Identify comparable players

These statistics can sometimes tell a reasonably complete story about each player. In Wall’s case, they describe a high-volume, medium-efficiency scorer who distributes the ball really well. He’s also a good athlete who plays good defense, especially for a point guard. On the downside, Wall commits a lot of turnovers. And he neither shoots all that many threes nor draws all that many fouls, which can make his game flat at times.

These categories, along with a few others related to durability and playing time, form the basis for selecting CARMELO comparables. The basic idea is this: Because Wall is 25 years old this season, CARMELO runs a profile for past NBA players⁵ heading into their age-25 season.⁶ Then it identifies the most similar ones. Historical players start with a perfect similarity score of 100, and points are subtracted for every difference. Because Wall has a high assist rate, for example, a player with a low assist rate will lose a lot of points and is unlikely to be among Wall’s top comparables. CARMELO applies this process for 19 statistical categories, some of which are weighted more heavily than others.⁷
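The age-matching step can be sketched in a few lines. Per the footnotes, ages are taken to the decimal point and comparables must be within half a year of the current player. The record layout, names and ages below are hypothetical.

```python
def same_age_seasons(target_age, seasons, window=0.5):
    """Keep historical player-seasons within `window` years of the target age.

    Ages are decimal ages as of Feb. 1 of the season in question.
    """
    eps = 1e-9  # guards against floating-point noise at the window's edges
    return [s for s in seasons if abs(s["age"] - target_age) <= window + eps]

# Hypothetical player-seasons; the names and ages are placeholders.
pool = [
    {"player": "Guard A", "age": 25.1},
    {"player": "Guard B", "age": 25.8},
    {"player": "Guard C", "age": 26.3},
]
print([s["player"] for s in same_age_seasons(25.4, pool)])  # ['Guard A', 'Guard B']
```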

The process sounds complicated, but the comparisons are sometimes intuitively satisfying. As a Pistons fan growing up, for example, I can see the similarities between Wall and his No. 1 historical comp, Isiah Thomas. Compare their stats on Basketball-Reference.com and you’ll see where CARMELO is coming from: They are eerily alike in some respects. Even so, the comparison is not perfect. Thomas drew more contact around the basket, resulting in more free-throw attempts. But he was undersized, whereas Wall isn’t, exactly.

Like snowflakes, in other words, no two NBA players are exactly alike. While a theoretically perfect similarity score is 100, Thomas registers at a 57 instead. By CARMELO standards, that’s high: Many NBA players don’t have any comparables with a similarity score above 50. And similarity scores above 60 are even rarer.

This is partly because of the way CARMELO defines similarity scores. A score of 0 is average, not bad. Dominique Wilkins has a similarity score of about 0 relative to Wall, for instance; they’re not much alike, but they aren’t totally off one another’s radar. Many players will have negative similarity scores instead; Manute Bol’s similarity score to Wall is -113.⁸ Here’s a rough guide for interpreting similarity scores:

SIMILARITY SCORE	DESCRIPTION
100	Perfect score; identical
60-99	Separated at birth
50-59	Extremely similar
40-49	Highly similar
30-39	Mostly similar
20-29	Partly similar
1-19	Somewhat similar
0	As similar as dissimilar
Below 0	More dissimilar than similar

You can find a more technical description of CARMELO similarity scores, which are calculated using a version of a nearest neighbor algorithm, in the footnotes.⁹
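For readers who prefer code to footnotes, here is one reading of that calculation: take the difference in each category in standard deviations, square it, weight it by the category's share of the total weight, take the square root of the sum, and rescale so that a deviance of zero scores 100. The function name and the three-category example are assumptions for illustration. The real system uses 19 categories.

```python
import math

def similarity_score(z_diffs, weights):
    """Similarity between two players on CARMELO's 100-point scale.

    z_diffs: per-category differences, in standard deviations.
    weights: per-category weights, in the same order; converted to
             proportions of the total weight here.
    """
    total_weight = sum(weights)
    deviance = math.sqrt(
        sum((w / total_weight) * d * d for d, w in zip(z_diffs, weights))
    )
    return 100 * (1.25 - deviance) / 1.25

# Three hypothetical categories with weights 5.0, 4.0 and 3.5.
print(similarity_score([0.0, 0.0, 0.0], [5.0, 4.0, 3.5]))  # identical player: 100.0
print(similarity_score([0.5, 1.0, 0.2], [5.0, 4.0, 3.5]))
```

Under this reading, a deviance of 1.25 yields a score of exactly 0, and anything larger goes negative, which is consistent with the scale described above.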

Step 3. Make a projection

Each player’s top 10 comparables are listed in his CARMELO card. Each comp has a mini-graph (sparkline) depicting how that player’s career progressed over the next seven seasons, where applicable,¹⁰ based on wins above replacement (WAR).

So a player’s CARMELO projection is formed just by averaging the career tracks for his top 10 comparables? That’s pointing in the right direction … but doesn’t quite tell the whole story.

For one thing, though only the top 10 comparables are listed on a player’s CARMELO card, the system uses all historical players with a positive similarity score to make its forecasts.¹¹ Usually this means that dozens and oftentimes hundreds of players are used in generating a forecast; 179 historical players have a positive similarity score to Wall, for instance. Each player’s contribution to the forecast is weighted by his similarity score: A player with a similarity score of 50 will have twice as much influence on the forecast as one with a score of 25, for example.
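A minimal sketch of that weighting, assuming each comparable is reduced to a similarity score and a single outcome such as next-season WAR. It ignores the baseline adjustment and the relaxed rule for players with very few comparables, and the numbers are made up.

```python
def weighted_forecast(comps):
    """Average comparables' outcomes, weighting each by its similarity score.

    comps: iterable of (similarity_score, outcome) pairs.
    Only comparables with a positive score contribute.
    """
    used = [(s, y) for s, y in comps if s > 0]
    total = sum(s for s, _ in used)
    if total == 0:
        raise ValueError("no comparables with a positive similarity score")
    return sum(s * y for s, y in used) / total

# Hypothetical (similarity, next-season WAR) pairs.
comps = [(50, 10.0), (25, 4.0), (-20, 1.0)]
print(weighted_forecast(comps))  # (50*10 + 25*4) / 75 = 8.0
```

The comparable scoring 50 pulls the forecast twice as hard as the one scoring 25, and the negative-score player is dropped entirely.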

The second issue is more technical. Take a look at Stephen Curry’s CARMELO card, for instance.

Although Curry has a few extremely flattering comparables — Michael Jordan! — most of the others are not as good as him. Terrell Brandon, Terry Porter and Chris Mullin, for example, are listed among Curry’s top 10 comps. They were good players, perhaps slightly underrated players, but none achieved the heights of excellence that Curry has already realized. They were a poor man’s version of Steph Curry — in mostly the same style as Curry, but inferior across the board. CARMELO is aware of this problem and has a solution to it called a baseline projection, which I describe in the footnotes.¹²
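The baseline idea can be sketched as follows: measure how far each comparable finished above or below his own baseline, then shift the current player's baseline by the same amount. Averaging those gaps by similarity score is an assumption that combines this step with the weighting described earlier. The similarity value of 40 is hypothetical, while the win totals come from the Porter example in the footnotes.

```python
def baseline_adjusted_forecast(player_baseline, comps):
    """Shift a player's baseline by how his comparables fared against theirs.

    comps: (similarity, comp_baseline, comp_actual) triples.
    """
    used = [(s, actual - base) for s, base, actual in comps if s > 0]
    total = sum(s for s, _ in used)
    if total == 0:
        return player_baseline  # nothing to learn from; keep the baseline
    adjustment = sum(s * delta for s, delta in used) / total
    return player_baseline + adjustment

# Porter: baseline of 10 wins, actual 12. Curry: baseline of 15 wins.
print(baseline_adjusted_forecast(15.0, [(40, 10.0, 12.0)]))  # 17.0
```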

Think probabilistically

A more important theme is that CARMELO’s forecasts are probabilistic. Wall is projected to finish with 8.7 WAR next season, for example. But there’s uncertainty around that estimate. Each player’s chart shows a range spanning the middle 80 percent of likely outcomes for the player.

These forecast ranges are often quite wide. Basketball is possibly the most predictable of the four major U.S. professional sports, but it still contains a lot of uncertainty. Wall’s range spans from 4.7 wins above replacement (not much better than a league-average player) to 12.9 wins (a possible All-NBA candidate), for example. And to reiterate, this range covers only 80 percent of his outcomes. If CARMELO is well-calibrated, then Wall has a 10 percent chance of exceeding the high end of his range (in which case he could be an MVP candidate) and a 10 percent chance of falling below the low end of his range (in which case, he’ll extend D.C.’s sports misery). Some players have wider ranges than others, especially young players such as Andrew Wiggins or players coming off injury such as Paul George.
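To make the "middle 80 percent" concrete, here is a simplified sketch that takes a set of possible outcomes and reports the 10th and 90th percentiles. It treats every outcome equally, which is an assumption. The article does not say how CARMELO builds its ranges, and the outcomes below are invented.

```python
import statistics

def middle_80(outcomes):
    """Return the 10th and 90th percentiles of a list of outcomes."""
    deciles = statistics.quantiles(outcomes, n=10, method="inclusive")
    return deciles[0], deciles[-1]

# Hypothetical next-season WAR totals for a set of comparables.
outcomes = [3.1, 4.7, 6.0, 7.2, 8.0, 8.7, 9.5, 10.4, 11.8, 12.9, 14.0]
low, high = middle_80(outcomes)
print(low, high)
```

One outcome in ten should land below the low end and one in ten above the high end, if the forecasts are well-calibrated.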

The fine print

So far, we’ve mostly been discussing a player’s wins above replacement projection. But WAR is the endpoint in a CARMELO forecast and not the starting point. If you scroll down to the bottom of each player’s CARMELO card, you’ll see a section called “The Fine Print,” which provides further insight into how the WAR sausage is made.

In particular, WAR reflects a combination of a player’s projected playing time and his projected productivity while on the court.¹³ Productivity is measured by the statistic plus-minus, which requires some explaining.

Mathematically, plus-minus is not that hard to define: It reflects how many points a player contributes to his team’s scoring margin per 100 possessions, relative to an average player.¹⁴ Wall, for instance, had a plus-minus rating of +3.9 for the Wizards last year. That means with Wall on the court, along with four average players, the Wizards were outscoring their opponents by 3.9 points per 100 possessions. Plus-minus can be broken down into offensive and defensive components. Wall had an offensive plus-minus of +2.5 last season, which is how many points he added to the Wizards’ scoring per 100 possessions. And he had a defensive plus-minus of +1.4, which is how many points he subtracted from his opponents’ scoring with his defense.¹⁵
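Turning a rating like this into WAR means combining it with playing time. The sketch below implements the formula exactly as it is printed in the footnotes. The function name and the minutes total are hypothetical, so the result should not be read as Wall's actual projection.

```python
def wins_above_replacement(plus_minus, minutes):
    """WAR from a plus-minus rating and minutes played, per the footnote formula.

    WAR = (PM * MIN * 2.18) / (48 * 82)
    """
    return (plus_minus * minutes * 2.18) / (48 * 82)

# Wall's +3.9 rating from last season with a hypothetical 2,800 minutes.
print(wins_above_replacement(3.9, 2800))
```

The denominator, 48 times 82, is the number of minutes one player would log by never leaving the floor in a full season of regulation games.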

However, there are many versions of plus-minus, ranging from simple to complex. The version we use for CARMELO reflects a 50-50 blend of Daniel Myers’s Box Plus/Minus (BPM), a relatively simple statistic that can be calculated by using conventional “box score” statistics, and Jeremias Engelmann’s Real Plus-Minus (RPM), a more complex statistic derived from play-by-play data.¹⁶
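In code, the blend is as simple as it sounds. The only wrinkle, noted in the footnotes, is that RPM is unavailable before the 2000-01 season, in which case BPM is used alone. The function name and the use of `None` for a missing RPM are illustrative choices.

```python
from typing import Optional

def blended_plus_minus(bpm: float, rpm: Optional[float]) -> float:
    """50-50 blend of BPM and RPM; BPM alone when RPM is unavailable."""
    if rpm is None:
        return bpm
    return 0.5 * bpm + 0.5 * rpm

print(blended_plus_minus(2.0, 4.0))   # 3.0
print(blended_plus_minus(2.0, None))  # 2.0 (for example, a pre-2000-01 season)
```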

Neil Paine, our senior sports writer, and I had a lot of debates (which echoed long-running arguments within the broader basketball stat-geek community) about which advanced statistics to use for CARMELO before deciding on this BPM/RPM blend. What settled the debate was that the BPM/RPM blend did better than alternatives like PER and Win Shares in a variety of out-of-sample tests.

However, no all-in-one advanced stat is magic, and this is a source of systematic uncertainty in any NBA projection system. If it seems as though CARMELO “loves” or “hates” a certain player, it may be because of how BPM and RPM rate the player. For instance, both BPM and RPM rate the Raptors’ Jonas Valanciunas poorly compared with statistics like PER. So if Valanciunas’s forecast seems pessimistic to you, it’s not because CARMELO expects his performance to decline (in fact, CARMELO has him getting a little better). It’s because BPM and RPM didn’t evaluate Valanciunas as being all that good to begin with.

CARMELO also projects each player’s minutes in upcoming seasons. These forecasts may seem pessimistic. Among the 29 players who played at least 2,500 minutes last year, for example, CARMELO forecasts 26 to play fewer minutes this year. But this reflects the reality of NBA history. Even players who had been entirely healthy up to a certain point in their careers, such as the Pacers’ Paul George, have sometimes suffered catastrophic injuries. Or they ran into some other life circumstance, ranging from illness to suspension to an unexpected retirement. In fact, CARMELO’s playing time projections are designed to be slightly optimistic, on average.¹⁷

Which players are included in CARMELO?

Whew. We’ve gotten through WAR wars. Now for a few odds and ends. For instance, are you curious about a certain player but don’t see a CARMELO card for him? Or are you wondering why you are seeing a CARMELO card for a player who’s retired or hurt?

Our interactive includes every player who played at least 100 NBA minutes in the 2014-15 season, or 250 minutes in 2013-14. This includes players such as Shane Battier who we know are retired; we figure there’s no harm in showing their projections in case they decide to return to action this season.

We’re also showing projections for players who we know have suffered a serious, season-threatening injury, such as the Hornets’ Michael Kidd-Gilchrist. The reason for this is transparency; we think it’s cheating to omit a player based on news we’ve subsequently learned about him when that knowledge wasn’t available to CARMELO. However, we do account for injuries when formulating team depth charts, a process I’ll describe in a moment.

Rookie projections

We’ve also run projections for about 80 rookies with college experience; here’s D’Angelo Russell’s sweet-looking projection, for instance. These projections are derived from a database provided to us by ESPN Stats & Info, which includes strength-of-schedule-adjusted college statistics for prospects in the 2001 NBA draft class onward who subsequently played at least one NBA game.

Technically, these rookie projections are produced by a different program from CARMELO, one we sometimes call FABMELO after the Syracuse star (and seeming NBA flop) Fab Melo. However, the principles behind rookie and veteran projections are the same, and the differences boil down to a few relatively minor details:

Rookie projections omit a couple of statistics¹⁸ that were not included in the Stats & Info database. They also use Effective Field Goal Percentage (eFG%) in place of true shooting percentage.
The weights assigned to identify comparable players are somewhat different. Draft position is weighted much more heavily in college projections, for example.
Whereas veteran projections treat age as an absolute — a 31-year-old will be compared against only other 31-year-olds — rookie projections are slightly more flexible. A 21-year-old draft pick might be compared against a 20-year-old draft pick if they’re otherwise extremely similar, for instance.
Whereas for veterans, CARMELO formulates a baseline projection based on a player’s age and playing time and plus-minus rating in his past three NBA seasons, rookie projections use a player’s age, draft position and height.¹⁹

The tl;dr version: Rookie projections rely heavily on a player’s age and draft position. A No. 1 overall pick is almost always going to get a reasonably favorable projection, while a late second-round pick almost always won’t. Still, now and then the system will find a player it really likes (such as Russell) or dislikes (such as Frank Kaminsky) relative to his draft position; we’ll see in a few years how those forecasts turn out.

Finally, there are a few oddball cases. We’ve run rookie projections for a couple of players such as Josh Huestis who were chosen in the 2014 NBA draft but received little or no NBA playing time last year. This includes the Lakers’ Julius Randle, who played in just one game last year before getting hurt. CARMELO is fairly punitive toward players with a “gap year” between their draft year and their first prolonged NBA action, however.

What about draft picks from Europe (or other continents) who didn’t play U.S. college ball? They don’t get full-fledged CARMELO projections. (Sorry, Kristaps Porzingis.) However, we do run simple, baseline projections for them based on their age, height and draft position, so you may see them included in team depth charts.

Team projections and depth charts

In addition to running player forecasts, we’re also releasing team-by-team projections that include projected win-loss totals for each club.²⁰ Here are the Oklahoma City Thunder, for example.

Unlike the player forecasts, these team projections involve some human intervention. In consultation with ESPN’s NBA beat writers, we’ve developed a depth chart for each team, which accounts for current injury information along with other news about a team’s roster construction. However, we aren’t taking too many liberties. If the playing time we assign to a player significantly exceeds the playing time recommended by CARMELO, the system responds by lowering his plus-minus rating; Manu Ginobili would not be very effective if asked to play 36 minutes a game, for example. This has the effect of rewarding deep teams (like Ginobili’s San Antonio Spurs) and punishing those that are stretching to fill out the roster.
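The footnotes say the win-loss totals come from "a version of the Pythagorean formula" with an exponent of 11.5. The sketch below uses the textbook form of that formula, which may differ in detail from the version used here. The function name and the points-per-game inputs are hypothetical.

```python
def pythagorean_wins(points_for, points_against, exponent=11.5, games=82):
    """Expected wins from points scored and allowed (textbook Pythagorean form)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return games * pf / (pf + pa)

# A hypothetical team that scores 105 points per game and allows 100.
print(pythagorean_wins(105.0, 100.0))
```

A team that scores exactly as much as it allows comes out at 41 wins. A lower exponent pulls every team's total toward that midpoint, which is why it produces more conservative win-loss totals.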

So … should I bet on these things?

Hmm. Umm. Probably not? FiveThirtyEight’s relatively simple, RPM-based projections performed quite well last year, edging out Vegas along with most other projection systems. In theory, based on our back-testing, CARMELO should be slightly more accurate still, improving on the simple RPM projections by about 10 percent. But back-testing is not the same thing as seeing how predictions perform in the real world against truly unknown data. Rookie forecasting models can be buggy, moreover. I’d probably hold off until the system has at least a year or two of experience under its belt.

Does Carmelo Anthony get a good CARMELO projection?

No, not really. In fact, CARMELO sort of hates the Knicks; it doesn’t play favorites.

Footnotes

¹ More specifically, since the NBA-ABA merger in 1976.
² It will project Wall’s wins above replacement but not his free-throw percentage, for instance.
³ Although this effect diminishes after a player’s first few NBA seasons. Don’t expect Andrea Bargnani to magically become good.
⁴ Specifically, the weights are set by default to 60 percent for the most recent season, 30 percent for the second-most-recent season, and 10 percent for the third-most-recent season. However, these initial weights are multiplied by the square root of a player’s minutes played in each season, so a season in which a player accumulated more playing time will be weighted more heavily.
⁵ Since the NBA-ABA merger in 1976.
⁶ More precisely, CARMELO calculates ages to the decimal point and looks for players who are no more than half a year older or younger. Wall will be 25.4 years old on Feb. 1, for example, so it looks for comparable players who were between 24.9 and 25.9 on Feb. 1 of past seasons.

⁷ The categories, and their weights, are as follows:

STATISTIC	WEIGHT	NOTES
Position	3.0	Positions are translated to a 1 (point guard) to 5 (center) scale.
Height	3.5
Weight	1.0
Draft position	2.5	Taken as a natural logarithm. Undrafted players are treated as having been chosen 30 picks later than the last pick in their season’s draft.
Career NBA minutes played	1.5
Minutes per game	3.5
Minutes played	6.0	For historical players, minutes for seasons shortened by labor disputes are prorated to 82 games.
True shooting percentage	5.0
Usage percentage	5.0
Free-throw percentage	2.5
Free-throw frequency	1.5
Three-point frequency	2.5	In determining comparability, the league-average 3-point frequency is subtracted from the player’s frequency.
Assist percentage	4.0
Turnover percentage	1.5
Rebound percentage	4.0
Block percentage	2.0
Steal percentage	2.5
Defensive plus-minus	2.0	Calculated as a 50-50 split between BPM and RPM.
Overall plus-minus	5.0	Calculated as a 50-50 split between BPM and RPM.

How are these weights derived? To be honest, they’re a little arbitrary and, in the style of the old Bill James similarity scores, intended to make “good basketball sense” rather than being optimized to the nth degree in a way that might produce an overfit model. But approximately speaking, 20 percent of the weight is assigned to each of five major categories:

Vitals and physical attributes;
Durability and playing time;
Shooting and scoring;
Other offensive tendencies and attributes;
Rebounding and defense.

⁸ While 100 is the upper bound on a similarity score, there is technically no lower bound. A player could have a similarity score of negative infinity. In practice, scores lower than about -300 are uncommon.
⁹ For each statistical category, CARMELO calculates the difference in standard deviations between the current player and all historical players. It then squares this number, multiplies it by the proportion of the weight assigned to each category and takes the square root of the sum of squares. It calls this number the player’s deviance. A player’s similarity score is calculated as 100*((1.25 - deviance)/1.25). The lowest possible deviance (for an exactly identical player) is zero; therefore the highest possible similarity score is 100.

CARMELO evaluates each statistic one season at a time, such that if a player rapidly improves (as Klay Thompson did last year) or declines (as Kevin Love did), it will ideally find comparables who underwent the same pattern. In practice, however, this doesn’t make all that much difference given all the other considerations CARMELO is trying to keep in balance.
¹⁰ What if a comparable player hasn’t yet had a chance to play seven more seasons? Russell Westbrook’s 2013-14 season is listed as a comparable for Wall in 2015-16, for example. Likewise, Westbrook’s 2014-15 season is used to forecast Wall’s 2016-17. The problem is that CARMELO then runs out of Westbrook seasons; Westbrook hasn’t yet played his 2015-16 season. Because CARMELO has two years of useful data on Westbrook, its solution is to include Westbrook when forecasting Wall’s performance for the next two years but omit him from its forecast for years three through seven.
¹¹ An exception is if CARMELO identifies very few similar players, possibly for a very old player or an outlier such as Hassan Whiteside, in which case the system is designed to relax its standards and give players with mildly negative similarity scores some weight in calculating a player’s forecast.
¹² A baseline projection is an extremely simple, Marcel the Monkey-type projection based only on a player’s age, playing time and plus-minus score over the past three seasons. CARMELO calculates a baseline projection for each player and then evaluates how he performed relative to that baseline. If Terry Porter exceeded his baseline projection, for instance, it will take that as a sign that Curry will also exceed his baseline projection. To simplify slightly: If the baseline projection had Porter as a 10-win player and he turned out to be a 12-win player, CARMELO will take that to imply that Curry will also exceed his baseline forecast by two wins. So if Curry had a baseline projection as a 15-win player, the precedent established by Porter would project Curry as a 17-win player instead.
¹³ To be more specific, where PM is a player’s plus-minus rating and MIN is his minutes played, WAR is calculated as follows:

WAR = (PM*MIN*(2.18))/(48*82)
¹⁴ Since each team has about 100 possessions in a typical NBA game, this is roughly equivalent to his value per 48 minutes.
¹⁵ In case this isn’t clear, having a positive defensive plus-minus score is favorable for a player.
¹⁶ To make matters even more confusing, there are a number of versions of RPM. In some versions, data from previous seasons is used to help inform the current season’s RPM, so how a player performed in 2013-14 would affect his RPM for 2014-15, for example. CARMELO uses a version that includes data from a single season only. Engelmann generously provided this data to FiveThirtyEight for the 2000-01 through 2014-15 NBA seasons. For seasons before 2000-01, no RPM is available and CARMELO uses BPM only.
¹⁷ This is because CARMELO excludes comparables who received zero playing time in the next season. This effect is most pronounced for older players. Dirk Nowitzki’s forecast reflects only comparables who continued to play in their age-37 season, for example, rather than those who retired beforehand.
¹⁸ For instance, free-throw percentage and defensive plus-minus.
¹⁹ In the baseline projection, it’s not inherently helpful for a player to be taller or shorter. However, the projection will assign more of a player’s value to defense if he’s tall, and more of his value to offense if he’s short.
²⁰ Win-loss totals are based on a version of the Pythagorean formula. They use a Pythagorean exponent of 11.5, which produces relatively conservative win-loss totals. However, in our backtesting, this exponent produces the most accurate team forecasts when dealing with RPM- and BPM-based projections.