The MIT Sloan Sports Analytics Conference stages a research paper competition each year, and those papers usually contain results and ideas that are far more interesting than anything uttered on stage during the conference’s panels. Here are five trends that emerged in this year’s eight finalists:
- It’s all about big data sets: Camera-tracked data isn’t new to the pro sports scene; PITCHf/x data has been around for a decade. But only recently have we started to see it, along with other relatively large data sets, take over the research competition at Sloan. Four of the eight paper finalists used camera-tracked data, two more used sizable play-by-play databases, and another used a massive collection of geotagged in-game mobile-device requests from MLB stadiums. Simply put, research that doesn’t have to grapple with the demands of bigger data sets is becoming less common among Sloan paper finalists.
- The rise of machine learning: With the increased prominence of such large data sets, it was inevitable that state-of-the-art machine-learning techniques would begin to make their mark at Sloan. For instance, one of this year’s most interesting finalists used a “random forest” framework to predict the outcome of a tennis point after any shot based on the speed, trajectory and location of the ball, the context of the shot and priors for a player’s style derived from cluster analysis. (In practice, if the model works, it will be able to ferret out not only the most crucial points in a match, but also the most crucial shots.) Another paper used supervised learning to develop custom player-by-player strategies for pick-and-roll defense in the NBA, a clever way to translate statistical knowledge about a player into actionable tactics. In many ways, data sets this staggering can be coherently processed only with these kinds of advanced statistical techniques, so I wouldn’t be surprised if we see them used more in future research.
- A focus on classifying player types: As part of its model, that tennis paper developed what it called “style priors” for each player based on the types of shots he tends to play. Another paper, about complementary players in basketball, estimated the effect of an individual player’s skill set on the behavior of his teammates. The emphasis on underlying tendencies and similar player types to provide context and inform prediction isn’t unprecedented — PECOTA was doing a version of that 13 years ago — but it is now being used with far more granular data, to improve prediction in a wider variety of sports (particularly dynamic ones such as tennis and basketball).
- The Hot Hand, Part 1,000,000: Few topics have generated more research in psychology and statistics than the hot-hand fallacy. It has surfaced again with a Sloan paper finalist. The seminal work on the subject declared the hot hand nothing more than a trick of the mind, but there’s been a recent trend toward debunking the hot-hand debunkers. Here, that trend continues — using baseball data, the authors find that recent changes in player performance can be predictive and that opposing teams mostly react to them in an appropriate manner. But I’m guessing this won’t be the final word in the hot-hand wars.
- Fewer finalists from the “Big Four” and more from the business of sport: Compared to Sloan conferences past, this year’s crop of finalists featured easily the fewest papers focused on the North American “Big Four” sports of baseball, basketball, football and hockey.
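Number of finalist papers, by topic and year

| Topic | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|
| Basketball | 4 | 4 | 2 | 2 |
| Baseball | 1 | 3 | 1 | 2 |
| Football | 1 | 0 | 1 | 0 |
| Hockey | 1 | 0 | 1 | 0 |
| Soccer | 1 | 1 | 1 | 1 |
| Tennis | 0 | 0 | 0 | 1 |
| Gambling | 0 | 0 | 1 | 0 |
| Business | 0 | 0 | 1 | 2 |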
Using the Internet Archive, I tracked the breakdown of finalists by sport going back to 2013; that year, seven of the eight finalists researched a Big Four sport. This year, the number is down to four. Also of note is the emergence of finalists concerned with the business of running a sports franchise. Zero finalists focused on the subject in 2013 and 2014, but that changed last year with the inclusion of a paper about dynamic ticket pricing. Now we’re up to two finalists focused on topics like brand engagement and sponsorship revenue.
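For readers who want to check the arithmetic, here is a minimal Python sketch that re-tallies the finalist table. The counts are transcribed from the table above; the names `FINALISTS`, `BIG_FOUR` and `tally` are illustrative and not part of any original analysis.

```python
# Finalist counts transcribed from the table above, in year order 2013-2016.
YEARS = [2013, 2014, 2015, 2016]
FINALISTS = {
    "Basketball": [4, 4, 2, 2],
    "Baseball": [1, 3, 1, 2],
    "Football": [1, 0, 1, 0],
    "Hockey": [1, 0, 1, 0],
    "Soccer": [1, 1, 1, 1],
    "Tennis": [0, 0, 0, 1],
    "Gambling": [0, 0, 1, 0],
    "Business": [0, 0, 1, 2],
}
BIG_FOUR = ("Basketball", "Baseball", "Football", "Hockey")


def tally(topics):
    """Sum finalist counts per year across the given topics."""
    return {
        year: sum(FINALISTS[topic][i] for topic in topics)
        for i, year in enumerate(YEARS)
    }


big_four_by_year = tally(BIG_FOUR)
total_by_year = tally(FINALISTS)  # iterating a dict yields its topic names
business_by_year = tally(["Business"])

for year in YEARS:
    print(
        f"{year}: {big_four_by_year[year]} of {total_by_year[year]} "
        f"finalists on Big Four sports, {business_by_year[year]} on business"
    )
```

Each year sums to eight finalists, and the Big Four count falls from seven in 2013 to four in 2016, in line with the figures quoted in the text.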