Kickers Are Forever

In football, there are constant power struggles, both on and off the field: players battling players, offenses battling defenses, the passing game battling the running game, coaches battling coaches, and new ways of thinking battling old ways of thinking. And then there are kickers. Battling no one but themselves and the goalposts, they come on the field in moments most mundane and most decisive. They take all the blame when they fail, and little of the credit when they succeed. Year in and year out, just a little bit at a time, they get better. And better. And better. Until the game is completely different, and no one even noticed that kickers were one of the main reasons why.

If you’ve been reading my NFL column Skeptical Football this season, you may have noticed that I write a lot about kickers. This interest has been building for a few years as I’ve watched field goals drained from long range at an ever-increasing rate, culminating in 2013, when NFL kickers made more than 67 percent of the kicks they took from 50-plus yards, giving them a record 96 such makes. There has been a lot of speculation about how kickers suddenly became so good at the long kick, ranging from performance-enhancing drugs (there have been a few possible cases) to the kickers’ special “k-balls” to more kick-friendly stadiums.

So prior to the 2014 season, I set out to try to see how recently this improvement had taken place, whether it had been gradual or sudden, and whether it was specific to very long kicks or reflected improvement in kicking accuracy as a whole.

What I found fundamentally changed my understanding of the game of football.¹

The complete(ish) history of NFL kicking

Pro Football Reference has kicking data broken down by distance (0-19 yards, 20-29, 30-39, 40-49 and 50+ yards) back to 1961. With this, we can see how field goal percentage has changed through the years for each range of distances:

Regardless of distance, kicking has been on a steady upward climb. If we look back even further, we can see indicators that kicking has been on a similar trajectory for the entire history of the league.

The oldest data that Pro Football Reference has available is from 1932, when the eight teams in the NFL made just six field goals (it’s unknown how many they attempted). That year, kickers missed 37 of 113 extra-point attempts, for a conversion rate of 67.3 percent. The following year, the league moved the goal posts up to the front of the end zone — which led to a whopping 36 made field goals, and a skyrocketing extra-point conversion rate of 79.3 percent. With the uprights at the front of the end zone, kickers missed only 30 of 145 extra points.

For comparison, those 30 missed extra points in 1933 (all with the goalposts at the front of the end zone) outnumber the 28 extra points the entire league missed from 2011 to 2014 (all kicked with the posts 10 yards farther back), on 4,939 attempts.

In 1938-39, the first seasons for which we know how many field goals were attempted, NFL kickers made 93 of 235 field-goal tries (39.6 percent) to go with 347 of 422 extra points (82.2 percent). In the ’40s, teams made 40.0 percent of their field goal tries (we don’t know what distances they attempted) and 91.3 percent of their XPs. In the ’50s, those numbers rose to 48.2 percent of all field goals and 94.8 percent of XPs. The ’60s must have seemed like a golden era: Kickers made 56 percent of all field goals (breaking the 50 percent barrier for the first time) and 96.8 percent of their extra points.
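The conversion rates quoted above follow directly from the made/attempted counts. A quick check, using only figures given in this section (missed kicks turned into makes by subtraction); the helper name is just for illustration:

```python
def conversion_rate(made, attempted):
    """Share of attempts that were successful, as a percentage."""
    return 100.0 * made / attempted

# (made, attempted) pairs built from the counts quoted in the text.
kicks = {
    "1932 extra points": (113 - 37, 113),
    "1933 extra points": (145 - 30, 145),
    "1938-39 field goals": (93, 235),
    "1938-39 extra points": (347, 422),
    "2011-14 extra points": (4939 - 28, 4939),
}

for label, (made, attempted) in kicks.items():
    print(f"{label}: {conversion_rate(made, attempted):.1f}%")
```

This reproduces the 67.3, 79.3, 39.6 and 82.2 percent figures, and puts extra points from 2011 to 2014 at about 99.4 percent.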

For comparison, since 2010, NFL kickers have made 61.9 percent of their field goal attempts — from more than 50 yards.

In the 1960s, we start to get data on field goal attempts broken down by distance, allowing for the more complete picture above. In 1972, the NFL narrowed the hash marks from 40 feet apart to 18.5 feet apart, which improved field goal percentages overall by reducing the number of attempts taken from awkward angles. Then, in 1974, the league moved the goal posts to the back of the end zone, but because kick distances are recorded relative to the posts, the main effect of this move was a small (and temporary) decline in the extra-point conversion rate (which you can see in the top line of the chart above). Finally, after 1993, we have data on each kick’s exact distance, plus field and stadium type.²
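A rough geometric sketch of why narrower hash marks help. It assumes the ball is spotted on a hash mark, so the sideways offset from the center of the uprights is half the hash separation (20 feet with the 40-foot spacing, 9.25 feet with the 18.5-foot spacing), and it treats the listed kick distance as straight downfield depth. The function name is hypothetical:

```python
import math

FEET_PER_YARD = 3

def kick_angle_degrees(distance_yards, hash_separation_feet):
    """Angle between a kick from a hash mark and a straight-on kick at the center of the uprights."""
    sideways = hash_separation_feet / 2          # hash mark's offset from the center of the field
    downfield = distance_yards * FEET_PER_YARD   # listed kick distance, treated as straight depth
    return math.degrees(math.atan2(sideways, downfield))

for label, separation in (("pre-1972", 40.0), ("1972 onward", 18.5)):
    print(label, round(kick_angle_degrees(40, separation), 1))
```

For a 40-yard kick from the hash, the angle roughly halves, from about 9.5 degrees to about 4.4 degrees.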

So let’s combine everything we know: Extra-point attempts and distances prior to 1961, kicks by category from 1961 to 1993, the kicks’ exact distance after 1993, and the changing placement of goal posts and hash marks. Using this data, we can model the likely success of any kick.

With those factors held constant, here’s a look at how good NFL kickers have been relative to their set of kicks in any given year³:

When I showed this chart to a friend of mine who’s a philosophy Ph.D.,⁴ he said: “It’s like the Hacker Gods got lazy and just set a constant Kicker Improvement parameter throughout the universe.” The great thing about this is that since the improvement in kicking has been almost perfectly linear, we can treat “year” as just another continuous variable, allowing us to generalize the model to any kick in any situation at any point in NFL history.
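The article doesn’t publish its model code, so this is only a minimal sketch of the idea under stated assumptions: a probit model, P(make) = Φ(β₀ + β₁·distance + β₂·year), fitted by Fisher scoring to grouped made/attempted counts, with year entering as an ordinary number. It leaves out the hash-mark, field and stadium terms the real model includes; the function names, the centering constants (35 yards, 1990) and the toy self-check data are all hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, written Φ in the text."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density φ(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def features(distance, year):
    """Intercept, distance and year as plain numbers (centered and scaled for stability)."""
    return [1.0, (distance - 35.0) / 10.0, (year - 1990.0) / 10.0]

def solve(a, b):
    """Solve the small linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_probit(groups, iterations=50, tol=1e-10):
    """Fit P(make) = Φ(features · beta) to (distance, year, made, attempted) groups by Fisher scoring."""
    k = 3
    beta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for distance, year, made, attempted in groups:
            x = features(distance, year)
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = min(max(norm_cdf(eta), 1e-12), 1.0 - 1e-12)
            dens = norm_pdf(eta)
            w = dens / (p * (1.0 - p))
            for i in range(k):
                # Score: gradient of the binomial log-likelihood.
                score[i] += (made - attempted * p) * w * x[i]
                for j in range(k):
                    # Expected information: n * φ² / (Φ(1 - Φ)) * x xᵀ.
                    info[i][j] += attempted * dens * w * x[i] * x[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Self-check on toy counts generated exactly from hypothetical coefficients (not NFL data).
true_beta = [0.8, -0.6, 0.3]
groups = []
for distance in (20, 30, 40, 50):
    for year in (1970, 1990, 2010):
        p = norm_cdf(sum(b * x for b, x in zip(true_beta, features(distance, year))))
        groups.append((distance, year, 200 * p, 200))
print([round(b, 3) for b in fit_probit(groups)])
```

Because the toy counts are built from known coefficients, the fit should recover them. And because year is just a number in `features`, a fitted model of this shape can be evaluated for any distance in any season.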

Applying this year-based model to our kicking distance data, we can see just how predictable the improvement in kicking has actually been:

The model may give teams too much credit in the early ’60s — an era for which we have a lot less data — but over the course of NFL history it does extremely well (it also predicts back to 1932, not shown). What’s amazing is that, while the model incorporates things like hashmark location and (more recently) field type, virtually all the work is handled by distance and year alone. Ultimately, it’s an extremely (virtually impossibly) accurate model considering how few variables it relies on.⁵

This isn’t just trivia; it has real-world implications, from the tactical (how should you manage the clock knowing your opponent needs only moderate yardage to get into field goal range?) to the organizational (maybe a good kicker is worth more than the league minimum). And then there’s the big one.

Fourth down

If you’re reading this site, there’s a good chance you scream at your television a lot when coaches sheepishly kick or punt instead of going for it on fourth down. This is particularly true in the “dead zone” between roughly the 25- and 40-yard lines, where punts accomplish little and field goals are supposedly too long to be good gambles.

I’ve been a card-carrying member of Team Go-For-It since the ’90s. And we were right, back then. With ’90s-quality kickers, settling for field goals in the dead zone was practically criminal. As of 10 years ago — around when these should-we-go-for-it models rose to prominence — we were still right. But a lot has changed in 10 years. Field-goal kicking is now good enough that many previous calculations are outdated. Here’s a comparison between a field-goal kicking curve from 2004 vs. 2014:

There’s no one universally agreed-upon system for when you should go for it on fourth down. But a very popular one is The New York Times’ 4th Down Bot, which is powered by models built by Brian Burke — founder of Advanced Football Analytics and a pioneer in the quantitative analysis of football. It calculates the expected value (either in points or win percentages) for every fourth-down play in the NFL, and tweets live results during games. Its 19,000-plus followers are treated to the bot’s particular emphasis on the many, many times coaches fail to go for it on fourth down when they should.

A very helpful feature of the 4th Down Bot is that its game logs break down each fourth-down decision into its component parts. This means that we can see exactly what assumptions the bot is making about the success rate of each kick. Comparing those to my model, it looks to me like the bot’s kickers are approximately 2004-quality. (I asked Burke about this, and he agrees that the bot is probably at least a few years behind,⁶ and says that its kicking assumptions are based on a fitted model of the most recent eight years of kicking data.⁷)

But more importantly, these breakdowns allow us to essentially recalculate the bot’s recommendations given a different set of assumptions. And the improvement in kicking dramatically changes the calculus of whether to go for it on fourth down in the dead zone. The following table compares “Go or No” charts from the 4th Down Bot as it stands right now, versus how it would look with projected 2015 kickers⁸:

Having better kickers makes a big difference, as you can see from the blue sea on the left versus the red sea on the right. (The 4th Down Bot’s complete “Go or No” table is on the Times’ website.)
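Footnote 8 describes the recalculation: keep the bot’s expected values and conversion chances, and change only the field-goal make probability. A minimal sketch of that swap in expected points (one of the two scales the bot supports); the function names and every number below are hypothetical, invented to show the mechanics rather than taken from the bot:

```python
def expected_points(p_success, points_if_success, points_if_failure):
    """Expected value of a play with two outcomes."""
    return p_success * points_if_success + (1 - p_success) * points_if_failure

def fourth_down_call(p_convert, ev_converted, ev_failed, p_make, ev_made, ev_missed):
    """Return ('go' or 'kick', go-minus-kick margin in expected points)."""
    go = expected_points(p_convert, ev_converted, ev_failed)
    kick = expected_points(p_make, ev_made, ev_missed)
    return ("go" if go > kick else "kick"), go - kick

# Hypothetical dead-zone situation: the go-for-it side stays fixed, only the kicker changes.
situation = dict(p_convert=0.45, ev_converted=4.0, ev_failed=-1.0, ev_made=2.4, ev_missed=-1.5)
for p_make in (0.60, 0.75):
    call, margin = fourth_down_call(p_make=p_make, **situation)
    print(f"make probability {p_make:.2f}: {call} (go minus kick = {margin:+.2f} points)")
```

With these made-up inputs, raising the make probability from 60 to 75 percent flips the call from go to kick.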

Getting these fourth-down calls wrong is potentially a big problem for the bot. As a test case, I applied the 4th Down Bot’s model to a selection of the most relevant kicks from between 25 and 55 yards in 2013, then looked at what coaches actually did in those scenarios. I graded both against my kicking-adjusted results for 2013. While the kicking-adjusted analysis still concluded that coaches were too conservative (particularly on fourth-and-short), it found that coaches were (very slightly) making more correct decisions than the 4th Down Bot.

The differences were small (coaches beat the bot by only a few points over the entire season), but even being just as successful as the bot would be a drastic result considering how absolutely terrible coaches’ go-for-it strategy has been for decades. In other words, maybe NFL coaches weren’t wrong; they were just ahead of their time!
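One way to picture the grading is as a tally of expected points given up: under the reference (kicking-adjusted) model, each decision costs the gap between the best option and the option actually chosen, summed over the season. The article doesn’t say how its grading was implemented, so the data layout, function name and toy numbers below are all hypothetical:

```python
def points_given_up(situations, decider):
    """Total expected points lost versus the best option, summed over all situations."""
    total = 0.0
    for s in situations:
        best = max(s["ev"].values())
        total += best - s["ev"][s[decider]]
    return total

# Toy situations: expected points per option under the reference model, plus each decider's call.
situations = [
    {"ev": {"go": 1.2, "kick": 1.4, "punt": 0.3}, "coach": "kick", "bot": "go"},
    {"ev": {"go": 0.9, "kick": 0.7, "punt": 0.2}, "coach": "punt", "bot": "go"},
]
print("coach:", round(points_given_up(situations, "coach"), 2))
print("bot:", round(points_given_up(situations, "bot"), 2))
```

Lower totals are better; here the toy coach gives up 0.7 expected points and the toy bot 0.2.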

Time-traveling kickers

Having such an accurate model also allows us to see the overall impact kicking improvement has had on football. For example, we can calculate how kickers from different eras would have performed on a common set of attempts. In the following chart, we can see how many more or fewer points per game the typical team would have scored if kickers from a different era had taken its kicks (the red line is the actual points per game from field goals that year):

The last time kickers were as big a part of the game as they are today, the league had to move the posts back! Since the rule change, the amount of scoring from field goals has increased by more than 2 points per game. A small part of the overall increase (the overall movement of the red line) is a result of taking more field goals, but most of it comes from the improvement in accuracy alone (the width of the “ribbon”).
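The era-swap comparison amounts to re-scoring one season’s actual attempts with another era’s make probabilities. A sketch, assuming some fitted function `make_prob(distance, year)` is available (for example, a probit model in distance and year); the function names and the linear toy curve in the example are hypothetical and are not the article’s model:

```python
from typing import Callable, Iterable

def era_swap_delta(
    distances: Iterable[float],
    make_prob: Callable[[float, int], float],
    actual_year: int,
    kicker_year: int,
    games: int,
) -> float:
    """Change in field-goal points per game if kickers from kicker_year took actual_year's attempts."""
    distances = list(distances)
    actual = sum(make_prob(d, actual_year) for d in distances)
    swapped = sum(make_prob(d, kicker_year) for d in distances)
    return 3 * (swapped - actual) / games

def toy_make_prob(distance, year):
    """Hypothetical linear curve: longer kicks are harder, later years are better."""
    return 1.2 - 0.015 * distance + 0.005 * (year - 1990)

attempts = [25, 35, 45, 55]  # toy attempt distances, in yards
print(round(era_swap_delta(attempts, toy_make_prob, 2014, 1974, games=2), 2))
```

With the toy curve, sending 1974 kickers out for these four attempts costs 1.2 points per game over two games.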

How does this compare to broader scoring trends? As a baseline for comparison, I’ve taken the average points scored in every NFL game since 1961, and then seen how much league scoring deviated from that at any given point in time (the “scoring anomaly”). Then I looked at how much of that anomaly was a result of kicking accuracy⁹:

Amid wild fluctuations in scoring, kicking has remained a steady, driving force.
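Footnote 9’s two baselines can be written out directly: a season’s scoring anomaly is its points per game minus the average over all seasons, and its kicking term is that season’s kickers’ expected field-goal points on a common set of kicks minus the median season’s. The function names and toy season values below are hypothetical, not real NFL figures:

```python
from statistics import mean, median

def scoring_anomaly(points_per_game):
    """Each season's points per game minus the average across all seasons."""
    baseline = mean(points_per_game.values())
    return {season: ppg - baseline for season, ppg in points_per_game.items()}

def kicking_component(fg_points_per_game):
    """Each season's expected FG points on a common set of kicks, minus the median season's."""
    baseline = median(fg_points_per_game.values())
    return {season: pts - baseline for season, pts in fg_points_per_game.items()}

# Toy inputs for three unnamed seasons (illustrative only).
ppg = {"season A": 38.0, "season B": 34.0, "season C": 45.0}
fg = {"season A": 4.0, "season B": 5.0, "season C": 7.0}
anomaly = scoring_anomaly(ppg)
kicking = kicking_component(fg)
for season in ppg:
    print(season, round(anomaly[season], 2), round(kicking[season], 2))
```

The anomalies sum to zero by construction, and the median season’s kicking term is zero.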

For all the talk of West Coast offenses, the invention of the pro formation, the wildcat, 5-wide sets, the rise of the pass-catching tight end, Bill Walsh, the Greatest Show On Turf, and the general recognition that passing, passing and more passing is the best way to score in football, half the improvement in scoring in the past 50-plus years of NFL history has come solely from field-goal kickers kicking more accurately.¹⁰

The past half-century has seen an era of defensive innovation (running roughly from the mid-’60s to the mid-’70s), a chaotic scoring epoch with wild swings until the early ’90s, and then an era of offensive improvement. But the era of kickers is forever.

Reuben Fischer-Baum contributed graphics.

CORRECTION (Jan. 28, 2:22 p.m.): An earlier version of this article incorrectly gave the distances from which extra-point kicks were taken in 1933 and in recent years. Actual extra-point distances aren’t recorded.

Footnotes

¹ And possibly offered insight into how competitive sports can conceal remarkable changes in human capability.
² This info is likely out there for older kicks as well, but it wasn’t in my data.
³ This is done using a binomial probit regression with all the variables, using “year taken” as a categorical variable (meaning it’s not treated like a number, so 1961, 1962 and 1963 may as well be “Joe,” “Bob” and “Nancy”). This is similar to how SRS (Pro Football Reference’s Simple Rating System) determines how strong each team is relative to its competition.
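A minimal sketch of what “categorical” means here, with a hypothetical helper name: each season gets its own indicator column (the first season is dropped as the reference level, so the columns don’t duplicate the intercept), and the regression estimates a separate effect per season without ever using the year’s numeric value.

```python
def one_hot(value, levels):
    """Indicator columns for every level except the first, which serves as the reference."""
    return [1.0 if value == level else 0.0 for level in levels[1:]]

seasons = [1961, 1962, 1963]
names = ["Joe", "Bob", "Nancy"]

print(one_hot(1963, seasons))   # [0.0, 1.0]
print(one_hot("Nancy", names))  # [0.0, 1.0]: same columns, the labels carry no numeric meaning
print(one_hot(1961, seasons))   # [0.0, 0.0]: the reference season
```

A continuous-year model would instead use a single column holding the year itself.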
⁴ Hi, Nate!
⁵ So how accurate is this thing? To be honest, in all my years of building models, I’ve never seen anything like it. The model misses a typical year/distance group prediction by an average of just 2.5 percentage points. Note that a majority of those predictions involve only a couple hundred observations, at most. For comparison, the standard deviation for 250 observations of a 75 percent event is 2.7 percentage points. In other words, the model pretty much couldn’t have done any better even if it knew the exact probability of each kick!

While there is possibly a smidge of overfitting (there usually is), the risk here is lower than usual, since the vast majority of each prediction is driven solely by year and distance. Here’s the regression output:

I wish I could take credit for this, but it really just fell into place. Nerds, perk up: The z-value on “season” is 46.2! If every predictive relationship I looked for were that easy to find, life would be sweet.
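The benchmark in this footnote is the binomial standard deviation of an observed rate, √(p(1 − p)/n). A quick check of the 2.7 figure, plus how the noise grows for smaller groups (the helper name is hypothetical):

```python
import math

def binomial_sd(p, n):
    """Standard deviation of an observed success rate over n independent kicks with make probability p."""
    return math.sqrt(p * (1 - p) / n)

for n in (100, 250, 500):
    print(n, f"{100 * binomial_sd(0.75, n):.1f} percentage points")
```

At 250 kicks this gives 2.7 percentage points, matching the footnote; at 100 kicks it rises to about 4.3.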
⁶ I don’t blame Burke or others for not updating their models based on the last few years. It’s good to be prudent and not assume that temporary shifts one way or the other will hold. Normally it is better to go with the weight of history rather than with recent trends. But in this case, the recent trends are backed by the weight of history.
⁷ Here’s his full statement: “The bot is about 3-4 years behind the trends in FG accuracy, which have been improving at longer distances. It uses a kicking model fitted to the average of the recent 8-year period of data. AFA’s more advanced model for team clients is on the current ‘frontier’ of kick probabilities, and can be tuned for specific variables like kicker range, conditions, etc. Please keep in mind the bot is intended to be a good first-cut on the analysis and a demonstration of what is possible with real-time analytics. It’s not intended as the final analysis.”
⁸ The exact values in the chart may differ slightly from the reports on the Times’ website because I had to reverse-engineer the bot’s decision-making process. But basically I’m assuming the model gets everything exactly right as far as expected value from various field locations, chances of converting a fourth-down attempt, etc., then recalculating the final expected value comparison using 2015 kickers.
⁹ The scoring deviation on this chart is calculated relative to the average game over the period. The kicking accuracy is relative to the median kicker of the period.
¹⁰ Side note: I’ve also looked at whether kicking improvement has been a result of kickers who are new to the league being better than older kickers, or of older kickers getting better themselves. The answer is both.

FiveThirtyEight
