In 2008, Microsoft snapped up Farecast, a company in the business of predicting airfares, for a handy $115 million. A poster child of the Big Data revolution, Farecast analyzed hundreds of billions of airline ticket prices using a machine-learning algorithm and told consumers when to buy. The acquisition helped make price prediction the key differentiator of Bing Travel, a core asset of Microsoft’s new “decision engine.”
But a few weeks ago, right as I started gathering data for this article, the heralded price predictor vanished from Bing Travel. Initial reports suggested executives had made a business decision to focus resources on travelers’ other needs. With Microsoft’s exit from the price-prediction game, it’s time we reassessed the technology to see whether what doomed Farecast was instead a failure of Big Data.
I was a loyal user of Bing Travel’s price predictor, so its removal came as a shock. (I now use the travel Web site Kayak, which has a similar feature.) Like most travelers, I long ago abandoned travel agents, preferring to book my flights online. Before Farecast, I purchased tickets two weeks before my departure date, on the conventional wisdom that I could get good prices around that time. But I always experienced a nagging doubt. Should I take the current price, or wait? I wondered whether airlines really did set their prices to the lowest level two weeks out.
Oren Etzioni, a computer science professor at the University of Washington who founded Farecast, told me the early adopters of airfare predictors were quantitative types like myself. For each search, Bing Travel advised users whether to buy now or to wait. A buy recommendation came with an explanation, similar to what Kayak says today: “There is a 79% chance that the price will increase by $20 or more in the next 7 days.” Likewise, a wait recommendation came with a statement of confidence about the future movement of prices.
In their 2013 book “Big Data,” Viktor Mayer-Schönberger and Kenneth Cukier wrote that using Farecast’s technology, Bing Travel “was saving consumers a bundle.” My guess is that my loyalty to Bing Travel saved me a pinch, not a bundle. And if this technology was really as powerful as claimed, Microsoft wouldn’t have recycle-binned it.
My own travel bookings are limited and biased, so to determine whether to trust airfare predictors I adopted a more scientific approach. I decided to compare two real ticket-purchasing strategies: buying the ticket two weeks ahead of my scheduled departure (my old method) versus buying only when the price predictor — in this case Kayak — recommended that I buy (the algorithm). The bottom line: the accuracy of current fare-prediction technology is pretty modest, but you may want to use it anyway, even though you might not save any money.
I started my test on March 29. I searched for non-stop economy fares on Kayak for 32 of the most popular domestic routes, including New York to Chicago, San Francisco to Seattle and Los Angeles to Miami, specifying a departure of April 12 — two weeks out — and a return of April 18. My analysis is potentially biased by the set of 32 routes I selected.1 Since I did not choose a random sample, the average savings for these 32 routes might not be the same as the average savings across all routes. (According to Giorgos Zacharia, Kayak’s chief technology officer, the routes that present the greatest challenges to forecasters are those with sparse data or the greatest variability.)
Kayak issued immediate “buy” recommendations for 17 of the 32 queries. Given that I would have accepted the 14-day-out prices anyway, using Kayak made no difference for these routes, and so I stopped analyzing them.
For the remaining 15 routes, I searched again the next day, and again until the first time Kayak recommended that I buy. By the ninth day of searching, every route was settled. Then, I compared the final price to the initial price on March 29, which is what I would have paid had I not used Kayak.
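The bookkeeping behind this comparison is simple enough to sketch in a few lines of Python. The route names and most of the prices below come from the fare histories described later in this article; the intermediate fares flagged in the comments are invented, since the article doesn’t report every day’s price.

```python
# Sketch of the two strategies compared in this test. "fares" lists daily
# prices starting 14 days out; "buy_day" is the (0-indexed) day on which
# the predictor first recommended buying.
routes = {
    "NYC-BOS": {"fares": [482, 362, 370, 375, 290], "buy_day": 4},  # days 3-4 approximated
    "LAX-DFW": {"fares": [308, 370, 410, 437, 368], "buy_day": 4},  # days 2-3 invented
}

for name, r in routes.items():
    baseline = r["fares"][0]         # my old method: buy two weeks out
    algo = r["fares"][r["buy_day"]]  # buy only when the predictor says "buy"
    print(f"{name}: baseline ${baseline}, algorithm ${algo}, "
          f"savings ${baseline - algo}")
```

Run over all 15 routes, this per-route comparison of the day-one price against the price on the predictor’s first “buy” day is the entire experiment.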
You can evaluate a predictive algorithm in many ways. I’d argue, however, that the best way is to see whether the algorithm provides any benefit over a real alternative — in my case, buying tickets two weeks ahead — instead of some ideal. In other words, I’m not checking to see whether Kayak accurately predicted when fares hit their lowest price during the two-week test period. That would be measuring Kayak’s algorithm against an impossibly high bar. As long as Kayak’s algorithm saves me more money than the ticket-purchasing strategy I’d otherwise use, it’s performing well, or at least relatively well.2
Following Kayak’s buy and wait recommendations produced savings on five of the 15 routes I investigated. The savings ranged from $10 to $192, or from 1 percent to 40 percent of the two-week-ahead purchase price. Take the New York-to-Boston route. Had I bought the ticket 14 days out, I would have paid $482. Kayak advised waiting. The fare dropped 25 percent the next day. I didn’t purchase it since the algorithm said to wait again. On the third and fourth days, the price inched up, unnerving me a bit, but I stuck with Kayak and waited. On the fifth day, 10 days from the date of departure, JetBlue came through with a $290 ticket. Score!
These wild swings in airfare form the basis of the prediction business. Zacharia confirmed that fluctuations of 20 to 50 percent are common. Since flight seats lose all their value on the day of departure, airlines use sophisticated algorithms to project consumer demand, adjusting prices fluidly. “This continues through the last seven days,” Farecast’s Etzioni said, “and this is an untapped insight in airfare prediction since Farecast decided not to focus on that period of time. It would be difficult to persuade people to wait ’til so late.”
Now let’s look at a flight for which following Kayak’s algorithm cost me money: the Los Angeles-to-Dallas-Fort Worth route. Heeding Kayak’s wait recommendation, I watched nervously as the price climbed from $308 to $437 over three days. On the fifth day, though, the fare retreated to $368, whereupon Kayak recommended buying. Even so, I ended up paying $60 more than the two-week-out price because I waited.
For the 15 routes I investigated, there were more instances in which the final fare exceeded the price found on the first day than instances in which it fell below that price. Weighting each route equally, the prices paid following Kayak’s algorithm were 2 percent higher than the initial March 29 prices. Wearing my statistical hat, I’d call that a tie between the two strategies.
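The equal weighting means averaging each route’s percentage change rather than its dollar change. Here is that calculation on illustrative numbers: the first two pairs are the New York-to-Boston and Los Angeles-to-Dallas fares reported above, and the remaining three pairs are invented to round out the example.

```python
# Each route's percentage change is (final - initial) / initial; the
# statistic is the equal-weighted mean of those changes. Routes 3-5 are
# made-up fares for illustration.
initial = [482, 308, 350, 400, 260]
final   = [290, 368, 380, 440, 290]

changes = [(f - i) / i for i, f in zip(initial, final)]
avg_change = sum(changes) / len(changes)
print(f"average change: {avg_change:+.1%}")  # a big saving on one route can offset several small losses
```

Note how one 40 percent saving nearly cancels four losses, which is why a handful of wins can drag a mostly losing record back to roughly even.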
From this analysis, you could say airfare prediction is a Big Data failure. But doing so misses the point. It takes a little more effort to unpack its secrets.
The Kayak algorithm is amazing at capturing value when the opportunity presents itself. Throughout the test period, I used Kayak Price Alerts to track fares for all the routes that required waiting. When I reviewed the price trajectories for the 10 flights in which I lost money following Kayak’s algorithm, I found that fares never once dipped below the 14-day-out level. In other words, there were no savings to be had. For the five routes in which there was a financial benefit to waiting, Kayak successfully reduced my fare in each instance.
Take the Los Angeles-to-Chicago route. Kayak recommended purchasing on the second day, right before the fare overshot the 14-day-out price and never came back. This is a sure sign of intelligence.3
But this still leaves open the complaint that Kayak suggested waiting on many routes for which there was no chance to improve upon the initial price. Undoubtedly, this is one kind of predictive failure, but such failure is unavoidable in any predictive model.
I recently spoke with Tim Harford, a columnist for the Financial Times, about this type of predictive failure as it pertains to the giant retailer Target, which has deployed a Big Data model to direct sales pitches at pregnant women. Imagine that out of a group of 100 women, 10 are pregnant, and an analyst is asked to predict, based on the women’s online shopping profiles, which of the group are pregnant. Because no shopping profile perfectly separates pregnant women from everyone else, the analyst must cast a wider net: to have any chance of finding all 10 pregnant women, the analyst must predict that 20 are pregnant. In other words, any good algorithm erroneously presumes some non-pregnant women to be pregnant, a type of error known as a “false positive.”
Kayak’s wait recommendations for the 10 flights for which prices never fell below their initial amounts were false-positive errors in airfare prediction. Hoping to capture more savings opportunities — or in Target’s example, identify the pregnant shoppers — the algorithm overshot, asking travelers to wait to buy on more flights than it should have. As we saw earlier, only a fraction of the routes offered potential savings. How companies tune their predictive models depends on how they judge the costs of these false-positive errors.
I asked Farecast’s Etzioni and Kayak’s Zacharia about this. Interestingly, they didn’t agree on strategy. Etzioni believes showing too many buy recommendations creates distrust among users, leaving the impression that Farecast is not working hard enough to find better prices. Farecast carefully managed the fraction of buy recommendations, keeping it between 67 and 80 percent. Kayak has the opposite concern. When a buy recommendation errs, users rarely notice, because they don’t typically track subsequent price changes. But when users see fares continue to rise while they’re waiting, the error is highly visible, and they may even cancel their trips. So, Zacharia said, “Kayak wants to make sure our ‘wait’ recommendations are as accurate as possible, even if that means we sacrifice a little bit of the ‘buy’ accuracy.”
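One common way this tuning works in practice (a plausible mechanism, not a description of either company’s actual system) is a score threshold: the model estimates the probability that a fare will drop, and “wait” is issued only when that probability clears the bar. Raising the bar makes the wait calls more reliable, Zacharia’s goal, while pushing more queries into “buy,” toward the kind of band Etzioni described.

```python
# Hypothetical threshold tuning. p_drop is a model's estimated probability
# that the fare will fall; a higher wait_threshold issues "wait" less often,
# trading buy accuracy for wait accuracy.
def recommend(p_drop, wait_threshold):
    return "wait" if p_drop >= wait_threshold else "buy"

scores = [0.15, 0.35, 0.55, 0.62, 0.80, 0.90]  # made-up model outputs

for threshold in (0.5, 0.7):
    recs = [recommend(p, threshold) for p in scores]
    buy_share = recs.count("buy") / len(recs)
    print(f"threshold {threshold}: buy share {buy_share:.0%}")
```

With the toy numbers above, moving the threshold from 0.5 to 0.7 lifts the buy share from 33 to 67 percent, the low end of the range Farecast reportedly targeted.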
In the end, even though my analysis shows that you might not save any money following Kayak’s algorithm as opposed to buying tickets two weeks ahead of your scheduled departure, it still might be worth your time and energy to use airfare prediction software. Why? Because following the algorithm isn’t going to cost you more money, and it might actually relieve some of the second-guessing that occurs when you’re left to your own devices. Etzioni said he found in user surveys that in addition to appealing to quantitative types, another group of regular users said Farecast gave them “peace of mind.”
Correction (April 20, 11:06 a.m.): An earlier version of this article misstated the year that Microsoft bought Farecast. It was 2008, not 2009.