Reflections Of A Degenerate ‘American Idol’ Gambler

First I enjoyed “American Idol,” then I studied it, then I gambled on it, then I watched it fall apart.

Watching “Idol” for me was a lot like anything else I do for fun. That is, it involved making inferences and probability-based predictions by synthesizing and interpreting different kinds of information, including hard data (like the rate of busy signals for each contestant during after-show voting) and completely subjective impressions. Like, seriously, how good is this?¹

With the show coming to an end after its 15th season this week — at least until its big nostalgia-soaked revival a few years from now — I’ve been reflecting on what it meant to me, as well as what made it so great and what made it so terrible.

Morning in ‘American Idol’

In the early days, there was something deeply un-cynical about “Idol.” For all the cheese, it was fundamentally a high-stakes talent show.

Its first season, in 2002, set this tone. Kelly Clarkson had folksy charm and crazy skills, and though many worried that she didn’t look the part as well as becurled heartthrob Justin Guarini, it now seems silly that her victory was ever in doubt. The talent won out and was rightly rewarded with wild success.

The notion that “American Idol” was fundamentally a show about making stars — one that cared about who could deliver great performances above all else — remained relatively undisturbed for years. The most dynamic contestants routinely dominated the competition. Ruben Studdard and Clay Aiken battled with much fanfare in its second season, Fantasia emerged from a seesaw season three and seemed headed for stardom, and Carrie Underwood’s versatile country vibe led Simon Cowell to presciently predict she would win (and sell more records than any previous winner) — with 11 contestants remaining.

The show could not have asked for a better start. It was promising something ridiculous, presented in a ridiculous package, and it was delivering.

The performance show and the (relatively meaningless) results show became the highest-rated shows on television and drove the Fox network to its first overall ratings victories in the 18-49 age group (i.e., “the demo”).

But then, in season five, this glorious enterprise took a bit of a turn.

The year the music died

Now, I don’t know if it’s fair to say that season five set the show’s demise in motion, but let’s just say that was the year things got complicated. It started out promising, with several talented contestants generating buzz early and often. Mandisa could belt it. Katharine McPhee gave her fans McPheever. Elliott Yamin was a delightfully goofy and earnest crooner, and even Ace Young looked a bit like Justin 2.0. And of course, there was Chris Daughtry, the rocker whose performance of Fuel’s “Hemorrhage” led to actual gigs with the band.

If you know anything about “Idol,” you know how this played out. Daughtry was “shockingly” eliminated in fourth place, and the crown was taken instead by “none of the above,” aka Taylor Hicks, the lovable leader of the Soul Patrol, whose dancing Cowell once likened to that of “a drunken uncle at a wedding.”

Daughtry has had a relatively successful career, while Hicks started a perma-trend of commercially unviable winners.

With the show losing its luster as a kingmaker, it took stranger and stranger turns: Sanjaya’s surprising run to the top seven (which now-defunct site Vote For The Worst — no description necessary — famously took credit for). Steven Tyler taking a spot on the panel and calling everything “beautiful.” White guys with guitars. Mariah Carey and Nicki Minaj turning the show into an exhibitionistic tabloid feud. Nick Fradiani. And scene:

The rise of DialIdol

Season five was also when serious “Idol” fans started getting aggressive in their analysis.

Normally, any details about fan voting were limited to the tidbits host Ryan Seacrest dropped on the results shows, like which contestants were in the bottom three.² But then came the enterprising folks at DialIdol.com, who programmed a computer dialer fans could use to auto-vote for their favorite contestants using a phone line and modem. DialIdol then aggregated the number of busy signals relative to votes for all its users.³ It made some necessary adjustments to account for the timing of the busy signals (they were most meaningful early in the dialing window, when phone lines were jammed) and produced a DialIdol Score.
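To make the idea concrete, here is a minimal Python sketch of a time-weighted busy-signal rate. DialIdol's actual formula isn't spelled out here, so everything in it is an assumption: the `DialAttempt` record, the `time_weight` function and its 30-minute half-life are hypothetical stand-ins for whatever adjustments the site really made.

```python
from dataclasses import dataclass


@dataclass
class DialAttempt:
    """One auto-dialed vote attempt (hypothetical record layout)."""
    contestant: str
    minutes_into_window: float  # time since the voting lines opened
    busy: bool                  # True if the call hit a busy signal


def time_weight(minutes_into_window, half_life=30.0):
    """Assumed weighting: early attempts count more, halving every `half_life` minutes."""
    return 0.5 ** (minutes_into_window / half_life)


def busy_scores(attempts):
    """Return each contestant's time-weighted busy-signal rate, from 0.0 to 1.0."""
    busy_weight = {}
    total_weight = {}
    for attempt in attempts:
        w = time_weight(attempt.minutes_into_window)
        total_weight[attempt.contestant] = total_weight.get(attempt.contestant, 0.0) + w
        if attempt.busy:
            busy_weight[attempt.contestant] = busy_weight.get(attempt.contestant, 0.0) + w
    # A higher rate means the contestant's lines were jammed more often,
    # especially early in the window, which is read here as more voting traffic.
    return {c: busy_weight.get(c, 0.0) / total_weight[c] for c in total_weight}


# Toy data, invented purely for illustration.
attempts = [
    DialAttempt("A", 0, True),
    DialAttempt("A", 30, False),
    DialAttempt("B", 0, False),
    DialAttempt("B", 30, True),
]
scores = busy_scores(attempts)
```

In this toy example both contestants have one busy signal in two attempts, but contestant A's came when the lines first opened, so A ends up with the higher weighted rate. That is the kind of distinction a timing adjustment is meant to capture.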

But was this metric any good at predicting who was doing best (and worst) with voters? Mostly, yes.⁴ Early in a season, it couldn’t really be relied upon to tell you everything, but it still made for valuable information. As the field winnowed down, though, it got deadly accurate.

DI started near the end of season four. Its first accomplishment was predicting that Underwood had won the crown over Bo Bice even though Bice “won” the finale according to What Not To Sing (a Metacritic for “Idol” performances that aggregates critics’ ratings of contestant performances).⁵ Underwood won despite having lower ratings for her performances in the finale, as well as a lower average for the season as a whole.

But DI’s first major triumph came in season five, when, with four contestants left, it predicted that Daughtry would be eliminated. Although there were hints on the show that Daughtry’s time might be up — he was a surprise addition to the bottom three a few weeks earlier, had not performed well during Elvis week and was several weeks removed from any of his “five-star” performances — the moment was considered (and portrayed as) a stunning elimination.

Not only did this confirm that DI was on to something, but it also demonstrated that seemingly strong and popular contestants who didn’t have a loyal voting base were living on a knife’s edge. Contestants needed blocs committed to voting for them whether or not they did well every week.

With the field down to four or fewer, DialIdol stayed dominant for six-plus years. There were 12 times during seasons four through 10 when DI scores predicted a different elimination than that week’s performance ratings (according to WNTS) suggested, and DialIdol was correct each time.⁶

A front row to the apocalypse

DialIdol confirmed many fears and suspicions about how voting factions formed and acted, but it was limited in what it signaled. It did a good job of identifying losers in a given week, and who was perhaps stronger or weaker than suspected.

But if you gambled on the show — as I did — being a good interpreter of DialIdol was more important than knowing it existed. Most of the other gamblers knew what it said, and betting lines would fluctuate wildly with DI’s early results. The easy analogy to make is that it was a lot like polling: It was the closest thing we had to hard data about voter preferences. (There are even some similar problems: As with polling, DialIdol struggled as more and more people moved to alternative voting methods — in this case by texting or voting online.)

Thus, at the height of my fandom and financial interest, my typical routine for processing an episode was something like this:

First I’d watch the episode through from start to finish, skipping only commercials.
Then I’d watch the performances, skipping the judging.
Then I’d listen to the performances without watching them.
Then I’d watch just the judging minus the performances. (When Simon left, I started skipping this step.)
Then I’d read all the reviews of the performances online.
Then I’d check the buzz in the gambling forums.
Then I’d check DialIdol: What was each performance’s raw score (total votes by DialIdol users), as well as its East Coast and West Coast busy signal rates?
Then I’d download the performances on iTunes to listen to high-quality versions — both the live recording and, when available, the longer “studio” version.
Then I’d check to see how the bootleg YouTube videos were doing: What percentage of likes they were getting, what kind of traffic they were getting, etc.
Then I’d check IdolForums to see what was getting the most buzz, or, more importantly, to see which contestant forums were getting the most posts and which fan clubs were gaining the most members.
I’d update my spreadsheets and then probably watch the performances again as I tried to figure out what was going on.

At the time, there were a number of online futures markets that had “Idol” contracts (and gambling online was easy), but a lot of my best action came from prop bets arranged directly with other gamblers (and usually paid out on poker sites). Wagers might range from fun $10 “last-longers” to serious four-figure bets about who would win the season.⁷ My editor has asked me five times to reveal how much I won or lost, but I’ll just say I came out ahead in most years as well as overall.

To the end, I was a believer in the importance of performances. This was probably a bit naïve, or at least dangerous. There was a lot to be learned from putting yourself in the shoes of different kinds of viewers and imagining how they would react to what you’d seen.

But you could fall in love with a contestant or a performance the same way that an Oscar forecaster can fall in love with a movie, or a political observer can fall in love with a candidate. I admit that my biggest misses were mostly of this variety.

As the show has gone on and factions have taken over, however, it has seemed clear that performances have become less and less important.

To test this, I ran season-by-season regressions predicting whether a contestant would be in the bottom group each week and looked at how the significance of performance ratings as a predictor changed over time:⁸

Yes, the trend is somewhat noisy. There are fairly few data points, though each represents its whole season. But I wasn’t trying to prove something out of whole cloth — this supports what I already suspected from using my own eyes and ears.
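For readers curious about the mechanics, the footnoted statistic (a t-statistic: coefficient divided by standard error) can be sketched in a few lines of Python. This is only an illustration under assumptions: it treats the model as a weighted linear regression of a 0/1 "bottom group" indicator on performance rating, which may not match the specification actually used, and the function name and toy numbers are made up.

```python
import math


def rating_t_statistic(ratings, in_bottom, weights):
    """t-statistic for the rating slope in a weighted simple linear regression.

    ratings   -- performance rating for each contestant-week
    in_bottom -- 1 if that contestant landed in the bottom group, else 0
    weights   -- weight for each observation (e.g. number of performances)
    """
    n = len(ratings)
    if n < 3:
        raise ValueError("need at least three observations")
    total_w = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, ratings)) / total_w
    y_bar = sum(w * y for w, y in zip(weights, in_bottom)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, ratings))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, ratings, in_bottom))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual sum of squares, with n - 2 degrees of freedom.
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(weights, ratings, in_bottom))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


# Toy season, invented for illustration: lower ratings tend to land in the bottom group.
ratings = [80, 70, 60, 50, 40, 30]
in_bottom = [0, 0, 0, 1, 0, 1]
weights = [1, 1, 2, 1, 2, 1]
t_stat = rating_t_statistic(ratings, in_bottom, weights)
```

A strongly negative value would mean higher-rated performances were reliably keeping contestants out of the bottom group. A value near zero would mean ratings told you little, which is the pattern suspected for later seasons.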

What went wrong?

So what has contributed to this? I think a number of things:

As the viewership has declined, the demographics of the show’s voter pool have probably gotten more niche.
These days, fanbases are built up weeks or months before the voting even starts through early episodes of the show. The first few weeks are the most tense because people are paying the most attention, but as the season wears on, performances now get less and less important.
Now that the pretense of being able to produce stars is gone, voters may not be trying to find a superstar; they’re just voting for the person they want to win.
Voting itself has changed from phone-dialing to internet voting.
At some point the show shifted from covers to original arrangements — or more precisely, covers of covers. David Cook famously covered Incubus’s cover of Lionel Richie’s “Hello” and Chris Cornell’s cover of Michael Jackson’s “Billie Jean” and was called a creative genius each time. My suspicion is that this shift skewed results against more traditionally good performances that nevertheless received good reviews.

That list surely isn’t exhaustive. Plus there’s another big one: Simon.

Simon Cowell left the show after season nine. Note that in the previous chart the noisiest points came after Simon left.

I’ve always thought Simon’s talent, aside from his somewhat forced one-liners and colorful analogies, was cutting through a blandly good performance and identifying why it wasn’t great. This helped crystallize and shape people’s opinions. Sure, he appeared to get bored after a while. But Simon’s boredom was our saving grace: If a performance bored him, it didn’t matter if the singer hit his high notes or sang in tune; he just called it boring. Even if his particular criticisms didn’t ring true, his making them gave cover and comfort to those who felt like they were supposed to like a performance on its merits but weren’t moved by it.

And that’s not 100 percent theory; there’s some evidence in the data (subtle, of course, given the size of the data set). Here I’ve plotted contestants’ average performance rating versus their finishing position (bubble size corresponds to number of performances) before and after Simon left the panel:

The main thing to note here is that with Simon on the show, ratings and finish had a tighter relationship, and the trend was fairly similarly shaped for men and women. Granted, part of the reason the post-Simon chart is noisy is that he has been gone for fewer seasons than he was on the show, but I still find it revealing. The post-Simon curve is flatter, suggesting that reviewers may be less incisive without Simon’s guidance. In addition, only a small number of women get low ratings, and women who’ve made it deep into the show have mostly gotten predictably similar ratings. In this new era of bland, positive reviews, only one woman has won the whole thing (so far). To find the great contestants, you need to see the contrasts with the not-quite-great ones.⁹

Without a neutral arbiter, someone who wasn’t afraid to hurt a contestant’s feelings or come across poorly to a contestant’s fans, the voting public has been left rudderless. And so it has crowned the Scotty McCreerys of the world.

Do I think Simon could have “saved” the show? Unlikely. Indeed, I think he left after he had determined he couldn’t.

Ultimately, for me “Idol” was a series of lessons about politics. Every week, new opinions formed, old opinions changed, votes were collected and consequences were doled out. Every cycle, the landscape changed, and every part of that process changed with it.

And just as with that other kind of politics this site covers, the results and the trajectory they charted could sometimes be disheartening. But a surefire way to cope with your own heartbreak — for better or worse — is to bet on it.

Footnotes

¹ For my money, Lady Gaga’s advice (“Give them a little Edith Piaf. Give them a little ‘I’m crazy, and I’m a laugh away from a tear.’”) is the best ever given on the show.
² For the purposes of this analysis, I’ve assumed that everything Seacrest unambiguously states on the show is true, despite what some conspiracy theorists lurking on old “Idol” message boards may say.
³ There were other power-dialing programs but no others that measured the busy signal (that I know of).
⁴ I should note that if you dig around the DialIdol site, it does not seem to employ very robust statistics (e.g., the site calculates a completely arbitrary “margin of error”), but its data was still useful.
⁵ Lately WNTS has also relied on crowdsourced ratings as the number of critics reviewing “Idol” performances has dwindled.
⁶ Its streak would finally end with four contestants left in season 11, by which point volume of voting via DI (like phone voting in general) had declined precipitously. DialIdol recorded about 238,000 vote attempts and 38,000 busy signals in the last three weeks of season 11, down steadily from highs of 3.5 million votes and 2.6 million busy signals in the last three weeks of season five. Yet it still correctly predicted that Phillip Phillips would win over the routinely higher-rated Jessica Sanchez.
⁷ Kids: Don’t try this at home. First, betting on offshore futures markets is pretty much illegal in the U.S. now, and second, don’t ever bet with strangers unless you know their reputations well. Or, you know, at all.
⁸ Specifically, this is the t-statistic (coefficient divided by standard error) for average performance rating in a given week, weighted by number of performances that week.
⁹ This is one reason I really would have liked Simon to have been on the show when Haley Reinhart was a contestant. She started season 10 weakly but then had a great streak toward the end, all without the help of the judges. With no one making a coherent broader case for her, she was like a “Survivor” competitor who has to keep winning every challenge to stay on the island. She ultimately came up short in the third leg of the top three, when she was given the final slot of the night and failed to impress with “You Oughta Know.” I think having Simon judging the show would have increased her chances of winning, though it may also have increased her chances of being eliminated much earlier.

FiveThirtyEight