Where Do Spelling Bee Words Come From?

The pressure is so great, and the stakes so high, that one contestant fainted at the mere utterance of the word “alopecoid.” He fell headfirst to the unforgiving stage floor, only to stand up, unaided, as if nothing at all had happened and nothing were the matter. He composed himself and said the letters: A-L-O-P-E-C-O-I-D. If the mic weren’t affixed to a stand, he surely would’ve dropped it.

Such is the human drama at the Scripps National Spelling Bee, the finals of which are Thursday in National Harbor, Maryland. (ESPN, which owns FiveThirtyEight, is airing the competition on ESPN2 during the day and on ESPN beginning at 8 p.m. Eastern.) In their quest for the $40,000 first prize, the 285 competitors, mostly middle-schoolers, will draw on extensive study, reach deep into their temporal lobes and attempt to spell words plucked from nearly 500,000 entries in Webster’s Third New International Dictionary. To aid them in their pursuit, the kids will invoke one phrase, over and over again: “Can I have the language of origin?”

Once a speller is given her word, she may ask the judges to use it in a sentence and to offer its pronunciation, definition, part of speech and, importantly, language of origin. The last of these can give all-important clues to a word’s spelling. In a Latin word, for example, a k sound might be spelled with a “c” (like canal). In an English word with a Greek root, a short i sound might be spelled with a “y” (like symbiosis), an f sound with “ph” (like phalanx) and a k sound with “ch” (like chaos). French words often contain silent letters (like beaux).

Much of the time at the bee, the answer to the contestant’s query will be “Latin” or some form of it.

Rank  Language of origin                    Words   Spelled correctly (%)
1     Latin                                 1,445   71
2     Middle English                        1,288   78
3     French                                1,279   72
4     New Latin                               811   68
5     Greek                                   480   73
6     Italian                                 415   73
7     German                                  390   70
8     Late Latin                              317   74
9     International Scientific Vocabulary     233   72
10    Middle French                           204   78
11    Spanish                                 186   74
12    Medieval Latin                          135   71
13    Japanese                                111   78
14    Russian                                 108   65
15    [Origin Unknown]                         85   75
16    Arabic                                   66   83
17    Afrikaans                                50   74
18    Yiddish                                  49   76
19    Hindi                                    48   81
20    Sanskrit                                 45   62
21    American Spanish                         44   61
22    Dutch                                    38   74
23    Portuguese                               33   70
24    Hawaiian                                 32   84
25    Mexican Spanish                          31   71
26    Hindi & Urdu                             27   82
27    Persian                                  26   58
28    Late Greek                               20   90
29    Hebrew                                   20   45
Origin of words in the Scripps National Spelling Bee, 1996-2014

The data includes words from the bees’ second rounds and beyond.

Source: Christopher Long

The table above covers words used in the second round and beyond [1] in 19 bees, from 1996 to 2014. It shows every origin language that appears at least 20 times in the data, along with the percentage of the time that words from each language were spelled correctly. The data comes from Christopher Long, a consulting data scientist with the Detroit Tigers. [2] About 54 percent of the bee’s words came from Latin, Middle English, French or New Latin. (It’s worth noting that many words that come to us from Latin were filtered through Latin from Greek, or through French from Latin, and so on. English is complicated.)
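For readers who want to see how a table like this is tallied, here is a minimal Python sketch. It assumes each record is a pair of origin language and whether the word was spelled correctly; the function name `summarize_by_origin`, the `min_count` parameter and the toy records are all made up for illustration. Long's own scripts were written in Ruby, and this is not his code.

```python
from collections import defaultdict


def summarize_by_origin(records, min_count=20):
    """Count words per origin language and the share spelled correctly.

    `records` is an iterable of (origin, spelled_correctly) pairs.
    This record shape is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for origin, spelled_correctly in records:
        totals[origin] += 1
        if spelled_correctly:
            correct[origin] += 1

    # Keep only origins with enough appearances, as the table does.
    rows = [
        (origin, n, round(100 * correct[origin] / n))
        for origin, n in totals.items()
        if n >= min_count
    ]
    # Most common origin first; ties broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Toy records, invented for this example (not real bee results).
sample = [
    ("Latin", True),
    ("Latin", True),
    ("Latin", False),
    ("Hebrew", True),
    ("Hebrew", False),
    ("Mishmi", True),
]

for origin, words, pct in summarize_by_origin(sample, min_count=2):
    print(f"{origin}: {words} words, {pct}% spelled correctly")
```

With the real data, the threshold would stay at 20 appearances, which is what keeps one-off origins such as Mishmi out of the table.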

Of the languages with at least 20 appearances, Hebrew (e.g., gabbai, genizah, Gideon) was the most difficult: spellers succeeded only 45 percent of the time. Late Greek was easiest, with spellers succeeding 90 percent of the time. Surprisingly to me, Hawaiian (haupia, heiau, hukilau) was second easiest, with spellers succeeding at a clip better than 84 percent. Arabic words (mancala, mazar, mihrab) and words of Hindi or Hindi and Urdu origin (mahout, masala, nainsook) also cracked 80 percent.

My colleague Reuben Fischer-Baum, writing for Deadspin, once concluded that winning words were becoming more obscure. In 1940, the winning word was “therapy” (Greek). Last year, the winning words were “scherenschnitte” (German) and “nunatak” (Inuit). And occasionally, though rarely, the bee’s words are sourced from some of the more obscure of the 7,000-ish languages spoken on Earth: “kathakali” (a form of dance drama, from Malayalam), “uayeb” (part of the Mayan calendar), “takin” (a large bovid, from Mishmi) and “compas” (a type of music, from Haitian Creole) have all made appearances. In 85 cases — including the words bagwyn, larrigan, pandowdy and tatterdemalion — the origin was unknown. Even the origin of “bee” itself is murky.

So grab that dusty unabridged from the bottom shelf (lift with your legs) and settle in for the etymological event of the year. Not excited yet?

What about now? (“Now,” of course, comes from the Greek “nyn” to the Latin “nunc,” akin to the Old High German “nū,” which made its way to the Old English “nū” and then to Middle English and then to us, here, today, now.)


  1. The list was restricted to eliminate the easier words from the first round of earlier bees.

  2. Long wrote Ruby scripts to scrape words and results from the bee’s website and the words’ origins from Merriam-Webster. Other word lists and code are available on his GitHub page.

