The Datasets We’re Looking At This Week
You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the July 6, 2022, edition, reprinted with permission at FiveThirtyEight.
Banned and challenged books, mass expulsions, European air traffic, Shakespeare and “Saturday Night Live.”
Banned and challenged books. A recent report from PEN America identified 1,500-plus decisions, made between July 2021 and March 2022, to ban books from classrooms and school libraries. A spreadsheet accompanying the report lists each decision’s date, type, state and school district, as well as each banned book’s title, authors, illustrators and translators. Related: Independent researcher Tasslyn Magnusson, in partnership with EveryLibrary, maintains a spreadsheet of both book bans and book challenges, with 3,000-plus entries since the 2021–22 school year. [h/t Gary Price]
Mass expulsions. Political scientist Meghan M. Garrity’s Government-Sponsored Mass Expulsion dataset focuses on “policies in which governments systematically remove ethnic, racial, religious or national groups, en masse.” Using a combination of archival research and secondary sources, Garrity documents 139 such events, estimated to have expelled more than 30 million people between 1900 and 2020. For each expulsion, the dataset provides “information on the expelling country, onset, duration, region, scale, category of persons expelled and frequency.” To download it, visit the Journal of Peace Research’s replication data portal and search for “mass expulsion.”
European air traffic. Eurocontrol, the main organization coordinating Europe’s air traffic management, publishes an “aviation intelligence portal” with a range of industry metrics, including traffic reports that count the daily number of flights by country, airport and operator. The portal also offers bulk datasets on topics such as airport traffic, flight efficiency, estimated CO2 emissions and more. [h/t Giuseppe Sollazzo]
Shakespeare. The Folger Shakespeare “brings you the complete works of the world’s greatest playwright, edited for modern readers.” Its digital editions of the Bard’s plays and poems are available to read online and to download in various file formats. It also provides an API, with endpoints for synopses, roles, monologues, word frequencies and more. [h/t Cameron Armstrong]
“Saturday Night Live.” Joel Navaroli’s snlarchives.net aims to catalog and cross-reference every episode, cast member, host, character, sketch, impression and other aspects of the show’s 47-and-counting seasons. An open-source project by Hendrik Hilleckes and Colin Morris scrapes much of that information into structured data files. As seen in: Morris’s 2017 analysis of gender representation in “SNL” sketches.
Dataset suggestions? Criticism? Praise? Send feedback to email@example.com. Looking for past datasets? This spreadsheet contains them all. Visit data-is-plural.com to subscribe and to browse past editions.