The Datasets We’re Looking At This Week

You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the June 8, 2022, edition, reprinted with permission at FiveThirtyEight.

2022.06.08 edition

House primaries, where college grads go, Hong Kong political prisoners, mercenaries and Roman amphitheaters.

Six decades of House primaries. In 2014, Stephen Pettigrew, Karen Owen and Emily Wanless published a dataset of all Democratic and Republican primary election results for the U.S. House of Representatives between 1956 and 2010. It indicates each election’s year, state, redistricting status, primary system (open, closed, semi-open, multiparty) and more. The dataset also lists each candidate’s name, gender, prior office and votes received. In 2020, Michael G. Miller and Nicki Camberg published a follow-up dataset, adding coverage for 2012 through 2018. It uses the same variable names and structure as the earlier dataset, so that the two files can be easily combined.
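Because the two files share variable names and structure, combining them amounts to stacking their rows. Here is a minimal sketch of that idea using Python's standard `csv` module. The column names and the two inline rows are hypothetical placeholders, not values from the actual datasets, and the `combine` helper is invented for illustration.

```python
import csv
import io

# Hypothetical stand-ins for the two files; the column names and rows
# are illustrative placeholders, not taken from the actual datasets.
EARLIER = "year,state,candidate,votes\n1956,XX,Candidate A,100\n"
FOLLOW_UP = "year,state,candidate,votes\n2012,YY,Candidate B,200\n"


def combine(*sources):
    """Stack CSV texts that share the same header into one list of rows."""
    combined = []
    header = None
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            # Remember the first file's column names as the reference.
            header = reader.fieldnames
        elif reader.fieldnames != header:
            # Refuse to stack files whose columns do not line up.
            raise ValueError("column names differ between files")
        combined.extend(reader)
    return header, combined


header, rows = combine(EARLIER, FOLLOW_UP)
print(header, len(rows))
```

With the real files, the inline strings would be replaced by the downloaded CSV contents. The header check is a cheap guard in case a column was renamed between releases.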

Where college grads go. Johnathan G. Conzelmann et al. have created a dataset that estimates the geographic distribution of recent graduates from 2,600 U.S. colleges and universities, calculated from information on the schools’ official LinkedIn landing pages. For each institution, the dataset indicates the proportions of alumni in each of the 278 specific U.S. locations in LinkedIn’s geographic lexicon and cross-references them with government-defined metropolitan and micropolitan statistical areas. Read more: An introductory Twitter thread. [h/t Sharon Machlis]

Hong Kong political prisoners. The Hong Kong Democracy Council, a U.S.-based advocacy group, last month published the first version of its Hong Kong Political Prisoners Database, which contains information about 1,000-plus protesters, opposition leaders and national security law defendants incarcerated since the city’s pro-democracy mass protests in mid-2019. It lists each defendant’s age, arrest date, arrest location, conviction date, convicted offenses, sentencing date, sentence length and other details. An accompanying report describes the database’s context and methodology. [h/t Samuel Bickett]

Mercenaries. Ulrich Petersohn et al.’s Commercial Military Actor Database examines “the market for force” in 72 countries from 1980 to 2016. It contains information, primarily sourced from news reports, on thousands of contractual relationships between providers (mercenaries and private military/security companies) and their clients (governments, opposition groups, NGOs and transnational corporations). The contracted work ranges “from combat services and support services (e.g., communication, maintenance), to logistics, security, consultancy, training and reconstruction.”

Roman amphitheaters. Sebastian Heath, a professor of computational humanities and Roman archaeology, has constructed a dataset of 260-plus amphitheaters in the Roman Empire. It provides the structures’ known names, coordinates, orientations and capacities, among other characteristics, and links the entries to external data sources.

Dataset suggestions? Criticism? Praise? Send feedback to Looking for past datasets? This spreadsheet contains them all. Visit to subscribe and to browse past editions.

Jeremy Singer-Vine is a data editor, reporter and computer programmer based in New York City.
