ABC News
The Datasets We’re Looking At This Week

You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the July 20, 2022, edition, reprinted with permission at FiveThirtyEight.

2022.07.20 edition

New voting laws, notable people, budget apportionments, digital trade provisions and the World Cup.

New voting laws. The Voting Rights Lab has been tracking 2,000-plus laws proposed in U.S. state legislatures since 2021. The tracker focuses on “12 major issue areas relating to voter access and representation,” such as early voting, same-day registration and ID requirements. It lists each bill’s state, number, author, introduction date, current status and issue areas, plus a summary and the lab’s “assessment of whether the legislation is likely to improve or interfere with voter access or the administration of elections.” As seen in: “Has Your State Made It Harder To Vote?” (FiveThirtyEight). Related: States Newsroom’s Kira Lerner has compiled a spreadsheet of 120 new election-related criminal penalties, based partly on the tracker’s data.

Notable people. “A new strand of literature aims at building the most comprehensive and accurate database of notable individuals,” observe Morgane Laouenan et al., who contribute a “cross-verified database of 2.29 million individuals” mined from Wikidata and the English, French, German, Italian, Spanish, Portuguese and Swedish editions of Wikipedia. For each person, the dataset provides their birth and death dates, gender, citizenship, occupations and other details. Previously: The MIT-based Pantheon dataset (DIP 2016.02.03), also based on Wikipedia and since updated. [h/t Philip Jung]

Budget apportionments. Congress, through a process called appropriations, chooses how much money goes to each U.S. federal agency and program. But the Office of Management and Budget, through a process called apportionment, ultimately sets the rules for spending those funds, “typically limit[ing] the obligations [an agency] may incur for specified time periods, programs, activities, projects, objects or any combination thereof.” Until last week, those binding decisions had generally not been available to the public. Then OMB launched a database of apportionments for FY 2022, as required by Congress’s 2022 spending bill. [h/t Caitlin Emma]

Digital trade provisions. Mira Burri et al.’s TAPED dataset, which “seeks to comprehensively trace developments in the area of digital trade governance,” categorizes 100-plus relevant aspects of 300-plus preferential trade agreements signed since 2000. The dataset indicates, for instance, that the Peru-Australia Free Trade Agreement contains binding agreements on personal data protection, nonbinding language on cybersecurity and no provisions regarding net neutrality.

The World Cup. Joshua Fjelstul’s World Cup Database, published this month, provides “extensively cleaned and cross-validated” information about each of the 21 FIFA World Cup tournaments played so far. Its 27 tables contain “approximately 1.1 million data points” regarding the teams that participated, their players and managers, the referees, match outcomes, goals, penalties and more.

Dataset suggestions? Criticism? Praise? Send feedback to Looking for past datasets? This spreadsheet contains them all. Visit to subscribe and to browse past editions.

Jeremy Singer-Vine is a data editor, reporter and computer programmer based in New York City.
