The Datasets We’re Looking At This Week
You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the May 25, 2022, edition, reprinted with permission at FiveThirtyEight.
Supercomputers, infrastructure permitting, European election results, Moreno and Jennings’s sociograms, and Art Garfunkel’s library.
Supercomputers. Since 1993, a team of researchers has regularly assessed the most powerful computers in the world. The resulting TOP500 lists are published twice a year, in June and November, using a performance benchmark developed by team member Jack Dongarra, who became a Turing Award laureate this year. Downloadable versions indicate each supercomputer’s name, rank, location, manufacturer, year built, power consumption, technical specifications and more. As seen in: “The race to build the fastest supercomputer,” by Datawrapper’s Edurne Morillo, who recommends visiting Barcelona’s MareNostrum, which ranked 74th on the latest list and is housed in a former chapel.
Infrastructure permitting. The U.S. government’s Federal Infrastructure Permitting Dashboard tracks the “environmental review and authorization processes for large or complex infrastructure projects,” particularly those funded by the Department of Transportation and those participating in a voluntary review-coordination effort known as FAST-41. The dashboard’s full dataset describes 12,000-plus milestones relating to nearly 1,000 projects, roughly half of which have been completed. Online, you can search across projects and browse their characteristics and timetables.
European election results. Dominik Schraff et al. have built EU-NED, a dataset that harmonizes European election results at a subnational level, providing party vote totals for 31 countries’ NUTS 2 and NUTS 3 geographic units. The dataset covers 1990 to 2020 and uses party identifiers from PartyFacts (DIP 2019.01.16), making it easier to link the records to other projects. [h/t Christian Breuer]
Moreno and Jennings’s sociograms. In the 1930s, Jacob Moreno and Helen Hall Jennings created a series of “sociograms” representing the seating preferences of grade-school classmates. These graphics “are frequently considered as the first examples of social network analysis and visualization,” according to historian and network analysis practitioner Martin Grandjean, who has translated them into simple data files. [h/t Christian Miles + Jer Thorp]
Art Garfunkel’s library. The legendary folk singer’s official website includes a catalog of “every book Art has read since 1968.” It lists each book’s title, author, year published, month/year read, page count and whether it was one of the musician’s favorites. Recently, AI engineer Corey Christensen converted the HTML pages into a downloadable dataset.
Dataset suggestions? Criticism? Praise? Send feedback to email@example.com. Looking for past datasets? This spreadsheet contains them all. Visit data-is-plural.com to subscribe and to browse past editions.