The Datasets We're Looking At This Week
You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the Sept. 28, 2022, edition, reprinted with permission at FiveThirtyEight.
This week: FDA inspections, academic citations, Old and New Testament locations, university endowments and tech products promoted.
FDA inspections. The U.S. Food and Drug Administration’s inspections dashboard lists 264,000-plus assessments of facilities (primarily those manufacturing food, drugs and other FDA-regulated products) and 227,000-plus problems the inspectors found. The fields include the facility owner, location, product type, inspection completion date and outcomes. The records, which go back to fiscal year 2009, can be bulk-downloaded from the dashboard and queried via an API. They come with certain caveats; they exclude, for instance, “inspections waiting for a final enforcement action” and those conducted by state (rather than federal) inspectors. Related: More compliance-related data dashboards from the FDA.
Academic citations. Since 2017, the Initiative for Open Citations has urged academic publishers to share their papers’ reference lists as open data. Last month, the group announced it had hit a major milestone: Of the 61 million papers that have references and are indexed by DOI-registrar Crossref, “100 percent […] have made their citations openly available.” You can access the data through Crossref’s API and in bulk through OpenCitations. Read more: “Citation data are now open, but that’s far from enough” (Nature). Previously: Wikipedia citations (DIP 2018.05.23), biomedical citations (DIP 2019.10.23) and legal citations (DIP 2020.07.15). [h/t Data Science Community Newsletter]
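For readers who want to try the API route, here is a minimal Python sketch of reading one paper's reference list from a Crossref response. It assumes the public REST endpoint at api.crossref.org/works/ followed by a DOI, and a JSON response whose "message" object carries a "reference" list. The helper names and the sample payload (including its DOIs) are made up for illustration, and the network fetch itself is left out so the sketch runs offline.

```python
from urllib.parse import quote

# Assumed base URL of Crossref's public REST API for single works.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_work_url(doi: str) -> str:
    """Build the API URL for one DOI (hypothetical helper)."""
    # Keep the slash that separates DOI prefix and suffix; escape the rest.
    return CROSSREF_WORKS + quote(doi, safe="/")


def cited_dois(payload: dict) -> list[str]:
    """Return the DOIs found in a work's reference list.

    References without a DOI (for example, free-text citations)
    are skipped rather than guessed at.
    """
    references = payload.get("message", {}).get("reference", [])
    return [ref["DOI"].lower() for ref in references if "DOI" in ref]


# Hand-written payload in the assumed response shape; not real data.
sample = {
    "status": "ok",
    "message": {
        "DOI": "10.1000/example.1",
        "reference": [
            {"key": "ref1", "DOI": "10.1000/Example.2"},
            {"key": "ref2", "unstructured": "A citation with no DOI."},
            {"key": "ref3", "DOI": "10.1000/example.3"},
        ],
    },
}

print(crossref_work_url("10.1000/example.1"))
print(cited_dois(sample))
```

To use it against live data, fetch the URL from crossref_work_url with any HTTP client, parse the body as JSON and pass the result to cited_dois. Papers without deposited references may simply lack the "reference" field, which is why the sketch falls back to an empty list.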
Old and New Testament locations. OpenBible.info’s Bible Geocoding project “(1) comprehensively identifies the possible modern locations of every place mentioned in the Bible as precisely as possible, (2) expresses a data-backed confidence level in each identification and (3) links to open data to fit into a broader data ecosystem.” You can browse by book, chapter and location, as well as download the full dataset. Read more: The project’s author explains the backstory and methodology. [h/t Avi Levin]
University endowments. Earlier this month, the National Association of College and University Business Officers released the latest of its annual studies of college and university endowments in the U.S. and Canada. For 700-plus institutions, the study’s public tables indicate their total enrollment, endowment market value, previous year’s value and more. A page of historical datasets includes a spreadsheet listing many endowments’ sizes going back to the mid-1970s. [h/t Factle]
Tech products promoted. For “The Gamer and the Nihilist,” an essay in Components, Andrew Thompson and collaborators created a dataset of nearly 77,000 tech products on Product Hunt, a popular social network for launching and promoting such things. The dataset includes the name, description, launch date, upvote count and other details for every product from 2014 to 2021 in the platform’s sitemap. (“Based on experience, not every product that appears on Product Hunt seems to appear on the sitemap,” the authors caution.)
Dataset suggestions? Criticism? Praise? Send feedback to email@example.com. Looking for past datasets? This spreadsheet contains them all. Visit data-is-plural.com to subscribe and to browse past editions.