The Datasets We’re Looking At This Week

You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the Nov. 16, 2022, edition, reprinted with permission at FiveThirtyEight.

2022.11.16 edition

Big emitters, disease outbreaks, permissively licensed code, impact craters and tinned fish.

Big emitters. Climate Trace, a nonprofit coalition launched in 2020, uses satellite imagery, sector-specific datasets and other sources to estimate greenhouse gas emissions in detail. Its most recent inventory, released last week, highlights 70,000-plus individual sites that “represent the top known sources of emissions in the power sector, oil and gas production and refining, shipping, aviation, mining, waste, agriculture, road transportation, and the production of steel, cement and aluminum.” You can download the data, explore sector- and country-level estimates and browse a map of the sites. Read more: Coverage in The New York Times. [h/t Ian Johnson]

Disease outbreaks. Juan Armando Torres Munguía et al. have built a dataset of infectious disease outbreaks based on information extracted from the World Health Organization’s Disease Outbreak News alerts (DIP 2022.03.30) and its coronavirus dashboard. The authors have clustered the outbreaks by disease (classified by ICD-10 and ICD-11 codes), country and year. Excluding the COVID-19 pandemic, this leads to 1,500-plus total combinations between January 1996 and March 2022, spanning 60-plus diseases and 200-plus countries/territories. [h/t Konstantin M. Wacker]
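
To make the clustering described above concrete, here is a minimal Python sketch of grouping alert records by disease, country and year. It is an illustration only, not the authors' code: the `Alert` record, the `cluster_outbreaks` function and the sample rows are all hypothetical, and the real dataset's column names and ICD coding are not reproduced here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alert(NamedTuple):
    """Hypothetical record for one outbreak alert (not the dataset's real schema)."""
    disease: str
    country: str
    year: int


def cluster_outbreaks(alerts: Iterable[Alert]) -> dict:
    """Count alerts per unique (disease, country, year) combination."""
    counts = Counter((a.disease, a.country, a.year) for a in alerts)
    return dict(counts)


# Made-up sample rows, for illustration only.
sample_alerts = [
    Alert("Cholera", "Country A", 2010),
    Alert("Cholera", "Country A", 2010),  # second alert, same combination
    Alert("Cholera", "Country B", 2010),
    Alert("Measles", "Country A", 2019),
]

clusters = cluster_outbreaks(sample_alerts)
for (disease, country, year), n_alerts in sorted(clusters.items()):
    print(disease, country, year, n_alerts)
```

In this sketch, several alerts about the same disease in the same country and year collapse into a single combination, which is the sense in which the 1,500-plus figure counts combinations rather than individual alerts.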

Permissively licensed code. The Stack, a new dataset from the BigCode project, “contains over 3TB of permissively licensed source code files covering 30 programming languages crawled from GitHub.” Those terabytes hold more than 300 million files extracted from repositories whose licenses place “minimal restrictions on how the software can be copied, modified and redistributed.” The dataset provides the contents of each file along with its repository name, path, size, programming language, detected licenses and several high-level metrics. Read more: An introductory Twitter thread and preprint paper. [h/t Karsten Johansson]

Impact craters. The Earth Impact Database, maintained by the University of New Brunswick’s Planetary and Space Science Centre, catalogs nearly 200 impact craters caused by meteorites that have crashed into the planet. It presents the name, location, diameter, estimated age, geology and other features of the craters, as well as photographs and bibliographies. Related: Cody Winchester has scraped the crater characteristics into CSV and GeoJSON files.

Tinned fish. Rainbow Tomatoes Garden is a farm in East Greenville, Pennsylvania, that also happens to run an online store selling “the largest selection of tinned seafood in the world.” Curator-owner Dan Waber publishes a spreadsheet of the store’s 630-plus offerings, listing each product’s name, type of seafood, brand, country of origin, tin size and price; whether it’s organic, certified kosher, smoked, boneless and/or skinless; and more. [h/t George Ho]

Jeremy Singer-Vine is a data editor, reporter and computer programmer based in New York City.

