The Datasets We're Looking At This Week
You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the Aug. 31, 2022, edition, reprinted with permission at FiveThirtyEight.
Billion-dollar disasters, business formations, attempted repairs, Luftwaffe locations and a fern tree of life.
Billion-dollar disasters. As “the nation’s scorekeeper in terms of addressing severe weather and climate events in their historical perspective,” the U.S. National Centers for Environmental Information maintains an inventory of the most costly such disasters in the U.S. — those that have caused at least $1 billion in estimated direct losses. The quarterly updated dataset contains more than 330 severe storms, floods, droughts, wildfires, freezes and other extreme events since 1980. You can download, filter and sort the list (by disaster type, start/end dates, inflation-adjusted cost and total deaths), as well as map, chart and summarize it. [h/t Gary Price]
Business formations. To compile its Business Formation Statistics, the U.S. Census Bureau analyzes several sources, including every application to the IRS for an Employer Identification Number (EIN) and the first payroll tax filings of those applicants. This allows the bureau to provide monthly counts of business applications and formations by business type, industrial sector and state. It also publishes weekly datasets of application counts by state and an annual dataset that drills down to individual counties; both, however, lack business formation counts and other details found only in the monthly files. [h/t John C. Haltiwanger]
Attempted repairs. The Open Repair Alliance, “an international group of organisations committed to working towards a world where electrical and electronic products are more durable and easier to repair,” is developing an open standard for sharing data about repair attempts. So far, they’ve gathered 62,000-plus records from five partners. Each entry represents a repair session: its date and country, the product’s brand and category, the repair status, a description of the problem, barriers to repair and more.
Luftwaffe locations. Data scientist Sam Weiss has constructed a dataset tracking the World War II movements of the Luftwaffe, Nazi Germany’s air force. The information, scraped and geocoded from the Luftwaffe history website ww2.dk, includes monthly locations and aggregate statistics (total size, additions, losses) by aircraft type and unit. Read more: A blog post and Twitter thread from Weiss.
Fern phylogenetics. Joel H. Nitta et al.’s Fern Tree of Life uses “a mostly automated, reproducible, open pipeline” to convert fern DNA sequences from the National Institutes of Health’s GenBank into an interactive, browsable and downloadable evolutionary tree. It currently covers 5,500-plus species, from Abacopteris aspera to Zealandia vieillardii. [h/t Santiago Ramírez Barahona]
Dataset suggestions? Criticism? Praise? Send feedback to email@example.com. Looking for past datasets? This spreadsheet contains them all. Visit data-is-plural.com to subscribe and to browse past editions.