The Datasets We're Looking At This Week
You’re reading Data Is Plural, a weekly newsletter of useful/curious datasets. Below you’ll find the Oct. 5, 2022, edition, reprinted with permission at FiveThirtyEight.
Grid emissions, chain and indie restaurants, wildfire smoke pollution, federal audits and a decade of tasks.
Grid emissions. Ember, an “energy think tank that uses data-driven insights to shift the world from coal to clean electricity,” has begun compiling annual and monthly statistics on electricity demand, generation and estimated greenhouse gas emissions by country, standardized from national and international sources. The annual estimates span two decades and 200-plus countries and territories; the monthly dataset provides somewhat less coverage. Both can also be explored online. Related: Singularity’s Open Grid Emissions initiative estimates the hourly grid emissions of balancing authorities and power plants in the U.S., currently for 2019 and 2020. Previously: Other energy-related datasets. [h/t Philippe Quirion]
Chain and indie restaurants. Xiaofan Liang and Clio Andris of Georgia Tech’s Friendly Cities Lab have published a map and dataset examining the “chainness” of 700,000-plus U.S. restaurants. Starting with records provided by a marketing-data company, the researchers standardized the restaurants’ names, counted their frequencies and classified them as chains (those with more than five outlets) or not. The dataset also lists each restaurant’s cuisine and location. As seen in: Andrew Van Dam’s exploration of the data for his new-ish Washington Post column, Department of Data.
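For readers who want to try a similar classification on their own list of restaurant names, the three steps described above (standardize names, count frequencies, apply the more-than-five-outlets cutoff) can be sketched roughly as follows. This is only an illustration, not the researchers' code: the normalization rules in `standardize_name` and the function and field names are assumptions, and the real dataset's name cleaning is likely more involved.

```python
import re
from collections import Counter

# From the text: a chain is a name with more than five outlets.
CHAIN_THRESHOLD = 5


def standardize_name(name: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def classify_chains(names):
    """Return one record per restaurant with its outlet count and chain flag."""
    standardized = [standardize_name(n) for n in names]
    counts = Counter(standardized)
    return [
        {
            "name": raw,
            "standardized": std,
            "outlets": counts[std],
            "is_chain": counts[std] > CHAIN_THRESHOLD,
        }
        for raw, std in zip(names, standardized)
    ]


if __name__ == "__main__":
    # Hypothetical sample data, not drawn from the published dataset.
    sample = ["McDonald's"] * 3 + ["MCDONALDS"] * 3 + ["Joe's Diner"]
    for record in classify_chains(sample):
        print(record)
```

Counting happens on the standardized names, so spelling variants of the same brand are pooled before the cutoff is applied.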
Wildfire smoke pollution. Marissa L. Childs et al. have developed a “machine learning model of daily wildfire-driven PM2.5 concentrations using a combination of ground, satellite, and reanalysis data sources that are easy to update.” (PM2.5 refers to particulate matter 2.5 micrometers in diameter or smaller.) The researchers then used that model to generate daily smoke PM2.5 estimates for each county, Census tract and 10-kilometer-grid tile in the contiguous U.S., for 2006-2020. Read more: Coverage and maps in the New York Times. [h/t George LeVines]
Federal audits. Nonprofits, state/local governments and other noncommercial entities expending $750,000-plus of federal funds in a year are required to undergo a standardized audit of their financials and compliance. The U.S. Federal Audit Clearinghouse maintains a public database of those audits; it offers bulk downloads of the report data (about the auditee, auditor, findings and more), as well as a tool to search and access individual reports. [h/t Big Local News]
A decade of tasks. Between April 2009 and February 2019, software engineer Renzo Borgatti set 17,000-plus daily tasks for himself. He completed slightly fewer than half of them. He labeled them with tags such as “@meeting”, “@talk” and “@clojure.” He estimated how many “pomodoros” each would take, and tracked how many each actually took. We know this because Borgatti allowed Derek M. Jones to publish a partially redacted dataset of his tracked tasks. Previously: One software company’s task estimates (DIP 2019.04.24), also published by Jones.
Dataset suggestions? Criticism? Praise? Send feedback to firstname.lastname@example.org. Looking for past datasets? This spreadsheet contains them all. Visit data-is-plural.com to subscribe and to browse past editions.