How We Analyzed 7,914 COVID-19 Testing Sites And Found Racial Disparities

For our story on the distribution of COVID-19 testing sites, we wanted to develop a measure of how busy coronavirus testing sites were likely to be. That’s not as simple as counting the number of sites, since there tend to be more sites in places with more people. Somehow we had to account for that population density and come up with a way of measuring whether some sites were likely to face greater demand than others.

Our solution was a metric that’s a little complicated, but which — we think — does a decent job accounting for real-life behavior. It works like this:

We analyzed block groups — the smallest unit for which we could get national demographic data from the 2014-18 American Community Survey five-year estimates.
For each block group, we calculated the distance between the central point of the block group (its centroid) and each of the 7,914 testing sites that were open as of June 18 and included geographic coordinates in the data set provided by health navigation platform Castlight. This data set included sites conducting diagnostic tests, not those conducting only antibody or antigen testing.¹ We assumed that people were likely to try to go to sites that were nearby, so we focused on the 10 closest sites for every block group. The data set included some information about the sites themselves, such as cost, health insurance information or criteria for getting tested, but this wasn’t systematically available, so we didn’t include it in our calculations. As a result, our calculations treat all sites as equally accessible, even though some are likely less accessible to the public than others, which would affect whether people could actually go there.
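To make the nearest-sites step concrete, here is a minimal sketch in Python. It assumes great-circle (haversine) distance, which is our illustrative choice and not necessarily the distance measure used in the analysis; the function names and the input layout are likewise hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_sites(centroid, sites, k=10):
    """Return up to k closest sites as (site_id, distance_km), nearest first.

    centroid: (lat, lon) of the block group's central point.
    sites: dict mapping a hypothetical site_id to (lat, lon).
    """
    distances = [
        (site_id, haversine_km(centroid[0], centroid[1], lat, lon))
        for site_id, (lat, lon) in sites.items()
    ]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]
```

Comparing every block group against every site this way is slow at national scale; a spatial index would be the usual speed-up, but the result would be the same list of 10 sites.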
We took the population of the block group, as of the 2018 ACS, and assigned each person in that population to one of the sites. Importantly, because we assumed that people would be more likely to go to sites that were closer, we divided the population proportionally, based on how close each site was, instead of evenly. So if one site was a lot closer to the centroid than the other nine, that close site would be “allocated” the bulk of the block group’s population. Adding up these allocations across all the block groups that counted a site among their 10 nearest gave us each site’s potential patient demand.
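The allocation step can be sketched as follows. The text above says only that the population was divided in proportion to how close each site was, so the inverse-distance weighting here is an assumption, as are the function names and the `min_distance_km` floor that keeps a site sitting on the centroid from receiving infinite weight.

```python
def allocate_population(population, site_distances, min_distance_km=0.1):
    """Split a block group's population across its nearest sites.

    site_distances: list of (site_id, distance_km) pairs.
    Weights are 1 / distance (an assumed proximity measure), so a site
    three times closer receives three times the share.
    """
    weights = {
        site_id: 1.0 / max(distance, min_distance_km)
        for site_id, distance in site_distances
    }
    total = sum(weights.values())
    return {site_id: population * w / total for site_id, w in weights.items()}


def site_demand(block_groups):
    """Sum allocations over all block groups into per-site demand.

    block_groups: iterable of (population, site_distances) pairs.
    Returns a dict mapping site_id to potential patient demand.
    """
    demand = {}
    for population, site_distances in block_groups:
        shares = allocate_population(population, site_distances)
        for site_id, share in shares.items():
            demand[site_id] = demand.get(site_id, 0.0) + share
    return demand
```

With a block group of 1,000 people and two sites at 1 km and 3 km, this sketch assigns 750 people to the nearer site and 250 to the farther one.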
But we also wanted to get a sense of how the block groups themselves compared to one another. To do that, we averaged the potential patient demand for the 10 nearest sites for each block group. As before, we wanted to account for distance, so we weighted that average by the proximity of the test site to the block group centroid. The closer a site is to the block group, the larger its weight in the average. In other words, if the 10 closest sites to a block group were all equally far away, the potential community need in the block group would be a straight average of those 10 sites’ potential patient demand. We now had a potential community need score for every block group.
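Here is a minimal sketch of that proximity-weighted average. As with the allocation step, the exact weighting is not spelled out in the text, so the inverse-distance weights, the `min_distance_km` floor and the function name are assumptions made for illustration.

```python
def community_need(site_distances, demand, min_distance_km=0.1):
    """Proximity-weighted average of site demand for one block group.

    site_distances: list of (site_id, distance_km) for the nearest sites.
    demand: dict mapping site_id to potential patient demand.
    Weights are 1 / distance (assumed), so nearer sites count for more.
    """
    weighted = [
        (demand[site_id], 1.0 / max(distance, min_distance_km))
        for site_id, distance in site_distances
    ]
    total_weight = sum(w for _, w in weighted)
    return sum(value * w for value, w in weighted) / total_weight
```

When every site is the same distance away, the weights cancel and the function returns the plain mean of the sites’ demand, matching the equal-distance case described above.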
To see whether there were disparities in potential need, we divided block groups into four groups, using census definitions for categorizing race and ethnicity:
- If over 50 percent of the population was non-Hispanic Black, per the census, we classified the block group as “majority Black.”
- If over 50 percent was Hispanic, we classified it as “majority Hispanic.”
- If over 50 percent was non-Hispanic white, we classified it as “majority white.”
- If no racial group made up over 50 percent of the population, or if another racial group did, we classified it as “other.”
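The four rules above translate directly into code. This is a sketch with hypothetical argument names; treating a block group with zero population as “other” is our assumption, since the text does not say how such block groups were handled.

```python
def classify_block_group(total, black_non_hispanic, hispanic, white_non_hispanic):
    """Assign a block group to one of the four categories described above.

    All arguments are population counts. The three groups are mutually
    exclusive, so at most one of them can exceed 50 percent.
    """
    if total <= 0:
        return "other"  # assumption: unpopulated block groups are not classified
    if black_non_hispanic / total > 0.5:
        return "majority Black"
    if hispanic / total > 0.5:
        return "majority Hispanic"
    if white_non_hispanic / total > 0.5:
        return "majority white"
    return "other"
```

Note that “over 50 percent” is a strict threshold, so a block group that is exactly half non-Hispanic white falls into “other.”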
At this point, we grouped the block groups into what the census calls “urbanized areas,” which capture the core of the city as well as the surrounding densely settled territory.² We focused on urbanized areas that contained over 1 million people. Block groups and urbanized areas don’t share the exact same borders,³ so we included block groups if more than half of their area fell within the urbanized area. Within each urbanized area, we then grouped the block groups by racial category and computed the average potential community need, weighted by population. This gave us a measure of potential need for predominantly Black, Hispanic and white areas in each city.
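For a single urbanized area, the inclusion rule and the population-weighted average might look like the sketch below. The tuple layout and the function name are hypothetical, and the share of each block group’s area inside the urbanized area is assumed to have been computed beforehand with GIS tools.

```python
from collections import defaultdict


def need_by_category(block_groups, min_area_share=0.5):
    """Population-weighted mean community need per racial category.

    block_groups: iterable of (category, population, need, area_share),
    where area_share is the fraction of the block group's area that
    falls inside the urbanized area.
    """
    weighted_sum = defaultdict(float)
    population_sum = defaultdict(float)
    for category, population, need, area_share in block_groups:
        if area_share <= min_area_share:
            continue  # keep only block groups more than half inside
        weighted_sum[category] += population * need
        population_sum[category] += population
    return {
        category: weighted_sum[category] / population_sum[category]
        for category in population_sum
        if population_sum[category] > 0
    }
```

Weighting by population means a block group of 3,000 people counts three times as much toward its category’s average as a block group of 1,000.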
To make those differences comparable across cities, we used the percentage difference, that is, how much busier (or less busy) predominantly Black and Hispanic areas were than predominantly white areas in the same city. This is the metric shown in the chart.
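As a sketch, the percentage difference can be computed like this. Using the predominantly white areas’ value as the baseline follows the description above, but the exact formula and the function name are our assumptions.

```python
def percent_difference(group_need, white_need):
    """Percentage by which a group's need exceeds the white baseline.

    Positive values mean sites near the group's block groups are
    potentially busier than sites near majority-white block groups.
    """
    if white_need == 0:
        raise ValueError("baseline need must be nonzero")
    return 100.0 * (group_need - white_need) / white_need
```

Because the result is relative to each city’s own baseline, a city with heavy demand everywhere and a city with light demand everywhere can be placed on the same scale.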

Footnotes

¹ The data set provided by Castlight contained some sites that shared geographic coordinates; Castlight indicated that many of these might be duplicates. Running our analysis both with and without the duplicates yielded extremely similar results, and we are presenting the analysis run on the de-duplicated data set.
² We also experimented with using metropolitan statistical areas, which encompass all the counties surrounding a city, and census places, which are a narrower definition of the core city itself, but landed on urbanized areas as most precisely capturing the definition of the city that we intended.
³ Urbanized areas are defined at the level of census blocks, which are smaller than census block groups. The Census Bureau only releases demographic information at the census block level in the decennial census, so more recent block-level information isn’t available.

FiveThirtyEight
