These days it seems like you can’t even scroll through goat videos on Twitter without hitting a COVID-19 chart — and people trying to use the chart to understand whether the epidemic has “peaked.”
Some good news: This graph shows diagnosed cases statewide over the last couple of weeks. We think our actions have started to bend the curve in our hardest hit counties.
— Governor Jay Inslee (@GovInslee) March 26, 2020
It’s tempting to look at charts like these and try to find reason for optimism. That curve sure seems like it’s bending! Or, that new case number seems lower than yesterday’s!
But it’s not that easy. Each chart you see reflects a host of decisions — which data to chart, which source to use, how to compare countries or states, how to display the data — that can drastically change what you see and what you can safely take away from the chart. To truly understand whether a place has reached the peak of its infection curve, you need to be a savvy reader of the charts and the underlying data. And, ideally, you need to look at more than one chart.
So what should you keep in mind when you’re trying to interpret all these data visualizations?
What is the chart showing?
Before you can know whether the chart is showing you a peak, you need to know what data it’s showing you in the first place. Are you looking at total confirmed cases? Total hospitalizations? Total deaths? Or is the chart showing you day-by-day counts? Each of these numbers provides information, but each is incomplete. And they’re not interchangeable.
It’s really hard to tell whether someplace has hit a peak using a chart of confirmed cases. Obviously, we’d like to see the number of people who have tested positive for the coronavirus decline. But anytime you see something tracking the number of cases, remember how hugely dependent it is on the number of tests being done. Anyplace that doesn’t test for the coronavirus is not going to find cases. That’s especially true in the U.S., where the number of tests done is reportedly minuscule relative to how many people report symptoms.
“When we see a plateau, it may just be because we’ve maxed out our testing capacity rather than a true flattening of incident cases,” said Tara C. Smith, a professor of epidemiology at Kent State University. In other words, a flatter curve might just mean that the number of tests is failing to keep up with the number of infections. And as some states look to resume certain economic activities, they may have an incentive to keep testing levels low so the number of cases doesn’t appear to increase.
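To see how a testing cap alone can flatten a curve, here is a minimal sketch with made-up numbers. The function name `confirmed_cases`, the `detection_share` parameter and the 1,000-tests-a-day limit are all assumptions for illustration, and the sketch assumes (unrealistically) that every available test goes to an infected person.

```python
# Hypothetical illustration: every number here is invented, not real data.

def confirmed_cases(true_new_infections, daily_test_capacity, detection_share=0.5):
    """Return daily confirmed counts when the number of tests is capped.

    detection_share is an assumed fraction of infected people who seek a test.
    """
    confirmed = []
    for infections in true_new_infections:
        seeking_tests = int(infections * detection_share)
        # A place cannot confirm more cases than it has tests to run.
        confirmed.append(min(seeking_tests, daily_test_capacity))
    return confirmed


# True infections double every day, but only 1,000 tests are available daily.
true_infections = [100 * 2 ** day for day in range(8)]
reported = confirmed_cases(true_infections, daily_test_capacity=1000)

for day, (actual, seen) in enumerate(zip(true_infections, reported)):
    print(f"day {day}: true infections {actual:>6}, confirmed {seen:>5}")
```

In this toy example the confirmed count stops rising once it hits the testing limit, even though the underlying infections keep doubling. The "plateau" is a feature of the test supply, not of the outbreak.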
So let’s change our understanding of these kinds of charts off the bat: These aren’t visualizations of the number of cases, they’re visualizations of the number of confirmed cases, a number that, in most countries, drastically underestimates the true number of people who are sick.
Even if testing were sufficient, the tests themselves would still need to be accurate. There are reports that some tests might have shockingly high rates of false negatives (that is, people who have the disease but get a negative test result at least some of the time).
Even if all cases were somehow measured, there would still be a lag between when someone is infected and when they get tested. It takes time to develop symptoms (the incubation period), have those symptoms get bad enough to send someone to the doctor for a test, and for the test to come back. While the lag likely differs by person, the numbers on a chart of confirmed cases are (even in the best case) quite out of date.
This lag is also one reason why charts showing the daily number of new cases can be somewhat misleading. What looks like a change happening that day might actually be reflecting a change that started a week or two ago. And lots of funky things can happen as data is collected and reported that can artificially depress or inflate a day’s tally.
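A rough sketch of that lag, using invented numbers: the helper names `reported_curve` and `peak_day` are hypothetical, and the fixed five-day delay is an assumption made only to keep the example simple. In reality the delay differs from person to person.

```python
# Hypothetical illustration: a fixed reporting delay applied to a made-up curve.

def reported_curve(true_daily_infections, lag_days):
    """Shift the true curve later by lag_days; the first days show zero."""
    padded = [0] * lag_days + list(true_daily_infections)
    return padded[:len(true_daily_infections)]


def peak_day(series):
    """Return the index of the day with the highest value."""
    return max(range(len(series)), key=lambda day: series[day])


true_daily = [10, 20, 40, 80, 160, 120, 90, 60, 40, 30, 20, 15]
reported = reported_curve(true_daily, lag_days=5)

print("true peak on day", peak_day(true_daily))
print("chart peak on day", peak_day(reported))
```

Here the chart would still be climbing on days when new infections had already started to fall, so the peak a reader sees arrives several days after the real one.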
All of that could be dealt with — if all those factors stayed constant. That is, if people were generally getting tested at the same point in the course of their illness … and how health care professionals administered the tests wasn’t changing … and the tests themselves weren’t changing … and the delay in reporting results wasn’t changing … then the trends in confirmed cases would still be informative about the virus’s spread, even though the number of confirmed cases wouldn’t directly reflect the number of infections. We’re just looking for a change in the trend line on the chart, after all.
But the problem is that we don’t know if any of that is staying constant. So it’s very difficult to glean a useful signal from a metric as noisy as confirmed case counts.
So counting cases is fraught. What about just counting deaths? Surely that’s less questionable, since every place already tracks how many people die there, so those numbers should be more reliable. This might be true — but it’s hard to be sure.
To track coronavirus deaths, you still need to confirm that a person who died had COVID-19 — and it’s unclear that that’s being done. Some people aren’t being tested before or after they die, and COVID-19 may not appear on someone’s death certificate even if it seemed likely they had the disease. For people who don’t die at a hospital, establishing the cause of death might be even more challenging, especially if coronavirus tests are scarce and being reserved for the living. And some countries have changed how they count deaths outside of hospitals as this pandemic has ground on, making it harder to interpret the trend over time.
Hospitals and governments also have an incentive to underreport COVID-19 deaths, since fatalities can make those institutions look bad. For instance, the CIA doesn’t believe the Chinese government’s official infection and death tallies.
Are hospitalizations the most accurate type of data? New York City, for example, has started releasing the number of people who have been hospitalized. Assuming that this data is collected correctly, we might think that this is the chart that would provide us the most reliable trend. If it peaked, then that would seem to be a good indicator that the infection has as well, even if most people with COVID-19 are never hospitalized.
But what happens if the criteria for hospitalization change as the outbreak progresses? If hospitals increase the threshold for admission as they run out of beds — if, for example, any shortness of breath was enough to get someone admitted in the early days of the pandemic, but health care workers start turning away those patients as resources get scarcer — then a chart that shows a plateau in hospitalizations could be a sign of a system approaching capacity rather than (or in addition to!) a change in the number of infections. (The same is true, for instance, if people are increasingly told to avoid hospitals unless absolutely necessary.)
How is the data being displayed?
The kind of chart you’re looking at also matters. Peaks are even harder to see if charts use a logarithmic scale, which many COVID-19 visualizations do. On a linear scale, the vertical axis increases the way you might expect, with the distance between 0 and 1,000 being the same size as the distance between 10,000 and 11,000. On a log scale, the number instead multiplies for each equally sized space on the y-axis: if one space goes from 100 to 1,000, the next space of the same size will go from 1,000 to 10,000, and the one after that will go from 10,000 to 100,000. The size of the gap between each set of those numbers stays the same even as the numbers jump up by bigger and bigger margins. This kind of trick has several advantages, including that it lets you compare a location’s curve when it has a relatively low number of cases and when it has a high number of cases. (If you have a linear scale, in contrast, showing the top end of the scale necessarily means that small values will be very hard to see.)
On the other hand, using a log scale also means that once you get to higher values, small-looking fluctuations in case counts can reflect pretty big differences in raw numbers. This makes it even harder to visually draw inferences about peaks from charts.
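The arithmetic behind a log axis can be checked in a few lines. This is only a sketch: the helper name `log_axis_position` and the example case counts are assumptions chosen for illustration, and the axis is assumed to be base 10.

```python
import math


def log_axis_position(value):
    """Height of a value on a base-10 log axis, measured in tenfold steps."""
    return math.log10(value)


def log_axis_gap(low, high):
    """Vertical distance between two values on the log axis."""
    return log_axis_position(high) - log_axis_position(low)


# Each tenfold jump takes up the same amount of vertical space.
for low, high in [(100, 1_000), (1_000, 10_000), (10_000, 100_000)]:
    print(f"{low:>7,} to {high:>7,}: gap = {log_axis_gap(low, high):.3f}")

# The same small-looking step can hide very different raw changes.
small = log_axis_gap(500, 600)        # 100 more cases
large = log_axis_gap(50_000, 60_000)  # 10,000 more cases
print(f"500 to 600: {small:.3f}   50,000 to 60,000: {large:.3f}")
```

Both of the last two gaps cover the same distance on the chart, because each is a 20 percent increase, but the second represents 100 times as many additional cases.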
Of course, all of these concerns are even more serious when the chart shows data from more than one place, regardless of whether what’s being compared are different countries or different states. These kinds of charts have all the same issues as the ones tracking numbers in a single place, but that set of problems is multiplied by the number of places on the chart.
What happens next?
Let’s say that, according to one of the metrics above, the curve does seem to be bending downward. What does that mean? Well, first, you might want to look at the other metrics too. If all three measures (confirmed cases, hospitalizations and deaths) are going down, and the total number of tests administered hasn’t dropped, that makes it more likely that there really is good news. A better metric, though still one with issues, is the share of positive tests out of all tests administered, which tries to account for testing in some way, though it doesn’t address whether who is being tested is changing. And if the number of cases is still growing, even at a lower rate, that would continue to increase the strain on an already burdened health care system.
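The share of positive tests is simple to compute, and a small made-up example shows why it can tell a different story than raw case counts. The function name `positivity_rate` and the weekly figures below are hypothetical, not drawn from any real data set.

```python
# Hypothetical illustration: invented weekly totals for a single place.

def positivity_rate(positive_tests, total_tests):
    """Share of administered tests that came back positive."""
    if total_tests <= 0:
        raise ValueError("total_tests must be greater than zero")
    return positive_tests / total_tests


week_one = {"positives": 500, "tests": 2000}
week_two = {"positives": 400, "tests": 1000}

rate_one = positivity_rate(week_one["positives"], week_one["tests"])
rate_two = positivity_rate(week_two["positives"], week_two["tests"])

print(f"week one: {week_one['positives']} confirmed, {rate_one:.0%} positive")
print(f"week two: {week_two['positives']} confirmed, {rate_two:.0%} positive")
```

In this example confirmed cases fall from one week to the next, which looks like good news, but testing was cut in half and a larger share of tests came back positive. That pattern would be a reason for caution rather than optimism.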
And as Joshua Epstein, the director of the New York University Agent-Based Modeling Lab, said in an online seminar: With any infectious disease, “There’s a lot of transmission after the peak.” Reaching the peak on a chart doesn’t mean you can go outside again — it means that the outbreak is at its worst. Even if it is getting better, you might still need to be extremely careful for a while.
And even if you are fully convinced that things are truly improving (new cases are going down, hospitals are nowhere near overcrowding, and the sick are getting better), what do you do when restrictions ease? Germany, which tested people at very high rates early on in the outbreak and had relatively low case numbers and deaths, has come up with a plan for gradual reopening. But if the peak has been successfully averted and relatively few people have had the disease, that might mean that a second, potentially larger peak is in the offing. Even after the infection peaks, the danger may be on the rise.
CORRECTION (April 23, 2020, 5:41 p.m.): An earlier version of this story misstated how the y-axis on a logarithmic chart would look. Instead of saying that the distance between 0 and 1,000 would be the same as the distance between 1,000 and 10,000, it should have said that the distance between 100 and 1,000 would be the same as the distance between 1,000 and 10,000.