In February 2020, just a month before the COVID-19 pandemic caused widespread lockdowns, a health policy researcher presenting at one of the world’s largest science conferences told thousands of attendees that the U.S. was more prepared to deal with global health threats than any country in the world.
This researcher cited the Global Health Security Index, an effort to measure the capacity of 195 countries to prepare for future epidemics and pandemics. In the 2019 index, the U.S. ranked higher than any other country, with a score of 83.5 out of 100. The ranking includes a sub-score of 98.2 out of 100 for “detection and reporting,” which highlights “real-time surveillance” and “epidemiology workforce.”
I attended that conference session, and remember feeling comforted by the country’s investments in scientific infrastructure compared to other countries. But two years on, it’s clear the Global Health Security Index had it wrong — the U.S.’s data systems weren’t standardized, its genomic surveillance was a mess and its inequitable healthcare system led to incomplete datasets.
That has left a mess for those of us tracking this novel virus. For two years we’ve tried to make sense of COVID-19 trends with metrics that were fundamentally impaired by our chronically decentralized and underfunded public health system. Looking back, it’s remarkable how poorly we started, how far we’ve come and how far we still have to go. If the country doesn’t want to repeat its mistakes, it will have to take radically different actions the next time a health crisis hits.
January to March 2020
As news spread in early 2020 of a pneumonia-like disease ripping through China, many Americans started to wonder if their coughs and sore throats were actually symptoms of the novel virus. And even if they weren’t, exactly how many Americans had COVID-19, as opposed to a cold or the seasonal flu?
COVID-19 cases went underreported around the world as labs scrambled to produce tests for the virus. But the U.S. was uniquely challenged in this area, said Dr. Eric Topol, the prolific COVID-19 commentator who is director of the Scripps Research Translational Institute. The federal government refused to use a World Health Organization (WHO) test in favor of waiting for the Centers for Disease Control and Prevention (CDC) to develop its own. That CDC test had contamination issues, delaying its rollout to labs by several weeks. And furthermore, testing was restricted to people with a specific set of flu-like symptoms, demonstrated travel history to China or exposure to someone who had previously tested positive.
“We rejected the WHO test, we had a test that was contaminated, and the government shut down academic labs that could do the proper PCR testing,” Topol said. “It was a nightmare, a veritable nightmare. And the testing has never gotten right since.”
This limited testing led to a problem that statisticians call ascertainment bias. “When we say, ‘This is how many cases we have,’” said Dr. Ellie Murray, an epidemiologist at the Boston University School of Public Health, “that really means, ‘This is how many positive cases we have found from the people we tested.’ Access to testing is always a huge caveat to case numbers.”
Data from New York City, the early epicenter of COVID-19 in the U.S., shows the ramifications of under-testing. During the city’s intense first wave, case trends aligned closely with trends in hospitalizations and deaths because the majority of New Yorkers testing positive were doing so in hospitals after developing severe symptoms.
Due to this ascertainment bias, the number of New Yorkers who actually got infected in the first wave will always be unknown; one late 2020 study from Mount Sinai estimated this true infection number at 1.7 million, or about 20 percent of the NYC population.
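To make the gap between confirmed cases and true infections concrete, here is a minimal Python sketch of the arithmetic behind ascertainment. The 1.7 million figure is the Mount Sinai estimate cited above; the confirmed-case count is a made-up placeholder for illustration, not a reported number, and the function names are hypothetical.

```python
def ascertainment_rate(confirmed_cases: int, estimated_infections: int) -> float:
    """Share of estimated true infections that testing actually found."""
    if estimated_infections <= 0:
        raise ValueError("estimated_infections must be positive")
    return confirmed_cases / estimated_infections


def undercount_factor(confirmed_cases: int, estimated_infections: int) -> float:
    """How many true infections there were for every confirmed case."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return estimated_infections / confirmed_cases


# Estimate cited in the article (Mount Sinai, late 2020).
ESTIMATED_INFECTIONS = 1_700_000

# ASSUMPTION: placeholder value for illustration only, not a real case count.
HYPOTHETICAL_CONFIRMED_CASES = 200_000

rate = ascertainment_rate(HYPOTHETICAL_CONFIRMED_CASES, ESTIMATED_INFECTIONS)
factor = undercount_factor(HYPOTHETICAL_CONFIRMED_CASES, ESTIMATED_INFECTIONS)
print(f"Ascertainment rate: {rate:.1%}")
print(f"True infections per confirmed case: {factor:.1f}")
```

Under these placeholder inputs, fewer than one in eight infections would appear in the official count. The real ratio depends on who could get a test, which is exactly the caveat Murray describes.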
March to September 2020
As the country shut down in March 2020, people inside their homes were glued to the news, watching freezer trucks filling with dead bodies in NYC and healthcare workers cobbling together their own PPE. People were desperate for information on how to protect themselves against the novel virus, yet America’s foremost public health organization — the CDC — had basically abdicated its responsibility to provide that information.
Instead of watching press conferences from the nation’s foremost public health experts, we watched as then-President Donald Trump promoted unproven “cures” and promised that the crisis would be over within weeks. This lack of reliable information extended to data, too: The CDC failed to provide frequent and comprehensive reports of COVID-19 cases, tests, or deaths — numbers that we needed to understand the pandemic’s impact in our communities.
Individual experts and research projects stepped in to fill that void, as the challenge of limited COVID-19 testing went from the niche concern of experts to a mainstream issue. Sites like the Johns Hopkins COVID-19 dashboard and The COVID Tracking Project (which I contributed to) gained millions of viewers.
Between March and May 2020, The COVID Tracking Project was the only source of COVID-19 testing data in the U.S., but even its herculean, grassroots effort had limitations. States reported tests using different units and different time scales, so a testing number from one state could not necessarily be compared to a number from another. That meant test positivity rates were unreliable — but people cited them anyway, tying them to policy decisions like school reopening.
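A short Python sketch shows why mismatched units undermine positivity comparisons. Both states below are hypothetical and all numbers are invented for illustration: the two have an identical outbreak, but one reports the number of people tested while the other reports the number of specimens processed, which includes repeat tests of the same person.

```python
from dataclasses import dataclass


@dataclass
class StateReport:
    name: str
    positives: int  # people with a positive result
    tests: int      # denominator, in whatever unit the state publishes
    unit: str       # what the denominator counts


def positivity(report: StateReport) -> float:
    """Positive results divided by the state's reported test count."""
    return report.positives / report.tests


# ASSUMPTION: invented numbers. Same 1,000 positive people in both states,
# but State B's denominator counts every swab, including repeat tests.
state_a = StateReport("State A", positives=1_000, tests=10_000, unit="people tested")
state_b = StateReport("State B", positives=1_000, tests=15_000, unit="specimens tested")

for report in (state_a, state_b):
    print(f"{report.name}: {positivity(report):.1%} positivity ({report.unit})")
```

The same outbreak produces 10.0 percent positivity in one state and 6.7 percent in the other. A reopening threshold set anywhere between those two values would treat the states differently even though nothing about the virus differs.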
Test availability increased through the spring, causing testing itself to rise from under 5,000 new tests conducted per day in early March to hundreds of thousands of tests a day by late April. But that testing was still biased: White Americans and those higher up on the socioeconomic ladder had more access. Some testing sites in South Texas faced demand of over 600,000 patients per site, while sites in much of New England were serving patient numbers in the 25,000 to 50,000 range, according to a summer 2020 FiveThirtyEight analysis. In New York City, officials learned from the first-wave testing troubles by greatly increasing test availability across the city.
After not knowing how many COVID-19 cases were going unreported in spring 2020, we were then asking how many people lacked the tests they needed. Even now, two years into the pandemic, this is still a challenge because the U.S. does not provide demographic data on testing, Murray said.
“We don’t collect reasons for seeking a test,” she said. “We don’t collect age, race, ethnicity, occupation of people who seek a test.” If these data were available, they could be used to determine which essential occupations are leading to more exposures, which groups need testing but can’t access it, and other important trends.
October to December 2020
During early pandemic surges, hospitals in crisis could call upon healthcare workers from other parts of the country for help. Doctors and nurses flocked first to New York in the spring, then some went to southern states like Arizona and Texas as COVID-19 surged there in the summer.
But by late fall 2020, the country’s biggest COVID-19 surge thus far was overwhelming hospitals everywhere at once, meaning healthcare workers were less available to fly to hard-hit areas and heightening the importance of keeping local hospitalization numbers low. In addition, researchers began to highlight hospitalizations as a more reliable alternative after months of dealing with spotty case and testing data.
Hospitalizations are “a happy medium … more timely than deaths, and more reliable than cases,” said Lauren Ancel Meyers, director of the University of Texas at Austin’s COVID-19 Modeling Consortium. While many cases can go unreported if the sick have mild symptoms or lack access to testing, “most people who had severe enough COVID that they needed acute care were going to the hospitals to get it,” Meyers said.
The main problem with this metric, at least in the U.S., was that reliable hospitalization data, like testing data, simply was not available for most of 2020. While other countries started publishing highly detailed hospitalization data in spring 2020 (the U.K., for one, took advantage of its national healthcare system), the highly decentralized U.S. system floundered. In early April 2020, there was no national dataset and only 13 states were regularly reporting current COVID-19 hospitalizations, according to the COVID Tracking Project.
It took multiple surges for the Department of Health and Human Services (HHS) to create a clearinghouse for these crucial statistics. The HHS launched an all-new system in July 2020 to collect data directly from hospitals. It would take several more months before the resulting data was actually reliable, as healthcare workers learned to submit numbers through the new system and HHS analysts learned to identify errors.
Imagine how much more information would be available if every hospital had already been reporting a unified set of metrics to a centralized database when COVID-19 first hit.
Jeff Shaman, an infectious disease modeling expert at Columbia University’s public health school, said that addressing the lack of standards among electronic health records in the U.S. “may actually be harder” than setting up entirely new systems in developing countries, due to the many U.S. businesses with “vested interests” in keeping records private.
January to June 2021
When the U.S. started vaccinating people at the end of 2020, it was tempting to forget about all other COVID-19 metrics. Public health agencies rolled out vaccination dashboards in December and January, and my COVID Tracking Project colleagues and I rejoiced in finally having a good metric to watch after almost a year of tracking all the bad ones. (Really, we had a special relationship with pandemic data. In December 2020, I tweeted: “I just teared up looking at texas's vaccination dashboard... this is fine this is normal.”)
Many Americans focused on vaccinations over other metrics, assuming that, once enough people got their shots, the pandemic would simply end. President Biden was one of them, setting a goal in May 2021 to vaccinate 70 percent of American adults by the Fourth of July.
But there were a lot of issues with this goal, Murray said. For one thing, this “70 percent” referred to 70 percent of people eligible for vaccination, not the entire population. Children, of course, are vectors, too. And what’s more, adults only needed to receive one dose to be counted as vaccinated. (At the time, the vast majority of Americans were receiving the two-dose Pfizer or Moderna vaccines, and research has shown that a single dose of these vaccines offers limited protection compared to the full series.)
“Also, all of our math told us that the threshold for herd immunity was somewhere north of 90 percent,” Murray said. The 70 percent goal created a false sense of optimism as U.S. politicians pushed vaccination and let other safety measures, such as mask requirements and easier access to testing, fall by the wayside.
Moreover, America’s fractured public health system made it challenging to determine when different parts of the country had actually met this 70 percent goal. When analyzing data on how many people had been vaccinated, researchers and journalists divided the doses administered in a given municipality by the total population. But inconsistencies between population and vaccine data led to major errors in these calculations, particularly when people got vaccinated in regions where they were not formally counted as residents.
John Burn-Murdoch, chief data reporter at the Financial Times, told FiveThirtyEight that this issue was “the most striking example of [data] inconsistencies in the U.S.” Some parts of Florida were particularly egregious, Burn-Murdoch and colleagues documented in an October 2021 article: In some zip codes, vaccination uptake figures reached over 2,000 percent. This was possibly due to retirees who traveled to Florida for the winter, got vaccinated there and were counted in the state’s dose numbers, despite not being included in its population numbers, Burn-Murdoch said.
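The mismatch between numerator and denominator is easy to reproduce. In this Python sketch, every number is invented for illustration (it is not Florida data): a hypothetical zip code vaccinates both residents and seasonal visitors, but only residents appear in its population figure.

```python
def coverage(people_vaccinated: int, population: int) -> float:
    """Vaccinated people divided by the population used as the denominator."""
    if population <= 0:
        raise ValueError("population must be positive")
    return people_vaccinated / population


# ASSUMPTION: hypothetical zip code with invented counts.
RESIDENT_POPULATION = 1_000   # what the census-style denominator contains
RESIDENTS_VACCINATED = 800    # residents who got a shot locally
VISITORS_VACCINATED = 1_500   # seasonal visitors who got a shot locally

# Naive approach: every shot given in the zip code goes in the numerator.
naive = coverage(RESIDENTS_VACCINATED + VISITORS_VACCINATED, RESIDENT_POPULATION)

# Consistent approach: numerator and denominator describe the same people.
residents_only = coverage(RESIDENTS_VACCINATED, RESIDENT_POPULATION)

print(f"Naive coverage: {naive:.0%}")
print(f"Resident-only coverage: {residents_only:.0%}")
```

Here the naive figure comes out at 230 percent while the resident-only figure is 80 percent. The fix looks trivial in code, but it requires knowing where each vaccinated person lives, which is the information fragmented reporting often lacks.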
A lack of coordination between state health departments also led to challenges with counting vaccinations when Americans got different doses in different places. For example, a college student who received their first dose in their hometown might then receive a second dose on campus — leading them to be counted as two different first doses in two different states. On a large scale, this leads to over-counting of partial vaccinations and undercounting of full vaccinations, Bloomberg reported in December.
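The college-student scenario can be sketched in a few lines of Python. The registries, person identifiers and counts below are all hypothetical, and the "linked" tally assumes a shared identifier across states, which is precisely what uncoordinated state systems may not have.

```python
from collections import Counter


def tally(dose_counts: dict) -> dict:
    """Count people with at least one dose and with a full two-dose series."""
    return {
        "at_least_one_dose": sum(1 for n in dose_counts.values() if n >= 1),
        "fully_vaccinated": sum(1 for n in dose_counts.values() if n >= 2),
    }


def naive_national_tally(registries: list) -> dict:
    """Tally each state separately, then add the state totals together."""
    total = Counter()
    for registry in registries:
        total.update(tally(registry))
    return dict(total)


def linked_national_tally(registries: list) -> dict:
    """Merge doses per person across states first, then tally once."""
    merged = Counter()
    for registry in registries:
        merged.update(registry)
    return tally(merged)


# ASSUMPTION: invented records mapping a person to doses recorded in that state.
home_state = {"student-1": 1, "resident-2": 2}
campus_state = {"student-1": 1}  # really the student's second shot
registries = [home_state, campus_state]

naive = naive_national_tally(registries)
linked = linked_national_tally(registries)
print("Naive: ", naive)
print("Linked:", linked)
```

With two real people in the data, the naive tally reports three people with at least one dose and one fully vaccinated, while the linked tally reports two and two. That is the pattern described above: partial vaccinations over-counted, full vaccinations undercounted.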
July to November 2021
The U.S. was enjoying a long-promised “hot vax summer” when delta hit the country in July 2021. As the Biden administration had preemptively declared victory over the pandemic, health agencies were unprepared to deal with a new surge — much less with one that infected a lot of people who were already vaccinated.
As breakthrough cases went from anecdotal to a widespread phenomenon, Americans wondered whether they would need an additional vaccine shot for further protection. But limited data on the breakthroughs made it challenging for both federal institutions and individuals to determine whether boosters were needed, and for whom.
The U.S. has struggled to collect and report real-time data on vaccine effectiveness, in part because it’s difficult to sync our vaccination databases with those logging other COVID-19 metrics — namely, cases and hospitalizations. “We have nice studies of vaccine effectiveness, but they become available maybe three or six months after the fact,” said Cécile Viboud, a staff scientist at the National Institutes of Health who studies infectious disease mortality. This delay makes it challenging to predict the possible impact of a new variant or new outbreak.
The CDC said in May 2021 that it would only investigate and report breakthrough cases that resulted in hospitalization or death, a small fraction of the total — leaving outside researchers and reporters to fill in the gap. Some projects took a similar tactic to the COVID Tracking Project, compiling data from states to create an incomplete, unstandardized picture of breakthrough cases in the U.S. But that atomized approach was flawed: A report card from the Rockefeller Foundation's Pandemic Prevention Institute shows how some states are far more comprehensive in their breakthrough case reporting than others.
In the absence of breakthrough case data at home, U.S. scientists have looked abroad to answer questions about waning immunity and the need for booster shots. The U.K.’s Health Security Agency has been a particularly popular data provider when new variants emerge, with its regular reports showing the connections between variants and changes in transmission, hospitalization rates and vaccine effectiveness.
“It’s very hard to watch,” Topol said, discussing these regular updates. “The U.K. reports, I read them every week. And what do we have? Nothing.”
December 2021 to January 2022
During the omicron surge, case data became more unreliable than at any point since the first wave thanks to a combination of competition for PCR testing appointments and increased at-home rapid testing. People in omicron outbreak zones waited for hours to get tested, prompting discussions about essential workers and others at high risk who couldn’t afford to stand in line. At-home tests, meanwhile, were difficult to acquire and positive results were difficult to report, so most state and local health departments opted not to track them at all.
As case data became less useful, experts turned back to hospitalizations. This time though, hospitalizations were more complicated: Because omicron was less likely to cause severe COVID-19 symptoms than past variants, there was uncertainty about how many people hospitalized with COVID-19 were actually there due to COVID-specific symptoms. In other words, how many people were in the hospital “incidentally,” meaning they’d entered the facility for a non-COVID reason but then tested positive through routine screening?
Possibly a lot of people, according to a few hospitals and health departments that started reporting this breakdown. For example, from mid-January through early March, more than half of COVID-19 patients in Massachusetts hospitals had tested positive for the virus after being admitted for a “non-COVID” reason, according to the state’s health department.
However, it can be difficult to gauge whether a hospitalization is truly “incidental” because patients who appear to enter the hospital for a non-COVID reason could actually have an uncommon set of COVID-19 symptoms, or a chronic disease exacerbated by the virus. “You need a panel to adjudicate it,” Topol said. And all COVID-related hospitalizations, incidental or no, increase strain on the healthcare system.
February 2022 and onward
New Kinds of Data
As the omicron surge wanes, millions of Americans are preparing to live with the coronavirus, rather than defining their lives around it. Many are following the lead of their state governments, which are likely now declaring the end of COVID-19 emergencies and shifting their data strategies to treat this virus more like the flu.
South Carolina’s public health department, for example, announced that it has stopped reporting daily case counts this month and will switch to weekly reporting for hospitalizations and deaths. Iowa’s agency shifted from daily to weekly reports in February, decommissioning two COVID-specific dashboards and moving statistics to a single page on the overall public health website. Missouri’s agency is similarly planning to end case investigations and contact tracing in the coming weeks, moving its focus to hospitalization data and wastewater.
The CDC has yet to make such a drastic shift, but it did move away from case data in a big way while also changing its mask guidance in late February. Rather than relying on case rates and test positivity to determine which U.S. counties should implement COVID-19 safety measures, the agency now recommends relying on hospital admissions and the share of beds occupied by COVID-19 patients; cases are still included in the guidance, but have taken a backseat.
Wastewater data may be particularly useful going forward, as this sewage sampling provides “a really nice early warning signal,” Murray said. As health departments stop rigorously tracking cases, they can use wastewater to see when outbreaks are coming, then use hospitalization data to see how bad those outbreaks are. At the same time, individual residents can use rapid tests to determine their own infection status. But U.S. wastewater sampling has been quite varied so far: On a new CDC dashboard for this metric, the locations where wastewater gets sampled are concentrated in a small number of states, leaving the majority of the country without this data.
Data experts I spoke to for this article expressed fears that the reporting systems built up during the past few years may fall by the wayside after the pandemic is declared “over,” when really, we should be fortifying them for future preparedness. A recent New York Times report discussing the CDC’s failure to publish much of the COVID-19 data it collects suggests that the agency still has more to learn from the past two years when it comes to transparency and communication.
The future of infectious disease monitoring should be thought of like weather forecasting, according to Kaitlyn Johnson, a data analyst at the Pandemic Prevention Institute. “Everyone would be outraged if, suddenly, they could not know whether it was gonna rain or snow that day,” she said. Collecting data on COVID-19 and other diseases could similarly help people make day-to-day decisions and prepare for possible crises; the data deserve investment in line with that potential.