On July 29, 2015, the New York City Department of Health and Mental Hygiene sent out an alert — 31 people in the South Bronx had contracted Legionnaires’ disease, a lung infection from waterborne bacteria that kills about 1 out of every 10 people who get it. By the time officials found the source (a cooling tower) and contained the spread, 128 people had contracted Legionnaires’ and 12 people had died. It was the largest outbreak of Legionnaires’ disease in the city’s history — an outbreak that was first detected by a computer program.
In less than a month, the outbreak was over. The software had helped investigators zero in on the infected cooling tower and might have saved lives. Few health departments are as advanced at fighting communicable diseases as New York’s, but this was not the first time investigators had used software and data to detect and fight diseases. A recent special edition of the Journal of Infectious Diseases was devoted entirely to studies on how to use big data to detect and model infectious diseases. In Europe, Influenzanet allows people in 11 countries to report influenza-like symptoms, and in the U.S., the National Notifiable Diseases Surveillance System allows health departments to voluntarily share data on public health and disease as part of efforts to identify and stop outbreaks. But Sharon Greene, lead author of a new paper describing New York’s outbreak detection program and director of the Data Analysis Unit at the city’s Bureau of Communicable Disease, wanted to go beyond software that merely shares data. A program that can sift through that data itself and identify potential outbreaks can have a significant impact.
“It is just not possible to effectively monitor every communicable disease in real time with human eyes alone,” Greene said. “To be able to quickly and effectively and precisely detect an outbreak, to kick off an outbreak investigation process — the earlier that you can begin this it helps to limit sickness, it helps to limit death, and it makes it more likely that you will successfully solve the outbreak.”
In March 2013, Greene and her team began to develop a system, using free software called SaTScan, to automatically monitor, map and detect disease outbreaks throughout New York City. The system relies on the staggering amount of data that the city health department receives daily: In a city of more than 8.5 million people, health care laboratories submit about 1,000 reports a day on various confirmed diagnoses. Greene’s system identifies the location of those diagnoses within the city’s 2,216 census tracts.
According to Greene, the system creates a digital cylinder centered in each census tract, with a base representing space and a height representing time. For every diagnosis, the system maps the disease in space and time. Imagine, for example, that a person who lives in the South Bronx is diagnosed with influenza — the system makes a data point there within its census cylinder for influenza. As time goes on, any future flu cases in that census-centered cylinder will be added as points, each one just a little higher in the cylinder than the one that came before it. If the software finds that a cylinder has enough diagnoses within a small enough area of the city and over a short enough period to be significantly unusual, it sends out an alert — the South Bronx might be in the early stages of an influenza outbreak.
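The cylinder idea can be made concrete with a small sketch in Python. This is not SaTScan’s code or its statistical method; it is a deliberately simplified illustration that counts the cases falling inside a cylinder (a circle on the map, extended back over a window of days) and asks how surprising that count would be under an assumed uniform background rate. The names here (Case, scan, expected_per_km2_day), the flat coordinates in kilometers, the Poisson test and the alert threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import exp, hypot, pi


@dataclass(frozen=True)
class Case:
    """One diagnosis, placed in space (km on a flat grid) and time (day number)."""
    x_km: float
    y_km: float
    day: int


def cases_in_cylinder(cases, center, radius_km, start_day, end_day):
    """Return cases inside the circle (the base) and the day range (the height)."""
    cx, cy = center
    return [
        c for c in cases
        if hypot(c.x_km - cx, c.y_km - cy) <= radius_km
        and start_day <= c.day <= end_day
    ]


def poisson_tail(observed, expected):
    """Probability of seeing at least `observed` cases if `expected` is the mean."""
    term = exp(-expected)          # P(X = 0)
    below = 0.0
    for k in range(observed):      # sum P(X = 0) ... P(X = observed - 1)
        below += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - below)


def scan(cases, centers, radii_km, window_days, today,
         expected_per_km2_day, alpha=0.001):
    """Try cylinders of several sizes around each center; collect unusual ones.

    ASSUMPTION: the background rate is uniform across the city, which real
    surveillance systems do not assume.
    """
    alerts = []
    for tract, center in centers.items():
        for radius in radii_km:
            for window in window_days:
                start = today - window + 1
                observed = len(
                    cases_in_cylinder(cases, center, radius, start, today)
                )
                expected = expected_per_km2_day * pi * radius ** 2 * window
                p_value = poisson_tail(observed, expected)
                if p_value < alpha:
                    alerts.append({
                        "tract": tract, "radius_km": radius,
                        "window_days": window, "observed": observed,
                        "expected": expected, "p_value": p_value,
                    })
    return alerts


# Hypothetical data: six recent cases near tract A, one old case near tract B.
cases = [
    Case(0.1, 0.2, 8), Case(-0.3, 0.1, 8), Case(0.5, 0.5, 9),
    Case(0.0, -0.4, 9), Case(0.2, -0.2, 10), Case(-0.1, -0.1, 10),
    Case(10.2, 9.9, 2),
]
centers = {"A": (0.0, 0.0), "B": (10.0, 10.0)}
alerts = scan(cases, centers, radii_km=[1.0], window_days=[3],
              today=10, expected_per_km2_day=0.01)
for alert in alerts:
    print(alert)
```

In this toy example, only the cylinder around tract A would be flagged, because six cases in three days far exceeds what the assumed background rate predicts. A real system would also have to account for the fact that it tests many cylinders at once, which is part of what dedicated scan-statistic software handles.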
Greene said the computer does not actually display these points or space-time cylinders; it’s all digital code. But the alerts are real. In November 2014, less than a year before the program detected New York City’s largest-ever outbreak of Legionnaires’ disease, it detected a Brooklyn outbreak of shigellosis, a contagious disease that can cause diarrhea, fever and abdominal pain. Both times, according to Greene’s paper, the SaTScan program sent out an alert before any human reported a pattern.
But seeing patterns isn’t always enough — sometimes a group of people is just unlucky. For example, the paper describes how in April 2015 the software saw that six people in the Bronx were diagnosed with giardia, another waterborne illness. Although this was an unusual cluster of giardia for the city, investigators found that all six were housemates and had acquired the illness while traveling together. This is why, Greene said, no matter how sophisticated the software, people still make the final decisions.
SaTScan “is a tool that you use in combination with other tools,” Greene said.
And although Greene’s program is among the more sophisticated data tools that city health departments use, other health departments also wield data creatively.
Monica Bharel, commissioner of the Massachusetts Department of Public Health, said her department has built a program that uses data to fight a different type of epidemic: opioid addiction. The Massachusetts Prescription Awareness Tool, or MassPAT, monitors opioid use for pharmacies and prescribers, Bharel said. MassPAT can see a patient’s prescribed opioid history — what the person was prescribed and when. The program then tells doctors and pharmacists what drugs the patient should or shouldn’t get and alerts them when it sees a pattern indicative of opioid abuse.
In Chicago, the health department uses data to map and fight West Nile virus and the mosquitoes that spread it. Sarah Kemble, medical director of the Chicago Department of Public Health’s Communicable Disease Program, said the department sets up mosquito traps all over the city and collects data on what insects are infected and where. This “heat map” of West Nile in Chicago tells the department where to focus efforts to beat back the virus.
Unlike in New York, this map is still created by people: Chicago city employees take the data and plot it on a map themselves. In fact, New York’s level of automation may be hard for many city health departments to implement simply because of how those agencies receive the data.
“Most laboratories in Chicago and in the state do use [electronic laboratory reporting] now,” Kemble said. “… There are still some laboratories that don’t, however. Those will still fax their test results to the city.”
SaTScan does not work as well or as quickly when it relies on data from fax machines, but it could still be useful for cities with older systems. Chicago’s faxed-in reports are entered into the health department’s database. And although the process wouldn’t be streamlined, Greene said that SaTScan could help cities fight potential outbreaks once that data is digitized. She and her team published the exact code they use in order to make it easier for other health departments to adopt the system.
“These are very effective statistical disease surveillance methods that are available in free-to-download software,” Greene said, “and so that should really facilitate adoption anywhere.”