Science Wants Your Data

Eric Dishman wants your data. As the director of the National Institutes of Health’s All of Us research program, he’s trying to convince 1 million Americans to donate reams of sensitive personal information to science. Electronic medical records? Gimme. Genetic data? He’ll take it. Residence history? His inbox waits with open arms.

Dishman’s goal is to build a database that can help all kinds of scientists make connections between how people are affected by a disease and what biographical differences they might share, which in turn could lead to new, more-personalized treatments. He’s not alone in his search for data donors. Experts who study the way science uses data say that both health and social sciences are increasingly reliant on collecting huge amounts of potentially sensitive information about human research subjects. And while participating in research has always carried risks, this new approach means that the amount of data collected is so large and the types of data are so interconnected that the risks have grown large and connected, too.

Data donation is definitely in demand. For instance, you can donate social media and health data for suicide prevention research at OurDataHelps; download an app called Bitmark that connects data donors with researchers; or join a social movement led by a private company called Dateva that’s petitioning governments to make it mandatory for health care organizations to share data. Scientists have published calls for a post-death data-donation system, similar in concept to organ-donation and body-donation programs.

All of Us launched its search for participants in early May in hopes of creating a massive database of Americans’ personal and medical information. The idea is that NIH would manage the database and scientists from many different institutions could apply to access and use the data in a wide variety of studies. It’s a big request: All of Us is specifically focused on minority communities that have very good historical reasons to not trust government-run medical research. Dishman says the program’s goal is for the database to be made up of 70 to 75 percent people who are underrepresented in biomedical research, generally people of color and women.

But the challenge for Dishman isn’t just on the patient side. All of Us also represents a big request for scientists who want to use the data. As with many large research projects, researchers will have to apply to get access. But they’ll also have to undergo data ethics training and work with the data only on NIH servers — no downloading allowed. And that is not common practice, Dishman said. In that way, the All of Us program is representative of two interlocking trends: Scientists are looking for data donors to share highly personal information … but getting that data means seriously rethinking the way science studies human subjects.

While scientific research has long relied on data from human subjects — including personal medical data — modern data donation is different. Digital collection, storage and analysis make it possible to take in far more information than in the past and to make connections between sets of data that never could have been made before. Scientists can, in effect, build a more complete profile of individual participants, said Nicholas Proferes, professor of information science at the University of Kentucky.

That provides new opportunities — and it creates new risks. People with malicious intent have long been able to do damage with data stolen from any medical study, Proferes said. But what a bad guy could find out, and the damage they could do, is different now that a study can link, say, a person’s Fitbit with their known genetic variants and their vaccination records.

There’s also an increased risk beyond the individual level. “What we’re seeing now is that there are network effects,” Proferes said, referring to the Cambridge Analytica scandal earlier this year. “And that’s a hard thing for an individual to conceptualize.” For instance, precision medicine databases are governed by a patchwork of laws that could mean an individual’s choice to share data ends up helping to create a search engine used for law-enforcement surveillance. If something like that happened, it would likely pose a larger risk to some communities — like people of color — than others, said Tonia Sutherland, professor of information science at the University of Alabama. “Surveillance isn’t applied equally across the board,” she said. “There are people who are more vulnerable to exposure than others.”

And those risks are compounded by the fact that scientists aren’t well prepared to deal with them. Proferes and Sutherland told me that the institutional review boards that regulate research using human subjects don’t typically deal with social media or online data collection. Experts also said that most scientists don’t have any training in data ethics.

The issue of donor consent is also complicated. Multiple experts told me that the vast majority of scientists doing research on public social media accounts don’t even seek consent, nor do they have to — a reality that Twitter’s terms of service was updated to reflect in 2014. If you’ve been using Twitter for a few years and have a public account, you can basically assume you’ve been a part of somebody’s research — whether you know it or not.

And the more data that scientists want research subjects to donate, the more confusing consent can become. What you choose to share today could, five or 10 years from now, be used in ways you can’t foresee as researchers ask new questions and connect new data sets to one another. And subjects might not realize that one consent form can open up access in perpetuity. For instance, offering scientists access to your electronic health records may not just give them a one-time snapshot of your life. That’s something All of Us has had to figure out how to manage in a transparent way. “You’re turning it on as a spigot, and we don’t want people to forget that,” Dishman said. “So we have to have regular reminders. ‘You’ve turned on the spigot, and are you still okay with that?’”

None of this means that people should reject the possibility of becoming data donors out of hand, said Sutherland and Proferes. The problem is more that data collection and analysis on this scale is new and so far there aren’t any standardized practices for how to do it. Some institutions and researchers handle big data from human subjects really well. Others don’t.

So if you’re interested in being a donor, there are some key questions you should be asking scientists. Who has access to the data and what qualifications do they need to get it? How will data be anonymized and what plans does the institution have for handling a data breach? How will data be used and what will happen if someone wants to use it in a new way at some point in the future? How will subjects be informed about what is happening in the research and what recourse do they have if they don’t like how their data is being used? And what do subjects get out of their donation?

That last one is particularly important, Proferes said, because it’s a crucial part of both how scientists build relationships with the public and how the public decides whether those relationships are a good idea. People really do want to know the results of research they participate in, he said. But sharing results isn’t a normal part of how scientists deal with human subjects. Dishman, for instance, has himself been a research participant in more than a dozen studies and no one ever came back and told him what the results were. And that’s a problem. Nobody can decide whether the risks of data donation are worth it if they never find out what “it” is.

Maggie Koerth-Baker is a senior science writer for FiveThirtyEight.
