ABC News
Big Government Is Getting In The Way Of Big Data

When the government wants to know how many people are unemployed, it calls people and asks them whether they’re working. When it wants to know how quickly prices are rising, it sends researchers to stores to check price tags. And when it wants to know how much consumers are spending, it mails forms to thousands of retailers asking about their sales.

“Big data” may have revolutionized industries from advertising to transportation, but many of our most vital economic statistics are still based on methods that are decidedly, well, small.

Now economists both inside and outside government are trying to change that. They are working to open up access to government records, make better use of private-sector data and use modern statistical techniques to link together different sources — steps, they believe, that could allow for economic statistics to become more accurate, more detailed and perhaps available more quickly. Ultimately, they hope to allow economists to tackle questions that aren’t answerable using currently available data sources — how government programs affect participants years or even decades down the line, for example. President Obama’s latest budget, released last month, dedicates an entire chapter to proposals to expand access to so-called administrative data, records collected as part of government programs rather than through surveys.

But it won’t be easy. Efforts to change the way the government collects statistics face legal, bureaucratic and practical hurdles and in some cases could run afoul of privacy advocates worried about how the government tracks its citizens. Despite bipartisan support for change, actual progress has been slow.

“It’s just taking a lot longer than anyone wanted,” said Hal Varian, chief economist for Google and an outspoken advocate for the expanded use of modern data-collection approaches.

One example of the shortcomings of the current system — and the potential for improvement — is the monthly jobs report. Every month, investors, economists and journalists (myself included) race to digest the Bureau of Labor Statistics’ count of new jobs. But the report is highly imperfect. Its numbers, based on a survey of businesses, are volatile and subject to revision. It provides little information about what kinds of jobs are being created or how much they pay. The only demographic information available is based on an entirely separate survey with even larger margins of error.

The government has much more complete information on almost all those subjects. State unemployment systems collect detailed information on employment and wages. The Internal Revenue Service has extensive information about virtually every business in the country, and the Social Security Administration tracks nearly all workers. There’s even the National Directory of New Hires, a little-known database created to track child-support delinquents.

Statisticians at the Bureau of Labor Statistics don’t have access to any of those sources for the monthly jobs report, however.[1] IRS data is off-limits under federal law. Other sources are unavailable either because the agencies that control them won’t share them or because the BLS doesn’t have the resources to turn them into a usable form.

The problem goes far beyond the jobs report. Most of the government’s most closely watched economic indicators are based on surveys, from monthly reports on construction, retail sales and inflation to annual reports on household income and consumer spending. In nearly every case, more complete data exists from either public or private sources, if only government agencies could get access to it.

The access limitations are at least partly the legacy of the last time that the government tried to expand its collection and use of administrative data. In 1965, the Johnson administration proposed the creation of a national data center in part to track the performance of Johnson’s Great Society initiatives. The plan faced immediate public backlash, according to a recent history of the proposal published by the Census Bureau:

The government’s endorsement of the national data center proposal led to public outcry and intense congressional scrutiny over the data on individuals maintained by federal agencies, potential misuse of such data, and threats to privacy posed by emerging technologies … Fears of ‘Big Brother’ and secret government dossiers swirled around discussions of the national data center, and the issue became identified with other concerns about invasions of privacy ranging from psychological testing to illegal wiretapping, culminating in the passage of the Privacy Act of 1974.

The Privacy Act and later laws put strict limits on how the government can use administrative records. The 1998 Workforce Investment Act, for example, prohibits the creation of a database of people receiving federal job-training services. So even though the government already has a database of new hires, “we cannot use it to evaluate federal job-training programs,” said Aviva Aron-Dine, acting deputy director of the Office of Management and Budget, which has helped lead the White House’s open-data efforts. “That’s a particularly striking example.”

Legal restrictions aren’t the only obstacle. There are also practical challenges to cleaning the data and making it usable. Administrative records are created not for statistical purposes but to serve a particular need, such as tracking Medicare recipients or making sure companies pay their taxes. Even records as simple as births, deaths, marriages and divorces were for decades collected differently in each state. Even when all the data is consistent, making it useful for researchers often requires a lot of work. The Census Bureau has spent years painstakingly going through business records to assign industry codes to individual companies.

But government economists say they have little choice but to look for ways to expand the use of administrative data. More Americans are refusing to answer surveys, making survey-based data less reliable and more expensive to collect. At the very least, said BLS Commissioner Erica Groshen, statistical agencies ought to be able to stop relying on surveys to collect data that’s already on another government agency’s servers.

Groshen said the real promise of administrative data is the potential to open up new avenues of research. Take one of the most hotly debated topics in modern economics: inequality. The biggest income gains in recent years have been concentrated among the very top tier of earners, the richest 0.1 percent or even 0.01 percent. But it’s almost impossible to study those trends using traditional survey-based sources. Even a large survey such as the one used for official income statistics, which is based on about 75,000 interviews each year, would include at best a handful of households in the top 0.01 percent. And that assumes the rich will willingly disclose their personal financial information to a government survey-taker.
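The arithmetic behind that "handful" can be sketched in a few lines. This is a rough illustration only: it assumes each of the roughly 75,000 interviews covers one household, that households are sampled at random and that everyone responds. The function name is made up for this example.

```python
def expected_households(interviews: int, one_in: int) -> float:
    """Expected number of sampled households from a group that makes up
    1 in `one_in` of the population, assuming simple random sampling."""
    return interviews / one_in


INTERVIEWS_PER_YEAR = 75_000  # figure cited in the article

# The top 0.1 percent is 1 household in 1,000; the top 0.01 percent is 1 in 10,000.
top_0_1_percent = expected_households(INTERVIEWS_PER_YEAR, 1_000)
top_0_01_percent = expected_households(INTERVIEWS_PER_YEAR, 10_000)

print(f"Top 0.1%:  about {top_0_1_percent:g} households")
print(f"Top 0.01%: about {top_0_01_percent:g} households")
```

Under those assumptions, the survey would be expected to reach about 75 households in the top 0.1 percent and only seven or eight in the top 0.01 percent, far too few to support reliable estimates.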

Economists Raj Chetty and Emmanuel Saez were able to get access to IRS records that allowed for a far more detailed look at incomes, including those of the very rich. And they were able to go beyond studying simple inequality trends; because they were looking at data for essentially the entire population, not just an ever-changing survey sample, they were able to track individuals over time. That allowed them to make a surprising discovery: Economic mobility — how likely it is for someone born poor to reach the middle class, for example — has been more or less flat in recent decades, not falling as previously thought.

“When you see good papers being written with administrative data, very little of the quality comes from simply the fact that there’s more data,” said John Friedman, a Brown University economist who has worked with Chetty and Saez. “Usually, it comes from the fact that it unlocks either better research designs or more interesting questions … that you simply could not have feasibly attempted with the existing survey data.”

Administrative data is also useful for evaluating government policies, particularly when researchers can combine multiple datasets — linking tax filings to educational records, for example, or matching health care and employment data. In its budget, the Obama administration cites research by Chetty, Saez and Friedman that looked at the long-term impact of a government preschool program from the 1980s. Using IRS data, the researchers were able to go back and see how program participants had fared over the decades, something that would have been prohibitively expensive if not impossible using survey techniques.

That kind of policy-focused research has given open-data efforts bipartisan appeal. Democrats, who tend to support government programs, think data will help demonstrate the programs’ effectiveness; Republicans, who tend to be more skeptical, are hoping data will help identify and ultimately shut down inefficient programs. Last year, Washington Sen. Patty Murray, a Democrat, and Wisconsin Rep. Paul Ryan, a Republican, joined forces to sponsor a bill to create a commission to recommend ways to expand access to and use of government data in policymaking.

“Instead of putting the focus on effort, we want to put the focus on results,” Ryan said in a statement when the bill was introduced.

But despite such political support, change has come only gradually. Groshen, the BLS commissioner, said dozens of companies now feed their employment records directly into the agency’s servers, for example. The Centers for Medicare and Medicaid Services have made millions of health care-related records available through their website. And the Census Bureau is linking transaction, trade and patent data in its massive business database. But Ryan and Murray’s bill hasn’t made it out of committee, and Chetty and Saez are still among the few people with access to the IRS data, which they aren’t allowed to share with outside researchers.

Other countries, particularly in Scandinavia, offer far easier access to their data, said Katherine Smith, executive director of the Council of Professional Associations on Federal Statistics. As a result, even many American economists are forced to conduct their research overseas.

“Our Harvard and Yale and Princeton economists are studying Finland because they can’t study the same thing here,” Smith said.


  1. The BLS does use state unemployment insurance data for the Quarterly Census of Employment and Wages and to revise the monthly jobs reports.

Ben Casselman was a senior editor and the chief economics writer for FiveThirtyEight.