In February of 2014, New York City Mayor Bill de Blasio announced the creation of IDNYC, a municipal identification card primarily designed to ease bureaucratic burdens for the city’s immigrant population. When the card became available a year later, de Blasio described the program as “fraud-proof, secure and appealing to anyone.”
Now privacy advocates and progressives are worried that it also may be appealing to Donald Trump. The president-elect has said he plans to deport up to three million undocumented immigrants, and immigrant advocates are concerned the database of immigrants may be a good place to start. That, combined with de Blasio’s vow that New York will remain a sanctuary city, has brought renewed attention to the security of the database. In December, a court barred the city from deleting the data, a deletion meant to protect users’ identities, and an ongoing lawsuit ensures that the records continue to be retained today. But there’s an urgent question about the records, fundamental to understanding the fate not just of IDNYC’s data but of all consumer data in the hands of third parties, be they private companies or state departments: Can an entire dataset of important information really be deleted, just like that?
There are three key principles to how data gets deleted in today’s technological age. Let’s start with the first: All data, from dutiful Facebook likes to iCloud selfies to every secret NSA database, is stored on a physical device somewhere. The difference between your computer’s hard drive and “the cloud” is that you only have physical access to the former. For our purposes, this is an important detail: You can only confidently erase data that you have total control over.
To see what this looks like, I went to the northern reaches of Brooklyn. “We specialize in data destruction,” Bill Monteleone told me when I visited his e-waste recycling and data destruction company, GreenChip. What GreenChip calls “data destruction” the feds call data sanitization. (The Department of Defense and the National Institute of Standards and Technology are the bodies that set data deletion standards. According to the National Association for Information Destruction, there are at least 1,200 companies following these standards.)
In practice, data sanitization is as clinical as the name implies. At GreenChip, hard drives are the starting point of the process. These unassuming rectangles, present in every computer, house the gigabytes or terabytes of data to be erased. To clear the drives, GreenChip uses a five-foot-tall computer that resembles a nondescript black server rack. That computer, armed with the company’s proprietary wiping software, uses a special algorithm to programmatically access each bit on the drive, stored as either a ‘0’ or ‘1,’ and write a new value over it. The process must be repeated at least three times and with different numbers to be up to DoD standards. It’s the actual data on the drive that is affected, rather than the shortcuts or pointers to the data that are typically modified by dragging a file to the trash bin or reformatting a drive.
This overwriting process is a bit like painting a wall: If you start with a white wall and paint it red, there’s no way to erase the red. If you want the red gone or the wall returned to how it was, you either destroy the wall or you paint it over, several times, so that it’s white again.
The analogy breaks down, of course. A repainted wall can be scratched so that it reveals the underlying color. In contrast, when drives are overwritten enough times, the way GreenChip does it, the data that once was there really is gone for good.
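For readers who want to see the idea in code, here is a minimal sketch in Python of a multi-pass overwrite. It is not GreenChip’s proprietary software and it is not a certified sanitization tool: the function name, the three patterns and the chunk size are all assumptions made for illustration, and the sketch works on a single file rather than a whole drive. On solid-state drives and some file systems, writing over a file in place does not guarantee that the original bits on the physical device are replaced.

```python
import os

# Illustrative patterns only (an assumption, not a quoted standard):
# all zeros, then all ones, then random bytes (None means "random").
PASS_PATTERNS = (b"\x00", b"\xff", None)


def overwrite_file(path, patterns=PASS_PATTERNS, chunk_size=64 * 1024):
    """Overwrite every byte of the file at `path`, once per pattern.

    Returns the number of bytes covered by each pass.
    """
    size = os.path.getsize(path)
    # "r+b" opens the existing file for in-place writing without truncating it.
    with open(path, "r+b") as f:
        for pattern in patterns:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                chunk = os.urandom(n) if pattern is None else pattern * n
                f.write(chunk)
                remaining -= n
            # Ask the operating system to push this pass to the device
            # before the next pass begins.
            f.flush()
            os.fsync(f.fileno())
    return size
```

Calling `overwrite_file("old_records.bin")` on a hypothetical file leaves it the same length but filled with whatever the last pass wrote. Contrast that with dragging the file to the trash, which only removes the pointer to it.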
Monteleone prefers wiping drives to destroying them, as it’s less wasteful and more profitable for clients (wiped drives can be reused or sold to other companies for a fraction of the price of new ones). Wiping is what he calls a “21st century way of getting rid of the data.” Drives slide into the machine and emerge completely different, the mechanics of the process abstracted from our sight. Destruction, on the other hand, involves a mobile shredding unit with a large, all-black machine inside. After hard drives have been pierced and punctured, they are fed into the mouth of this unit. Capable of shredding more than 400 hard drives an hour, it demagnetizes the drives and reduces them to literal pieces in minutes. Leftover shredded fragments are spat into a bin that sits at its base, reduced to nothing more than jagged shards of metal.
Regardless of whether the personal data of a city’s undocumented immigrants should or shouldn’t be destroyed, it’s comforting to think of private, important data going through the shredding process and ending up as useless as the pieces in the bin. Unfortunately, it’s only half the story.
The second thing that is useful to understand about erasing data: Data cannot be erased. At least, not in the typical way that we think of erasing things, where we know all versions of a file are gone forever. There’s no way to certify that every copy of a dataset is permanently gone.
“There is no delete in the world of data,” Matt Mitchell announced over an encrypted phone call. Mitchell is the founder of the CryptoHarlem meet-up and a security expert. “Data destruction places know how to physically destroy a drive in a manner so that other people can’t recover that data. But that’s assuming you have all the drives.”
Mitchell explained that any company or institution that makes money from or has a responsibility to collect data knows that if data is overwritten, it’s gone for good. For this reason, data is backed up endlessly, and new copies of it are constantly being created.
Mitchell used Google as an example. “You want content easily sent around. You want the information quickly delivered to you. So even though there’s only one file in my Google Drive, it’s replicated to hundreds of thousands of files. And deleting that one file isn’t actually that easy because I’m deleting it from hundreds of thousands of places.”
Google did not respond to FiveThirtyEight’s request for comment about its data storage practices. But tech company policies do mention this constant duplication directly. A representative from Facebook pointed me to the company’s terms, which state: “when you delete IP content, it is deleted in a manner similar to emptying the recycle bin on a computer. However, you understand that removed content may persist in backup copies for a reasonable period of time (but will not be available to others).”
Data is copied through backups, but theoretically can also be copied by anyone. Security experts such as Mitchell are trained to consider the fact that data copying or infiltrating can be a silent, unknowable process. “How do you know that data wasn’t copied off a drive? What if an outside person was somehow able to penetrate a system and pull that data?”
It’s impossible to know, and therein lies the problem. Data can certainly be disposed of, but even a company like GreenChip can only destroy or wipe the hard drives they are given. Companies can only overwrite the data they have.
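Mitchell’s point can be sketched as a toy model in Python. Everything here is hypothetical: the `Store` class, the store names and the record are invented for illustration and do not describe how Google, Facebook or IDNYC actually hold data. The sketch only shows that deleting a record from the places you know about says nothing about the places you don’t.

```python
class Store:
    """A hypothetical storage location holding key -> value records."""

    def __init__(self, name):
        self.name = name
        self.records = {}


def replicate(key, value, stores):
    """Write the same record to every store, as backups and mirrors do."""
    for store in stores:
        store.records[key] = value


def delete_from(key, stores):
    """Delete the record, but only from the stores we were handed."""
    for store in stores:
        store.records.pop(key, None)


def surviving_copies(key, stores):
    """List the names of the stores that still hold the record."""
    return [store.name for store in stores if key in store.records]


primary = Store("primary server")
backup = Store("nightly backup")
unknown = Store("copy nobody knows about")

replicate("cardholder-123", "scanned documents", [primary, backup, unknown])

# The owner wipes every location it controls...
delete_from("cardholder-123", [primary, backup])

# ...yet one copy survives.
print(surviving_copies("cardholder-123", [primary, backup, unknown]))
# prints ['copy nobody knows about']
```

In the real world, of course, nobody gets to run the final check, because the full list of stores is exactly what is unknown.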
And thus the third key point of data erasure: Data is always easier to create than to destroy.
For IDNYC, it’s not yet clear whether the situation will be as secure as de Blasio promised. A court has temporarily barred the city from destroying any of the data associated with the IDNYC program. In an email, Rosemary Boeglin, deputy press secretary for the mayor, said, “Mayor de Blasio is absolutely committed to protecting the security of our data. As we continue to review all of our options, we are confident that we can keep IDNYC data private.” But it isn’t yet clear to anyone how that will happen.
This third rule of data erasure is a lesson that might be hard to learn, and it runs counter to the prevailing wisdom of saving, copying and backing up all data. However, as we live in a world where data collection only intensifies, it may prove to be the most important one.
It’s one the city of New York has already learned. According to Boeglin, the program has already transitioned to a policy that does not involve the retention of future cardholders’ personal data. The best way to sidestep the difficulties of destroying data? Not creating it at all.