When I tweeted from a Knicks game at Madison Square Garden on Dec. 2, I had no idea that data scientists could use that information to find out I’d used my MasterCard to buy an overpriced $12 beer — as well as identify all my other credit card purchases.
But with as few as four publicly available geo-tagged data points, scientists can accurately connect 90 percent of people to their credit card transactions, according to research published in the journal Science on Friday. The transaction data is supposed to be anonymous, but it isn’t really, and women and high-income people have less anonymity than others.
The study used metadata from three months of credit card transactions made by 1.1 million people who shopped at 10,000 stores in an unnamed (for now?) wealthy country. This metadata had no names, account numbers or any other information that would make it easy to identify someone. For each transaction, the only data available was the day it took place, the shop’s rough location and, in a separate model, the amount spent.
The researchers could then take geo-tagged information, such as Instagram photos, tweets and Facebook posts, and use it to mine the “anonymous” credit card metadata. So, in my case, they could combine my tweet from M.S.G. with three other data points (say, Facebook posts from Whole Foods, the public library and the gym) to match my name to my user ID in the transactions.
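To make the matching step concrete, here is a minimal Python sketch. It assumes each anonymized user ID maps to a set of (day, shop) pairs; the IDs, days and shop names are invented for illustration, and a plain subset test stands in for whatever matching the researchers actually ran. A person counts as re-identified once exactly one trace contains every point taken from their public posts.

```python
# Hypothetical "anonymous" metadata: each user ID maps to the set of
# (day, shop) pairs in that person's transactions. All values are invented.
traces = {
    "user_7abf": {(2, "msg"), (3, "whole_foods"), (5, "library"), (6, "gym")},
    "user_19c2": {(2, "msg"), (3, "whole_foods"), (4, "pharmacy")},
    "user_a003": {(2, "msg"), (5, "library"), (6, "gym"), (7, "cafe")},
}


def matching_users(traces, known_points):
    """Return the IDs whose trace contains every known (day, shop) point."""
    return [uid for uid, trace in traces.items() if known_points <= trace]


def reidentify(traces, known_points):
    """Return the single matching ID, or None if zero or several traces fit."""
    matches = matching_users(traces, known_points)
    return matches[0] if len(matches) == 1 else None


# Points an observer might glean from public, geo-tagged posts.
posts = [(2, "msg"), (3, "whole_foods"), (5, "library"), (6, "gym")]

# Add the points one at a time and watch the candidate list shrink.
for k in range(1, len(posts) + 1):
    print(k, matching_users(traces, set(posts[:k])))

print("re-identified:", reidentify(traces, set(posts)))
```

With only the M.S.G. tweet, all three made-up traces fit; by the third point, only one does. Each extra point can only shrink the list of candidates, which is why a handful of points is often enough.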
The authors’ model found that women were 21 percent more likely than men to be “re-identified” from the transaction data. High-income people were about 75 percent more likely to be identified than low-income people, and medium-income people were 17 percent more likely. The authors scored each individual’s behavior by how unique it was relative to everyone else’s (see the chart below, Figure 4 from the paper).
What is unique about their behavior? Where they shop. The stores that women and high-income people frequent are more distinctive, which makes those shoppers easier to single out.