We Gave You 3 Million Russian Troll Tweets. Here’s What You’ve Found So Far.

Last week, FiveThirtyEight published nearly 3 million tweets sent by handles affiliated with the Internet Research Agency, a Russian “troll factory.” That group was a defendant in one of special counsel Robert Mueller’s indictments, which accused the IRA of interfering with American electoral and political processes.

We shared the data with the public in concert with the researchers who first assembled it: Darren Linvill and Patrick Warren, both of Clemson University. Their hope, and ours, was that other researchers, as well as our broader readership, would explore the tweet data, share their findings and improve the data set, all with a goal of understanding Russian interference in American politics.

“So far it’s only had two brains looking at it,” Linvill said of the data last week. “More brains might find God-knows-what.”

Many more brains have now looked at it, and our readers — as ever — have not disappointed. Some of them found ways to improve the data set. We’ve already incorporated a number of those suggestions, and we brought several of them to the attention of the researchers. And some readers made a lot of cool stuff.

What follows is a sampling of reader projects that came to my attention via Twitter (where else?) and email. The projects reinforce and expand upon the Clemson researchers’ initial finding: The trolls were engaged in a sophisticated and intricate Russian assault on the political debate in America and several other countries. It was an assault waged both before and after the 2016 presidential election — and an assault that appears to continue, at least in some form, to this day.

A number of these projects focused on the networks of users and groups of topics that the Russian trolls both created and operated within. Linvill and Warren’s main finding was a taxonomy of trolling that separated accounts into categories, including those they called Right Trolls, Left Trolls and Fearmongers. We now have a few ways of visualizing that taxonomy.

For example, Christopher Marcum, a staff scientist at the National Human Genome Research Institute, wrote code in R to plot the network of which troll accounts mention which other troll accounts. Each node in the network below is an IRA-connected Twitter handle, colored by its taxonomic type (Right Troll, Left Troll, etc.) and sized by the number of followers the account had at its peak.

The distinct neighborhoods of this network lay some helpful geometry on top of the Clemson researchers’ work — and they open some interesting new questions. What are the accounts and tweets that bridge these neighborhoods, for example? And did this network shift meaningfully over time?
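For readers who want to try something similar, here is a minimal Python sketch of how a mention network like this could be tallied. It is not Marcum's R code. The `(author, content)` tuple layout, the sample handles and the `mention_edges` helper are all assumptions made for illustration.

```python
import re
from collections import Counter

# Matches "@handle" in tweet text. This is a simple approximation and will
# also match things that are not true mentions, such as parts of email addresses.
MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets, troll_handles):
    """Count directed (author -> mentioned handle) edges between troll accounts.

    `tweets` is an iterable of (author, content) pairs, which is an assumed
    layout and not necessarily how the published data is organized.
    """
    trolls = {handle.lower() for handle in troll_handles}
    edges = Counter()
    for author, content in tweets:
        source = author.lower()
        for name in MENTION.findall(content):
            target = name.lower()
            # Keep only troll-to-troll mentions and skip self-mentions.
            if target in trolls and target != source:
                edges[(source, target)] += 1
    return edges

# Made-up sample rows, for illustration only.
sample = [
    ("troll_a", "RT @troll_b: something"),
    ("troll_b", "@troll_a @outsider hello"),
    ("troll_a", "again @troll_b"),
]
edges = mention_edges(sample, ["troll_a", "troll_b"])
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

The resulting edge counts could then be handed to a graph library for layout and coloring. Accounts with edges into more than one neighborhood would be natural candidates for the "bridges" asked about above.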

John West, who calls himself a narrative technologist, developed an artistic way to visualize this Russian trolling strategy. In his graphics, one of which is shown below (the code is available on GitHub), “each square in this grid represents a particular way of tweeting.” A full explanation of the logic beneath the arrangement is available at West’s website, but the closer together any two squares are, the more similar those accounts’ tweeting styles are.

Among West’s observations from his project: “Russian trolls were sophisticated enough to make some of their left- and right-impersonating trolls interested in procedural topics [like debates and election dates] while keeping others focused on uniquely left- or right-leaning topics.”

Andrew Cook, an analyst at Johns Hopkins University Applied Physics Laboratory, used the natural language processing tool Quid to analyze the types of topics the trolls tweeted about. He charted the way Right Trolls and Left Trolls congregated around certain topics. Here’s what that visualization looks like for Right Trolls (where each dot represents not a Twitter user but rather a single tweet).

And here’s what the same type of chart looks like for Left Trolls.

There are significant differences between the two charts. Large swaths of the Right Troll network are devoted to topics such as media outlets, free speech, American jobs and discrediting the FBI. The Left Troll network skewed more toward topics such as racism, police brutality and the Black Lives Matter movement.

Other readers endeavored to make the data as user-friendly as possible. Simon Willison, the creator of Datasette and co-creator of Django, a web framework, loaded the 3 million tweets into a searchable, filterable webpage and provided further details on his blog. Fabio Pardi did something similar, along with making customizable charts of various tweet frequencies, at fromrussiawithtroll.com. Others helped improve the underlying data itself: Martijn Pieters, for example, suggested a useful script for fixing an encoding issue, which we then incorporated, resolving issues with tweets that had characters such as letters of the Russian alphabet and emoji.
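To give a sense of what an encoding fix of this kind can look like, here is a hedged Python sketch. It is not the script Pieters suggested, and it assumes one common failure mode: UTF-8 bytes that were mistakenly decoded as Latin-1. The `repair_mojibake` name is hypothetical, and the actual issue in the data may have been different.

```python
def repair_mojibake(text):
    """Undo one round of 'UTF-8 bytes read as Latin-1' garbling, if present.

    Assumption: the garbling came from decoding UTF-8 bytes as Latin-1.
    Text that does not fit that pattern is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (so it was probably
        # already fine) or the bytes are not valid UTF-8. Leave it alone.
        return text

# Simulate the assumed garbling on a made-up string, then repair it.
clean = "Привет 😀"
garbled = clean.encode("utf-8").decode("latin-1")
print(repair_mojibake(garbled) == clean)
```

Falling back to the original string on any error is a deliberately conservative choice: a tweet that was never garbled should pass through untouched.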

Still other readers, and news outlets, focused on the geography of the troll tweets. Christian MilNeil, a reporter with the Portland Press Herald in Maine, highlighted the subset of troll tweets that mentioned that state’s governor and senators — there were hundreds of them. All politics, and some trolling, is local.

And the tweets didn’t stop at America’s northern border. Roberto Rocha, a data journalist with the Canadian Broadcasting Corporation, identified close to 8,000 of the 3 million tweets that were targeted at Canada. He found that spikes in these tweets correlated with significant news events in that country, as shown in his chart below.
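A rough way to look for spikes like these is to count matching tweets per day. The Python sketch below is not Rocha's method. The keyword list, the `(published, content)` layout, the `daily_counts` helper and the sample rows are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime

def daily_counts(tweets, keywords):
    """Count tweets per calendar day whose text contains any keyword.

    `tweets` is an iterable of (published, content) pairs, where `published`
    is a datetime. Matching is a plain case-insensitive substring test.
    """
    lowered = [keyword.lower() for keyword in keywords]
    counts = Counter()
    for published, content in tweets:
        text = content.lower()
        if any(keyword in text for keyword in lowered):
            counts[published.date()] += 1
    return counts

# Made-up sample rows, not drawn from the real data set.
sample = [
    (datetime(2017, 1, 30, 9, 0), "Thoughts on Canada today"),
    (datetime(2017, 1, 30, 17, 30), "News out of Ottawa"),
    (datetime(2017, 1, 31, 8, 0), "Nothing relevant here"),
    (datetime(2017, 2, 1, 12, 0), "canada again"),
]
counts = daily_counts(sample, ["canada", "ottawa"])
for day, total in sorted(counts.items()):
    print(day.isoformat(), total)
```

Days with unusually high counts can then be compared by hand against a timeline of news events. Substring matching is crude, so a real analysis would likely need a more careful definition of which tweets count as targeted at a country.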

Canada wasn’t the only country to take an interest in the troll data. The Clemson researchers told me that they’d also fielded interview requests from Italy, the U.K., Spain and Israel. “They didn’t even seem to know that they had been targeted in the attacks, too,” Warren said.

In Italy, for example, Federico Fubini of the newspaper Corriere della Sera reported that the trolls repeatedly posted in Italian in support of populist parties’ positions, according to a Google translation. Israel’s Channel 10 News aired a special report investigating patterns in the tweet data relating to U.S.-Israel relations and Prime Minister Benjamin Netanyahu. According to foreign news editor Nadav Eyal, the report found that the trolls sought to associate President Trump with Netanyahu and that they also promoted “virulent anti-Israel themes.”

Many other readers shared their works in progress, and given the sheer size of the data set, there is likely much more to come — as well there should be. Releasing the data was meant to preserve an important historical record, but analyzing it is the only way to understand what happened and bolster national security.

Nick Diakopoulos, a communications professor at Northwestern University, tweeted: “Brace yourselves for a wave of academic papers on this dataset.”

Consider FiveThirtyEight braced.

If you use the data and find anything interesting, please let us know. Send your projects to oliver.roeder@fivethirtyeight.com or @ollie.
