How I detect fake news

How I traced the falsity of one internet meme, and what that teaches us about how an algorithm might do it.

December 7, 2016

I have a brother who is a big Donald Trump fan, and he frequently sends me articles from various right-wing media sources. Last week, he sent me a variant of the image below:

Figure 1. Fake maps claiming to correlate crime rates and Democratic votes, circulated via email.

I immediately consulted Snopes, the fact-checking site for internet hoaxes, and discovered that it was, as I expected, fake. According to Snopes, these are actually both electoral maps: “On 11 November 2016, the Facebook page “Subject Politics” published two maps purportedly comparing the results of the 2016 U.S. presidential election with the 2013 crime rate in the U.S. … The map pictured on the bottom actually shows a 2012 electoral map that was created by Mark Newman from the Department of Physics and Center for the Study of Complex Systems at the University of Michigan.” Snopes was unable to verify the source of the first map, but concluded (presumably by comparing it with known electoral maps) that it is in fact an incomplete electoral map from the 2016 election.


Snopes, which uses human editors for fact-checking, does a good job, but its editors can’t catch every fake news story. Still, when a reputable fact-checking organization like Snopes or Politifact identifies a story as false, that’s a pretty strong signal.

Continuing my research, I used Google to search for other sources that might provide more insight on the relationship between the electoral map and crime rates. I quickly found this 2013 article from Business Insider, “Nine Maps That Show How Americans Commit Crime.” It shows a very different picture:

Figure 2. Data on violent crime per one hundred thousand people, from the FBI Uniform Crime Report, 2012.

Since Business Insider told me the source of the data (the FBI Uniform Crime Report), I could go verify it for myself. Sure enough, the data on the FBI site matched the Business Insider map.
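That comparison against the cited source is also easy to automate. Here is a minimal sketch, assuming the FBI’s published figures and the values shown in an article’s map have each been exported to a small CSV; both file names and the CSV layout are hypothetical placeholders, not real datasets.

```python
import csv

def load_rates(path):
    """Read a two-column CSV (state,rate) into a dict of floats."""
    with open(path, newline="") as f:
        return {row["state"]: float(row["rate"]) for row in csv.DictReader(f)}

# Hypothetical exports: the FBI's published figures and the values shown in the article's map.
fbi = load_rates("fbi_ucr_2012_violent_crime.csv")
article = load_rates("article_map_values.csv")

# Flag any state where the article's figure drifts more than 5% from the cited source.
mismatches = {
    state: (article[state], fbi[state])
    for state in article
    if state in fbi and abs(article[state] - fbi[state]) > 0.05 * fbi[state]
}
print(f"{len(mismatches)} state(s) disagree with the cited source: {mismatches}")
```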

I tell this story of two maps to make a point: when people discuss the truth or falsity of news, and the responsibility of sites like Facebook, Google, and Twitter to help identify it, they tend to assume that determining “truth” or “falsity” is something only humans can do. But as this example shows, there are many signals of likely truth or falsity that a computer can check algorithmically, often more quickly and thoroughly than humans can: whether a reputable fact-checking site has already debunked a story, whether an image is a near-duplicate of a known image from another context, and whether the numbers a story cites actually match the source it names.
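Take the second of those signals. The bottom map turned out to be a near-duplicate of Mark Newman’s published 2012 electoral map, and that kind of match is exactly what a perceptual hash is good at finding. Here is a minimal sketch, assuming a local folder of known reference images and using the third-party Pillow and imagehash packages; the file names are hypothetical.

```python
from pathlib import Path

import imagehash               # third-party: pip install ImageHash
from PIL import Image          # third-party: pip install Pillow

def find_near_duplicate(suspect_path, reference_dir, max_distance=8):
    """Return (path, distance) of the closest known image, or None if nothing is close."""
    suspect_hash = imagehash.phash(Image.open(suspect_path))
    best = None
    for ref_path in Path(reference_dir).glob("*.png"):
        # Subtracting two perceptual hashes gives the Hamming distance between them;
        # a small distance means the images are visually near-identical, even after
        # resizing or recompression.
        distance = suspect_hash - imagehash.phash(Image.open(ref_path))
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (ref_path, distance)
    return best

# Hypothetical file names: if the "crime map" hashes within a few bits of a known
# electoral map, the label on it is almost certainly false.
match = find_near_duplicate("suspect_crime_map.png", "known_electoral_maps/")
if match:
    print(f"Near-duplicate of {match[0]} (hash distance {match[1]})")
```

A check like this scales to a large library of reference images far faster than a human can compare maps by eye.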

Note that when fake news is detected, there are a number of possible ways to respond (a rough sketch of how a single “likely false” score might drive these responses follows the list):

  1. The stories can be flagged. For example, Facebook (or Gmail, since much fake news appears to be spread by email) could show an alert, similar to a security alert, that says “This story appears likely to be false. Are you sure you want to share it?” with a link to the reasons why it is suspect, or to a story that debunks it, if that is available.
  2. The stories can be given less priority, shown lower down, or shown less often. Google does this routinely in ranking search results. And while the idea that Facebook should do this has been more controversial, Facebook already ranks stories, featuring those that drive more “engagement,” or that are related to ones we’ve already shared or liked, over those that are merely more recent. Once Facebook stopped showing stories in pure timeline order, it put itself in the position of curating the feed algorithmically. It’s about time it added source verification and other “truth” signals to the algorithm.
  3. The stories can be suppressed entirely if certainty is extremely high. We all rely on this kind of extreme prejudice every day: it is how email providers separate the email we actually want to see from the billions of spam messages sent daily.
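Here is the sketch referred to above: a minimal illustration of how one estimated probability that a story is false could drive all three responses. The Story fields, thresholds, and actions are invented for illustration and are not any platform’s actual policy.

```python
from dataclasses import dataclass

@dataclass
class Story:
    url: str
    p_false: float    # estimated probability the story is false (from some upstream model)
    base_rank: float  # score from the existing engagement-based ranker

FLAG_THRESHOLD = 0.60      # moderate suspicion: warn the user before sharing
DOWNRANK_THRESHOLD = 0.80  # high suspicion: push the story down in the feed
SUPPRESS_THRESHOLD = 0.98  # near-certainty: treat it the way spam is treated

def respond(story: Story) -> dict:
    """Map a falsity score to the flag / downrank / suppress responses described above."""
    if story.p_false >= SUPPRESS_THRESHOLD:
        return {"show": False, "reason": "suppressed as near-certain fake"}
    rank = story.base_rank
    if story.p_false >= DOWNRANK_THRESHOLD:
        rank *= (1.0 - story.p_false)  # demote in proportion to suspicion
    return {"show": True, "rank": rank, "warn_before_sharing": story.p_false >= FLAG_THRESHOLD}

print(respond(Story("http://example.com/fake-maps", p_false=0.90, base_rank=1.0)))
```

The particular numbers don’t matter; the shape of the policy does: flag at moderate suspicion, demote at high suspicion, and suppress only at the near-certainty we already accept from spam filters.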

As I wrote in my first article on the topic of fake news, Media in the age of algorithms, “The essence of algorithm design is not to eliminate all error, but to make results robust in the face of error.” Much as we stop pandemics by finding infections at their source and keeping them from finding new victims, it isn’t necessary to eliminate all fake news, but only to limit its spread.