How Crowdsourcing Can Identify and Debunk Disinformation
By Andy Carvin
Crowdsourcing in journalism – in which members of the public work together to conduct reporting outside the traditional confines of the newsroom – is as old as journalism itself.
The very first independent broadsheet published in North America – Publick Occurrences of Boston in 1690 – intentionally left its back page blank so readers could jot down the latest news and pass it on to other readers. In the pre-Internet era, there are famous examples such as a schoolteacher cracking a code written by the infamous Zodiac killer, and the brother of Unabomber Ted Kaczynski recognizing his writing after the bomber’s manifesto was published by the New York Times and Washington Post in 1995. Meanwhile, countless more mundane examples of crowdsourcing can be seen every day in journalism, from traffic updates to weather reports. The practice, in and of itself, is nothing new.
Having said that, the advent of the Internet, and of social media platforms in particular, has raised the stakes in ways few of us could have anticipated. I first witnessed that potential on September 11, 2001, after launching an email discussion group in the hours following the attacks on New York and Washington. I wasn’t yet a working journalist at the time; I created the group out of a sense of helplessness, not knowing which reports to believe or which rumors to dismiss. The email group, SEPT11INFO, quickly gained a following of more than 1,000 participants, who posted more than a thousand messages in the first 24 hours. We shared what we were hearing in news reports and what we were seeing with our own eyes, from the rooftops of Manhattan to high-rise apartments in DC. It would be several more years before the term “crowdsourcing” was coined by Jeff Howe and Mark Robinson of Wired Magazine, but the possibilities were clear: groups of people working collaboratively to uncover information had the potential to make a significant contribution to journalism.
Experimenting with crowdsourcing would become a full-time job for me in 2006, when I joined NPR and went on to found the organization’s social media desk. We worked with the public to fact-check presidential debates, track reports of voting problems and identify weaponry being used in the Libyan revolution, among many other projects. Newsrooms like ours didn’t have a monopoly on organizing crowdsourcing efforts, however; teams of Internet sleuths would routinely self-organize to tackle projects such as analyzing suspicious photos or attempting to identify people seen in a particular video. In practice, anyone with the motivation to answer a question or solve a problem could coordinate with like-minded individuals online to see what they could uncover.
The results, unsurprisingly, were often all over the map. Some crowdsourcing efforts succeeded while others failed, and occasionally they ended up making a situation worse. When aviator and adventurer Steve Fossett went missing in 2007, for example, Sir Richard Branson partnered with Google and Amazon’s Mechanical Turk service to recruit volunteers to analyze satellite photos in search of potential airplane wreckage. Before it was over, more than 20,000 volunteers would sift through over 50,000 images. Not only did they fail to find Fossett’s plane, they actually hampered search-and-rescue operations. Many volunteers incorrectly concluded they’d uncovered evidence of the wreckage and reported it to the Civil Air Patrol, tying up valuable resources. As Civil Air Patrol Major Cynthia Ryan told Wired at the time:
The crowdsourcing thing added a level of complexity that we didn’t need, because 99.9999 percent of the people who were doing it didn’t have the faintest idea what they’re looking for…. In the early days, it sounded like a good idea. In hindsight, I wish it hadn’t been there, because it didn’t produce a darn thing that was productive except for being a giant black hole for energy, time and resources. There may come a day when this technology is capable of doing what it says it can deliver, but boy, that’s not now.
When crowdsourcing meets disinformation
When it comes to crowdsourcing in journalism, the most common examples tend to revolve around confirming or debunking information. This becomes especially necessary when people online circulate content they claim is relevant to a major news story but that turns out to be incorrect. Most of this incorrect information can be described as either misinformation or disinformation.
Broadly speaking, misinformation is circulated by people who have good intentions but are simply incorrect. It could be as simple as someone citing a statistic or factoid they’ve misheard or misinterpreted. Disinformation, in contrast, is intentional: a bad actor circulates something they know to be false in the hope of sowing confusion or deceiving the general public. In general, you’re more likely to come across people who are misinformed than people intentionally engaging in disinformation. Having said that, there are entire communities across the Internet, 4chan among the most infamous, that dedicate their energies to sharing information they know to be untrue, just to troll the rest of us.
Whether something is misinformation or disinformation, the process of debunking it is pretty much the same. Free tools such as Google Image Search, Google Earth and Google Scholar are often good starting points, whether you’re trying to answer the question yourself or soliciting assistance from the public. In other cases, crowdsourcing can help you track down technical or cultural expertise: knowledge that is far more useful coming from someone who knows what they’re talking about than from a bunch of volunteers simply poking around search engines.
During the Libyan Civil War, Al Jazeera Arabic posted a story claiming Israel had supplied chemical weapons to Muammar Gaddafi. Their evidence was a photo showing Libyan revolutionaries with government weapons they had captured; one of them was carrying some type of shell with a six-pointed star on it. Since the likelihood of the Israelis putting their national symbol on a covert weapons shipment seemed pretty much nil, I asked my Twitter followers if they could figure it out. Through a combination of subject-matter expertise and time spent searching arms manufacturers’ websites, they quickly established that the weapon in question was an illumination round, used by military forces to light up the sky at night. The “six-pointed star” turned out to be a standard symbol used to identify illumination rounds going back to at least World War I. The Israeli weapons claim was proven untrue, an example of lazy and uncritical journalism that jumped to conclusions rather than investigating a more plausible answer.
With disinformation, though, it’s not enough to debunk something. If it appears something has been circulated intentionally, crowdsourcing becomes a bit of a whodunnit: who’s to blame, how they managed to do it and why. In the early days of the Syrian revolution, a prominent blogger named Amina Arraf was reportedly kidnapped, causing an international outcry until a handful of us began to question whether she existed in the first place. It wasn’t enough for us to poke holes in her blog posts to show they were false; we needed to identify the perpetrator and their motivation. Was it a disinformation op being run by an intelligence agency? Was it a politically minded troll trying to undermine the authority of revolutionary activists? It turned out the answer was neither. Ultimately, the perpetrator was identified as an American man living in Scotland who claimed he had created Amina Arraf because he wanted to engage in more authentic online discussions about the Middle East, and that his use of the persona spiraled out of control once the revolution started. Knowing the reasoning behind the disinformation campaign was key to interpreting its impact.
When it comes to conducting my own crowdsourcing projects, I generally try to follow some basic guidelines:
Define the task as clearly as possible. People are busy, and they’re more likely to help you if you’re precise in your request. Sometimes it might be as simple as asking “I’m looking for a fluent Egyptian Arabic speaker who can help translate this paragraph,” or “Are any of you from Caracas or have spent a lot of time there? I’m trying to identify a landmark in the background of a pic so I can confirm the exact location.” In contrast, consider something like “Can any of you look through several hundred pages of receipts to see if there’s anything interesting?” In this example, you’re not telling anyone exactly what you’re looking for, nor are you providing a convenient way to divvy up the work. Think of it this way: if you saw a journalist post a request like that on Twitter or Facebook, would you be likely to set aside what you’re doing and help? Probably not.
Give examples of what types of responses are helpful or unhelpful. Make sure people understand what you’re expecting of them. Do you need a verbatim translation, or will a general summary suffice? Do you need people to provide screenshots of their work, or to explain how they went about it? Whatever the situation, your volunteers will appreciate the guidance, and they’re less likely to end up frustrated after putting in the effort only to learn their work isn’t what you needed.
Ask people to cite primary sources or other direct evidence. If you’re looking to fact-check something or find an obscure bit of information, be sure to ask your online community to show how they know the answer. Perhaps it’s a link to a research paper, or a video that documents whatever you’re looking for. Otherwise, someone might send you an answer without anything to back it up, which can be problematic when you’re trying to confirm something. (You may also want to remind people that Wikipedia in and of itself isn’t a primary source, though its entries may link to primary sources.)
Consider giving the project a unique hashtag so people can follow the conversation more easily. On some social platforms like Twitter, a hashtag can be a handy way for participants to keep track of the conversation. If you’re asking people to help identify the location of a landmark in a video, use a specific hashtag that isn’t being used by anyone else. That way it won’t be cluttered with conversations that aren’t germane to the task at hand.
Be transparent about your deadlines. If you need to figure out something now, say so. Otherwise people might make a mental note of your request and take a crack at it once it’s already too late. And there’s nothing more frustrating to a volunteer than taking the time to help you out and then discovering that you needed it two days ago, but never explained you were on a tight deadline.
Always acknowledge the people who helped you. Your online community isn’t getting paid to help you, and they’re probably not doing it for fame or glory. They want to help you succeed, because they think your work is important. If volunteers take the time to help you with your reporting, acknowledge them in a meaningful way. If you’re writing a feature story, perhaps add a note at the end thanking the people who helped you complete the assignment. Or tag their names on Facebook or Twitter when you share the story upon completion. If for any reason this isn’t possible, you should at least send them a private message thanking them. They may not be expecting the acknowledgment or thanks, but they certainly will appreciate it.
Of course, not all crowdsourcing efforts are initiated from within the newsroom.
Internet sleuths routinely latch onto a news story they find suspicious or interesting in order to solve whatever mystery may be behind it. A recent example came to public attention in late 2018 when someone began circulating a photo they claimed was an Instagram post from the feed of Rep. Alexandria Ocasio-Cortez, showing a woman in a bathtub holding a vape pen. The photo had been taken by the bather herself; you couldn’t see her face, but her feet were jutting out of the water. Oddly enough, the photo was ultimately debunked by a foot fetishist who compared the bather’s toes to known photos of Rep. Ocasio-Cortez’s feet catalogued on a fetish site called WikiFeet. While this particular case is unusual on a number of levels, it follows a common pattern of publicly instigated crowdsourcing: something is shared broadly, and individuals with subject-matter expertise ultimately solve the mystery.
On other occasions, crowdsourcing efforts initiated by members of the public have had a significant negative impact on a major breaking news story as it unfolded. One of the most infamous cases took place in the wake of the 2013 Boston Marathon bombing, when volunteers on 4chan and Reddit identified the wrong people as potential suspects. In one notorious thread, participants pointed fingers at a pair of men at the scene wearing backpacks, using racial profiling to place them under scrutiny. Elsewhere on Reddit, users speculated that a recently missing college student was one of the suspects; in reality he had died by suicide well before the bombing took place. Reddit took a bruising from the press in the wake of these mistakes, and the episode raises an important question: who’s ultimately responsible when crowdsourcing on social platforms goes wrong?
While the vast majority of the blame should fall on the individuals who participated in the conversations, I would argue that the media shoulders some of the blame as well. It’s easy for journalists to tut-tut at crowdsourcing failures such as these, but it’s not as if Reddit is some secret society that screens out journalists. Quite the opposite: news organizations have spent years routinely monitoring Reddit for scoops and potential sources. So while they’re ready to exploit any reporting opportunities they discover through Reddit, they seem less inclined to jump into a thread while it’s unfolding to nip unethical reporting practices in the bud.
It’s not difficult to imagine a more constructive scenario: while monitoring Reddit during a breaking news event, a reporter notices participants going down a counterproductive rabbit hole, speculating on perpetrators with questionable evidence. Rather than shaking their head dismissively and moving on to another thread, the reporter immediately jumps in and offers some guidance, whether it’s sharing an existing resource like On The Media’s Breaking News Consumer’s Handbook or pointing to their own news org’s reporting on the subject. Even better, they offer to help in a more direct way, showing other users how to investigate a question more journalistically and avoid potential pitfalls.
This isn’t a particularly radical scenario, but there’s a reason we don’t see it play out very often: not enough news organizations are willing to expend energy debunking rumors or questioning reporting methods on other platforms, because they’re focused solely on what’s being reported through their own. Whenever I’ve asked editors who are critical of crowdsourcing on Reddit and similar platforms why their newsrooms don’t get involved, the response has been almost unanimous: it’s not our problem.
Not only are such responses dismissive, I’d argue they’re completely missing the point.
Crowdsourced reporting efforts that take place outside newsrooms are no longer quaint, homegrown initiatives on the far corners of the Internet. They’re everyday public occurrences playing out on platforms with massive followings, with the potential to metastasize across the Internet far ahead of what’s being reported by mainstream media. Sometimes they’ll get it right, as in the case of the foot fetishist debunking the photo allegedly of Rep. Ocasio-Cortez. Other times, as we saw in the wake of the Boston Marathon bombing, they’ll get it horrifyingly wrong.
News organizations no longer have the luxury of going about their business while large swaths of the Internet openly engage in speculation. In the same way they devote reporting resources to trawling social platforms for leads, they need to devote resources to helping the communities on these platforms become more responsible in how they conduct their own reporting. Whether it’s participating in discussion threads to help guide crowdsourcing efforts or establishing projects of their own to teach the public better ways of doing it, news orgs have an ongoing role to play in bolstering constructive crowdsourcing on social platforms. They need to help ensure that the public’s enthusiasm for crowdsourced investigations reinforces our collective responsibility for reliable journalism, rather than giving people more reasons to be skeptical of it.