Combating Fake News with Data Mining
By Erik Kleinsmith, Associate Vice President, Public Sector Outreach, American Military University
Editor’s Note: This article first appeared on In Homeland Security.
Fake news and its more accurate name, propaganda, can be found in almost every part of our lives. From our social media feeds to unscrupulous media sources, it seems that propaganda and other forms of war waged against information have grown into a commercially viable industry able to cash in on the gullible. As a secondary effect, consumers of information are growing more and more distrustful of sources they used to depend upon. Even a budding counter industry of fact-checking has already been tainted by the biases of the checkers.
Despite such a dismal take on what has happened to the integrity of our news and information, analysts armed with both critical thinking and data mining tools can take advantage of this environment. Using a few simple techniques and the capability to harvest and visualize many sources of information in a short amount of time, analysts can identify the fake stories themselves as well as the primary sources that publish them. Quite possibly, they can also help to identify the original liar or stooge as the source.
As stated, the key to identifying fake news is the use of effective data mining and visualization tools. Unlike the human analyst, data mining tools have the benefit of not carrying any emotional bias into their work. They don’t have a vested interest in the subject matter, they aren’t triggered by political correctness, and they don’t care about the feelings of their users. They only return data and information based upon clearly written search criteria using a programmed algorithm. If there is a bias present in data mining searches, it’s because the programmer inserted their own bias into their code in an attempt to censor or hide phrases or other information they don’t like.
Data mining tools will always have certain weaknesses in determining the context of information, such as understanding the difference between Clinton the president, the senator, the town in Maryland, the funk musician, or the 18th-century British general. While quality of information is not a strong suit of data mining, quantity is. Harvesting and visualizing dozens if not hundreds of stories on a particular topic can help to identify fake news through trends and patterns that a manual reader wouldn’t normally see.
One technique is to identify how specific talking points and themes propagate from news source to news source. Many of these talking points originate from a single source and are then picked up by others who agree with them and want to echo them. Some of these sources are outright paid by interest groups to push a talking point forward, while others, such as state-run media, are required to do so. As a result, the same phrases and keywords associated with that talking point are published over and over in many different sources, both online and in broadcast news. A clever analyst, knowing what to look for, will be able to identify these specific terms. Using harvesting and visualization tools, they can do this faster and more efficiently.
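To make the idea concrete, here is a minimal sketch of how repeated phrasing could be flagged across sources. It is not the tooling described in this article; the function name `shared_phrases`, the outlet names, and the sample sentences are all invented for illustration, and a real harvest would involve far more text and far better language handling.

```python
import re
from collections import defaultdict


def shared_phrases(articles, n=3, min_sources=2):
    """Map each n-word phrase to the set of sources that used it,
    keeping only phrases that appear in at least `min_sources` sources.

    `articles` is an iterable of (source_name, text) pairs.
    """
    seen = defaultdict(set)
    for source, text in articles:
        # Crude tokenizer: lowercase words only (an assumption, not a standard).
        words = re.findall(r"[a-z']+", text.lower())
        for i in range(len(words) - n + 1):
            seen[" ".join(words[i:i + n])].add(source)
    return {phrase: sources for phrase, sources in seen.items()
            if len(sources) >= min_sources}


# Invented sample data: two outlets repeat the same talking point.
articles = [
    ("outlet_a", "Foreign troops threaten national sovereignty, critics say."),
    ("outlet_b", "Protesters claim foreign troops threaten national sovereignty."),
    ("outlet_c", "Local elections are scheduled for next month."),
]

result = shared_phrases(articles)
for phrase, sources in sorted(result.items()):
    print(phrase, "->", sorted(sources))
```

Exact word-for-word matching like this only catches verbatim repetition. It is a starting point for the analyst, who still has to judge whether the shared wording reflects a coordinated talking point or simply a common turn of phrase.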
As an example of what this technique could look like, the following chart resembles a product created by my top analysts when we were assigned to what was perhaps the first instance of using data mining for the U.S. military: the Army’s newly built Information Dominance Center, part of the U.S. Army Intelligence and Security Command (INSCOM).
One of the requests we received from a Combatant Command was to monitor, identify and assess the effectiveness of “Anti-US propaganda” themes prior to a national election of an allied country. The majority of these themes were coming from their more adversarial and propaganda-savvy neighbor.
The chart below is a notional recreation of what we were able to come up with using our tools.
In the example shown here, analysts were able to identify five primary “Anti-US” themes, three of which were prominent before the election, while the remaining two ramped up afterwards. For each theme, the chart showed when it started, when it peaked, and when it waned, with the additional ability to tie these changes to certain events. In presenting charts like these, along with more detailed information about each theme, our analysts were able to provide several things to the supported command.
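The start, peak, and wane of a theme can be approximated with very little code once each harvested story has been tagged with a theme and a publication date. The sketch below is only an illustration under that assumption; the function `theme_timeline`, the theme label, and the dates are hypothetical and are not taken from the original analysis.

```python
from collections import Counter
from datetime import date


def theme_timeline(mentions):
    """Summarize each theme's first mention, busiest ISO week, and last mention.

    `mentions` is an iterable of (theme, publication_date) pairs.
    """
    by_theme = {}
    for theme, day in mentions:
        by_theme.setdefault(theme, []).append(day)

    summary = {}
    for theme, days in by_theme.items():
        # Count mentions per (ISO year, ISO week).
        weekly = Counter(tuple(d.isocalendar()[:2]) for d in days)
        # Busiest week; ties go to the earliest week.
        peak_week = min(weekly, key=lambda week: (-weekly[week], week))
        summary[theme] = {
            "start": min(days),
            "peak_week": peak_week,
            "end": max(days),
        }
    return summary


# Invented sample data for a single theme.
mentions = [
    ("sovereignty", date(2018, 3, 5)),
    ("sovereignty", date(2018, 3, 12)),
    ("sovereignty", date(2018, 3, 13)),
    ("sovereignty", date(2018, 3, 14)),
    ("sovereignty", date(2018, 3, 26)),
]

timeline = theme_timeline(mentions)
print(timeline["sovereignty"])
```

Plotting these weekly counts for several themes on one time axis, with the election date marked, would give a chart of the kind described above.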
First, we were able to give them an idea of the types of themes they would need to combat using their own methods in order to keep attitudes toward the U.S. presence in that country positive. We were also able to backtrack each identified theme in order to establish the primary source. This gave our customers leads on which organizations they would need to study more intently and, if required, whose other messages they would need to counter as propaganda. Finally, as a template, we were able to give them the playbook of “Anti-US” propaganda planners for the next election, identifying which themes would be coming out at which times prior to and after the election. This final area was the most important because it gave our supported command the ability to preempt the propaganda before it was even created.
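Backtracking a theme can be sketched as a search for the earliest harvested story that contains the theme's keywords. The example below assumes each story carries a source name, a publication date, and its text; the function `earliest_source` and all sample data are hypothetical. The result is only a lead, since the true originator may not be in the harvested set at all.

```python
from datetime import date


def earliest_source(articles, keywords):
    """Return (source, date) for the earliest article containing every keyword,
    or None if no article matches.

    `articles` is an iterable of (source_name, publication_date, text) tuples.
    """
    wanted = [k.lower() for k in keywords]
    matches = [
        (published, source)
        for source, published, text in articles
        if all(k in text.lower() for k in wanted)
    ]
    if not matches:
        return None
    published, source = min(matches)  # earliest date wins
    return source, published


# Invented sample data: outlet_b published the theme first.
articles = [
    ("outlet_a", date(2018, 3, 4), "Foreign troops threaten our sovereignty."),
    ("outlet_b", date(2018, 3, 1), "Are foreign troops here to stay?"),
    ("outlet_c", date(2018, 3, 2), "Election officials publish the ballot."),
]

lead = earliest_source(articles, ["foreign", "troops"])
print(lead)
```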
The possibilities for using this and other analytic techniques with data mining tools to combat fake news, and to shine a light on those who create it, are enormous in many different arenas. From national security to political attacks, analyzing patterns and trends within both mainstream and fledgling media sources will help analysts not only evaluate sources for reliability and bias, but also identify which sources are trying to persuade instead of inform.
About the Author: Erik Kleinsmith is the Associate Vice President for Business Development in Intelligence, National & Homeland Security, and Cyber for American Military University. He is a former Army Intelligence Officer and the former portfolio manager for Intelligence & Security Training at Lockheed Martin. Erik is one of the subjects of a book entitled The Watchers by Shane Harris, which covered his work on a program called Able Danger tracking Al-Qaeda prior to 9/11. He currently resides in Virginia with his wife and two children.