Fake News

Data story on fake news for the class CS401 - Applied Data Analysis at EPFL.

Getting started

Lately, the world has witnessed a growing epidemic of fake news. In the current era of social media and instant connectivity, the impact of fake news is greater than ever. Not only does it pose a threat to the integrity of journalism, it also creates disturbances and uncertainties in the political world. What's alarming is when people believe what they read, and act upon it. For these reasons, the spotlight needs to be put on fake news. The LIAR dataset for fake news detection, created by William Yang Wang at the University of California, Santa Barbara, will be used in this data story to study the world of fake news on the American political scene. Let's dive in, shall we?

The LIAR dataset

As defined by its author, the LIAR dataset is a "new benchmark dataset for fake news detection". It consists of almost 13,000 short statements from various contexts made between 2007 and 2016. The statements have been manually labeled for truthfulness, topic, context, speaker, state, and party, and are well distributed over these different features. To illustrate this, the table to the right shows a sample statement from Donald Trump along with its features.


Speaker: Donald Trump
Statement: The last quarter, it was just announced, our gross domestic product was below zero. Who ever heard of this? It's never below zero.
Context: Presidential announcement speech
Justification: According to the Bureau of Economic Analysis and National Bureau of Economic Research, the growth in the gross domestic product has been below zero 42 times over 68 years. That's a lot more than “never.” We rate this claim Pants on Fire!
Label: Pants on Fire

Warming up

To warm up and get more familiar with the data, we will start by visualizing how the values of the statements are distributed. For this purpose, the six pie charts below show how the statements are characterized: for instance, who made each statement, what it was about, and how truthful it was.
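For readers who want to reproduce this kind of breakdown, here is a minimal sketch of how the share of each value of a feature could be computed before plotting it as a pie chart. It assumes the statements have already been loaded as a list of dictionaries. The field names and the toy rows are hypothetical placeholders, not actual records from the LIAR dataset, and this is not necessarily how the charts in this story were produced.

```python
from collections import Counter


def value_shares(rows, field):
    """Return {value: share of rows} for one feature, largest share first."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}


# Hypothetical toy rows for illustration only (not taken from the dataset).
toy_rows = [
    {"label": "false", "party": "party_a"},
    {"label": "half-true", "party": "party_b"},
    {"label": "false", "party": "party_b"},
    {"label": "pants-fire", "party": "party_a"},
]

label_shares = value_shares(toy_rows, "label")
party_shares = value_shares(toy_rows, "party")
```

Each resulting dictionary maps a value to its fraction of all statements, which is the quantity a pie chart slice represents.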

If you wish to know more about the dataset before moving on to the analysis, the link below will lead to an interactive widget which shows the number of statements for all labels individually and for any specified combination of features.

A long time ago in a galaxy far, far away...

Before rushing in like fools and assaulting the dataset with a plethora of fancy data analysis tools, let’s take a step back and travel through the whole period during which the statements were made. As mentioned in the introduction, this period ranges from 2007 to 2016. While this is nowhere near some obscure ancient history subject that we studied at school years ago, a quick refresher is a sensible way to set everything in context. Our time machine will be the word cloud map of the USA on the right. Not the fanciest machine but, eh, it’s taking us where we want, right? Unlike politicians… Without further ado, let’s hop on the word cloud and go!

Some of the words that pop out the most are Barack Obama and health care. It’s no wonder, since Obama was president during most of the period 2007-2016 and health care was one of his greatest fights. During his two terms in the Oval Office, Barack Obama, as head of the country, was the target of much criticism, both positive and negative, and was in this sense a hot topic in himself. This was especially true in the domain of health care, where he managed to put in place what came to be called ObamaCare, a health care reform whose goal was to increase the number of Americans covered by insurance. As this was closely related to taxes and to the debate over strong versus weak government, it was a highly controversial subject and caused a lot of ink to flow.

After looking at the big words, one might wonder where two names that have been omnipresent in the news during the past three years are hidden on the map: Donald Trump (below American) and Hillary Clinton (below year). They are surprising here because of their relatively small size compared with some other words. The explanation is simple, however: the time span of the statements includes only part of the campaign for the 2016 presidential election and none of the events that took place with Trump in the White House! Readers should bear this in mind throughout the data story, as the analysis might point out aspects that do not align with their expectations, which are naturally based on their knowledge of what came after 2016.






Finally, we'll focus on some of the smaller words. First of all, recall that in 2008 one of the biggest financial crises ever began in the U.S. It plunged the world into a global recession and was the most important topic for some time. This is illustrated by numerous words like job, money, taxes, unemployment, debt and so on. Secondly, as the statements are all about American politics, we can find some of its favorite topics, such as gun or abortion. Third, we can find some less prominent American politicians, such as Mitt Romney (who ran for president against Barack Obama in 2012), Rick Perry (former Governor of Texas) and Scott Walker (Governor of Wisconsin).

Democrats & Republicans

the everlasting love story

The Venn diagram shows the twenty most frequently discussed subjects for Democrats (blue) and Republicans (red). Most subjects are of shared interest (grey).

Most important topics for the parties

The Venn diagram compares the twenty subjects most frequently addressed by Democrats and Republicans respectively. Naturally, most subjects are of high interest to both parties, but there are some distinctions. The military, terrorism, and workers appear to be of higher importance to the Democrats, while abortion, job accomplishments, and criminal justice are more often discussed by the Republicans. Some of these subjects are quite controversial issues on which the Democrats and Republicans take different stands. The following are some broadly generalized opinions, and it should be noted that there are naturally many politicians within each party who have different and more nuanced positions on these issues.

While Republicans prefer increasing military spending, and have a higher tendency to employ the military option, Democrats prefer lower increases in military spending, and are comparatively more reluctant to use military force.

In general, Democrats support abortion rights and want to keep elective abortions legal. Republicans, on the other hand, believe abortions should not be legal.

Whereas Democrats favor an increase in the minimum wage to help workers, Republicans oppose raising the minimum wage, arguing that it would hurt businesses.
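The three regions of the Venn diagram described in this section (subjects only in the Democrats' top twenty, shared subjects, and subjects only in the Republicans' top twenty) can be sketched with plain set operations. The snippet below is an illustration under stated assumptions: each row is assumed to carry a "party" field and a comma-separated "subjects" field, and the party and topic names in the toy rows are hypothetical placeholders rather than values from the dataset. Ties at the cutoff are broken by first appearance, which is a choice of this sketch.

```python
from collections import Counter


def top_subjects(rows, party, k=20):
    """Return the set of the k subjects most often addressed by one party."""
    counts = Counter()
    for row in rows:
        if row["party"] == party:
            subjects = (s.strip() for s in row["subjects"].split(","))
            counts.update(s for s in subjects if s)
    return {subject for subject, _ in counts.most_common(k)}


def venn_regions(rows, party_a, party_b, k=20):
    """Split the two top-k sets into (only party_a, shared, only party_b)."""
    a = top_subjects(rows, party_a, k)
    b = top_subjects(rows, party_b, k)
    return a - b, a & b, b - a


# Hypothetical toy rows for illustration only (not taken from the dataset).
toy_rows = [
    {"party": "party_a", "subjects": "topic_1,topic_2"},
    {"party": "party_a", "subjects": "topic_1, topic_2"},
    {"party": "party_a", "subjects": "topic_3"},
    {"party": "party_b", "subjects": "topic_2,topic_4"},
    {"party": "party_b", "subjects": "topic_2,topic_4"},
    {"party": "party_b", "subjects": "topic_1"},
]

only_a, shared, only_b = venn_regions(toy_rows, "party_a", "party_b", k=2)
```

With k=20 and the real statements, the three returned sets would correspond to the blue, grey, and red areas of the diagram.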


The graph shows the twenty politicians among the Democrats and Republicans with the highest number of 'pants-fire' and 'false' statements. By clicking on the node associated with each name, you can highlight the interactions with this speaker. Whenever the same color spans out in different directions, these are lies originating with the speaker. All other colors represent lies where the speaker is the target.

The main players of the game

We want to take a closer look at the politicians with the highest number of pants-on-fire and false statements. The circle-shaped graph shows the relationships between them. Each arc represents a lie made by one speaker (the source) about another speaker (the target). The four main players are Barack Obama (Democrat), Donald Trump (Republican), Hillary Clinton (Democrat), and Mitt Romney (Republican). These are names we know well. Barack Obama, former President of the U.S., ran against Mitt Romney in the 2012 presidential election; Donald Trump, current President of the U.S., ran against Hillary Clinton in the 2016 presidential election.

It's interesting to observe that the majority of Donald Trump's lies are directed at Hillary Clinton, his opposing candidate, and that there are more than twice as many lies going in this direction as the other way around. However, it should be noted that the majority of Clinton's lies, although fewer in number, are also directed at her opposing candidate, Trump. This is perhaps to be expected, as politicians are known to demonize their opponents, especially with elections coming up. The same pattern is observable for Mitt Romney: most of his lies are about Barack Obama. Obama's lies, however, are more evenly distributed, and compared to how many lies he is the target of, he has not contributed that many himself. It's natural that he is the subject of most of the lies, as he was the sitting President for almost the entire time frame of the statements.
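The arcs of such a graph boil down to counts of (source, target) pairs. The sketch below shows one way these counts could be built. It is an assumption of this example, not a documented step of the story, that a statement "targets" a speaker when it mentions that speaker's name; the label values 'pants-fire' and 'false' are the ones named above, while the speaker keys, names, and toy statements are hypothetical placeholders.

```python
from collections import Counter

LIE_LABELS = {"pants-fire", "false"}


def lie_edges(rows, names):
    """Count (source, target) pairs among statements labeled as lies.

    Assumption of this sketch: a statement targets a speaker when the
    statement text mentions that speaker's display name.
    """
    edges = Counter()
    for row in rows:
        if row["label"] not in LIE_LABELS:
            continue
        text = row["statement"].lower()
        for key, display_name in names.items():
            # A speaker mentioning themselves is not counted as an arc.
            if key != row["speaker"] and display_name.lower() in text:
                edges[(row["speaker"], key)] += 1
    return edges


# Hypothetical speakers and statements for illustration only.
names = {"speaker-a": "Speaker A", "speaker-b": "Speaker B"}
toy_rows = [
    {"speaker": "speaker-a", "label": "false", "statement": "Speaker B raised taxes."},
    {"speaker": "speaker-a", "label": "pants-fire", "statement": "Speaker B never voted."},
    {"speaker": "speaker-b", "label": "false", "statement": "Speaker A closed schools."},
    {"speaker": "speaker-b", "label": "true", "statement": "Speaker A signed the bill."},
]

edges = lie_edges(toy_rows, names)
```

Comparing the count for a pair with the count for the reversed pair gives the kind of "twice as many in one direction" observation made above.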

The network at the top shows the lies between Democrats and Republicans. The blue network at the bottom left shows the lies within the Democratic party, and the red network at the bottom right shows the lies within the Republican party.

Politics: the kindergarten for old people

To round off our comparison between Democrats and Republicans, we will look at the graphs on the left. The one at the top shows the interactions, more specifically the lies, between the parties, and the two at the bottom do the same within each party. Even if it is somewhat predictable, it is striking how many arrows go between the Democrats and the Republicans while almost nothing happens inside each party. This seems to suggest that politics is more about childish finger-pointing at opponents than about meaningful and educated debate.

Wrapping up

Even though these statements were made in 2016 or earlier, fake news is no less relevant today. President Donald Trump has already shown he's prone to lies, and there seems to be little to no evidence of him turning honest since he took up residence in the White House in 2017. Fake news became a global topic and entered wide public discussion mainly because of the 2016 U.S. presidential election. Numerous political commentators and journalists wrote and said in the media that 2016 was the year of fake news and that, as a result, politics would never be the same.

Trump himself claims that the mainstream American media regularly reports fake news. He has carried on a war against the mainstream media, often attacking it as "fake news" and the "enemy of the people". According to university professor Jeff Hemsley, who studies social media, Trump uses the term "fake news" for any news that is not favorable to him, or which he simply dislikes. On May 9, 2018, Trump wrote on Twitter:


Chris Cillizza, an American political commentator for CNN, described the tweet as an "accidental revelation about Trump's 'fake news' attacks", and wrote: "The point can be summed up in these two words from Trump: 'negative (Fake)'. To Trump, those words mean the same thing. Negative news coverage is fake news. [...]"

So when reading and listening to politicians, you might want to keep in mind what you've just read, and take what they say with a pinch of salt.

Authors

Mimmi Gjems

Martin Vold

Benjamin Rahm