Post on 10-Jun-2020

The Mass Media Bias: Analysing and comparing the time series of polls and news articles during the 2016 USA presidential election.

Federico Albanese (ffalbanese@gmail.com)

Director: Pablo Balenzuela

Codirector: Viktoriya Semeshenko

Departamento de Física, FCEyN-UBA


1) Do the mass media influence society?

2) Does negative propaganda have a positive or a negative effect on a candidate?

3) Is there a bias in the mass media?


- 263 polls (an average of 2.7 polls per day)

- Made by: NBC, New York Times, LA Times, CBS, Fox News, Gravis, ABC, IBD (among others)

[Figure: difference in the polls, ∆(Clinton - Trump) [%], versus time [month].]


Media

New York Times, Fox News, Breitbart

[2] https://datascience.berkeley.edu/data-media-map-bitly/

- The most Republican media outlet, according to a study carried out at Berkeley (2013) [2].

An article by A. J. Delgado on Oct. 22, 2015.

- Fox News is more conservative, whereas Breitbart has been exclusively pro-Trump from the very first day.

[1] Google Trends comparison of the most important newspapers in the USA.

- Most consumed and most googled newspaper in the USA [1].

First look into the data

[Figure: number of mentions per article in the New York Times. Panels: Clinton, Trump.]

Clinton was mentioned fewer than 5 times in most of the articles. In contrast, Trump was mentioned more than 80 times in some articles.
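A minimal sketch of how mentions per article could be counted. The slides do not say how a mention was matched, so counting the surname as a whole word (case-insensitive) is an assumption, and `count_mentions` is a hypothetical helper name.

```python
import re


def count_mentions(article, names):
    """Count whole-word, case-insensitive occurrences of each name in one article."""
    counts = {}
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, article, flags=re.IGNORECASE))
    return counts


article = "Trump spoke in Ohio. Clinton replied to Trump; Trump's team declined."
print(count_mentions(article, ["Clinton", "Trump"]))  # {'Clinton': 1, 'Trump': 3}
```

Applying this function to every article gives the per-article distribution of mentions for each candidate.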

Sentiment Analysis

Stanford NLP: The algorithm builds a binary tree from each sentence, taking into account its semantic composition.

Example sentence: "There are slow and repetitive parts, but it has just enough spice to keep it interesting."

Going from the children to the root, a sentiment value (positive, negative or neutral) is assigned to each node.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642).
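The bottom-up labelling can be illustrated with a toy sketch. This is not the Stanford model: Socher et al. train a recursive neural network to compose the children, whereas here the composition rule (sign of the sum of the children's labels) and the tiny `LEXICON` are hypothetical placeholders, used only to show the children-to-root traversal.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: -1 negative, +1 positive, anything else neutral (0).
LEXICON = {"slow": -1, "repetitive": -1, "spice": 1, "interesting": 1}


@dataclass
class Node:
    word: Optional[str] = None      # set on leaves only
    left: Optional["Node"] = None   # set on internal nodes only
    right: Optional["Node"] = None
    sentiment: int = 0              # filled in by label()


def label(node):
    """Assign a sentiment to every node, children first, and return the node's label."""
    if node.word is not None:
        node.sentiment = LEXICON.get(node.word.lower(), 0)
    else:
        total = label(node.left) + label(node.right)
        # Placeholder composition rule: keep only the sign of the children's sum.
        node.sentiment = (total > 0) - (total < 0)
    return node.sentiment


tree = Node(
    left=Node(left=Node(word="slow"), right=Node(word="parts")),
    right=Node(left=Node(word="enough"), right=Node(word="spice")),
)
label(tree)
```

After the call, every node of `tree` carries its own label, so the sentiment of each sub-phrase can be read off as well as that of the whole sentence.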

Sentiment Analysis

Time Series:

[Figure: number of phrases over time (# positive, # neutral, # negative and # total phrases). Marked events: (1) Republican National Convention, (2) First Debate, (3) Election Day.]
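A sketch of how such time series could be assembled from labelled phrases. The input format (one `(date, label)` pair per phrase) and the helper name `daily_sentiment_counts` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_sentiment_counts(phrases):
    """Aggregate (date, label) pairs into per-day counts of each label plus a total."""
    counts = defaultdict(Counter)
    for day, label in phrases:
        counts[day][label] += 1
        counts[day]["total"] += 1
    # Return the days in chronological order.
    return {day: counts[day] for day in sorted(counts)}


phrases = [
    (date(2016, 9, 26), "negative"),
    (date(2016, 9, 26), "positive"),
    (date(2016, 9, 25), "neutral"),
    (date(2016, 9, 26), "negative"),
]
series = daily_sentiment_counts(phrases)
```

Each candidate and each media outlet would get its own set of series (positive, neutral, negative and total).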


Linear Correlation

Linear correlation with a 14-day lag

                              New York Times         Fox News               Breitbart
                              Coefficient  p-value   Coefficient  p-value   Coefficient  p-value
Clinton’s positive mentions    0.485       3.43e-6   -0.213       0.05       0.060       0.590
Clinton’s negative mentions    0.394       2.24e-4   -0.682       1.29e-12  -0.319       0.3
Clinton’s total mentions       0.453       1.70e-5   -0.616       5.54e-10  -0.174       0.116
Trump’s positive mentions      0.554       5.64e-8   -0.395       2.20e-4    0.160       0.149
Trump’s negative mentions      0.476       5.39e-6   -0.470       7.54e-6   -0.021       0.853
Trump’s total mentions         0.518       5.31e-7   -0.437       3.62e-5    0.082       0.460

- The more phrases the New York Times publishes, the larger the difference in favor of Clinton.

- The more phrases Fox News publishes, the more Trump rises in the polls and the smaller the difference.

Difference in the polls
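A minimal sketch of a lagged linear (Pearson) correlation between a media series and the poll difference. It assumes both series are sampled daily and that the media series leads the polls, so `media[t]` is paired with `polls[t + lag]`. The slides do not state the direction of the lag, and the function names are hypothetical. Only the coefficient is computed here; a p-value could be obtained with, for example, `scipy.stats.pearsonr`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(media, polls, lag=0):
    """Correlate media[t] with polls[t + lag] (lag in days, lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag:
        media, polls = media[:-lag], polls[lag:]
    return pearson(media, polls)


media = [1, 2, 3, 4, 5, 6]
polls = [0, 0, 1, 2, 3, 4]   # follows the media series two days later
r_lagged = lagged_pearson(media, polls, lag=2)
```

With real data, `lag=14` would reproduce the 14-day setting named in the table title.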

Mutual Information of the symbolized time series

Mutual Information (MI) measures the dependency between two time series, treated as random variables X and Y. Xi and Yj are the possible values of X and Y, and “n” and “m” are the numbers of possible values of X and Y. The value of MI ranges from 0 (no mutual information) to 1 (perfect relation between the variables).

- A permutation test was used to measure the significance of the statistical results [1].

- All the time series were symbolized for this analysis [2].
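One way to symbolize a series, following the ordinal patterns of Bandt and Pompe [2], is sketched here. The slides do not give the pattern length, so `order=3` is an assumption, and `ordinal_symbols` is a hypothetical name.

```python
def ordinal_symbols(series, order=3):
    """Replace each sliding window by the permutation that sorts its values."""
    symbols = []
    for start in range(len(series) - order + 1):
        window = series[start:start + order]
        # Positions of the window ordered by value; ties keep their time order.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        symbols.append(pattern)
    return symbols


print(ordinal_symbols([1, 3, 2, 4]))  # [(0, 2, 1), (1, 0, 2)]
```

The symbols depend only on the ordering of neighbouring values, not on their magnitude, so series measured in different units (poll percentages and phrase counts) become directly comparable.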

[1] François, D., Wertz, V., & Verleysen, M. (2006, April). The permutation test for feature selection by mutual information. In ESANN (pp. 239-244).

[2] Bandt, C., & Pompe, B. (2002). Permutation entropy: a natural complexity measure for time series. Physical review letters, 88(17), 174102.
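A sketch of a plug-in MI estimate between two symbolic series, together with a permutation test for its significance. The slides do not say how MI was normalized to the range 0 to 1; dividing by the smaller of the two entropies is one common choice and is an assumption here, as are all function names.

```python
import math
import random
from collections import Counter


def entropy(x):
    """Shannon entropy (in nats) of a sequence of symbols."""
    n = len(x)
    return -sum(c / n * math.log(c / n) for c in Counter(x).values())


def mutual_information(x, y):
    """Plug-in estimate: sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b)))."""
    n = len(x)
    p_x, p_y, p_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        c / n * math.log((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
        for (a, b), c in p_xy.items()
    )


def normalized_mi(x, y):
    """MI divided by min(H(X), H(Y)), an assumed normalization to [0, 1]."""
    denom = min(entropy(x), entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Fraction of shuffles of y whose MI with x reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling one series destroys any temporal relation with the other while keeping its symbol frequencies, so the shuffled MI values describe what to expect when the two series are unrelated.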

[Figure: mutual information of the symbolized time series. Labels: Polls of Hillary Clinton; Hillary Clinton.]


The sentiment of the phrases matters: it is related to the time series of the polls.

Topic Detection: Dimensionality reduction

Ramos, J. (2003, December). Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (Vol. 242, pp. 133-142).

Xu, W., Liu, X., & Gong, Y. (2003, July). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 267-273). ACM.

Dimensionality reduction: NMF is an algorithm in which a matrix M is factorized into two matrices W and H (M ≈ H*W), with the property that all three matrices have no negative elements.

Advantages:

- Vectors have positive components (easy interpretation)

- Orthogonality is not imposed

Disadvantages:

- The # of topics is an input, not an output of the algorithm.
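A small pure-Python sketch of NMF. The slides do not say which solver was used; the multiplicative update rules of Lee and Seung are swapped in here because they are short and keep every entry non-negative. Reading the slide's M ≈ H*W with H as documents × topics and W as topics × words is an assumption, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(m, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix m (docs x words) as m ≈ h * w."""
    rng = random.Random(seed)
    n_docs, n_words = len(m), len(m[0])
    h = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    w = [[rng.random() + eps for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # Update w (topics x words): w <- w * (h^T m) / (h^T h w)
        ht = transpose(h)
        num, den = matmul(ht, m), matmul(matmul(ht, h), w)
        w = [[w[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_words)]
             for k in range(n_topics)]
        # Update h (docs x topics): h <- h * (m w^T) / (h w w^T)
        wt = transpose(w)
        num, den = matmul(m, wt), matmul(h, matmul(w, wt))
        h = [[h[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return h, w


def reconstruction_error(m, h, w):
    """Squared Frobenius norm of m - h * w."""
    approx = matmul(h, w)
    return sum((a - b) ** 2 for ra, rb in zip(m, approx) for a, b in zip(ra, rb))
```

Each row of `w` is a topic expressed as non-negative word weights, and each row of `h` gives the weight of every topic in one document. The number of topics has to be chosen by the caller, which is the disadvantage listed above.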

How could you mathematically represent a document?

- Vectors

V = [ ... , TF(t)*IDF(t) , ... ] -> dim = # words

IDF(t) = log(N / n_t)

where N is the # of documents and n_t the # of documents in which the word t appears.

Combining the vectors of all the documents, we have a matrix M.
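A minimal sketch of building the matrix M from raw text. Several details are assumptions, since the slides do not specify them: TF is taken as the relative frequency of a word within the document, the logarithm is natural, tokenization is a lowercase whitespace split, and documents are assumed to be non-empty. `tfidf_matrix` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, M) where M[d][j] = TF(t_j in d) * IDF(t_j)."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # n_t: number of documents in which word t appears at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] / len(tokens) * idf[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["clinton email scandal", "trump immigration wall", "clinton trump debate"]
vocabulary, M = tfidf_matrix(docs)
```

Every row of `M` is one document vector V, with one component per word in the vocabulary. Words that appear in every document get a weight of zero, because log(N / n_t) = log(1) = 0.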

Non-Negative Matrix Factorization (NMF)

[Figure: example topics. Labels: Economy; Social issues: immigration.]

Topic detection for each media outlet separately

Social issues (immigration and racism)

Week review

Clinton’s and Trump’s scandals

Foreign affairs

Clinton’s email scandal

Social issues (immigration)

Foreign affairs

Clinton Foundation scandals

Social issues (racism)

FBI investigation of Clinton’s emails

Third party

Clinton Foundation scandals

Social issues (immigration)

Clinton’s email scandal


<< ffalbanese@gmail.com >>