
Web Appendix

This Appendix contains details on (A) the brand sample composition, (B) the sentiment analysis,

(C) YouGov data, (D) Model specifications, results on (E) brand and (F) industry level and (G)

forecasting.

Web Appendix A: Brand Sample Composition

To enable empirical generalizations across different types of products and brands, we selected 8

different industries from YouGov’s consumer and technology sectors. By analyzing Airlines,

Banking, Beverages, Foods, Cars, Consumer Electronics, Fashion, and Gastronomy, we include

different levels of involvement, expertise, buying frequencies and social media engagement

across our sample.

For each industry, we included both a high and low-end volume leader and several niche players.

Further, we aimed to achieve a similar brand awareness and impression composition within the

industry groups as displayed in Table A1. To match companies, we used the average YouGov

metrics for our observation period. The data period is identical for all brands: the 586 days

between November 12th, 2012 and June 20th, 2014.


Table A1: Within Industry Sample Matching

| Brand | Awareness | Impression | Average Number of Fans | Number of Comments |
|---|---|---|---|---|
| American Airlines | 91.55 | 13.05 | 997326 | 310067 |
| Frontier | 55.00 | 3.39 | 86648 | 40792 |
| JennAir | 62.50 | 23.34 | 160076 | 7033 |
| Jet-Blue | 80.05 | 11.04 | 726142 | 72209 |
| Lufthansa | 61.14 | 12.62 | 60274 | 236543 |
| Qantas | 50.70 | 10.74 | 31422 | 123551 |
| Singapore Airlines | 33.90 | 4.36 | 37776 | 33777 |
| Southwest Airlines | 89.61 | 41.68 | 3786358 | 380931 |
| Fidelity | 76.11 | 12.47 | 112421 | 12868 |
| 5th3rd | 33.90 | -2.15 | 130772 | 8660 |
| Huntington Bank | 18.33 | -1.83 | 9408 | 11620 |
| JPMorgan | 86.63 | -12.02 | 120340 | 3794 |
| Liberty Mutual | 18.63 | -0.02 | 9497 | 2844 |
| PNC Bank | 81.76 | 6.64 | 1702367 | 74365 |
| RaboBank | 45.69 | -0.35 | 209292 | 19018 |
| US Bank | 3.98 | -2.16 | 573 | 32428 |
| Dos Equis | 62.79 | 11.20 | 2846756 | 216537 |
| Hennessey | 70.07 | 7.93 | 1381774 | 36289 |
| Jack Daniels | 93.75 | 27.89 | 2357122 | 312358 |
| Jameson | 49.05 | 11.70 | 522849 | 68494 |
| Smirnoff | 91.44 | 20.22 | 2159448 | 197037 |
| Audi | 91.67 | 20.92 | 764517 | 260013 |
| BMW | 93.98 | 32.55 | 1054739 | 335931 |
| Ford | 95.80 | 41.41 | 270638 | 123520 |
| Kia | 92.31 | -0.44 | 46769 | 11682 |
| Lexus | 93.08 | 28.96 | 1209543 | 183869 |
| Subaru | 92.14 | 20.90 | 1492191 | 152959 |
| Volkswagen | 93.69 | 26.75 | 797270 | 256394 |
| Volvo | 92.59 | 23.69 | 71247 | 80659 |
| Apple | 95.17 | 44.65 | 349038 | 82419 |
| Lenovo | 48.21 | 7.38 | 332131 | 115804 |
| Samsung | 94.96 | 47.43 | 18670111 | 87219 |
| SanDisk | 57.42 | 20.66 | 243583 | 6230 |
| Sony | 95.36 | 59.77 | 1648076 | 202015 |
| Abercrombie & Fitch | 82.74 | -8.11 | 3085234 | 204297 |
| Aeropostale | 63.41 | 5.44 | 5408882 | 316914 |
| NineWest | 51.66 | 15.14 | 694069 | 17508 |
| NorthFace | 52.23 | 24.00 | 2320379 | 46704 |
| KettleBrand | 74.13 | 26.77 | 1162493 | 44785 |
| JosABank | 63.80 | 17.95 | 375013 | 15220 |
| Nestea | 90.61 | 26.29 | 174783 | 10957 |
| Pepsi | 96.19 | 25.66 | 6217911 | 65391 |
| Tostitos | 90.18 | 38.95 | 406750 | 21855 |
| Burger King | 96.31 | 24.25 | 3009032 | 13140 |
| Donato’s | 11.44 | 1.52 | 1 | 51809 |
| KonaGrill | 13.95 | 1.74 | 29511 | 4713 |
| McDonalds | 96.84 | 19.04 | 9222254 | 3990 |
| Starbucks | 60.24 | -3.42 | 16219222 | 95939 |


For each brand, we used Facebook’s Graph API to identify the official US Facebook page. We then used the API to extract all posts from that page (including company and customer posts). For each post, we then used the post ID to re-crawl all related comments. This procedure ensures that we extract all publicly available user-generated content from the page. Finally, we used the API to collect the number of likes and shares each post received.

It is important to note that Facebook provides time stamps for comments on posts but not for likes and shares. Thus, we can only use comments in our time series analysis. Table A1 further gives an overview of the number of comments and user posts extracted per brand. Differences across brands abound, driven largely by factors such as the number of Facebook followers and their engagement and how long the company has had its Facebook page.

To ensure that we only analyze user-generated content, we first removed company posts and comments from the data sets. To do so, we again used the API to find the companies’ Facebook IDs and then removed all comments and posts originating from these IDs. The resulting data then entered the next stage: sentiment analysis.
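The cleaning step described above can be illustrated with a minimal sketch. The `Item` record, the `keep_user_generated` helper, and the sample IDs are hypothetical and only stand in for the crawled data; they are not taken from the Graph API.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One crawled post or comment (hypothetical record layout)."""
    item_id: str
    author_id: str
    text: str


def keep_user_generated(items, company_ids):
    """Return only items whose author is not one of the company's own IDs."""
    company_ids = set(company_ids)
    return [item for item in items if item.author_id not in company_ids]


# Made-up example data: two company items and two user comments.
items = [
    Item("p1", "brand_page", "New flavor out today!"),
    Item("c1", "user_17", "Love it"),
    Item("c2", "brand_page", "Thanks for the feedback!"),
    Item("c3", "user_42", "Worst service ever"),
]
user_items = keep_user_generated(items, {"brand_page"})
print([item.item_id for item in user_items])
```

Only the two user comments remain and would be passed on to the sentiment analysis.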


Web Appendix B: Sentiment Analysis

Dictionary Based Sentiment Analysis

For the linguistic, dictionary-based approach, we use the Linguistic Inquiry and Word Count (LIWC) software from Pennebaker et al. (2015). The software is not only among the most widely used text analysis tools but also provides researchers with reliable and valid dictionaries for measuring a mixture of sentiments within textual data. It is easily accessible and affordable to researchers and practitioners alike. The main LIWC dictionary has been developed for over 20 years (Pennebaker 1993) and is based on the linguistic analysis of over 181,000 text files containing more than 231 million words. The developers used these documents to infer words, word stems and symbols that are related to various human feelings and expressions. The main dictionary now consists of over 6,400 entries related to over 30 categories, including positive and negative emotions. The positive emotions category consists of 620 words such as happy, smile, love and sweet. The negative emotions category features a total of 744 words, including hurt, ugly, nasty and fearful. Thus, LIWC offers a very broad dictionary that allows measuring very different concepts, for example emotions distinguished on the two established dimensions of valence and arousal (Bradley et al. 1992). Words such as furious express anger (negative valence, high arousal), depressed expresses sadness (negative valence, low arousal), supercool expresses excitement (positive valence, high arousal), and fine and okay express contentment (positive valence, low arousal). The scientific community has so far developed a wide array of additional dictionaries in different languages and for different purposes, e.g. for measuring stated morality in texts (Haidt and Joseph 2004). LIWC has been successfully applied in marketing research (see e.g. Rooderkerk and Pauwels 2016).


To determine a text’s overall sentiment, LIWC counts the total number of words in a text document and the number of words related to positive and negative emotions. It then calculates a relative score for each emotion by dividing the number of positive (or negative) words by the total number of words. For example, the sentence “Coke offers the best soft drink on the market” leads to the following result. The total number of words is nine. “Best” is the only word associated with positive emotions, so the positive emotion score becomes one divided by nine, or 0.11. There is no word associated with negative emotions, so the negative score is 0. To arrive at a final sentiment variable, we subtract the negative score from the positive score, ending up with a Pos-Neg score of 0.11.
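The scoring logic can be sketched in a few lines. The two word sets below are tiny illustrative stand-ins, not the LIWC dictionaries, and the function name `pos_neg_score` is an assumption made for this example. The Pos-Neg score is computed as the positive share minus the negative share.

```python
import re

# Tiny illustrative word lists (assumed for this sketch, not the LIWC dictionary).
POSITIVE = {"best", "happy", "smile", "love", "sweet"}
NEGATIVE = {"hurt", "ugly", "nasty", "fearful"}


def pos_neg_score(text):
    """Return (positive share, negative share, positive minus negative)."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return 0.0, 0.0, 0.0
    pos = sum(word in POSITIVE for word in words) / total
    neg = sum(word in NEGATIVE for word in words) / total
    return pos, neg, pos - neg


pos, neg, score = pos_neg_score("Coke offers the best soft drink on the market")
print(round(pos, 2), round(neg, 2), round(score, 2))  # 0.11 0.0 0.11
```

The sentence has nine words and one positive hit, so the positive share is 1/9 and the negative share is 0.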

Machine Learning Based Sentiment Analysis

Based on the findings of Gamon (2004) and Agarwal et al. (2011), we chose Support Vector Machines (SVM) as the focal method for our supervised machine learning based sentiment analysis. SVMs are a specific form of supervised machine learning and are commonly used for classification purposes (Kübler, Wieringa, and Pauwels 2017). To classify data into pre-set categories, SVMs aim at maximizing the margin between the classified groups. To do so, the SVM places a separating plane between the two groups, bounded by support vectors that define the outer borders of the two classes (in our case positive and negative sentiments). The algorithm tries to identify the support vectors that maximize this margin.

Like all supervised machine learning algorithms, SVMs need a set of training data to infer the classification rules for new data. Machine learning has the advantage that it can rely on specific training data tailored to the analysis. However, this implies that machine learning based models suffer from a classic garbage in, garbage out (GIGO) issue. If the training data is unsuitable or biased, this bias will spill over to the classification routine and lead to biased results. Thus, obtaining suitable and reliable training data is key for machine learning based sentiment extraction. Machine learning research has therefore invested significant time and resources to identify suitable training data sources. The main idea is that the training data should be as similar as possible to the data that needs to be classified. In terms of text analysis, this means that longer texts also require longer training texts, preferably from the same author. Classification of short texts, such as those from social media, similarly needs shorter text corpora for training, preferably from the same social media source (Pak and Paroubek 2010). Besides source and style similarity, researchers need to trade off computational power against training data size. More training data generally improves classification results: the more text samples are present in the training data, the more likely it is to observe rare combinations, which leads to better precision. However, larger training sets have two downsides: they consume significantly more computational power and are harder to obtain.

One approach could be to slice off a part of the crawled data, have human coders classify the text, and then use this subset for training purposes. Given the existence of service platforms such as M-Turk, it becomes feasible to pre-classify textual data into desired categories (such as positive vs. negative sentiment or complaint vs. praise). Still, human coding is expensive, takes time and is rather inflexible. Additionally, it bears the risk that coders could be biased by instructions and overlook or over-accentuate patterns in the data. Therefore, such an approach is especially useful for smaller datasets, which only require small amounts of training data. In our case, however, we face a vast amount of data from different sources and different contexts. We would thus need multiple human coders to classify more than 200,000 comments and posts. Given that a single coder would not be able to achieve this, inter-coder reliability would become an issue.

Another approach is the use of publicly available classified sentiment data from public online sources such as Twitter (e.g. Culotta and Cutler 2016; Toubia and Stephen 2013). Different studies used emoticons and emojis within tweets to infer the sentiment of the related tweets. A mixture of such datasets is available for training purposes (Naji 2013). Despite being large, however, these sets are all unrelated to our companies, industries, products and consumption settings. Another key problem here is how to determine whether the obtained data is positive or negative. Naji (2013) uses emoticons in tweets to infer whether a tweet is positive or negative. Collecting more than 1.5 million tweets, he successfully trains an NB classifier and is able to correctly classify previously unclassified tweets as positive or negative with a hit rate of more than 75%. McAuley and Yang (2016) use Amazon product reviews to build training sets. Online reviews commonly feature not only textual data but also a scale rating (commonly from 1 to 5, with 1 expressing very bad and 5 expressing very good). By excluding all reviews with ratings other than 1 and 5, they also construct a reliable and very powerful training set. A key advantage of rating-based training data is that it can be adapted to product categories by only including ratings from a specific category. However, using such large-scale social media data for training purposes raises internal validity issues, as it is unclear whether the training data can really capture sentiments specific to our data context. Imprecise training data could in the worst case lead to misclassifications, significantly biasing our insights.

To achieve a sufficient level of reliability and validity, we thus use a mixed approach. We use publicly available training data obtained from the Amazon online reviews provided by McAuley, Pandey, and Leskovec (2015). For each of our product categories, we use reviews from a similar Amazon category. This approach ensures that we have a category-specific fit for each of our 7 product categories. For airlines and banking, we use data from the web’s leading airline rating website (Skytrax) and one of the leading rating websites for US banks and financial services (depositaccounts.com). To obtain only unambiguously positive and negative reviews, we only use reviews that come with a very low (1-star) or very high (5-star) rating. Table A2 shows the source and size of the training data for each of the 7 categories.
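As a minimal sketch of this filtering rule, the snippet below keeps only 1-star and 5-star reviews and labels them. The `(text, stars)` tuple layout, the function name `build_training_set`, and the sample reviews are assumptions made for illustration.

```python
def build_training_set(reviews):
    """Keep only 1-star and 5-star reviews and label them negative/positive.

    `reviews` is an iterable of (text, stars) pairs; this layout is assumed
    for the sketch and does not mirror any particular review data format.
    """
    labeled = []
    for text, stars in reviews:
        if stars == 1:
            labeled.append((text, "negative"))
        elif stars == 5:
            labeled.append((text, "positive"))
        # Ratings of 2, 3 and 4 are ambiguous and therefore dropped.
    return labeled


# Made-up example reviews.
reviews = [
    ("Lost my luggage and nobody cared", 1),
    ("Average flight, nothing special", 3),
    ("Fantastic crew and on time", 5),
    ("Seat was okay", 4),
]
training_set = build_training_set(reviews)
print(training_set)
```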

To apply SVMs to textual data, one needs to transform the data into a machine-readable format. Most commonly, researchers use Term Frequency Matrices (TFM) to achieve this. TFMs are sparse matrices that have a column for each word occurring at least once in one of the text documents used for analysis. Each row of the TFM represents one document in the dataset. Dummy codes then indicate whether a specific word occurs in the respective document/row. Each textual document within the data set can thus be expressed as a combination of word dummies.
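The structure of such a matrix can be shown with a small sketch that uses plain Python lists. The function name `build_tfm`, the whitespace tokenization, and the two sample documents are assumptions for illustration; a real application would use a sparse representation.

```python
def build_tfm(documents):
    """Build a dummy-coded term matrix: one row per document, one column per word."""
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    column = {word: j for j, word in enumerate(vocabulary)}
    matrix = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word in doc.lower().split():
            row[column[word]] = 1  # dummy code: the word occurs in this document
        matrix.append(row)
    return vocabulary, matrix


docs = ["great flight great crew", "terrible flight"]
vocabulary, tfm = build_tfm(docs)
print(vocabulary)  # ['crew', 'flight', 'great', 'terrible']
print(tfm)         # [[1, 1, 1, 0], [0, 1, 0, 1]]
```

Because the cells are dummies, the repeated word "great" in the first document still yields a 1 rather than a count.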

We followed the standardized procedure of text preparation described in Feinerer (2017) to prepare the necessary TFM for each training set. To reduce the number of columns and to save computational power, we first apply the stop word list of R’s tm package to drop all unnecessary words (e.g. the, and, or) from the documents. We then stem the remaining words to their core form to further reduce the size of the TFM. We follow Mullen and Collier (2004), who show that linear kernel based SVMs are best suited for sentiment analysis.

For the training and sentiment prediction, we use R’s RTextTools package, which combines different machine learning and text processing packages in R. The SVM is trained with the standard settings for linear SVMs as documented in Jurka and Collingwood (2015). We use 10% of our training material to test the power of our SVM with a holdout approach. These 10% are not included in the training set but are used for classification. Table A2 displays the holdout hit rates for each category’s SVM. Hit rates vary from 83% to 92%, indicating a generally good fit of our SVM.
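The holdout logic can be sketched independently of any particular classifier. The helper names `holdout_split` and `hit_rate`, the fixed random seed, and the toy labels are assumptions for this example; the classifier itself is left out.

```python
import random


def holdout_split(examples, holdout_share=0.10, seed=0):
    """Shuffle the labeled examples and set aside a share of them as holdout."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_share))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def hit_rate(predicted, actual):
    """Share of holdout documents whose predicted label matches the true label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# 100 made-up labeled examples: 90 go to training, 10 to the holdout set.
examples = [(f"review {i}", "positive" if i % 2 else "negative") for i in range(100)]
train, holdout = holdout_split(examples)
print(len(train), len(holdout))  # 90 10

# Hit rate for four hypothetical predictions, three of which are correct.
print(hit_rate(["pos", "neg", "pos", "pos"], ["pos", "neg", "neg", "pos"]))  # 0.75
```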

To perform the sentiment analysis on our data, we apply the same data preparation approach to our social media data (stop word dropping, word stemming, TFM) before we use the trained SVMs to classify texts into positive and negative sentiments. The RTextTools package further delivers a classification likelihood ranging from 0 to 100% for each classified document. Following Jurka et al. (2013), we only assign a text to a category if the classification likelihood is above 75%. Otherwise we classify the text as neutral, not belonging to either of the categories.
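The 75% rule amounts to a simple threshold on the classifier output. In the sketch below, the function name `assign_sentiment` and the representation of the likelihood as a number between 0 and 1 are assumptions made for illustration.

```python
def assign_sentiment(label, likelihood, threshold=0.75):
    """Keep the predicted label only if its likelihood exceeds the threshold.

    `label` is the class proposed by the classifier ("positive" or "negative"),
    `likelihood` its classification likelihood on a 0-1 scale.
    """
    return label if likelihood > threshold else "neutral"


print(assign_sentiment("positive", 0.91))  # positive
print(assign_sentiment("negative", 0.60))  # neutral
```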

Finally, to assess the quality of our classification, we randomly drew 500 positive, negative and neutral comments from each category. Human coders unrelated to the project were then asked to check the classification. Table A2 reports these human post-classification confirmation checks. Again, hit rates ranging from 87% to 96% (with intercoder reliability above 89%) indicate a very good fit.

Table A2: Human post-classification confirmation checks.

| Category | Training Data Source | Training Data Size | Prediction Hit Rate (Holdout) | Human Confirmation Rate (Post-Estimation) |
|---|---|---|---|---|
| Airlines | Skytrax | 12,000 | 85% | 89% |
| Banking & Finance | depositaccounts.com | 8,000 | 83% | 87% |
| Beverages | Amazon Food & Beverages Reviews | 20,000 | 87% | 91% |
| Electronics | Amazon Electronics Reviews | 20,000 | 92% | 96% |
| Fashion | Amazon Fashion Reviews | 20,000 | 85% | 90% |
| Food | Amazon Food & Beverages Reviews | 20,000 | 89% | 93% |
| Gastronomy | Amazon Food & Beverages Reviews | 20,000 | 87% | 94% |


Web Appendix C: Details on the YouGov data

YouGov’s Brandindex is a daily measure of consumer mind-set metrics among the public, tracking many brands across multiple consumer sectors simultaneously. For the US market, YouGov surveys 5,000 randomly selected consumers (from a panel of 5 million) on a daily basis. Table A3 provides details on the exact question for each item.

The data collection of YouGov can be described as follows. For each item, a minimum of 100 respondents per day are randomly drawn from the panel and provided with a set of up to 30 brands for a pre-selected industry. To reduce common method bias, respondents evaluate only one brand item per industry per enquiry. First, for the Customer Satisfaction measure, respondents select those brands (per click) for which they agree with the positive statement of the brand item (e.g. satisfied with this brand). Then, they select those brands for which they agree with the negative statement of the brand item (e.g. not satisfied with the brand). The aggregate brand measure (e.g. Customer Satisfaction) is calculated by subtracting the number of respondents who agree with the negative statement from the number of respondents who agree with the positive statement, dividing by the total number of respondents, and multiplying by 100 (= [number of positive - number of negative] / [number of positive + negative + neutral] × 100). As a consequence, Customer Satisfaction in the YouGov Brandindex is a ratio-scaled variable that lies within the range of -100 to +100. Similarly, for the awareness, impression, purchase intent and recommendation metrics, the respondents are prompted with the respective questions. Again, respondents can answer yes or no to these questions, implying that, similarly to Customer Satisfaction, these metrics are ratio-scaled variables that lie within the range of -100 to +100.
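The aggregate measure can be written out as a short function. The name `net_score` and the respondent counts are made up for illustration; the formula is positive minus negative responses, divided by all responses, times 100.

```python
def net_score(positive, negative, neutral):
    """Net brand score in the range -100 to +100.

    Computed as (positive - negative) / (positive + negative + neutral) * 100.
    """
    total = positive + negative + neutral
    if total == 0:
        raise ValueError("at least one respondent is required")
    return 100.0 * (positive - negative) / total


# Made-up day: 40 satisfied, 10 dissatisfied and 50 neutral respondents.
print(net_score(40, 10, 50))  # 30.0
```

The bounds follow directly: if every respondent agrees with the positive statement the score is +100, and if every respondent agrees with the negative statement it is -100.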


The collection procedure yields about 5,000 daily responses across all the brand items. To ensure representativeness, individual sampling weights are applied to correct for variations in the selection probability of respondents. Although panellists might be re-invited after a period of two weeks, they are blocked for the respective sector and brand item they have answered before for a period of at least two months. This is important to eliminate repeated measurement as a source of demand effects and serial correlation in brand perceptions. Brand competition effects are also controlled for because respondents rate the competing brands within one sector simultaneously.

Table A3: Consumer Mind-set Metrics (Questionnaire YouGov Brandindex)

| Consumer Journey Stage | Question |
|---|---|
| Awareness | Which of the following brands have you *ever* heard of? Please select all that apply. |
| Impression | a. Overall, of which of the following brands do you have a POSITIVE impression? b. Now which of the following brands do you have an overall NEGATIVE impression? |
| Purchase Intent | From which of these would you be most likely to purchase? |
| Satisfaction | a. Of which of the following brands would you say that you are a "satisfied customer"? b. Of which of the following brands would you say that you are a "dissatisfied customer"? |
| Recommendation | a. Which of the following brands would you RECOMMEND to a friend or colleague? b. And which of the following brands would you tell a friend or colleague to AVOID? |


Web Appendix D: VAR Model Specifications

In this part of the Web Appendix, we provide more details on the VAR models estimated in the

study. Our analysis consists of several methodological steps (see Table A4) which we apply to

each brand separately (e.g. Pauwels and Hanssens 2007).

Table A4: Analysis steps in the Vector Autoregression Modeling Approach

| Methodological Step | Relevant Literature | Research Question |
|---|---|---|
| 1. Tests | | |
| Granger causality test | (Granger 1969) | What is the temporal causality among variables? |
| Augmented Dickey-Fuller Test | (Enders 2014) | Are variables stationary or evolving? |
| KPSS test* | (Maddala and Kim 1998) | Are the results robust to null hypothesis? |
| Cointegration test | (Johansen, Mosconi, and Nielsen 2000) | Are evolving variables in long-term equilibrium? |
| 2. Model of Dynamic Interactions | | |
| Vector autoregressive (VAR) model | (Dekimpe and Hanssens 1999) | How do SETs and consumer mind-set metrics interact in the long run accounting for the unit roots and cointegration? |
| 3. Forecast error variance decomposition (FEVD) | (Nijs, Srinivasan, and Pauwels 2007) | What fraction of performance variance comes from each SETs? |

Note: *Null hypothesis: Series are stationary


Unit roots

A mean-reverting series exhibits stationarity, whereas a series that changes permanently exhibits evolution. Following Enders (2014), we use the augmented Dickey–Fuller (ADF) test with evolution as the null hypothesis. We complement the ADF test with the KPSS test (stationarity as the null hypothesis) proposed by Kwiatkowski et al. (1992). Each test is estimated in two forms: with and without a deterministic trend. Ideally, the tests should converge in their results (Maddala and Kim 1998). In addition, we conduct panel unit root tests (see Table A5). We find that all variables are stationary and therefore enter the system in levels.
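Because the two tests have opposite null hypotheses, their results have to be read together. The sketch below shows one way to combine them; the function name `classify_series`, the 5% significance level, and the "inconclusive" label for conflicting results are assumptions of this example and are not stated in the text.

```python
def classify_series(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF (null: evolving) and KPSS (null: stationary) test results.

    The 5% level and the handling of conflicting results are assumptions.
    """
    adf_rejects_evolution = adf_pvalue < alpha
    kpss_rejects_stationarity = kpss_pvalue < alpha
    if adf_rejects_evolution and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_evolution and kpss_rejects_stationarity:
        return "evolving"
    return "inconclusive"  # the two tests do not converge


print(classify_series(adf_pvalue=0.001, kpss_pvalue=0.40))  # stationary
print(classify_series(adf_pvalue=0.60, kpss_pvalue=0.01))   # evolving
```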

Table A5: Panel Unit Roots

| Panel and individual unit root tests | Levin, Lin and Chu (no intercept, no trend) | Breitung (individual intercept and trend) | ADF-Fisher (no intercept, no trend) |
|---|---|---|---|
| Null Hypothesis | Common | Common | Individual |
| Volume | | | |
| Likes | .000 | .000 | .000 |
| Comments | .000 | .000 | .000 |
| Shares | .000 | .000 | .000 |
| SVM | | | |
| Positive | .000 | .000 | .000 |
| Negative | .000 | .000 | .000 |
| Neutral | .000 | .000 | .000 |
| LIWC | | | |
| Positive | .000 | .000 | .000 |
| Negative | .000 | .000 | .000 |
| SVM Dispersion | | | |
| Positive | .000 | .000 | .000 |
| Negative | .000 | .000 | .000 |
| Consumer Mindset Metrics | | | |
| Awareness | .000 | .000 | .000 |
| Impression | .000 | .000 | .000 |
| Purchase Intent | .000 | .000 | .000 |
| Recommendation | .000 | .000 | .000 |
| Satisfaction | .000 | .000 | .000 |


Lag Selection

The optimal lag order (“p”) is chosen by the Akaike Information Criterion (AIC), taking into account the Breusch–Godfrey serial correlation Lagrange multiplier test (Breusch 1978). Our goal is to balance lag-selection criteria with autocorrelation bias (Slotegraaf and Pauwels 2008). We first select the appropriate lag based on the AIC, estimate the model and check whether we should add lags to pass diagnostic tests on residual autocorrelation (Franses 2005). We add lags until there is no serial autocorrelation in the model. Overall, we select a lag order of seven for each SETs-mindset metrics model for each brand.
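The selection procedure is a small loop: start from the AIC-preferred lag and add lags until the residual autocorrelation test passes. In the sketch below, `select_lag_order`, the AIC values, the stand-in for the serial correlation test, and the cap `max_lag` are all hypothetical; estimating the VAR and running the test are left to the caller.

```python
def select_lag_order(aic_by_lag, has_serial_correlation, max_lag):
    """Pick the AIC-minimizing lag, then add lags until residuals are clean.

    aic_by_lag: dict mapping candidate lag order -> AIC of the estimated model.
    has_serial_correlation: callable(lag) -> True if the residual
        autocorrelation test still rejects at that lag order.
    max_lag: upper bound on the lag order (an assumption of this sketch).
    """
    lag = min(aic_by_lag, key=aic_by_lag.get)
    while has_serial_correlation(lag) and lag < max_lag:
        lag += 1
    return lag


# Made-up inputs: AIC prefers 2 lags, but the residual test only passes from lag 7 on.
aic_values = {1: 10.2, 2: 9.8, 3: 9.9, 4: 10.1}
chosen = select_lag_order(aic_values, lambda lag: lag < 7, max_lag=14)
print(chosen)  # 7
```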

Separate VAR models

We estimate the VAR models for each brand separately for two reasons. First, we want to

isolate time series support or refutation for our arguments and resulting recommendations. For

instance, brand A may enjoy a large number of positive comments and a higher Purchase Intent

than brand B, but that does not mean that brand B can increase its mindset metrics by increasing

its positive comments. Estimating the model for each brand allows us to both show for how

many brands a relationship holds, and give brand-specific advice (based on a second-stage

analysis). Second, brand-specific estimation allows easier comparison with previous papers that demonstrated the effect of social media on consumer mindset metrics (Colicev et al. 2017) and the effectiveness of social media (Luo, Zhang, and Duan 2013; Tirunillai and Tellis 2012).

Control Variables

We control for advertising, as previous research provides ample evidence that advertising has an impact on customer acquisition (Trusov, Bucklin, and Pauwels 2009) and mindset metrics (Colicev et al. 2016; Srinivasan, Vanhuele, and Pauwels 2010). We include YouGov’s “advertising awareness” measure as a proxy for advertising expenditures.


Model Estimation

We estimate five main models, one for each SET, and four models that combine different SETs in the same model. Specifically, in Models 1-5 we estimate how the Volume metrics (likes, comments, shares), SVM negative and positive, LIWC negative and positive, SVM positive and negative dispersion, and SVM positive, negative and neutral comments, respectively, explain the five consumer mindset metrics (Awareness, Impression, Purchase Intent, Satisfaction and Recommendation) (see Equations 1-5). Next, we test how different combinations of SETs can explain the consumer mindset metrics in Models 6-9. In particular, Model 6 combines Volume and SVM Neutral, Model 7 combines Volume and LIWC, Model 8 combines LIWC and SVM Neutral, and Model 9 combines Volume, LIWC and SVM Neutral. Finally, from the above models we select the best-performing individual metrics and combine them in Model 10.

Model Equations 1-10: Ten (10) Models of SETs and Consumer Mindset Metrics

Model 1: Volume

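$$
\begin{bmatrix}
\text{Likes}_{t} \\ \text{Comments}_{t} \\ \text{Shares}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,8}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{8,1}^{n} & \cdots & \gamma_{8,8}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{Likes}_{t-n} \\ \text{Comments}_{t-n} \\ \text{Shares}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{8,1} & \varphi_{8,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t} \\ e_{8t}
\end{bmatrix}
\qquad (1)
$$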
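To make the structure of these VAR equations concrete, the following sketch evaluates the right-hand side (without the error term) for a tiny system with two endogenous variables, two lags and two exogenous controls. The function name `var_predict` and all coefficient values are made up for illustration; they are not estimates from the study.

```python
def var_predict(lagged_y, gammas, phi, x):
    """Evaluate sum_n Gamma^n y_{t-n} + Phi x for one period.

    lagged_y[n-1] holds the vector y_{t-n}; gammas[n-1] is the k x k
    coefficient matrix for lag n; phi is the k x m matrix on the m
    exogenous variables in x.
    """
    k = len(phi)
    # Start with the contribution of the exogenous variables.
    y_hat = [sum(phi[i][j] * x[j] for j in range(len(x))) for i in range(k)]
    # Add the contribution of each lag.
    for gamma_n, y_lag in zip(gammas, lagged_y):
        for i in range(k):
            y_hat[i] += sum(gamma_n[i][j] * y_lag[j] for j in range(k))
    return y_hat


# Made-up two-variable example with p = 2 lags.
gammas = [
    [[0.5, 0.0], [0.1, 0.3]],  # coefficients on y_{t-1}
    [[0.2, 0.0], [0.0, 0.1]],  # coefficients on y_{t-2}
]
lagged_y = [[1.0, 2.0], [3.0, 4.0]]  # y_{t-1}, y_{t-2}
phi = [[1.0, 0.0], [0.0, 1.0]]
x = [0.5, -0.5]
prediction = var_predict(lagged_y, gammas, phi, x)
print([round(value, 3) for value in prediction])  # [1.6, 0.6]
```

In the models of this appendix the vectors have between 7 and 13 entries and the lag order is seven, but the arithmetic is the same.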

Model 2: SVM (without neutral)


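$$
\begin{bmatrix}
\text{SVM positive}_{t} \\ \text{SVM negative}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,7}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{7,1}^{n} & \cdots & \gamma_{7,7}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{SVM positive}_{t-n} \\ \text{SVM negative}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{7,1} & \varphi_{7,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t}
\end{bmatrix}
\qquad (2)
$$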

Model 3: LIWC

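$$
\begin{bmatrix}
\text{LIWC positive}_{t} \\ \text{LIWC negative}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,7}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{7,1}^{n} & \cdots & \gamma_{7,7}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{LIWC positive}_{t-n} \\ \text{LIWC negative}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{7,1} & \varphi_{7,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t}
\end{bmatrix}
\qquad (3)
$$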

Model 4: SVM (adjusted for dispersion)

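$$
\begin{bmatrix}
\text{SVM positive dispersion}_{t} \\ \text{SVM negative dispersion}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,7}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{7,1}^{n} & \cdots & \gamma_{7,7}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{SVM positive dispersion}_{t-n} \\ \text{SVM negative dispersion}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{7,1} & \varphi_{7,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t}
\end{bmatrix}
\qquad (4)
$$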

Model 5: SVM (with Neutral)

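$$
\begin{bmatrix}
\text{SVM positive}_{t} \\ \text{SVM negative}_{t} \\ \text{SVM neutral}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,8}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{8,1}^{n} & \cdots & \gamma_{8,8}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{SVM positive}_{t-n} \\ \text{SVM negative}_{t-n} \\ \text{SVM neutral}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{8,1} & \varphi_{8,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t} \\ e_{8t}
\end{bmatrix}
\qquad (5)
$$

Model 6: Volume + SVM (with Neutral)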

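$$
\begin{bmatrix}
\text{Likes}_{t} \\ \text{Comments}_{t} \\ \text{Shares}_{t} \\ \text{SVM positive}_{t} \\ \text{SVM negative}_{t} \\ \text{SVM neutral}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,11}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{11,1}^{n} & \cdots & \gamma_{11,11}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{Likes}_{t-n} \\ \text{Comments}_{t-n} \\ \text{Shares}_{t-n} \\ \text{SVM positive}_{t-n} \\ \text{SVM negative}_{t-n} \\ \text{SVM neutral}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{11,1} & \varphi_{11,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t} \\ e_{8t} \\ e_{9t} \\ e_{10t} \\ e_{11t}
\end{bmatrix}
\qquad (6)
$$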

Model 7: Volume+ LIWC

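$$
\begin{bmatrix}
\text{Likes}_{t} \\ \text{Comments}_{t} \\ \text{Shares}_{t} \\ \text{LIWC positive}_{t} \\ \text{LIWC negative}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,10}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{10,1}^{n} & \cdots & \gamma_{10,10}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{Likes}_{t-n} \\ \text{Comments}_{t-n} \\ \text{Shares}_{t-n} \\ \text{LIWC positive}_{t-n} \\ \text{LIWC negative}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{10,1} & \varphi_{10,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t} \\ e_{8t} \\ e_{9t} \\ e_{10t}
\end{bmatrix}
\qquad (7)
$$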


Model 8: LIWC+ SVM Neutral

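$$
\begin{bmatrix}
\text{LIWC positive}_{t} \\ \text{LIWC negative}_{t} \\ \text{SVM positive}_{t} \\ \text{SVM negative}_{t} \\ \text{SVM neutral}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,10}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{10,1}^{n} & \cdots & \gamma_{10,10}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{LIWC positive}_{t-n} \\ \text{LIWC negative}_{t-n} \\ \text{SVM positive}_{t-n} \\ \text{SVM negative}_{t-n} \\ \text{SVM neutral}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{10,1} & \varphi_{10,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t} \\ e_{8t} \\ e_{9t} \\ e_{10t}
\end{bmatrix}
\qquad (8)
$$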

Model 9: Volume+ LIWC+SVM (with Neutral)

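$$
\begin{bmatrix}
\text{Likes}_{t} \\ \text{Comments}_{t} \\ \text{Shares}_{t} \\ \text{LIWC positive}_{t} \\ \text{LIWC negative}_{t} \\ \text{SVM positive}_{t} \\ \text{SVM negative}_{t} \\ \text{SVM neutral}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,13}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{13,1}^{n} & \cdots & \gamma_{13,13}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{Likes}_{t-n} \\ \text{Comments}_{t-n} \\ \text{Shares}_{t-n} \\ \text{LIWC positive}_{t-n} \\ \text{LIWC negative}_{t-n} \\ \text{SVM positive}_{t-n} \\ \text{SVM negative}_{t-n} \\ \text{SVM neutral}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{13,1} & \varphi_{13,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t} \\ e_{8t} \\ e_{9t} \\ e_{10t} \\ e_{11t} \\ e_{12t} \\ e_{13t}
\end{bmatrix}
\qquad (9)
$$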

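Model 10: Combination of the best-performing metrics: Likes, Negative (LIWC) and Positive (SVM)

$$
\begin{bmatrix}
\text{Likes}_{t} \\ \text{LIWC negative}_{t} \\ \text{SVM positive}_{t} \\ \text{Awareness}_{t} \\ \text{Impression}_{t} \\ \text{Purchase Intent}_{t} \\ \text{Satisfaction}_{t} \\ \text{Recommendation}_{t}
\end{bmatrix}
= \sum_{n=1}^{p}
\begin{bmatrix}
\gamma_{1,1}^{n} & \cdots & \gamma_{1,8}^{n} \\
\vdots & \ddots & \vdots \\
\gamma_{8,1}^{n} & \cdots & \gamma_{8,8}^{n}
\end{bmatrix}
\begin{bmatrix}
\text{Likes}_{t-n} \\ \text{LIWC negative}_{t-n} \\ \text{SVM positive}_{t-n} \\ \text{Awareness}_{t-n} \\ \text{Impression}_{t-n} \\ \text{Purchase Intent}_{t-n} \\ \text{Satisfaction}_{t-n} \\ \text{Recommendation}_{t-n}
\end{bmatrix}
+
\begin{bmatrix}
\varphi_{1,1} & \varphi_{1,2} \\
\vdots & \vdots \\
\varphi_{8,1} & \varphi_{8,2}
\end{bmatrix}
\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
+
\begin{bmatrix}
e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \\ e_{5t} \\ e_{6t} \\ e_{7t} \\ e_{8t}
\end{bmatrix}
\qquad (10)
$$

Second-Stage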

In this section, we provide details on the second-stage estimations. To systematically uncover contingency factors, we regress in a second stage the brand-specific FEVD results on brand strength and industry sentiment. Our dependent variables are the quality scores given by the difference in FEVD between SVM with a neutral option (the best classification method, on average) and (1) the volume measures, (2) SVM without a neutral option, (3) LIWC and (4) SVM dispersion. In equations (7-10) below, we show the second-stage estimation for these differences.

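$$
\mathit{FEVD} \;\dots\; \beta_{3}\, \mathit{BrandStrength}_{i} \times \mathit{AverageIndustrySentiment}_{i} + \beta_{4}\, \mathit{Search}_{i} + \beta_{5}\, \mathit{Search}_{i} \times \mathit{BrandStrength}_{i} + \varepsilon_{i} \qquad (7)
$$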


(8)


(9)


(10)
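As an illustration of how the second-stage dependent variables are formed, the sketch below computes the four FEVD differences for one brand. The input values are the Awareness FEVD results for Abercrombie Fitch reported in Table A6; the function name `quality_scores` and the rounding to three decimals are assumptions of this example.

```python
def quality_scores(fevd, reference="SVM neutral"):
    """Difference in FEVD between the reference method and each other method."""
    return {
        name: round(fevd[reference] - value, 3)
        for name, value in fevd.items()
        if name != reference
    }


# Awareness FEVD values for Abercrombie Fitch (Table A6).
fevd_awareness = {
    "Vol": 0.057,
    "SVM": 0.068,
    "LIWC": 0.067,
    "SVM dispersion": 0.025,
    "SVM neutral": 0.075,
}
scores = quality_scores(fevd_awareness)
print(scores)
```

A positive difference means that SVM with a neutral option explains a larger share of the forecast error variance than the comparison method for this brand and metric.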


Web Appendix E: Brand-level results

This part of the Appendix contains the brand-level results for each mindset metric. We present

these results in Tables A6-A10.

Table A6: FEVD Brand Level Results for Awareness

Brand | Industry | Vol | SVM | LIWC | SVM dispersion | SVM neutral | Vol+SVM neutral | Vol+LIWC | LIWC+SVM Neutral | Vol+LIWC+SVM Neutral | Likes+LIWC(negative)+SVM(positive)
Abercrombie Fitch | Fashion | 0.057 | 0.068 | 0.067 | 0.025 | 0.075 | 0.109 | 0.101 | 0.094 | 0.135 | 0.081
Aeropostale | Fashion | 0.044 | 0.027 | 0.035 | 0.035 | 0.044 | 0.089 | 0.089 | 0.099 | 0.135 | 0.064
American Airlines | Airlines | 0.078 | 0.032 | 0.036 | 0.046 | 0.044 | 0.120 | 0.119 | 0.061 | 0.144 | 0.079
Apple | Electronics | 0.066 | 0.074 | 0.041 | 0.047 | 0.083 | 0.128 | 0.085 | 0.115 | 0.146 | 0.077
Audi | Cars | 0.078 | 0.077 | 0.089 | 0.031 | 0.086 | 0.099 | 0.100 | 0.128 | 0.134 | 0.082
BMW | Cars | 0.048 | 0.021 | 0.031 | 0.069 | 0.037 | 0.089 | 0.066 | 0.077 | 0.114 | 0.041
Burger King | Gastronomy | 0.082 | 0.029 | 0.032 | 0.033 | 0.056 | 0.114 | 0.109 | 0.086 | 0.224 | 0.064
Donatos | Gastronomy | 0.040 | 0.021 | 0.015 | 0.029 | 0.037 | 0.080 | 0.066 | 0.048 | 0.112 | 0.030
DosEquis | Beverages | 0.028 | 0.032 | 0.018 | 0.022 | 0.045 | 0.080 | 0.048 | 0.071 | 0.100 | 0.029
Fidelity | Banks | 0.027 | 0.036 | 0.042 | 0.032 | 0.043 | 0.072 | 0.070 | 0.082 | 0.107 | 0.060
Fifth Third Bank | Banks | 0.073 | 0.014 | 0.019 | 0.030 | 0.034 | 0.111 | 0.093 | 0.066 | 0.147 | 0.064
Ford | Cars | 0.064 | 0.035 | 0.034 | 0.015 | 0.041 | 0.095 | 0.094 | 0.066 | 0.126 | 0.048
Frontier | Airlines | 0.047 | 0.035 | 0.050 | 0.047 | 0.063 | 0.112 | 0.095 | 0.095 | 0.139 | 0.045
Hennessy | Beverages | 0.032 | 0.036 | 0.045 | 0.026 | 0.053 | 0.092 | 0.086 | 0.095 | 0.133 | 0.053
Huntington Bank | Banks | 0.029 | 0.019 | 0.019 | 0.018 | 0.032 | 0.064 | 0.059 | 0.062 | 0.094 | 0.023
JPMorgan | Banks | 0.041 | 0.050 | 0.031 | 0.039 | 0.063 | 0.092 | 0.067 | 0.080 | 0.104 | 0.061
Jack Daniels | Beverages | 0.055 | 0.036 | 0.041 | 0.028 | 0.057 | 0.094 | 0.084 | 0.087 | 0.123 | 0.057
Jameson | Beverages | 0.039 | 0.020 | 0.026 | 0.031 | 0.031 | 0.078 | 0.074 | 0.066 | 0.119 | 0.043
JennAir | Airlines | 0.020 | 0.029 | 0.019 | 0.040 | 0.067 | 0.087 | 0.047 | 0.079 | 0.105 | 0.032
JetBlue | Airlines | 0.036 | 0.019 | 0.043 | 0.023 | 0.050 | 0.088 | 0.083 | 0.082 | 0.117 | 0.049
JosABank | Fashion | 0.053 | 0.021 | 0.031 | 0.021 | 0.047 | 0.095 | 0.077 | 0.084 | 0.131 | 0.044
Kettle Brand Chips | Food | 0.047 | 0.033 | 0.026 | 0.025 | 0.054 | 0.098 | 0.067 | 0.083 | 0.119 | 0.049
Kia | Cars | 0.098 | 0.014 | 0.018 | 0.030 | 0.023 | 0.126 | 0.118 | 0.039 | 0.141 | 0.095
Kona Grill | Gastronomy | 0.054 | 0.028 | 0.034 | 0.021 | 0.038 | 0.106 | 0.082 | 0.066 | 0.137 | 0.054
Lenovo | Electronics | 0.056 | 0.027 | 0.014 | 0.045 | 0.058 | 0.107 | 0.084 | 0.095 | 0.140 | 0.050
Lexus | Cars | 0.058 | 0.048 | 0.021 | 0.055 | 0.068 | 0.114 | 0.087 | 0.101 | 0.145 | 0.040
Liberty Mutual | Banks | 0.152 | 0.076 | 0.070 | 0.068 | 0.094 | 0.202 | 0.172 | 0.120 | 0.222 | 0.149
Lufthansa | Airlines | 0.035 | 0.101 | 0.074 | 0.101 | 0.107 | 0.118 | 0.099 | 0.114 | 0.128 | 0.116
McDonalds | Gastronomy | 0.029 | 0.019 | 0.022 | 0.011 | 0.023 | 0.060 | 0.054 | 0.051 | 0.090 | 0.033
Nestea | Food | 0.102 | 0.085 | 0.069 | 0.060 | 0.095 | 0.157 | 0.131 | 0.109 | 0.174 | 0.119
Nine West | Fashion | 0.066 | 0.048 | 0.029 | 0.022 | 0.059 | 0.100 | 0.084 | 0.080 | 0.130 | 0.055
NorthFace | Fashion | 0.046 | 0.046 | 0.036 | 0.029 | 0.057 | 0.103 | 0.081 | 0.092 | 0.135 | 0.046
PNC Bank | Banks | 0.039 | 0.032 | 0.040 | 0.013 | 0.039 | 0.060 | 0.072 | 0.077 | 0.098 | 0.040
Pepsi | Food | 0.046 | 0.023 | 0.030 | 0.019 | 0.033 | 0.090 | 0.073 | 0.062 | 0.118 | 0.040
Qantas | Airlines | 0.031 | 0.026 | 0.034 | 0.006 | 0.032 | 0.058 | 0.063 | 0.050 | 0.077 | 0.055
Rabobank | Banks | 0.038 | 0.037 | 0.018 | 0.047 | 0.051 | 0.097 | 0.062 | 0.071 | 0.116 | 0.046
Samsung | Electronics | 0.026 | 0.024 | 0.014 | 0.034 | 0.041 | 0.066 | 0.040 | 0.054 | 0.078 | 0.024
SanDisk | Electronics | 0.037 | 0.032 | 0.026 | 0.033 | 0.041 | 0.092 | 0.074 | 0.059 | 0.108 | 0.035
Singapore Airlines | Airlines | 0.034 | 0.028 | 0.029 | 0.049 | 0.037 | 0.092 | 0.087 | 0.083 | 0.127 | 0.054
Smirnoff | Beverages | 0.067 | 0.050 | 0.043 | 0.048 | 0.060 | 0.106 | 0.092 | 0.074 | 0.119 | 0.077
Sony | Electronics | 0.032 | 0.023 | 0.016 | 0.029 | 0.032 | 0.068 | 0.047 | 0.044 | 0.076 | 0.023
Southwest | Airlines | 0.054 | 0.027 | 0.022 | 0.065 | 0.043 | 0.097 | 0.074 | 0.077 | 0.119 | 0.045
Starbucks Frappucino | Gastronomy | 0.050 | 0.026 | 0.038 | 0.029 | 0.032 | 0.096 | 0.102 | 0.072 | 0.124 | 0.059
Subaru | Cars | 0.034 | 0.031 | 0.020 | 0.058 | 0.061 | 0.101 | 0.062 | 0.081 | 0.122 | 0.028
Tostitos | Food | 0.065 | 0.059 | 0.051 | 0.014 | 0.076 | 0.095 | 0.088 | 0.092 | 0.107 | 0.065
US Bank | Banks | 0.038 | 0.028 | 0.037 | 0.028 | 0.045 | 0.079 | 0.062 | 0.070 | 0.099 | 0.053
Volkswagen | Cars | 0.049 | 0.021 | 0.023 | 0.032 | 0.034 | 0.084 | 0.064 | 0.056 | 0.099 | 0.036
Volvo | Cars | 0.060 | 0.022 | 0.024 | 0.010 | 0.041 | 0.089 | 0.078 | 0.052 | 0.101 | 0.048

Table A7: FEVD Brand Level Results for Impression

Brand | Industry | Vol | SVM | LIWC | SVM dispersion | SVM neutral | Vol+SVM neutral | Vol+LIWC | LIWC+SVM Neutral | Vol+LIWC+SVM Neutral | Likes+LIWC(negative)+SVM(positive)
Abercrombie Fitch | Fashion | 0.089 | 0.118 | 0.087 | 0.123 | 0.130 | 0.165 | 0.127 | 0.140 | 0.184 | 0.127
Aeropostale | Fashion | 0.038 | 0.029 | 0.017 | 0.045 | 0.042 | 0.083 | 0.063 | 0.059 | 0.100 | 0.032
American Airlines | Airlines | 0.076 | 0.047 | 0.041 | 0.038 | 0.071 | 0.139 | 0.115 | 0.112 | 0.174 | 0.078
Apple | Electronics | 0.038 | 0.025 | 0.045 | 0.038 | 0.048 | 0.093 | 0.077 | 0.079 | 0.122 | 0.054
Audi | Cars | 0.049 | 0.041 | 0.040 | 0.037 | 0.056 | 0.088 | 0.085 | 0.084 | 0.110 | 0.055
BMW | Cars | 0.028 | 0.015 | 0.023 | 0.029 | 0.047 | 0.078 | 0.055 | 0.060 | 0.105 | 0.035
Burger King | Gastronomy | 0.039 | 0.033 | 0.025 | 0.029 | 0.042 | 0.074 | 0.069 | 0.072 | 0.160 | 0.049
Donatos | Gastronomy | 0.043 | 0.028 | 0.013 | 0.069 | 0.037 | 0.085 | 0.063 | 0.057 | 0.115 | 0.031
DosEquis | Beverages | 0.058 | 0.032 | 0.024 | 0.025 | 0.052 | 0.104 | 0.081 | 0.081 | 0.128 | 0.040
Fidelity | Banks | 0.048 | 0.034 | 0.024 | 0.033 | 0.044 | 0.084 | 0.075 | 0.063 | 0.107 | 0.042
Fifth Third Bank | Banks | 0.037 | 0.024 | 0.034 | 0.020 | 0.035 | 0.073 | 0.068 | 0.076 | 0.115 | 0.038
Ford | Cars | 0.057 | 0.013 | 0.014 | 0.011 | 0.017 | 0.079 | 0.076 | 0.039 | 0.098 | 0.041
Frontier | Airlines | 0.041 | 0.029 | 0.023 | 0.019 | 0.051 | 0.092 | 0.063 | 0.086 | 0.119 | 0.041
Hennessy | Beverages | 0.047 | 0.047 | 0.060 | 0.045 | 0.067 | 0.098 | 0.110 | 0.121 | 0.147 | 0.092
Huntington Bank | Banks | 0.029 | 0.018 | 0.019 | 0.027 | 0.037 | 0.067 | 0.042 | 0.055 | 0.091 | 0.033
JPMorgan | Banks | 0.059 | 0.036 | 0.036 | 0.043 | 0.052 | 0.115 | 0.090 | 0.081 | 0.138 | 0.055
Jack Daniels | Beverages | 0.024 | 0.023 | 0.035 | 0.018 | 0.030 | 0.050 | 0.055 | 0.062 | 0.081 | 0.036
Jameson | Beverages | 0.088 | 0.032 | 0.023 | 0.047 | 0.056 | 0.130 | 0.102 | 0.065 | 0.142 | 0.059
JennAir | Airlines | 0.024 | 0.039 | 0.032 | 0.033 | 0.059 | 0.088 | 0.058 | 0.083 | 0.116 | 0.047
JetBlue | Airlines | 0.045 | 0.039 | 0.059 | 0.045 | 0.059 | 0.097 | 0.091 | 0.097 | 0.131 | 0.074
JosABank | Fashion | 0.059 | 0.041 | 0.021 | 0.035 | 0.054 | 0.119 | 0.082 | 0.075 | 0.140 | 0.065
Kettle Brand Chips | Food | 0.033 | 0.021 | 0.019 | 0.017 | 0.034 | 0.058 | 0.047 | 0.058 | 0.081 | 0.039
Kia | Cars | 0.030 | 0.017 | 0.023 | 0.030 | 0.026 | 0.057 | 0.052 | 0.050 | 0.085 | 0.039
Kona Grill | Gastronomy | 0.036 | 0.018 | 0.040 | 0.015 | 0.035 | 0.062 | 0.083 | 0.089 | 0.124 | 0.049
Lenovo | Electronics | 0.062 | 0.049 | 0.038 | 0.027 | 0.071 | 0.135 | 0.105 | 0.100 | 0.163 | 0.086
Lexus | Cars | 0.052 | 0.012 | 0.016 | 0.035 | 0.047 | 0.099 | 0.068 | 0.067 | 0.121 | 0.032
Liberty Mutual | Banks | 0.037 | 0.017 | 0.032 | 0.032 | 0.033 | 0.093 | 0.076 | 0.065 | 0.121 | 0.053
Lufthansa | Airlines | 0.026 | 0.039 | 0.032 | 0.047 | 0.054 | 0.072 | 0.050 | 0.070 | 0.087 | 0.045
McDonalds | Gastronomy | 0.047 | 0.033 | 0.027 | 0.026 | 0.042 | 0.096 | 0.079 | 0.080 | 0.130 | 0.070
Nestea | Food | 0.044 | 0.025 | 0.021 | 0.023 | 0.034 | 0.081 | 0.066 | 0.050 | 0.099 | 0.036
Nine West | Fashion | 0.040 | 0.023 | 0.024 | 0.017 | 0.036 | 0.073 | 0.060 | 0.066 | 0.099 | 0.030
NorthFace | Fashion | 0.051 | 0.034 | 0.031 | 0.022 | 0.037 | 0.082 | 0.076 | 0.058 | 0.106 | 0.035
PNC Bank | Banks | 0.052 | 0.046 | 0.042 | 0.048 | 0.055 | 0.103 | 0.084 | 0.087 | 0.132 | 0.078
Pepsi | Food | 0.040 | 0.030 | 0.032 | 0.018 | 0.051 | 0.077 | 0.073 | 0.068 | 0.100 | 0.038
Qantas | Airlines | 0.020 | 0.032 | 0.020 | 0.018 | 0.036 | 0.054 | 0.051 | 0.056 | 0.076 | 0.025
Rabobank | Banks | 0.056 | 0.010 | 0.025 | 0.032 | 0.039 | 0.100 | 0.087 | 0.067 | 0.130 | 0.046
Samsung | Electronics | 0.032 | 0.018 | 0.014 | 0.019 | 0.024 | 0.063 | 0.057 | 0.049 | 0.092 | 0.030
SanDisk | Electronics | 0.037 | 0.025 | 0.012 | 0.041 | 0.039 | 0.094 | 0.058 | 0.057 | 0.118 | 0.025
Singapore Airlines | Airlines | 0.032 | 0.025 | 0.033 | 0.061 | 0.043 | 0.085 | 0.074 | 0.078 | 0.110 | 0.033
Smirnoff | Beverages | 0.025 | 0.019 | 0.012 | 0.024 | 0.028 | 0.063 | 0.044 | 0.063 | 0.097 | 0.030
Sony | Electronics | 0.043 | 0.027 | 0.024 | 0.034 | 0.050 | 0.094 | 0.074 | 0.076 | 0.120 | 0.031
Southwest | Airlines | 0.059 | 0.041 | 0.038 | 0.039 | 0.062 | 0.100 | 0.090 | 0.098 | 0.140 | 0.072
Starbucks Frappucino | Gastronomy | 0.032 | 0.009 | 0.019 | 0.024 | 0.021 | 0.056 | 0.050 | 0.053 | 0.092 | 0.024
Subaru | Cars | 0.039 | 0.028 | 0.042 | 0.025 | 0.040 | 0.067 | 0.075 | 0.078 | 0.102 | 0.052
Tostitos | Food | 0.037 | 0.037 | 0.041 | 0.023 | 0.049 | 0.097 | 0.075 | 0.077 | 0.121 | 0.051
US Bank | Banks | 0.061 | 0.041 | 0.053 | 0.022 | 0.065 | 0.123 | 0.110 | 0.103 | 0.163 | 0.069
Volkswagen | Cars | 0.058 | 0.032 | 0.035 | 0.057 | 0.041 | 0.096 | 0.090 | 0.088 | 0.134 | 0.067
Volvo | Cars | 0.025 | 0.023 | 0.038 | 0.022 | 0.031 | 0.056 | 0.060 | 0.058 | 0.078 | 0.030

Table A8: FEVD Brand Level Results for Purchase Intent

Brand | Industry | Vol | SVM | LIWC | SVM dispersion | SVM neutral | Vol+SVM neutral | Vol+LIWC | LIWC+SVM Neutral | Vol+LIWC+SVM Neutral | Likes+LIWC(negative)+SVM(positive)

Table A9: FEVD Brand Level Results for Satisfaction

Brand | Industry | Vol | SVM | LIWC | SVM dispersion | SVM neutral | Vol+SVM neutral | Vol+LIWC | LIWC+SVM Neutral | Vol+LIWC+SVM Neutral | Likes+LIWC(negative)+SVM(positive)
Abercrombie Fitch | Fashion | 0.056 | 0.022 | 0.025 | 0.034 | 0.028 | 0.085 | 0.935 | 0.064 | 0.117 | 0.053
Aeropostale | Fashion | 0.039 | 0.044 | 0.011 | 0.046 | 0.061 | 0.115 | 0.946 | 0.072 | 0.126 | 0.035
American Airlines | Airlines | 0.072 | 0.069 | 0.040 | 0.032 | 0.122 | 0.196 | 0.963 | 0.147 | 0.222 | 0.108
Apple | Electronics | 0.039 | 0.027 | 0.022 | 0.027 | 0.046 | 0.083 | 0.943 | 0.064 | 0.099 | 0.041
Audi | Cars | 0.038 | 0.024 | 0.023 | 0.042 | 0.030 | 0.061 | 0.907 | 0.044 | 0.086 | 0.039
BMW | Cars | 0.044 | 0.029 | 0.033 | 0.027 | 0.039 | 0.087 | 0.914 | 0.065 | 0.103 | 0.057
Burger King | Gastronomy | 0.042 | 0.019 | 0.021 | 0.022 | 0.030 | 0.075 | 0.947 | 0.063 | 0.146 | 0.041
Donatos | Gastronomy | 0.041 | 0.024 | 0.027 | 0.042 | 0.034 | 0.068 | 0.950 | 0.056 | 0.113 | 0.046
DosEquis | Beverages | 0.040 | 0.030 | 0.028 | 0.023 | 0.043 | 0.090 | 0.950 | 0.065 | 0.107 | 0.041
Fidelity | Banks | 0.038 | 0.017 | 0.020 | 0.034 | 0.033 | 0.072 | 0.929 | 0.056 | 0.094 | 0.034
Fifth Third Bank | Banks | 0.041 | 0.011 | 0.019 | 0.026 | 0.027 | 0.066 | 0.924 | 0.062 | 0.117 | 0.022
Ford | Cars | 0.047 | 0.023 | 0.018 | 0.038 | 0.030 | 0.089 | 0.964 | 0.055 | 0.108 | 0.046
Frontier | Airlines | 0.026 | 0.018 | 0.029 | 0.050 | 0.041 | 0.085 | 0.911 | 0.076 | 0.120 | 0.025
Hennessy | Beverages | 0.044 | 0.039 | 0.033 | 0.032 | 0.055 | 0.092 | 0.942 | 0.092 | 0.124 | 0.045
Huntington Bank | Banks | 0.031 | 0.029 | 0.015 | 0.028 | 0.048 | 0.102 | 0.940 | 0.070 | 0.116 | 0.049
JPMorgan | Banks | 0.029 | 0.031 | 0.047 | 0.023 | 0.046 | 0.086 | 0.939 | 0.106 | 0.131 | 0.072
Jack Daniels | Beverages | 0.022 | 0.021 | 0.035 | 0.019 | 0.045 | 0.062 | 0.937 | 0.074 | 0.098 | 0.033
Jameson | Beverages | 0.053 | 0.029 | 0.030 | 0.035 | 0.037 | 0.082 | 0.938 | 0.067 | 0.111 | 0.084
JennAir | Airlines | 0.043 | 0.033 | 0.028 | 0.021 | 0.043 | 0.083 | 0.950 | 0.069 | 0.119 | 0.057
JetBlue | Airlines | 0.045 | 0.045 | 0.058 | 0.030 | 0.076 | 0.126 | 0.946 | 0.131 | 0.170 | 0.063
JosABank | Fashion | 0.045 | 0.019 | 0.023 | 0.018 | 0.025 | 0.078 | 0.945 | 0.048 | 0.097 | 0.044
Kettle Brand Chips | Food | 0.059 | 0.037 | 0.029 | 0.017 | 0.047 | 0.090 | 0.920 | 0.066 | 0.107 | 0.054
Kia | Cars | 0.029 | 0.022 | 0.031 | 0.034 | 0.044 | 0.080 | 0.922 | 0.080 | 0.114 | 0.045
Kona Grill | Gastronomy | 0.033 | 0.041 | 0.025 | 0.034 | 0.054 | 0.087 | 0.952 | 0.073 | 0.105 | 0.036
Lenovo | Electronics | 0.033 | 0.028 | 0.034 | 0.008 | 0.040 | 0.070 | 0.919 | 0.066 | 0.100 | 0.050
Lexus | Cars | 0.039 | 0.038 | 0.038 | 0.047 | 0.052 | 0.086 | 0.945 | 0.086 | 0.122 | 0.053
Liberty Mutual | Banks | 0.028 | 0.034 | 0.031 | 0.033 | 0.050 | 0.080 | 0.940 | 0.067 | 0.092 | 0.031
Lufthansa | Airlines | 0.028 | 0.015 | 0.016 | 0.018 | 0.029 | 0.070 | 0.958 | 0.055 | 0.086 | 0.025
McDonalds | Gastronomy | 0.059 | 0.024 | 0.030 | 0.025 | 0.035 | 0.098 | 0.951 | 0.057 | 0.116 | 0.053
Nestea | Food | 0.046 | 0.021 | 0.012 | 0.019 | 0.033 | 0.085 | 0.953 | 0.054 | 0.101 | 0.041
Nine West | Fashion | 0.024 | 0.030 | 0.029 | 0.014 | 0.066 | 0.091 | 0.935 | 0.092 | 0.119 | 0.032
NorthFace | Fashion | 0.047 | 0.024 | 0.019 | 0.027 | 0.038 | 0.083 | 0.935 | 0.050 | 0.108 | 0.035
PNC Bank | Banks | 0.031 | 0.024 | 0.033 | 0.022 | 0.033 | 0.072 | 0.940 | 0.080 | 0.119 | 0.038
Pepsi | Food | 0.038 | 0.029 | 0.014 | 0.024 | 0.038 | 0.067 | 0.928 | 0.049 | 0.084 | 0.036
Qantas | Airlines | 0.042 | 0.034 | 0.031 | 0.015 | 0.042 | 0.074 | 0.955 | 0.060 | 0.099 | 0.049
Rabobank | Banks | 0.065 | 0.017 | 0.020 | 0.022 | 0.037 | 0.105 | 0.960 | 0.063 | 0.128 | 0.028
Samsung | Electronics | 0.039 | 0.027 | 0.015 | 0.034 | 0.035 | 0.083 | 0.932 | 0.067 | 0.115 | 0.041
SanDisk | Electronics | 0.058 | 0.022 | 0.025 | 0.046 | 0.049 | 0.106 | 0.953 | 0.070 | 0.124 | 0.046
Singapore Airlines | Airlines | 0.035 | 0.022 | 0.017 | 0.037 | 0.032 | 0.069 | 0.935 | 0.045 | 0.081 | 0.033
Smirnoff | Beverages | 0.041 | 0.042 | 0.035 | 0.037 | 0.058 | 0.087 | 0.934 | 0.081 | 0.110 | 0.050
Sony | Electronics | 0.045 | 0.026 | 0.025 | 0.029 | 0.042 | 0.077 | 0.891 | 0.062 | 0.092 | 0.042
Southwest | Airlines | 0.041 | 0.018 | 0.021 | 0.048 | 0.034 | 0.085 | 0.933 | 0.061 | 0.117 | 0.044
Starbucks Frappucino | Gastronomy | 0.028 | 0.010 | 0.023 | 0.037 | 0.017 | 0.053 | 0.919 | 0.051 | 0.092 | 0.041
Subaru | Cars | 0.065 | 0.026 | 0.018 | 0.037 | 0.038 | 0.097 | 0.938 | 0.070 | 0.135 | 0.047
Tostitos | Food | 0.040 | 0.015 | 0.026 | 0.025 | 0.021 | 0.077 | 0.915 | 0.053 | 0.095 | 0.023
US Bank | Banks | 0.025 | 0.035 | 0.035 | 0.053 | 0.069 | 0.105 | 0.933 | 0.097 | 0.140 | 0.051
Volkswagen | Cars | 0.041 | 0.032 | 0.025 | 0.034 | 0.042 | 0.078 | 0.953 | 0.070 | 0.107 | 0.041
Volvo | Cars | 0.035 | 0.021 | 0.027 | 0.027 | 0.029 | 0.062 | 0.933 | 0.063 | 0.098 | 0.037

Table A10: FEVD Brand Level Results for Recommendation

Brand | Industry | Vol | SVM | LIWC | SVM dispersion | SVM neutral | Vol+SVM neutral | Vol+LIWC | LIWC+SVM Neutral | Vol+LIWC+SVM Neutral | Likes+LIWC(negative)+SVM(positive)

Figures of the Results

Figure A1: Explanatory Power (R²) across brands

Figure A2: FEVD across five models

Figure A3: FEVD for each metric

Web Appendix F: Industry Level Analysis (PVAR models)

In this part of the Appendix, we provide details on the industry-level analysis. To obtain the industry-level estimates, we model the relation between our variables as a dynamic system of simultaneous equations using the panel vector autoregressive (PVAR) model (Holtz-Eakin, Newey, and Rosen 1988), which allows for unobserved individual (sector-level) heterogeneity. To estimate our model, we use the Stata package pvar developed by Abrigo and Love (2015). PVAR has recently been employed in the marketing (e.g., Hewett et al. 2016) and information systems literature (e.g., Dewan and Ramaprasad 2014).

Similar to our brand-level VAR models, we estimate a PVAR model of the following form (see Equation 12) for each industry and each SET model, where the SETs are Volume, SVM without neutral, LIWC, SVM dispersion, and SVM with neutral. The optimal lag order (“p”) is chosen as in the VAR models, by combining several selection criteria. We select the seventh-order panel VAR because it has the smallest MAIC and Hansen’s J statistic and the largest CD.

We also checked the stability condition of the estimated panel VAR and found that all moduli of the companion matrix are strictly less than one, so the panel VAR is invertible (Hamilton 1994). The panel VAR model is therefore stable, and we can proceed with the next steps of the analysis.
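The stability condition can be illustrated with a short Python sketch. This is not the Stata routine used for the analysis; it is a minimal example, with hypothetical function names and made-up coefficients, of stacking the lag matrices into a companion matrix and checking that every eigenvalue modulus is below one.

```python
import numpy as np

def companion_moduli(coefs):
    """Moduli of the eigenvalues of the companion matrix of a VAR(p).

    coefs : list of (k, k) lag coefficient matrices A_1..A_p
    """
    p = len(coefs)
    k = coefs[0].shape[0]
    companion = np.zeros((k * p, k * p))
    companion[:k, :] = np.hstack(coefs)        # top block row: [A_1 ... A_p]
    companion[k:, :-k] = np.eye(k * (p - 1))   # identity blocks below the top row
    return np.abs(np.linalg.eigvals(companion))

def is_stable(coefs):
    """True if all companion-matrix eigenvalues lie strictly inside the unit circle."""
    return bool(np.all(companion_moduli(coefs) < 1.0))

# Made-up univariate VAR(2): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t
print(is_stable([np.array([[0.5]]), np.array([[0.3]])]))
```

For a seventh-order model with six variables, the companion matrix would be 42 x 42, and the same check applies unchanged.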

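\[
\begin{bmatrix} SET_{i,t} \\ Awareness_{i,t} \\ Impression_{i,t} \\ Purchase_{i,t} \\ Satisfaction_{i,t} \\ Recommendation_{i,t} \end{bmatrix}
= \sum_{n=1}^{p} \Phi_n
\begin{bmatrix} SET_{i,t-n} \\ Awareness_{i,t-n} \\ Impression_{i,t-n} \\ Purchase_{i,t-n} \\ Satisfaction_{i,t-n} \\ Recommendation_{i,t-n} \end{bmatrix}
+ \mu_i + \varepsilon_{i,t} \tag{12}
\]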

In a panel VAR, the lagged dependent variables on the right-hand side of the system of equations bias the fixed-effects estimates even when N is large (Nickell 1981). The system of equations is therefore estimated jointly with the Generalized Method of Moments (GMM) estimator. Because the fixed effects μi are correlated with the regressors through the lags of the dependent variables, they must be removed before estimation (Arellano and Bover 1995). To do so, Arellano and Bover (1995) proposed the forward orthogonal deviation (also known as the Helmert transformation), which subtracts from each observation the average of all available future observations, thereby minimizing data loss. Since past realizations are not included in this transformation, they remain valid instruments and can be used to estimate the coefficients by system GMM. In addition, the Helmert transformation does not induce autocorrelation in the error terms (Drakos and Konstantinou 2014). To account for heteroscedasticity in the errors, we use heteroscedasticity-robust standard errors.
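The forward orthogonal deviation can be sketched in a few lines of Python. This is an illustrative example rather than the estimation code: it assumes a single panel unit with no missing observations, and the function name is hypothetical. The scaling factor is the standard one that leaves the variance of the transformed errors unchanged.

```python
import numpy as np

def forward_orthogonal_deviation(x):
    """Helmert-transform one panel unit's series x_1..x_T.

    Each observation has the mean of all later observations subtracted and
    is rescaled. Returns T-1 values, because the last observation has no
    future values to average over.
    """
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    out = np.empty(T - 1)
    for t in range(T - 1):
        n_future = T - 1 - t                         # observations after position t
        scale = np.sqrt(n_future / (n_future + 1))   # variance-preserving weight
        out[t] = scale * (x[t] - x[t + 1:].mean())
    return out

# A unit-specific constant (the fixed effect) drops out of the transformed series.
series = np.array([0.3, -1.2, 0.8, 2.0])
print(np.allclose(forward_orthogonal_deviation(series + 7.0),
                  forward_orthogonal_deviation(series)))
```

Because only future values enter the transformation, lagged levels of the variables are untouched by it, which is why they can serve as instruments.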

We then derive the FEVDs from each PVAR model for each industry and present the findings in Tables A11-A15 and Figures A1-A5.

For airlines, we find that Volume dominates for Awareness; SVM (dispersion) for Impression, Purchase Intent, and Recommendation; and SVM with neutral for Satisfaction. The airline industry is characterized by frequent service requests, which often have neutral sentiment. Therefore, brand managers in the airline sector should use SVM and include neutral comments in their analysis. For banking, we find that the simple Volume SETs outperform the other SETs for all consumer mindset metrics except Satisfaction, for which SVM dispersion wins. This finding is in line with our expectation, given the low average sentiment, that consumers use social media mostly to complain about their experiences in the banking sector. Thus, ‘no news is good news’ in this industry, while UGC activity typically represents bad news. For electronics and beverages, we find that the SVM (dispersion) metric consistently outperforms other sentiment measures for all consumer mindset metrics. In these two industries, brands attract large amounts of both positive and negative comments, implying high brand dispersion. Accordingly, managers in these two industries should adjust for such sentiment dispersion and use SVM as their main sentiment technique. For fashion, we find that SVM with neutral dominates for Impression and Recommendation, while Volume dominates for Awareness, Purchase Intent, and Satisfaction.

Table A11: Industry Level Results for Awareness

Industry | Model 1 (Volume) | Model 2 (SVM) | Model 3 (LIWC) | Model 4 (SVM dispersion adjusted) | Model 5 (SVM with neutral)
Airlines | 1.12% | 0.29% | 0.16% | 0.75% | 0.34%
Banking | 16.62% | 0.50% | 0.90% | 0.35% | 0.66%
Beverages | 3.72% | 3.36% | 2.72% | 4.42% | 3.71%
Cars | 1.01% | 0.28% | 0.18% | 1.12% | 0.36%
Electronics | 4.22% | 1.45% | 0.63% | 16.88% | 5.78%
Fashion | 2.48% | 1.51% | 1.92% | 1.15% | 1.75%
Food | 1.25% | 0.98% | 1.36% | 0.41% | 1.84%
Gastronomy | 1.85% | 0.88% | 0.15% | 1.09% | 1.35%

Figure A1: Industry Level Results for Awareness

Table A12: Industry Level Results for Impression

Industry | Model 1 (Volume) | Model 2 (SVM) | Model 3 (LIWC) | Model 4 (SVM dispersion adjusted) | Model 5 (SVM with neutral)


Figure A2: Industry Level Results for Impression

Bar chart data labels, in legend order (Volume, SVM, LIWC, SVM_DISP, SVM_3) by industry:

Series | Airlines | Banking | Beverages | Cars | Electronics | Fashion | Food | Gastronomy
Volume | 1.70% | 12.02% | 1.06% | 0.41% | 3.50% | 1.84% | 1.50% | 1.17%
SVM | 1.31% | 1.03% | 0.82% | 0.36% | 0.81% | 2.02% | 1.08% | 0.77%
LIWC | 0.88% | 1.28% | 0.54% | 0.15% | 0.73% | 1.36% | 0.96% | 0.20%
SVM_DISP | 2.99% | 2.25% | 2.07% | 0.52% | 19.23% | 1.86% | 0.34% | 0.99%
SVM_3 | 1.51% | 1.32% | 0.86% | 0.54% | 2.36% | 2.99% | 1.59% | 1.11%

Table A13: Industry Level Results for Purchase Intent

Industry | Model 1 (Volume) | Model 2 (SVM) | Model 3 (LIWC) | Model 4 (SVM dispersion adjusted) | Model 5 (SVM with neutral)

Figure A3: Industry Level Results for Purchase Intent

Table A14: Industry Level Results for Satisfaction

Industry | Model 1 (Volume) | Model 2 (SVM) | Model 3 (LIWC) | Model 4 (SVM dispersion adjusted) | Model 5 (SVM with neutral)

Figure A4: Industry Level Results for Satisfaction

Table A15: Industry Level Results for Recommendation

Industry | Model 1 (Volume) | Model 2 (SVM) | Model 3 (LIWC) | Model 4 (SVM dispersion adjusted) | Model 5 (SVM with neutral)

Figure A5: Industry Level Results for Recommendation

Web Appendix G: Forecasting

To investigate the relative benefit of using one SET over the others, we compare the forecasting accuracy of each SET/mind-set metric combination. We assess the forecasting accuracy of five main models and four combined models by estimating out-of-sample forecasts. We report the results in Table A16.

We hold out the last 10 days of the sample (from 8th June 2014 to 17th June 2014) for each brand and assess the forecasting accuracy of the model estimates. We report the Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the Theil Index, averaged across brands for each model and mind-set metric combination.
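To make the four accuracy measures concrete, here is a minimal Python sketch for a single hold-out window. The function name and the example numbers are hypothetical. MAPE is expressed as a fraction, matching the scale of the values in Table A16. The appendix does not state which variant of the Theil index is used, so the sketch assumes Theil's U1, which is bounded between 0 (perfect forecast) and 1.

```python
import numpy as np

def forecast_accuracy(actual, forecast):
    """Out-of-sample accuracy measures for one hold-out window."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    err = a - f
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MAPE": np.mean(np.abs(err / a)),  # as a fraction; undefined if an actual value is 0
        "MAE": np.mean(np.abs(err)),
        "RMSE": rmse,
        # Theil's U1 (assumed variant): RMSE scaled by the root mean squares
        # of the actual and forecast series
        "Theil": rmse / (np.sqrt(np.mean(a ** 2)) + np.sqrt(np.mean(f ** 2))),
    }

# Made-up 10-day hold-out for one brand and one mind-set metric.
actual = np.array([31.0, 30.5, 32.1, 29.8, 30.2, 31.4, 30.9, 31.7, 30.0, 30.6])
forecast = np.full(10, 30.8)
print(forecast_accuracy(actual, forecast))
```

Averaging each measure over brands, per model and mind-set metric, gives figures with the same layout as Table A16.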

Table A16: Out of Sample Dynamic Forecasting Accuracy Indexes

a) Awareness

Model MAPE MAE RMSE Theil

Volume 0.119 3.811 4.622 0.849

SVM 0.118 3.601 4.392 0.783

LIWC 0.117 3.615 4.409 0.784

SVM (Dispersion) 0.117 3.604 4.424 0.785

SVM (Neutral) 0.117 3.618 4.413 0.789

Volume+SVM (Neutral) 0.124 4.032 4.852 0.893

Volume+LIWC 0.124 3.956 4.789 0.881

LIWC+SVM (Neutral) 0.117 3.679 4.457 0.798

Volume+LIWC+SVM (Neutral) 0.122 3.975 4.786 0.881

Likes+LIWC(negative)+SVM(positive) 0.116 3.764 4.529 0.793

b) Impression

Model MAPE MAE RMSE Theil
Volume 0.341 4.890 5.947 0.783

SVM 0.339 4.618 5.596 0.709

LIWC 0.338 4.641 5.616 0.713

SVM (Dispersion) 0.358 4.752 5.734 0.708

SVM (Neutral) 0.340 4.633 5.619 0.713


Volume+LIWC 0.349 4.918 6.009 0.798

LIWC+SVM (Neutral) 0.344 4.725 5.732 0.735



c) Purchase Intent

Model MAPE MAE RMSE Theil
Volume 0.273 1.910 2.394 0.812

SVM 0.262 1.906 2.372 0.799

LIWC 0.252 1.909 2.375 0.809

SVM (Dispersion) 0.265 1.898 2.378 0.802

SVM (Neutral) 0.266 1.911 2.384 0.802


Volume+LIWC 0.278 1.940 2.418 0.829

LIWC+SVM (Neutral) 0.287 1.919 2.393 0.813



d) Satisfaction

Model MAPE MAE RMSE Theil
Volume 0.262 3.456 4.168 0.791

SVM 0.239 3.132 3.852 0.756

LIWC 0.236 3.139 3.874 0.758

SVM (Dispersion) 0.257 3.143 3.866 0.752

SVM (Neutral) 0.234 3.157 3.884 0.755


Volume+LIWC 0.253 3.448 4.162 0.789

LIWC+SVM (Neutral) 0.231 3.284 4.029 0.765



e) Recommendation

Model MAPE MAE RMSE Theil
Volume 0.400 4.762 5.794 0.762

SVM 0.440 4.386 5.389 0.744

LIWC 0.390 4.361 5.377 0.741

SVM (Dispersion) 0.423 4.427 5.425 0.748

SVM (Neutral) 0.431 4.395 5.403 0.743


Volume+LIWC 0.356 4.500 5.563 0.758

LIWC+SVM (Neutral) 0.430 4.486 5.568 0.747




References

Abrigo, Manuel and Inessa Love (2015), “Estimation of Panel Vector Autoregression in Stata: a Package of Program.”

Agarwal, A., B. Xie, I. Vovsha, O. Rambow, and R. Passonneau (2011), “Sentiment Analysis of Twitter Data,” in Proceedings of the workshop on languages in social media, Association for Computational Linguistics, 30–38.

Arellano, Manuel and Olympia Bover (1995), “Another look at the instrumental variable estimation of error-components models,” Journal of econometrics, 68 (1), 29–51.

Bradley, M M, M K Greenwald, M C Petry, and P J Lang (1992), “Remembering pictures: pleasure and arousal in memory.,” Journal of experimental psychology: Learning, memory, and cognition, 18 (2), 379–90.

Breusch, T.S. (1978), “Testing for Autocorrelation in Dynamic Linear Models,” Australian Economic Papers, 17 (31), 334–55.

Colicev, Anatoli, Ashwin Malshe, Koen Pauwels, and Peter O’Connor (2016), “How Earned and Owned Social Media Impact Shareholder Value Through Consumer Mindset Metrics,” MSI Working Paper Series 16-115.

———, ———, ———, and ——— (2017), “Improving Consumer Mind-Set Metrics and Shareholder Value through Social Media: The Different Roles of Owned and Earned,” Journal of Marketing, (Forthcoming).

Culotta, Aron and Jennifer Cutler (2016), “Mining Brand Perceptions from Twitter Social Networks,” Marketing Science, Articles i, 1–16.

Dekimpe, Marnik G. and Dominique M. Hanssens (1999), “Sustained Spending and Persistent Response: A New Look at Long-Term Marketing Profitability,” Journal of Marketing Research, 36 (4), 397–412.

Dewan, Sanjeev and Jui Ramaprasad (2014), “Social Media, Traditional Media, and Music Sales,” MIS Quarterly, 2 (3), 101–21.

Drakos, Konstantinos and Panagiotis Th. Konstantinou (2014), “Terrorism, crime and public spending: Panel VAR evidence from Europe.,” Defence and Peace Economics, 25 (4), 349–61.

Enders, Walter (2014), Applied Econometric Time Series, New York: John Wiley & Sons.

Feinerer, Ingo (2017), “Introduction to the tm Package: Text Mining in R,” R vignette, 1–8.

Franses, Philip-Hans (2005), “On the Use of Econometric Models for Policy Simulation in Marketing,” Journal of Marketing Research, 42 (February), 4–14.

Gamon, M. (2004), “Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis.,” in Proceedings of the 20th international conference on Computational Linguistics, Association for Computational Linguistics, 841.

Granger, C.W.J (1969), “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods,” Econometrica, 37 (3), 424–38.

Haidt, Jonathan and Craig Joseph (2004), “Intuitive Ethics: How Innately Prepared Intuitions Generate Culturally Variable Virtues,” Daedalus, 133, 55–66.

Hamilton, JD (1994), Time series analysis, Princeton University Press.

Hewett, Kelly, William Rand, Roland T Rust, and Harald J van Heerde (2016), “Brand Buzz in the Echoverse.,” Journal of Marketing, 80 (3), 1–24.

Holtz-Eakin, Douglas, Whitney Newey, and Harvey S. Rosen (1988), “Estimating Vector Autoregressions with Panel Data,” Econometrica, 56 (6), 1371–95.

Johansen, Søren, Rocco Mosconi, and Bent Nielsen (2000), “Cointegration Analysis in the Presence of Structural Breaks in the Deterministic Trend,” Econometrics Journal, 3 (2), 216–49.

Jurka, Tp and Loren Collingwood (2015), “RTextTools: A Supervised Learning Package for Text Classification.,” R Journal, 5 (1), 6–12.

Kübler, Raoul V, Jaap Wieringa, and Koen H. Pauwels (2017), “Big Data and Machine Learning,” in Advanced Methods for Modeling Markets, Leeflang, Wieringa, Bijmolt, and Pauwels, eds., Springer, Berlin, 1–35.

Kwiatkowski, Denis, Peter C.B. Phillips, Peter Schmidt, and Yongcheol Shin (1992), “Testing the Null Hypothesis of Stationary Against the Alternative of a Unit Root,” Journal of Econometrics, 54 (1–3), 159–78.

Luo, Xueming, Jie Zhang, and Wenjing Duan (2013), “Social media and firm equity value,” Information Systems Research, 24 (1), 146–63.

Maddala, G.S and In-Moo Kim (1998), Unit Roots, Cointegration, and Structural Change, Cambridge, UK: Cambridge University Press.

McAuley, Julian, Rahul Pandey, and Jure Leskovec (2015), “Inferring Networks of Substitutable and Complementary Products,” Knowledge Discovery and Data Mining, 12.

Mcauley, Julian and Alex Yang (2016), “Addressing Complex and Subjective Product-Related Queries with Customer Reviews,” in IW3C2, 625–35.

Mullen, Tony and Nigel Collier (2004), “Sentiment analysis using support vector machines with diverse information sources,” Proceedings of Conference on Empirical Methods in Natural Language Processing, 412–18.

Naji, I. (2013), “Twitter Sentiment Analysis with Naives Bayes,” [available at http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/].

Nickell, Stephen (1981), “Biases in dynamic models with fixed effects,” Econometrica, 49 (6), 1417–26.

Nijs, Vincent R., Shuba Srinivasan, and Koen H. Pauwels (2007), “Retail-Price Drivers and Retailer Profits,” Marketing Science, 26 (4), 473–87.

Pak, Alexander and Patrick Paroubek (2010), “Twitter as a Corpus for Sentiment Analysis and Opinion Mining,” in Proceedings of the Seventh Conference on International Language Resources and Evaluation, 1320–26.

Pauwels, Koen H. and Dominique M. Hanssens (2007), “Performance Regimes and Marketing Policy Shifts,” Marketing Science.

Pennebaker, James W. (1993), “Putting stress into words: Health, linguistic, and therapeutic implications,” Behaviour Research and Therapy, 31 (6), 539–48.

Pennebaker, James W, Ryan L Boyd, Kayla Jordan, and Kate Blackburn (2015), “The development and psychometric properties of LIWC2015,” (September 2015), 1–22.

Rooderkerk, Robert P. and Koen H. Pauwels (2016), “No Comment?! The Drivers of Reactions to Online Posts in Professional Groups,” Journal of Interactive Marketing, 35, 1–15.

Slotegraaf, Rebecca J and Koen H. Pauwels (2008), “The Impact of Brand Equity and Innovation on the Long-Term Effectiveness of Promotions,” Journal of Marketing Research.

Srinivasan, Shuba, Marc Vanhuele, and Koen H. Pauwels (2010), “Mind-Set Metrics in Market Response Models : An Integrative Approach,” Journal of Marketing Research, XLVII (August), 672–84.

Tirunillai, Seshadri and Gerard J. Tellis (2012), “Does chatter really matter? Dynamics of user-generated content and stock performance,” Marketing Science, 31 (2), 198–215.

Toubia, Olivier and AT Stephen (2013), “Intrinsic vs. Image-Related Utility in Social Media: Why Do People Contribute Content to Twitter?,” Marketing Science, 32 (3), 368–92.

Trusov, Michael, Randolph E. Bucklin, and Koen H. Pauwels (2009), “Effects of Word-of-Mouth Versus Traditional Marketing: Findings from an Internet Social Networking Site,” Journal of marketing, 73 (5), 90–102.
