SOCIAL MEDIA MINING what is it GOOD for? and when is it good enough?


Nick Buckley, SoShall Consulting

ASC Funky Data, 25th September 2012


The Plan

• What is Social Media Mining [SMM]?
• How do Market Researchers tend to think about it?
• Nuts & Bolts – practical outcomes
• Challenges and Constraints
• [How] do these make Researchers re-think the ‘place’ of SMM?
• Where will it go from here?

BUT:
• Assumption of a vendor/researcher distinction – even if in-house
• No naming or comparing of vendors/applications
• Difficult to judge where to pitch the basics – too familiar vs. too abstract


1. What are we talking about?


Definition* of social media monitoring: “Social Media Monitoring (SMM) means the identification, observation, and analysis of user-generated social media content for the purpose of market research.”

What exactly are we talking about?

What they say

* http://www.social-media-monitoring.org


What are we talking about? Social Media...

Review sites: Professional & Consumer

Blogs/Microblogs

Forums

Client sites

Video sites

Public Communities

Newsgroups, News sites

Example sites pictured: http://www.consumerreports.org/cro/index.htm, http://www99.epinions.com/, http://forums.t-mobile.com/tmbl/ and a Facebook logo.

What’s in a word?

GfK NOP currently prefers “Mining”: user-generated content in social media lays down a rich seam of activity, opinion, thought and information… as well as mess, echoes and ‘whimsy’.

For some time marketing and PR professionals have been monitoring Social Media to capture headline ‘buzz’ in real time, and to detect sudden changes requiring a response.

But collecting and counting this content is only the beginning of a process which can add value via many techniques… including integration with other sources such as market research data.


2. What happens when Market Researchers get hold of it?


Sony brand damage was driven by the PlayStation breach (2011)

[Charts: Sony buzz and Sony sentiment this year; Sony buzz and Sony sentiment in April; PlayStation buzz and PlayStation sentiment]

Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations

SMM provides insights into:

• Category Dynamics
   – Consumer needs
   – Problems and issues consumers discuss
   – Product usage discussions
   – New product entries & trends in purchase intention
• Corporate
   – Corporate mentions related to reputation
   – Crises
   – Social issues
• Brand/Product
   – Brand/sub-brand mentions, brand “buzz”
   – Number of positive vs. negative sentiments for each brand – including customer service
   – Brand content analysis: what’s being said about the brand
   – Advertising noticed most and related discussion – launch tracking
   – Source of mentions (specific sites) and the most influential sites
• Competition
   – All the above for preference & competition

Market Researchers are fitting SMM into different places within method or process

• As a precursor to traditional Market Research
   – Refining hypotheses for research design
   – Prioritising criteria – identifying new ones
   – Defining or qualifying the competitive set
   – Identifying niche respondents for small-scale studies

• As a successor to traditional Market Research
   – Tracking the impact of implemented findings
   – Monitoring for events which may create discontinuities in this
   – Low-intensity/low-detail follow-up

• As a companion to traditional Market Research
   – Compare and contrast – e.g. unconditioned responses
   – Add granularity to satisfaction drivers
   – Complement reach
   – Interpolate lengthy studies

So can SMM research stand alone?

Is there a hierarchy of ‘best fit’ within these hybrid uses? Does the story change if you go longitudinal within a category?

To what extent do some of these uses assume that the data can be treated like conventional MR data?

In any case – should it be treated and analysed thus?

• You can ‘ask a new question’ without having to issue a new questionnaire*
• Unconditioned by participant awareness of a research process; often more emotive than considered survey responses
• Low cost – under certain circumstances
• Spontaneously generated content – unconstrained by the research frame
• Offers insight into active social media users
• Potentially global
• Very immediate

• Not necessarily representative of the general population
• Difficult to weight back to the general population, as demographic data is sparse
• Automated sentiment analysis is only as good as the algorithms [and these vary greatly]
• Automated harvesting can capture a lot of ‘noise’ for certain words or brands
• No guarantee of sufficient data
• Costs rise when we use supplementary analysis to overcome some of these issues

*Within certain technical limitations
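The weighting problem can be made concrete with a minimal post-stratification sketch. It assumes each post carries a hypothetical demographic `cell` label (or `None` when nothing is known about the poster) and that population shares come from some external source; the labels and shares below are invented. Each weight is population share divided by sample share, and only posts with known demographics can be weighted at all.

```python
from collections import Counter
from typing import Optional


def cell_weights(sample_cells: list[Optional[str]],
                 population_share: dict[str, float]) -> tuple[dict[str, float], float]:
    """Post-stratification weights per demographic cell.

    weight(cell) = population share / share among posts with known demographics.
    Posts whose cell is None cannot be weighted; `coverage` reports the
    fraction of posts that could be.
    """
    known = [c for c in sample_cells if c is not None]
    if not known:
        return {}, 0.0
    counts = Counter(known)
    n = len(known)
    weights = {cell: population_share[cell] / (count / n)
               for cell, count in counts.items()}
    coverage = n / len(sample_cells)
    return weights, coverage


# Hypothetical example: six posts, only four with a known age band.
cells = ["18-34", "18-34", "18-34", "35+", None, None]
weights, coverage = cell_weights(cells, {"18-34": 0.3, "35+": 0.7})
print(weights, round(coverage, 2))
```

Here the single 35+ post would count 2.8 times, and a third of the posts drop out of the weighting entirely.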

But inevitably they think about comparison with surveys…


Different approaches for different client needs
For example: Precision Extraction vs ‘Trawl & Filter’

• Crude mention & mood tracking
• Quantitative – brand tracking and integration with traditional research
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Exploratory Qual – more complex collection; manually manageable volumes and ‘tuning’

Chart axes:
Higher data volumes from simple search terms ↔ Lower data volumes from targeted & compound search terms
Accept raw data output from application ↔ More post-processing, applied to data by GfK, to reduce noise and refine sentiment attribution

3. Too Abstract?


The raw material - Results from search terms

SMM applications extract results from wholesale supplies of data, conducting searches defined by “search terms”

• Search terms can be anything from a simple and distinctive brand or product name to a complex expression configured to capture discussions about a category or concept

• Search terms combine words or phrases:
   – via logical instructions such as AND, OR, NOT
   – by employing functions such as WITHIN to detect words in a certain proximity to each other
   – with brackets that can dictate the sequence in which instructions are applied, e.g. “word1” AND ( “word2” OR “word3” )
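To make the logic concrete, here is a minimal Python sketch of how such a search term might be evaluated against a single post. The nested-tuple query format, the tokenizer and the `matches` function are hypothetical illustrations, not the syntax of any particular SMM application; terms are single words, and WITHIN is read as “both words occur within N tokens of each other”.

```python
import re
from typing import Union

# Hypothetical query format: a term is a single word (str); anything else is
# a tuple (operator, *operands), with nesting playing the role of brackets.
Query = Union[str, tuple]


def tokenize(text: str) -> list[str]:
    """Crude tokenizer: lower-case runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens: list[str], term: str) -> list[int]:
    """Token indices at which a single-word term occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term.lower()]


def matches(query: Query, tokens: list[str]) -> bool:
    """Evaluate an AND / OR / NOT / WITHIN query against one tokenized post."""
    if isinstance(query, str):
        return bool(positions(tokens, query))
    op, *args = query
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    if op == "WITHIN":
        left, right, distance = args
        return any(abs(i - j) <= distance
                   for i in positions(tokens, left)
                   for j in positions(tokens, right))
    raise ValueError(f"unknown operator: {op!r}")


# "word1" AND ( "word2" OR "word3" )
query = ("AND", "word1", ("OR", "word2", "word3"))
print(matches(query, tokenize("word1 then word3")))   # True
print(matches(query, tokenize("word2 and word3")))    # False
```

An exclusion such as ("AND", "cat", ("NOT", "resort")) is one way to express NOT in this format.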


A typical SMM application offers a dashboard view of data returned by these search terms – and the facility to export the underlying data


Analyses

Whatever the search terms define, here is what can be measured about the results returned… in combination or in isolation:

Volume: “How much is it talked about, and how is this changing over time?”

Channels: “Where on the web is it being talked about… Twitter, blogs, forums, comments?”

Location: “Where in the world is it being talked about?”

Themes: “What other words and phrases are most regularly associated with it?”

People: “Who is talking about it?” That may be by influence – according to various proprietary indices – or by demographics [to be used with caution]

Sentiment: Across all of these variables is superimposed automatically generated “Sentiment” analysis – positive, negative or neutral language associated with the subject of the posts…

Verbatims: Drill down to individual posts, in their own words – “What do people actually say?”
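Most of these measures are simple aggregations over exported posts. A minimal sketch, assuming a hypothetical export format in which each post is a dict with `date`, `channel`, `location` and `text` fields (real vendor exports differ); the sample posts and stop-word list are invented. Themes are approximated crudely as the words most often co-occurring in posts that contain the term.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical export rows; field names are assumptions, not a vendor schema.
posts = [
    {"date": date(2012, 4, 2), "channel": "twitter", "location": "UK",
     "text": "New Brand A ad is great"},
    {"date": date(2012, 4, 3), "channel": "forum", "location": "UK",
     "text": "Brand A ad shoot with the pop star"},
    {"date": date(2012, 4, 10), "channel": "twitter", "location": "US",
     "text": "Saw the Brand A ad on TV"},
]

STOP_WORDS = {"a", "the", "is", "on", "with"}  # illustrative only


def volume_by_week(rows):
    """Volume: number of posts per ISO (year, week)."""
    counts = Counter()
    for row in rows:
        year, week, _ = row["date"].isocalendar()
        counts[(year, week)] += 1
    return dict(sorted(counts.items()))


def breakdown(rows, field):
    """Channels, Location, etc.: post counts per value of one field."""
    return Counter(row[field] for row in rows)


def themes(rows, term, top=3):
    """Themes: words appearing in the most posts that also contain `term`."""
    words = Counter()
    term = term.lower()
    for row in rows:
        tokens = set(re.findall(r"[a-z0-9']+", row["text"].lower()))
        if term in tokens:
            words.update(tokens - STOP_WORDS - {term})
    return words.most_common(top)


print(volume_by_week(posts))
print(breakdown(posts, "channel"))
print(themes(posts, "ad"))
```

Combinations such as Volume + Channels are then the same counts keyed on two fields at once.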


Combinations of these basics tell different types of story

• Brand A’s new ad was mainly discussed on Forums while it was being shot by a famous pop star, but mainly on Twitter once it was being aired. [Volume + Channels]

• Automotive brand X is associated mainly with topics around performance, whereas brand Y is associated with comfort and style. Both enjoy roughly the same level of positive sentiment overall. [Themes + Sentiment]

• Beverage brand N enjoyed a bigger ‘spike’ in its mentions when news of a future big game at a sponsored venue was announced than it got from a tournament sponsorship that was live at the time. [Volume vs. Offline Schedule]

• Some ‘general’ social Forum sites enjoy bigger concentrations of discussion of a particular topic than specialist Forums dedicated to that same topic! [Channels + Themes + People]

Examples of outcomes from SMM studies

• Focus on the right social media channels at the right time.

• Differentiate ‘trade press’ buzz from real engagement.

• Consumers don’t always talk about the product features that you highlight.

• Identify places where naturally occurring discussion of a category offers an opportunity for brands to ‘intercept’ rather than try to create competing social media conversations.

• ‘The world’ can sometimes throw up more interesting stories about you than you could hope to generate for yourself… but not always with the connotations you would like.

BUT!


There are many forces which erode this nice model…

Accuracy?

Reach?

Relevance?

Reach image from titletrack.com


Accuracy

Is the searched-for phrase even in the returned “snippet”?

Is it ‘real content’ – or is it:
• Navigation?
• Ticker or title content?
• Ad content?
• Various species of spam [overlaps with ‘Relevance’]?

Is meta-data about the poster:
• Present?
• Reliable?

Understanding this, apart from making your own manual checks, is about understanding your 3rd party vendor and, often, their ‘wholesale data suppliers’ in turn.
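Some of these manual checks can be partly scripted. A minimal sketch, assuming each exported result has a text `snippet` and a metadata dict; the metadata field names and the crude ‘navigation’ heuristic (two or more ` | ` separators in one snippet) are hypothetical, and the checks only flag results for human review rather than decide anything.

```python
import re

REQUIRED_META = ("author", "location")  # hypothetical field names


def accuracy_flags(snippet: str, phrase: str, meta: dict) -> list[str]:
    """Return reasons to doubt one returned result (empty list = no flags)."""
    flags = []
    # Is the searched-for phrase actually in the snippet, as whole words?
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    if not re.search(pattern, snippet.lower()):
        flags.append("phrase missing from snippet")
    # Crude heuristic for menu/navigation text such as "Home | Forums | Sony".
    if snippet.count(" | ") >= 2:
        flags.append("possible navigation text")
    # Is poster meta-data present at all? (Reliability needs separate checks.)
    for field in REQUIRED_META:
        if not meta.get(field):
            flags.append(f"no {field}")
    return flags


print(accuracy_flags("Home | Forums | Sony", "PlayStation",
                     {"author": "x", "location": "UK"}))
print(accuracy_flags("My PlayStation is back online", "PlayStation",
                     {"author": "x"}))
```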


Reach

[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say there are things that, we now know we don't know. But there are also unknown unknowns – there are things we do not know, we don't know.

Donald Rumsfeld

• Are these results from scrutiny of the entire [English-speaking] social web? No
• Are they results from a very large, sometimes stated, number of social sources? Yes
• Could this range be skewed relative to the subject under scrutiny? Yes
• Where it’s Twitter data – is it from the whole of Twitter? Maybe
• Is historical data always on the same basis as current data, or data gathered since the search was defined? Not always
• Do we always have a good idea of what the ‘Reach’ is? No

Relevance

Even when the application has collected exactly what we asked for, and it is legitimate content, with some nice useful data about the poster… it might not be relevant

“Cats are great company.”

“#EMT Bolt one cool cat!”

“Also, the Cat is a great resort”

“I love my aunt Cat!”

“I think Cat Stark is worse than any Lanister.”

“I think this hurricane was a scam cooked up by the fat cats in Big Grocer.”
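A short sketch shows why keyword matching alone cannot settle relevance. It uses the six example posts above: an obvious word-boundary pattern for “cat”/“cats” matches every one of them. The exclusion list is hypothetical, hand-written after reading these very verbatims, so it only removes the false positives someone has already seen.

```python
import re

posts = [
    "Cats are great company.",
    "#EMT Bolt one cool cat!",
    "Also, the Cat is a great resort",
    "I love my aunt Cat!",
    "I think Cat Stark is worse than any Lanister.",
    "I think this hurricane was a scam cooked up by the fat cats in Big Grocer.",
]

# The obvious search term: whole-word "cat" or "cats", case-insensitive.
TERM = re.compile(r"\bcats?\b", re.IGNORECASE)

# Hypothetical exclusions, written by hand after reading the verbatims above.
EXCLUDE = re.compile(r"#emt|resort|aunt cat|cat stark|fat cats", re.IGNORECASE)

keyword_hits = [p for p in posts if TERM.search(p)]
relevant = [p for p in keyword_hits if not EXCLUDE.search(p)]

print(len(keyword_hits), "keyword hits")  # 6 keyword hits
print(relevant)  # ['Cats are great company.']
```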


… put another way

Oh s**t! I forgot it’s still the internet.

Other challenges include…

However , commencing too early public smoking facts will just overstress your pet ; quite a fresh pet will not learn everything from services. Just after he has ended up perched for some a few moments, supply him with the particular take care of, plus for instance in advance of, make sure you compliment the pup. When dog house teaching your dog, continue to keep the dog house in the vicinity of the spot where you as well as the canine are usually conversing.


And I haven’t mentioned automated Sentiment Analysis yet!

Irony – really?

Slang/Dialect/Register

Multiple meanings – “50 strong”

Adjacent subjects – “My beautiful FIAT next to a BMW”
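Two of these problems can be reproduced with a toy lexicon scorer. This is a minimal sketch, not how any particular application works: the lexicon is a tiny hypothetical one, and `brand_sentiment` shows one simple remedy for adjacent subjects (counting only sentiment words within a few tokens of the brand). Irony defeats both approaches.

```python
import re

# Tiny hypothetical lexicon; real tools use far larger ones, and they differ.
LEXICON = {"beautiful": 1, "great": 1, "love": 1, "worse": -1, "scam": -1}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def post_sentiment(text: str) -> int:
    """Naive: sum lexicon scores over the whole post, whatever they refer to."""
    return sum(LEXICON.get(t, 0) for t in tokens(text))


def brand_sentiment(text: str, brand: str, window: int = 2) -> int:
    """Count only sentiment words within `window` tokens of a brand mention."""
    toks = tokens(text)
    hits = [i for i, t in enumerate(toks) if t == brand.lower()]
    near = {j for i in hits
            for j in range(max(0, i - window), min(len(toks), i + window + 1))}
    return sum(LEXICON.get(toks[j], 0) for j in near)


post = "My beautiful FIAT next to a BMW"
print(post_sentiment(post))            # 1: a post-level score would credit both brands
print(brand_sentiment(post, "FIAT"))   # 1
print(brand_sentiment(post, "BMW"))    # 0
print(post_sentiment("Great, another dropped call. Really?"))  # 1, despite the irony
```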


4. And what is Good, and what is not Good?


To Recap

• SMM tools make it very easy to “Super Google” certain Brands, people, objects and even categories or concepts – quickly generating tables and charts.

• But underneath there’s a complex story about accuracy, reach and relevance… which you only really see when you drill down… and which you only really understand by getting inside the provider’s systems and sources.

• That this isn’t emblazoned across every dashboard reflects the fact that many solution providers started out somewhere else… with monitoring. It’s not that they should have anticipated our needs.

• Sentiment analysis is only part of this story – it doesn’t define it.


Relationships matter as much as technology

[Diagram. Nodes: Social Media Content; Wholesalers?; 3rd Party System [e.g. SaaS]; 3rd party organisation (“Vendor”); Dashboard-wielding MR Agency; Clients. Labelled flows: Feeds; Customise Engine; Customise Feeds; “Results”; Modified searches; Topic-specific feedback; Queries and more refined requirements; Reports [inc. post hoc analysis].]

Natural Language Processing [NLP] to the rescue?

Definition

“Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output”*

Many SMM applications now claim some level of NLP.

*Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview

While NLP may legitimately be contrasted with simpler analysis of vocabulary combinations and with probabilistic methods, the claim sometimes means little: it may only mean that some rules of language have been ‘attended to’ in what is still essentially a pattern-matching exercise.

But clearly sophisticated NLP can make a big difference

• Improved Accuracy – including filtering out of unstructured spam

• More tools available to achieve/check Relevance

• Much-improved Sentiment Analysis

Trends:

• there’s more NLP – not just in social media analysis,

• there’s more commercially affordable NLP and it keeps getting better,

• some of it is even helpfully self-auditing.

Significantly, when NLP is set to retain only high-confidence classifications, volumes of results are dramatically reduced.
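The volume effect is easy to see in a sketch. Assume a hypothetical classifier that returns a label plus a confidence score for each post (names and scales vary between tools); keeping only high-confidence classifications is then a simple filter, and the share of results retained can fall sharply. The posts and confidences below are invented.

```python
from dataclasses import dataclass


@dataclass
class Classified:
    text: str
    label: str          # e.g. "positive", "negative", "neutral"
    confidence: float   # hypothetical classifier score in [0, 1]


def high_confidence(results: list[Classified], threshold: float = 0.8) -> list[Classified]:
    """Keep only classifications at or above the confidence threshold."""
    return [r for r in results if r.confidence >= threshold]


def retained_share(results: list[Classified], threshold: float = 0.8) -> float:
    """Fraction of results that survive the threshold (0.0 for no results)."""
    if not results:
        return 0.0
    return len(high_confidence(results, threshold)) / len(results)


results = [
    Classified("Love the new phone", "positive", 0.95),
    Classified("Great, another dropped call. Really?", "positive", 0.55),
    Classified("Phone arrived today", "neutral", 0.62),
    Classified("Battery died again", "negative", 0.81),
]
for t in (0.5, 0.8, 0.9):
    print(t, retained_share(results, t))
```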


Barking up the wrong Tree?

Researchers’ instincts have been to use, and so judge, SMM like survey data.

But “what is good”, the ancient philosophers would tell us, is really about function and purpose.

I think we’ve now learned enough about SMM to stop and ask…

“what was it we were trying to do?”


Remind me what we are trying to do?

• Use the social web as a proxy for the population?

• Understand how the social web is responding – for the benefit of those solely interested in this sub-set of the population as a channel or marketplace?

• Access particular niches that are more concentrated online than off?

• Detect significant events?

• Measure shifts and changes?

• Make rough comparisons?

• Discover new insights, themes and connections?

© 2012 GfK NOP

Different client needs indicate different SMM approaches
For example: Precision Extraction vs ‘Trawl & Filter’

• Crude mention & mood tracking
• Quantitative – brand tracking and integration with traditional research
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Exploratory Qual – more complex collection; manually manageable volumes and ‘tuning’

Chart axes:
Higher data volumes from simple search terms ↔ Lower data volumes from targeted & compound search terms
Accept raw data output from application ↔ More post-processing, applied to data by MR agency, to reduce noise and refine sentiment attribution

Annotations on the chart: “Not radical enough!”; “Too much like hard work?”; “Sensible”

Rather than wait for NLP utopia…

Settle, for now, on:

1. SMM as a powerful and novel Qual exploration tool

2. Big number crunching, on single terms, that takes a “hyena” approach, i.e.:

Accept all* occurrences of a brand or product name in posts as an indication of significance… even the ‘trending’ spam and the adverts and the competitions…

Look for pure correlations between words/phrases and other words/phrases…

Or between trends in these numbers and classes of offline events – such as sales, complaints and other behaviours… with a view to predicting, explaining or causing such events in the future.

*Except for the most obvious duplication errors such as over-indexing
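The number-crunching side of the “hyena” approach can be sketched in a few lines: drop the obvious duplicates, then correlate a weekly mention series with an offline series, optionally with a lag. Everything here is an assumption for illustration: posts are hypothetical dicts keyed by `url` (a URL seen twice is treated as the same post indexed twice), the weekly figures are invented, and Pearson correlation is used simply as the most basic measure.

```python
from math import sqrt


def dedupe(posts: list[dict]) -> list[dict]:
    """Drop posts whose URL has already been seen (same post indexed twice)."""
    seen, kept = set(), []
    for post in posts:
        if post["url"] not in seen:
            seen.add(post["url"])
            kept.append(post)
    return kept


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series of at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)


def lagged_pearson(mentions: list[float], offline: list[float], lag: int = 0) -> float:
    """Correlate mentions in week t with the offline measure in week t + lag."""
    if lag < 0:
        raise ValueError("lag must be >= 0")
    if lag:
        mentions, offline = mentions[:-lag], offline[lag:]
    return pearson(mentions, offline)


print(len(dedupe([{"url": "u1"}, {"url": "u1"}, {"url": "u2"}])))  # 2
weekly_mentions = [10, 12, 30, 14, 11]       # hypothetical
weekly_sales = [100, 101, 99, 140, 105]      # hypothetical
print(lagged_pearson(weekly_mentions, weekly_sales, lag=1))
```

With only a handful of weekly points, a high correlation can easily arise by chance, so the choice of lag and series length deserves checking before any predictive reading.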


5. Some Concquestions


Talking Points

How will commercial SMM applications and services with the best accuracy, reach and relevance capabilities be recognised, validated and promoted?

If you’re a researcher and you want to use this stuff, for the first time, tomorrow… what must be done?

Fortunately, there’s enough to learn by “super-Googling”, browsing and crude trend tracking to keep us going… and learning… for some time to come. Is that, whilst pragmatic, enough of an ambition?


Dr Nick Buckley, SoShall Consulting

Tel: 07958 516967 t: @grimbold E: [email protected]

Babita Earle, Digital Strategy Director, GfK NOP

Tel: 020 7890 9467 E: [email protected]
