SOCIAL MEDIA MINING what is it GOOD for? and when is it good enough?
1
SOCIAL MEDIA MINING WHAT IS IT GOOD FOR? AND WHEN IS IT GOOD ENOUGH?
Nick Buckley, SoShall Consulting
asc Funky Data, 25th September 2012
2
The Plan
• What is Social Media Mining? [SMM]
• How do Market Researchers tend to think about it?
• Nuts & Bolts – practical outcomes
• Challenges and Constraints
• [How] Do these make Researchers re-think the ‘place’ of SMM?
• Where will it go from here?
BUT:
• Assumption of a vendor/researcher distinction – even if in-house
• No naming or comparing of vendors/applications
• Difficult to judge where to pitch the basics – too familiar vs. too abstract
3
1. What are we talking about?
4
What exactly are we talking about?
What they say
Definition* of social media monitoring: “Social Media Monitoring (SMM) means the identification, observation, and analysis of user-generated social media content for the purpose of market research.”
* http://www.social-media-monitoring.org
5
What are we talking about? Social Media...
Review sites: Professional & Consumer
Blogs/Microblogs
Forums
Client sites
Video sites
Public Communities
Newsgroups
News sites
6
What’s in a word?
GfK NOP currently prefers “Mining”. User-generated content in social media lays down a rich seam of activity, opinion, thought and information… as well as mess, echoes and ‘whimsy’.
For some time marketing and PR professionals have been monitoring Social Media to capture headline ‘buzz’ in real time, and to detect sudden changes requiring a response.
But collecting and counting this content is only the beginning of a process which can add value via many techniques… including integration with other sources such as market research data.
7
2. What happens when Market Researchers get hold of it?
8
Sony brand damage was driven by PlayStation breach (2011)
[Charts: Sony buzz and sentiment this year; Sony buzz and sentiment in April; PlayStation buzz and sentiment]
9
Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations
SMM provides insights into:
• Category Dynamics
  – Consumer needs
  – Problems and issues consumers discuss
  – Product usage discussions
  – New product entries & trends in purchase intention
• Corporate
  – Corporate mentions related to reputation
  – Crises
  – Social issues
• Brand/Product
  – Brand/sub-brand mentions, brand “buzz”
  – Number of positive vs. negative sentiments for each brand – including customer service
  – Brand content analysis, what’s being said about the brand
  – Advertising noticed most and related discussion – launch tracking
  – Source of mentions (specific sites) and the most influential sites
• Competition
  – All the above for preference & competition
10
Market Researchers are fitting SMM into different places within method or process
• As a precursor to traditional Market Research
  – Refining hypotheses for research design
  – Prioritising criteria – identifying new ones
  – Defining or qualifying the competitive set
  – Identifying niche respondents for small-scale studies
• As a successor to traditional Market Research
  – Tracking the impact of implemented findings
  – Monitoring for events which may create discontinuities in this
  – Low intensity/low detail follow-up
• As a companion to traditional Market Research
  – Compare and contrast – e.g. unconditioned responses
  – Add granularity to satisfaction drivers
  – Complement reach
  – Interpolate lengthy studies
So can SMM research stand alone?
Is there a hierarchy of ‘best fit’ within these hybrid uses? Does the story change if you track a category longitudinally?
To what extent do some of these uses assume that the data can be treated like conventional MR data?
In any case – should it be treated and analysed thus?
11
But inevitably they think about comparison with surveys…
• You can ‘ask a new question’ without having to issue a new questionnaire*
• Unconditioned by participant awareness of a research process; often more emotive than considered survey responses
• Low cost – under certain circumstances
• Spontaneously generated content – unconstrained by research frame
• Offers insight into active social media users
• Potentially global
• Very immediate
• Not necessarily representative of the general population
• Difficult to weight back to general population, as demographic data is sparse
• Automated sentiment analysis only as good as the algorithms [and these vary greatly]
• Automated harvesting can capture a lot of ‘noise’ for certain words or brands
• No guarantee of sufficient data
• Costs rise when we use supplementary analysis to overcome some of these issues
*within certain technical limitations
12
Different approaches for different client needs. For example: Precision Extraction vs ‘Trawl & Filter’
• Crude mention & mood tracking
• Quantitative – Brand tracking and integration with traditional research
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Exploratory Qual – more complex collection; manually manageable volumes and ‘tuning’
Higher data volumes from simple search terms ↔ Lower data volumes from targeted & compound search terms
Accept raw data output from application ↔ More post processing, applied to data by GfK – to reduce noise and refine sentiment attribution
13
3. Too Abstract?
14
The raw material - Results from search terms
SMM applications extract results from wholesale supplies of data, conducting searches defined by “search terms”
• Search terms can be anything from a simple and distinctive brand or product name, to a complex expression configured to capture discussions about a category or concept
• Search terms combine words or phrases:
  – via logical instructions such as AND, OR, NOT
  – by employing functions such as WITHIN to detect words in a certain proximity to each other
  – with brackets that can dictate the sequence in which instructions are applied, e.g. “word1” AND ( “word2” OR “word3” )
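To make the search-term logic concrete, here is a minimal Python sketch of a query evaluator. It is not any vendor's query language: the syntax `"a" WITHIN n "b"` (the two phrases start at most n words apart), the word-level tokenisation and the names `compile_search_term` and `post_matches` are all assumptions made for illustration. Precedence is OR loosest, then AND, then NOT, with brackets overriding.

```python
import re

OPERATORS = {"AND", "OR", "NOT", "WITHIN"}
TOKEN_RE = re.compile(r'"([^"]+)"|(\()|(\))|(\w+)')  # "phrase", (, ), bare word/number

def words_of(text):  # same lower-case word rule for queries and posts
    return re.findall(r"\w+", text.lower())

def tokenize_query(query):
    tokens = []
    for phrase, lpar, rpar, word in TOKEN_RE.findall(query):
        if phrase:
            tokens.append(("PHRASE", words_of(phrase)))
        elif lpar or rpar:
            tokens.append(("LPAR" if lpar else "RPAR", None))
        elif word.upper() in OPERATORS:
            tokens.append(("OP", word.upper()))
        elif word.isdigit():
            tokens.append(("NUM", int(word)))
        else:
            tokens.append(("PHRASE", [word.lower()]))  # unquoted single word
    return tokens

class Parser:
    """Recursive descent: OR binds loosest, then AND, then NOT, then WITHIN."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, kind):
        tok_kind, value = self.peek()
        if tok_kind != kind:
            raise ValueError(f"expected {kind} at token {self.pos}, got {tok_kind}")
        self.pos += 1
        return value

    def or_expr(self):
        node = self.and_expr()
        while self.peek() == ("OP", "OR"):
            self.pos += 1
            node = ("OR", node, self.and_expr())
        return node

    def and_expr(self):
        node = self.not_expr()
        while self.peek() == ("OP", "AND"):
            self.pos += 1
            node = ("AND", node, self.not_expr())
        return node

    def not_expr(self):
        if self.peek() == ("OP", "NOT"):
            self.pos += 1
            return ("NOT", self.not_expr())
        return self.primary()

    def primary(self):
        if self.peek()[0] == "LPAR":
            self.pos += 1
            node = self.or_expr()
            self.take("RPAR")
            return node
        phrase = self.take("PHRASE")
        if self.peek() == ("OP", "WITHIN"):
            self.pos += 1
            distance = self.take("NUM")
            return ("WITHIN", distance, phrase, self.take("PHRASE"))
        return ("PHRASE", phrase)

def compile_search_term(query):
    parser = Parser(tokenize_query(query))
    ast = parser.or_expr()
    if parser.pos != len(parser.tokens):
        raise ValueError(f"unexpected token at position {parser.pos}")
    return ast

def positions(words, phrase):
    n = len(phrase)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]

def matches(node, words):
    kind = node[0]
    if kind == "PHRASE":
        return bool(positions(words, node[1]))
    if kind == "AND":
        return matches(node[1], words) and matches(node[2], words)
    if kind == "OR":
        return matches(node[1], words) or matches(node[2], words)
    if kind == "NOT":
        return not matches(node[1], words)
    _, distance, a, b = node  # WITHIN: phrase starts at most `distance` words apart
    return any(abs(i - j) <= distance for i in positions(words, a) for j in positions(words, b))

def post_matches(ast, text):
    return matches(ast, words_of(text))
```

For example, `post_matches(compile_search_term('"word1" AND ( "word2" OR "word3" )'), "word1 sits next to word3")` returns `True`. Real applications differ in stemming, plural handling and how WITHIN counts distance, so the same search term can return different results from different vendors.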
15
A typical SMM application offers a dashboard view of the data returned by these search terms, and the facility to export the underlying data
16
Analyses
Whatever the Search Terms define – here is what can be measured about the results returned… in combination or in isolation:
Volume: “how much is it talked about, and how is this changing over time?”
Channels: “where on the web is it being talked about… Twitter, blogs, forums, comments?”
Location: “where in the world is it being talked about?”
Themes: “what other words and phrases are most regularly associated with it?”
People: “who is talking about it?” That may be by influence – according to various proprietary indices – or by demographics [to be used with caution]
Sentiment: Across all of these variables is superimposed automatically generated “Sentiment” analysis – positive, negative or neutral language associated with the subject of the posts…
Verbatims: drill-down to individual posts, in their own words – “what do people actually say?”
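As a rough illustration of how several of these measures fall out of exported data, here is a Python sketch. It assumes a hypothetical export in which each post is a dict with `date`, `channel`, `text` and a vendor-supplied `sentiment` label; real export formats vary by application, and the stopword list is a toy.

```python
import re
from collections import Counter

# Toy stopword list; a real analysis would use a fuller, language-specific one.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "in", "on",
             "my", "i", "for", "this", "with", "too"}


def summarise(posts, search_word, top_n=3):
    """Volume by date, channel mix, co-occurring themes and net sentiment."""
    volume = Counter(p["date"] for p in posts)
    channels = Counter(p["channel"] for p in posts)
    themes = Counter()
    for p in posts:
        # Count each associated word at most once per post.
        words = set(re.findall(r"[a-z']+", p["text"].lower()))
        themes.update(sorted(w for w in words if w not in STOPWORDS and w != search_word))
    sentiment = Counter(p["sentiment"] for p in posts)
    net = (sentiment["positive"] - sentiment["negative"]) / len(posts) if posts else 0.0
    return {
        "volume": dict(sorted(volume.items())),
        "channels": channels.most_common(),
        "themes": themes.most_common(top_n),
        "net_sentiment": net,
    }


posts = [
    {"date": "2012-09-01", "channel": "twitter", "text": "Loving the new BrandA ad, great music", "sentiment": "positive"},
    {"date": "2012-09-01", "channel": "forum", "text": "BrandA ad music is great", "sentiment": "positive"},
    {"date": "2012-09-02", "channel": "twitter", "text": "BrandA ad is too loud", "sentiment": "negative"},
]
summary = summarise(posts, "branda")
print(summary["volume"])  # {'2012-09-01': 2, '2012-09-02': 1}
```

The net sentiment here simply inherits whatever labels the application attached, so it is only as good as the underlying algorithm.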
17
Combinations of these basics tell different types of story
• Brand A’s new ad was mainly discussed on Forums when it was being shot by a famous pop star, but was mainly discussed on Twitter when it was being aired. Volume + Channels
• Automotive brand X is associated mainly with topics around performance, whereas brand Y is associated with comfort and style. Both enjoy roughly the same level of positive sentiment overall. Themes + Sentiment
• Beverage brand N enjoyed a bigger ‘spike’ in its mentions when news of a future big game at a sponsored venue was announced than it got from a tournament sponsorship that was live at the time. Volume vs. Offline Schedule
• Some ‘general’ social Forum sites enjoy bigger concentrations of discussion of a particular topic than specialist Forums dedicated to that same topic! Channels + Themes + People
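Combining two of the basic measures is, in effect, a cross-tabulation. This Python sketch uses made-up posts and hypothetical field names (`phase`, `channel`) to mirror the Brand A story (Volume + Channels): it counts posts for every phase/channel pair and reports the dominant channel in each phase.

```python
from collections import defaultdict


def crosstab(posts, row_key, col_key):
    """Count posts for every (row, column) combination of two fields."""
    table = defaultdict(lambda: defaultdict(int))
    for p in posts:
        table[p[row_key]][p[col_key]] += 1
    return {row: dict(cols) for row, cols in table.items()}


def row_shares(table):
    """Convert counts to within-row shares so rows of different size compare."""
    return {
        row: {col: n / sum(cols.values()) for col, n in cols.items()}
        for row, cols in table.items()
    }


def dominant_column(table):
    return {row: max(cols, key=cols.get) for row, cols in table.items()}


posts = [
    {"phase": "shoot", "channel": "forum"},
    {"phase": "shoot", "channel": "forum"},
    {"phase": "shoot", "channel": "twitter"},
    {"phase": "airing", "channel": "twitter"},
    {"phase": "airing", "channel": "twitter"},
    {"phase": "airing", "channel": "forum"},
    {"phase": "airing", "channel": "twitter"},
]
table = crosstab(posts, "phase", "channel")
print(dominant_column(table))  # {'shoot': 'forum', 'airing': 'twitter'}
```

Shares rather than raw counts matter when one phase generates far more volume than the other.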
18
Examples of outcomes from SMM studies
• Focus on the right social media channels at the right time.
• Differentiate ‘trade press’ buzz from real engagement.
• Consumers don’t always talk about the product features that you highlight.
• Find places where naturally occurring discussion of a category offers an opportunity for brands to ‘intercept’ rather than try to create competing social media conversations.
• ‘The world’ can sometimes throw up more interesting stories about you than you could hope to generate for yourself… but not always with the connotations you would like.
19
BUT!
20
There are many forces which erode this nice model…
Accuracy?
Reach?
Relevance?
Reach image from titletrack.com
21
Accuracy
Is the searched-for phrase even in the returned “snippet”?
Is it ‘real content’ – or is it:
• Navigation?
• Ticker or title content?
• Ad content?
• Various species of spam [overlaps with ‘Relevance’]?
Is meta-data about the poster:
• Present?
• Reliable?
Understanding this, apart from making your own manual checks, is about understanding your 3rd party vendor and, often, their ‘wholesale data suppliers’ in turn.
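One of those manual checks can be partly automated: test whether the searched-for phrase actually appears in each returned snippet, and put an interval around the hit rate so that a small audit sample is not over-read. This Python sketch uses the standard Wilson score interval; the function names and sample snippets are illustrative, and it cannot by itself tell navigation or ad text from real content.

```python
import math
import re


def contains_phrase(snippet, phrase):
    """True if the phrase's words appear consecutively in the snippet."""
    words = re.findall(r"\w+", snippet.lower())
    target = re.findall(r"\w+", phrase.lower())
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("empty sample")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def audit_sample(snippets, phrase):
    """Share of snippets that really contain the phrase, with an interval."""
    hits = sum(contains_phrase(s, phrase) for s in snippets)
    interval = wilson_interval(hits, len(snippets))  # raises on an empty sample
    return hits / len(snippets), interval


snippets = [
    "Sony announces new PlayStation bundle",
    "Home | News | Sport | Weather",  # navigation text returned as a 'result'
    "playstation network down again",
    "Top deals this week",
]
rate, (low, high) = audit_sample(snippets, "PlayStation")
print(rate)  # 0.5
```

With only four snippets the interval runs from roughly 0.15 to 0.85, which is a reminder of how large an audit sample needs to be before the accuracy of a feed can be stated with any confidence.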
22
Reach
[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say there are things that, we now know we don't know. But there are also unknown unknowns – there are things we do not know, we don't know.
Donald Rumsfeld
• Are these results from scrutiny of the entire [English-speaking] social web? No
• Are they results from a very large, sometimes stated, number of social sources? Yes
• Could this range be skewed relative to the subject under scrutiny? Yes
• Where it’s Twitter data – is it from the whole of Twitter? Maybe
• Is historical data always on the same basis as current data, or data gathered since the search was defined? Not always
• Do we always have a good idea of what the ‘Reach’ is? No
23
Relevance
Even when the application has collected exactly what we asked for, and it is legitimate content, with some nice useful data about the poster… it might not be relevant
“Cats are great company.”
“#EMT Bolt one cool cat!”
“Also, the Cat is a great resort”
“I love my aunt Cat!”
“I think Cat Stark is worse than any Lanister.”
“I think this hurricane was a scam cooked up by the fat cats in Big Grocer.”
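A common first line of defence against irrelevant matches is a set of context rules around the search term. This Python sketch is purely illustrative: the word lists are hypothetical, chosen by hand to catch the example posts above for a client interested in cats as animals, and any real rule set would need building and testing on far larger samples.

```python
import re

# Hypothetical rules for a client interested in cats as animals.
# Exclusion words target the example posts above (sprinter, resort, people, idiom).
EXCLUDE_CONTEXT = {"#emt", "bolt", "resort", "aunt", "stark", "fat"}
INCLUDE_CONTEXT = {"pet", "pets", "kitten", "litter", "vet", "purr", "fur"}


def relevance(post, term="cat", window=5):
    """Classify a post by the words within `window` words of each mention."""
    words = re.findall(r"[#\w']+", post.lower())
    hits = [i for i, w in enumerate(words) if w in (term, term + "s")]
    if not hits:
        return "no mention"
    context = set()
    for i in hits:
        context.update(words[max(0, i - window): i + window + 1])
    if context & EXCLUDE_CONTEXT:
        return "irrelevant"
    if context & INCLUDE_CONTEXT:
        return "relevant"
    return "manual review"  # no evidence either way


examples = [
    "Cats are great company.",
    "#EMT Bolt one cool cat!",
    "Also, the Cat is a great resort",
    "I love my aunt Cat!",
    "I think Cat Stark is worse than any Lanister.",
    "I think this hurricane was a scam cooked up by the fat cats in Big Grocer.",
]
for post in examples:
    print(relevance(post), "|", post)
```

Five of the six posts are rejected, while “Cats are great company.” falls through to manual review: the rules simply have no evidence either way, which is the honest outcome for a keyword approach.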
24
… put another way
Oh s**t!
I forgot
it’s still the internet.
25
Other challenges include…
However , commencing too early public smoking facts will just overstress your pet ; quite a fresh pet will not learn everything from services. Just after he has ended up perched for some a few moments, supply him with the particular take care of, plus for instance in advance of, make sure you compliment the pup. When dog house teaching your dog, continue to keep the dog house in the vicinity of the spot where you as well as the canine are usually conversing.
26
And I haven’t mentioned automated Sentiment Analysis yet!
Irony – really?
Slang/Dialect/Register
Multiple meanings – “50 strong”
Adjacent subjects – “My beautiful FIAT next to a BMW”
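The “adjacent subjects” problem can be shown in a few lines. This Python sketch contrasts a naive post-level scorer, where every brand in a post inherits the post's whole score, with a simple proximity rule that gives each sentiment word to the nearest brand mention. The lexicon is a toy and both functions are hypothetical illustrations, not how any particular application works.

```python
import re

# Toy lexicon for illustration only; real tools use far larger, tuned resources.
LEXICON = {"beautiful": 1, "love": 1, "great": 1, "awful": -1, "broken": -1}


def tokens(text):
    return re.findall(r"\w+", text.lower())


def post_level(text, brands):
    """Naive: every brand in the post inherits the whole post's score."""
    words = tokens(text)
    score = sum(LEXICON.get(w, 0) for w in words)
    return {b: score for b in brands if b in words}


def nearest_brand(text, brands):
    """Attribute each sentiment word to the closest brand mention instead."""
    words = tokens(text)
    positions = {b: [i for i, w in enumerate(words) if w == b] for b in brands}
    scores = {b: 0 for b in brands if positions[b]}
    for i, w in enumerate(words):
        if w in LEXICON and scores:
            closest = min(scores, key=lambda b: min(abs(i - j) for j in positions[b]))
            scores[closest] += LEXICON[w]
    return scores


post = "My beautiful FIAT next to a BMW"
print(post_level(post, ["fiat", "bmw"]))     # {'fiat': 1, 'bmw': 1}
print(nearest_brand(post, ["fiat", "bmw"]))  # {'fiat': 1, 'bmw': 0}
```

Proximity is itself only a heuristic, and neither function copes with irony, slang or the “50 strong” kind of ambiguity.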
27
4. And what is Good, and what is not Good?
28
To Recap
• SMM tools make it very easy to “Super Google” certain Brands, people, objects and even categories or concepts – quickly generating tables and charts.
• But underneath there’s a complex story about accuracy, reach and relevance… which you only really see when you drill down… and which you only really understand by getting inside the provider’s systems and sources.
• That this isn’t emblazoned across all dashboards reflects the fact that many solution providers started out somewhere else… with monitoring. It’s not that they should have anticipated our needs.
• Sentiment analysis is only part of this story – it doesn’t define it.
29
Relationships matter as much as technology
[Diagram: Social Media Content → FEEDS (Wholesalers?) → 3rd Party System [e.g. SaaS], run by a 3rd party organisation (“Vendor”) → “Results” → Dashboard-wielding MR Agency → Reports [inc. post hoc analysis] → Clients. Return flows: queries and more refined requirements; modified searches; topic-specific feedback; Customise Engine; Customise Feeds.]
30
Natural Language Processing [NLP] to the rescue?
Definition
“Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output”*
Many SMM applications now claim some level of NLP.
*Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview
This may legitimately be contrasted with simpler analysis of vocabulary combinations and probabilistic methods, but the claim sometimes means little. It may only mean that some rules of language have been ‘attended to’ in what is still essentially a pattern-matching exercise.
31
But clearly sophisticated NLP can make a big difference
• Improved Accuracy – including filtering out of unstructured spam
• More tools available to achieve/check Relevance
• Much-improved Sentiment Analysis
Trends:
• there’s more NLP – not just in social media analysis,
• there’s more commercially affordable NLP and it keeps getting better,
• some of it is even helpfully self-auditing.
Significantly, when NLP is set to retain only high-confidence classifications, volumes of results are dramatically reduced.
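The volume trade-off is easy to see on classifier output. In this Python sketch the records, labels and confidence scores are made up, and `keep_confident` and `label_shares` are hypothetical helpers; the point is only what a confidence threshold does to the size, and possibly the mix, of what remains.

```python
from collections import Counter

# Made-up classifier output: each post gets a sentiment label and a confidence.
classified = [
    {"id": 1, "label": "positive", "confidence": 0.95},
    {"id": 2, "label": "negative", "confidence": 0.55},
    {"id": 3, "label": "negative", "confidence": 0.62},
    {"id": 4, "label": "positive", "confidence": 0.91},
    {"id": 5, "label": "neutral", "confidence": 0.48},
    {"id": 6, "label": "negative", "confidence": 0.70},
]


def keep_confident(items, threshold):
    """Keep only classifications at or above the confidence threshold."""
    return [c for c in items if c["confidence"] >= threshold]


def label_shares(items):
    counts = Counter(c["label"] for c in items)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


kept = keep_confident(classified, 0.9)
print(len(kept) / len(classified))  # 0.3333333333333333
print(label_shares(kept))           # {'positive': 1.0}
```

In this toy sample two-thirds of the posts disappear, and the survivors are all positive although half of the full set was labelled negative: the high-confidence subset need not share the mix of the whole.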
32
Barking up the wrong Tree?
Researchers’ instincts have been to use, and so judge, SMM like survey data.
But “what is good”, the ancient philosophers would tell us, is really about function and purpose.
I think we’ve now learned enough about SMM to stop and ask…
“what was it we were trying to do?”
33
Remind me: what are we trying to do?
• Use the social web as a proxy for the population?
• Understand how the social web is responding – for the benefit of those solely interested in this sub-set of the population as a channel or marketplace?
• Access particular niches which are more concentrated online than off?
• Detect significant events?
• Measure shifts and changes?
• Make rough comparisons?
• Discover new insights, themes and connections?
© 2012 GfK NOP 34
Different client needs indicate different SMM approaches. For example: Precision Extraction vs ‘Trawl & Filter’
• Crude mention & mood tracking
• Quantitative – Brand tracking and integration with traditional research
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Exploratory Qual – more complex collection; manually manageable volumes and ‘tuning’
Higher data volumes from simple search terms ↔ Lower data volumes from targeted & compound search terms
Accept raw data output from application ↔ More post processing, applied to data by MR agency – to reduce noise and refine sentiment attribution
Not radical enough!
Too much like hard work?
Sensible
35
Rather than wait for NLP utopia…
Settle, for now, on:
1. SMM as a powerful and novel Qual exploration tool
2. Big number crunching, on single terms, that takes a “hyena” approach, i.e.:
• Accept all* occurrences of a brand or product name in posts as an indication of significance… even the ‘trending’ spam and the adverts and the competitions…
• Look for pure correlations between words/phrases and other words/phrases…
• Or between trends in these numbers and classes of offline events – such as sales, complaints and other behaviours… with a view to predicting, explaining or causing such events in the future.
*Except for the most obvious duplication errors such as over-indexing
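A minimal Python sketch of the “hyena” approach, under assumed field names (`week`, `source`, `text`) and made-up data: count every post mentioning a term per week, dropping only exact duplicates, then correlate a weekly series with an offline one using the Pearson coefficient. The offline “complaints” figures are hypothetical.

```python
import math
from collections import Counter


def weekly_mentions(posts, term):
    """Count every post mentioning `term` (lower-case), per week.

    Only exact duplicates (same source and text) are dropped; spam, adverts
    and competitions are deliberately kept.
    """
    seen = set()
    counts = Counter()
    for p in posts:
        key = (p["source"], p["text"])
        if key in seen:
            continue
        seen.add(key)
        if term in p["text"].lower():
            counts[p["week"]] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series of at least two points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


posts = [
    {"week": 1, "source": "twitter", "text": "New PlayStation game out"},
    {"week": 1, "source": "twitter", "text": "New PlayStation game out"},  # exact duplicate
    {"week": 1, "source": "forum", "text": "WIN a PlayStation! Enter now"},  # competition, kept
    {"week": 2, "source": "blog", "text": "PlayStation network outage"},
    {"week": 2, "source": "blog", "text": "Weather today"},
]
print(dict(weekly_mentions(posts, "playstation")))  # {1: 2, 2: 1}

mentions = [2, 1, 4, 6]        # hypothetical weekly mention counts
complaints = [30, 25, 52, 70]  # hypothetical offline series
r = pearson(mentions, complaints)
```

A high coefficient over a handful of weeks is a prompt for further analysis (lags, seasonality, other drivers), not evidence that the mentions cause the offline events.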
36
5. Some Concquestions
37
Talking Points
How will commercial SMM applications and services with the best accuracy, reach and relevance capabilities be recognised, validated and promoted?
If you’re a researcher and you want to use this stuff, for the first time, tomorrow… what must be done?
Fortunately – there’s enough to learn by “super-Googling”, browsing and crude trend tracking to keep us going… and learning… for some time to come. Is that, whilst pragmatic, enough of an ambition?
38
Dr Nick Buckley, SoShall Consulting
Tel: 07958 516967  t: @grimbold  E: [email protected]
Babita Earle, Digital Strategy Director, GfK NOP
Tel: 020 7890 9467  E: [email protected]