Leveraging the Power of Social Media

Post on 11-Aug-2014

description

A light intro to natural language processing on social media, presented as an invited talk at the University of Sheffield Engineering Symposium 2014 in the AI session. As well as an introduction to the area, this presentation covers powerful real-world applications of social media, and touches on the work we do in the Sheffield NLP group. Video cast: https://www.youtube.com/watch?v=QUbRmUinhHw&feature=youtu.be

Transcript of Leveraging the Power of Social Media

Leveraging the Power of Social Media

Leon Derczynski

Natural Language Processing Group

Department of Computer Science

Faculty of Engineering

University of Sheffield

work in the field of “computational linguistics”

focus on turning text into “understanding”

and “decision support”

the “AI effect”

Pamela McCorduck

artificial intelligence is less impressive when we know how it works

..so this talk won't have deep technical detail

language

(note huge evolutionary advancement)

??

social media

social media – a poster child for big data

big data:

promises new insights

is (was) a cool buzzword

causes headaches

what is it?

V: velocity

twitter: 255 000 000 users / month

Facebook: 1 280 000 000 users / month

VV: volume

reddit: 34 000 000 posts / month

twitter: 650 000 000 messages / month

VVV: variety

there are many online social networks

we need one of these

artificial intelligence

“Human knowledge is expressed in language. So computational linguistics is very important.”

- Mark Steedman

Start: sequence of bytes

[natural language processing goes here]

End: actionable knowledge

why bother programming at all?

… let the computer program itself!

machine learning:

make decisions about tasks based on things you've seen before

a little bit like human learning

give text and examples of what we want done

machine learns to do the task from these examples
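To make "learning from examples" concrete, here is a minimal sketch of a word-counting (naive Bayes) text classifier. It is not the method used in the talk or by the Sheffield group; the training messages, the labels "ill" and "well", and the function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Pick the label whose examples make the new text most likely."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from how common the label is, then add evidence per word.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled examples (not real data).
EXAMPLES = [
    ("i think im sick ugh", "ill"),
    ("got the flu feeling awful", "ill"),
    ("great dinner with friends", "well"),
    ("lovely sunny day today", "well"),
]

model = train(EXAMPLES)
print(predict("feeling sick today", *model))
```

Nobody wrote a rule saying "sick means ill"; the program picked that up from the labelled examples.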

understanding language

social media text is surprisingly formal

they see me rollin

- a typo?

they see me rollin / they hatin

- perhaps not. G-dropping mapped from speech

they see me rollin / they hatin / patrollin

- incidentally, this linguistic phenomenon is a good predictor of education level

they see me rollin / they hatin / patrollin / tryna catch me ridin dirty

- a new style! flawless; not a single mistake

omb x

- surely they mean “omg”?

omb ✔

- the keys are like, right next to each other
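The "keys next to each other" argument can be checked mechanically. The sketch below assumes a standard QWERTY layout with a typical row stagger (the offsets and the distance threshold are my assumptions, not measurements), and the function names are hypothetical. It handles lowercase letters only.

```python
# Letter rows of a QWERTY keyboard with an assumed horizontal stagger per row.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]

# Map each key to an (x, y) position measured in key widths.
POSITION = {
    key: (col + offset, row)
    for row, (keys, offset) in enumerate(ROWS)
    for col, key in enumerate(keys)
}


def adjacent(a, b):
    """True if two different keys sit next to each other on the keyboard."""
    (x1, y1), (x2, y2) = POSITION[a], POSITION[b]
    # 1.3 key widths is an assumed cut-off that covers diagonal neighbours.
    return a != b and (x1 - x2) ** 2 + (y1 - y2) ** 2 < 1.3 ** 2


def one_slip_apart(typed, intended):
    """True if the words differ in exactly one letter, and those keys touch."""
    if len(typed) != len(intended):
        return False
    diffs = [(t, i) for t, i in zip(typed, intended) if t != i]
    return len(diffs) == 1 and adjacent(*diffs[0])


print(one_slip_apart("omb", "omg"))
```

A check like this only says that "omb" could be a slip for "omg"; whether the writer meant it is a separate question.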

x really? this guy?

Shall we go out for dinner this evening?

Ey yo wen u gon let me tap dat

spelling ability distribution in net slang users

with spelling ability distribution in non-slang users

Do you feel luccy, punk?

challenge 1: what language is this anyway

je bent Jacques cousteau niet die een nieuwe soort heeft ontdekt, het is duidelijk, ze bedekken hun gezicht. Get over it

RT @TomPIngram: VIVA LAS VEGAS 16 - NEWS #constantcontact http://t.co/VrFzZaa7
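A toy way to guess the language of a message is to count common function words from each candidate language. The sketch below is only an illustration: the two short word lists are hand-picked assumptions, the function names are hypothetical, and real language identifiers typically rely on much richer evidence such as character n-grams.

```python
import re

# Tiny hand-picked function-word lists (assumed, far from complete).
STOPWORDS = {
    "nl": {"je", "bent", "niet", "die", "een", "het", "is", "ze", "hun", "heeft", "de", "en"},
    "en": {"the", "is", "it", "and", "you", "of", "to", "a", "in", "that", "over", "not"},
}


def stopword_scores(text):
    """Count how many tokens of the text appear in each language's list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}


def guess_language(text):
    """Return the best-scoring language, or None if there is no evidence."""
    scores = stopword_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


dutch_tweet = ("je bent Jacques cousteau niet die een nieuwe soort heeft ontdekt, "
               "het is duidelijk, ze bedekken hun gezicht. Get over it")
retweet = "RT @TomPIngram: VIVA LAS VEGAS 16 - NEWS #constantcontact http://t.co/VrFzZaa7"

print(guess_language(dutch_tweet), stopword_scores(dutch_tweet))
print(guess_language(retweet))
```

The first message scores on both lists because it switches to English at the end; the second is mostly names, a hashtag and a link, so this approach finds nothing to go on.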

challenge 2: pls type better

I wonde rif Tsubasa is okay..

- misplaced space = two new words
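One simple repair for a misplaced space is to try sliding it one character left or right and keep the result if both halves become known words. This is a sketch under strong assumptions: the vocabulary below is a tiny made-up set, the function name is hypothetical, and punctuation is not handled.

```python
# Hypothetical mini-vocabulary; a real system would use a large word list.
VOCAB = {"i", "wonder", "if", "tsubasa", "is", "okay"}


def repair_spaces(text, vocab):
    """Move a space by one character when that turns two unknown-looking
    neighbours into two known words."""
    def known(word):
        return word.lower() in vocab

    words = text.split()
    for i in range(len(words) - 1):
        a, b = words[i], words[i + 1]
        if known(a) and known(b):
            continue
        candidates = []
        if len(b) > 1:
            candidates.append((a + b[0], b[1:]))   # space was typed one key early
        if len(a) > 1:
            candidates.append((a[:-1], a[-1] + b))  # space was typed one key late
        for left, right in candidates:
            if known(left) and known(right):
                words[i], words[i + 1] = left, right
                break
    return " ".join(words)


print(repair_spaces("I wonde rif Tsubasa is okay", VOCAB))
```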

no homwork tonight.. suprising??

- maybe there should be!

challenge 3: finding names

derek x is a person

miles x might be a person

Marie Claire x should not be a person

Exodus Porter x probably an OK person, but actually a beer

challenge 3: finding names

Spicy Pickle Jr. x apparently actually a person
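The examples above show why a plain name list is not enough. As a sketch, here is the naive approach: look the first word up in a list of first names. The list and the function name are made up for illustration.

```python
# Hypothetical list of known first names (a "gazetteer").
FIRST_NAMES = {"derek", "miles", "marie"}


def looks_like_person(mention):
    """Naive rule: a mention is a person if its first word is a known first name."""
    first_word = mention.lower().split()[0]
    return first_word in FIRST_NAMES


for mention in ["derek", "miles", "Marie Claire", "Exodus Porter", "Spicy Pickle Jr."]:
    print(mention, "->", looks_like_person(mention))
```

The rule says yes to "Marie Claire" and no to "Spicy Pickle Jr.", which is the opposite of what the slides report for both; finding names needs context, not just a list.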

???

old news

social media defends against earthquakes (2010)

Japanese and US quake response times:

down from ~20s to ~17.5s

social media predicts epidemics (2012)

exhibit a: one dead crow

social media mentions of dead crows predict West Nile virus (WNV) in humans

''There's a dead crow in my garden''
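The raw signal behind this kind of prediction is simply a count of matching messages over time. A minimal sketch, assuming messages arrive as (day, text) pairs; the sample messages, day labels and function name are invented for illustration and are not data from the study.

```python
import re
from collections import Counter

# Matches "dead crow" or "dead crows", in any letter case.
DEAD_CROW = re.compile(r"\bdead\s+crows?\b", re.IGNORECASE)


def mentions_per_day(messages):
    """Count messages per day that mention a dead crow."""
    counts = Counter()
    for day, text in messages:
        if DEAD_CROW.search(text):
            counts[day] += 1
    return counts


# Made-up messages, for illustration only.
SAMPLE = [
    ("mon", "There's a dead crow in my garden"),
    ("mon", "two Dead crows by the road"),
    ("tue", "a crow is sitting on my fence"),
    ("tue", "another dead crow today"),
]

print(mentions_per_day(SAMPLE))
```

Turning such counts into a warning still needs a comparison against what is normal for the area and season.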

social media predicts you getting flu (2012)

@mari: i think im sick ugh..

great potential for misuse :)

this november:

social media dispatches fire engines (2014)

trust

if hospitals and fire stations act based on tweets,

wrong information is extra-harmful

rumours

speculation

misinformation

disinformation

who can you trust online?

Imagine a lie detector for politicians / Fox News

responsibility

1. Collect tweets
2. ????
3. Profit!

how long do we keep them for?

- “15 years is OK, right?” - NSA

what do we store and process?

- “just metadata, it's harmless” - GCHQ

(from Kurt Opsahl's slides at the Chaos Communication Congress, photo by Marion Marschalek)

bias

news style

social media

most of our language AI was trained on news text

the bias is:

- middle class
- white
- working age
- educated
- male
- 1980s/1990s
- from the US
- journalist
- following AP guidelines

your phone rewards you if you talk and write like

(ok.. sort of)

.. and punishes you when you don't.

(not cool!)

twitter bias is different

- not German or Nordic
- are young(ish)

lower requirements

- you can publish even if you're not a journalist
- still operates beyond the 1990s

some new requirements

- you do need access to the internet...
- ...and twitter (对不起,中国人: "sorry, people of China")

the big picture

we're racing ahead and improving life quality

there is immense value in “trivia”

understanding social media lets us help people better

Thank you!

Leon Derczynski