Extracting emerging knowledge from social media -

Extracting Emerging Knowledge from Social Media Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar [email protected] marcobrambi WWW 2017, Perth, Australia


Page 1: Extracting emerging knowledge from social media -

Extracting Emerging Knowledge from Social Media

Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar

[email protected] WWW 2017, Perth, Australia

Page 2: Extracting emerging knowledge from social media -

Humans aim at formalizing knowledge

Page 3: Extracting emerging knowledge from social media -

Ontology is the philosophical study of the nature of being, becoming, existence or reality, and of the basic categories of being and their relations.

Page 4: Extracting emerging knowledge from social media -

the nature of being, becoming, existence or reality

the basic categories of being and their relations.

Page 6: Extracting emerging knowledge from social media -

Formalizing new knowledge is hard

Only high frequency emerges

The long tail challenge

Page 7: Extracting emerging knowledge from social media -

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.

Shakespeare (Hamlet Act 1, scene 5)

Page 8: Extracting emerging knowledge from social media -

The Answer to the Great Question... Of Life, the Universe and Everything

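[Figure: progression from Data to Information to Knowledge to Wisdom, plotted against two axes, "Context independence" and "Understanding". The transitions are labelled "Understanding relations" (Data to Information), "Understanding patterns" (Information to Knowledge), and "Understanding principles" (Knowledge to Wisdom).]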

Page 9: Extracting emerging knowledge from social media -

Our focus: The Evolving Knowledge

[Figure: diagram of evolving knowledge. Labels: "known", "social", "factoid"; items a, b, c, ¬c, d; regions "potentially emerging", "potentially decaying", and "actual and solid".]

Page 10: Extracting emerging knowledge from social media -

Heaven and Heart

How to peer into the world through an effective window?

TWO INGREDIENTS

Social media – the data

Domain experts – the context

Page 11: Extracting emerging knowledge from social media -

Can we use social media to discover and codify emerging knowledge?

Page 12: Extracting emerging knowledge from social media -

Overview

Page 13: Extracting emerging knowledge from social media -

Famous Emerging

Page 14: Extracting emerging knowledge from social media -

Knowledge Enrichment Setting

[Figure: knowledge enrichment setting. Instances are linked to types by <<instanceof>> edges; unknown elements are marked "??".

Instances: high-frequency entities (HF Entity1 to HF Entity5) and low-frequency entities (LF Entity1 to LF Entity4).

Types: Type1, Type11, Type111, Type2.

Legend: seed entity, seed type, type of interest, expert inputs.

Enrichment problems: relations between HF and LF entities; relations between LF entities; typing of LF entities; extraction of new LF entities; finding attribute values (Property1, Property2).]

Page 15: Extracting emerging knowledge from social media -

Emerging Knowledge Harvesting

Page 16: Extracting emerging knowledge from social media -

Input (1): Domain Specific Types

Types selected by the expert

Relevant for the domain

Page 17: Extracting emerging knowledge from social media -

Input (2): Seeds (emerging entities)

Known and selected by the domain expert

Belonging to an expert type

Thoroughly described

# @ a

Page 18: Extracting emerging knowledge from social media -

Objectives

(1) Discover candidate unknown emerging entities

(2) Determine the relevance of the candidate

(3) Determine the type of the candidate

Page 19: Extracting emerging knowledge from social media -

Step (1): Social Media Sourcing

Collect content produced by the seeds

Page 20: Extracting emerging knowledge from social media -

Step (2): Candidate Extraction

Potentially any entity extracted from the social streams of the seeds

Resulting in huge sets of candidates

Our hypothesis: take only social network (SN) users as candidates

# @ w

@
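A minimal sketch of this step, assuming the sourced content is available as plain post texts grouped by seed and that social network users appear in the text as @handles. The regular expression, the function name `extract_candidates`, and the choice to skip handles that are themselves seeds are illustrative assumptions, not details given in the slides.

```python
import re
from collections import defaultdict

# Assumption: a handle is "@" followed by letters, digits, or underscores,
# and is not preceded by a word character (which rules out e-mail addresses).
HANDLE_RE = re.compile(r"(?<!\w)@(\w+)")

def extract_candidates(posts_by_seed):
    """Map each mentioned handle to the set of seeds that mention it."""
    seeds = {s.lower() for s in posts_by_seed}
    mentioned_by = defaultdict(set)
    for seed, posts in posts_by_seed.items():
        for text in posts:
            for handle in HANDLE_RE.findall(text):
                handle = handle.lower()
                if handle not in seeds:  # seeds are already known entities
                    mentioned_by[handle].add(seed.lower())
    return dict(mentioned_by)
```

The returned mapping keeps, for each candidate, the seeds that mention it, which is the information the pruning step needs.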

Page 21: Extracting emerging knowledge from social media -

Step (3): Candidate Pruning

Initial pruning of candidates based on TF-DF (*):

TF-DF := df * ttf / (N - df + 1)

Where:

df = number of seeds with which a candidate co-occurs;

ttf = total number of times a candidate occurs in the analyzed content;

N = number of seeds.

Ranking + threshold

(*) A variant of TF-IDF that does not discount document frequency, because here frequent appearance is desirable (we are not looking for information entropy).
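The TF-DF score above can be sketched as follows. The input layout (a mapping from each seed to the list of candidate mentions found in its content, with repeats) and the names `tf_df_scores` and `prune` are assumptions made for illustration; the threshold value is left to the caller, since the slides do not state one.

```python
from collections import Counter

def tf_df_scores(mentions_by_seed):
    """Compute TF-DF = df * ttf / (N - df + 1) for every candidate.

    mentions_by_seed: seed -> list of candidate mentions (with repeats).
    """
    n_seeds = len(mentions_by_seed)
    ttf = Counter()  # total occurrences across all analyzed content
    df = Counter()   # number of seeds a candidate co-occurs with
    for mentions in mentions_by_seed.values():
        ttf.update(mentions)
        df.update(set(mentions))
    return {c: df[c] * ttf[c] / (n_seeds - df[c] + 1) for c in ttf}

def prune(scores, threshold):
    """Rank candidates by score and keep those at or above the threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(c, s) for c, s in ranked if s >= threshold]
```

With three seeds, a candidate mentioned three times in total by two of them scores 2 * 3 / (3 - 2 + 1) = 3, while a candidate mentioned once by a single seed scores 1 * 1 / (3 - 1 + 1) = 1/3.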

Page 22: Extracting emerging knowledge from social media -

Step (4): Candidate Description

Repeat social media sourcing for candidates

A potentially good candidate is one that behaves similarly to one or more of the seeds

Our hypothesis: a good candidate talks about the same things

# @ w

Page 23: Extracting emerging knowledge from social media -

Step (5): Candidate Ranking

Seed centroid
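One way to read this step: average the seed feature vectors into a centroid and rank candidates by how close they are to it. The sketch below assumes sparse feature vectors stored as dictionaries and uses cosine similarity as the closeness measure; the slides mention that three distance functions were analyzed but do not name them here, so cosine is an assumption, as are all function names.

```python
import math
from collections import Counter

def centroid(vectors):
    """Average a list of sparse vectors (dicts: feature -> weight)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the weights feature by feature
    n = len(vectors)
    return {k: val / n for k, val in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_candidates(seed_vectors, candidate_vectors):
    """Sort candidates by decreasing similarity to the seed centroid."""
    c = centroid(seed_vectors)
    scored = [(name, cosine(vec, c)) for name, vec in candidate_vectors.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```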

Page 24: Extracting emerging knowledge from social media -

Step (6): Feature selection

Purely syntactic:
  only user handles (accounts)
  handles and hashtags

Semantic:
  based on entity extraction / DBpedia
  based on deep learning on images / Clarifai
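The two purely syntactic variants can be sketched as a bag-of-features count over an account's posts. The regular expressions and the name `syntactic_features` are assumptions for illustration; the semantic variants depend on external services and are not sketched here.

```python
import re
from collections import Counter

# Assumption: handles and hashtags are "@" or "#" followed by word characters.
HANDLE_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")

def syntactic_features(posts, include_hashtags=True):
    """Count handles (and optionally hashtags) used in an account's posts."""
    features = Counter()
    for text in posts:
        features.update(h.lower() for h in HANDLE_RE.findall(text))
        if include_hashtags:
            features.update(t.lower() for t in HASHTAG_RE.findall(text))
    return features
```

Calling the function with `include_hashtags=False` gives the "only user handles" variant; the default gives "handles and hashtags".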

Page 25: Extracting emerging knowledge from social media -

Step (6): Semantic Feature selection for text

9 basic strategies

Generating 18 combinations of T + E strategies

Page 26: Extracting emerging knowledge from social media -

990 semantic strategies evaluated

18 alternative feature vectors

11 different weighting values for aggregations

5 levels of recall for entity extraction

(+ 3 different distance functions analyzed)
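The figure of 990 follows from the three dimensions listed above: 18 x 11 x 5 = 990. The sketch below enumerates such a grid. The concrete values are placeholders: the slides give only the counts, so the feature-vector names, the weights 0.0 to 1.0 in steps of 0.1, and the recall-level names are all assumptions.

```python
from itertools import product

# Placeholder values: only the counts (18, 11, 5) come from the slides.
feature_vectors = [f"fv{i}" for i in range(1, 19)]    # 18 alternative feature vectors
weights = [round(0.1 * i, 1) for i in range(11)]      # 11 weights, assumed 0.0 to 1.0
recall_levels = [f"recall{i}" for i in range(1, 6)]   # 5 recall levels

# Every strategy is one (feature vector, weight, recall level) combination.
grid = list(product(feature_vectors, weights, recall_levels))
print(len(grid))  # 990
```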

Page 27: Extracting emerging knowledge from social media -

Experiments

Fashion Brands

Writers

Exhibitions

Page 28: Extracting emerging knowledge from social media -

Emerging Australian Writers – 22 seeds

http://www.emergingwritersfestival.org.au/ in June in Melbourne

Page 29: Extracting emerging knowledge from social media -

Emerging Australian Writers

[Figure: results plot; labels: "Weighting parameter", "Entity extraction recall".]

Page 30: Extracting emerging knowledge from social media -

Emerging Australian Writers

Precision @ K for two strategies

EHE—AST CHE—AST
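For reference, precision @ K is the fraction of the top K ranked candidates that are actually relevant. A minimal sketch, assuming the ranking is a list of candidate names and the relevant ones are given as a set (for example, candidates confirmed by a domain expert); the function name is illustrative.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked candidates that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for candidate in top_k if candidate in relevant)
    return hits / k
```

The denominator is K even when fewer than K candidates are available, which follows the usual definition of the metric.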

Page 31: Extracting emerging knowledge from social media -

Cross-scenario

39 strategies always outperform the syntactic one

Writers

Expo

Fashion

Page 32: Extracting emerging knowledge from social media -

Conclusions

Extraction of relevant emerging entities

Top, Fast and Reliable are the important

Off-the-shelf or as-a-service tools

Page 33: Extracting emerging knowledge from social media -

Challenges ahead

Repeatability in time (years!)

Recursion (candidates to seeds)

Multi-source data collection

Multiple types

Emerging relations

Emerging types

Page 34: Extracting emerging knowledge from social media -

You can try it yourself!

http://datascience.deib.polimi.it/social-knowledge

Page 35: Extracting emerging knowledge from social media -

THANKS! QUESTIONS?

Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar

Extracting Emerging Knowledge from Social Media

Marco Brambilla @marcobrambi [email protected]

http://datascience.deib.polimi.it

http://home.deib.polimi.it/marcobrambi