Extracting Emerging Knowledge from Social Media

Posted on 21-Jan-2018



Extracting Emerging Knowledge from Social Media

Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar

marco.brambilla@polimi.it, @marcobrambi

WWW 2017, Perth, Australia

Humans aim at formalizing knowledge

Ontology is the philosophical study of the nature of being, becoming, existence or reality, and of the basic categories of being and their relations.


Formalizing new knowledge is hard

Only high-frequency knowledge emerges

The long tail challenge

There are more things in heaven and earth, Horatio, / Than are dreamt of in your philosophy.

Shakespeare, Hamlet (Act 1, Scene 5)

The Answer to the Great Question... Of Life, the Universe and Everything

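[Figure: the DIKW hierarchy, from Data to Information (understanding relations), from Information to Knowledge (understanding patterns), and from Knowledge to Wisdom (understanding principles); axes: Understanding and Context independence]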

Our focus: The Evolving Knowledge

[Figure: evolving knowledge, with points a to d; labels include known, social, factoid, potentially emerging, potentially decaying, and actual and solid]

Heaven and Earth

How to peer into the world through an effective window?

TWO INGREDIENTS

Social media: the data

Domain experts: the context

Can we use social media to discover and codify emerging knowledge?

Overview

Knowledge Enrichment Setting

[Figure: a knowledge base of instances and types. Types (Type1, Type11, Type111, Type2) are linked to instances by <<instanceof>> relations; instances range from famous, high-frequency (HF) entities to emerging, low-frequency (LF) entities, with unknown elements marked "??". Expert inputs: seed entity, seed type, type of interest.]

Enrichment problems:

Extraction of new LF entities

Typing of LF entities

Relations HF - LF entities

Relations LF - LF entities

Finding attribute values (Property1, Property2)

Emerging Knowledge Harvesting

Input (1): Domain-specific types

Types selected by the expert

Relevant for the domain

Input (2): Seeds (emerging entities)

Known and selected by the domain expert

Belonging to an expert type

Thoroughly Described


Objectives

(1) Discover unknown candidate emerging entities

(2) Determine the relevance of each candidate

(3) Determine the type of each candidate

Step (1): Social Media Sourcing

Collect content produced by the seeds

Step (2): Candidate Extraction

Potentially any entity extracted from the social streams of the seeds

Resulting in huge sets of candidates

Our hypothesis: take only social network (SN) users as candidates
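
Step (2) keeps only social network users as candidates. A minimal sketch of that idea, assuming Twitter-style posts in which users are referenced as @handles, collects every handle mentioned in the seeds' content together with the seeds that mention it. The input format `seed_posts` and the function name `extract_candidates` are hypothetical, not part of the original tool.

```python
import re
from collections import defaultdict

# Assumed Twitter-style handle: "@" followed by 1-15 word characters.
HANDLE_RE = re.compile(r"@(\w{1,15})")


def extract_candidates(seed_posts):
    """Map each mentioned handle to the set of seeds whose posts mention it.

    seed_posts: dict {seed_handle: [post_text, ...]} (hypothetical format).
    Seeds themselves are excluded, since they are already known entities.
    """
    seeds = {s.lower() for s in seed_posts}
    candidates = defaultdict(set)
    for seed, posts in seed_posts.items():
        for text in posts:
            for handle in HANDLE_RE.findall(text):
                handle = handle.lower()  # handles are case-insensitive
                if handle not in seeds:
                    candidates[handle].add(seed.lower())
    return dict(candidates)


posts = {
    "alice": ["Great reading with @Bob and @carol", "cc @bob"],
    "dave": ["@carol @alice"],
}
print(extract_candidates(posts))
```

Here "@alice" in dave's post is dropped because alice is a seed, leaving bob (mentioned by alice) and carol (mentioned by alice and dave) as candidates.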


Step (3): Candidate Pruning

Initial pruning of candidates based on

TF-DF (*) := df × ttf / (N - df + 1)

Where: df = number of seeds a candidate co-occurs with;

ttf = total number of times a candidate occurs in the analyzed content;

N = number of seeds.

Ranking + threshold: candidates are ranked by TF-DF and those below a threshold are discarded

(*) A variant of TF-IDF that does not discount document frequency but rewards it: here, frequent appearance across seeds is a good sign (we are not looking for information entropy!)
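
The TF-DF pruning can be sketched in Python as follows. The formula is the one above; the input format, the function names `tf_df` and `prune`, and the threshold value are illustrative assumptions. Since df never exceeds N, the denominator N - df + 1 is always at least 1, and it shrinks as df grows, so candidates that co-occur with many seeds are boosted rather than discounted.

```python
from collections import Counter


def tf_df(mentions_by_seed):
    """Score candidates with TF-DF = df * ttf / (N - df + 1).

    mentions_by_seed: dict {seed: [candidate, ...]}, one list entry per
    occurrence of a candidate in that seed's content (hypothetical format).
    """
    n = len(mentions_by_seed)  # N: number of seeds
    ttf = Counter()            # total occurrences of each candidate
    df = Counter()             # number of seeds each candidate co-occurs with
    for candidates in mentions_by_seed.values():
        ttf.update(candidates)
        df.update(set(candidates))
    return {c: df[c] * ttf[c] / (n - df[c] + 1) for c in ttf}


def prune(scores, threshold):
    """Rank candidates by score, best first, and keep those >= threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [c for c, s in ranked if s >= threshold]


scores = tf_df({"s1": ["x", "x", "y"], "s2": ["x", "z"], "s3": ["x"]})
print(scores)              # x: 3 * 4 / 1 = 12.0; y and z: 1 * 1 / 3
print(prune(scores, 1.0))  # ['x']
```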

Step (4): Candidate Description

Repeat social media sourcing for candidates

A potentially good candidate is one that behaves similarly to one or more of the seeds

Our hypothesis: a good candidate talks about the same things as the seeds

Step (5): Candidate Ranking

Seed centroid: candidates are ranked by their similarity to the centroid of the seeds' feature vectors
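
Ranking against the seed centroid could look like the sketch below. The slides mention three distance functions without naming them; cosine similarity over sparse feature vectors is used here purely as an assumption, and `centroid`, `cosine` and `rank_by_centroid` are hypothetical helpers.

```python
import math
from collections import Counter


def centroid(vectors):
    """Average a list of sparse vectors (dict feature -> weight)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds weights feature by feature
    return {f: w / len(vectors) for f, w in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed distance function)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_centroid(seed_vectors, candidate_vectors):
    """Sort candidate names by similarity to the seed centroid, best first."""
    c = centroid(seed_vectors)
    return sorted(candidate_vectors,
                  key=lambda name: cosine(candidate_vectors[name], c),
                  reverse=True)


seeds = [{"#books": 2, "@penguin": 1}, {"#books": 1, "#poetry": 1}]
cands = {"u2": {"#football": 5}, "u1": {"#books": 3}}
print(rank_by_centroid(seeds, cands))  # ['u1', 'u2']
```

A candidate that shares no features with the seeds gets similarity 0 and falls to the bottom of the ranking.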

Step (6): Feature selection

Purely syntactic: only user handles (accounts); handles and hashtags

Semantic: based on entity extraction (DBpedia); based on deep learning on images (Clarifai)
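
The purely syntactic features can be built from post text alone. This sketch assumes Twitter-style `@handle` and `#hashtag` syntax and counts occurrences into a bag-of-features vector; `syntactic_features` is a hypothetical name. The semantic variants would instead rely on an entity extractor linked to DBpedia or on an image-tagging service such as Clarifai, which is not shown here.

```python
import re
from collections import Counter

# Assumed Twitter-style syntax for handles and hashtags.
HANDLE_RE = re.compile(r"@(\w{1,15})")
HASHTAG_RE = re.compile(r"#(\w+)")


def syntactic_features(posts, use_hashtags=True):
    """Bag-of-features vector built from a user's posts.

    use_hashtags=False: only handles (accounts) are counted.
    use_hashtags=True: handles and hashtags are counted together.
    Features are lowercased and prefixed with "@" or "#" to keep them apart.
    """
    features = Counter()
    for text in posts:
        features.update("@" + h.lower() for h in HANDLE_RE.findall(text))
        if use_hashtags:
            features.update("#" + t.lower() for t in HASHTAG_RE.findall(text))
    return features


posts = ["Reading @Penguin at #MWF", "#mwf tonight with @penguin"]
print(syntactic_features(posts))
print(syntactic_features(posts, use_hashtags=False))
```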

Step (6): Semantic feature selection for text

9 basic strategies

Generating 18 combinations of T + E strategies

990 semantic strategies evaluated: 18 alternative feature vectors × 11 different weighting values for aggregations × 5 levels of recall for entity extraction

(+ 3 different distance functions analyzed)
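
The count of 990 strategies is the size of the full grid of feature vectors, weighting values and recall levels, 18 × 11 × 5 = 990, with the 3 distance functions analyzed on top of it. The enumeration below uses placeholder labels, since the slides give only the counts; the 0.0 to 1.0 range for the weights is an assumption.

```python
from itertools import product

# Placeholder labels: the slides give only the counts, not the actual names.
feature_vectors = [f"fv{i}" for i in range(1, 19)]   # 18 alternative feature vectors
weights = [round(0.1 * i, 1) for i in range(11)]     # 11 weights, assumed 0.0 .. 1.0
recall_levels = [f"recall{i}" for i in range(1, 6)]  # 5 entity-extraction recall levels

# Every combination of one feature vector, one weight and one recall level.
strategies = list(product(feature_vectors, weights, recall_levels))
print(len(strategies))  # 990 = 18 * 11 * 5
```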

Experiments

Fashion Brands

Writers

Exhibitions

Emerging Australian Writers: 22 seeds

http://www.emergingwritersfestival.org.au/ (festival held in June in Melbourne)

[Chart: Emerging Australian Writers, effect of the weighting parameter and of entity extraction recall]

[Chart: Emerging Australian Writers, Precision@K for two strategies, EHE-AST and CHE-AST]
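
Precision@K is the fraction of the top K ranked candidates that are actually relevant (for example, confirmed by the domain expert). A minimal sketch with a hypothetical function name, which divides by K even when fewer than K candidates are available:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked candidates that belong to `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    return sum(1 for c in top if c in relevant) / k


ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, 1))  # 1.0
print(precision_at_k(ranked, relevant, 2))  # 0.5
```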

Cross-scenario: 39 strategies always outperform the syntactic one

[Chart: results for the Writers, Expo and Fashion scenarios]

Conclusions

Extraction of relevant emerging entities

Top, fast and reliable results are what matter

Off-the-shelf or as-a-service tools

Repeatability in time (years!)

Recursion (candidates to seeds)

Multi-source data collection

Multiple types

Emerging relations

Emerging types

Challenges ahead

You can try it yourself! http://datascience.deib.polimi.it/social-knowledge

THANKS! QUESTIONS?


Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it

http://datascience.deib.polimi.it

http://home.deib.polimi.it/marcobrambi