HOMME: Ontological Explorer

HOMME: Hierarchical‐Ontological Mind Map Explorer. Yi‐Shin Chen, Pei‐Ling Hsu, Hsiao‐Shan Hsieh, Li‐Chin Lee, Carlos Argueta. Institute of Information Systems and Applications, National Tsing Hua University, IDEA Lab.

Transcript of HOMME: Ontological Explorer

Page 1: HOMME: Ontological Explorer

HOMME: Hierarchical‐Ontological Mind Map Explorer

Yi‐Shin Chen, Pei‐Ling Hsu, Hsiao‐Shan Hsieh, Li‐Chin Lee, Carlos Argueta

Institute of Information Systems and Applications

National Tsing Hua University

IDEA Lab

Page 2: HOMME: Ontological Explorer

Outline

• Introduction to IDEA Lab

• Introduction to HOMME

• Framework

• Experimental Evaluation

• Conclusions and Future Work

Page 3: HOMME: Ontological Explorer

Intelligent Data Engineering and Applications (IDEA) Laboratory

Page 4: HOMME: Ontological Explorer

Research Focus

Query

Mining

Optimization

Query

Storage

Index

DB

Page 5: HOMME: Ontological Explorer

Corresponding Research Issues

AI

HCI

Network

Database

Web

Pattern Recognition

Page 6: HOMME: Ontological Explorer

CURRENT PROJECTS

Page 7: HOMME: Ontological Explorer

Current Projects

• GoogolPlex

– Web information integration and retrieval

– Topic expansion and integration

– Group “answers” based on topic and sentiment

Page 8: HOMME: Ontological Explorer

GoogolPlex Project (Cont.)

• Apply cloud computing to speed up the analysis in large scale and heterogeneous data (Googolplex size)

Page 9: HOMME: Ontological Explorer

GoogolPlex Project (Cont.)

• Related research issues

– Automatic ontology construction from heterogeneous data

Page 10: HOMME: Ontological Explorer

GoogolPlex Project (Cont.)

• Related research issues

– Sentiment analysis for short articles (e.g., micro‐blogs, social network messages) in multi‐language environments

I hate it when it’s rainy and cold!

Loved today’s trip.

I can’t believe this happened!

Page 11: HOMME: Ontological Explorer

GoogolPlex Project (Cont.)

• Related research issues

– Keyword extraction from short articles (e.g., micro‐blogs, social network messages) in multi‐language environments

…task of algorithm analysis consists…

…in a Markov Chain is…

…when sorting is…

Page 12: HOMME: Ontological Explorer

GoogolPlex Project (Cont.)

• Related research issues

– Semantic analysis for different purposes, such as geo‐tagging

– TweoLocator: A Non‐Intrusive Geographical Locator System for Twitter

• Identify the location of a particular Twitter user at a given time

– Using exclusively the content of his/her tweets

Page 13: HOMME: Ontological Explorer

HOMME Conceptual Finder Demo

Page 14: HOMME: Ontological Explorer

HOMME Ontology Builder Demo (cont’d)

Page 15: HOMME: Ontological Explorer

TweoLocator: Framework

Page 16: HOMME: Ontological Explorer
Page 17: HOMME: Ontological Explorer

TweoLocator: Experimental Results

[Charts: per‐country counts and accuracy for tweets and for profiles]

Tweets

                          US    GB    CA    AU    IN    Others  Avg Acc
Correct tweets            463   288   353   169   125   23      65.6%
Unrelated tweets          110   55    114   53    41    18      18.1%
Disagreed & reallocated   142   175   22    0     14    0       16.3%
Accuracy                  65%   56%   72%   76%   69%   56%     66%

Profiles

                US    GB    PH    CA    Others  AU    SE
Correct         240   88    39    28    26      22    9
Wrong           24    3     0     2     0       0     0
N/A             44    17    3     6     1       3     1
Disagreement    16    0     0     1     0       0     0
Accuracy        74%   81%   93%   76%   96%     88%   90%
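The accuracy row of the second table (columns US, GB, PH, CA, Others, AU, SE) appears to be the share of correct cases among all evaluated cases. That formula is inferred from the numbers, not stated on the slide, and the names below are illustrative. A minimal Python sketch that recomputes the row from the counts:

```python
# Counts copied from the slide: (correct, wrong, n/a, disagreement).
PROFILE_COUNTS = {
    "US": (240, 24, 44, 16),
    "GB": (88, 3, 17, 0),
    "PH": (39, 0, 3, 0),
    "CA": (28, 2, 6, 1),
    "Others": (26, 0, 1, 0),
    "AU": (22, 0, 3, 0),
    "SE": (9, 0, 1, 0),
}


def accuracy(correct, wrong, not_available, disagreement):
    """Share of correct cases among all cases (assumed definition)."""
    total = correct + wrong + not_available + disagreement
    return correct / total


for country, counts in PROFILE_COUNTS.items():
    print(f"{country}: {accuracy(*counts):.0%}")
```

Under this assumed definition the recomputed percentages agree with the accuracy row on the slide.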

Page 18: HOMME: Ontological Explorer

Current Projects

• GoogolPlex

– Web information integration and retrieval

• iConduct

– Interactive conducting system

Page 19: HOMME: Ontological Explorer

iConduct Project

• Analyze the intentions from data streams

• Instantly aggregate user intentions and multimedia data

Page 20: HOMME: Ontological Explorer

Current Projects

• GoogolPlex

– Web information integration and retrieval

• iConduct

– Interactive conducting system

• MyMining

– Market analysis

Page 21: HOMME: Ontological Explorer

MyMining Project

• Mining market information from

– Stock data (numerical data)

– News, blogs, and micro‐blogs (text data)

• Find the relationship between the stock market and social networking sites

Page 22: HOMME: Ontological Explorer

Goal

• In this research, our goal is to build a system that can help us:

– Automatically integrate stock news and identify events.

– Evaluate each event's influence at the industry level and use that information to verify price movements.

Page 23: HOMME: Ontological Explorer

MyMining Project

Methodology

Off‐line

On‐line

Page 24: HOMME: Ontological Explorer

Current Performance

• Accuracy of four methods:

Methods              Average Accuracy
Pheromone            0.5784574
Adjust regression    0.5323214
Regression           0.5134457
Blind test           0.3045479

Page 25: HOMME: Ontological Explorer

PEOPLE IN IDEA LAB

Page 26: HOMME: Ontological Explorer

People

• Current students:

– Domestic students: 7

– International students: 8

Nationality

Taiwan         46%
Honduras       20%
Saint Lucia    7%
Myanmar        7%
El Salvador    7%
Indonesia      7%
Malaysia       6%

Page 27: HOMME: Ontological Explorer

INTRODUCTION TO HOMME

Page 28: HOMME: Ontological Explorer

Humans Generate Knowledge

• Collecting all human knowledge has always been a recurring goal.

Page 29: HOMME: Ontological Explorer

Internet Era

• The WWW has made collecting all human knowledge possible.

Page 30: HOMME: Ontological Explorer

Data Flood

• Redundant

• Scattered

• Mutually complementary

Page 31: HOMME: Ontological Explorer

Integration

• It is crucial to integrate heterogeneous data sources.

– Easier access

– Summarization

– Less redundancy

Page 32: HOMME: Ontological Explorer

Previous Work (1)

• Web data integration and organization based on expert knowledge or collaboratively‐created (crowd wisdom) data

– Manual

– Semi‐automatic

– Automatic

Page 33: HOMME: Ontological Explorer

Previous Work (2)

• Wikipedia: the most successful collaboratively‐created collection of human knowledge on the web

• Unstructured articles

• Structured information (infoboxes)

Page 34: HOMME: Ontological Explorer

Previous Work (3)

• Other works used Wikipedia structured data to integrate web data.

– YAGO:

  • Wikipedia categories + WordNet

  • http://www.mpi-inf.mpg.de/yago-naga/yago/

– DBpedia:

  • Wikipedia infoboxes

  • http://dbpedia.org/About

Page 35: HOMME: Ontological Explorer

Previous Work (4)

• Other sources of crowd wisdom studied to integrate and organize web data

– Social annotations

– Search logs

Page 36: HOMME: Ontological Explorer

Previous Work (5)

• Two approaches to integrate web data:

– External resources to extract relationships

  • Relatively small coverage

– Bottom‐up approach to web data integration

  • Difficulty in labeling the semantic relationships

Page 37: HOMME: Ontological Explorer

HOMME

• Relies on multiple heterogeneous “crowd wisdom” data sources.

• Bottom‐up extraction of semantic relationships present in the web data.

• Presents a mind‐map‐like representation of knowledge for easy navigation.

Page 38: HOMME: Ontological Explorer

FRAMEWORK

Page 39: HOMME: Ontological Explorer

Framework

Page 40: HOMME: Ontological Explorer

Data Sources

• Multiple heterogeneous data sources

– Search logs

– Social annotations: Delicious tags

– Web directory: Open Directory Project (ODP)

Page 41: HOMME: Ontological Explorer

Framework

Page 42: HOMME: Ontological Explorer

Resource Integrator

• Normalize and decompose heterogeneous data into smaller elements with common characteristics.

• We use the notion of word sequences and concept sequences

Page 43: HOMME: Ontological Explorer

Word Sequences

• Every query in the search log is considered a word sequence.

• Every URL in the search log can be decomposed into a word sequence.

– www.mtv.com/music/artist/bowlingforsoupartist.jhtml → <mtv, music, artist, bowling, for, soup, artist>

• All the Delicious tags assigned to a URL form a word sequence.

• The ODP title assigned to a URL is a word sequence.

• The ODP category assigned to a URL is turned into a word sequence.

– E.g., air/travel/agent → <air, travel, agent>
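The slides do not say how a URL such as www.mtv.com/music/artist/bowlingforsoupartist.jhtml is broken into words. A minimal Python sketch, assuming a small hypothetical vocabulary (`VOCAB`) and a hypothetical list of ignored URL tokens (`STOP`), could split on punctuation and then segment concatenated tokens against the vocabulary:

```python
import re

# Hypothetical mini-vocabulary; a real system would need a full dictionary.
VOCAB = {"mtv", "music", "artist", "bowling", "for", "soup", "air", "travel", "agent"}
# URL tokens assumed to carry no meaning (an assumption, not from the slides).
STOP = {"www", "com", "http", "https", "html", "jhtml"}


def segment(token, vocab=VOCAB):
    """Split a concatenated token into vocabulary words; fall back to [token]."""
    best = {0: []}  # best[i] holds one segmentation of token[:i]
    for i in range(1, len(token) + 1):
        for j in range(i):
            if j in best and token[j:i] in vocab:
                best[i] = best[j] + [token[j:i]]
                break
    return best.get(len(token), [token])


def word_sequence(text):
    """Turn a URL or an ODP category path into a word sequence."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOP]
    words = []
    for token in tokens:
        words.extend(segment(token))
    return words


print(word_sequence("www.mtv.com/music/artist/bowlingforsoupartist.jhtml"))
print(word_sequence("air/travel/agent"))
```

With this toy vocabulary the two calls reproduce the word sequences shown on the slide. Tokens that cannot be segmented are kept whole.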

Page 44: HOMME: Ontological Explorer

Concept Sequences

• A sequence of words can represent a concept.

Page 45: HOMME: Ontological Explorer

Framework

Page 46: HOMME: Ontological Explorer

Term Extractor

• For each frequent word sequence, it tries to split the sequence into concepts.

– E.g., Query: “star wars light saber”

– Word sequence: <star, wars, light, saber>

– Concept sequences: <<star, wars>, <light, saber>>
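The splitting rule itself is not given on the slide. One plausible reading, sketched below in Python, is a greedy longest match against a set of already known concepts. Both the greedy strategy and the `KNOWN_CONCEPTS` set are assumptions made for illustration.

```python
# Hypothetical set of known multi-word concepts (e.g. mined from frequent queries).
KNOWN_CONCEPTS = {("star", "wars"), ("light", "saber")}


def split_into_concepts(words, known=KNOWN_CONCEPTS, max_len=3):
    """Greedy longest-match split of a word sequence into concept sequences."""
    concepts = []
    i = 0
    while i < len(words):
        # Try the longest candidate first; a single word always matches.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in known:
                concepts.append(candidate)
                i += n
                break
    return concepts


print(split_into_concepts(["star", "wars", "light", "saber"]))
```

For the slide's query this yields the two concepts <star, wars> and <light, saber>. Words that belong to no known concept become single-word concepts.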

Page 47: HOMME: Ontological Explorer

Framework

Page 48: HOMME: Ontological Explorer

Term Mapper

• Term Mapper uses the output of Term Extractor to build a features matrix.

1. Classify concepts by ODP category.

2. Use the frequency of tags assigned to queries as features.
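As a rough illustration of the second step, the Python sketch below builds a query-by-tag matrix whose cells are tag frequencies. The input layout (`query_tags`) and the sample data are hypothetical. The slides do not describe the actual matrix format.

```python
from collections import Counter


def build_feature_matrix(query_tags):
    """query_tags maps each query to the list of tags observed for it."""
    vocabulary = sorted({tag for tags in query_tags.values() for tag in tags})
    matrix = {}
    for query, tags in query_tags.items():
        counts = Counter(tags)
        # One row per query, one column per tag, cell = tag frequency.
        matrix[query] = [counts[tag] for tag in vocabulary]
    return vocabulary, matrix


# Hypothetical sample data.
sample = {
    "star wars": ["movie", "scifi", "movie"],
    "online dictionary": ["reference", "dictionary"],
}
vocabulary, matrix = build_feature_matrix(sample)
print(vocabulary)
print(matrix)
```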

Page 49: HOMME: Ontological Explorer

Framework

Page 50: HOMME: Ontological Explorer

Relationship Finder

• Input data from Term Extractor: word sequences

• Goal of Relationship Finder:

– Find important semantic relationships between word sequences

• Challenges:

– To detect concept candidates in word sequences

– To gather correlated concept candidates

– To name semantic relationships between concept candidates

Page 51: HOMME: Ontological Explorer

Relationship Finder

• Solutions:

– Rules for detecting concept candidates from word sequences

  • Mapped with existing concepts

  • Mapped with dictionaries

  • Crowd wisdom

    – Frequent queries

    – ODP titles

  • Word sequences containing “of”

– Considering the contexts among word sequences

– Considering the meanings of relationships

Page 52: HOMME: Ontological Explorer

Relationship Finder

• Hierarchical Relationships

– Has‐Subclass

– Is‐A

• Synonymous Relationships

– Is‐Equal‐To

– Has‐Meaning

• Other relationships

– Has‐Data‐About

– Has‐Website

Page 53: HOMME: Ontological Explorer

Relationship Finder

• Hierarchical Relationships (common relationships in ontologies)

– Has‐Subclass

– Is‐A

• Synonymous Relationships

– Is‐Equal‐To

– Has‐Meaning

• Other relationships

– Has‐Data‐About

– Has‐Website

Page 54: HOMME: Ontological Explorer

Relationship Finder

• Hierarchical Relationships

– Has‐Subclass (top down: class Has‐Subclass class)

– Is‐A (bottom up: instance is a class)

• Synonymous Relationships

– Is‐Equal‐To

– Has‐Meaning

• Other relationships

– Has‐Data‐About

– Has‐Website

Page 55: HOMME: Ontological Explorer

Has‐Subclass Relationship Finder

• Hierarchical Relationships (common relationships in ontologies)

– Has‐Subclass (top down: class Has‐Subclass class)

• Utilizing ODP categories

• Mapping with crowd wisdom: frequent queries

• For instance:

– Query: “travel agent phone”

– ODP category: air/travel/agent

– Output: travel has‐Subclass travel agent
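The exact matching rule is not spelled out on the slide. The Python sketch below assumes one simple rule that reproduces the example: when two adjacent ODP category levels appear next to each other in a query, the parent level gets the combined phrase as a subclass. The function name and the rule are illustrative only.

```python
def has_subclass(query, odp_category):
    """Emit (parent, 'has-Subclass', child) for adjacent ODP levels found in the query."""
    query_words = query.lower().split()
    levels = odp_category.lower().split("/")
    relations = []
    for parent, child in zip(levels, levels[1:]):
        # Look for "parent child" as a contiguous phrase in the query.
        for i in range(len(query_words) - 1):
            if query_words[i:i + 2] == [parent, child]:
                relations.append((parent, "has-Subclass", f"{parent} {child}"))
                break
    return relations


print(has_subclass("travel agent phone", "air/travel/agent"))
```

A real implementation would presumably also require the query to be frequent, as the slide suggests. That filter is left out here.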

Page 56: HOMME: Ontological Explorer

Is‐A Relationship Finder

• Hierarchical Relationships (common relationships in ontologies)

– Is‐A (bottom up: instance is a class)

• Word sequences with crowd wisdom

– Queries, ODP titles

• Hierarchies among word sequences

– Word sequences with “of”

– Additional words for ambiguous words

• For instance:

– Query: “apple company”

– Ambiguous word: apple

– Additional words: company

– Output: apple company Is‐A company
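The "additional words for ambiguous words" rule can be sketched in Python as follows. The `AMBIGUOUS` set is a hypothetical stand-in for however the system detects ambiguous words, which the slides do not describe.

```python
# Hypothetical list of ambiguous words; how HOMME detects them is not stated.
AMBIGUOUS = {"apple"}


def is_a(query, ambiguous=AMBIGUOUS):
    """If a query is an ambiguous word plus extra words, treat the extra words as its class."""
    words = query.lower().split()
    if len(words) >= 2 and words[0] in ambiguous:
        return (" ".join(words), "Is-A", " ".join(words[1:]))
    return None


print(is_a("apple company"))
```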

Page 57: HOMME: Ontological Explorer

Relationship Finder

• Hierarchical Relationships

– Has‐Subclass

– Is‐A

• Synonymous Relationships (referring to the same concepts)

– Is‐Equal‐To

– Has‐Meaning

• Other relationships

– Has‐Data‐About

– Has‐Website

Page 58: HOMME: Ontological Explorer

Synonymous Relationship Finder (1)

• Many word sequences refer to the same concepts.

• Is‐Equal‐To

– <cartoonnetwork> and <cartoon, network>

• Has‐Meaning

– <ae>, <american, eagle>, and <american, eagle, outfitter>

• Finds distinct queries and ODP data referring to the same concepts.

• Steps:

1. Group queries based on navigational intention

– Intention inferred from clicked URLs

– Group the navigational queries based on the clicked URL

2. Add ODP data to the groups based on their referring URLs.

Page 59: HOMME: Ontological Explorer

Synonymous Relationship Finder (2)

• For instance:

– Query: “american eagle”

– Clicked URL: www.ae.com

– ODP title: “american eagle outfitter”

– Output:

  • “ae” has‐Meaning “American eagle”

  • “American eagle” has‐Meaning “american eagle outfitter”
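The two grouping steps can be sketched in Python as below. Grouping by clicked URL follows the slides. Taking the site name from the host and chaining terms from shortest to longest are assumptions chosen only because they reproduce the example output.

```python
from collections import defaultdict


def synonym_groups(click_log, odp_titles):
    """Step 1: group queries by clicked URL. Step 2: add ODP titles by URL."""
    groups = defaultdict(set)
    for query, url in click_log:
        groups[url].add(query.lower())
    for url, title in odp_titles.items():
        if url in groups:
            groups[url].add(title.lower())
    return groups


def has_meaning(groups):
    """Chain the terms of each group from shortest to longest (assumed ordering)."""
    relations = []
    for url, terms in groups.items():
        host_parts = url.split("/")[0].split(".")
        site = host_parts[1] if host_parts[0] == "www" else host_parts[0]
        ordered = sorted(terms | {site}, key=len)
        relations += [(a, "has-Meaning", b) for a, b in zip(ordered, ordered[1:])]
    return relations


groups = synonym_groups(
    [("american eagle", "www.ae.com")],
    {"www.ae.com": "american eagle outfitter"},
)
print(has_meaning(groups))
```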

Page 60: HOMME: Ontological Explorer

Relationship Finder

• Hierarchical Relationships

– Has‐Subclass

– Is‐A

• Synonymous Relationships

– Is‐Equal‐To

– Has‐Meaning

• Other relationships

– Has‐Data‐About

– Has‐Website

Page 61: HOMME: Ontological Explorer

Has‐Data‐About Relationship Finder

• Some terms in word sequences denote concepts present in a web site.

• Finds frequent matches between query terms and parts of clicked URLs.

• For instance:

– Query: “bowling for soup”

– Clicked URL: www.mtv.com/music/artist/bowlingforsoupartist.jhtml

– Output:

  • “mtv” has‐Data‐About “music”

  • “mtv” has‐Data‐About “artist”

  • “mtv” has‐Data‐About “bowling for soup”
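A simplified Python sketch of this matching is shown below. It ignores the frequency requirement, treats every URL path part as a concept of the site, and recognizes the query inside a path part by removing its spaces. All of these are simplifying assumptions, not the method described by the authors.

```python
def has_data_about(query, clicked_url):
    """Relate the site name to each URL path part, matching the query where possible."""
    host, *path = clicked_url.lower().split("/")
    labels = [p for p in host.split(".") if p not in {"www", "com", "org", "net"}]
    site = labels[0] if labels else host
    compact_query = query.lower().replace(" ", "")
    relations = []
    for part in path:
        name = part.split(".")[0]  # drop a file extension such as .jhtml
        if compact_query in name:
            # The path part contains the query written without spaces.
            relations.append((site, "has-Data-About", query.lower()))
        else:
            relations.append((site, "has-Data-About", name))
    return relations


print(has_data_about(
    "bowling for soup",
    "www.mtv.com/music/artist/bowlingforsoupartist.jhtml",
))
```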

Page 62: HOMME: Ontological Explorer

Has‐Website Relationship Finder

• Uses word sequences from queries, URLs, and ODP titles

• For instance:

– Query: “online dictionary”

– Clicked URL: www.m-w.com

– ODP title: “merriam‐webster online”

– Output:

  • “online dictionary” has‐Website www.m-w.com

  • “merriam‐webster online” has‐Website www.m-w.com
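In its simplest form this rule links the query and the ODP title to the clicked URL. The Python sketch below shows that reading. The function name and signature are illustrative, and any filtering the real system applies is omitted.

```python
def has_website(query, clicked_url, odp_title=None):
    """Link a query, and the ODP title if present, to the clicked URL."""
    relations = [(query.lower(), "has-Website", clicked_url)]
    if odp_title:
        relations.append((odp_title.lower(), "has-Website", clicked_url))
    return relations


print(has_website("online dictionary", "www.m-w.com", "merriam-webster online"))
```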

Page 63: HOMME: Ontological Explorer

Iterative Process

• The extracted relationships are used to improve the term extraction process.

• Constant interaction between the Term Extractor and the Relationship Finder.

Page 64: HOMME: Ontological Explorer

Framework

Page 65: HOMME: Ontological Explorer

Concept Cluster Finder

• Uses the features matrix generated by Term Mapper.

• Uses the k‐means algorithm to cluster queries.

• Each cluster is automatically labeled based on the cluster representative.

– Features with highest scores
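A minimal pure-Python sketch of k-means over query feature vectors, with each cluster labeled by the highest-scoring feature of its centroid, is given below. The deterministic initialization (first k points), the iteration count, and the sample data are assumptions. The slides do not give the actual settings.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means with squared Euclidean distance; init uses the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, p in enumerate(points):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def label_clusters(centroids, feature_names):
    """Label each cluster with the feature that scores highest in its centroid."""
    return [feature_names[max(range(len(c)), key=c.__getitem__)] for c in centroids]


# Hypothetical feature matrix: columns are tag frequencies for "movie" and "travel".
features = ["movie", "travel"]
points = [[5, 0], [0, 4], [4, 1], [1, 5]]
assignment, centroids = kmeans(points, k=2)
print(assignment, label_clusters(centroids, features))
```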

Page 66: HOMME: Ontological Explorer

EXPERIMENTAL EVALUATION

Page 67: HOMME: Ontological Explorer

Setup

• Three data sources:

– Search log by MS Live Labs from US users in May 2006

  • 1,512,556 navigational queries extracted

– Open Directory Project (ODP)

– Delicious tags crawled from February to May 2010

• Implementation:

– Prototype front end: PHP + JavaScript InfoVis Toolkit

Page 68: HOMME: Ontological Explorer

Demonstration

Page 69: HOMME: Ontological Explorer

Ontology Builder

Page 70: HOMME: Ontological Explorer

Demonstration

Page 71: HOMME: Ontological Explorer

Concept Linker (1)

Page 72: HOMME: Ontological Explorer

Concept Linker (2)

Page 73: HOMME: Ontological Explorer

Experimental Results – Concept Linker

• Our work was compared to other works:

– Single‐link Agglomerative Hierarchical Clustering (AHC)

– DBSCAN

• We want to evaluate the ability to discover query clusters.

• Ground truth: manually labeled 50 queries from each category.

Page 74: HOMME: Ontological Explorer

HOMME and AHC

Page 75: HOMME: Ontological Explorer

HOMME and DBSCAN

Page 76: HOMME: Ontological Explorer

Experimental Results ‐ Relationship Finder

• 11 volunteers checked a sample of output relationships.

• Each checked 100 tuples for each relationship type.

– Total 400 output relationships

– All checked the same set

Page 77: HOMME: Ontological Explorer

Relationship Finder Evaluated by Human Expert

Page 78: HOMME: Ontological Explorer

CONCLUSIONS AND FUTURE WORK

Page 79: HOMME: Ontological Explorer

Conclusions

• The proposed approach uses heterogeneous sources to

– Effectively cluster queries related to a concept.

– Extract relationships between concepts automatically.

• The relationships recognized by HOMME are also recognized by humans most of the time.

Page 80: HOMME: Ontological Explorer

Future Work

• Improve coverage for Relationship Finder

• Add more relationship types

• Improve execution times for the offline part