Mining the Semantic Web: Requirements for Machine Learning Fabio Ciravegna, Sam Chapman Presented by Steve Hookway 10/20/05

What is the Semantic Web?
- A way to automate reasoning with web data
- RDF
  - A uniform way to describe resources as (subject, predicate, object) triples
- Ontology
  - Hierarchical structure of data
  - Property restrictions
  - Implicit typing
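
To make the RDF and ontology bullets concrete, here is a minimal Python sketch. It assumes a toy in-memory triple store, not any real RDF library. Facts are plain (subject, predicate, object) tuples. The `ex:` names are hypothetical. `rdf:type` and `rdfs:subClassOf` follow the standard RDF/RDFS vocabulary, written here as plain strings. The `infer_types` helper is our own illustration of one simple form of implicit typing: a resource typed with a subclass is inferred to belong to every superclass.

```python
Triple = tuple[str, str, str]

# Hypothetical facts; "ex:" is a made-up namespace prefix.
triples: set[Triple] = {
    ("ex:alice", "rdf:type", "ex:Professor"),
    ("ex:Professor", "rdfs:subClassOf", "ex:Academic"),
    ("ex:Academic", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "ex:teaches", "ex:course101"),
}


def infer_types(kb: set[Triple]) -> set[Triple]:
    """Return kb plus every rdf:type triple implied by following
    rdfs:subClassOf links, repeated until nothing new is added."""
    inferred = set(kb)
    changed = True
    while changed:
        changed = False
        subclass_of = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for sub, sup in subclass_of:
                if sub == o and (s, "rdf:type", sup) not in inferred:
                    inferred.add((s, "rdf:type", sup))
                    changed = True
    return inferred


types = sorted(o for s, p, o in infer_types(triples)
               if s == "ex:alice" and p == "rdf:type")
print(types)  # ['ex:Academic', 'ex:Person', 'ex:Professor']
```

A real system would use an RDF library and a reasoner. The fixed-point loop here only illustrates why typing can stay implicit in the data: nobody stated that `ex:alice` is a person, yet it follows from the hierarchy.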

Adding Meta-Data
- A prerequisite for the Semantic Web (SW) is structured knowledge
- Manual approach
  - Too much data
  - Trust issues
  - Noise
- This process needs to be automated

Armadillo
- Automatically annotates web pages
- Validity based on a number of weak techniques
  - Redundant information
  - Rating of sources
  - Context around a capture
- (LP)²: extraction of knowledge
  - Makes use of Natural Language Processing (NLP)
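
The slide lists redundancy and source rating as weak signals that are combined to judge whether a capture is valid. The sketch below is an assumption, not Armadillo's published scoring. It combines the ratings of all distinct sources that report a fact using a noisy-OR, so one highly trusted source, or several mediocre ones, can push a fact over the threshold. The source names, ratings, captures, and the 0.9 threshold are hypothetical. Context around a capture is not modeled.

```python
from collections import defaultdict

# Hypothetical source ratings in [0, 1]: how far each kind of site is trusted.
SOURCE_RATING = {
    "dept-homepage": 0.9,
    "digital-library": 0.8,
    "personal-blog": 0.3,
}

# Hypothetical captures: (candidate fact, source where it was found).
captures = [
    (("alice", "authorOf", "paper42"), "digital-library"),
    (("alice", "authorOf", "paper42"), "dept-homepage"),
    (("alice", "authorOf", "paper99"), "personal-blog"),
]


def combine_evidence(captures, ratings):
    """Score each candidate fact with a noisy-OR over the ratings of the
    distinct sources reporting it: 1 - product of (1 - rating)."""
    sources = defaultdict(set)
    for fact, source in captures:
        sources[fact].add(source)  # redundancy: count each source once
    scores = {}
    for fact, srcs in sources.items():
        p_all_wrong = 1.0
        for s in srcs:
            p_all_wrong *= 1.0 - ratings.get(s, 0.0)  # unknown sources add nothing
        scores[fact] = 1.0 - p_all_wrong
    return scores


def accept(scores, threshold=0.9):
    """Keep only facts whose combined evidence reaches the threshold."""
    return {fact for fact, score in scores.items() if score >= threshold}


scores = combine_evidence(captures, SOURCE_RATING)
print(accept(scores))  # {('alice', 'authorOf', 'paper42')}
```

Neither source alone confirms `paper42` at the 0.9 level with any margin, but together they score about 0.98. The single blog capture stays at 0.3 and is rejected.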

(LP)²
- Induce tagging rules (for inserting <tag> and </tag>)
  - Generalize using NLP and keep the best rules
  - Remove covered instances from the pool
  - High precision, low recall
- Contextual tagging
  - Recovers rules and constrains their application
- Correction and validation
  - Correction: shifts tags to the correct position (within d spaces)
  - Validation
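
The covering loop behind "keep the best rules, remove covered instances" can be sketched as follows. This is a simplified illustration under stated assumptions, not the (LP)² implementation. Candidate rules are taken as already generalized, so the NLP-based generalization step is not modeled. Each rule is described only by the training instances it covers and its error count. The rule names and the 0.9 precision cut-off are hypothetical. Requiring high precision is what leaves some instances uncovered, which is the low recall the slide mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str               # hypothetical label for an already-generalized rule
    covers: frozenset[int]  # ids of training tag instances the rule matches correctly
    errors: int             # wrong tag insertions on the training corpus

    def precision(self) -> float:
        hits = len(self.covers)
        return hits / (hits + self.errors) if hits + self.errors else 0.0


def select_best_rules(rules, positives, min_precision=0.9):
    """Sequential covering: repeatedly keep the accurate rule that matches the
    most still-uncovered instances, then remove those instances from the pool."""
    pool = set(positives)
    candidates = [r for r in rules if r.precision() >= min_precision]
    kept = []
    while pool:
        best = max(candidates, key=lambda r: len(r.covers & pool), default=None)
        if best is None or not best.covers & pool:
            break  # no accurate rule covers what is left
        kept.append(best)
        pool -= best.covers
        candidates.remove(best)
    return kept, pool  # pool holds the instances left uncovered (lost recall)


rules = [
    Rule("after-Dr", frozenset({1, 2, 3}), errors=0),
    Rule("capitalised-word", frozenset({1, 2, 3, 4, 5}), errors=4),
    Rule("before-PhD", frozenset({3, 4}), errors=0),
]
kept, uncovered = select_best_rules(rules, positives={1, 2, 3, 4, 5})
print([r.name for r in kept], sorted(uncovered))  # ['after-Dr', 'before-PhD'] [5]
```

Instance 5 is matched only by the inaccurate `capitalised-word` rule, so it stays in the pool. On the slide, contextual tagging is the stage where some rejected rules are recovered and applied under tighter constraints.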

Heterogeneity
- Armadillo
  - Uses weak NLP
  - Uses intra-document relation recognition
- Requirements
  - Must adapt to different document types
  - Relation extraction

Bootstrapping Learning
- Armadillo
  - Unsupervised approach: the user only validates
  - The user cannot drive the system towards interesting documents and facts
- Requirements
  - Identify triples
  - Goal: bootstrap learning on a large scale
  - The user needs a role in guiding learning
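
A bootstrapping loop that gives the user a role in each round, rather than only at the end, might look like the sketch below. Everything here is hypothetical and deliberately simplified. Patterns are just the tokens immediately before and after a known value. The learner extracts single values for one relation rather than full triples. The `validate` callback stands in for the user, who can reject wrong or uninteresting candidates so that they never feed the next round of learning.

```python
from typing import Callable

Sentence = list[str]


def learn_patterns(sentences: list[Sentence], known: set[str]) -> set[tuple[str, str]]:
    """Record the (previous token, next token) context around each known value."""
    patterns = set()
    for toks in sentences:
        for i in range(1, len(toks) - 1):
            if toks[i] in known:
                patterns.add((toks[i - 1], toks[i + 1]))
    return patterns


def apply_patterns(sentences: list[Sentence], patterns: set[tuple[str, str]]) -> set[str]:
    """Extract every token that sits inside a learned context."""
    found = set()
    for toks in sentences:
        for i in range(1, len(toks) - 1):
            if (toks[i - 1], toks[i + 1]) in patterns:
                found.add(toks[i])
    return found


def bootstrap(sentences, seeds, validate: Callable[[str], bool], max_rounds=5):
    known, rejected = set(seeds), set()
    for _ in range(max_rounds):
        patterns = learn_patterns(sentences, known)
        candidates = apply_patterns(sentences, patterns) - known - rejected
        accepted = {c for c in candidates if validate(c)}  # the user's hook
        rejected |= candidates - accepted  # never ask about these again
        if not accepted:
            break  # nothing new: learning has converged
        known |= accepted
    return known


sentences = [s.split() for s in [
    "prof Smith teaches AI",
    "prof Jones teaches ML",
    "prof emeritus teaches rarely",
    "dr Jones wrote papers",
    "dr Brown wrote books",
]]
user_rejects = {"emeritus"}  # simulated user judgement
result = bootstrap(sentences, {"Smith"}, validate=lambda c: c not in user_rejects)
print(sorted(result))  # ['Brown', 'Jones', 'Smith']
```

Starting from one seed, the loop learns the `prof _ teaches` context, proposes `Jones` and the noisy `emeritus`, and the user keeps only `Jones`. The new value then exposes the `dr _ wrote` context, which yields `Brown`. Without the user's veto, `emeritus` would have become a seed for later rounds.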

Content Cleaning and Normalization
- Armadillo
  - Noise is added during unsupervised (LP)² learning
  - Uses multiple pieces of weak evidence to help avoid poor seeds
- Requirements
  - Handle noisy training data
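
One way to read "use multiple weak evidence to avoid poor seeds" is as a voting filter applied before a capture is accepted as a training seed. The sketch assumes three hypothetical checks loosely modeled on the weak techniques listed for Armadillo: redundancy, source rating, and context. None is reliable alone, so a candidate becomes a seed only if at least `min_votes` of them agree. The field names, thresholds, and candidates are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    value: str
    n_sources: int      # how many distinct pages report it (redundancy)
    best_rating: float  # rating of the most trusted reporting source, in [0, 1]
    context_ok: bool    # surrounding text fits the expected pattern


# Hypothetical weak checks; each is unreliable on its own.
CHECKS = {
    "redundant": lambda c: c.n_sources >= 2,
    "trusted_source": lambda c: c.best_rating >= 0.7,
    "context": lambda c: c.context_ok,
}


def filter_seeds(candidates, min_votes=2):
    """Keep only candidates that at least `min_votes` independent weak
    checks agree on, so a single noisy capture cannot become a seed."""
    kept = []
    for c in candidates:
        votes = sum(check(c) for check in CHECKS.values())
        if votes >= min_votes:
            kept.append(c.value)
    return kept


candidates = [
    Candidate("Jones", n_sources=3, best_rating=0.9, context_ok=True),
    Candidate("Brown", n_sources=1, best_rating=0.8, context_ok=True),
    Candidate("Click here", n_sources=1, best_rating=0.2, context_ok=False),
]
print(filter_seeds(candidates))  # ['Jones', 'Brown']
```

`Brown` is seen on only one page, but a trusted source and a matching context outvote the missing redundancy. The page-navigation string fails every check and never reaches training. Raising `min_votes` trades seed coverage for cleaner training data.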

Conclusion
- Semantic Web
- Meta-data
- Armadillo: a tool for information extraction (IE)
  - Evidence building and validation
  - Extraction of knowledge: (LP)²
- A survey of requirements in mining web content for SW meta-data