Mining the Semantic Web: Requirements for Machine Learning
Fabio Ciravegna, Sam Chapman
Presented by Steve Hookway, 10/20/05

What is the Semantic Web
  A way to automate reasoning with web data
  RDF
    A uniform way to describe resources (subject, predicate, object)
  Ontology
    Hierarchical structure of data
    Property restrictions
    Implicit typing
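
To make the triple model and implicit typing concrete, here is a minimal sketch in plain Python. The `ex:` names are invented for illustration and are not from the slides; `rdfs:domain` and `rdfs:subClassOf` are used with their standard RDF Schema meaning, and no RDF library is assumed.

```python
# Each statement is a (subject, predicate, object) triple.
# All "ex:" resources are hypothetical examples.
triples = {
    ("ex:Smith", "ex:authorOf", "ex:Paper1"),
    ("ex:Jones", "ex:authorOf", "ex:Paper1"),
    ("ex:Researcher", "rdfs:subClassOf", "ex:Person"),
    ("ex:authorOf", "rdfs:domain", "ex:Researcher"),
}


def infer_types(triples):
    """Derive rdf:type statements that are only implicit in the data."""
    # Property restriction: the subject of property p has type domain(p).
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    inferred = {(s, "rdf:type", domains[p]) for s, p, o in triples if p in domains}

    # Hierarchy: an instance of a class is also an instance of its superclasses.
    subclass = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    changed = True
    while changed:
        changed = False
        for s, _, cls in list(inferred):
            for sub, sup in subclass:
                if cls == sub and (s, "rdf:type", sup) not in inferred:
                    inferred.add((s, "rdf:type", sup))
                    changed = True
    return inferred


for triple in sorted(infer_types(triples)):
    print(triple)
```

Nothing in the data says that ex:Smith is a person; the type follows from the property restriction and the class hierarchy alone.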

Adding Meta-Data
  A prerequisite for Semantic Web (SW) is structured knowledge
  Manual Approach
    Too much data
    Trust issues
    Noise
  This process needs to be automated

Armadillo
  Automatically annotate web pages
  Validity based on a number of weak techniques
    Redundant information
    Rating of sources
    Context around a capture
  (LP)² - Extraction of knowledge
    Makes use of Natural Language Processing (NLP)
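
One way to picture how several weak signals can add up to a usable confidence is the sketch below. The noisy-or combination, the function name `combine_evidence`, the source names, and the rating values are all assumptions made for illustration; the slides do not say how Armadillo actually combines its evidence.

```python
from collections import defaultdict


def combine_evidence(observations, source_rating, default_rating=0.1):
    """Score each fact from the sources that report it.

    observations: iterable of (fact, source) pairs.
    source_rating: source -> estimated reliability in [0, 1].
    Uses a noisy-or: a fact is doubted only if every source reporting it
    is wrong. This is an illustrative choice, not Armadillo's formula.
    """
    disbelief = defaultdict(lambda: 1.0)
    # A set removes repeats, so one source repeating itself adds nothing.
    for fact, source in set(observations):
        rating = source_rating.get(source, default_rating)
        disbelief[fact] *= 1.0 - rating
    return {fact: 1.0 - d for fact, d in disbelief.items()}


# Hypothetical data: two mediocre sources agree on one fact,
# while a single weaker source reports another.
observations = [
    ("Smith works at Sheffield", "site-a"),
    ("Smith works at Sheffield", "site-b"),
    ("Smith works at Leeds", "site-c"),
]
ratings = {"site-a": 0.5, "site-b": 0.5, "site-c": 0.3}
scores = combine_evidence(observations, ratings)
```

Redundancy raises the first fact to 0.75 even though neither source is trusted above 0.5 on its own.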

(LP)²
  Induce tagging rules (each rule inserts a single tag, e.g. <tag> or </tag>)
    Generalize rules using NLP knowledge and keep the best rules
    Remove covered instances from pool
    High precision, low recall
  Contextual Tagging
    Recovers rules and constrains their application
  Correction and Validation
    Correction: shifts tags to correct position (within d spaces)
    Validation
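
The induction step can be pictured as a covering loop. The sketch below is a heavily simplified illustration, not the (LP)² implementation: a context is assumed to be just the (previous word, next word) pair around a possible tag position, generalization only replaces a word with a wildcard, and the names `induce_rules`, `generalizations` and `min_precision` are hypothetical.

```python
from itertools import product


def matches(rule, context):
    """A rule matches when every non-wildcard condition equals the context."""
    return all(r is None or r == c for r, c in zip(rule, context))


def generalizations(context):
    """All rules obtained by replacing words with wildcards (None),
    excluding the rule that matches everything."""
    options = [(word, None) for word in context]
    return [r for r in product(*options) if any(x is not None for x in r)]


def induce_rules(examples, min_precision=0.9):
    """examples: list of (context, is_tag_position). Returns the kept rules."""
    pool = [ctx for ctx, is_tag in examples if is_tag]
    rules = []
    while pool:
        seed = pool[0]
        best, best_cover = None, 0
        for rule in generalizations(seed):
            matched = [is_tag for ctx, is_tag in examples if matches(rule, ctx)]
            precision = sum(matched) / len(matched)
            cover = sum(matches(rule, ctx) for ctx in pool)
            if precision >= min_precision and cover > best_cover:
                best, best_cover = rule, cover
        if best is None:
            pool.pop(0)  # no reliable rule for this seed: recall is sacrificed
        else:
            rules.append(best)
            # Remove covered instances so later rules target what is left.
            pool = [ctx for ctx in pool if not matches(best, ctx)]
    return rules


# Toy training data: a tag belongs between "by" and a name.
examples = [
    (("by", "Smith"), True),
    (("by", "Jones"), True),
    (("by", "Brown"), True),
    (("of", "the"), False),
    (("the", "Web"), False),
]
rules = induce_rules(examples)
```

On this toy data a single generalized rule, ("by", None), covers all three positive instances, so the pool empties after one pass.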

Heterogeneity
  Armadillo
    Uses weak NLP
    Uses intra-document relation recognition
  Requirements
    Must adapt to different document types
    Relation Extraction

Bootstrapping Learning
  Armadillo
    Unsupervised approach: user only validates
    User cannot drive system towards interesting documents and facts
  Requirements
    Identify triples
    Goal: Bootstrap learning on a large scale
    User needs a role to guide learning
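
A bootstrapping loop with the user in the validating role might look like the sketch below. It is a toy under stated assumptions: an entity is a single token, the only learned context is the word immediately before a known entity, and `bootstrap`, `validate` and the sample sentences are invented for illustration.

```python
def bootstrap(sentences, seeds, validate, rounds=3):
    """Grow a set of known entities from a few seeds.

    sentences: list of token lists.
    validate: callback standing in for the user; returns True to accept.
    """
    known = set(seeds)
    for _ in range(rounds):
        # 1. Learn contexts: words that appear right before a known entity.
        contexts = set()
        for tokens in sentences:
            for i in range(1, len(tokens)):
                if tokens[i] in known:
                    contexts.add(tokens[i - 1])

        # 2. Propose new candidates found in the same contexts.
        candidates = set()
        for tokens in sentences:
            for i in range(1, len(tokens)):
                if tokens[i - 1] in contexts and tokens[i] not in known:
                    candidates.add(tokens[i])

        # 3. The user only validates; accepted items seed the next round.
        accepted = {c for c in candidates if validate(c)}
        if not accepted:
            break
        known |= accepted
    return known


sentences = [
    "Prof Smith teaches NLP".split(),
    "Prof Jones teaches IR".split(),
    "Dr Jones wrote code".split(),
    "Dr Brown presented slides".split(),
]
# A fixed whitelist stands in for a human validator.
people = bootstrap(sentences, {"Smith"}, lambda c: c in {"Jones", "Brown"})
```

"Brown" is only reachable in the second round, after "Jones" has introduced the new context "Dr". The callback can reject candidates but cannot point the system at new documents, which is the limitation the requirements on this slide address.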

Content Cleaning and Normalization
  Armadillo
    Noise added during unsupervised (LP)²
    Use the multiple weak evidence to help avoid poor seeds
  Requirements
    Handle noisy training data

Conclusion
  Semantic Web
    Meta-Data
  Armadillo: a tool for IE
    Evidence Building and Validation
    Extraction of knowledge (LP)²
  A survey of requirements in mining web content for SW meta-data