Toward Semantic Search: RDFa based facet browser Jin Guang Zheng Tetherless World Constellation.
Toward Semantic Search: RDFa based facet browser
Jin Guang Zheng
Tetherless World Constellation
Introduction
The current state of the art in search:
– Keyword-based search mechanism
  • Easy to use, low learning curve
  • Uses statistical analysis, machine learning, and natural language processing technologies to improve search results
Problem:
– Limited conceptual-level understanding of both queries and documents
  • “Jaguar”: the car vs. the animal
  • “Understands” a document based on its most frequent keywords
– Lack of inference:
  • ISWC and sub-events
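The "lack of inference" point can be illustrated with a small sketch. It assumes a hypothetical sub-event relation and hypothetical document annotations (none of these names come from the slides): a keyword match on "ISWC" misses a document about a nested sub-event, while following the sub-event relation transitively finds it.

```python
# Hypothetical sub-event relation: child event -> parent event.
SUB_EVENT_OF = {
    "ISWC Workshop A": "ISWC",
    "ISWC Tutorial B": "ISWC",
    "Workshop A Poster Session": "ISWC Workshop A",
}

# Hypothetical documents, each annotated with the single event it describes.
DOC_EVENT = {
    "doc1": "ISWC",
    "doc2": "ISWC Workshop A",
    "doc3": "Workshop A Poster Session",
    "doc4": "Some Other Conference",
}


def is_part_of(event, ancestor):
    """Return True if event equals ancestor or is a transitive sub-event of it."""
    seen = set()  # guards against cycles in the relation
    while event is not None and event not in seen:
        if event == ancestor:
            return True
        seen.add(event)
        event = SUB_EVENT_OF.get(event)
    return False


def search_keyword(query):
    """Plain keyword match on the event label, with no inference."""
    return sorted(d for d, e in DOC_EVENT.items() if query.lower() in e.lower())


def search_with_inference(query_event):
    """Return documents about the queried event or any of its sub-events."""
    return sorted(d for d, e in DOC_EVENT.items() if is_part_of(e, query_event))
```

Here `search_keyword("ISWC")` cannot reach `doc3`, because its label never mentions the parent event; `search_with_inference("ISWC")` reaches it through two sub-event links.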
Research Question
Problem 1: Conceptual-level understanding of queries and documents.
How can we use Semantic Web technologies to improve search results by helping a search engine “understand” both the user's search intent and the content of each document?
Challenges
Understand the user's search intent:
Trade-off:
– Usability
– More semantics (structured query)
Need to find the point where usability and semantics can both be satisfied.
Challenges
1. Unstructured documents: Most documents are unstructured text encoded in HTML. It is hard to run a structured query against unstructured data, so documents need structured data embedded in or attached to them.
2. Performing structured queries against documents that carry structured data.
Approach: User Side
Facet browsing:
– Constructs the structured query
– Helps the user filter and navigate the search results
Example:
– Car
– Animal
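A minimal sketch of how facet browsing could turn clicks into a structured query. The result records, the `type` and `color` fields, and both function names are assumptions made for illustration; the slides do not specify a data model.

```python
from collections import Counter

# Hypothetical results for the keyword "jaguar"; the "type" and "color"
# fields stand in for structured metadata attached to each document.
RESULTS = [
    {"id": "d1", "title": "Jaguar sedan review", "type": "Car", "color": "red"},
    {"id": "d2", "title": "Jaguar habitat", "type": "Animal", "color": "spotted"},
    {"id": "d3", "title": "Jaguar coupe pricing", "type": "Car", "color": "blue"},
]


def facet_counts(results, facet):
    """Count how many results fall under each value of one facet."""
    return Counter(r[facet] for r in results if facet in r)


def apply_facets(results, selected):
    """Keep results that match every selected facet value (a conjunctive query)."""
    return [r for r in results if all(r.get(f) == v for f, v in selected.items())]
```

The counts from `facet_counts(RESULTS, "type")` would be shown to the user as the Car and Animal choices; selecting Car corresponds to `apply_facets(RESULTS, {"type": "Car"})`, so the user builds the structured query without writing one.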
Approach: Document Side
RDFa or another metadata format:
– Embed structured metadata in the document
– Index the RDFa data: “understand” the document based on the structured data
Example:
<div about="#Jaguar" typeof="_:Car">.....</div>
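To make the "index RDFa data" idea concrete, here is a rough sketch that pulls `about`/`typeof` pairs out of HTML with Python's standard `html.parser` and builds an inverted index from type to documents. It is deliberately not a conforming RDFa parser (it ignores prefixes, `property`/`rel` attributes, and subject inheritance), and the document names and contents are made up for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AboutTypeofExtractor(HTMLParser):
    """Collect (about, typeof) pairs from elements that carry both attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("about") and attr.get("typeof"):
            # typeof may list several whitespace-separated types
            for rdf_type in attr["typeof"].split():
                self.pairs.append((attr["about"], rdf_type))


def build_type_index(docs):
    """Map each typeof value to the set of document ids that declare it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        parser = AboutTypeofExtractor()
        parser.feed(html)
        for _about, rdf_type in parser.pairs:
            index[rdf_type].add(doc_id)
    return index


# Hypothetical documents: both mention "Jaguar" but declare different types.
DOCS = {
    "car.html": '<div about="#Jaguar" typeof="_:Car">A fast car.</div>',
    "zoo.html": '<div about="#Jaguar" typeof="_:Animal">A big cat.</div>',
}
index = build_type_index(DOCS)
```

With this index, a query restricted to the Car type looks up `index["_:Car"]` and never sees the page about the animal, even though both pages share the keyword.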
Research Plan
Timeline & Tasks
Research on:
1. RDFa Parsing
– How do current parsers work? Do they parse RDFa correctly? How long does parsing take?
– 2 weeks: Collect parsers and test data, run the tests on the parsers, and collect the results
2. Analyze Existing RDFa Data
– How much data is there? Which vocabularies are used?
– 3 weeks: Crawl RDFa data and analyze the vocabularies
3. RDFa Indexing
– How should RDFa data be indexed so that documents can be retrieved through it?
– 4 weeks: Develop an indexing algorithm and test it
4. Facet Generation
– Which vocabularies? How many facets?
– 2 weeks: Perform analysis on vocabularies and documents
5. Facet Ranking
– Which facets really help the user?
– 3 weeks: Develop a ranking algorithm and test it
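The slides leave the facet ranking algorithm open. As one possible starting point (an assumption, not the author's method), facets could be ranked by the Shannon entropy of their value distribution over the current results, on the reasoning that a facet which splits the results evenly narrows them more than one where every result shares the same value. The field names below are hypothetical.

```python
import math
from collections import Counter


def facet_entropy(results, facet):
    """Shannon entropy in bits of one facet's value distribution over the results."""
    values = [r[facet] for r in results if facet in r]
    if not values:
        return 0.0
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def rank_facets(results, facets):
    """Order facets from most to least discriminating under the entropy heuristic."""
    return sorted(facets, key=lambda f: facet_entropy(results, f), reverse=True)


# Hypothetical result set: "type" splits it in half, "language" does not split it.
RESULTS = [
    {"type": "Car", "language": "en"},
    {"type": "Animal", "language": "en"},
    {"type": "Car", "language": "en"},
    {"type": "Animal", "language": "en"},
]
```

Under this heuristic `rank_facets(RESULTS, ["language", "type"])` places `type` first, since offering `language` would not reduce the result set at all. Whether entropy matches what users actually find helpful is exactly what the planned testing would need to check.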
Questions
Thank you!