Exploring Content with Wikipedia
![Page 1: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/1.jpg)
Exploring Content with Semantic Transformations
using Collaborative Knowledge Bases
Yegin Genc
Prof. Jeffrey V. Nickerson
![Page 2: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/2.jpg)
OBJECTIVE
Understanding text automatically to support search-driven exploratory activities.
![Page 3: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/3.jpg)
EXPLORATORY SEARCH
LOOKUP: Fact retrieval, Known item search, Navigation
LEARN: Knowledge acquisition, Comprehension/interpretation, Comparison
INVESTIGATE: Accretion, Analysis, Exclusion/Negation
Marchionini, G. (2006)
![Page 4: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/4.jpg)
EXPLORATORY SEARCH
ILL-STRUCTURED PROBLEM
• No single right approach
• Problem definitions change as new information is gathered
![Page 5: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/5.jpg)
Foreign minorities, Germany
![Page 6: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/6.jpg)
Text: “Foreign Minorities Germany”
![Page 7: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/7.jpg)
Exploratory Search Task
Given a journal abstract, rank other abstracts based on their relevancy to the seed abstract.
Evaluation is based on relevancy and diversity.
![Page 8: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/8.jpg)
D: Documents, K: Concepts, W: Words

Seed Document → Candidates: n-grams (1 to 3) → Concepts (candidates that match a Wikipedia page title and are connected through the ontology)

Tf-idf(D), Tf-idf(K)

$$\Theta_{(D \times K)} = D_{(D \times W)} \cdot K_{(W \times K)}$$

Θ: DOCUMENT–CONCEPT (D x K), D: DOCUMENT–WORD (D x W), K: CONCEPT–WORD (W x K)

Argsort(row.sum(Θ))
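The candidate step on this slide (n-grams of length 1 to 3, kept when they match a Wikipedia page title) can be sketched as follows. This is a minimal illustration, not the authors' code: the `titles` list is a hypothetical stand-in for a real title index, tokenization is a plain whitespace split, and the ontology-connection filter mentioned on the slide is left out.

```python
def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous n-grams (joined by spaces) for n_min <= n <= n_max."""
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def match_concepts(text, titles):
    """Keep the n-gram candidates that equal a known page title (case-insensitive)."""
    tokens = text.lower().split()
    title_set = {t.lower() for t in titles}
    concepts = []
    for candidate in ngrams(tokens):
        if candidate in title_set and candidate not in concepts:
            concepts.append(candidate)
    return concepts


# Hypothetical title index; a real run would use Wikipedia page titles.
titles = ["Representation independence", "Data abstraction", "Solar System"]
text = "Representation independence formally characterizes data abstraction"
concepts = match_concepts(text, titles)
print(concepts)
```

With these toy inputs only the two bigrams that appear in the title list survive as concepts.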
![Page 9: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/9.jpg)
EXTRACTING CONCEPT NETWORK
“Representation independence formally characterizes the encapsulation provided by language constructs for data abstraction and justifies reasoning by simulation. Representation independence has been shown for a variety of languages and constructs but not for shared references to mutable state; indeed it fails in general for such languages. This article formulates representation independence for classes, in an imperative, object-oriented language with pointers, subclassing and dynamic dispatch, class oriented visibility control, recursive types and methods, and a simple form of module. An instance of a class is considered to implement an abstraction using private fields and so-called representation objects. Encapsulation of representation objects is expressed by a restriction, called confinement, on aliasing. Representation independence is proved for programs satisfying the confinement condition. A static analysis is given for confinement that accepts common designs such as the observer and factory patterns. The formalization takes into account not only the usual interface between a client and a class that provides an abstraction but also the interface (often called "protected") between the class and its subclasses.”
![Page 10: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/10.jpg)
![Page 11: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/11.jpg)
WIKIPEDIA PAGES AS CONCEPTS
Solar System: “The Solar System consists of the Sun and the astronomical objects gravitationally bound in orbit around it, all of which formed from the collapse of a giant molecular cloud approximately 4.6 billion years ago…”
(http://en.wikipedia.org/wiki/Solar_System)
| Word Stem | Occ. | Freq. |
|-----------|------|-------|
| abstract  | 53   | 0.056 |
| program   | 44   | 0.046 |
| langu     | 33   | 0.035 |
| spec      | 16   | 0.017 |
| comput    | 12   | 0.013 |
| conceiv   | 12   | 0.013 |
| dat       | 12   | 0.013 |
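$$\beta_k = p(W_i \mid k) = \frac{\{W_i \in k\}}{\sum_{i}^{N} \{W_i \in k\}}$$

β_k: Per-concept word distribution (here {W_i ∈ k} is read as the number of occurrences of word W_i in the page for concept k)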
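As a rough sketch of the per-concept word distribution, the frequency of each stem can be computed as its occurrence count divided by the total count on the concept's page. The function name `word_distribution` and the toy stem list are illustrative assumptions; stemming and stop-word removal are assumed to have happened already.

```python
from collections import Counter


def word_distribution(stems):
    """Map each stem to count(stem) / total number of stems on the page."""
    counts = Counter(stems)
    total = sum(counts.values())
    return {stem: count / total for stem, count in counts.items()}


# Toy page content (already stemmed); not taken from a real Wikipedia page.
stems = ["abstract", "abstract", "program", "langu"]
beta = word_distribution(stems)
print(beta["abstract"])
```

By construction the values sum to 1, so each concept's row can be treated as a probability distribution over words.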
![Page 12: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/12.jpg)
RANKING DOCUMENTS
D: Documents, K: Concepts, W: Words

$$\Theta_{(D \times K)} = D_{(D \times W)} \cdot K_{(W \times K)}$$

Θ: DOCUMENT–CONCEPT (D x K), D: DOCUMENT–WORD (D x W), K: CONCEPT–WORD (W x K)
![Page 13: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/13.jpg)
SORT DOCUMENTS
D: Documents, K: Concepts, W: Words

$$\Theta_{(D \times K)} = D_{(D \times W)} \cdot K_{(W \times K)}$$

Θ: DOCUMENT–CONCEPT (D x K), D: DOCUMENT–WORD (D x W), K: CONCEPT–WORD (W x K)
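The document-concept product and the Argsort(row.sum(Θ)) step from the earlier overview slide can be sketched in plain Python. The matrices below are toy values, the helper names are hypothetical, and sorting in descending order of the row sum is an assumption about how the ranking is read.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank_documents(D, K):
    """Compute theta = D x K, then order documents by descending row sum."""
    theta = matmul(D, K)
    scores = [sum(row) for row in theta]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return theta, scores, order


# Toy weights: 3 documents x 3 words, and 3 words x 2 concepts.
D = [[1, 0, 2],
     [0, 1, 0],
     [3, 1, 1]]
K = [[0.5, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]

theta, scores, order = rank_documents(D, K)
print(scores)  # [2.5, 1.0, 3.5]
print(order)   # [2, 0, 1]
```

Each row of theta scores one document against every concept, so the row sum is a single relevance score per document.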
![Page 14: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/14.jpg)
EXPERIMENT
Given a journal abstract, rank other abstracts based on their relevancy to the seed abstract.
• Data: 619 abstracts of the Journal of the ACM (JACM) and their references.
• Task: Select Top-k (5, 10, 15, and 20) relevant abstracts.
• Observe: Relevancy (measured by LSA vector similarity) and Diversity (measured through the coverage of the references).
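The two observed quantities could be computed along the following lines. Both definitions are assumptions made for illustration: relevancy is taken as cosine similarity between two LSA vectors, and diversity as the number of distinct references cited by the selected abstracts. The slides do not spell out either formula, and the identifiers below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def reference_coverage(selected, references):
    """Count distinct references cited across the selected abstracts."""
    covered = set()
    for doc in selected:
        covered.update(references.get(doc, ()))
    return len(covered)


# Toy data with hypothetical abstract and reference identifiers.
references = {"a": {"r1", "r2"}, "b": {"r2", "r3"}, "c": {"r1"}}
print(reference_coverage(["a", "b"], references))  # 3
print(reference_coverage(["a", "c"], references))  # 2
```

Under this reading, a Top-k list whose abstracts share most of their references scores lower on diversity than one that spreads across different references.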
![Page 15: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/15.jpg)
MAXIMAL MARGINAL RELEVANCE
• a measure to increase the diversity of documents retrieved by an IR system
- Similarity to query: BM25 (Xapian [1])
- Similarity to results: LSA similarity (Gensim [2])
1. http://xapian.org
2. http://radimrehurek.com/gensim/
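A minimal sketch of greedy MMR selection follows. It assumes the two similarity signals are already computed (in the slides these come from BM25 and LSA; here they are toy numbers), and the trade-off weight `lam` is a hypothetical parameter whose value is not given in the slides.

```python
def mmr(query_sim, doc_sim, k, lam=0.5):
    """Greedily pick k documents, trading query similarity against redundancy.

    query_sim[i]  : similarity of document i to the query
    doc_sim[i][j] : similarity between documents i and j
    lam           : 1.0 means relevance only; lower values favor diversity
    """
    selected = []
    remaining = set(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy similarities: documents 0 and 1 are near-duplicates of each other.
query_sim = [0.9, 0.85, 0.3]
doc_sim = [[1.0, 0.95, 0.1],
           [0.95, 1.0, 0.2],
           [0.1, 0.2, 1.0]]

print(mmr(query_sim, doc_sim, k=2, lam=0.5))  # [0, 2]
print(mmr(query_sim, doc_sim, k=2, lam=1.0))  # [0, 1]
```

With `lam=0.5` the near-duplicate document 1 is skipped in favor of the less relevant but more novel document 2, which is the diversity effect MMR is meant to produce.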
![Page 16: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/16.jpg)
MMR RESULTS
![Page 17: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/17.jpg)
WIKI-BASED MODEL VS MMR
![Page 18: Exploring Content with Wikipedia](https://reader034.fdocuments.in/reader034/viewer/2022042701/559e9ee81a28abcd048b478d/html5/thumbnails/18.jpg)
CONCLUDING REMARKS
• Our Wiki-based technique provides high diversity with low relevancy loss.
• Semantics embedded in concept networks extracted from Wikipedia can improve exploratory search tasks.