Entity Search Engine
ENTITY SEARCH ENGINE: A NEW SEARCH TOOL
Speaker: Tanmay Mondal, MSLIS 2013-2015
Indian Statistical Institute, Bangalore, Documentation Research and Training Centre, Seminar (1) - 2014
Overview
Present Approach
Entity Search
Benefit of Entity Search
Entity & Its Facets
Main Work of ESE
Popular Entity Search
OKKAM-Enabling a Web of Entities
Workflow of Okkam
My Library
References
Present Approach
● Information is everywhere and is growing exponentially
● The traditional information extraction approach is to scan every document in a collection
● For a web search engine, the document collection is the set of all web pages it has indexed
● Getting pinpointed information this way is time-consuming for users
For specific information, the entity types of interest include: Person, Location, Organization, Nationality, Religion, Product, Phone Number, Email Address/URL, Distance, Date, Time, Money, Generic Number
Problem: identifying and linking/grouping different manifestations of the same real-world object
Web of Documents / Web of Entities
Cluster the records that correspond to the same entity
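The clustering step above can be sketched as follows. This is a minimal illustration, not the method of any engine named in this talk: it assumes that two records refer to the same entity when their name tokens overlap strongly (Jaccard similarity), and it merges matching records with a small union-find structure. The function names and the 0.5 threshold are assumptions made for the example.

```python
from itertools import combinations


def tokens(name):
    """Lowercase a name and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in name)
    return set(cleaned.split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_records(names, threshold=0.5):
    """Group names whose token overlap reaches the (assumed) threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two clusters

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


records = [
    "Indian Statistical Institute",
    "Indian Statistical Institute, Bangalore",
    "Wolfram Alpha",
]
print(cluster_records(records))
```

Token overlap alone misses variants such as "IBM" and "I.B.M.", which is one reason real systems combine several matching signals.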
Entity Search
● An entity is any object or thing that can be uniquely identified in the world
● Entity search is a better way to match search queries against a database containing hundreds of millions of "entities"
● Each entity is related to many other entities
● The answer entities carry specific information, and the right relationships among the entities must be identified
● Semantic or faceted search on entities
Why ?
● When people use retrieval systems, they are often not searching for documents or text passages
● Summarization of entities and concepts
● Named entities (persons, organizations, locations, products...) play a central role in answering such information needs
● At least 20-30% of the queries submitted to Web search engines are simply entities
● ~71% of Web search queries contain named entities
Source: "Building Taxonomy of Web Search Intents for Name Entity Queries" by Xiaoxin Yin & Sarthak Shah
Benefit of Entity Search
● Entities are often categorized into a taxonomy
● The user's primary task is often to make a decision
● More structured than document-based search
● An entity is associated with the same URI across different repositories
● Entity information integration
● More understandable by humans
● Increases precision and is less time-consuming
Entity & Its Facets
● An entity must be distinguishable from other entities. It can be anything, including an abstract thing such as a disease or imaginary art
● The type of an entity is the generic class into which the entity is classified
● An attribute is a property (predicate) associated with an entity
● A value is the value of an attribute (for a given entity)
● A relation provides more information by linking an entity with other entities
● Example entities: Prof. S.R. Ranganathan is a person, IBM is an organization
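The facets listed above (type, attribute, value, relation) map naturally onto a small record structure. The sketch below is only an illustration: the class name, the field names, and the `ex:` identifiers are hypothetical and are not taken from any of the search engines discussed in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    identifier: str   # hypothetical unique ID that distinguishes the entity
    name: str
    entity_type: str  # generic class, e.g. "person" or "organization"
    attributes: dict = field(default_factory=dict)  # attribute -> value
    relations: list = field(default_factory=list)   # (relation, target ID)

    def relate(self, relation, other):
        """Record a named relation from this entity to another entity."""
        self.relations.append((relation, other.identifier))


# Illustrative values only; the identifiers are made up for the example.
ranganathan = Entity("ex:ranganathan", "Prof. S.R. Ranganathan", "person",
                     attributes={"profession": "librarian"})
ibm = Entity("ex:ibm", "IBM", "organization")
drtc = Entity("ex:drtc", "Documentation Research and Training Centre",
              "organization", attributes={"city": "Bangalore"})
ranganathan.relate("founded", drtc)

print(ranganathan.entity_type, ranganathan.relations)
```

Storing relations as (relation, target identifier) pairs keeps each entity record small while still allowing navigation from one entity to another.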
Main Work of ESE
● Entity Retrieval: entity search engines can return a ranked list of the entities most relevant to a user query
● Entity Relationship / Fact Mining and Navigation: discovers interesting relationships and facts about the entities associated with the query
● Prominence Ranking: detects the popularity of an entity and enables users to browse entities in different categories
● Entity Description Retrieval: retrieves the entity description blocks for each entity; information about an object in a web page is generally grouped together as an object block
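Entity retrieval and prominence ranking can be illustrated together with a toy ranking function. This is a sketch under simple assumptions, not how any particular engine ranks: relevance is the number of query words found in an entity's name, type, and description, and ties are broken by a prominence score. The sample entities, their descriptions, and their prominence values are invented for the example.

```python
def rank_entities(query, entities):
    """Return entity names ordered by keyword overlap, then by prominence."""
    query_terms = set(query.lower().split())
    scored = []
    for entity in entities:
        text = " ".join(
            [entity["name"], entity["type"], entity["description"]]
        ).lower()
        overlap = len(query_terms & set(text.split()))
        if overlap:  # keep only entities that share a word with the query
            scored.append((overlap, entity["prominence"], entity["name"]))
    # Highest overlap first, then highest prominence, then name for stability.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [name for _, _, name in scored]


# Invented sample data; prominence values are placeholders.
sample_entities = [
    {"name": "IBM", "type": "organization",
     "description": "technology company", "prominence": 90},
    {"name": "Infosys", "type": "organization",
     "description": "technology company in Bangalore", "prominence": 60},
    {"name": "S.R. Ranganathan", "type": "person",
     "description": "librarian and mathematician", "prominence": 40},
]

print(rank_entities("technology company", sample_entities))
print(rank_entities("bangalore company", sample_entities))
```

A real engine would replace the word-overlap score with an information retrieval model and derive prominence from usage or link data.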
Popular Entity Search
● Product search: various products such as books, electronics, clothes, etc.
● People search: experts, friends, profiles of famous persons, etc.
● Location search: travel, addresses, businesses, government offices, etc.
Idea about entity search engine
Various ESE
● Freebase: http://www.freebase.com/
● Sindice: http://sindice.com/
● Geneview: http://bc3.informatik.hu-berlin.de/
● Okkam: http://www.okkam.org/
● WolframAlpha: http://www.wolframalpha.com/
● Yatedo: http://www.yatedo.com/
● GeoNames: http://www.geonames.org/
● Dbpedia: http://dbpedia.org/About
● EntityCube: http://entitycube.research.microsoft.com/
etc.
OKKAM-Enabling a Web of Entities
● Any collection of data and information about any type of entities
published on the Web can be integrated into a single virtual,
decentralized, open knowledge base.
● It leads to a faster, more efficient and more precise way to
deal with the flood of information available on the Web today
Entities should not be multiplied beyond necessity
OKKAM ENS
● OKKAM ENS is for entity search: its storage, indexing, and matching technology was built for finding an entity given its description
● Every entity (individual, instance, "thing") is assigned a global identifier, ideally unique
● An entity repository of more than 7.5 million entities in a more structured form
Entity identifiers should not be multiplied beyond necessity
Project Partners
● University of Trento, Italy (Co-Ordinator)
● L3S Research Center, Germany
● SAP Research, Germany
● Expert System, Italy
● Elsevier B.V., Netherlands
● Europe Unlimited SA, Belgium
● National Microelectronics Application Center (MAC), Ireland
● Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
● DERI Galway, Ireland
● University of Malaga, Spain
● INMARK, Spain
● Agenzia Nazionale Stampa Associata (ANSA), Italy
Sources Of Information
● Wikipedia: provides lists of countries, cities, and members of particular domains, which are very common in search queries
● GeoNames: contains over 10 million geographical names and consists of over 9 million unique features, including 2.8 million populated places, and 5.5 million alternate names
● OkkamDBManager: another important information source for OKKAM can be generic databases such as extranets, online shops, or publishing houses
● OkkamManualEntry: another way to insert new entities is manual entry
Data is extracted more effectively from any unstructured source
Cogito Semantic Technology
● Semantic analysis engine and complete semantic
network for a complete understanding of text
● Transforming unstructured information into structured
data
● Identifies the most relevant concepts
● Interprets the meaning of texts
● Precisely extracts information
● Automatically connects entities extracted from sources
Sensigrafo
● Enables the disambiguation of terms
● It allows Cogito to understand the meaning of words and context
● Extraction of data and metadata
● Used in product development, competitive intelligence, marketing, finance, media & publishing, oil & gas, life sciences & pharma, government, telecommunications, and many activities where knowledge sharing is critical
● More than 1 million concepts and more than 4 million relationships
Workflow of Okkam
● Storage: a scalable repository of entity profiles, in which billions of entities are assigned an ID and a profile to distinguish one entity from another
● Matching: requests from client applications arrive in the form of a bag of keywords or a collection of name-value pairs (unstructured or semi-structured queries)
● ID storage and management: stores, maintains, and makes available for reuse IDs (URIs) for anything that is named in a networked environment
● Lifecycle Management: takes care of the evolution of the repository and of all entity profiles over time
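The storage, matching, and ID reuse steps above can be mimicked with a toy in-memory class. This is a hypothetical sketch and not the Okkam API: the class name, the method names, the `urn:uuid:` identifiers, and the scoring rules are all assumptions chosen to keep the example small. It accepts both query forms mentioned above, a bag of keywords (a string) or a collection of name-value pairs (a dict).

```python
import uuid


class ToyEntityNameSystem:
    """Hypothetical in-memory stand-in for an entity name system."""

    def __init__(self):
        self.profiles = {}  # entity ID -> profile (dict of name-value pairs)

    def store(self, profile):
        """Storage: assign a fresh ID to a profile and keep it."""
        entity_id = "urn:uuid:" + str(uuid.uuid4())
        self.profiles[entity_id] = dict(profile)
        return entity_id

    def _score(self, profile, query):
        if isinstance(query, dict):
            # Name-value pairs: count pairs that agree (case-insensitive).
            return sum(
                1 for key, value in query.items()
                if key in profile
                and str(profile[key]).lower() == str(value).lower()
            )
        # Bag of keywords: count keywords found among the profile values.
        words = set(" ".join(str(v) for v in profile.values()).lower().split())
        return len(set(query.lower().split()) & words)

    def match(self, query):
        """Matching: return the ID of the best-scoring profile, or None."""
        best_id, best_score = None, 0
        for entity_id, profile in self.profiles.items():
            score = self._score(profile, query)
            if score > best_score:
                best_id, best_score = entity_id, score
        return best_id

    def get_or_create(self, profile):
        """ID reuse: return an existing ID when every pair agrees."""
        for entity_id, stored in self.profiles.items():
            if profile and self._score(stored, profile) == len(profile):
                return entity_id
        return self.store(profile)


ens = ToyEntityNameSystem()
isi_id = ens.store({"name": "Indian Statistical Institute",
                    "type": "organization", "city": "Bangalore"})
print(ens.match("statistical institute bangalore") == isi_id)
```

Reusing an existing ID instead of minting a new one is what keeps identifiers from being multiplied beyond necessity.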
Entity Query & Matching in Okkam
ISI
Wolfram|Alpha
● Wolfram|Alpha is an engine for computing answers and
providing knowledge
● It generates output by doing computations from its own
internal knowledge base, instead of searching the
web and returning links
● It is an online service that answers factual queries
directly by computing the answer
● Make all systematic knowledge immediately computable
and accessible to everyone
Example queries: "5 nearest stars"; "How many newspapers are available in the globe"
Overall Difficulties
● The number of entities could be huge
● Information Redundancy
● Information Fragmentation
● Entity Information Integration
● A single algorithm for fine-grained entity matching may not exist
● Store and retrieve using IR based techniques
● Matching on very large datasets
● Natural Language Processing
Contd...
● Few knowledge bases are available
● Multi-domain entities
● Deduplication problem
● Some names and relationships could be incorrect, and the information may not be up to date
● Name disambiguation is still largely unsolved
● ESEs are at an early stage
Creating knowledge bases from text and unstructured data is the goal
My Library
● Entities are for Use
● Each Entity has its own attributes & relations
● Every Entity has its importance
● Save the time of finding Entities
● Entities are growing rapidly
References
1. Statistical Entity Extraction from Web, by Zaiqing Nie, Ji-Rong Wen, and Wei-Ying Ma, Fellow, IEEE
2. State of the art in IE, overview, comparison and analysis, by Stefan Dumitrescu, PhD Student
3. The Entity Name System: Enabling the Web of Entities, by Heiko Stoermer, Themis Palpanas, George Giannakopoulos, University of Trento
4. Hybrid entity clustering using crowds and data, by Jongwuk Lee, Hyunsouk Cho, Jin-Woo Park, Young-rok Cha, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen
5. Supporting Entity Search: A Large-Scale Prototype Search Engine, by Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang
6. OKKAM: Enabling a Web of Entities, by Paolo Bouquet, Heiko Stoermer, Daniel Giacomuzzi, University of Trento
7. Entity Data Management in OKKAM, by Themis Palpanas (1), Junaid Chaudhry (2), Periklis Andritsos (1), Yannis Velegrakis (1); (1) University of Trento, (2) Ajou University
8. Space and Time Entity Repository, Human-enhanced time-aware multimedia search, funded by EU07. See: http://issuu.com/cubrikproject/docs/issuu.cubrik.d41.unitn.wp4.v1.0
9. http://api.okkam.org/search/
10. http://www.wolframalpha.com/