Entity Search Engine

ENTITY SEARCH ENGINE: A NEW SEARCH TOOL

Speaker: Tanmay Mondal, MSLIS 2013-2015

Indian Statistical Institute, Bangalore

Documentation Research and Training Centre

Seminar (1) - 2014

description

It is often observed that when people use retrieval systems, they are not primarily searching for documents or text passages but for information contained in them that relates to entities such as persons, organizations, locations, events, and times. The goal is to find various kinds of valuable semantic information about real-world entities embedded in different web pages and databases. With present search engines, however, it is difficult to find specific or exact information about entities. We therefore need search engines that interpret queries across different domains and extract structured information about entities.

Transcript of Entity Search Engine

Page 2: Entity Search Engine

Overview

Present Approach

Entity Search

Benefit of Entity Search

Entity & Its Facets

Main Work of ESE

Popular Entity Search

OKKAM-Enabling a Web of Entities

Workflow of Okkam

My Library

References

Page 3: Entity Search Engine

Present Approach

● Information is everywhere and is growing exponentially

● A traditional information extraction approach is to scan every document in a collection

● Here the document collection is the set of all web pages indexed by a search engine

● Getting pinpointed information this way is time-consuming for users

Page 4: Entity Search Engine

Entity types: Person, Location, Organization, Nationality, Religion, Product, Phone Number, Email Address/URL, Distance, Date, Time, Money, Generic Number

For specific information

Problem: identifying and linking / grouping different manifestations of the same real-world object

Page 5: Entity Search Engine

Web of Documents and Web of Entities

Cluster the records that correspond to the same entity
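
As a rough illustration of clustering records that correspond to the same entity, the sketch below groups records by a normalized name. The record fields, the sample records, and the normalization rule are all assumptions made for this example; real entity resolution needs far more robust matching than exact comparison of cleaned names.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase a name and drop punctuation so trivial variants collide."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def cluster_records(records):
    """Group records whose normalized 'name' field is identical."""
    clusters = defaultdict(list)
    for record in records:
        clusters[normalize(record["name"])].append(record)
    return dict(clusters)


# Hypothetical records, as if found on different web pages.
records = [
    {"name": "I.B.M.", "source": "page-a"},
    {"name": "IBM", "source": "page-b"},
    {"name": "Indian Statistical Institute", "source": "page-c"},
]
clusters = cluster_records(records)
```

Here the two spellings of IBM fall into one cluster, while the third record forms a cluster of its own.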

Page 6: Entity Search Engine

Entity Search

● An entity is any object or thing that can be uniquely identified in the world

● Entity search better matches search queries against a database containing hundreds of millions of "entities"

● Each entity is related to many other entities

● The answer entities carry specific information, and the right relationships among the entities are identified

● Semantic or faceted search on entities

Page 7: Entity Search Engine

Why ?

● When people use retrieval systems, they are often not searching for documents or text passages

● Summarization of entities and concepts

● Named entities (persons, organizations, locations, products...) play a central role in answering such information needs

● At least 20-30% of the queries submitted to Web search engines are simply entities

● ~71% of Web search queries contain named entities

Source: "Building Taxonomy of Web Search Intents for Name Entity Queries" by Xiaoxin Yin & Sarthak Shah

Page 8: Entity Search Engine

Benefit of Entity Search

● Entities are often categorized into a taxonomy

● Primary task of the user is often to make a decision

● More structured than document-based search

● An entity is associated with the same URI across different repositories

● Entity information integration

● More understandable by humans

● Increases precision and is less time-consuming

Page 9: Entity Search Engine

Entity & Its Facets

● An entity must be distinguishable from other entities; it can be anything, including an abstract thing such as a disease or imaginary art

● Type of an entity refers to a generic class into which the given entity is classified

● Attribute refers to a property (predicate) associated with an entity

● Value refers to the value of an attribute (for a given entity)

● Relation provides more information by linking an entity with many other entities

● Example: Prof. S.R. Ranganathan is a person; IBM is an organization
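
The facets above (type, attribute, value, relation) can be sketched as a small data structure. This is only an illustrative model: the class name `Entity`, its field names, and the sample attribute values are assumptions for this example, not part of any particular entity search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Illustrative entity record with the facets named on this slide."""
    name: str
    entity_type: str                                 # Type: generic class of the entity
    attributes: dict = field(default_factory=dict)   # Attribute -> Value
    relations: list = field(default_factory=list)    # (relation name, other entity name)

    def value(self, attribute):
        """Return the value of an attribute, or None if it is not recorded."""
        return self.attributes.get(attribute)

    def relate(self, relation, other):
        """Record a named relation from this entity to another entity."""
        self.relations.append((relation, other.name))


# Sample entities; the attribute values are illustrative.
drtc = Entity("Documentation Research and Training Centre", "organization")
ranganathan = Entity("Prof. S.R. Ranganathan", "person", {"profession": "librarian"})
ibm = Entity("IBM", "organization", {"industry": "information technology"})
ranganathan.relate("founded", drtc)
```

Keeping relations as plain (relation, name) pairs is a simplification; a fuller model would point at the identifiers of the related entities.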

Page 10: Entity Search Engine

Main Work of ESE

● Entity Retrieval: Entity search engines can return a ranked list of the entities most relevant to a user query

● Entity Relationship / Fact Mining and Navigation: Discovers interesting relationships / facts about the entities associated with users' queries

● Prominence Ranking: Detects the popularity of an entity and enables users to browse entities in different categories

● Entity Description Retrieval: Retrieves description blocks for each entity; information about an object in a web page is generally grouped together as an object block
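
To make the entity retrieval task concrete, here is a minimal sketch that returns a ranked list of entities for a keyword query. It scores each entity by the number of query terms found in its description. The scoring rule, the function names, and the sample descriptions are assumptions for illustration only; production engines use far richer ranking models.

```python
def score(query: str, description: str) -> int:
    """Count the distinct query terms that also occur in the description."""
    query_terms = set(query.lower().split())
    description_terms = set(description.lower().split())
    return len(query_terms & description_terms)


def retrieve_entities(query, entities, top_k=3):
    """Return up to top_k entity names, best match first, skipping zero scores."""
    scored = [(score(query, text), name) for name, text in entities.items()]
    matches = [(s, name) for s, name in scored if s > 0]
    # Sort by descending score, then by name so ties are deterministic.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches[:top_k]]


# Hypothetical entity descriptions.
entities = {
    "IBM": "organization technology company computers",
    "Prof. S.R. Ranganathan": "person librarian library science",
    "Bangalore": "location city india",
}
ranked = retrieve_entities("technology company in india", entities)
```

With these sample descriptions, IBM matches two query terms and Bangalore one, so IBM is ranked first.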

Page 11: Entity Search Engine

Popular Entity Search

● Product search-Various Products like Books, Electronics, Clothes, etc.

● People search-Experts, Friends, Profile of famous persons, etc.

● Location search-Travel, Address ,Business, Govt Offices, etc.

Page 12: Entity Search Engine
Page 13: Entity Search Engine

Idea about entity search engine

Page 15: Entity Search Engine

Various ESE

● Freebase: http://www.freebase.com/

● Sindice: http://sindice.com/

● Geneview: http://bc3.informatik.hu-berlin.de/

● Okkam: http://www.okkam.org/

● WolframAlpha: http://www.wolframalpha.com/

● Yatedo: http://www.yatedo.com/

● GeoNames: http://www.geonames.org/

● DBpedia: http://dbpedia.org/About

● EntityCube: http://entitycube.research.microsoft.com/

etc.

Page 16: Entity Search Engine

OKKAM-Enabling a Web of Entities

● Any collection of data and information about any type of entity published on the Web can be integrated into a single virtual, decentralized, open knowledge base

● It leads to a faster, more efficient and more precise way to deal with the flood of information available on the Web today

Entities should not be multiplied beyond necessity

Page 17: Entity Search Engine

OKKAM ENS

● OKKAM ENS is for entity search: its storage, indexing and matching technology was built for finding an entity given its description

● Every entity (individual, instance, “thing”) is assigned a global identifier, ideally unique

● A repository of more than 7.5 million entities in a more structured form

Entity identifiers should not be multiplied beyond necessity
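
The idea of giving each entity one reusable global identifier can be sketched as a toy registry. This is not the OKKAM ENS API: the class `EntityNameRegistry`, its method names, and the `http://example.org/entity/` base URI are hypothetical, and matching descriptions by exact (case-insensitive) equality is a deliberate simplification.

```python
import uuid


class EntityNameRegistry:
    """Toy registry: one identifier per distinct entity description."""

    def __init__(self, base_uri="http://example.org/entity/"):
        self._base_uri = base_uri   # hypothetical namespace for minted IDs
        self._ids = {}

    @staticmethod
    def _key(description):
        # Order-independent, case-insensitive key built from name-value pairs.
        return frozenset((k.lower(), str(v).lower()) for k, v in description.items())

    def identifier_for(self, description):
        """Reuse the existing identifier for a description, or mint a new one."""
        key = self._key(description)
        if key not in self._ids:
            self._ids[key] = self._base_uri + uuid.uuid4().hex
        return self._ids[key]


registry = EntityNameRegistry()
first = registry.identifier_for({"name": "IBM", "type": "organization"})
second = registry.identifier_for({"type": "Organization", "name": "ibm"})
other = registry.identifier_for({"name": "Bangalore", "type": "location"})
```

Because the second description differs from the first only in letter case and key order, it receives the same identifier, so the identifier is reused rather than multiplied.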

Page 18: Entity Search Engine

Project Partners

● University of Trento, Italy (Coordinator)

● L3S Research Center, Germany

● SAP Research, Germany

● Expert System, Italy

● Elsevier B.V., Netherlands

● Europe Unlimited SA, Belgium

● National Microelectronics Application Center (MAC), Ireland

● Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland

● DERI Galway, Ireland

● University of Malaga, Spain

● INMARK, Spain

● Agenzia Nazionale Stampa Associata (ANSA), Italy

Page 19: Entity Search Engine
Page 20: Entity Search Engine

Sources Of Information

● Wikipedia: Provides lists of countries, cities, and members of particular domains, which are very common in search queries

● GeoNames: Contains over 10 million geographical names and consists of over 9 million unique features, including 2.8 million populated places and 5.5 million alternate names

● OkkamDBManager: Another important information source for OKKAM can be generic databases such as extranets, online shops or publishing houses

● OkkamManualEntry: Another way to insert new entities is manual entry

Page 21: Entity Search Engine

Data is extracted more effectively from any unstructured source

Page 22: Entity Search Engine

Cogito Semantic Technology

● Semantic analysis engine and complete semantic network for a complete understanding of text

● Transforms unstructured information into structured data

● Identifies the most relevant concepts

● Interprets the meaning of texts

● Precisely extracts information

● Automatically connects entities extracted from sources

Page 23: Entity Search Engine

Sensigrafo

● Enables the disambiguation of terms

● It allows Cogito to understand the meaning of words and context

● Extraction of data and metadata

● Used in product development, competitive intelligence, marketing, finance, media & publishing, oil & gas, life sciences & pharma, government, telecommunications, and many other activities where knowledge sharing is critical

● More than 1 million concepts and more than 4 million relationships

Page 24: Entity Search Engine
Page 25: Entity Search Engine
Page 26: Entity Search Engine

Workflow of Okkam

● Storage: A scalable repository of entity profiles, in which billions of entities are assigned an ID and a profile, to distinguish one entity from another

● Matching: Requests from client applications arrive in the form of a bag of keywords or a collection of name-value pairs (unstructured or semi-structured queries)

● ID storage and management: Stores, maintains and makes available for reuse IDs (URIs) for anything that is named in a networked environment

● Lifecycle Management: Takes care of the evolution of the repository and of all entity profiles over time
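
The matching step accepts two query shapes: a bag of keywords, or a collection of name-value pairs. A minimal sketch of how both could be scored against stored profiles follows. The scoring rules, the function names, and the sample repository are assumptions for illustration and do not reproduce Okkam's actual matching algorithm.

```python
def profile_tokens(profile):
    """Collect the lowercase words that appear in a profile's values."""
    tokens = set()
    for value in profile.values():
        tokens.update(str(value).lower().split())
    return tokens


def match_keywords(keywords, profile):
    """Unstructured query: fraction of keywords found among the profile's words."""
    if not keywords:
        return 0.0
    tokens = profile_tokens(profile)
    hits = sum(1 for keyword in keywords if keyword.lower() in tokens)
    return hits / len(keywords)


def match_pairs(pairs, profile):
    """Semi-structured query: fraction of name-value pairs the profile agrees with."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for name, value in pairs.items()
        if str(profile.get(name, "")).lower() == str(value).lower()
    )
    return hits / len(pairs)


def best_match(query, repository):
    """Return the ID of the best-scoring profile, or None if nothing matches."""
    scorer = match_pairs if isinstance(query, dict) else match_keywords
    best_id, best_score = None, 0.0
    for entity_id, profile in repository.items():
        current = scorer(query, profile)
        if current > best_score:
            best_id, best_score = entity_id, current
    return best_id


# Hypothetical repository mapping entity IDs to profiles.
repository = {
    "id-1": {"name": "IBM", "type": "organization"},
    "id-2": {"name": "Bangalore", "type": "location", "country": "India"},
}
```

For example, `best_match(["bangalore", "india"], repository)` selects `id-2`, and `best_match({"name": "ibm", "type": "organization"}, repository)` selects `id-1`.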

Page 27: Entity Search Engine

Entity Query & Matching in Okkam

Page 28: Entity Search Engine
Page 29: Entity Search Engine
Page 30: Entity Search Engine

ISI

Page 31: Entity Search Engine

Wolfram|Alpha

● Wolfram|Alpha is an engine for computing answers and providing knowledge

● It generates output by doing computations from its own internal knowledge base, instead of searching the web and returning links

● It is an online service that answers factual queries directly by computing the answer

● Its goal is to make all systematic knowledge immediately computable and accessible to everyone

Page 32: Entity Search Engine

5 nearest stars

Page 33: Entity Search Engine
Page 34: Entity Search Engine

How many newspapers are available in the globe

Page 35: Entity Search Engine
Page 36: Entity Search Engine

Overall Difficulties

● The number of entities could be huge

● Information Redundancy

● Information Fragmentation

● Entity Information Integration

● A single algorithm for fine-grained entity matching may not exist

● Store and retrieve using IR-based techniques

● Matching on very large datasets

● Natural Language Processing

Page 37: Entity Search Engine

Contd...

● Few knowledge bases are available

● Multi-domain entities

● Deduplication problem

● Some names and relationships could be incorrect, and the information may not be up to date

● Name disambiguation is still largely unsolved

● ESEs are at an early stage

Creating knowledge bases from text and unstructured data is the goal

 

Page 38: Entity Search Engine
Page 39: Entity Search Engine

My Library

● Entities are for use

● Each entity has its own attributes & relations

● Every entity has its importance

● Save the time for finding out entities

● Entities are growing rapidly

Page 40: Entity Search Engine

References

1. Statistical Entity Extraction from Web by Zaiqing Nie, Ji-Rong Wen, and Wei-Ying Ma, Fellow, IEEE

2. State of the art in IE, overview, comparison and analysis by Stefan Dumitrescu, PhD Student

3. The Entity Name System: Enabling the Web of Entities by Heiko Stoermer, Themis Palpanas, George Giannakopoulos, University of Trento

4. Hybrid entity clustering using crowds and data by Jongwuk Lee, Hyunsouk Cho, Jin-Woo Park, Young-rok Cha, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen

5. Supporting Entity Search: A Large-Scale Prototype Search Engine by Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang

Page 41: Entity Search Engine

References...

6. OKKAM: Enabling a Web of Entities by Paolo Bouquet, Heiko Stoermer, Daniel Giacomuzzi, University of Trento

7. Entity Data Management in OKKAM by Themis Palpanas (1), Junaid Chaudhry (2), Periklis Andritsos (1), Yannis Velegrakis (1); (1) University of Trento, (2) Ajou University

8. SPACE AND TIME ENTITY REPOSITORY, Human-enhanced time-aware multimedia search, funded by EU07. See: http://issuu.com/cubrikproject/docs/issuu.cubrik.d41.unitn.wp4.v1.0

9. http://api.okkam.org/search/

10. http://www.wolframalpha.com/

Page 42: Entity Search Engine