Named entity Fusepool project

Post on 29-May-2015



Fusepool

Named Entity Recognition

Gábor Reményi, GeoX

Named Entity Recognition (NER)

● Named Entities
  o Persons, Organizations, Locations, Diseases, Products, etc.
  o The aim of NER is to locate entities in unstructured text documents

● Domains (NER models)
  o News-based
  o Mobile technology
  o Chemical elements
  o Cancer
  o Diseases

Named Entity Recognition (NER) 2.

● Uses statistical models to locate entities in texts
● Predicts the entities based on the context of the text
  o Can recognize new entities (entities outside of the training data)
  o False positive entities
  o False negative entities
● Creating new models is very time-consuming
  o Well-defined domain
  o List of entities from the domain
  o Considerable amount of annotated training text

Dictionary Matching (SMA)

● Aho-Corasick dictionary-matching algorithm to locate keywords in texts
  o Alternative solution for entity extraction
  o Not model based, no training
● Digital search tree
  o Allows very fast search
● No prediction, only matching
  o Cannot find new keywords
  o No false +/- entities
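A minimal sketch of Aho-Corasick dictionary matching follows. It is a generic textbook-style version, not the Fusepool SMA code; the function names `build_automaton` and `find_keywords` and the sample keyword list are assumptions made for illustration. The digital search tree (trie) is built once from the dictionary, and the text is then scanned in a single pass.

```python
from collections import deque


def build_automaton(keywords):
    """Build a trie over the keywords and add Aho-Corasick failure links."""
    goto = [{}]   # goto[node] maps a character to the child node index
    fail = [0]    # fail[node] is the node to fall back to on a mismatch
    out = [[]]    # out[node] lists the keywords that end at this node

    # Insert every keyword into the trie.
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)

    # Breadth-first pass: depth-1 nodes keep fail = 0, deeper nodes
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_keywords(text, automaton):
    """Return (start_index, keyword) for every dictionary hit in text."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            matches.append((i - len(word) + 1, word))
    return matches


# Hypothetical dictionary, chosen only for this example.
automaton = build_automaton(["Washington", "Seattle", "Washington Iron Co."])
hits = find_keywords("Mr. Washington lives in Seattle.", automaton)
```

Because the scan only follows trie edges and failure links, a word that is missing from the dictionary can never be reported, which is the "no prediction, only matching" behaviour listed above.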

NER versus SMA

NER versus SMA examples 1.

http://82.141.158.251/cner_v1/

1.: Mr. Washington lives in Seattle. He has a company named Washington Iron Co. that is the biggest iron producer in Washington.

2.: Gabor Remenyi lives in Budapest. Gabor has a company named Remenyi Iron Co. that is the biggest iron producer in Hungary.

3.: Gabor Remenyi lives in Kiskunfalva. Gabor has a company named Remenyi Iron Co. that is the biggest iron producer in Hungary.

4.: Gabor Remenyi lives with Kiskunfalva. Gabor has a company named Remenyi Iron Co. that is the biggest iron producer in Hungary.
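The contrast these four sentences probe can be imitated with a toy sketch. This is not the Fusepool NER model or the demo behind the URL above: the hand-written context rule only stands in for a statistical model, and the location list `KNOWN_LOCATIONS` and both function names are hypothetical.

```python
import re

# Hypothetical dictionary of known locations (an assumption for this sketch).
KNOWN_LOCATIONS = {"Seattle", "Budapest", "Washington", "Hungary"}


def dictionary_locations(sentence):
    """Dictionary matching: report only words that are in the list."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [t for t in tokens if t in KNOWN_LOCATIONS]


def context_guess(sentence):
    """Context-based guess: label the word after 'lives in' / 'lives with'.

    A real NER model learns such cues from annotated text; this single
    rule is only a stand-in to show why context allows new entities.
    """
    m = re.search(r"lives (in|with) ([A-Z][a-z]+)", sentence)
    if m is None:
        return None
    label = "Location" if m.group(1) == "in" else "Person"
    return (m.group(2), label)


sentence_3 = "Gabor Remenyi lives in Kiskunfalva."
sentence_4 = "Gabor Remenyi lives with Kiskunfalva."

dict_hits_3 = dictionary_locations(sentence_3)   # unknown word, no match
guess_3 = context_guess(sentence_3)              # guessed from "lives in"
guess_4 = context_guess(sentence_4)              # guessed from "lives with"
```

In this toy setup the dictionary finds nothing for "Kiskunfalva" in either sentence, while the context rule labels it differently depending on the preceding preposition. Context-based guesses like these are also where false positives and false negatives come from.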

NER versus SMA

● New entities
● False positive
● False negative

NER versus SMA examples 2.