Relation-wise Automatic Domain-Range Information Management for Knowledge Entries
Md-Mizanur Rahoman & Ryutaro Ichise
The Graduate University for Advanced Studies, Tokyo, Japan
National Institute of Informatics, Tokyo, Japan
Begum Rokeya University, Rangpur, Bangladesh
Outline
• Background
• Problem & Possible Solution
• Proposed Framework
• Experiment
• Conclusion
30-Jan-2017 Relation-wise Automatic Domain-Range Information Management for Knowledge Entries I Rahoman & Ichise 2
Background
• knowledge-base (KB) construction and management have gained interest
• relations play a central role in a KB
• construction – generation of knowledge entries <Subject, relation, Object>
• e.g., <Obama, born_in, Hawaii>
• management – validation of knowledge entries
• e.g., domain(born_in) = Person, range(born_in) = Place
• not all knowledge bases maintain domain-range validation for their relations, e.g., Freebase
Problem
• existence of wrong entries in current KBs, e.g., an author value of 14 in the entries below
• costly maintenance – domain-range selection is not automatic
• manual checking is time-consuming
• requires domain-level expertise
Subject | Relation | Object
Paprika | type | Book
Paprika | author | Yasutaka Tsutsui
Freedom in Exile | type | Book
Freedom in Exile | author | 14
Possible Solution
• Intuition
• the Subjects of a relation should share some similarity
• extract features for Subject entities and generate a learning model, e.g.,
• Subject(born_in) complies only if it is a Person, i.e., the domain
• the Objects of a relation should share some similarity
• extract features for Object entities and generate a learning model, e.g.,
• Object(born_in) complies only if it is a Place, i.e., the range
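The compliance rule above amounts to a conjunction of two relation-specific classifiers. A minimal Python sketch, assuming the domain and range models are callables that map a feature vector to True/False; `entry_complies`, `is_person`, `is_place` and the 2-d toy vectors are hypothetical stand-ins, not the authors' trained models:

```python
from typing import Callable, Sequence

Vector = Sequence[float]
# Hypothetical classifier type: returns True if the vector belongs to the class.
Classifier = Callable[[Vector], bool]


def entry_complies(subject_vec: Vector, object_vec: Vector,
                   domain_model: Classifier, range_model: Classifier) -> bool:
    """Accept <Subject, relation, Object> only if the Subject falls in the
    relation's domain AND the Object falls in the relation's range."""
    return domain_model(subject_vec) and range_model(object_vec)


def is_person(v: Vector) -> bool:
    # Toy stand-in for a domain model: dimension 0 means "person-like" (hypothetical).
    return v[0] > 0.5


def is_place(v: Vector) -> bool:
    # Toy stand-in for a range model: dimension 1 means "place-like" (hypothetical).
    return v[1] > 0.5


obama, hawaii = [0.9, 0.1], [0.1, 0.8]
print(entry_complies(obama, hawaii, is_person, is_place))  # True
print(entry_complies(hawaii, obama, is_person, is_place))  # False
```

Swapping Subject and Object makes the entry fail, which is exactly the kind of wrong entry the domain and range checks are meant to catch.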
Proposed Framework
• required resources
• language-specific relations, e.g., born_in, spouse, author, etc.
• language-specific training examples, e.g., existing knowledge entries
• language-specific large text corpus, e.g.,
Subject | Relation | Object
Obama | born_in | Hawaii
Trump | born_in | New York
Clinton | born_in | Chicago
… | … | …
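The training examples arrive as <Subject, relation, Object> entries and need to be split per relation into Subject words and Object words. A minimal sketch; `group_by_relation` and `to_token` are hypothetical helpers, and joining multi-word names such as "New York" with an underscore is an assumption about how entities map to corpus tokens, not something stated on the slides:

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def to_token(entity: str) -> str:
    """Assumption: multi-word entity names become one vocabulary token,
    e.g. 'New York' -> 'New_York'."""
    return "_".join(entity.split())


def group_by_relation(entries: list[Triple]) -> dict[str, dict[str, list[str]]]:
    """Split <Subject, relation, Object> entries into per-relation word lists."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"subjects": [], "objects": []})
    for subj, rel, obj in entries:
        grouped[rel]["subjects"].append(to_token(subj))
        grouped[rel]["objects"].append(to_token(obj))
    return dict(grouped)


entries = [
    ("Obama", "born_in", "Hawaii"),
    ("Trump", "born_in", "New York"),
    ("Clinton", "born_in", "Chicago"),
]
training = group_by_relation(entries)
print(training["born_in"]["objects"])  # ['Hawaii', 'New_York', 'Chicago']
```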
Proposed Framework
• process
• Word Vectorizer
• generate features for words from a large text corpus
• Model Generator
• generate supervised machine learning models for the extracted features
Word Vectorizer
• takes a large text corpus, e.g.,
• uses the Word2Vec* implementation for word embedding
• generates feature vectors for the corpus vocabulary
• preserves the linguistic context of the corpus
• maps similar words to similar vectors
* https://code.google.com/p/word2vec/
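Word2Vec learns the vectors themselves; what the framework relies on is that similar words end up close together, which is commonly measured with cosine similarity. The sketch below uses hand-made 3-d vectors as hypothetical stand-ins for trained embeddings, only to show the comparison:

```python
import math

# Hypothetical toy vectors standing in for trained Word2Vec embeddings;
# real word2vec vectors usually have hundreds of dimensions.
embeddings = {
    "Obama":   [0.9, 0.1, 0.0],
    "Clinton": [0.8, 0.2, 0.1],
    "Hawaii":  [0.1, 0.9, 0.2],
    "Chicago": [0.2, 0.8, 0.3],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


place_place = cosine(embeddings["Hawaii"], embeddings["Chicago"])
place_person = cosine(embeddings["Hawaii"], embeddings["Obama"])
print(place_place > place_person)  # True
```

With these toy values two places are closer to each other than a place is to a person; with real embeddings the same comparison runs on vectors produced by the word2vec tool.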
Model Generator (1/4)
• For each relation
• collect positive and negative training words
• collect feature vectors for training words
• generate two supervised machine learning models (a domain model and a range model) that classify whether
• a word element belongs to the domain or not
• a word element belongs to the range or not
Model Generator (2/4)
• positive features
• collected from existing knowledge entries
• divided into Subject element feature vectors and Object element feature vectors
Subject | Relation | Object
Obama | born_in | Hawaii
Trump | born_in | New York
Clinton | born_in | Chicago
… | … | …
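Positive features are the vectors of the Subject words (for the domain model) and of the Object words (for the range model). A sketch assuming an in-memory dict of embeddings; `positive_features` and the toy vectors are hypothetical, and skipping words missing from the vocabulary is an assumption, since the slides do not say how such words are handled:

```python
Vector = list[float]


def positive_features(words: list[str], embeddings: dict[str, Vector]) -> list[Vector]:
    """Look up the feature vector of each training word.
    Assumption: words absent from the embedding vocabulary are skipped."""
    return [embeddings[w] for w in words if w in embeddings]


# Hypothetical toy embeddings and the training words of one relation (born_in).
embeddings = {
    "Obama": [0.9, 0.1], "Trump": [0.85, 0.15], "Clinton": [0.8, 0.2],
    "Hawaii": [0.1, 0.9], "Chicago": [0.2, 0.8],
}
subjects = ["Obama", "Trump", "Clinton"]
objects = ["Hawaii", "New_York", "Chicago"]  # 'New_York' is not in the toy vocabulary

subject_vecs = positive_features(subjects, embeddings)  # positives for the domain model
object_vecs = positive_features(objects, embeddings)    # positives for the range model
print(len(subject_vecs), len(object_vecs))  # 3 2
```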
Model Generator (3/4)
• negative features
• collected from random words of the text corpus vocabulary
• excluding positive word elements that are already considered
• the numbers of negative and positive training examples are kept equal
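Negative sampling as described above, sketched with Python's standard `random` module. `sample_negatives` and the toy vocabulary are hypothetical; drawing distinct words and fixing the seed for reproducibility are assumptions, not details from the slides:

```python
import random


def sample_negatives(vocabulary: list[str], positives: set[str], k: int,
                     seed: int = 0) -> list[str]:
    """Draw k distinct random vocabulary words that are not positive training words."""
    candidates = [w for w in vocabulary if w not in positives]
    if k > len(candidates):
        raise ValueError("not enough non-positive words in the vocabulary")
    return random.Random(seed).sample(candidates, k)


vocabulary = ["Obama", "Trump", "Clinton", "guitar", "river", "blue", "run", "Paprika"]
positives = {"Obama", "Trump", "Clinton"}
# Balance the classes: draw as many negatives as there are positives.
negatives = sample_negatives(vocabulary, positives, k=len(positives))
print(negatives)
```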
Model Generator (4/4)
• models
• domain model
• generated from Subject element feature vectors and negative word feature vectors
• uses a decision tree-based learning model
• range model
• generated from Object element feature vectors and negative word feature vectors
• uses a decision tree-based learning model
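The slides specify a decision tree-based learner without further detail. As an illustration under that gap, the sketch below fits a depth-1 decision tree (a decision stump) in pure Python for the domain model; the range model would be fit the same way on Object vectors. `fit_stump`, `predict` and the toy vectors are hypothetical, and a full tree learner that recurses on each split would be the closer match to the framework:

```python
Vector = list[float]
Stump = tuple[int, float, int]  # (feature index, threshold, label predicted above threshold)


def fit_stump(X: list[Vector], y: list[int]) -> Stump:
    """Fit a depth-1 decision tree: choose the (feature, threshold, polarity)
    with the fewest training errors."""
    best: Stump = (0, 0.0, 1)
    best_err = len(y) + 1
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in thresholds:
            for polarity in (0, 1):
                err = sum((polarity if x[f] > t else 1 - polarity) != label
                          for x, label in zip(X, y))
                if err < best_err:
                    best, best_err = (f, t, polarity), err
    return best


def predict(stump: Stump, x: Vector) -> int:
    f, t, polarity = stump
    return polarity if x[f] > t else 1 - polarity


# Domain model for born_in: Subject vectors (label 1) vs. sampled negatives (label 0).
subject_vecs = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2]]
negative_vecs = [[0.1, 0.9], [0.3, 0.4], [0.2, 0.1]]
X = subject_vecs + negative_vecs
y = [1] * len(subject_vecs) + [0] * len(negative_vecs)
domain_model = fit_stump(X, y)
print(predict(domain_model, [0.95, 0.05]))  # 1
```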
Experiment
• resources
• relations – 32 frequent English relations (among the first 100)
• Cat-1 – range values are distributed over the domain, e.g., candidate
• Cat-2 – range values are concentrated over the domain, e.g., genre
• training examples – entries for the relations
• text corpus – English
• evaluation metric – accuracy
Result
• purpose – show how accurately the framework detects correct (pos) entries, incorrect (neg) entries, and a mix of both (i.e., pos + neg)
• finding – words of the same type have similar feature vectors, so the models generalize over words
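Accuracy in the pos, neg and mix settings can be computed as below. The predictions are made-up values for illustration only, not results from the experiment, and `accuracy` is a hypothetical helper:

```python
def accuracy(predictions: list[bool], gold: list[bool]) -> float:
    """Fraction of entries whose predicted validity matches the gold label."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Made-up predictions (True = entry judged correct) for a tiny test set.
pos_pred = [True, True, True, False]    # gold labels are all True (correct entries)
neg_pred = [False, False, False, False]  # gold labels are all False (wrong entries)

acc_pos = accuracy(pos_pred, [True] * len(pos_pred))
acc_neg = accuracy(neg_pred, [False] * len(neg_pred))
acc_mix = accuracy(pos_pred + neg_pred, [True] * len(pos_pred) + [False] * len(neg_pred))
print(acc_pos, acc_neg, acc_mix)  # 0.75 1.0 0.875
```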
Conclusion
• Observation
• a relation should hold the same type of elements as its Subjects and the same type of elements as its Objects
• generalizing over Subjects and Objects can automatically generate the domain and range of a relation; the experimental results support this assumption
• Future Work
• look for more sophisticated learning models than decision trees
• investigate word embeddings other than the default one in word2vec