Identifying Appropriate Concepts for Unknown Words with Formal Concept Analysis
Akihiro Yamamoto, joint work with Madori Ikeda
Graduate School of Informatics, Kyoto University
28 Nov. 2014
• This talk is based on: Ikeda, M., Yamamoto, A.: Classification by Selecting Plausible Formal Concepts in a Concept Lattice. In Proc. of the Workshop on Formal Concept Analysis Meets Information Retrieval (FCAIR 2013), CEUR Workshop Proceedings, vol. 977, pp. 22–35, Moscow, Russia, 2013.
Outline
Goal
• Extending many thesauri in a uniform manner
– finding (extrapolating) appropriate meanings of unknown terms
Result
• A multi-class multi-label classification method that
– uses formal concept analysis and does not use parametric machine learning
– does not need explicit feature selection, but selects features implicitly
– is adapted to updates of test data
– classifies test data better than k-NN (the k-nearest neighbor algorithm)
Why Formal Concept Analysis today
• FCA is very popular in the Japanese Discovery Science community
– mainly because of the efficient algorithm by Uno at NII
– but it is not well known that this algorithm is for FCA
• The syntax and semantics of logic can be interpreted as an instance of FCA
– We would like to present an application of FCA to NLP
– Our work might be interpreted as a type of abduction
Thesaurus Extension
• Relating unregistered terms to the senses of semantically similar registered terms
– Classifying unregistered terms into senses
– Multi-class multi-label classification
[Figure: a thesaurus in which "dog", "fox", and "wolf" belong to sense1 and "rock", "stone", and "flint" belong to sense2; "jackal" is unregistered]
• A popular approach [Agirre et al. '00, Uramoto '96]
1. Calculate semantic similarities between unregistered and registered terms
• Semantic similarities are estimated using characteristics of the terms acquired from corpora
2. Relate unregistered terms to senses based on the semantic similarities
(Text) Corpus
• Corpora consist of texts in a natural language
– Many kinds of characteristics are also contained
• POS (part of speech), N-grams, frequencies, …
A text: "The quick brown fox jumps over the lazy dog"
3-grams: The quick brown, quick brown fox, brown fox jumps, fox jumps over, jumps over the, over the lazy, the lazy dog
POS (part of speech): The/determiner, quick/adjective, brown/adjective, fox/singular noun, jumps/verb present, over/preposition, the/determiner, lazy/adjective, dog/singular noun
• Semantic similarities are estimated by using characteristics in corpora
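The 3-gram characteristics in the example above can be extracted with a few lines. A minimal sketch (real corpora additionally record POS tags, frequencies, and other characteristics):

```python
# Extract word n-grams (default: 3-grams) from a text, as in the
# "quick brown fox" example above.
def ngrams(text, n=3):
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "The quick brown fox jumps over the lazy dog"
print(ngrams(sentence))
# ['The quick brown', 'quick brown fox', ..., 'the lazy dog']
```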
• Many kinds of corpora are available for many languages
– Google N-gram (many languages), Penn Treebank (English), L'Arboratorie (French), NEGRA (German), the Dependency Treebank for Russian, the ART Dependency corpus (Japanese), …
• Generally, a corpus contains a large amount of data
– Data are gathered automatically by using parsers and syntactic analyzers
– A corpus is a set of high-dimensional data
Thesaurus Extension
• Classifying unregistered terms into senses based on semantic similarities
1. Some characteristics are selected as features
2. Finding neighbors in the feature space
3. Classifying the unregistered term by its neighbors
[Figure: characteristics of the terms dog, flint, fox, jackal, rock, stone, and wolf are taken from the corpus; selected features place the terms in a feature space, and the unregistered term "jackal" is classified by its registered neighbors]
• The accuracy of classification depends on the features
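The three steps above can be sketched as a nearest-neighbor classifier over characteristic sets. This is an illustration of the popular approach, not the talk's method; the terms' characteristic sets and the similarity measure (Jaccard on sets) are toy assumptions, not data from the talk:

```python
# Popular approach: rank registered terms by similarity of their
# characteristic sets to the unregistered term's, then transfer the
# senses of the k nearest. All data here are hypothetical toy values.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

characteristics = {  # registered term -> set of corpus characteristics
    "dog":   {"barks", "animal", "runs"},
    "wolf":  {"howls", "animal", "runs"},
    "stone": {"hard", "mineral"},
    "flint": {"hard", "mineral", "sparks"},
}
senses = {"dog": {"sense1"}, "wolf": {"sense1"},
          "stone": {"sense2"}, "flint": {"sense2"}}

def classify(unknown_chars, k=2):
    ranked = sorted(characteristics,
                    key=lambda t: jaccard(unknown_chars, characteristics[t]),
                    reverse=True)
    return set().union(*(senses[t] for t in ranked[:k]))

print(classify({"howls", "animal"}))  # nearest are wolf/dog -> {'sense1'}
```

Note that the result depends entirely on which characteristics were kept as features, which is the difficulty the talk addresses next.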
Extending Many Thesauri
• For each thesaurus, an appropriate set of features is required
[Figure: the same terms t1, …, t8 and an unregistered term u require a different feature space, and therefore different neighbors of u, for each of Thesaurus1, Thesaurus2, and Thesaurus3]
• Extending many thesauri needs various features corresponding to each thesaurus
– Selecting features for multi-class multi-label classification is difficult and time-consuming
• For each thesaurus, its extension is executed repeatedly
– Corpora are frequently updated (new unregistered terms appear, new characteristics are adopted)
Our Idea for Extending Many Thesauri
1. Preparing various sets of terms as candidate sets of neighbors, with no thesauri
– The prepared sets are represented in a concept lattice
2. Selecting some of the prepared sets as the sets of neighbors of unregistered terms for each thesaurus
[Figure: the extents of a concept lattice over t1, …, t8 and u serve as the candidate neighbor sets; different extents are selected as the neighbors of u for Thesaurus1, Thesaurus2, and Thesaurus3]
Problems and Solution
Problems in extending many thesauri
• Extending many thesauri needs various features corresponding to each thesaurus
– Selecting features for multi-class multi-label classification is difficult and time-consuming
• For each thesaurus, its extension is executed repeatedly
– Corpora are frequently updated (new unregistered terms appear, new characteristics are adopted)
Our solution
1. Preparing various candidate sets of neighbors in a concept lattice generated from a corpus
– A concept lattice can be updated incrementally by existing algorithms [Nishimura et al. '14, Choi et al. '06, Soldano et al. '10, Valtchev et al. '01]
2. Selecting some of the candidate sets as the sets of neighbors of unregistered terms for each thesaurus
Contents
1. Thesauri and thesaurus extension by using corpora
2. A classification method using Formal Concept Analysis
– Formal definition
– Preparation of candidates for neighbors
– Selection of candidates for neighbors
3. Experiments
4. Conclusion
Formal Concept Analysis (1/2)
• A (formal) context K = (G, M, I)
– G: a set of objects
– M: a set of attributes
– I: a subset of G×M
• Galois connection
– A^I := {m∈M | ∀g∈A. (g, m)∈I} for A⊆G
– B^I := {g∈G | ∀m∈B. (g, m)∈I} for B⊆M
• A formal concept c = (A, B) of K
– A^I = B and A = B^I for A⊆G, B⊆M
– Ex(c) := A, the extent of c
– In(c) := B, the intent of c
Example context K0 = (G0, M0, I0):
    | m1 m2 m3 m4 m5 m6 m7
 g1 |  ×  ×
 g2 |  ×  ×     ×
 g3 |  ×  ×     ×
 g4 |     ×     ×     ×
 g5 |     ×        ×  ×
 g6 |     ×        ×  ×
 g7 |        ×     ×     ×
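The Galois connection and the formal concepts of the example context K0 can be computed directly from the definitions. A sketch, assuming the cross table reconstructed above; the subset-closure enumeration is deliberately naive, standing in for efficient enumeration algorithms such as Uno et al.'s:

```python
# Derivation operators for the example context K0.
# A^I = attributes shared by all objects in A; B^I = objects having all of B.
# (A, B) is a formal concept when A^I = B and B^I = A.
from itertools import chain, combinations

I = {  # the cross table of K0: object -> its attributes
    "g1": {"m1", "m2"},       "g2": {"m1", "m2", "m4"},
    "g3": {"m1", "m2", "m4"}, "g4": {"m2", "m4", "m6"},
    "g5": {"m2", "m5", "m6"}, "g6": {"m2", "m5", "m6"},
    "g7": {"m3", "m5", "m7"},
}
G = set(I)
M = set().union(*I.values())

def up(A):    # A ⊆ G  ->  A^I ⊆ M
    return set.intersection(*(I[g] for g in A)) if A else set(M)

def down(B):  # B ⊆ M  ->  B^I ⊆ G
    return {g for g in G if B <= I[g]}

# Naive enumeration: close every subset of attributes (fine for a tiny
# context; at scale one would use an efficient algorithm instead).
concepts = {(frozenset(down(set(B))), frozenset(up(down(set(B)))))
            for B in chain.from_iterable(combinations(M, r)
                                         for r in range(len(M) + 1))}
print(len(concepts))  # 11 concepts for K0
```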
Formal Concept Analysis (2/2)
• Originally proposed by Rudolf Wille for classifying data
– represents relationships between objects and attributes
• It is said that he referred to the Galois connection in algebraic geometry
– objects: extended fields, manifolds; attributes: bases, polynomials
• From the viewpoint of graph theory, a closed item set is a clique in the bipartite graph
– An efficient algorithm for enumerating all cliques was developed by Uno et al.
Instances of FCA (1/3)
Frequent pattern mining from transaction databases
G: the set of transaction IDs; M: the set of item IDs; I: transaction data
Instances of FCA (2/3)
Logic
G: the set of formulae; M: the class of interpretations; I: m ⊨ g
Formal languages
G: the set of words; M: a class of grammars; I: S ⇒* g in grammar m
Instances of FCA (3/3)
• In thesaurus extension,
G: the set of terms; M: the set of characteristics; I: the corpus
Principal Filter
• The concept lattice B(K) of a context K = (G, M, I)
– γg: the object concept ({g}^II, {g}^I) of an object g∈G
– ↑c: the principal filter {c'∈B(K) | c ≤ c'} of c∈B(K)
• c ≤ c' for formal concepts c, c'∈B(K) iff Ex(c) ⊆ Ex(c')
The concept lattice B(K0) of the example context K0 consists of:
c1 = ({g1,…,g7}, ∅)           c7 = ({g2,g3}, {m1,m2,m4})
c2 = ({g1,…,g6}, {m2})        c8 = ({g4}, {m2,m4,m6})
c3 = ({g5,g6,g7}, {m5})       c9 = ({g5,g6}, {m2,m5,m6})
c4 = ({g1,g2,g3}, {m1,m2})    c10 = ({g7}, {m3,m5,m7})
c5 = ({g2,g3,g4}, {m2,m4})    c11 = (∅, {m1,…,m7})
c6 = ({g4,g5,g6}, {m2,m6})
For example, γg4 = c8 and ↑γg4 = {c8, c5, c6, c2, c1}.
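The object concept γg and the principal filter ↑c follow directly from the definitions. A self-contained sketch on K0 (it repeats the context setup so it runs on its own):

```python
# Object concept γg = ({g}^II, {g}^I) and principal filter
# ↑c = {c' ∈ B(K) | Ex(c) ⊆ Ex(c')}, on the example context K0.
from itertools import chain, combinations

I = {"g1": {"m1", "m2"},       "g2": {"m1", "m2", "m4"},
     "g3": {"m1", "m2", "m4"}, "g4": {"m2", "m4", "m6"},
     "g5": {"m2", "m5", "m6"}, "g6": {"m2", "m5", "m6"},
     "g7": {"m3", "m5", "m7"}}
G, M = set(I), set().union(*I.values())

def up(A):   return set.intersection(*(I[g] for g in A)) if A else set(M)
def down(B): return {g for g in G if B <= I[g]}

# All formal concepts of K0 as (extent, intent) pairs (naive enumeration).
concepts = {(frozenset(down(set(B))), frozenset(up(down(set(B)))))
            for B in chain.from_iterable(combinations(M, r)
                                         for r in range(len(M) + 1))}

def gamma(g):             # the object concept γg
    return (frozenset(down(up({g}))), frozenset(up({g})))

def principal_filter(c):  # ↑c, comparing concepts by extent inclusion
    return {d for d in concepts if c[0] <= d[0]}

filt = principal_filter(gamma("g4"))
print(sorted(len(d[0]) for d in filt))  # extent sizes of c8, c5, c6, c2, c1
```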
Training Set
• A training set τ = (T, L)
– T: a subset of objects, T⊆G
• g∈T: a known object
• u∈G \ T: an unknown object
– L: a set of labels
– L: T → 2^L, a labeling function
[Figure: an example training set τ0 = (T0, L0) assigning labels among l1, …, l8 to the known objects]
• In thesaurus extension, a thesaurus is a training set τ = (T, L)
– Registered terms are known objects
– Unregistered terms are unknown objects
– Senses are labels
[Figure: the thesaurus relating "dog", "fox", "wolf", "rock", "stone", and "flint" to sense1 and sense2, viewed as a training set]
Contents
1. Thesauri and thesaurus extension by using corpora
2. A classification method using a concept lattice
– Formal definition
– Preparation of candidates for neighbors
– Selection of candidates for neighbors
3. Experiments
4. Conclusion
Preparation of Candidates
• Preparation is constructing a concept lattice from a context
• A candidate set of neighbors of an unknown object is the extent of a formal concept
– A context K = (G, M, I), a training set τ = (T, L)
• u: an unknown object, u∈G \ T
– All prepared candidate sets of neighbors of u are represented by the concepts c ∈ ↑γu, as their extents Ex(c)
Example: for the unknown object u, ↑γu consists of c8 with extent {u}, c5 with extent {g2, g3, u}, c6 with extent {g4, g5, u}, c2 with extent {g1, …, g5, u}, and c1 with extent {g1, …, g6, u}.
Contents
1. Thesauri and thesaurus extension by using corpora
2. A classification method using a concept lattice
– Formal definition
– Preparation of candidates for neighbors
– Selection of candidates for neighbors
3. Experiments
4. Conclusion
Selection of Candidates
• Neighbors of an unknown object u are extracted from the extents of selected formal concepts
– The selected formal concepts are called plausible formal concepts
• Policies for selecting plausible formal concepts
1. Precision oriented: neighbors of u should be similar to each other
2. Recall oriented: objects near the neighbors of u should also be neighbors of u, but only as far as Policy 1 is guaranteed
• The score σ(c, τ) of a concept c under a training set τ = (T, L) measures the similarities among the objects in Ex(c)
The score σ(c, τ) of a concept c is calculated as

σ(c, τ) = 0   if |Ex(c)∩T| = 0,
σ(c, τ) = 1   if |Ex(c)∩T| = 1, and
σ(c, τ) = ( Σ_{i=1}^{n} Σ_{j=i+1}^{n} sim(gi, gj) ) / C(n, 2)   otherwise,

where n = |Ex(c)∩T| and

sim(gi, gj) = |L(gi)∩L(gj)| / |L(gi)∪L(gj)|

is the pairwise similarity of objects (the Jaccard index). σ(c, τ) is thus the average of the pairwise similarities.
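The score can be sketched in a few lines. The handling of the degenerate cases is read off the slide's cases layout (a concept with no known objects scores 0, as in the worked example; a single known object scores 1), and the label sets in the demo are hypothetical, chosen only to reproduce the pairwise similarities appearing in the talk's example:

```python
# σ(c, τ): average pairwise Jaccard similarity of label sets over the
# known objects in Ex(c). Degenerate cases as read off the slide.
from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def score(extent, T, L):
    known = [g for g in extent if g in T]
    if len(known) == 0:
        return 0.0
    if len(known) == 1:
        return 1.0
    pairs = list(combinations(known, 2))
    return sum(jaccard(L[a], L[b]) for a, b in pairs) / len(pairs)

T = {"g1", "g2", "g3", "g4", "g5", "g6"}
L = {"g2": {"l1", "l4", "l5"}, "g3": {"l2", "l3", "l5"},   # hypothetical
     "g4": {"l1", "l6", "l7"}, "g5": {"l6", "l7"}}         # label sets

print(score({"g4", "g5", "u"}, T, L))  # 0.666..., like σ(c6) in the example
print(score({"u"}, T, L))              # no known objects -> 0.0
```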
Plausible Formal Concepts
• Policies for selecting plausible formal concepts
1. Precision oriented: neighbors of u should be similar to each other
2. Recall oriented: objects near the neighbors of u should also be neighbors, but only as far as Policy 1 is guaranteed
• For an unknown object u, a formal concept c ∈ ↑γu is plausible if both of the following regulations are satisfied:
1. Precision-oriented regulation: σ(c, τ) ≥ σ(c', τ) for any other formal concept c' ∈ ↑γu
2. Recall-oriented regulation: |Ex(c)| ≥ |Ex(c')| for any other formal concept c' ∈ ↑γu such that σ(c, τ) = σ(c', τ)
• Selecting plausible concepts can be regarded as implicit feature selection
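The two regulations amount to a maximum-score, then maximum-extent filter over ↑γu, followed by a union of the neighbors' labels. A sketch on the talk's worked example; the scores are those shown for c1, c2, c5, c6, c8, while the label sets are hypothetical stand-ins consistent with the example's outcome:

```python
# Select plausible concepts (max score, then max extent among ties)
# and transfer labels: L(u) = union of L(g) over g in the extents.
def plausible(scored):
    # scored: list of (extent, score) pairs for every concept in ↑γu
    best = max(s for _, s in scored)
    top = [(e, s) for e, s in scored if s == best]
    largest = max(len(e) for e, _ in top)
    return [e for e, s in top if len(e) == largest]

up_gamma_u = [
    ({"g1", "g2", "g3", "g4", "g5", "g6", "u"}, 0.187),  # c1
    ({"g1", "g2", "g3", "g4", "g5", "u"},       0.181),  # c2
    ({"g2", "g3", "u"},                         0.2),    # c5
    ({"g4", "g5", "u"},                         0.666),  # c6
    ({"u"},                                     0.0),    # c8
]
L = {"g4": {"l1", "l6", "l7"}, "g5": {"l6", "l7"}}  # hypothetical labels

neighbors = set().union(*plausible(up_gamma_u)) - {"u"}
labels_u = set().union(*(L[g] for g in neighbors))
print(sorted(labels_u))  # ['l1', 'l6', 'l7'], as in the example
```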
Plausible Formal Concepts
• In the example, the concepts in ↑γu receive the scores σ(c8, τ0) = 0, σ(c5, τ0) = 0.2, σ(c6, τ0) = 0.666…, σ(c2, τ0) = 0.181…, and σ(c1, τ0) = 0.187…
• For instance,
σ(c2, τ0) = (sim(g1, g2) + sim(g1, g3) + sim(g1, g4) + sim(g1, g5) + sim(g2, g3) + sim(g2, g4) + sim(g2, g5) + sim(g3, g4) + sim(g3, g5) + sim(g4, g5)) / 10
          = (0.25 + 0 + 0.25 + 0 + 0.2 + 0 + 0 + 0.2 + 0.25 + 0.666…) / 10 = 0.181…
Plausible Formal Concepts
• The formal concept c6 is the only plausible formal concept of the unknown object u
• Finally, the unknown object u is related to the labels l1, l6, and l7, as its neighbors g4 and g5 in the plausible formal concept are
• In general, L(u) = ∪_{g∈N(u)} L(g), where N(u) = ∪_{c∈P(u)} Ex(c)
– P(u): the set of plausible formal concepts of u
Time Complexity
• Classifying an unknown object u costs O(|↑γu| · M²)
– M = the mean size of Ex(c) \ T over c ∈ ↑γu
• More unknown objects reduce the time cost
[Figure: in the lattice B(K0), only the concepts above the object concepts of the unknown objects need to be scored]
Contents
1. Thesauri and thesaurus extension by using corpora
2. A classification method using a concept lattice
– Formal definition
– Preparation of candidates for neighbors
– Selection of candidates for neighbors
3. Experiments
4. Conclusion
Thesauri and Corpora for Experiments
• Two thesauri are used as two training sets
– Japanese WordNet 1.0 (JWN)
– Bunruigoihyo (BGH)
• Two corpora are used as two contexts
– Kyoto University's case frame data (KCF) as K1 = (G1, M1, I1)
– Japanese Web 4-gram (J4g) as K2 = (G1, M2, I2)
– and a third, combined context K3 = (G1, M1∪M2, I1∪I2)
• Experiments
– 10-fold cross validation for each pair of a training set (thesaurus) and a context (corpus)
– Measure precision and recall
– Compare with k-NN for k ∈ {1, 5, 10}
Experimental Results
• Our method is better than k-NN in all accuracies

                              JWN                         BGH
corpus   method       precision recall simple     precision recall simple
KCF      with T          0.039  0.274  0.347        0.164   0.553  0.562
         without T       0.066  0.094  0.141        0.238   0.329  0.363
          (#classified)  (443.2 / 764)              (524.3 / 764)
         1-NN            0.026  0.024  0.037        0.103   0.103  0.114
         5-NN            0.007  0.036  0.056        0.031   0.150  0.167
         10-NN           0.004  0.038  0.061        0.016   0.169  0.193
J4g      with T          0.007  0.079  0.114        0.028   0.248  0.269
         without T       0.007  0.052  0.084        0.029   0.237  0.259
          (#classified)  (715.1 / 764)              (753.5 / 764)
         1-NN            0.007  0.007  0.010        0.027   0.027  0.031
         5-NN            0.002  0.013  0.020        0.014   0.070  0.075
         10-NN           0.002  0.018  0.028        0.010   0.100  0.119
Combined with T          0.030  0.072  0.108        0.132   0.250  0.285
         without T       0.030  0.063  0.098        0.134   0.255  0.283
          (#classified)  (745.5 / 764)              (716.3 / 764)
         1-NN            0.009  0.009  0.014        0.039   0.039  0.043
         5-NN            0.004  0.018  0.031        0.017   0.085  0.094
         10-NN           0.002  0.024  0.039        0.011   0.116  0.011
Statistics for Contexts and Training Sets

                                    KCF      J4g   Combined
# objects                          7,636    7,636    7,636
# attributes                      19,313    7,135   26,448
mean # attributes per object        3.85     4.70     8.55
# formal concepts                 11,960   20,066   30,540
mean size of extents                2.55     6.04     4.89
mean # scored concepts per object   2.99    14.87    18.58

                            JWN      BGH
# known objects            6,872    6,872
# labels                   9,560      595
mean # labels per object    2.19     2.89

• In the experiments, our method also works faster
– Classifying an unknown object u costs O(|↑γu| · M²), with M = the mean size of Ex(c) \ T over c ∈ ↑γu
– Other methods cost O(|T|)
Conclusion
• We proposed a multi-class multi-label classification method that
– provides a uniform manner of extending many thesauri
– does not need parameters
– selects features implicitly
• by preparing candidate neighbors as formal concepts
• and selecting formal concepts whose extents give the neighbors of unknown objects (plausible formal concepts)
• Experiments confirmed that our method works faster and classifies more accurately than k-NN
• Future work
– improving the classification accuracy of our method
References
1. [Agirre 00] Agirre, E., Ansa, O., Hovy, E., Martinez, D.: Enriching Very Large Ontologies Using the WWW. Proc. of the ECAI'00 Workshop "Ontology Learning" (2000)
2. [Choi 06] Choi, V., Huang, Y.: Faster Algorithms for Constructing a Galois Lattice, Enumerating All Maximal Bipartite Cliques and Closed Frequent Sets. SIAM Conference on Discrete Mathematics (2006)
3. [Davey 02] Davey, B. A., Priestley, H. A.: Introduction to Lattices and Order. Cambridge University Press (2002)
4. [Deng 12] Deng, H., Runger, G.: Feature Selection via Regularized Trees. Proc. of IJCNN'12, pp. 1–8 (2012)
5. [Ganter 99] Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer-Verlag (1999)
6. [GSK 09] Gengo Shigen Kyokai (Language Resources Association), http://www.gsk.or.jp (2009)
7. [Kudo 04] Kudo, T., Kazawa, H.: Web Japanese N-gram Version 1. Gengo Shigen Kyokai (2004)
8. [NINJAL 04] National Institute for Japanese Language and Linguistics, http://www.ninjal.ac.jp/archives/goihyo (2004)
9. [Lopez 06] Lopez, F. G., Torres, M. G., Melian, B., Perez, J. A. M., Moreno-Vega, J. M.: Solving the Feature Subset Selection Problem by a Parallel Scatter Search. European Journal of Operational Research, vol. 169, no. 2, pp. 477–489 (2006)
10. [Mok 12] Mok, S. W. H., Gao, H. E., Bond, F.: Using WordNet to Predict Numeral Classifiers in Chinese and Japanese. Proc. of GWC'12 (2012)
11. [Soldano 10] Soldano, H., Ventos, V., Champesme, M., Forge, D.: Incremental Construction of Alpha Lattices and Association Rules. Proc. of KES'10, pp. 351–360. Springer (2010)
12. [Tan 05] Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison Wesley (2005)
13. [Uramoto 96] Uramoto, N.: Positioning Unknown Words in a Thesaurus by Using Information Extracted from a Corpus. Proc. of COLING'96, vol. 2, pp. 956–961. Association for Computational Linguistics (1996)
14. [Valtchev 01] Valtchev, P., Missaoui, R.: Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Methods. Proc. of ICCS'01, pp. 290–303. Springer (2001)
15. [Wang 06] Wang, J., Ge, N.: Automatic Feature Thesaurus Enrichment: Extracting Generic Terms from a Digital Gazetteer. Proc. of JCDL'06, pp. 326–333. IEEE (2006)
Thank you!
Contexts for Experiments
• Two contexts are constructed from two Japanese corpora
– Kyoto University's case frame data
• A case frame is a relation between a noun and a predicate
Example: "inu ga otoko ni hoete iru" (in English, "a dog is barking at a man")
The nouns inu (dog) and otoko (man) relate to the predicate hoeru (bark) through the case particles ga and ni:

              hoeru (bark), ga    hoeru (bark), ni
inu (dog)            ×
otoko (man)                              ×

– Japanese Web 4-gram