Download - Classification and clustering methods by probabilistic latent semantic indexing model

No. 1

Classification and clustering methods Classification and clustering methods by probabilistic latent semantic by probabilistic latent semantic

indexing model indexing model

A Short Course at Tamkang UniversityTaipei, Taiwan, R.O.C., March 7-9, 2006

Shigeich HirasawaShigeich Hirasawa

Department of Industrial and ManagemDepartment of Industrial and Management Systems Engineering ent Systems Engineering

School of Science and EngineeringSchool of Science and EngineeringWaseda University, JapanWaseda University, Japan

[email protected]@hirasa.mgmt.waseda.ac.jp

No. 2

Format Example in paper archives matrix

Fixed forma

t

Items

- The name of authors- The name of journals- The year of publication- The name of publishers

- The name of countries- The year of publication- The citation link

Free forma

tText

The text of a paper - Introduction - Preliminaries ……. - Conclusion

1. Introduction1. Introduction

Document

DIG 1,0

DTH ,2,1,0

G = [ gmj ]: An item-document matrix

H = [ hij ] : A term-document matrix

dj : The j-th documentti : The i-th termim : The m-th item

gmj : The selected result of the m-th item (im ) in the j-th document (dj )

hij : The frequency of the i-th term (ti ) in the j-th document (dj )

1. Introduction

No. 32. Information Retrieval Model 2. Information Retrieval Model

Text Mining:- Information Retrieval including- Clustering- Classification

Information Retrieval ModelBase Model

Set theory(Classical) Boolean ModelFuzzyExtended Boolian Model

Algebraic

(Classical) Vector Space Model (VSM) [BYRN99]Generalized VSMLatent Semantic Indexing (LSI) Model [BYRN99]Neural Network Model

Probabilistic

(Classical) Probabilistic ModelExtended Probabilistic ModelProbabilistic LSI (PLSI) Model [Hofmann99]Inference Network ModelBayesian Network Model

2. Information Retrieval Model

No. 4

tf ( i,j ) = fij : The number of the i-th term (ti ) in the j-th document (dj ) (Local weight)

Weight wij is given by

The Vector Space Model (VSM)2. Information Retrieval Model 2. Information Retrieval Model

ijij aiidfjitfw )(),( ijij aiidfjitfw )(),(

idf ( i,j ) = log (D/df(i)) : General weight

df( i ) : The number of documents in D for which the term ti appears

(1)

No. 52. Information Retrieval Model 2. Information Retrieval Model

(term vector) ti = (ai1 , ai2 , … , aiD) : The i-th row

(document vector) dj = (a1j , a2j , … , aTj) : The j-th column

(query vector) q = (q1 , q2 , … , qT)T

The similarity s ( q , dj ) between q and dj :

)cosine(||||

),(T

T

j

jjdqs

dq

dq

(2)

No. 6

The Latent Semantic Indexing (LSI) Model

2. Information Retrieval Model 2. Information Retrieval Model

(1) be

intoeq.(2.1).

eq.(2.6)

T

D

minmin

No. 7

|ˆ||ˆ|

ˆˆ),()similality(

T

T

j

jjdqs

dq

dq

where

: the j-th canonical vector

(2)


be

11ˆ)vectorquery(

K

Kqq R

eq.(2.4)eq.(2.4)

No. 8

The Probabilistic LSI (PLSI) Model


(1) Preliminary

A) A=[aij], aij = fij :the number of a term ti

B) reduction of dimension similar to LSI

C) latent class (state model based on factor analysis)

D) (i) an independence between pairs ( ti , dj )(ii) a conditional independence between ti and dj

},max{ DTK

: a set of states

z td

)Pr( kz

)|Pr( kj zd )|Pr( ki zt(2.10)

(2.11)

No. 9

(2)


be

(2.13)

eq.eq.(2.1).(2.1).

No. 10

(3)


(2.14)

(2.17)

(2.16)

(2.15)

eq.eq.(2.11),(2.11),

eq.(2.13)eq.(2.13)

No. 113. Proposed Method3. Proposed Method3. Proposed Method

3.1 Classification method

categories: C1, C2, … , CK

(1) Choose a subset of documents D* ( D) which are already categorized and compute representative document vectors d*

1, d*2,

…, d*K:

where nk is the number of selected documents to compute the representative document vector from Ck.

(2) Compute the probabilities Pr(zk), Pr(dj|zk ) and Pr(ti|zk) which maximizes eq.(13) by the TEM algorithm, where ∥Z∥=K.

(3) Decide the state for dj as

)( ˆˆ kkCz

kjjkjkk

zddzdz ˆˆ )|Pr()|Pr(max

kj C

jk

k n d

dd1*

(3.1)

(3.2)

No. 12

(1) Choose a proper K (≥S) and compute the probabilities Pr(zk), Pr(dj|zk),

and Pr(ti|zk) which maximizes eq.(13) by the TEM algorithm, where ∥Z

∥ =K.

(2) Decide the state for dj as

(3.3)

If S=K, then kj cd ˆ

kjjkjkk

zddzdz ˆˆ )|Pr()|Pr(max

)( ˆˆ kkcz

3. Proposed Method3. Proposed Method3. Proposed Method

3.2 Clustering method

||Z|| = K : The number of latent states S : The number of clusters

SK

No. 133. Proposed Method3. Proposed Method

(3) If S<K, then compute a similarity measure s(zk, zk'):

(3.4)

(3.5)

and use the group average distance method with the similarity measure s(zk,zk') for agglomeratively clustering the states zk's until the n

umber of clusters becomes S. Then we have S clusters, and the members of each cluster are those of a cluster of states.

||||),(

'

'T

'kk

kkkk zzs

zz

zz

T21 ))|Pr(,),|Pr(),|(Pr( kTkkk ztztzt z

No. 144. Experimental Results4. Experimental Results4. Experimental Results

4.1 Document sets

Table 4.1: Document sets

contents format# words

Tamount categorize

# selected document

DL+DT

(a) articles of Mainichi news paper in ’94 [Sakai99]

Free(texts only)

107,835101,058

(see Table 4.2)

Yes (9+1

categories)

300 (S=3)

(b)200～ 300 (S=2～ 8)

(c)

Questionnaire(see Table 4.3 in detail)

fixed and free (see Table 4.3) 3,993 135+35

Yes (2

categories)

135+35

No. 154. Experimental Results4. Experimental Results

4.2 Classification problem: (a)

• Experimental data: Mainichi Newspaper in ‘94 (in Japanese) 300 article, 3 categories (free format only)

Conditions of (a)

category contents # articles DL+DT

# used for training DL

# used for test DT

C1 business 100 50 50

C2 local 100 50 50

C3 sports 100 50 50

total 300 150 150

Table 4.2: Selected categories of newspaper

• LSI : K = 81PLSI: K = 10


Results of (a)

methodfrom

Ck

to Ck

C1 C2 C3

VS method

C1 17 4 29

C2 8 38 4

C3 15 4 31

LSI method

C1 16 6 28

C2 6 43 1

C3 12 5 33

PLSI methodC1 41 0 9

C2 0 47 3

C3 13 6 31

Proposed method

C1 47 0 3

C2 0 50 0

C3 4 2 44

Table 4.5: Classified number form Ck to for each method

kC ˆ


Method Classification error

VSM 42.7%

LSI 38.7%

PLSI 20.7%

Proposed method

6.0%

Table 4.6: Classification error rate


Clustering process by EM algorithm

step : 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 1024

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 2048

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツ

step : 4096

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d)

経済

スポーツ

社会

経済社会

スポーツsports

localbusiness

business

sports

local


4.3 Classification Problem: (b)

We choose S = 2, 3, …, 8 categories, each of which contains DL=100～ 450 articles randomly chosen. The half of them DL is used for training, and the rest of them DT, for test.

Condition of (b)


Results of (b)

Fig. 4.2: Classification error rate for DL


Results of (b)

Fig. 4.3: Classification error rate for S


4.4 Clustering Problem: (c)

Format

Number of questions

Examples

Fixed (item)

7 major questions2

- For how many years have you used computers?- Do you have a plan to study abroad?- Can you assemble a PC?- Do you have any license in information technology?- Write 10 terms in information technology which you know4.

Free (text)

5 questions3

- Write about your knowledge and experience on computers.- What kind of job will you have after graduation?- What do you imagine from the name of the subject?

Table 4.3: Contents of initial questionnaire

2 Each question has 4-21 minor questions.3 Each text is written within 250-300 Chinese and Japanese characters.4 There is a possibility to improve the performance of the proposed method by elimination of these items.

Student Questionnaire


4.4 Clustering Problem: (c)

Table 4.4: Object classes

Name of subject CourseNumber of students

Introduction to Computer Science

(Class CS)Science Course 135

Introduction to Information Society

(Class IS)Literary Course 35

Object classes


Condition of (c)

I) First, the documents of the students in Class CS and those in Class IS are merged.

II) Then, the merged documents are divided into two class (S=2) by the proposed method.

Class

CS

ClassIS

True class

Merge Clustering by the proposed method

Clustering error C(e)

Fig.4.4 Class partition problem by clustering method

No. 25

initial value

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 16

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 32

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 64

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 128

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 256

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.70step: 512

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 16

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 32

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 64

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 128

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 256

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.75step: 512

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 16

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 32

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 64

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 128

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 256

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 512

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.80step: 1024

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 16

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 32

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 64

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 128

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 256

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.85step: 512

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 16

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 32

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 64

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 128

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.90step: 256

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 16

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 32

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 64

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 128

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 256

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 0.95step: 512

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS


0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 16

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 32

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 64

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 128

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 256

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

beta: 1.00step: 512

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS


0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)

学生 CS

IS

students

Results of (c)

2

2

25.0

S

K

Fig.4.7 Clustering process by EM algorithm, K=2

No. 26

similarity

No. 27

初期値

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.70step : 1024

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.75step : 1024

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.80step : 1024

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.85step : 1024

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.90step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 0.95step : 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 16

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 32

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 64

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 128

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 256

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 1024

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

β : 1.00step : 2048

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)

p(z 2

|d) CS IS

C1C3

C2

4. Experimental Results4. Experimental Results

2

3

25.0

S

K

S=K=2 C(e)=0.411K-means method

Fig.4.7 Clustering process for EM algorithm, K=3

No. 28

C(e) : the ratio of the number of students in the difference set between divided two classes and the original classes to the number of the total students.


Fig.4.5 Clustering error rate C(e) vs. λ

(Item only)(Text only)


Fig.4.6 Clustering error rate C(e) vs. K

No. 304. Experimental Results4. Experimental ResultsResults of (c)

Statistical analysis by discriminant analysis

jjj xaxaxaaz 5522110

ISClass:0

CSClass:0

j

j

dz

dz


Another Experiment

ClassCS

CassG

ClassS

Clustering by the proposed method

Clustering error C(e)

Clustering for class partition problem

Only form IQ

Fig. Another Class partition problem by clustering method

S: Specialist

G: Generalist


(1) Member of students in each class

class

Characteristics of students

student's selection

S - Having a good knowledge of technical terms- Hoping the evaluation by exam

G - Having much interest in use of a computer

Clustering

S- Having much interest in theory - Having higher motivation for a graduate school

G- Having much interest in use of a computer- Having a good knowledge of system using the computer

No. 33

(2) Member of students in each class

By discriminant analysis, two classes are evaluated for each partition which are interpreted in table 5. The most convenient case for characteristics of students should be chosen.


No. 345. Concluding Remarks5. Concluding Remarks5. Concluding Remarks

(1) We have proposed a classification method for a set of documents and extend it to a clustering method.

(2) The classification method exhibits its better performance for a document set with comparatively small size by using the idea of the PLSI model.

(3) The clustering method also has good performance. We show that it is applicable to documents with both fixed and free formats by introducing a weighting parameter λ.

(4) As an important related study, it is necessary to develop a method for abstracting the characteristics of each cluster [HIIGS03][IIGSH03-b].

(5) An extension to a method for a set of documents with comparatively large size also remains as a further investigation.