No. 1
Classification and clustering methods Classification and clustering methods by probabilistic latent semantic by probabilistic latent semantic
indexing model indexing model
A Short Course at Tamkang UniversityTaipei, Taiwan, R.O.C., March 7-9, 2006
Shigeich HirasawaShigeich Hirasawa
Department of Industrial and ManagemDepartment of Industrial and Management Systems Engineering ent Systems Engineering
School of Science and EngineeringSchool of Science and EngineeringWaseda University, JapanWaseda University, Japan
[email protected]@hirasa.mgmt.waseda.ac.jp
No. 2
Format Example in paper archives matrix
Fixed forma
t
Items
- The name of authors- The name of journals- The year of publication- The name of publishers
- The name of countries- The year of publication- The citation link
Free forma
tText
The text of a paper - Introduction - Preliminaries ……. - Conclusion
1. Introduction1. Introduction
Document
DIG 1,0
DTH ,2,1,0
G = [ gmj ]: An item-document matrix
H = [ hij ] : A term-document matrix
dj : The j-th documentti : The i-th termim : The m-th item
gmj : The selected result of the m-th item (im ) in the j-th document (dj )
hij : The frequency of the i-th term (ti ) in the j-th document (dj )
1. Introduction
No. 32. Information Retrieval Model 2. Information Retrieval Model
Text Mining:- Information Retrieval including- Clustering- Classification
Information Retrieval ModelBase Model
Set theory(Classical) Boolean ModelFuzzyExtended Boolian Model
Algebraic
(Classical) Vector Space Model (VSM) [BYRN99]Generalized VSMLatent Semantic Indexing (LSI) Model [BYRN99]Neural Network Model
Probabilistic
(Classical) Probabilistic ModelExtended Probabilistic ModelProbabilistic LSI (PLSI) Model [Hofmann99]Inference Network ModelBayesian Network Model
2. Information Retrieval Model
No. 4
tf ( i,j ) = fij : The number of the i-th term (ti ) in the j-th document (dj ) (Local weight)
Weight wij is given by
The Vector Space Model (VSM)2. Information Retrieval Model 2. Information Retrieval Model
ijij aiidfjitfw )(),( ijij aiidfjitfw )(),(
idf ( i,j ) = log (D/df(i)) : General weight
df( i ) : The number of documents in D for which the term ti appears
(1)
No. 52. Information Retrieval Model 2. Information Retrieval Model
(term vector) ti = (ai1 , ai2 , … , aiD) : The i-th row
(document vector) dj = (a1j , a2j , … , aTj) : The j-th column
(query vector) q = (q1 , q2 , … , qT)T
The similarity s ( q , dj ) between q and dj :
)cosine(||||
),(T
T
j
jjdqs
dq
dq
(2)
No. 6
The Latent Semantic Indexing (LSI) Model
2. Information Retrieval Model 2. Information Retrieval Model
(1) be
intoeq.(2.1).
eq.(2.6)
T
D
minmin
No. 7
|ˆ||ˆ|
ˆˆ),()similality(
T
T
j
jjdqs
dq
dq
where
: the j-th canonical vector
(2)
2. Information Retrieval Model 2. Information Retrieval Model
be
11ˆ)vectorquery(
K
Kqq R
eq.(2.4)eq.(2.4)
No. 8
The Probabilistic LSI (PLSI) Model
2. Information Retrieval Model 2. Information Retrieval Model
(1) Preliminary
A) A=[aij], aij = fij :the number of a term ti
B) reduction of dimension similar to LSI
C) latent class (state model based on factor analysis)
D) (i) an independence between pairs ( ti , dj )(ii) a conditional independence between ti and dj
},max{ DTK
: a set of states
z td
)Pr( kz
)|Pr( kj zd )|Pr( ki zt(2.10)
(2.11)
No. 9
(2)
2. Information Retrieval Model 2. Information Retrieval Model
be
(2.13)
eq.eq.(2.1).(2.1).
No. 10
(3)
2. Information Retrieval Model 2. Information Retrieval Model
(2.14)
(2.17)
(2.16)
(2.15)
eq.eq.(2.11),(2.11),
eq.(2.13)eq.(2.13)
No. 113. Proposed Method3. Proposed Method3. Proposed Method
3.1 Classification method
categories: C1, C2, … , CK
(1) Choose a subset of documents D* ( D) which are already categorized and compute representative document vectors d*
1, d*2,
…, d*K:
where nk is the number of selected documents to compute the representative document vector from Ck.
(2) Compute the probabilities Pr(zk), Pr(dj|zk ) and Pr(ti|zk) which maximizes eq.(13) by the TEM algorithm, where ∥Z∥=K.
(3) Decide the state for dj as
)( ˆˆ kkCz
kjjkjkk
zddzdz ˆˆ )|Pr()|Pr(max
kj C
jk
k n d
dd1*
(3.1)
(3.2)
No. 12
(1) Choose a proper K (≥S) and compute the probabilities Pr(zk), Pr(dj|zk),
and Pr(ti|zk) which maximizes eq.(13) by the TEM algorithm, where ∥Z
∥ =K.
(2) Decide the state for dj as
(3.3)
If S=K, then kj cd ˆ
kjjkjkk
zddzdz ˆˆ )|Pr()|Pr(max
)( ˆˆ kkcz
3. Proposed Method3. Proposed Method3. Proposed Method
3.2 Clustering method
||Z|| = K : The number of latent states S : The number of clusters
SK
No. 133. Proposed Method3. Proposed Method
(3) If S<K, then compute a similarity measure s(zk, zk'):
(3.4)
(3.5)
and use the group average distance method with the similarity measure s(zk,zk') for agglomeratively clustering the states zk's until the n
umber of clusters becomes S. Then we have S clusters, and the members of each cluster are those of a cluster of states.
||||),(
'
'T
'kk
kkkk zzs
zz
zz
T21 ))|Pr(,),|Pr(),|(Pr( kTkkk ztztzt z
No. 144. Experimental Results4. Experimental Results4. Experimental Results
4.1 Document sets
Table 4.1: Document sets
contents format# words
Tamount categorize
# selected document
DL+DT
(a) articles of Mainichi news paper in ’94 [Sakai99]
Free(texts only)
107,835101,058
(see Table 4.2)
Yes (9+1
categories)
300 (S=3)
(b)200~ 300 (S=2~ 8)
(c)
Questionnaire(see Table 4.3 in detail)
fixed and free (see Table 4.3) 3,993 135+35
Yes (2
categories)
135+35
No. 154. Experimental Results4. Experimental Results
4.2 Classification problem: (a)
• Experimental data: Mainichi Newspaper in ‘94 (in Japanese) 300 article, 3 categories (free format only)
Conditions of (a)
category contents # articles DL+DT
# used for training DL
# used for test DT
C1 business 100 50 50
C2 local 100 50 50
C3 sports 100 50 50
total 300 150 150
Table 4.2: Selected categories of newspaper
• LSI : K = 81PLSI: K = 10
No. 164. Experimental Results4. Experimental Results
Results of (a)
methodfrom
Ck
to Ck
C1 C2 C3
VS method
C1 17 4 29
C2 8 38 4
C3 15 4 31
LSI method
C1 16 6 28
C2 6 43 1
C3 12 5 33
PLSI methodC1 41 0 9
C2 0 47 3
C3 13 6 31
Proposed method
C1 47 0 3
C2 0 50 0
C3 4 2 44
Table 4.5: Classified number form Ck to for each method
kC ˆ
No. 174. Experimental Results4. Experimental Results
Method Classification error
VSM 42.7%
LSI 38.7%
PLSI 20.7%
Proposed method
6.0%
Table 4.6: Classification error rate
No. 184. Experimental Results4. Experimental Results
Clustering process by EM algorithm
step : 0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 512
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 1024
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 2048
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツ
step : 4096
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d)
経済
スポーツ
社会
経済社会
スポーツsports
localbusiness
business
sports
local
No. 194. Experimental Results4. Experimental Results
4.3 Classification Problem: (b)
We choose S = 2, 3, …, 8 categories, each of which contains DL=100~ 450 articles randomly chosen. The half of them DL is used for training, and the rest of them DT, for test.
Condition of (b)
No. 204. Experimental Results4. Experimental Results
Results of (b)
Fig. 4.2: Classification error rate for DL
No. 214. Experimental Results4. Experimental Results
Results of (b)
Fig. 4.3: Classification error rate for S
No. 224. Experimental Results4. Experimental Results
4.4 Clustering Problem: (c)
Format
Number of questions
Examples
Fixed (item)
7 major questions2
- For how many years have you used computers?- Do you have a plan to study abroad?- Can you assemble a PC?- Do you have any license in information technology?- Write 10 terms in information technology which you know4.
Free (text)
5 questions3
- Write about your knowledge and experience on computers.- What kind of job will you have after graduation?- What do you imagine from the name of the subject?
Table 4.3: Contents of initial questionnaire
2 Each question has 4-21 minor questions.3 Each text is written within 250-300 Chinese and Japanese characters.4 There is a possibility to improve the performance of the proposed method by elimination of these items.
Student Questionnaire
No. 234. Experimental Results4. Experimental Results
4.4 Clustering Problem: (c)
Table 4.4: Object classes
Name of subject CourseNumber of students
Introduction to Computer Science
(Class CS)Science Course 135
Introduction to Information Society
(Class IS)Literary Course 35
Object classes
No. 244. Experimental Results4. Experimental Results
Condition of (c)
I) First, the documents of the students in Class CS and those in Class IS are merged.
II) Then, the merged documents are divided into two class (S=2) by the proposed method.
Class
CS
ClassIS
True class
Merge Clustering by the proposed method
Clustering error C(e)
Fig.4.4 Class partition problem by clustering method
No. 25
initial value
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 32
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 64
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 128
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 256
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.70step: 512
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 32
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 64
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 128
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 256
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.75step: 512
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 32
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 64
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 128
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 256
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 512
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.80step: 1024
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 32
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 64
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 128
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 256
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.85step: 512
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 32
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 64
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 128
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.90step: 256
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 32
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 64
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 128
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 256
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 512
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 0.95step: 1024
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 32
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 64
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 128
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 256
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 512
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
beta: 1.00step: 1024
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(z1|d)
学生 CS
IS
students
Results of (c)
2
2
25.0
S
K
Fig.4.7 Clustering process by EM algorithm, K=2
No. 26
similarity
No. 27
初期値
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 512
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.70step : 1024
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 512
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.75step : 1024
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 512
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.80step : 1024
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 512
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.85step : 1024
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.90step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 0.95step : 512
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 64
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 128
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 256
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 512
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 1024
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
β : 1.00step : 2048
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1p(z1|d)
p(z 2
|d) CS IS
C1C3
C2
4. Experimental Results4. Experimental Results
2
3
25.0
S
K
S=K=2 C(e)=0.411K-means method
Fig.4.7 Clustering process for EM algorithm, K=3
No. 28
C(e) : the ratio of the number of students in the difference set between divided two classes and the original classes to the number of the total students.
4. Experimental Results4. Experimental Results
Fig.4.5 Clustering error rate C(e) vs. λ
(Item only)(Text only)
No. 294. Experimental Results4. Experimental Results
Fig.4.6 Clustering error rate C(e) vs. K
No. 304. Experimental Results4. Experimental ResultsResults of (c)
Statistical analysis by discriminant analysis
jjj xaxaxaaz 5522110
ISClass:0
CSClass:0
j
j
dz
dz
No. 314. Experimental Results4. Experimental Results
Another Experiment
ClassCS
CassG
ClassS
Clustering by the proposed method
Clustering error C(e)
Clustering for class partition problem
Only form IQ
Fig. Another Class partition problem by clustering method
S: Specialist
G: Generalist
No. 324. Experimental Results4. Experimental Results
(1) Member of students in each class
class
Characteristics of students
student's selection
S - Having a good knowledge of technical terms- Hoping the evaluation by exam
G - Having much interest in use of a computer
Clustering
S- Having much interest in theory - Having higher motivation for a graduate school
G- Having much interest in use of a computer- Having a good knowledge of system using the computer
No. 33
(2) Member of students in each class
By discriminant analysis, two classes are evaluated for each partition which are interpreted in table 5. The most convenient case for characteristics of students should be chosen.
4. Experimental Results4. Experimental Results
No. 345. Concluding Remarks5. Concluding Remarks5. Concluding Remarks
(1) We have proposed a classification method for a set of documents and extend it to a clustering method.
(2) The classification method exhibits its better performance for a document set with comparatively small size by using the idea of the PLSI model.
(3) The clustering method also has good performance. We show that it is applicable to documents with both fixed and free formats by introducing a weighting parameter λ.
(4) As an important related study, it is necessary to develop a method for abstracting the characteristics of each cluster [HIIGS03][IIGSH03-b].
(5) An extension to a method for a set of documents with comparatively large size also remains as a further investigation.
Top Related