An Empirical Study on Selective Sampling in Active...
![Page 1: An Empirical Study on Selective Sampling in Active ...airweb.cse.lehigh.edu/2009/slides/Katayam-active... · 1 An Empirical Study on Selective Sampling in Active Learning for Splog](https://reader036.fdocuments.in/reader036/viewer/2022071002/5fbf0f638e17064ae43c7ade/html5/thumbnails/1.jpg)
1
An Empirical Study on Selective Sampling in Active Learning for Splog Detection
Taichi Katayama¹
Takehito Utsuro¹
Yuuki Sato²
Takayuki Yoshinaka³
Yasuhide Kawada⁴
Tomohiro Fukuhara⁵
¹University of Tsukuba, ²Konami Corporation, ³Tokyo Denki University, ⁴Navix Co., Ltd., ⁵University of Tokyo
AIRWeb 2009, April 21st, 2009, Madrid, Spain (WWW2009)
2
Background
• Opinion Mining from Blogs
• Splogs are Serious Noise in Opinion Mining
  – e.g., large-scale statistics (Mar. 2008)
    • 40% of Japanese blog articles in BuzzPulse, nifty were splogs (Oct. 2007 to Feb. 2008)
• Automatic detection is highly desirable.
3
keyword stuffed blog
4
Blog snippet retrieved with “FC Tokyo”
Rumor of “FC Tokyo” (a football team in Japan)
5
Blog snippet retrieved with “LOUIS VUITTON Key case”
pop-up advertisement automatically inserted by the blog host system
6
$50 Software Package for Massive Splog Creation
Featuring
• SEO
• Affiliate Program
[Figure: satellite sites sending in-links to a main site]
8
Previous studies on splog detection
• [P. Kolari 2007]
  – Words
  – URLs
  – Anchor texts
  – Links
  – HTML meta tags
• [Y.-R. Lin 2007]
  – Temporal self-similarities of
    • Posting time
    • Posting contents
    • Affiliated links
• [G. Mishne 2005]
  – Language models among the blog post, the comment, and pages linked by the comments
9
Evaluation with two data sets
“Do splogs change over time?”
1. Years 2007-2008 (720 sites)
2. Years 2008-2009 (720 sites)
10
[Figure: Recall/Precision curves with confidence measure, tested on 08-09 (40 sites). Two panels, splog detection and authentic blog detection, each plotting Precision (%) against Recall (%) for two training sets: 07-08 (720 sites), and 07-08 (360 sites) + 08-09 (360 sites).]
11
Purpose of This Research (1)
• Splog/authentic blog data sets need to be updated continuously, year by year
• How can human supervision be reduced?
• Can an active learning framework help?
12
Purpose of This Research (2)
• Optimal Strategies for Selective Sampling in Active Learning
• Guided by a Certain Confidence Measure
  – random samples
  – samples balanced with a confidence measure
  – samples with the least confidence
13
Outline
1. Definition of splog sites
2. Splog detection by machine learning
   – SVM
   – Confidence Measure
   – Features
3. Active learning
4. Evaluation
5. Future work
14
Definition of splog sites
• If one of the following holds for a given blog site, then it is most likely a splog
  – originally written text is not included
  – originally written text is included, but many
    • “links to affiliated sites” or
    • “advertisement articles” or
    • “articles with adult content”
    are included (judged individually by considering the contents of each blog)
• Otherwise, the given blog site is an authentic blog
15
Splog Detection by SVMs
• a tool
  – TinySVM
• the kernel function:
  – 2nd order linear
• confidence measure
  – the distance from the separating hyperplane to each test instance
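A minimal sketch of how such a confidence measure could be computed is given here. It is not TinySVM's API: the kernel form `(x·z + 1)^2`, the helper names, and the convention that a positive decision value means "splog" are all assumptions made for illustration. The absolute decision value is used as an unnormalized proxy for the distance from the separating hyperplane.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def poly2_kernel(x: Vector, z: Vector) -> float:
    """Assumed 2nd-order polynomial kernel: (x . z + 1)^2."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + 1.0) ** 2


def decision_value(x: Vector, support_vectors: List[Vector],
                   coeffs: List[float], bias: float) -> float:
    """f(x) = sum_i coeff_i * K(sv_i, x) + bias, with coeff_i = alpha_i * y_i."""
    return sum(c * poly2_kernel(sv, x)
               for sv, c in zip(support_vectors, coeffs)) + bias


def classify_with_confidence(x: Vector, support_vectors: List[Vector],
                             coeffs: List[float], bias: float) -> Tuple[str, float]:
    """Return (label, confidence); confidence is |f(x)| (assumed convention)."""
    f = decision_value(x, support_vectors, coeffs, bias)
    label = "splog" if f >= 0 else "authentic"
    return label, abs(f)
```

Instances with a small confidence lie close to the hyperplane; instances with a large confidence lie far from it on either side.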
16
A Confidence Measure
[Figure: splog and authentic blog instances on either side of the separating hyperplane, with a lower bound of the confidence measure drawn on each side: Lower Bound (splog) and Lower Bound (authentic blog).]
17
Features for splog detection
1. Total frequency of URLs not linked from splogs
2. Co-occurrence between noun phrases and splogs
   • Sum of $\phi^2(\text{splog}, \text{noun phrase } w)$
3. Noun phrases in anchor texts and linked URLs
   • Total frequency of anchor text noun phrases
     • in splogs
     • out-linked to splog URLs and Blacklist URLs
   • Total frequency of anchor text noun phrases
     • in splogs
     • out-linked to authentic blog URLs and Whitelist URLs
18
Feature1: URLs are not linked from splog
[Figure: URLs linked from splogs and authentic blogs.]
• Whitelist: URLs included only in authentic blogs, with more than one inward link from authentic blogs
• Blacklist: URLs included only in splogs, with more than one inward link from splogs
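The Whitelist/Blacklist construction can be sketched as follows. This is an illustrative reading, not the authors' code: the function name, the input layout (one collection of out-linked URLs per site), and the choice to count distinct linking sites rather than raw link occurrences are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def build_url_lists(splog_sites: List[Iterable[str]],
                    authentic_sites: List[Iterable[str]],
                    min_inlinks: int = 2) -> Tuple[Set[str], Set[str]]:
    """Return (whitelist, blacklist).

    Assumed reading: "more than one inward link" means at least
    `min_inlinks` = 2 distinct sites of one class link to the URL,
    and no site of the other class links to it.
    """
    splog_links = Counter(u for site in splog_sites for u in set(site))
    auth_links = Counter(u for site in authentic_sites for u in set(site))
    whitelist = {u for u, n in auth_links.items()
                 if n >= min_inlinks and u not in splog_links}
    blacklist = {u for u, n in splog_links.items()
                 if n >= min_inlinks and u not in auth_links}
    return whitelist, blacklist
```

A URL linked from both classes, or linked only once, ends up in neither list under this reading.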
19
Value of the Whitelist URLs feature
$$\sum_{u} \log\bigl(\text{total frequency of } u \text{ in the whole training instances of authentic blog homepages}\bigr) \times \bigl(\text{total frequency of } u \text{ in the test instance}\bigr)$$
u: Whitelist URLs
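A small sketch of this feature value follows. It assumes the reading "sum over Whitelist URLs u of log(training frequency of u) times test frequency of u"; the function and parameter names are hypothetical, and URLs with zero training frequency are skipped to keep the logarithm defined.

```python
import math
from collections import Counter
from typing import Iterable, Set


def whitelist_feature(test_urls: Iterable[str],
                      training_authentic_urls: Iterable[str],
                      whitelist: Set[str]) -> float:
    """Sum over Whitelist URLs u of log(train_freq[u]) * test_freq[u] (assumed form)."""
    train_freq = Counter(u for u in training_authentic_urls if u in whitelist)
    test_freq = Counter(u for u in test_urls if u in whitelist)
    return sum(math.log(train_freq[u]) * n
               for u, n in test_freq.items() if train_freq[u] > 0)
```

For example, a test homepage that links twice to a Whitelist URL seen four times in the authentic training homepages contributes `2 * log(4)` to the feature.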
21
Feature2: Noun Phrases
[Figure: the training set split into splogs and authentic blogs, each further split into instances that contain the noun phrase w and instances that do not (¬w).]
w: a noun phrase
freq(splog, w) = a, freq(splog, ¬w) = b
freq(authentic blog, w) = c, freq(authentic blog, ¬w) = d
22
Value of the splog noun phrase feature
$$\sum_{w} \log \phi^2(\text{splog}, w) \times \bigl(\text{total frequency of } w \text{ in the test instance}\bigr)$$

$$\phi^2(\text{splog}, w) = \frac{(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$
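The φ² statistic over the 2×2 table (a, b, c, d) can be sketched as below. The helper that derives the counts treats freq(·, w) as the number of training sites containing w, which is an assumption; the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def contingency(splog_sites: List[Set[str]], authentic_sites: List[Set[str]],
                w: str) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d), assuming freq counts sites that contain / lack w."""
    a = sum(1 for site in splog_sites if w in site)
    b = len(splog_sites) - a
    c = sum(1 for site in authentic_sites if w in site)
    d = len(authentic_sites) - c
    return a, b, c, d


def phi_squared(a: int, b: int, c: int, d: int) -> float:
    """phi^2 = (ad - bc)^2 / ((a+b)(a+c)(b+d)(c+d)); 0.0 if a marginal is empty."""
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        return 0.0
    return (a * d - b * c) ** 2 / denom
```

A noun phrase that appears in every splog and in no authentic blog gets φ² = 1; one that is spread evenly over both classes gets φ² = 0.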
24
Feature3:Noun Phrases in Anchor Texts and linked URLs
[Figure: a splog site s whose anchor texts (<a href=...>... w ...</a>) link out to Blacklist URLs / splog URLs, to Whitelist URLs / authentic blog URLs, and to other URLs.]
w: a noun phrase in anchor text
AncfB(w, s) = frequency of w in anchor texts of s out-linked to Blacklist URLs / splog URLs
AncfW(w, s) = frequency of w in anchor texts of s out-linked to Whitelist URLs / authentic blog URLs
25
Noun Phrases in Anchor Texts and linked URLs: two features
$$\sum_{w} \log\Bigl(\sum_{s} AncfB(w, s)\Bigr) \times AncfB(w, t)$$
the value of a feature named “anchor text noun phrase out-linked to Blacklist URLs” for a test instance blog homepage

$$\sum_{w} \log\Bigl(\sum_{s} AncfW(w, s)\Bigr) \times AncfW(w, t)$$
the value of a feature named “anchor text noun phrase out-linked to Whitelist URLs” for a test instance blog homepage

w: a noun phrase
s: a training splog homepage (the inner sum runs over the training splog homepages)
t: a test instance blog homepage
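Both anchor-text features share one shape, so a single helper can illustrate them. The sketch assumes the form "sum over w of log(sum over training splog homepages s of Ancf(w, s)) times Ancf(w, t)"; the function name and the `Counter`-based inputs are hypothetical.

```python
import math
from collections import Counter
from typing import List


def anchor_feature(train_counts_per_site: List[Counter],
                   test_counts: Counter) -> float:
    """train_counts_per_site[i][w] plays the role of Ancf(w, s_i);
    test_counts[w] plays the role of Ancf(w, t).
    Noun phrases unseen in training are skipped so the log stays defined.
    """
    total = Counter()
    for counts in train_counts_per_site:
        total.update(counts)  # inner sum over training splog homepages s
    return sum(math.log(total[w]) * n
               for w, n in test_counts.items() if total[w] > 0)
```

Passing counts restricted to Blacklist-linked anchors gives the AncfB feature, and counts restricted to Whitelist-linked anchors gives the AncfW feature.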
26
Framework of Active learning
• Pool of unlabeled instances (initial size of 3504: 1296 splogs and 2208 authentic blogs)
• Training set (initial size of 10: 4 splogs and 6 authentic blogs)
• Each cycle:
  1. Train an SVM classifier on the training set
  2. Selective sampling in active learning picks 4 unlabeled sites from the pool
  3. Human supervision labels the 4 sites
  4. The 4 labeled sites are added to the training set
• 250 cycles, up to 1010 training instances
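The cycle can be sketched as a generic pool-based loop. Everything here is illustrative: a one-dimensional threshold learner stands in for the SVM, the `oracle` callable stands in for human supervision, and the initial training set is kept separate from the pool, which is an assumption. With an initial set of 10 and 4 labels per cycle, 250 cycles give 10 + 250 × 4 = 1010 training instances.

```python
import random
from typing import Callable, List, Sequence, Tuple

Labeled = Tuple[float, int]  # (instance, label); 1 = splog, 0 = authentic blog


def train_threshold(labeled: Sequence[Labeled]) -> float:
    """Toy stand-in for SVM training: midpoint between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def select_random(model: float, unlabeled: Sequence[float], k: int,
                  rng: random.Random = random.Random(0)) -> List[int]:
    """Baseline strategy: pick k pool indices uniformly at random."""
    return rng.sample(range(len(unlabeled)), min(k, len(unlabeled)))


def active_learning(pool: Sequence[float], initial: Sequence[Labeled],
                    oracle: Callable[[float], int],
                    train: Callable = train_threshold,
                    select: Callable = select_random,
                    batch_size: int = 4, cycles: int = 250):
    """Run train -> select -> label -> add for the given number of cycles."""
    labeled = list(initial)
    unlabeled = list(pool)
    for _ in range(cycles):
        if not unlabeled:
            break
        model = train(labeled)
        picked = select(model, unlabeled, batch_size)
        # Pop from the highest index down so earlier pops do not shift later ones.
        for i in sorted(picked, reverse=True):
            x = unlabeled.pop(i)
            labeled.append((x, oracle(x)))  # the oracle plays the human labeler
    return train(labeled), labeled
```

Swapping `select` for a confidence-guided strategy changes only which 4 sites are sent to the labeler in each cycle.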
27
Statistics of Splog/Authentic Blogs Data Set
| Data Sets | # of splogs | # of authentic blogs | total |
|---|---|---|---|
| Years 2008-2009 | 1445 | 2459 | 3904 |
28
Strategies of selective sampling(1/2)
[Figure: four panels (Low, High, High/Low, Balanced), each showing splog and authentic blog instances around the separating hyperplane and the instances picked by that strategy.]
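One possible reading of the four panel names is sketched below. These definitions are assumptions inferred from the names alone, not taken from the slides: Low picks the instances with the smallest confidence, High the largest, High/Low half from each end, and Balanced spreads its picks evenly over the confidence ranking. Each function takes signed decision values and returns pool indices.

```python
from typing import List, Sequence


def _by_confidence(conf: Sequence[float]) -> List[int]:
    """Pool indices sorted by ascending |decision value|."""
    return sorted(range(len(conf)), key=lambda i: abs(conf[i]))


def select_low(conf: Sequence[float], k: int) -> List[int]:
    """Least confident instances (closest to the hyperplane)."""
    return _by_confidence(conf)[:k]


def select_high(conf: Sequence[float], k: int) -> List[int]:
    """Most confident instances (farthest from the hyperplane)."""
    return _by_confidence(conf)[::-1][:k]


def select_high_low(conf: Sequence[float], k: int) -> List[int]:
    """Half of the picks from the low end, the rest from the high end."""
    order = _by_confidence(conf)
    if k >= len(order):
        return order
    half = k // 2
    return order[:half] + order[len(order) - (k - half):]


def select_balanced(conf: Sequence[float], k: int) -> List[int]:
    """Picks spread evenly across the confidence ranking."""
    order = _by_confidence(conf)
    if k >= len(order):
        return order
    if k == 1:
        return [order[len(order) // 2]]
    step = (len(order) - 1) / (k - 1)
    return [order[round(j * step)] for j in range(k)]
```

Any of these can be wrapped to match a selection callback that first scores the pool with the current classifier.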
29
Strategies of selective sampling(2/2)
[Figure: four panels (Low-Sp/Au, High-Au, High/Low-Au, Balanced-Sp/Au), each showing splog and authentic blog instances around the separating hyperplane and the instances picked by that strategy.]
31
Measure for performance evaluation after active learning cycles
• Recall/Precision
  – Splog detection
    $$\text{recall} = \frac{|Ts(\text{splog}) \cap Ts(LBD_s)|}{|Ts(\text{splog})|} \qquad \text{precision} = \frac{|Ts(\text{splog}) \cap Ts(LBD_s)|}{|Ts(LBD_s)|}$$
  – Authentic blog detection is considered in a similar fashion
• “|Tr| = 3500”, “Random”
  – “|Tr| = 3500” indicates a classifier trained with the whole 3504 instances in the pool
  – “Random” indicates a classifier trained with randomly selected training instances
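Recall and precision at a given lower bound can be sketched as follows. The sketch assumes that Ts(LBD_s) is the set of test sites whose signed confidence on the splog side is at least the lower bound; the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def recall_precision(scores: Sequence[float], is_splog: Sequence[bool],
                     lower_bound: float) -> Tuple[float, float]:
    """scores[i] > 0 means the splog side; larger means more confident.

    detected  ~ Ts(LBD_s): test sites with score >= lower_bound (assumed reading)
    reference ~ Ts(splog): test sites whose reference label is splog
    """
    detected: List[int] = [i for i, s in enumerate(scores) if s >= lower_bound]
    reference = [i for i, y in enumerate(is_splog) if y]
    hits = sum(1 for i in detected if is_splog[i])
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(detected) if detected else 0.0
    return recall, precision
```

Sweeping `lower_bound` from high to low traces out a recall/precision curve: a high bound keeps only the most confident detections, which typically trades recall for precision.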
32
Lower Bound of the Confidence Measure
[Figure: splog and authentic blog instances around the separating hyperplane, with the lower bound of the confidence measure on the splog side, Lower Bound (splog). The set of test sites beyond this lower bound is labeled Ts(LBD_s).]
Ts(splog): the set of reference splog sites
34
[Figure: Recall/precision curve of splog detection. Precision (%) against Recall (%) for High/Low-Au, Low-Sp/Au, High-Au, Random, Balanced-Sp/Au, and |Tr|=3500.]
35
[Figure: Recall/precision curve of authentic blog detection. Precision (%) against Recall (%) for High/Low-Au, Low-Sp/Au, High-Au, Random, Balanced-Sp/Au, and |Tr|=3500.]
36
Evaluation results: comparison of strategies for selective sampling
[Figure: ranking of strategies (|Tr|=3500, Random, High/Low, Balanced, High, Low) for splog/authentic blog detection, contrasted with previous studies of active learning for text classification tasks (Low, Random).]
37
Support Vectors
• only the support vectors have an effect on deciding the position of the separating hyperplane
• the number of support vectors can be regarded as the complexity of the learning task
[Figure: instances around the separating hyperplane, with the support vectors marked.]
�
38
[Figure: Changes in # of support vectors. # of Support Vectors against # of Training Instances for High/Low-Au, Low-Sp/Au, High-Au, Random, and Balanced-Sp/Au. The curves are annotated in two groups: Random, High/Low, Balanced; and Low, High.]
39
Evaluation result: # of support vectors
• The number of support vectors increases linearly
• Performance of splog/authentic blog detection increases much more slowly
• About 20% of training instances are constantly selected as support vectors
• In this task, more effective features should be added.
40
[Figure: Change in maximum precision with recall as 30% of splog detection. Precision (%) against # of Training Instances for High/Low-Au, Low-Sp/Au, High-Au, Random, Balanced-Sp/Au, and |Tr|=3500.]
41
[Figure: Change in maximum precision with recall as 30% of authentic blog detection. Precision (%) against # of Training Instances for High/Low-Au, Low-Sp/Au, High-Au, Random, Balanced-Sp/Au, and |Tr|=3500.]
43
Future works
• Incorporating other features
  – Post time and intervals
  – HTML structures
• Manual examination of support vectors
44
Thanks for your attention