Jhih-sin Jheng 2009/09/01 Machine Learning and Bioinformatics Laboratory.
Reference
Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang. Measurement and Classification of Humans and Bots in Internet Chat. Department of Computer Science, The College of William and Mary. USENIX Security, 2008.
Outline: Background, Measurement, Classification System, Experimental Evaluation, Conclusion
Chat Bots vs. BotNets
BotNets – networks of compromised machines; some use chat systems (IRC) for command and control (C&C), others use P2P, HTTP, etc.; they abuse various systems
Chat Bots – automated chat programs; some are helpful (e.g., chat loggers), but they can abuse chat systems and their users: send spam, spread malicious software, mount phishing attacks
Our focus is on the Yahoo! Chat system.
Measurement
August–November 2007 – we collect data
August 2007 – Yahoo! adds CAPTCHA; very few chat bots
October 2007 – bots are back
Measurement: August and November 2007
many chat bots: 1,440 hours of chat logs, 147 chat logs, 21 chat rooms
Measurement
To create our dataset, we read and label the chat users as human, bot, or ambiguous.
In total, we recognized 14 different types of chat bots, with different triggering mechanisms and different text generation techniques.
Types of Chat Bots
Periodic Bots – send messages based on periodic timers
Random Bots – send messages based on random timers
Responder Bots – respond to messages of other users
Replay Bots – replay messages of other users
Humans
inter-message delay – evidence of a heavy tail
message size – well fit by an Exponential distribution (λ=0.034)
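The exponential fits can be made concrete with the standard maximum-likelihood estimator for an exponential rate, λ̂ = 1/(sample mean), whose implied mean is 1/λ. A minimal sketch follows; the slides do not say how the fits were computed, and the sample sizes below are made-up values used only to exercise the code.

```python
import math
from statistics import fmean

def fit_exponential_rate(sizes):
    """Maximum-likelihood estimate of an exponential rate: lambda_hat = 1 / sample mean."""
    if not sizes:
        raise ValueError("need at least one observation")
    mean = fmean(sizes)
    if mean <= 0:
        raise ValueError("sizes must have a positive mean")
    return 1.0 / mean

def exponential_cdf(x, rate):
    """P(X <= x) for an Exponential(rate) random variable."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

# Hypothetical message sizes, NOT data from the study
sample_sizes = [10, 25, 40, 5]
print(f"lambda_hat = {fit_exponential_rate(sample_sizes):.3f}")  # mean 20 -> 0.050

# Implied mean message sizes for the rates reported on the slides
for label, rate in [("humans", 0.034), ("replay bots", 0.028)]:
    print(f"{label}: mean = {1 / rate:.1f}, P(size <= 50) = {exponential_cdf(50, rate):.3f}")
```

With λ=0.034 the implied mean message size is about 29.4, and with λ=0.028 (the replay bots' fit) it is about 35.7, in whatever size unit the original measurement used.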
Periodic Bots
inter-message delay – several clusters with high probabilities
message size – messages built from templates; sizes approximate a normal distribution
Random Bots
inter-message delay – Equilikely (discrete uniform) distribution at 40, 64, and 88; Uniform distribution over 45–125
message size – messages selected from a small database
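The two Random Bot timers can be imitated with Python's `random` module: an Equilikely (discrete uniform) choice among 40, 64, and 88, and a continuous Uniform draw between 45 and 125. This is a hypothetical simulation, not code from the study; the function names and the fixed seed are illustrative, and the slide does not state the delay units.

```python
import random

def equilikely_delay(rng, choices=(40, 64, 88)):
    """Discrete uniform ('Equilikely') delay: each listed value is equally likely."""
    return rng.choice(choices)

def uniform_delay(rng, low=45.0, high=125.0):
    """Continuous uniform delay between low and high."""
    return rng.uniform(low, high)

def simulate_delays(sampler, n, seed=0):
    """Draw n inter-message delays from a timer model using a seeded generator."""
    rng = random.Random(seed)
    return [sampler(rng) for _ in range(n)]

eq = simulate_delays(equilikely_delay, 1000)
un = simulate_delays(uniform_delay, 1000)
print(sorted(set(eq)))                   # [40, 64, 88]
print(min(un) >= 45, max(un) <= 125)     # True True
```

Because the Equilikely timer produces only three distinct delays, its first-order entropy is at most log2(3), about 1.585 bits, which is the kind of regularity an entropy-based test can pick up.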
Responder Bots
inter-message delay – human-like timing
message size – multiple templates of different lengths
Replay Bots
inter-message delay – a cluster with high probability (replay bots are periodic)
message size – human-like sizes, well fit by an Exponential distribution (λ=0.028)
Classification System
Entropy Classifier – detects abnormal behavior based on message sizes and inter-message delays; accurate but slow
Machine Learning Classifier – detects “learned” patterns based on message content; fast but must be trained
Entropy Classifier
Observation – chat bots are less complex than humans, and thus lower in entropy; the classifier exploits the low entropy of chat bots
Corrected Conditional Entropy Test (CCE) – estimates higher-order entropy
Entropy Test (EN) – estimates first-order entropy
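The two entropy tests can be sketched as follows. EN is the Shannon entropy of the binned values. For CCE, this sketch assumes the corrected conditional entropy of Porta et al.: on the binned symbols, CE(m) = EN(patterns of length m) − EN(patterns of length m−1), CCE(m) = CE(m) + perc(m)·EN(1), where perc(m) is the fraction of length-m patterns that occur only once, and the estimate is the minimum over m. The equal-width binning, the number of bins (5), and the maximum pattern length (6) are assumptions for illustration, not parameters taken from the slides.

```python
import math
import random
from collections import Counter

def quantize(values, num_bins=5):
    """Map each value to one of num_bins equal-width bins (assumed binning scheme)."""
    if not values:
        raise ValueError("need at least one value")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

def pattern_stats(symbols, m):
    """Entropy (bits) of length-m sliding-window patterns, and the fraction of windows whose pattern occurs once."""
    windows = [tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1)]
    counts = Counter(windows)
    total = len(windows)
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    unique_fraction = sum(1 for c in counts.values() if c == 1) / total
    return entropy, unique_fraction

def first_order_entropy(values, num_bins=5):
    """EN: first-order entropy of the binned values."""
    entropy, _ = pattern_stats(quantize(values, num_bins), 1)
    return entropy

def corrected_conditional_entropy(values, num_bins=5, max_m=6):
    """CCE: minimum over m of CE(m) + perc(m) * EN(1), following Porta et al."""
    symbols = quantize(values, num_bins)
    en1, _ = pattern_stats(symbols, 1)
    prev = 0.0
    best = float("inf")
    for m in range(1, min(max_m, len(symbols)) + 1):
        h_m, perc_m = pattern_stats(symbols, m)
        # Sliding-window estimates can dip slightly below zero; true CE cannot
        ce = max(0.0, h_m - prev)
        best = min(best, ce + perc_m * en1)
        prev = h_m
    return best

# A two-delay periodic timer: alternating 60 and 90
alternating = [60.0, 90.0] * 50
print(first_order_entropy(alternating))            # 1.0
print(corrected_conditional_entropy(alternating))  # 0.0

# Hypothetical human-like delays (exponential); the values depend on the seed
rng = random.Random(7)
human_like = [rng.expovariate(1 / 30) for _ in range(100)]
print(first_order_entropy(human_like), corrected_conditional_entropy(human_like))
```

The alternating sequence shows why the higher-order test matters: its first-order entropy is a full bit, yet once the previous delay is known the next one is fully determined, so CCE drops to 0.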
Machine Learning Classifier
Observation – chat spam, like email spam, is a text classification problem; the classifier exploits the message content of chat bots
CRM114 – a powerful text classification system
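CRM114 is a separate text-classification tool with its own classifiers and configuration language, and its interface is not described on these slides. To show only the underlying idea of treating chat messages as a text classification problem, the sketch below swaps in a plain multinomial naive Bayes classifier over word tokens; it is not CRM114, and the class name and the training messages are hypothetical.

```python
import math
import re
from collections import Counter

LABELS = ("bot", "human")

def tokenize(text):
    """Lowercase word tokens from a chat message (deliberately simple)."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesChatClassifier:
    """Multinomial naive Bayes with add-one smoothing; a stand-in, not CRM114."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.message_counts = Counter()

    def train(self, message, label):
        self.word_counts[label].update(tokenize(message))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size, total_messages):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        for tok in tokens:
            # Laplace smoothing so unseen words do not zero out the score
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return score

    def classify(self, message):
        if any(self.message_counts[label] == 0 for label in LABELS):
            raise ValueError("train on at least one message per label first")
        tokens = tokenize(message)
        vocab_size = len(set().union(*self.word_counts.values()))
        total_messages = sum(self.message_counts.values())
        return max(LABELS, key=lambda label: self._log_score(tokens, label, vocab_size, total_messages))

clf = NaiveBayesChatClassifier()
# Hypothetical training messages, NOT from the study's corpora
for msg in ["click here for free pics", "free webcam click my profile", "check my profile for pics"]:
    clf.train(msg, "bot")
for msg in ["lol anyone here from texas", "brb getting food", "haha that movie was great"]:
    clf.train(msg, "human")

print(clf.classify("free pics click here"))  # bot
print(clf.classify("lol that was great"))    # human
```

Like email spam filters, this approach only recognizes patterns present in its training corpora, which matches the slides' note that the machine learning classifier is fast but must be trained.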
Hybrid Classification System
the entropy classifier builds and maintains the bot corpus
the machine learning classifier uses the bot and human corpora
[Figure: hybrid classification system diagram with INPUT, ENTROPY CLASSIFIER, MACHINE LEARNING CLASSIFIER, BOT CORPUS, HUMAN CORPUS, and the outputs CLASSIFY AS CHAT BOT and CLASSIFY AS HUMAN]
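The slide fixes only the roles of the two parts: the entropy classifier builds and maintains the bot corpus, and the machine learning classifier uses the bot and human corpora. A minimal sketch of that control flow follows, assuming that the ML classifier decides on the first 25 messages, that the entropy classifier decides once 100 messages are available (the sample sizes from the evaluation), that users the entropy classifier flags are added to the bot corpus and trigger retraining, and that the human corpus is supplied up front. The class, the callables, and the toy components are hypothetical placeholders, not the study's implementation.

```python
from typing import Callable, List

# Sample sizes borrowed from the evaluation setup; treated here as assumptions
ML_WINDOW = 25
ENTROPY_WINDOW = 100

MLPredicate = Callable[[List[str]], bool]

class HybridClassifier:
    """Hypothetical hybrid flow: entropy decisions grow the bot corpus, which retrains the ML classifier."""

    def __init__(self, entropy_says_bot: Callable[[List[str], List[float]], bool],
                 train_ml: Callable[[List[str], List[str]], MLPredicate],
                 human_corpus: List[str]):
        self.entropy_says_bot = entropy_says_bot
        self.train_ml = train_ml
        self.human_corpus = list(human_corpus)
        self.bot_corpus: List[str] = []
        self.ml_says_bot = train_ml(self.bot_corpus, self.human_corpus)

    def classify(self, messages: List[str], delays: List[float]) -> str:
        # Fast path: content-based ML classifier on a short sample
        if len(messages) >= ML_WINDOW and self.ml_says_bot(messages[:ML_WINDOW]):
            return "bot"
        # Slow path: the entropy classifier needs a longer sample
        if len(messages) >= ENTROPY_WINDOW:
            if self.entropy_says_bot(messages[:ENTROPY_WINDOW], delays[:ENTROPY_WINDOW]):
                self.bot_corpus.extend(messages)  # maintain the bot corpus
                self.ml_says_bot = self.train_ml(self.bot_corpus, self.human_corpus)
                return "bot"
            return "human"
        return "undecided"  # not enough messages yet

# Toy components for illustration only
def toy_entropy(messages, delays):
    return len(set(delays)) == 1  # a single repeated delay looks like a periodic timer

def toy_train(bot_corpus, human_corpus):
    known = set(bot_corpus)  # this toy ignores the human corpus
    return lambda msgs: any(m in known for m in msgs)

hybrid = HybridClassifier(toy_entropy, toy_train, human_corpus=["hi all"])
spam = ["visit my site"] * 100
print(hybrid.classify(spam, [30.0] * 100))          # bot (entropy classifier, then retrain)
print(hybrid.classify(spam[:25], [30.0] * 25))      # bot (ML classifier, now trained)
print(hybrid.classify(["hello"] * 10, [5.0] * 10))  # undecided
```

The design choice this illustrates is the division of labor in the conclusion: the fast content-based path handles bots already in the corpus, while the slower entropy path catches new bots and feeds them back into training.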
Experimental Evaluation
Types of Chat Bots: Periodic Bots, Random Bots, Responder Bots, Replay Bots
Classifiers: entropy classifier – 100 messages; machine learning classifier – 25 messages
Experimental Evaluation: Classification Tests
Ent – entropy classifier
SupML – fully-supervised ML classifier, trained on AUG BOTS
SupMLre – fully-supervised ML classifier, retrained on NOV BOTS
EntML – entropy-trained ML classifier, trained on AUG BOTS
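TP: bots detected / bots tested; FP: humans classified as bots / humans tested
Test | AUG periodic (TP) | AUG random (TP) | AUG responder (TP) | NOV periodic (TP) | NOV random (TP) | NOV replay (TP) | Human (FP)
EN(imd) | 121/121 | 68/68 | 1/30 | 51/51 | 109/109 | 40/40 | 7/1713
CCE(imd) | 121/121 | 49/68 | 4/30 | 51/51 | 109/109 | 40/40 | 11/1713
EN(ms) | 92/121 | 7/68 | 8/30 | 46/51 | 34/109 | 0/40 | 7/1713
CCE(ms) | 77/121 | 8/68 | 30/30 | 51/51 | 6/109 | 0/40 | 11/1713
OVERALL | 121/121 | 68/68 | 30/30 | 51/51 | 109/109 | 40/40 | 17/1713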
Entropy Classifier: EN – entropy; CCE – corrected conditional entropy; (imd) – inter-message delay; (ms) – message size
EN(imd) and CCE(imd): problems against responder bots (1/30 and 4/30); they detect most other chat bots
EN(ms) and CCE(ms): problems against random and replay bots; they detect most other chat bots
OVERALL: detects all chat bots; the false positive rate is ~0.01 (17/1713), using 100 messages
Test | AUG periodic (TP) | AUG random (TP) | AUG responder (TP) | NOV periodic (TP) | NOV random (TP) | NOV replay (TP) | Human (FP)
Ent | 121/121 | 68/68 | 30/30 | 51/51 | 109/109 | 40/40 | 17/1713
SupML | 121/121 | 68/68 | 30/30 | 14/51 | 104/109 | 1/40 | 0/1713
SupMLre | 121/121 | 68/68 | 30/30 | 51/51 | 109/109 | 40/40 | 0/1713
EntML | 121/121 | 68/68 | 30/30 | 51/51 | 109/109 | 40/40 | 1/1713
Entropy and Machine Learning Classifiers
Ent – entropy classifier (from the last slide)
SupML – fully-supervised ML classifier, trained on AUG BOTS
SupMLre – fully-supervised ML classifier, retrained on NOV BOTS
EntML – entropy-trained ML classifier, trained on AUG BOTS
Ent: the OVERALL results of the entropy classifier tests
SupML has problems against the November bots (14/51, 104/109, and 1/40), so it needs to be retrained for new bots
SupMLre detects all bots
EntML: the false positive rate is ~0.0006 (1/1713), compared with ~0.01 for Ent, using 25 messages
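The rates quoted on these slides follow directly from the counts in the classifier comparison table. The short calculation below reproduces them; it uses only the table's numbers, and the dictionary layout is just a convenient container.

```python
# Counts from the classifier comparison table (AUG + NOV bots, 1713 humans)
BOTS_TESTED = 121 + 68 + 30 + 51 + 109 + 40   # 419
HUMANS_TESTED = 1713

# classifier -> (bots detected, humans misclassified as bots)
results = {
    "Ent": (121 + 68 + 30 + 51 + 109 + 40, 17),
    "SupML": (121 + 68 + 30 + 14 + 104 + 1, 0),
    "SupMLre": (121 + 68 + 30 + 51 + 109 + 40, 0),
    "EntML": (121 + 68 + 30 + 51 + 109 + 40, 1),
}

for name, (detected, false_pos) in results.items():
    print(f"{name:8s} TP rate {detected / BOTS_TESTED:.3f}   FP rate {false_pos / HUMANS_TESTED:.4f}")

# SupML restricted to the NOV BOTS columns
nov_tested = 51 + 109 + 40
supml_nov = 14 + 104 + 1
print(f"SupML on NOV BOTS: {supml_nov}/{nov_tested} = {supml_nov / nov_tested:.3f}")  # 119/200 = 0.595
```

Ent's 17/1713 works out to about 0.0099 (the ~0.01 on the slides), EntML's 1/1713 to about 0.00058, and SupML catches 338 of 419 bots overall but only 119 of the 200 November bots, which is why it needs retraining.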
Conclusion
Measurements: overall, chat bots are less complex than humans; some chat bots are more human-like
Classification System: exploits the benefits of both classifiers; quickly classifies known chat bots; accurately classifies unknown chat bots
Thank you!