Presentation on experimental setup for verifying - "Slow Learners are Fast"
Transcript of Presentation on experimental setup for verifying - "Slow Learners are Fast"
Page 1
Machine Learning on Cell Processor
Supervisor: Dr. Eric McCreath
Student: Robin Srivastava
Page 2
Background and Motivation
Machine Learning
Batch Learning
Online Learning
[Diagram: a stream of emails (Email-1, email-2, …, Email-N) being classified as HAM or SPAM]
Page 3
Background and Motivation
Machine Learning
Sequential in Nature
Batch Learning
Online Learning
[Diagram: a stream of emails (Email-1, email-2, …, Email-N) being classified as HAM or SPAM]
Page 4
Objective
Performance evaluation of a parallel online machine learning algorithm (Langford et al. [1])
Target Machines
Cell Processor: one 3 GHz 64-bit IBM PowerPC core, six specialized co-processors
Intel Dual Core Machine: 2 GHz dual-core processor, 1.86 GB of main memory
Page 5
Stochastic Gradient Descent
Step 1: Initialize the weight vector w0 with some arbitrary values.
Step 2: Update the weight vector as follows:
    w(t+1) = w(t) − η ∇E(w(t))
where ∇E is the gradient of the error function and η is the learning rate.
Step 3: Repeat Step 2 for all the data units.
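The three steps above can be sketched in a few lines of Python/NumPy. This is an illustrative sketch only — the project's actual implementation is for the Cell processor, and the function names `sgd_step` and `sgd` are hypothetical:

```python
import numpy as np

def sgd_step(w, grad_E, eta):
    """Step 2: w(t+1) = w(t) - eta * gradient of the error at w(t)."""
    return w - eta * grad_E(w)

def sgd(w0, grad_E, eta, steps):
    """Steps 1-3: start from an arbitrary w0 and apply the update repeatedly."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = sgd_step(w, grad_E, eta)
    return w

# Toy error function E(w) = ||w||^2 / 2, so grad E(w) = w; the minimum is at 0.
w = sgd(np.array([1.0, -2.0]), grad_E=lambda w: w, eta=0.1, steps=100)
```

With a small enough learning rate the iterates shrink toward the minimizer, which is the behavior the later slides parallelize.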
Page 6
Delayed Stochastic Gradient Descent
Step 1: Initialize the weight vector w0 with some arbitrary values.
Step 2: Update the weight vector as follows:
    w(t+1) = w(t) − η ∇E(w(t−τ))
where ∇E is the gradient of the error function, η is the learning rate, and τ is the update delay.
Step 3: Repeat Step 2 for all the data units.
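The only change from plain SGD is that the gradient is evaluated at the weight vector from τ steps ago. A minimal single-threaded sketch of that delayed update (illustrative only — in the parallel algorithm the delay arises from workers exchanging stale weights, and `delayed_sgd` is a hypothetical name):

```python
from collections import deque
import numpy as np

def delayed_sgd(w0, grad_E, eta, tau, steps):
    """Delayed update: w(t+1) = w(t) - eta * grad E(w(t - tau)).

    Keeps the last tau+1 iterates so the gradient can be evaluated at the
    weights from tau steps ago; while t < tau, the oldest available iterate
    (w0) is used instead.
    """
    w = np.asarray(w0, dtype=float)
    history = deque([w], maxlen=tau + 1)  # history[0] is w(t - tau) once full
    for _ in range(steps):
        w = w - eta * grad_E(history[0])  # gradient at the *stale* weights
        history.append(w)
    return w

# Same toy error as before: grad E(w) = w.  With a small eta the iterates
# still converge despite the stale gradients.
w = delayed_sgd(np.array([1.0, -2.0]), grad_E=lambda w: w, eta=0.05, tau=3, steps=200)
```

Setting `tau=0` recovers ordinary SGD, which is a useful sanity check when experimenting with the delay.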
Page 7
Implementation Model
[Diagram: the complete dataset in the implementation model]
Page 8
Implementation
Dataset: TREC 2007 Public Corpus
Number of mails: 75,419; each mail classified as either ‘ham’ or ‘spam’
Pre-processing
Total number of features extracted: 2,218,878
Pre-processed email format:
<Number of features><space><index>:<count><space>…………..<index>:<count>
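A minimal parser for this sparse line format might look as follows. This is a hypothetical helper written from the format string on the slide; the presentation does not show its actual pre-processing code:

```python
def parse_email_line(line):
    """Parse one pre-processed email of the form
    '<number of features> <index>:<count> <index>:<count> ...'
    and return (n_features, {index: count}).
    """
    fields = line.split()
    n_features = int(fields[0])
    counts = {}
    for pair in fields[1:]:
        idx, cnt = pair.split(":")
        counts[int(idx)] = int(cnt)
    return n_features, counts

# Example line with three active features (indices and counts are made up).
n, counts = parse_email_line("3 5:2 17:1 42:4")
```

A sparse index:count representation like this is what makes 2,218,878-dimensional feature vectors practical to store per email.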
Page 9
Memory Requirement
Algorithm implemented: online logistic regression with delayed update
Requirement per level of parallelization:
Two private copies of the weight vector
Two shared copies of the weight vector
Two error gradients
Required dimension of each = number of features = 2,218,878
Data type: float (4 bytes on the Cell)
Total = (6 × 2,218,878) × 4 = 53,253,072 bytes = 50.78 MB, plus the size occupied by other auxiliary variables
Alternatively: let only the shared copies use the full dimension
Total size = (2 × 2,218,878) × 4 bytes = 16.9 MB + others
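The totals above follow directly from the vector dimensions; a quick check of the arithmetic (illustrative only, using the figures from the slide):

```python
FEATURES = 2_218_878   # features extracted from the TREC 2007 corpus
FLOAT_BYTES = 4        # a float occupies 4 bytes on the Cell

# Six full-dimension vectors: 2 private + 2 shared weight copies + 2 gradients.
full = 6 * FEATURES * FLOAT_BYTES        # 53,253,072 bytes
full_mb = full / (1024 * 1024)           # ~50.78 MB

# Alternative: only the two shared copies keep the full dimension.
shared_only = 2 * FEATURES * FLOAT_BYTES  # 17,751,024 bytes
shared_mb = shared_only / (1024 * 1024)   # ~16.9 MB
```

Either way the footprint dwarfs the 256 KB of SPE local store discussed on the next slide, which is what forces the feature-reduction work-around.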
Page 10
Limitations on Cell
Memory limitation of the SPE
Available: 256 KB; required: approx. 51 MB
Work-around: reduced the number of features via one more level of pre-processing
SIMD limitation
The time spent preparing the data for SIMD outweighed its benefits for this implementation
Page 11
Results
The serial implementation of logistic regression on the Intel dual-core machine took 36.93 s and 36.45 s for two consecutive executions.
Parallel implementation using stochastic gradient descent
Page 12
Results (contd.)
Performance on Cell
[Chart: execution time in microseconds]
Page 13
References
[1] John Langford, Alexander J. Smola and Martin Zinkevich. Slow Learners are Fast. Journal of Machine Learning Research 1 (2009).
[2] Michael Kistler, Michael Perrone, Fabrizio Petrini. Cell Multiprocessor Communication Network: Built for Speed.
[3] Thomas Chen, Ram Raghavan, Jason Dale and Eiji Iwata. Cell Broadband Engine Architecture and its First Implementation.
[4] Jonathan Bartlett. Programming High-Performance Applications on the Cell/B.E. Processor, Part 6: Smart Buffer Management with DMA Transfers.
[5] Introduction to Statistical Machine Learning, 2010 course assignment 1.
[6] Christopher Bishop. Pattern Recognition and Machine Learning.