Cognitive Engine: Boosting Scientific Discovery
Scalable Software Systems Laboratory
Department of Electrical and Computer Engineering
CognitiveEngine: Boosting Scientific Discovery
Xiaolin Andy Li
http://www.andyli.ece.ufl.edu
Information Technology
Timeline: 1939, 1946, 1970, 1980, 1990, New Age
• ABC: John Atanasoff (BSEE@UF, 1925)
• ENIAC
• ARPANET, The Internet: Vint Cerf, Bob Kahn
• Fiber Optics: Charles Kuen Kao
• WWW: Tim Berners-Lee
• Mosaic Web Browser: Marc Andreessen and Eric Bina
• Martin Cooper, 1973; Steve Jobs, 2007
• 1G, 1980s; 2G, 1990s; 3G, 2000s; 4G, 2010s
Cloud Computing
• SaaS: Software as a Service
  • Salesforce, 1999
• StaaS: Storage as a Service
  • Amazon S3, 2006; Dropbox, 2008
• PaaS: Platform as a Service
  • Google App Engine, 2008; Microsoft Azure, 2010
  • Docker, 2013; IBM BlueMix, 2014
• IaaS: Infrastructure as a Service
  • Amazon AWS, 2002; Eucalyptus, 2008
  • Rackspace/NASA OpenStack, 2010; Google Compute Engine, 2012
2000
SDN: Software-Defined Networking
Google's OpenFlow WAN
Nick McKeown
Scott Shenker
Martin Casado
2009
Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng, Demis Hassabis
2013
2D IT Booming Cycles
1970 → 1990 → 2010 → 2030 →
IT Boom V1, IT Boom V2, IT Boom V3
3D Computing Platform Cycles
1950 → 1980 → 2010 → 2040
1st Platform, 2nd Platform, 3rd Platform, 4th Platform
Towards Intelligent Platform: IT Boom V4
Time for Change
[Figure labels: Current; Unified Big Systems]
• Hadoop, OpenStack, Torque, Pig, Dryad, Pregel, Percolator, CIEL
• Container, Virtual Machine, Bare Metal
GatorCloud - Towards Software-Defined Ecosystems
[Figure: architecture diagram with two stacks]
• Software-Defined Computing: SDC Apps, Runtime, Big Data, PBS/Torq, Virtual Machine, Container, Nova Controller, HPC, Program Models
• Software-Defined Networking: SDN Apps, Low Latency, High Throughput, SDN Hypervisor, OVS, OF-Config, OpenFlow, GENI, SDN Controller
GatorCloud Network Topology
Deployed in 2012, one of the first 100Gbps SDN Campus Research Networks in USA
[Figure: campus network topology diagram]
• Uplink: 2*10Gb/s upgraded to 2*100Gb/s; National Lambda Rail, Internet2, GENI (via Jacksonville); FLR
• Phases: Phase 1 SDN, 40G/10G; Phase 2 SDN, 100G
• Sites: UF Physics CMS/OSG Data Center; SSRB CNS Lab; NEB S3Lab; CISE Lab; ECDC HPC Center - ES; Physics HPC Center - Phy; Larsen HPC Center - Eng; SSRB Campus Datacenter; Larsen HCS Lab
• Control: GatorVisor, Apps Controller, Nets Controller, Hybrid Controller, Golfer, SDN Switch, SDN Control Plane
• Clouds: Data Cloud, VM Cloud, Cloud Portal, Cloud Orange, Cloud Green
• Link speeds shown: 100G, 40G, 10G
HiPerGator Supercomputer
Ranking from top500 supercomputer list
• # 4 among public universities in US
• # 8 among universities in US
• # 115 among all machines listed
Major Data Centers at UF
• HiPerGator Supercomputer
• CMS/OSG Physics HPC Centers
• ICBR: Interdisciplinary Center for Biotech Research
• CTSI: Clinical and Translational Science Institute
• ACIS/CAC Data Center
• CHREC Data Center (Novo-G)
• NEB Data Center
What Changed?
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 1
• Year 2010: NEC-UIUC [Lin CVPR 2011]. Dense grid descriptor: HOG, LBP; Coding: local coordinate, super-vector; Pooling, SPM; Linear SVM
• Year 2012: SuperVision [Krizhevsky NIPS 2012]
• Year 2014: GoogLeNet, VGG [Szegedy arxiv 2014] [Simonyan arxiv 2014]
• Year 2015: MSRA
Layer legend: Convolution, Pooling, Softmax, Other
Revolution of Depth
Engines of visual recognition
PASCAL VOC 2007 Object Detection mAP (%)
• HOG, DPM (shallow): 34
• AlexNet (RCNN) (8 layers): 58
• VGG (RCNN) (16 layers): 66
• ResNet (Faster RCNN)* (101 layers): 86
*w/ other improvements & more data
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.
Revolution of Depth
ImageNet Classification top-5 error (%)
• ILSVRC'15 ResNet (152 layers): 3.57
• ILSVRC'14 GoogleNet (22 layers): 6.7
• ILSVRC'14 VGG (19 layers): 7.3
• ILSVRC'13 (8 layers): 11.7
• ILSVRC'12 AlexNet (8 layers): 16.4
• ILSVRC'11 (shallow): 25.8
• ILSVRC'10 (shallow): 28.2
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.
Beyond Human
CognitiveEngine: Beyond Hadoop and Spark
• Bulk Synchronous Parallel
  • Both a blessing and a curse
  • Easy to schedule and arrange dependency
  • All synchronized
[Figure labels: Map, Reduce; Stage, Stage, Stage, Stage]
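The bulk-synchronous pattern named on this slide can be illustrated with a minimal sketch. It is not taken from the talk: the stage functions and the thread pool are illustrative assumptions. Each stage runs its tasks in parallel, but no task of the next stage starts until every task of the current stage has finished, which is why dependencies are easy to arrange and why one slow task holds back everyone.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bsp(partitions, stages, max_workers=4):
    """Run each stage over all partitions, with a barrier between stages.

    `stages` is a list of functions, each applied to one partition at a time
    (hypothetical names, for illustration only).
    """
    data = list(partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            # list(pool.map(...)) blocks until every task of this stage is
            # done: this is the global synchronization point (the barrier).
            data = list(pool.map(stage, data))
    return data


def square_all(part):
    """Map-like stage: transform every element of one partition."""
    return [x * x for x in part]


def partial_sum(part):
    """Reduce-like stage: collapse one partition to a single number."""
    return sum(part)


partials = run_bsp([[1, 2], [3, 4], [5, 6]], [square_all, partial_sum])
total = sum(partials)  # final reduce on the driver
```

The barrier is implicit in collecting all results of a stage before launching the next one; an asynchronous design removes exactly that wait.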
ADD Design Choices
• Asynchronous Distributed Datasets (ADD)
  • Inherits the easy-to-use programming interface
  • Differentiate static data (samples) and the iteratively updated data (parameters)
  • Automatic asynchronous updates, with user specified bound
  • Asynchronous-aware scheduling
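One common way to realize "asynchronous updates with a user specified bound" is a staleness bound on per-client iteration counts. The slides do not say how ADD enforces its bound, so the sketch below is only an assumption in that spirit; the class and method names are hypothetical.

```python
class BoundedStalenessClock:
    """Tracks per-client iteration counts and enforces a staleness bound.

    Assumption: a client may run at most `bound` iterations ahead of the
    slowest client. bound=1 degenerates to lock-step execution.
    """

    def __init__(self, client_ids, bound):
        self.bound = bound
        self.clock = {cid: 0 for cid in client_ids}

    def can_advance(self, client_id):
        slowest = min(self.clock.values())
        return self.clock[client_id] - slowest < self.bound

    def advance(self, client_id):
        """Record one finished iteration, or refuse if the client is too far ahead."""
        if not self.can_advance(client_id):
            return False  # caller should wait or retry instead of updating
        self.clock[client_id] += 1
        return True


clock = BoundedStalenessClock(["fast", "slow"], bound=2)
fast_steps = [clock.advance("fast") for _ in range(3)]  # third attempt is refused
slow_step = clock.advance("slow")                       # slow client catches up by one
fast_again = clock.advance("fast")                      # fast client may proceed again
```

A larger bound lets fast clients waste less time waiting, at the price of applying updates computed from older parameters.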
ADD System
[Figure labels: ADD Server (x2); ADD Client (x3); ADD local copy; Training samples (x3); Async push; Async pull; Feed Forward + Back Propagation]
ADD features
• Async push and pull of model update
• Users are allowed to specify the condition of returning from pull/push, so that they don’t have to wait
• Adaptive model update method: all-to-one/tree aggregation/P2P approximate update
• User-controllable tradeoff between asynchrony and convergence rate
• Model snapshot and sharing
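To make the difference between all-to-one and tree aggregation concrete, here is a small sketch. It is an illustration under assumptions, not the ADD implementation: updates are plain lists of integers, the input is non-empty, and the function names are hypothetical.

```python
def tree_aggregate(updates):
    """Sum equal-length update vectors pairwise, level by level.

    Returns (aggregated_vector, number_of_levels). Assumes `updates` is
    non-empty.
    """
    level = [list(u) for u in updates]
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # the odd one out moves up unchanged
        level = nxt
        depth += 1
    return level[0], depth


def all_to_one(updates):
    """Baseline: a single node sums every update itself."""
    return [sum(col) for col in zip(*updates)]


updates = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
agg, levels = tree_aggregate(updates)
```

Both functions produce the same sum; the tree needs about log2(n) rounds of pairwise merges instead of one node receiving all n updates.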
Execution
[Figure labels: Static Data; Dynamic Data; Handler Function; State; ADD Partition; ADD Task (x3); Locality, Iteration, etc.]
Task steps: Fetch, Compute, Update, Bookkeeping
Advantages
• Asynchronous Update
• IO / CPU overlap
• Fault tolerant
• Derive and live with state-of-the-art system
  • Spark
• Sharing among jobs and users
• Maximizing parallelism of GPUs
DeepApps
• DeepScience
• DeepSky
• DeepDefense
• DeepHealth
• DeepBipolar
• DeepVital
• DeepGuard
• DeepCancer
• DeepBot/Dingding
• DeepDrug
The animation shows how Kepler detects planets. As the planet passes between the host star and the spacecraft, the observed star brightness decreases slightly, signaling the potential detection of a planet. Kepler looked at over 150,000 stars continuously for four years in the constellations Cygnus and Lyra, seeking to record the slight periodic brightness changes in stars that could reveal the presence of planets.
Kepler detects planets by taking a photometric measurement of the stars in its field of view every 30 minutes. A planet transit will show as a small periodic dip in the “light curve” of a star over time.
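The idea of finding "a small periodic dip" can be sketched with phase folding on a synthetic light curve. This is a toy under stated assumptions, not the Kepler pipeline: the dips are box-shaped, there is no noise, times are counted in 30-minute cadences, and all function names and parameter values are made up for illustration.

```python
CADENCE_HOURS = 0.5  # one photometric measurement every 30 minutes


def synthetic_light_curve(n_samples, period, duration, depth, first_transit):
    """Flat light curve with box-shaped dips; times are in cadence units."""
    flux = []
    for t in range(n_samples):
        in_transit = (t - first_transit) % period < duration
        flux.append(1.0 - depth if in_transit else 1.0)
    return flux


def folded_dip_depth(flux, period, bin_width):
    """Fold the curve at `period` and return the depth of the deepest phase bin."""
    sums, counts = {}, {}
    for t, f in enumerate(flux):
        b = (t % period) // bin_width
        sums[b] = sums.get(b, 0.0) + f
        counts[b] = counts.get(b, 0) + 1
    deepest = min(sums[b] / counts[b] for b in sums)
    return 1.0 - deepest


def best_period(flux, candidate_periods, bin_width):
    """Pick the trial period whose folded curve shows the deepest dip."""
    return max(candidate_periods,
               key=lambda p: folded_dip_depth(flux, p, bin_width))


# 90 days of data with a 72-hour period and 3-hour transits (assumed values).
flux = synthetic_light_curve(n_samples=4320, period=144, duration=6,
                             depth=0.01, first_transit=12)
candidates = [100, 120, 144, 160, 180]  # trial periods in cadences
period = best_period(flux, candidates, bin_width=6)
period_hours = period * CADENCE_HOURS
depth = folded_dip_depth(flux, period, bin_width=6)
```

Folding at the true period stacks every transit into the same phase bin, so the dip keeps its full depth; at a wrong trial period the in-transit samples are smeared across bins and the dip is diluted. Long-period planets give few transits to stack, which is one reason they are harder to detect.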
Kepler Data
Goal: Detect planet(s) currently missed by the Kepler Team’s automatic search programs -- likely “super-Earths” with long periods
Quasar Spectra Pair Method
The identification of the 2175 Å bump is based on the Mg II absorber catalog, with limitations:
• We can only identify the 2175 Å bump in the redshift range from 0.7 to 2.5.
• The method is based on the Mg II absorber catalog. If the Mg II absorber catalog is not complete, the 2175 Å bump sample may not be complete.
Analysis of the Effects
(a) Input data with bumps
(b) Filters of the first convolutional layer
(c) Feature map of last convolutional layer
Reconstruction of Bumps
(d) Reconstructed input image with bump
(e) Reconstructed input image without bump
DeepDefense Architecture
[Figure labels: DataSequence1000, DataSequence2000, DataSequence3000, DataSequence4000; CNN (x5); stacked LSTM units; CTC; Spatial; Temporal, Recurrent, Cascading LSTM; BPTT; BPTS; Feature Analysis; Ensemble Analysis; Knowledge Fusion; Performance Evaluation; Searchable Outputs]
BPTT: Backpropagation Through Time
BPTS: Backpropagation Through Space
CNN: Convolutional Neural Network
LSTM: Long Short-Term Memory
CTC: Connectionist Temporal Classification
Data-Driven DeepHealth
With Azra Bihorac, Lizi Wu, Parisa Rashidi, etc.
Bipolar Disorder & Challenge Objectives
• Bipolar disorder is a brain disease that causes unusual mood shifts
• An estimated 51% of the affected population go untreated in a given year
• Detection is not straightforward: symptoms and test metrics are not too dissimilar from those of other brain diseases
• Recent studies indicate heritability and genetic factors as causes, opening a new area of detection using genome data
• CAGI challenge: predict bipolar disorder using exomes
• Exome sequencing data of 1000 samples, with 500 for training and 500 for the prediction challenge
Image source: http://www.nimh.nih.gov/health/statistics/prevalence/bipolar-disorder-among-adults.shtml
Data Pre-Processing
• Extracted genotype information from the exomes
• The genotypes were 0/0, 0/1, 1/1 and ./.
• One-hot-encoding transformation on the genotypes, i.e. 0/0 encoded as 0100, 0/1 encoded as 0010, etc.
• One-hot encoding treats all categorical values as equidistant
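The encoding step can be sketched as follows. Only the codes for 0/0 (0100) and 0/1 (0010) are given on the slide; the positions assigned to ./. and 1/1 below are assumptions, as are the function names.

```python
# Assumed category order, chosen so that 0/0 -> 0100 and 0/1 -> 0010 as on
# the slide; the positions of ./. and 1/1 are guesses.
GENOTYPES = ["./.", "0/0", "0/1", "1/1"]


def one_hot(genotype):
    """Return a 4-element 0/1 list with a single 1 at the genotype's index."""
    vec = [0] * len(GENOTYPES)
    vec[GENOTYPES.index(genotype)] = 1
    return vec


def encode_sample(genotypes):
    """Concatenate the one-hot codes of all variants of one sample."""
    row = []
    for g in genotypes:
        row.extend(one_hot(g))
    return row


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


row = encode_sample(["0/0", "0/1", "./."])
distances = {hamming(one_hot(a), one_hot(b))
             for a in GENOTYPES for b in GENOTYPES if a != b}
```

Every pair of distinct codes differs in exactly two positions, which is the sense in which one-hot encoding treats the categories as equidistant: no genotype is numerically "closer" to another.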
DeepBipolar V1: Convolutional DNN
Genotype data: 2008 * 1000 * 1
32 kernels, kernel size: 4*4*1, stride: (1,4)
32 kernels, kernel size: 3*3*32, stride: (1,1)
Max Pooling: pool size (3,3), stride=(3,2)
2 x 64 kernels, size: 3*3*32, stride: (1,1)
Max Pooling: size (1,3), stride=(3,3)
128 kernels, kernel size: 3*3*64, stride: (1,1)
128 kernels, kernel size: 3*3*128, stride: (1,1)
Max Pooling: size (2,2), stride=(2,2)
128 kernels, size: 3*3*128, stride: (1,1)
Max Pooling: size (3,3), stride=(2,2)
1 kernel, size: 1*1, stride: (1,1)
Fully Connected Layer, 64 neurons
Sigmoid - Probability Output Layer
Feature map sizes after each layer (figure labels): 997 x 502 x 32; 995 x 500 x 32; 331 x 249 x 32; 329 x 247 x 64; 327 x 245 x 64; 109 x 81; 107 x 79 x 128; 105 x 77 x 128; 52 x 38 x 128; 50 x 36 x 128; 24 x 17 x 128; 24 x 17 x 1; 64
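The feature-map sizes labeled in the V1 figure can be reproduced from the kernel sizes and strides. The sketch below assumes unpadded ("valid") convolutions and pooling with floor rounding, and assumes the first axis is the 1000-long one and the second the 2008-long one; these assumptions are inferred from the numbers on the slide rather than stated there, and the layer names are informal.

```python
def out_size(n, k, s):
    """Output length along one axis for window k and stride s, no padding."""
    return (n - k) // s + 1


# (name, (window_h, window_w), (stride_h, stride_w)), read off the slide.
LAYERS = [
    ("conv 4x4, 32", (4, 4), (1, 4)),
    ("conv 3x3, 32", (3, 3), (1, 1)),
    ("max pool", (3, 3), (3, 2)),
    ("conv 3x3, 64", (3, 3), (1, 1)),
    ("conv 3x3, 64", (3, 3), (1, 1)),
    ("max pool", (1, 3), (3, 3)),
    ("conv 3x3, 128", (3, 3), (1, 1)),
    ("conv 3x3, 128", (3, 3), (1, 1)),
    ("max pool", (2, 2), (2, 2)),
    ("conv 3x3, 128", (3, 3), (1, 1)),
    ("max pool", (3, 3), (2, 2)),
    ("conv 1x1, 1", (1, 1), (1, 1)),
]


def trace_shapes(input_hw, layers):
    """Return the (height, width) of the feature map after every layer."""
    h, w = input_hw
    shapes = []
    for _name, (kh, kw), (sh, sw) in layers:
        h, w = out_size(h, kh, sh), out_size(w, kw, sw)
        shapes.append((h, w))
    return shapes


shapes = trace_shapes((1000, 2008), LAYERS)
flattened = shapes[-1][0] * shapes[-1][1]  # inputs to the fully connected layer
```

Under these assumptions the trace passes through 997 x 502, 331 x 249, 109 x 81 and 52 x 38 and ends at 24 x 17, matching the labels in the figure, so the final 1*1 convolution hands 408 values to the 64-neuron fully connected layer.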
DeepBipolar V2: Convolutional AutoEncoder
Genotype data: 2008 * 1000 * 1
32 kernels, kernel size: 4*4*1, stride: (1,4)
32 kernels, kernel size: 3*3*32, stride: (1,1)
Max Pooling (MP): size (3,3), stride=(3,2)
64 kernels, kernel size: 3*3*32, stride: (1,1)
64 kernels, kernel size: 3*3*64, stride: (1,1)
Max Pooling: pool size (1,3), stride=(3,3)
128 kernels, kernel size: 3*3*64, stride: (1,1)
Up Sampling: size (3,3), stride=(3,2)
2 x 64 kernels, size: 3*3*64, Deconvolution
Up Sampling: size (1,3), stride=(3,3)
2 x 32 kernels, size: 3*3*64, Deconvolution
1*1 Convolution layer
Encoder feature map sizes (figure labels): 997 x 502 x 32; 995 x 500 x 32; 331 x 249 x 32; 329 x 247 x 64; 327 x 245 x 64; 109 x 81; 107 x 79 x 128
Decoder feature map sizes (figure labels): 109 x 81; 327 x 245 x 64; 329 x 247 x 64; 331 x 249 x 64; 995 x 500 x 32; 1000 x 2008 x 32; 2008 x 1000 x 1 (same size as the input data)
DeepCloud: Towards Composable Intelligent Platform
GatorCloud: SDN-enabled Campus Cloud
[Figure: platform architecture diagram]
• Control: SDE Controller, SDDC Hypervisor, SDE App Store, Golfer, GolfVisor, GolfStore, Cloud Dashboard
• Infrastructure: Gator, GENI, and Testbed Racks; Internet2/NLR over 100G links
• Users: Researchers, Scientists, Developers, Engineers, Admins
• Services: IaaS, PaaS, SaaS, CPSaaS, NaaS, HPCaaS, iBDaaS, StaaS
• Apps: GENI Apps, Security Apps, Network Apps, BigData Apps, HPC Apps, Self-Protection
• Major Data Centers at UF: HiPerGator Supercomputer; CMS/OSG Physics HPC Centers; ICBR: Interdisciplinary Center for Biotech Research; CTSI: Clinical and Translational Science Institute; ACIS Data Center; NEB Data Center
S3Lab Research Highlights
IMPACT
• Finest: Smartphone Indoor Location Ecosystem
• First: SDN-enabled Campus Cloud GatorCloud
• Fastest: Campus Research Network 100G
• Fourth: DeepCloud Intelligent Platform
NSF I/UCR Center for Big Learning (Pending)
Deep Learning
Big Systems
Big Data
Intelligence
Member Benefits
• Leveraging the world-class talents (about 40 professors and 200 graduate students) in the era of big learning, big data, and big systems.
• Realizing a 10:1 return on investment.
• Discovering top students in top universities.
• Joining peer members from high-profile companies and research units.
CBL Consortium: University of Florida (UF, South), Carnegie Mellon University (CMU, East), University of Missouri at Kansas City (UMKC, Central), University of Notre Dame (ND, North), University of Oregon (UO, West), and a large number of industrial partners.