Deep recurrent neural networks for Sequence Learning in Spark
Advanced spark deep learning
Uploaded by: adam-gibson
![Page 1: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/1.jpg)
Deeplearning4j: Data-parallel deep learning on Spark
![Page 2: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/2.jpg)
The JVM is too slow for numerical compute
Great at network I/O and data access
Great streaming infrastructure
Hardware acceleration required
Spark - data access layer
CUDA - compute layer
![Page 3: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/3.jpg)
Current Landscape
Spark assumes columnar data
Binary (audio/images) is becoming more important
HDFS is great for storing blobs
SQL doesn’t work for pixels and audio frames
The ingredients are here for something great
![Page 4: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/4.jpg)
The solution
JavaCPP (Cython for Java)
64-bit pointers for efficient contiguous access to image and audio data
Leverage Java's distributed systems ecosystem
Add new numerical compute layer (libnd4j)
Allow for heterogeneous compute
Off heap memory
Easy deployment
Data pipelines as a first concern
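The points about 64-bit pointers and off-heap memory can be illustrated with plain JDK classes. This is a minimal sketch, not how JavaCPP or libnd4j are implemented: the class `OffHeapSketch`, its method names, and the example image dimensions are hypothetical. It shows why a batch of image data can exceed what a 32-bit `int` index can address, and how a direct `ByteBuffer` keeps data contiguous outside the Java heap.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical illustration only; not part of JavaCPP or libnd4j.
final class OffHeapSketch {

    /** Bytes needed for a batch of float images; long arithmetic avoids int overflow. */
    static long batchBytes(long images, long height, long width, long channels) {
        return images * height * width * channels * Float.BYTES;
    }

    /** True when a buffer of this size cannot be addressed with a 32-bit int index. */
    static boolean needs64BitAddressing(long bytes) {
        return bytes > Integer.MAX_VALUE;
    }

    /** Copies pixels into contiguous off-heap (direct) memory and sums them there. */
    static float sumOffHeap(float[] pixels) {
        FloatBuffer buf = ByteBuffer.allocateDirect(pixels.length * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(pixels);   // write into native memory
        buf.flip();        // switch from writing to reading
        float sum = 0f;
        while (buf.hasRemaining()) {
            sum += buf.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed example workload: 10,000 RGB images of 224 x 224 floats.
        long bytes = batchBytes(10_000, 224, 224, 3);
        System.out.println(bytes + " bytes, needs 64-bit addressing: "
                + needs64BitAddressing(bytes));
        System.out.println(sumOffHeap(new float[] {1f, 2f, 3f}));
    }
}
```

Java arrays and NIO buffers are indexed by `int`, so a single one tops out near 2^31 elements; a native pointer with a 64-bit offset does not have that ceiling, which is the gap the slide refers to.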
![Page 5: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/5.jpg)
SKIL (Skymind Intelligence Layer)
![Page 6: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/6.jpg)
JavaCPP
Auto-generates JNI bindings for C++ by parsing classes
Allows for easy maintenance and deployment of C++ binaries in Java
Write efficient ETL pipelines for images via OpenCV (JavaCV)
Integrate other C++ deep learning frameworks (TensorFlow, Caffe, ...)
Allows for productionization of fast (but academic) C++ code using Java (Kafka, Spark) for ETL
64-bit pointers (wasn't possible before)
![Page 7: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/7.jpg)
“Actual” Streaming frameworks
Kafka
Flink
Spark Streaming
Apex
![Page 8: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/8.jpg)
![Page 9: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/9.jpg)
ND4J
Heterogeneous codebase
Supports CUDA, x86 and (soon) POWER
Shared indexing logic for writing ndarray routines
Memory management in Java (even CUDA memory!)
OpenMP on CPU + routines for common things such as reduce
Pinned memory and async operations
JIT allocation
Spark friendly (runs on multiple threads and devices)
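"Shared indexing logic" for ndarray routines usually comes down to mapping a multi-dimensional index onto a flat buffer through a shape and a set of strides. The sketch below shows that idea in plain Java together with a simple sum reduction. It is an assumption about the general technique, not ND4J's actual code; `StridedIndexSketch` and its methods are hypothetical names.

```java
// Hypothetical illustration of strided indexing; not ND4J's implementation.
final class StridedIndexSketch {

    /** Row-major (C order) strides, in elements, for the given shape. */
    static int[] rowMajorStrides(int[] shape) {
        int[] strides = new int[shape.length];
        int stride = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = stride;
            stride *= shape[d];
        }
        return strides;
    }

    /** Linear offset into the flat buffer for a multi-dimensional index. */
    static int offset(int[] index, int[] strides) {
        int off = 0;
        for (int d = 0; d < index.length; d++) {
            off += index[d] * strides[d];
        }
        return off;
    }

    /** Sum-reduces a 2-D array along its second dimension (one result per row). */
    static float[] sumRows(float[] data, int[] shape) {
        int[] strides = rowMajorStrides(shape);
        float[] out = new float[shape[0]];
        for (int i = 0; i < shape[0]; i++) {
            for (int j = 0; j < shape[1]; j++) {
                out[i] += data[offset(new int[] {i, j}, strides)];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f, 5f, 6f};   // a 2 x 3 matrix, stored flat
        float[] sums = sumRows(data, new int[] {2, 3});
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

Because every routine goes through the same offset calculation, the same reduce logic can in principle be reused whether the flat buffer lives on the CPU or on a GPU.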
![Page 10: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/10.jpg)
Deployment
Juju
Runs as a Spark job
Easy to embed in production
![Page 11: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/11.jpg)
Canova
One interface for ETL
Integrates with Spark
Easy to extend to write your own custom data pipelines
One interface for generating NDArrays
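The "one interface" idea can be sketched as a single abstraction that every data source implements, with a shared step that turns its records into a numeric matrix. The code below is a hypothetical illustration in plain Java and does not use Canova's real API: `EtlSketch`, `VectorSource`, `fromCsv`, and `toMatrix` are invented names, and a `float[][]` stands in for an NDArray.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; these are not Canova classes.
final class EtlSketch {

    /** The single ETL interface: every source yields one float vector per record. */
    interface VectorSource extends Iterator<float[]> {}

    /** A source that parses comma-separated numeric lines. */
    static VectorSource fromCsv(List<String> lines) {
        Iterator<String> it = lines.iterator();
        return new VectorSource() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public float[] next() {
                String[] parts = it.next().split(",");
                float[] v = new float[parts.length];
                for (int i = 0; i < parts.length; i++) {
                    v[i] = Float.parseFloat(parts[i].trim());
                }
                return v;
            }
        };
    }

    /** Shared step: stacks the vectors from any source into a rows x columns matrix. */
    static float[][] toMatrix(VectorSource source) {
        List<float[]> rows = new ArrayList<>();
        while (source.hasNext()) {
            rows.add(source.next());
        }
        return rows.toArray(new float[0][]);
    }

    public static void main(String[] args) {
        float[][] m = toMatrix(fromCsv(Arrays.asList("1,2", "3, 4")));
        System.out.println(Arrays.deepToString(m));
    }
}
```

A custom pipeline (images, audio, logs) would then only need its own `VectorSource`; the conversion to arrays stays the same.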
![Page 12: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/12.jpg)
Conclusion
Built to be friendly to the JVM ecosystem
Allows Java to do what it's good at
NumPy in Java makes it easy to port things like scikit-learn
Data parallelism means it works on the commodity hardware the JVM assumes
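One common way to do data-parallel training is parameter averaging: each worker takes a training step on its own shard of the data, and the resulting parameters are averaged before the next round. The slides do not spell out the scheme, so treat the following as a sketch of that general technique under stated assumptions, on a toy one-parameter least-squares model. The class `ParameterAveragingSketch` and its methods are hypothetical, and the worker loop runs sequentially here where a Spark job would run it across executors.

```java
// Hypothetical illustration of parameter averaging; not Deeplearning4j code.
final class ParameterAveragingSketch {

    /** One gradient step for the model y = w * x on a single worker's shard. */
    static double localStep(double w, double[] xs, double[] ys, double learningRate) {
        double grad = 0.0;
        for (int i = 0; i < xs.length; i++) {
            grad += 2.0 * (w * xs[i] - ys[i]) * xs[i];   // d/dw of squared error
        }
        return w - learningRate * grad / xs.length;
    }

    /** Arithmetic mean of the workers' parameters. */
    static double average(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Repeats: broadcast w, step locally on every shard, average the results. */
    static double train(double[][] shardX, double[][] shardY, double learningRate, int rounds) {
        double w = 0.0;
        for (int r = 0; r < rounds; r++) {
            double[] local = new double[shardX.length];
            for (int k = 0; k < shardX.length; k++) {
                // In a cluster setting each iteration of this loop would be one worker.
                local[k] = localStep(w, shardX[k], shardY[k], learningRate);
            }
            w = average(local);
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy data generated from y = 3x, split across two workers.
        double[][] xs = {{1, 2}, {3, 4}};
        double[][] ys = {{3, 6}, {9, 12}};
        System.out.println(train(xs, ys, 0.05, 200));
    }
}
```

Each worker only ever needs a full copy of the model and its own slice of the data, which is why this style of training fits ordinary cluster hardware.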
![Page 13: Advanced spark deep learning](https://reader035.fdocuments.in/reader035/viewer/2022062902/58f9a918760da3da068b6af2/html5/thumbnails/13.jpg)
Future
Model Parallelism
OpenCL
Sparse support
Reinforcement learning