Dmitry Spodarets_Infrastructure for the work of data scientists
![Page 1: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/1.jpg)
Infrastructure for the work of Data Scientists
Dmitry Spodarets
![Page 2: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/2.jpg)
Who am I
Dmitry Spodarets
• CEO and Founder at FlyElephant
• Lecturer at Odessa Polytechnic University
• Organizer of technical conferences about AI, BigData, HPC, JS, FOSS …
![Page 3: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/3.jpg)
Agenda
• In production
• Database
• Notebook / IDE
• Software
• Deep Learning Tools
• Programming Languages
• Visualization
• Computing power
• Architecture
• Docker
• Cloud Services
• FlyElephant
![Page 4: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/4.jpg)
Data Science
![Page 5: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/5.jpg)
Data Science: people
• Engineer
• DevOps
• Business
• Data Scientist
![Page 6: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/6.jpg)
In production
[Diagram: Historical Data → Trained Model → Deployed Model; Live Data → Deployed Model → Result]
![Page 7: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/7.jpg)
In production
[Same diagram, with the "Training phase" label added]
![Page 8: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/8.jpg)
In production
[Same diagram, with the "Training phase" and "Production" labels]
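The split between the training phase and production can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption for illustration: the "model" is an ordinary least-squares line fitted to made-up historical pairs, and "deployment" is simulated by exporting the trained parameters as JSON and rebuilding the model from that artifact, as a separate serving process would.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LinearModel:
    """One-feature linear model y = slope * x + intercept (illustrative only)."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def train(historical: list[tuple[float, float]]) -> LinearModel:
    """Training phase: ordinary least squares on historical (x, y) pairs."""
    n = len(historical)
    mean_x = sum(x for x, _ in historical) / n
    mean_y = sum(y for _, y in historical) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in historical)
    var = sum((x - mean_x) ** 2 for x, _ in historical)
    slope = cov / var
    return LinearModel(slope=slope, intercept=mean_y - slope * mean_x)


def deploy(model: LinearModel) -> str:
    """Deployment: export the trained parameters as a portable JSON artifact."""
    return json.dumps(asdict(model))


def serve(artifact: str, live_data: list[float]) -> list[float]:
    """Production: rebuild the deployed model and turn live data into results."""
    model = LinearModel(**json.loads(artifact))
    return [model.predict(x) for x in live_data]


if __name__ == "__main__":
    historical = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # hypothetical data, y = 2x + 1
    artifact = deploy(train(historical))
    print(serve(artifact, [5.0, 10.0]))  # [11.0, 21.0]
```

Exporting plain parameters rather than a pickled object keeps the artifact readable and safe to load in another process; a real deployment would add versioning and input validation.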
![Page 9: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/9.jpg)
Database
![Page 10: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/10.jpg)
Notebook / IDE
![Page 11: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/11.jpg)
Jupyter Notebook
![Page 12: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/12.jpg)
Jupyter Lab
![Page 13: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/13.jpg)
Jupyter Lab
• JupyterLab is the natural evolution of the Jupyter Notebook user interface
• JupyterLab is an IDE: Interactive Development Environment
• Flexible user interface for assembling the fundamental building blocks of interactive computing
• Modernized JavaScript architecture based on npm/webpack, plugin system, model/view separation
• Built using PhosphorJS (http://phosphorjs.github.io/)
• Design-driven development process
https://github.com/jupyter/jupyterlab
http://blog.jupyter.org/2016/07/14/jupyter-lab-alpha/
![Page 14: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/14.jpg)
Software
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
![Page 15: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/15.jpg)
Software
New tools (in this poll) that received at least 1% of votes in 2016:
• Anaconda, 16%
• Microsoft other ML/Data Science tools, 1.6%
• SAP HANA, 1.2%
• XLMiner, 1.2%
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
![Page 16: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/16.jpg)
Software
• Tools with the highest growth (among tools with at least 15 users in 2015):
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
(turi)
![Page 17: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/17.jpg)
Deep Learning Tools
• TensorFlow, 6.8%
• Theano ecosystem (including Pylearn2), 5.1%
• Caffe, 2.3%
• MATLAB Deep Learning Toolbox, 2.0%
• Deeplearning4j, 1.7%
• Torch, 1.0%
• Microsoft CNTK, 0.9%
• Cuda-convnet, 0.8%
• mxnet, 0.6%
• Convnet.js, 0.3%
• darch, 0.1%
• Nervana, 0.1%
• Veles, 0.1%
• Other Deep Learning Tools, 3.7%
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
![Page 18: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/18.jpg)
Programming Languages
• R, 49.0% share (was 46.9%), 4.5% increase
• Python, 45.8% share (was 30.3%), 51% increase
• Java, 16.8% share (was 14.1%), 19% increase
• Unix shell/awk/gawk, 10.4% share (was 8.0%), 30% increase
• C/C++, 7.3% share (was 9.4%), 23% decrease
• Other programming/data languages, 6.8% share (was 5.1%), 34.1% increase
• Scala, 6.2% share (was 3.5%), 79% increase
• Perl, 2.3% share (was 2.9%), 19% decrease
• Julia, 1.1% share (was 1.1%), 1.6% decrease
• F#, 0.4% share (was 0.7%), 41.8% decrease
• Clojure, 0.4% share (was 0.5%), 19.4% decrease
• Lisp, 0.2% share (was 0.4%), 33.3% decrease
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
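The growth figures above are relative changes in share, (new - old) / old. A small Python check, a sketch that uses only the rounded shares printed on the slide, reproduces most of them. The remaining gaps (for example Scala comes out at about 77% from the rounded values versus 79% on the slide, and Julia shows a change despite equal rounded shares) presumably come from the survey computing growth on unrounded shares.

```python
def relative_change(new_share: float, old_share: float) -> float:
    """Percent change of a share between two survey years."""
    return (new_share - old_share) / old_share * 100


# (language, 2016 share %, 2015 share %) as printed on the slide
shares = [
    ("R", 49.0, 46.9),
    ("Python", 45.8, 30.3),
    ("Java", 16.8, 14.1),
    ("Unix shell/awk/gawk", 10.4, 8.0),
    ("Scala", 6.2, 3.5),
]

for name, new, old in shares:
    print(f"{name}: {relative_change(new, old):+.1f}%")
# R: +4.5%
# Python: +51.2%
# Java: +19.1%
# Unix shell/awk/gawk: +30.0%
# Scala: +77.1%
```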
![Page 19: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/19.jpg)
Visualization
• Tableau
• Zoomdata
• Qlik
• Plotly
• Matplotlib 2.0 (for Python)
• ggplot2 (for R)
• D3.js
![Page 20: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/20.jpg)
Computing power
![Page 21: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/21.jpg)
Computing power
• NVIDIA DGX-1 Deep Learning Supercomputer: 170/3 TFLOPS (GPU FP16 / CPU FP32)
• NVIDIA Tesla P100: ~5 TeraFLOPS
• Intel Xeon Phi processor: ~3 TeraFLOPS
![Page 22: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/22.jpg)
Computing power (NVIDIA)
TESLA PRODUCTS FOR EVERY WORKLOAD
• MIXED COMPUTING: Tesla P100 PCIe. Data centers running both CPU and GPU applications.
• SCALABLE COMPUTING: Tesla P100 SXM2. Data centers running applications that scale well across GPUs.
• DEEP LEARNING SUPERCOMPUTER: DGX-1. Fully integrated DL solution.
• HYPERSCALE COMPUTING: Tesla P4, P40. Building DL services, video and image processing, neural network training.
![Page 23: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/23.jpg)
TESLA P4
Maximum efficiency for scale-out servers
• # of CUDA Cores: 2560
• Peak Single Precision: 5.5 TeraFLOPS
• Peak INT8: 22 TOPS
• Low Precision: 4x 8-bit vector dot product with 32-bit accumulate
• Video Engines: 1x decode engine, 2x encode engines
• GDDR5 Memory: 8 GB @ 192 GB/s
• Power: 50 W & 75 W
[Chart: AlexNet images/sec/watt for CPU, FPGA, 1x M4 (FP32) and 1x P4 (INT8); the P4 is 40x more efficient than the CPU and 8x more efficient than the FPGA]
AlexNet, batch size = 128, CPU: Intel E5-2690v4 using Intel MKL 2017, FPGA is Arria10-115, 1x M4/P4 in node, P4 board power at 56 W, P4 GPU power at 36 W, M4 board power at 57 W, M4 GPU power at 39 W, Perf/W chart using GPU power.
Computing power (NVIDIA)
![Page 24: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/24.jpg)
TESLA P40
Maximum throughput for scale-out servers
• # of CUDA Cores: 3840
• Peak Single Precision: 12 TeraFLOPS
• Peak INT8: 47 TOPS
• Low Precision: 4x 8-bit vector dot product with 32-bit accumulate
• Video Engines: 1x decode engine, 2x encode engines
• GDDR5 Memory: 24 GB @ 346 GB/s
• Power: 250 W
[Chart: GoogLeNet and AlexNet images/sec, 8x M40 (FP32) vs 8x P40 (INT8): "4x Boost in Less than One Year"]
GoogLeNet, AlexNet, batch size = 128, CPU: Dual Socket Intel E5-2697v4.
Computing power (NVIDIA)
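The peak figures in the P4 and P40 tables follow from the usual back-of-the-envelope formula: CUDA cores × 2 FLOPs per cycle (one fused multiply-add) × clock rate, with the INT8 rate four times higher because of the 4x 8-bit dot product. The sketch below assumes boost clocks of 1063 MHz (P4) and 1531 MHz (P40); these clocks do not appear on the slides and are assumptions taken from NVIDIA's published specifications.

```python
def peak_tflops(cuda_cores: int, boost_clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Peak FP32 TFLOPS: cores x FLOPs per cycle (FMA = 2) x clock in GHz / 1000."""
    return cuda_cores * flops_per_cycle * boost_clock_ghz / 1000


def peak_int8_tops(fp32_tflops: float) -> float:
    """INT8 peak: the 4x 8-bit dot product does 4x the work of one FP32 FMA."""
    return 4 * fp32_tflops


# Boost clocks in GHz are assumptions, not values from the slides.
gpus = {"Tesla P4": (2560, 1.063), "Tesla P40": (3840, 1.531)}

for name, (cores, clock) in gpus.items():
    fp32 = peak_tflops(cores, clock)
    print(f"{name}: {fp32:.2f} TFLOPS FP32, {peak_int8_tops(fp32):.1f} TOPS INT8")
# Tesla P4: 5.44 TFLOPS FP32, 21.8 TOPS INT8
# Tesla P40: 11.76 TFLOPS FP32, 47.0 TOPS INT8
```

The results are close to the 5.5 and 12 TFLOPS and 22 and 47 TOPS quoted on the slides; real workloads reach only a fraction of these peaks.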
![Page 25: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/25.jpg)
Computing power (NVIDIA)
![Page 26: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/26.jpg)
Computing power (NVIDIA)
![Page 27: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/27.jpg)
P100 FOR THE FASTEST TRAINING
[Chart: FP32 training speedup and images/sec for AlexnetOWT, GoogLenet, VGG-D, Incep v3 and ResNet-50 on 8x K80, 8x M40, 8x P40, 8x P100 PCIE and DGX-1; images/sec labels on the chart: 7172, 2194, 578, 526, 661]
Deepmark test with NVCaffe. AlexnetOWT/GoogLenet use batch 128, VGG-D uses batch 64, Incep-v3/ResNet-50 use batch 32, weak scaling.
K80/M40/P100/DGX-1 are measured, P40 is projected, software optimization in progress, CUDA8/cuDNN5.1, Ubuntu 14.04.
Computing power (NVIDIA)
![Page 28: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/28.jpg)
NVLINK: LINEAR SCALING
[Charts: speedup factor (1x to 8x) on 1, 2, 4 and 8 GPUs for AlexnetOWT, Incep-v3 and ResNet-50, DGX-1 vs P100 PCIE; annotations on the chart: 2.3x, 1.3x, 1.5x]
Deepmark test with NVCaffe. AlexnetOWT uses batch 128, Incep-v3/ResNet-50 use batch 32, weak scaling, P100 and DGX-1 are measured, FP32 training, software optimization in progress, CUDA8/cuDNN5.1, Ubuntu 14.04.
Computing power (NVIDIA)
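The scaling charts plot speedup: throughput on N GPUs divided by throughput on one GPU. With weak scaling (a fixed batch per GPU), ideal linear scaling means a speedup of N, i.e. a parallel efficiency of 100%. A minimal sketch of both quantities from measured images/sec; the throughput numbers below are hypothetical and are not read from the chart.

```python
def speedup(throughput_n: float, throughput_1: float) -> float:
    """Speedup of an N-GPU run over a single-GPU run (ratio of images/sec)."""
    return throughput_n / throughput_1


def efficiency(throughput_n: float, throughput_1: float, n_gpus: int) -> float:
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return speedup(throughput_n, throughput_1) / n_gpus


# Hypothetical measurements in images/sec, keyed by GPU count.
measured = {1: 200.0, 2: 396.0, 4: 760.0, 8: 1440.0}

for n, imgs in measured.items():
    s = speedup(imgs, measured[1])
    e = efficiency(imgs, measured[1], n)
    print(f"{n} GPU: {s:.2f}x speedup, {e:.0%} efficiency")
```

A faster interconnect such as NVLink mainly shows up as efficiency staying close to 100% as N grows, because less time is spent exchanging gradients between GPUs.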
![Page 29: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/29.jpg)
NVIDIA Deep Learning SDK
![Page 30: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/30.jpg)
Computing power (NVIDIA)
Jetson TX1
• GPU: 1 TFLOP/s, 256-core Maxwell
• CPU: 64-bit ARM A57 CPUs
• Memory: 4 GB LPDDR4, 25.6 GB/s
• Video decode: 4K 60 Hz
• Video encode: 4K 30 Hz
• CSI: up to 6 cameras, 1400 Mpix/s
• Display: 2x DSI, 1x eDP 1.4, 1x DP 1.2/HDMI
• Wi-Fi: 802.11ac 2x2
• Networking: 1 Gigabit Ethernet
• PCIe: Gen 2, 1x1 + 1x4
• Storage: 16 GB eMMC, SDIO, SATA
• Other: 3x UART, 3x SPI, 4x I2C, 4x I2S, GPIOs
![Page 31: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/31.jpg)
Computing power (Intel)
![Page 32: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/32.jpg)
Computing power (Intel)
![Page 33: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/33.jpg)
Computing power (Intel)
![Page 34: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/34.jpg)
Computing power (Intel)
• Intel Math Kernel Library (Intel MKL): natively supports C, C++ and Fortran development. Cross-language compatible with Java, C#, Python and other languages.
• Intel Data Analytics Acceleration Library (Intel DAAL): includes Python, C++ and Java APIs and connectors to popular data sources, including Spark and Hadoop.
• Intel MPI Library: natively supports C, C++ and Fortran development.
![Page 35: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/35.jpg)
Architecture
![Page 36: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/36.jpg)
Docker
![Page 37: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/37.jpg)
Cloud Services
Amazon Machine Learning
Azure Machine Learning
Google Machine Learning
![Page 38: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/38.jpg)
Your Home for High Performance Computing
Compute ● Collaborate ● Manage
![Page 39: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/39.jpg)
Solutions for
• Data Science
• Engineering
• Simulation
• Rendering
• Academia
![Page 40: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/40.jpg)
FlyElephant Platform
• Computing resources. Get quick access to different clouds and HPC clusters from one place.
• Ready-to-use computing infrastructure. With one click you get the software infrastructure for your calculations and the computing resources you need.
• Collaboration & Sharing. Create projects, invite colleagues to join them and share calculation results within the projects.
• Fast Deployment. Deploy your computing tasks as APIs without engineering or DevOps work.
• Expert Community. Our partner companies and individual experts will help you solve your issues quickly.
![Page 41: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/41.jpg)
Cloud computing
![Page 42: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/42.jpg)
Tools
![Page 43: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/43.jpg)
Computing on an HPC cluster
![Page 44: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/44.jpg)
Docker support
![Page 45: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/45.jpg)
Jupyter Notebooks
![Page 46: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/46.jpg)
Storage
![Page 47: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/47.jpg)
Community
![Page 48: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/48.jpg)
Projects
![Page 49: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/49.jpg)
![Page 50: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/50.jpg)
Roadmap (November)
• FlyElephant Box (Closed Beta Preview)
• AWS
• OpenStack
• Your Docker images
• MPI cluster
• Hadoop/Spark cluster
![Page 51: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/51.jpg)
www.flyelephant.net
![Page 52: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/52.jpg)
slack.flyelephant.net
![Page 53: Dmitry Spodarets_Infrastructure for the work of data scientists](https://reader031.fdocuments.in/reader031/viewer/2022030209/58ad04d11a28ab0b408b4b7f/html5/thumbnails/53.jpg)
We are looking for partners