![Page 1: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/1.jpg)
Architectures for Scalable Media Object Search
Dennis Sng Deputy Director & Principal Scientist
NVIDIA GPU Technology Workshop
10 July 2014
![Page 2: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/2.jpg)
ROSE LAB OVERVIEW
![Page 3: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/3.jpg)
Research Overview
Big Data (structured)
Search (Real World Objects)
Large Database of Media Objects
• Structured into multiple vertical application domains
• For machine learning & testing
Next-Generation Object Search
• Fast & rich in content
• Real-time & contextual consumer behaviour analysis
Media Cloud Platform
• Scalable media search, processing, & delivery
• Testbed for experimentation
![Page 4: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/4.jpg)
Solution Architecture
[Layered diagram, top to bottom]
• Applications: Smart Advertising, Education, Tourism, Digital Archive, Smart Surveillance, E-Commerce, Transportation/Logistics
• SDK, APIs & Tools
• Algorithm: Media Analysis, Rigid Object Search, Media Processing, Deformable Object Retrieval, Consumer Analysis
• Visual Object Database
• Infrastructure Cloud
![Page 5: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/5.jpg)
Object Categorisation
• 2D (planar) objects: Logos, book covers, CD covers, labels, coins
• 3D rigid objects: Cars, hardware, product packages
• Deformable objects: Clothes, shoes, bags, toys
• Faces: Genders, age groups, profiles, ethnicity, sentiment
• Landmark & Scenery
![Page 6: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/6.jpg)
ROSE Partner Ecosystem
Commercial Partners
NTU/PKU Joint-Lab Large-Scale Object Dataset & Analytics
Mobile Search with Contextual Mobility
Media Cloud Platform
Research Partners
Technology Partners
Supported by
Green Expo Technology
![Page 7: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/7.jpg)
STRUCTURED OBJECT DATABASE
P1: Structured Object Database (Big Data)
P2: Object Search Engines (Search)
P3: Media Cloud Platform (Cloud Computing)
![Page 8: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/8.jpg)
Framework: Object Database
• Module 1: Crawler
• Module 2: Resource Database
• Module 3: Tools
• Module 4: Object Database
![Page 9: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/9.jpg)
Large Scale with High Quality

| Date | Raw images | Non-identical images | Clean images |
|---|---|---|---|
| 12/2013 | 220,000 | 170,000 | 45,000 |
| 05/2014 | 21 million | 17 million | 8 million |
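The counts on this slide describe a cleaning funnel from raw images to non-identical images to clean images. A minimal sketch of the arithmetic follows; pairing the smaller counts with 12/2013 and the larger ones with 05/2014 is an assumption based on growth over time, and the names `SNAPSHOTS` and `retention` are illustrative only.

```python
# Image counts as listed on the slide. Pairing the smaller counts with
# 12/2013 and the larger counts with 05/2014 is an assumption.
SNAPSHOTS = {
    "12/2013": {"raw": 220_000, "non_identical": 170_000, "clean": 45_000},
    "05/2014": {"raw": 21_000_000, "non_identical": 17_000_000, "clean": 8_000_000},
}


def retention(snapshot):
    """Return the fraction of raw images that survives each cleaning stage."""
    raw = snapshot["raw"]
    return {
        "non_identical": snapshot["non_identical"] / raw,
        "clean": snapshot["clean"] / raw,
    }


for date, counts in SNAPSHOTS.items():
    kept = retention(counts)
    print(f"{date}: {kept['non_identical']:.1%} non-identical, {kept['clean']:.1%} clean")
```

Under that pairing, the share of raw images that end up clean rises between the two snapshots while the raw collection grows by roughly two orders of magnitude.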
![Page 10: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/10.jpg)
Large-Scale Structured Object Database
• eCommerce & Digital Advertising: Handbag, Shoes, Clothes, Book cover, Trademark & Logo, …
• Tourism & Transport: Landmark, Place of interest, Road signs, …
• Lifestyle & Hobbies: Fish, Fruit, Cat, Car, …
50M structured objects (clean)
[Bar chart: Structured Object Database (Apr 2014); #Images (0 to 8,000,000) by domain: eCommerce & Digital Advertising, Tourism, Lifestyle & Hobbies]
![Page 11: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/11.jpg)
OBJECT SEARCH
![Page 12: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/12.jpg)
Whole Image Retrieval
Query Image + Database = Ranked Images
![Page 13: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/13.jpg)
Visual Object Search
Query Object + Database = Ranked Object Detections from Cluttered Images
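The "query + database = ranked results" idea on the two retrieval slides can be sketched as nearest-neighbour ranking over feature vectors. The slides do not say which features or similarity measure are used, so the cosine similarity, the toy 3-dimensional vectors, and the names `cosine` and `rank_images` below are illustrative assumptions rather than the ROSE implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(query, database, top_k=3):
    """Return up to top_k (image_id, score) pairs, best match first."""
    scored = [(image_id, cosine(query, feat)) for image_id, feat in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature vectors; a real system would extract them from images.
DATABASE = {
    "bag_01": [0.9, 0.1, 0.0],
    "shoe_07": [0.1, 0.8, 0.3],
    "logo_42": [0.0, 0.2, 0.9],
}

ranked = rank_images([1.0, 0.2, 0.0], DATABASE)
print([image_id for image_id, _ in ranked])
```

Visual object search differs from whole-image retrieval in that the query is matched against regions of cluttered images; this sketch covers only the ranking step common to both.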
![Page 14: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/14.jpg)
User Assisted Object Search Engine
[Diagram components: Structured Object Database; Indexer (Index 1, Index 2, …, Index n); Matching Module; Human Computer Interface; User; Crowd]
[Diagram flows: User input; Engine output; User/Crowd output; Engine input]
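One way to read the diagram is that a query is matched against several indexes (Index 1 to Index n) and the per-index hits are merged into a single ranked list. The sketch below assumes that reading; the term-overlap score, the toy indexes, and the names `search_index` and `search_all` are hypothetical and stand in for whatever the Indexer and Matching Module actually do.

```python
import heapq


def search_index(index, query_terms):
    """Yield (score, object_id) for objects sharing terms with the query (toy matcher)."""
    for object_id, terms in index.items():
        score = len(query_terms & terms)
        if score:
            yield score, object_id


def search_all(indexes, query_terms, top_k=3):
    """Query every index and keep the top_k highest-scoring objects overall."""
    candidates = (hit for index in indexes for hit in search_index(index, query_terms))
    best = heapq.nlargest(top_k, candidates)
    return [(object_id, score) for score, object_id in best]


# Hypothetical indexes keyed by object id, each holding a set of descriptive terms.
INDEX_1 = {"bag_01": {"leather", "brown", "handle"}, "bag_02": {"canvas", "blue"}}
INDEX_2 = {"shoe_07": {"leather", "black"}}

results = search_all([INDEX_1, INDEX_2], {"leather", "brown"})
print(results)
```

In a user-assisted setting, the ranked list would go back through the human computer interface, and user or crowd feedback would become the next engine input.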
![Page 15: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/15.jpg)
Branded Bag Recognition
• For people: use a bag image from a street scene to find the bag in online shops
M40249 TWD: $3800 LV Artsy mm USD: 100 Wholesale fashion USD:110 Artsy MM m40249
![Page 16: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/16.jpg)
Face Recognition & Retrieval
• Identifying people
• Retrieving images of a person
![Page 17: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/17.jpg)
MEDIA CLOUD PLATFORM
![Page 18: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/18.jpg)
P3: Media Cloud Platform
• Testbed – Design an innovative multimedia cloud platform as a test-bed for large-scale applications
• GPUs – For accelerating machine learning and object search
• Media Processing Technologies – Develop new media processing technologies in transcoding, visual analytics and quality assessment
![Page 19: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/19.jpg)
ROSE Lab Physical Infrastructure
IT Cloud
• P3 Cloud cluster
– 7x Dell R720 2U server without GPU
– 84x Intel SNB Processor @2.3GHz
– 434GB RAM @ 1600MHz
• Network Infrastructure
– 1x Cisco Catalyst 3750-X Layer-3 Switch, 48 port (with 1 Gbps link to NTU Campus Network)
– 5x Cisco Catalyst 3560-X Layer-2 IP-based Switch, 48 port
HPC Cloud
• GPU Cloud Cluster
– 1x Dell R720 2U server with 2x K20m GPU
– 3x Dell R720 2U server with 1x K20m GPU
– 1x Dell R720 2U server with 1x Intel Xeon Phi MIC
Big Data
• P1 Database cluster
– 1x HP ProLiant 2U server
– 1x JBOD Storage Chassis
– 1x D-Link NAS with 16TB storage
• Storage Cluster
– 4x Novatte 12-Bay Storage Server
– Up to 160TB Storage Capacity
Experimental Zone
• P2 Development cluster
– 1x HP ProLiant 2U server
• Experimental GPU Platform
– 3x GPU Workstations: 2x Titan Black/GTX770
![Page 20: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/20.jpg)
ROSE Systems Infographic
Servers & Storage
• 23 Servers & Workstations
• 135TB Storage
CPUs
• 41 CPUs
• 244 Physical cores
• 488 Hyper-threaded Logical cores
GPUs
• Includes Tesla K20 & K40, Titan Black, GTX770, GTX645,…
• 35 GPUs
• 36,288 CUDA cores
![Page 21: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/21.jpg)
Cloud Operational Management Tools (OMT)
[Diagram labels]
• Masters: Cloud Master Primary, Cloud Master Secondary, IT Cloud Master, HPC Cloud Master
• P1: Storage, Data Svr
• P2: Dev Svr (2x)
• P3 IT Cloud: P3 Cloud Svr (3x)
• P3 HPC Cloud #1: Productive
• P3 HPC Cloud #2: CNN Training
• Other: Storage Cluster, Demo Svr, Network Infra, Visualization Portal
![Page 22: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/22.jpg)
ACCELERATING TRAINING TIME
Deep Learning
![Page 23: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/23.jpg)
Machine Learning
Google Brain
• 1,000 servers, 16,000 CPU cores
• Model: 1 billion connections
• Dataset: 10 million images
• Learning time: 3 days
NVIDIA GPUs
• 3 servers, 12 GPUs, 18,432 CUDA cores
• Cost: 100x less
![Page 24: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/24.jpg)
Deep Learning Model (1x K20)
• Data Set:

| Dataset Name | #Images | #Categories | Input Resolution |
|---|---|---|---|
| ILSVRC-2012* | 1.2 million | 1,000 | 224×224 |

* ILSVRC = ImageNet Large Scale Visual Recognition Challenge

• Model Scale:
– No. of Fully Connected Layers: 3
– No. of Convolutional Layers: 5
– No. of Connections: 60 million
– Size: 800+ MB
• Recognition Accuracy
– 15.7% top-5 error rate
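For readers unfamiliar with the metric, the top-5 error rate counts an image as correct when its true category is among the five highest-scoring predictions. The sketch below shows that computation on made-up scores over seven classes; the function name `top_k_error` and the toy data are illustrative and are not taken from the slides.

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is not among the k highest scores."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(true_labels)


# Toy scores for two images over seven classes (hypothetical values).
SCORES = [
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.0, 0.4],  # true class 3 has the highest score
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.1],  # true class 2 has the third-highest score
]
LABELS = [3, 2]

print(top_k_error(SCORES, LABELS, k=5))
print(top_k_error(SCORES, LABELS, k=1))
```

The second image is a top-5 hit but a top-1 miss, which is why top-5 error is always less than or equal to top-1 error on the same predictions.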
![Page 25: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/25.jpg)
Speedup Result of Training the Same Model
Data Set: ILSVRC-2012
[Chart: speedup ratio (0 to 8) vs GPU array size (1, 2, 4, 8); series: batch size (x, 128) and (x, x); annotated "5x Speedup"]
Top-1 Recognition Accuracy (on par with competition winner)

| GPUs | Batch Size | Top-1 Error |
|---|---|---|
| 1 | (128, 128) | 42.23% |
| 2 | (256, 256) | 42.63% |
| 2 | (256, 128) | 42.27% |
| 4 | (512, 512) | 43.58% |
| 4 | (512, 128) | 44.4% |
| 8 | (1024, 1024) | 43.28% |
| 8 | (1024, 128) | 42.86% |
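To make the accuracy cost of scaling easier to read, the top-1 errors listed on this slide can be expressed as a change relative to the single-GPU run. The sketch below only re-arranges the slide's own numbers; the names `RESULTS` and `error_deltas` are illustrative, and the batch-size settings are kept as opaque labels because the slide does not define the two components.

```python
# (GPUs, batch-size setting as printed on the slide, top-1 error in %)
RESULTS = [
    (1, "(128, 128)", 42.23),
    (2, "(256, 256)", 42.63),
    (2, "(256, 128)", 42.27),
    (4, "(512, 512)", 43.58),
    (4, "(512, 128)", 44.40),
    (8, "(1024, 1024)", 43.28),
    (8, "(1024, 128)", 42.86),
]


def error_deltas(results):
    """Top-1 error change, in percentage points, relative to the 1-GPU run."""
    baseline = next(err for gpus, _, err in results if gpus == 1)
    return [(gpus, batch, round(err - baseline, 2)) for gpus, batch, err in results]


for gpus, batch, delta in error_deltas(RESULTS):
    print(f"{gpus} GPU(s), batch {batch}: {delta:+.2f} points")
```

Every multi-GPU configuration listed stays within about 2.2 percentage points of the single-GPU error.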
![Page 26: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/26.jpg)
Training Time vs Scale of Model
Data Set: ILSVRC-2012
[Chart: training time (hr, 0 to 120) vs parameter scale (15, 30, 60, 120 million); series: 1 GPU and 2 GPU]
![Page 27: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/27.jpg)
TRAINING PLATFORM REFERENCE ARCHITECTURES
Deep Learning
![Page 28: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/28.jpg)
GPU-GPU Communication Latency
(1) Without GDR P2P
• GPU to GPU DMA latency:
– 2574.33 us (2MB DMA size)
(2) With GDR P2P
• GPU to GPU DMA latency:
– 524.28 us (2MB DMA size)
(3) With QPI
• GPU to GPU via QPI:
– > 2574 us (2MB DMA size)
(4) With GDR RDMA
• GPU to GPU DMA latency:
– 600+ us (2MB DMA size)*
* Based on the measurement released by GE
Summary
– RDMA enables 4.9x speedup!
– Cross-IOH DMA adds extra latency (60 – 70 ns)
– Cross-IOH DMA is not eligible to use GDR, so its latency exceeds the "Without GDR P2P" case
[Diagram: server w/ 4x PCIe 3.0 x16; CPU 0 and CPU 1, each with CPU memory, linked by QPI; Tesla GPUs #0 to #3 on PCIe x16; two IB cards on PCIe x8 connected to the InfiniBand network (FDR)]
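The 4.9x figure follows directly from the two measured latencies, and the same numbers give a rough effective throughput for a 2MB transfer. The sketch below assumes 2MB means 2 × 2^20 bytes and reports GB/s as 10^9 bytes per second; the names `LATENCY_US` and `effective_bandwidth_gbs` are illustrative.

```python
DMA_BYTES = 2 * 1024 * 1024  # 2MB DMA size, assuming 1MB = 2**20 bytes

# GPU-to-GPU DMA latencies from the slide, in microseconds.
LATENCY_US = {
    "without GDR P2P": 2574.33,
    "with GDR P2P": 524.28,
}


def effective_bandwidth_gbs(nbytes, latency_us):
    """Effective throughput in GB/s (10**9 bytes per second)."""
    return nbytes / (latency_us * 1e-6) / 1e9


speedup = LATENCY_US["without GDR P2P"] / LATENCY_US["with GDR P2P"]
print(f"speedup: {speedup:.1f}x")
for name, latency in LATENCY_US.items():
    print(f"{name}: {effective_bandwidth_gbs(DMA_BYTES, latency):.2f} GB/s")
```

Under these assumptions the peer-to-peer path moves the 2MB payload at about 4 GB/s, against roughly 0.8 GB/s without it.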
![Page 29: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/29.jpg)
PCIe Layout & GPU-GPU RDMA
[Diagram: two servers, each w/ 4x PCIe 3.0 x16; per server: CPU 0 and CPU 1 linked by QPI, four Tesla GPUs (#0 to #3 and #4 to #7) on PCIe x16, two IB cards on PCIe x8; servers connected via the InfiniBand network (FDR)]
• Elementary Communication Models
– Same Root Complex (e.g. GPU-0 to GPU-1)
– Same Server, Different Root Complex (e.g. GPU-0 to GPU-2)
– Different Server (e.g. GPU-0 to GPU-4)
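The three elementary communication models can be expressed as a lookup on where each GPU sits. The sketch below assumes two GPUs per CPU root complex and four GPUs per server, which matches the slide's three examples; the placement of GPUs 5 to 7 is inferred by symmetry, and the names `Slot`, `TOPOLOGY` and `communication_model` are hypothetical.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    server: int
    root_complex: int  # index of the CPU whose PCIe root complex hosts the GPU


# Assumed placement: two GPUs per CPU, four GPUs per server.
TOPOLOGY = {gpu: Slot(server=gpu // 4, root_complex=(gpu % 4) // 2) for gpu in range(8)}


def communication_model(src, dst):
    """Classify the path between two GPUs into one of the three models."""
    a, b = TOPOLOGY[src], TOPOLOGY[dst]
    if a.server != b.server:
        return "different server"
    if a.root_complex != b.root_complex:
        return "same server, different root complex"
    return "same root complex"


for pair in [(0, 1), (0, 2), (0, 4)]:
    print(pair, "->", communication_model(*pair))
```

A training framework could use such a classification to prefer peer-to-peer copies within a root complex and fall back to QPI or InfiniBand paths otherwise.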
![Page 30: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/30.jpg)
GPU Cluster with Commodity PC (for Development)
• Each node is a high-end commodity PC: 1x Intel i7 CPU, 2x NVIDIA GeForce GPU
• Nodes are interconnected via a GbE network (GbE switch)
• GPU communication using MPI
![Page 31: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/31.jpg)
GPU Cluster with HPC Workstation
• Each node is a high-end workstation: 2x Intel Xeon CPU, 4x NVIDIA Tesla GPU
• Nodes are interconnected via an IB network (QDR switch)
• GPU communication using GPUDirect
![Page 32: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/32.jpg)
GPU Cluster with HPC Server
• 4U compute node: 2x Intel Xeon CPU, 8x NVIDIA Tesla GPU
• Nodes are interconnected via a multi-homed IB network (FDR switch)
• GPU communication using GPUDirect
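The three reference architectures differ mainly in GPU density per node and in interconnect, which can be captured in a small configuration table. The sketch below records the per-node figures from the three slides and computes how many nodes a given GPU count would need; the class `ReferenceArchitecture`, the helper `nodes_needed`, and the 16-GPU target are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceArchitecture:
    name: str
    cpus_per_node: int
    gpus_per_node: int
    interconnect: str
    gpu_communication: str


# Per-node figures as listed on the three reference-architecture slides.
ARCHITECTURES = [
    ReferenceArchitecture("Commodity PC", 1, 2, "GbE", "MPI"),
    ReferenceArchitecture("HPC Workstation", 2, 4, "InfiniBand QDR", "GPUDirect"),
    ReferenceArchitecture("HPC Server", 2, 8, "InfiniBand FDR", "GPUDirect"),
]


def nodes_needed(arch, total_gpus):
    """Smallest number of nodes that provides at least total_gpus GPUs."""
    return math.ceil(total_gpus / arch.gpus_per_node)


for arch in ARCHITECTURES:
    print(f"{arch.name}: {nodes_needed(arch, 16)} node(s) for 16 GPUs over {arch.interconnect}")
```

Higher GPU density reduces the number of nodes, so more traffic stays on PCIe inside a server and less crosses the network.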
![Page 33: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/33.jpg)
Future Plans
• GPU Cluster as Deep Learning Training Platform
– Various interconnect speeds (e.g. QDR vs FDR)
– Various interconnect topologies (e.g. with & without redundancy)
– Various GPU processors (e.g. K20 vs K40)
– Various GPU densities (#GPUs per server)
• GPU-accelerated IaaS
• Deep Learning Training as a Service in GPU-aware Cloud
![Page 34: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/34.jpg)
Industry Collaboration Models
• Research Programmes (Research Collaboration Agreements)
– Covers more than one joint research project
– Assignment of the organisation's research staff to work with ROSE researchers
– Can include Industrial Post-Graduate Programme (IPP) PhD students
• Technology Evaluation/Adoption Projects (Option Agreements)
– Focus on evaluation of ROSE technologies, leading to licensing of the technology, OR
– Focus on usage of the Structured Object Database
• Affiliate Programme (Affiliate Agreements)
– Newsletters, Briefings & Technology Demos for subscribers
![Page 35: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/35.jpg)
Thank You
rose.ntu.edu.sg/index.html
![Page 36: Architectures for Scalable Media Object Search - Nvidia behaviour analysis Object Search Next- ... shoes, bags, toys ... Branded Bag Recognition](https://reader031.fdocuments.in/reader031/viewer/2022022516/5b022a087f8b9a952f8f7e16/html5/thumbnails/36.jpg)
Specs of Some Inter-connects

| Interconnect | Version | Frequency | Line Code | Single-Duplex per-lane Bandwidth | Full-Duplex per-lane Bandwidth | Single-Duplex Max-lanes Bandwidth (GB/s) | Full-Duplex Max-lanes Bandwidth (GB/s) | Original Transfer Rate | Small-Message Minimum Interconnect Latency (< 64 Bytes) | Large-Message Minimum Interconnect Latency (@ 4194304 Bytes) |
|---|---|---|---|---|---|---|---|---|---|---|
| QPI | [email protected] | 4.8GT/s | | | | 9.6 | 19.2 | | 60 – 75 ns | |
| QPI | [email protected] | 5.86GT/s | | | | 11.72 | 23.44 | | 60 – 75 ns | |
| QPI | [email protected] | 6.4GT/s | | | | 12.8 | 25.6 | | 60 – 75 ns | |
| QPI | [email protected] | 7.2GT/s | | | | 14.4 | 28.8 | | 60 – 75 ns | |
| QPI | [email protected] | 8GT/s | | | | 16 | 32 | | 60 – 75 ns | |
| PCI-E | 1.0 | 2.5GT/s | 8b/10b | 250MB/s | 0.5GB/s | 4 | 8 | 2.5GT/s | | |
| PCI-E | 2.0 | 5GT/s | 8b/10b | 500MB/s | 1GB/s | 8 | 16 | 5.0GT/s | 1.3 us | 1251 us |
| PCI-E | 3.0 | 8GT/s | 128b/130b | 1GB/s | 2GB/s | 16 | 32 | 8.0GT/s | 0.79 us | 1072 us |
| PCI-E | *4.0 | 16GT/s | 128b/130b | 2GB/s | 4GB/s | 32 | 64 | 16.0GT/s | | |
| IB | QDR | | | | | 5 | 10 | | | |
| IB | FDR | | | | | 7 | 14 | | | |
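The PCI-E bandwidth columns follow from the transfer rate and the line code: usable bytes per second per lane equal the transfer rate times the coding efficiency, divided by 8 bits per byte. The sketch below reproduces that arithmetic for the listed versions, assuming a 16-lane link for the "max lanes" figures; the names `PCIE`, `lane_bandwidth_gbs` and `link_bandwidth_gbs` are illustrative.

```python
# (transfer rate in GT/s, payload bits, total bits on the wire) per PCIe version.
PCIE = {
    "1.0": (2.5, 8, 10),
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
    "4.0": (16.0, 128, 130),
}


def lane_bandwidth_gbs(version):
    """Usable one-direction bandwidth of a single lane, in GB/s."""
    rate_gt, payload_bits, total_bits = PCIE[version]
    return rate_gt * payload_bits / total_bits / 8  # 8 bits per byte


def link_bandwidth_gbs(version, lanes=16):
    """Usable one-direction bandwidth of a link with the given lane count."""
    return lane_bandwidth_gbs(version) * lanes


for version in PCIE:
    print(f"PCIe {version}: {lane_bandwidth_gbs(version):.3f} GB/s per lane, "
          f"{link_bandwidth_gbs(version):.2f} GB/s at x16")
```

For the 8b/10b versions the result matches the table exactly; for 128b/130b the computed x16 value is slightly below the rounded 16 and 32 GB/s shown in the table.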