Training Kinect Mihai Budiu Microsoft Research, Silicon Valley UCSD CNS 2012 RESEARCH REVIEW February 8, 2012


Training Kinect

Mihai Budiu
Microsoft Research, Silicon Valley

UCSD CNS 2012 RESEARCH REVIEW February 8, 2012


Label body parts in depth map

Parallelizing the Training of the Kinect Body Parts Labeling Algorithm
Mihai Budiu, Jamie Shotton, Derek G. Murray, and Mark Finocchio
Big Learning: Algorithms, Systems and Tools for Learning at Scale, Sierra Nevada, Spain, December 16-17, 2011


Solution: Learn from Data

Classifier

Training examples

Machine learning


Big data

• 1M training examples
• 300,000 pixels/image
• 100,000 features
• < 2^20 tree nodes/tree
• 31 body parts
• 3 trees

Dryad

DryadLINQ

Decision forest inference

Classifier
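To get a feel for the scale implied by the figures on this slide, the raw counts can be multiplied out. The sketch below is only back-of-the-envelope arithmetic in Python. It assumes that every pixel of every image counts as a training example, that every candidate feature could be evaluated on every pixel, and that the node bound reads as 2^20; the variable names are illustrative and not taken from the talk.

```python
# Back-of-the-envelope scale estimate from the slide's figures.
training_images = 1_000_000
pixels_per_image = 300_000
candidate_features = 100_000
trees = 3
max_nodes_per_tree = 2 ** 20  # assumption: the bound is read as 2^20

# Upper bound on labeled pixels if every pixel is used for training.
total_pixels = training_images * pixels_per_image

# Upper bound on (pixel, feature) evaluations for a single pass.
pixel_feature_pairs = total_pixels * candidate_features

# Upper bound on the number of nodes in the whole forest.
max_forest_nodes = trees * max_nodes_per_tree

print(f"pixels: {total_pixels:.1e}")
print(f"pixel-feature pairs: {pixel_feature_pairs:.1e}")
print(f"max forest nodes: {max_forest_nodes:,}")
```

Even as loose upper bounds, these products suggest why a single machine is not a comfortable fit for the training job.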


Data-Parallel Computation

• Application
• Language: SQL; ≈SQL, LINQ; Sawzall, Java; DryadLINQ; Sawzall, FlumeJava; Pig, Hive
• Execution: Parallel Databases, Map-Reduce, Dryad, Hadoop
• Storage: GFS, BigTable; Cosmos, Azure, HPC; HDFS, S3


Dryad = 2-D Piping

• Unix Pipes: 1-D

grep | sed | sort | awk | perl

• Dryad: 2-D

grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
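The 2-D idea can be sketched as a pipeline in which every stage runs as several copies, each working on one partition of the data. The Python below is a minimal single-process illustration, not Dryad code: `shuffle`, `run_stage`, the two stage functions, and the stage widths are hypothetical names and values chosen for the example, and the copies run one after another here rather than on separate machines.

```python
import zlib

def shuffle(records, width):
    """Distribute records over `width` partitions using a stable hash."""
    parts = [[] for _ in range(width)]
    for r in records:
        parts[zlib.crc32(r.encode()) % width].append(r)
    return parts

def run_stage(parts, fn, next_width):
    """Run one copy of `fn` per input partition (the second dimension),
    then repartition the combined output for the next stage."""
    outputs = [r for part in parts for r in fn(part)]
    return shuffle(outputs, next_width)

def grep_stage(lines):
    # Keep only lines that mention "kinect", like a grep filter.
    return [l for l in lines if "kinect" in l]

def sed_stage(lines):
    # Rewrite each line, like a sed substitution.
    return [l.replace("kinect", "Kinect") for l in lines]

lines = ["kinect depth", "xbox", "kinect skeleton", "dryad", "kinect forest"]
parts = shuffle(lines, 4)                 # 4 copies of the first stage
parts = run_stage(parts, grep_stage, 2)   # output feeds 2 copies of the next stage
parts = run_stage(parts, sed_stage, 1)    # output feeds a single final stage
result = sorted(parts[0])                 # final sort stage, width 1
print(result)
```

The stage widths (4, 2, 1) play the role of the superscripts on the slide: they say how many copies of each stage exist, while the `|` ordering stays the same as in the 1-D pipe.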


Virtualized 2-D Pipelines

• 2D DAG
• multi-machine
• virtualized


Fault Tolerance


LINQ

Dryad

=> DryadLINQ


LINQ = .NET + Queries

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };


DryadLINQ Data Model

Partition

Collection

.NET objects


Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

DryadLINQ = LINQ + Dryad

C#

collection

results

C# C# C#

Vertex code

Query plan (Dryad job)

Data
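The picture on this slide is that one query is turned into vertex code that runs once per partition of the collection. A rough Python analogue of that idea follows (Python is used here so the sketch is easy to run; the slides themselves use C#). `Record`, `is_legal`, `hash_key`, and `vertex` are hypothetical stand-ins for the slide's `IsLegal`, `Hash`, and the generated vertex code, and the partitions are processed in a plain loop rather than on a cluster.

```python
from dataclasses import dataclass
import hashlib

@dataclass
class Record:
    key: str
    value: int

def is_legal(key):
    # Hypothetical predicate standing in for IsLegal on the slide.
    return key.isalpha()

def hash_key(key):
    # Hypothetical hash standing in for Hash on the slide.
    return hashlib.sha1(key.encode()).hexdigest()[:8]

def vertex(partition):
    """Code for one vertex: the where/select query applied to one partition."""
    return [(hash_key(r.key), r.value) for r in partition if is_legal(r.key)]

# A partitioned collection: each inner list is one partition.
partitions = [
    [Record("alpha", 1), Record("b4d", 2)],
    [Record("gamma", 3)],
    [Record("", 4), Record("delta", 5)],
]

# One output partition per input partition; no data moves between partitions
# because where and select only look at one record at a time.
results = [vertex(p) for p in partitions]
print(results)
```

Because filtering and projection need no communication between partitions, the same vertex function can be replicated across as many machines as there are partitions.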


Kinect Training Pipeline

20x


Partial tree

Images

Features

split

New partial tree

Query plan for one tree layer

Parallelize on:
• Features
• Images
• Tree nodes
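One way to picture how a single tree layer could be trained in a data-parallel fashion is a map step that builds label histograms per image partition and a reduce step that merges them and picks a split for each frontier node. The Python sketch below is an illustration under simplifying assumptions and is not the authors' query plan: feature responses are assumed to be already thresholded to booleans, split quality is scored by weighted child entropy, and `partial_histograms`, `entropy`, and `best_splits` are hypothetical names.

```python
from collections import Counter
from math import log2

def partial_histograms(examples, num_features):
    """Map step for one partition of images: count labels on each side
    of every candidate feature test, per frontier tree node."""
    hist = Counter()
    for node, label, responses in examples:
        for f in range(num_features):
            hist[(node, f, responses[f], label)] += 1
    return hist

def entropy(counts):
    """Shannon entropy (bits) of a list of label counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c) if total else 0.0

def best_splits(hist, num_features):
    """Reduce step: for every node pick the feature whose two children
    have the lowest weighted entropy (equivalently, the highest gain)."""
    nodes = {k[0] for k in hist}
    labels = {k[3] for k in hist}
    best = {}
    for node in nodes:
        scores = []
        for f in range(num_features):
            score = 0.0
            for side in (False, True):
                counts = [hist[(node, f, side, l)] for l in labels]
                score += sum(counts) * entropy(counts)
            scores.append((score, f))
        best[node] = min(scores)[1]
    return best

# Two partitions of pixel examples: (frontier node, body-part label, feature responses).
partitions = [
    [(0, "hand", (True, False)), (0, "head", (False, False))],
    [(0, "hand", (True, True)), (0, "head", (False, True))],
]
# Histograms are additive, so partitions can be processed independently and merged.
merged = sum((partial_histograms(p, 2) for p in partitions), Counter())
print(best_splits(merged, 2))
```

In this toy data, feature 0 separates the two labels perfectly, so it is chosen for node 0. The histogram keys carry the node and feature, which is what would let the work also be divided by tree node and by feature, as the slide's list suggests.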


High cluster utilization

Figure axes: Time, Machine


CONCLUSIONS


Huge Commercial Success


Tremendous Interest from Developers


Consumer Technologies Push The Envelope

Price: $6000

Price: $150


Unique Opportunity for Technology Transfer


I can finally explain to my son what I do for a living…


BACKUP SLIDES


Training efficiency

Plot of core * hours / image against number of training images (log scale), comparing an 8 core machine with a 1000 core cluster.


Cluster usage for one tree

Plot of Machine (235) against Time (s). Labels: Preprocess; 1 to 18; 19 (failed); 19; NormalizeTree; 14400 processes.

18.3 hours, 137.2 CPU days, 107421 processes, 29.56 TB data, average parallelism=140


DryadLINQ Language Summary

• Where
• Select
• GroupBy
• OrderBy
• Aggregate
• Join
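For readers less familiar with LINQ, each operator in this summary has a rough counterpart in ordinary Python collection code. The sketch below shows those analogues on a tiny in-memory list; it is not DryadLINQ code, the sample records and the `regions` lookup table are invented for illustration, and nothing here runs in parallel.

```python
from functools import reduce
from itertools import groupby

# Hypothetical records: (image id, body part, labeled pixel count).
rows = [("img1", "hand", 3), ("img2", "head", 5), ("img3", "hand", 4)]
# Hypothetical lookup table used for the join.
regions = {"hand": "limb", "head": "core"}

# Where: keep the rows that satisfy a predicate.
where = [r for r in rows if r[2] > 3]

# Select: project each row to a new shape.
select = [r[0] for r in rows]

# OrderBy: sort by a key (stable sort keeps ties in input order).
order_by = sorted(rows, key=lambda r: r[1])

# GroupBy: collect rows that share a key (groupby needs sorted input).
group_by = {k: [r[0] for r in g] for k, g in groupby(order_by, key=lambda r: r[1])}

# Aggregate: fold all rows into a single value.
aggregate = reduce(lambda acc, r: acc + r[2], rows, 0)

# Join: match rows with entries of another collection on a common key.
join = [(r[0], regions[r[1]]) for r in rows if r[1] in regions]

print(where, select, group_by, aggregate, join, sep="\n")
```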