Data! Data! Data! I Can't Make Bricks Without Clay!

“Data! Data! Data! I Can’t Make Bricks Without Clay!”*
Shai Fine
Principal Engineer, Advanced Analytics, Intel
(*) Sherlock Holmes, The Adventure of the Copper Beeches

Big Data, Only a Few Years Back …

Executives Believe in Advanced Analytics

Analytics to the Rescue

• “Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway”
  • Geoffrey Moore, author of Crossing the Chasm

• … and who will lead the way?!

Big Data's High-Priests of Algorithms (The Wall Street Journal, Aug. 2014)

Adoption of Analytics Faces Hurdles

• Developing Analytics solutions
  • Far from being an engineering process
  • There is a chasm to cross between “traditional” BI and Advanced Analytics

• Consumability of Analytics
  • Deploying Analytics solutions is difficult
  • Reliability, “Self Maintenance”

• Analytics Workloads are Challenging
  • Speed (latency, time-to-solution), Throughput, Scalability, …
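Latency, time-to-solution and throughput are easy to conflate. A minimal sketch of how they might be measured, assuming a workload is simply a Python callable applied to a list of inputs (the `measure` helper and `toy_workload` are hypothetical, not part of any benchmark discussed in this talk):

```python
import time

def measure(workload, inputs):
    """Hypothetical harness: run `workload` once per input and return
    (mean latency in s, throughput in items/s, time-to-solution in s)."""
    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        workload(item)  # one unit of work, e.g. scoring one record
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start  # time-to-solution for the whole batch
    mean_latency = sum(latencies) / len(latencies)
    throughput = len(inputs) / total if total > 0 else float("inf")
    return mean_latency, throughput, total

def toy_workload(n):
    """Stand-in compute kernel: sum of squares below n."""
    return sum(i * i for i in range(n))

lat, thr, tts = measure(toy_workload, [10_000] * 50)
print(f"mean latency {lat:.2e} s, throughput {thr:.1f} items/s, total {tts:.3f} s")
```

In this sequential loop throughput is roughly the inverse of mean latency; the two only diverge once items are batched or processed in parallel, which is where scalability enters.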

The ML Building Blocks Concept

There is an “infinite” number of algorithms and datasets

But there is a finite set of Building Blocks

Building Blocks: a finite set of elements that can be mapped onto HW and SW primitives and patterns

[Figure: Building Blocks stack. Layers: Usages, High-level Libraries, Low-level Libraries, Hardware Platforms. Platforms: Xeon, Xeon Phi, Xeon FPGA, Iris Pro Graphics, Xeon Accel., New ISA. Also shown: Tier-1, Cloud, HPC, Enterprise, Academia.]
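One way to picture a building block being mapped onto a lower-level primitive (our example, not one given in the talk): the squared Euclidean distance, a Measure, can be rewritten with dot products as ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2. A batch of point-to-center distances then reduces to precomputed norms plus a matrix product, the kind of kernel that Linear Algebra libraries tune per platform. The function names are hypothetical.

```python
def sq_dist_direct(x, c):
    """Squared Euclidean distance computed element by element."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def dot(a, b):
    """Dot product: the linear-algebra primitive the measure is mapped onto."""
    return sum(ai * bi for ai, bi in zip(a, b))

def sq_dist_via_dot(x, c):
    """Same measure expressed as ||x||^2 - 2 x.c + ||c||^2."""
    return dot(x, x) - 2 * dot(x, c) + dot(c, c)

def pairwise_sq_dists(points, centers):
    """All point-center distances; the cross terms form the product P @ C^T."""
    p_norms = [dot(p, p) for p in points]
    c_norms = [dot(c, c) for c in centers]
    return [[p_norms[i] - 2 * dot(p, c) + c_norms[j]
             for j, c in enumerate(centers)]
            for i, p in enumerate(points)]

x, c = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(sq_dist_direct(x, c))   # 13.0
print(sq_dist_via_dot(x, c))  # 13.0
```

The rewritten form trades some accuracy for speed: when x and c are nearly equal the subtraction can cancel catastrophically, so optimized implementations may clamp small negative results to zero.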

Machine Learning Building Blocks

• ML basic building blocks
  1. Linear Algebra
  2. Measures
  3. Special Functions
  4. Mathematical Optimization
  5. Data Characteristics
  6. Data-dependent Compute
  7. Memory Access
  8. Very Large Models
  9. Hybrid Methods

• ML meta building blocks
  1. Learning Protocols
  2. Learning Phases
  3. Algorithmic Flow and Structure

[Figure labels: Compute, Data, Compute-Data Interplay, Process]
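To make the list concrete, here is Lloyd's k-means with each step labelled by the building block it appears to exercise. Both the choice of k-means and the labels are our own illustration, not a mapping given in the talk; `kmeans_step` and `kmeans` are hypothetical names.

```python
def kmeans_step(points, centers):
    """One Lloyd iteration; returns (new_centers, assignments, inertia)."""
    k, dim = len(centers), len(points[0])
    assignments, inertia = [], 0.0
    for p in points:
        # Measures (backed by Linear Algebra): squared distance to each center.
        dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers]
        # Data-dependent compute: which center wins depends on the data values.
        best = min(range(k), key=dists.__getitem__)
        assignments.append(best)
        inertia += dists[best]
    # Memory access: scatter-style accumulation indexed by the assignment.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, a in zip(points, assignments):
        counts[a] += 1
        for d in range(dim):
            sums[a][d] += p[d]
    # Mathematical optimization: the mean minimizes within-cluster squared error.
    new_centers = [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
                   for j in range(k)]
    return new_centers, assignments, inertia

def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Algorithmic flow (a meta building block): iterate until the objective stalls."""
    prev = float("inf")
    assignments = []
    for _ in range(max_iter):
        centers, assignments, inertia = kmeans_step(points, centers)
        if prev - inertia <= tol:
            break
        prev = inertia
    return centers, assignments

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
print(centers, labels)
```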

Towards a Comprehensive ML Workload Suite

• Workload design should cover elements of
  • Compute
  • Data characteristics
  • Data-compute interplay

• Each workload includes
  • Multiple datasets × multiple algorithms
  • Coverage of relevant data characteristics
  • Coverage of compute patterns

The Building Block concept provides a means for designing the ML Workload Suite
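One way to read "multiple datasets × multiple algorithms" with coverage requirements is as a cross product that is then checked against the building blocks. This is a hypothetical sketch: the algorithm names and their building-block tags are illustrative placeholders, and the dataset tags only loosely follow the dataset table of the workload suite.

```python
from itertools import product

# Hypothetical tags: compute building blocks each algorithm exercises.
ALGORITHMS = {
    "kmeans": {"linear_algebra", "measures", "data_dependent_compute"},
    "logistic_regression": {"linear_algebra", "special_functions", "optimization"},
    "pagerank": {"linear_algebra", "memory_access"},
}

# Hypothetical tags: characteristics of each dataset type.
DATASETS = {
    "clustered": {"dense", "numeric"},
    "graphs": {"sparse", "numeric"},
    "text": {"sparse", "high_dependency"},
}

def build_suite(algorithms, datasets):
    """Every (algorithm, dataset) pair becomes one workload instance."""
    return list(product(sorted(algorithms), sorted(datasets)))

def missing_coverage(suite, algorithms, datasets, required_compute, required_data):
    """Return (compute patterns, data characteristics) the suite never exercises."""
    compute = set().union(*(algorithms[a] for a, _ in suite))
    data = set().union(*(datasets[d] for _, d in suite))
    return required_compute - compute, required_data - data

suite = build_suite(ALGORITHMS, DATASETS)
gaps = missing_coverage(
    suite, ALGORITHMS, DATASETS,
    required_compute={"linear_algebra", "measures", "optimization", "very_large_models"},
    required_data={"dense", "sparse", "categorical"},
)
print(len(suite), gaps)
```

Gaps reported this way point at what to add next, for example a categorical dataset or a very-large-model algorithm.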

Machine Learning Workloads Suite

ALGORITHMS
Columns: Linear Algebra, Measure Calc., Special Funcs, Math Optim., Data Characteristics, Data-dep. Compute, Mem. Access, Large Model

• Linear Algebra workload: Sparse, Dense; X X X; Un/Supervised, Numeric
• Data Dependency workload: X X X; Un/Supervised, Num/Cat; X X
• Large Models workload: X X X; Un/Supervised, Numeric; X

DATASETS

Workload | Dataset Type | Characteristics
Linear Algebra | Clustered | Dense, Numeric
Linear Algebra | Graphs | Sparse, Numeric
Data Dependency | Bioinformatics | High Dep., Dense/Sparse
Data Dependency | Clustered | Dense
Data Dependency | Text | High Dep., Sparse
Data Dependency | Manufacturing | High Dep., Numeric, Dense
Large Models | Images | Dense, Numeric
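The Linear Algebra workload pairs the same operation with dense and sparse data (clustered vs. graph datasets). A matrix-vector product shows why data characteristics change the compute pattern: in the compressed sparse row (CSR) layout, loop trip counts and the addresses read from `x` depend on the data itself. This is a plain-Python sketch of the standard CSR format, not code from the workload suite.

```python
def dense_matvec(a, x):
    """Dense y = A x: regular, predictable, streaming memory access."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def to_csr(a):
    """Convert a dense matrix (list of rows) to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse y = A x: trip counts and the gather x[col_idx[k]] depend on the data."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect, data-dependent load
        y.append(acc)
    return y

a = [[0, 2, 0],
     [1, 0, 3],
     [0, 0, 0]]
x = [1.0, 2.0, 3.0]
print(dense_matvec(a, x))
print(csr_matvec(*to_csr(a), x))
```

Both calls compute the same vector; the sparse version does less arithmetic but replaces streaming reads with indirect gathers, so which one runs faster depends on the density of the data.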

ML Bench 1.0

• Algorithm X Data

• Reference Models

• Data Generator
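ML Bench 1.0 lists a data generator next to the algorithm × data matrix, but gives no details. The following is only a hypothetical sketch of what such a generator might do: draw clustered, dense, numeric samples (the "Clustered: Dense, Numeric" dataset type) from a seeded generator so runs are reproducible, and optionally sparsify them to vary density. Function names and parameters are assumptions.

```python
import random

def make_clustered(n_samples, n_features, n_clusters, spread=1.0, seed=0):
    """Hypothetical generator: Gaussian blobs around random centers.

    Returns (points, labels); points are dense lists of floats.
    """
    rng = random.Random(seed)  # private RNG, so the dataset is reproducible
    centers = [[rng.uniform(-10.0, 10.0) for _ in range(n_features)]
               for _ in range(n_clusters)]
    points, labels = [], []
    for i in range(n_samples):
        c = i % n_clusters  # round-robin gives balanced cluster sizes
        points.append([rng.gauss(mu, spread) for mu in centers[c]])
        labels.append(c)
    return points, labels

def sparsify(points, density, seed=0):
    """Keep each entry with probability `density`, zeroing the rest."""
    rng = random.Random(seed)
    return [[v if rng.random() < density else 0.0 for v in row] for row in points]

pts, lab = make_clustered(n_samples=300, n_features=8, n_clusters=3, seed=42)
sparse_pts = sparsify(pts, density=0.1)
```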

The “Dwarfs” Connection

• Phil Colella’s “Seven Dwarfs” (2004)
  • Patterns of computation and communication that are important for science and engineering

• Berkeley’s View (2006)
  • Extended to 13 Dwarfs after examining the original 7 Dwarfs outside the HPC scope

• US National Research Council, “Frontiers in Massive Data Analysis” (2013)
  • Chapter 10: “The Seven Computational Giants of Massive Data Analysis”

• The ML Building Blocks provide a further extension and a different perspective
  • Introducing data characteristics and their interplay with compute, communication, and memory

“Data! Data! Data! I Can’t Make Bricks Without Clay!”

Thank You