Practical Tips On Handling Big Data

34
Dr. Brian J. Spiering Practical Tips On Handling Big Data

Transcript of Practical Tips On Handling Big Data

Page 1: Practical Tips On Handling Big Data

Dr. Brian J. Spiering

Practical Tips On Handling Big Data

Page 2: Practical Tips On Handling Big Data

hi, brian.Data Science Faculty @GalvanizeU@BrianSpiering

Page 3: Practical Tips On Handling Big Data

RoadmapDefining “Big Data” (aka, you probably don’t have Big Data)

How to avoid Big Data (and associated problems)

Okay, I really have Big Data. What should I do?

1

2

3

Page 4: Practical Tips On Handling Big Data

Defining “Big Data” (aka, you probably don’t have Big Data)

1

Page 5: Practical Tips On Handling Big Data

What is Big Data?“Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable amounts of time.”

Page 6: Practical Tips On Handling Big Data

What is Big Data?“Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable amounts of time.”

Data that doesn’t find on a single machine.

Page 7: Practical Tips On Handling Big Data

What is not Big Data?

Page 8: Practical Tips On Handling Big Data

How to avoid Big Data (and associated problems)

2

Page 9: Practical Tips On Handling Big Data

Handling Medium Data

Page 10: Practical Tips On Handling Big Data
Page 11: Practical Tips On Handling Big Data
Page 12: Practical Tips On Handling Big Data
Page 13: Practical Tips On Handling Big Data
Page 15: Practical Tips On Handling Big Data

Big Data Gotcha!

Page 16: Practical Tips On Handling Big Data

Scaling Out

1. Single Local Machine < 10s GB*2. Single Cloud Machine < 2 TB*3. Cloud Cluster of Machines > 2 TB*

* Summer 2016

Page 17: Practical Tips On Handling Big Data
Page 18: Practical Tips On Handling Big Data

Matrix Multiplication

Page 19: Practical Tips On Handling Big Data

Matrix Multiplication: Imperative Implementation

Page 20: Practical Tips On Handling Big Data

Matrix Multiplication: Functional Implementation

Page 21: Practical Tips On Handling Big Data

Matrix Multiplication

Page 22: Practical Tips On Handling Big Data
Page 23: Practical Tips On Handling Big Data

Head, Torso, Tail: Separate models (and hardware)

Page 24: Practical Tips On Handling Big Data

Okay, I really have Big Data. What should I do?

3

Page 25: Practical Tips On Handling Big Data

“But my data is more than 5TB! (and I need it in memory)”

Page 26: Practical Tips On Handling Big Data

“But my data is more than 5TB! (and I need it in memory)”

Your life sucks now… You are stuck with

distributed computing

Page 27: Practical Tips On Handling Big Data
Page 28: Practical Tips On Handling Big Data
Page 29: Practical Tips On Handling Big Data

map reduce

Big Data is functional

Page 30: Practical Tips On Handling Big Data
Page 32: Practical Tips On Handling Big Data

Where have we been?Defining “Big Data” (aka, you probably don’t have Big Data)

How to avoid Big Data (and associated problems)

Okay, I really have Big Data. What should I do?

1

2

3

Page 33: Practical Tips On Handling Big Data

Thank You!Questions?