Big Data - ttja.ee · Big Data: Volume Fits into memory of one large server (up to 1TB) Tools:...

Page 1

Rainer Sternfeld, CE September 2014

Big Data

May, 2015

André Karpištšenko, Planet OS Advisor, Co-founder

Page 2

Planet OS Presence

Tallinn

Rio de Janeiro

Washington DC

Houston

Los Angeles

Sunnyvale HQ

Montreal

Tartu

Page 3

From a small buoy to Big Data

2008: Data Buoys
Market: $2 billion
Competitors: 100+ producers
Scalability: poor to limited

2012: Ocean Data Management
Market: $5 billion
Competitors: 25+
Scalability: good but slow

2014: Sensor Data Discovery Engine
Market: $100+ billion
Competitors: 15+
Scalability: very scalable and fast

Page 4

Big Data

VARIETY VELOCITY VOLUME

Page 5

Big Data: Volume

Small Data
Fits into memory of one large server (up to 1 TB)
Tools: Python (NumPy, SciPy, SciKit), R, MATLAB, etc.

Big Data
10s of terabytes, petabytes
10+ computing nodes
Tools: Hadoop, Spark, etc.
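The small-versus-big split on this slide can be read as a rule of thumb on dataset size. The sketch below is one possible reading, not something the slide spells out: it assumes decimal terabytes, reads "10s of Terabytes" as 10 TB and up, and uses a hypothetical function name `suggest_tooling`. The slide gives no guidance for sizes between the two thresholds, so the sketch says so rather than guessing.

```python
TB = 10**12  # decimal terabyte (assumption; the slide does not say which unit)

SINGLE_SERVER_LIMIT = 1 * TB  # "up to 1TB" from the slide
CLUSTER_THRESHOLD = 10 * TB   # "10s of Terabytes" read as 10 TB and up (assumption)


def suggest_tooling(dataset_bytes: int) -> str:
    """Return the tool family the slide associates with a dataset of this size."""
    if dataset_bytes <= SINGLE_SERVER_LIMIT:
        return "single server: Python (NumPy, SciPy), R, MATLAB"
    if dataset_bytes >= CLUSTER_THRESHOLD:
        return "cluster (10+ nodes): Hadoop, Spark"
    return "in between: the slide gives no guidance"


if __name__ == "__main__":
    # Hypothetical dataset sizes, chosen only to exercise each branch.
    for size in (200 * 10**9, 5 * TB, 40 * TB):
        print(size, "->", suggest_tooling(size))
```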

Page 6

Variety of Devices & Data Types (Oceanic)

OBSERVER SIGHTINGS

ACOUSTIC RECORDINGS

AERIAL MONITORING

WAVE GLIDERS

IMAGERY ANALYSIS

BUOYS & FLOATS

SATELLITE TAGGING

ACOUSTIC MODELS

ACOUSTIC DETECTIONS

VESSEL AIS DATA

Page 7

Variety of Formats & Locations

COMPRESSED TAR / ZIP

ONLINE REPOSITORIES

NETWORK STORAGE

GIS GDB / SHP

FILE SHARING PLATFORMS

SCIENTIFIC HDF / NC

DOCUMENTS DOC / PDF

OFFLINE ARCHIVES

TABULAR XLS / CSV

LOCAL HARD DRIVES
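A first step in handling this variety is often to sort incoming files into the format families named on the slide. The sketch below is a minimal illustration that looks only at the file extension. The category labels mirror the slide, while the function name `classify` and the example file names are hypothetical.

```python
from pathlib import PurePath

# Extension-to-family mapping taken from the format pairs listed on the slide.
FORMAT_BY_EXTENSION = {
    ".tar": "compressed", ".zip": "compressed",
    ".gdb": "gis", ".shp": "gis",
    ".hdf": "scientific", ".nc": "scientific",
    ".doc": "documents", ".pdf": "documents",
    ".xls": "tabular", ".csv": "tabular",
}


def classify(filename: str) -> str:
    """Return the slide's format family for a file name, or "unknown"."""
    suffix = PurePath(filename).suffix.lower()  # last extension only, case-insensitive
    return FORMAT_BY_EXTENSION.get(suffix, "unknown")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("buoy_42.NC", "coastline.shp", "cruise_report.pdf", "notes.txt"):
        print(name, "->", classify(name))
```

Extension matching is only a heuristic: it says nothing about where a file lives (the slide's second axis), and anything outside the ten listed extensions falls through to "unknown".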

Page 8

VARIETY VERACITY

The 5 V’s of Big Data

VELOCITY VOLUME VALUE

Page 9

Trends in industrial machine data

Current Market: Connecting Devices, Higher velocities, Larger volumes, Wider varieties

Future Market: Automated Industries, Real-time Decisions, Data & Insight Markets

VERACITY

VELOCITY

Page 12

Trends in Sensor Data

Chart axes: Data, Time

Hewlett Packard: “By 2020, 40% of all data ever collected by humankind will be generated by sensors.”

Page 14

An exabyte a day, compressed to 10 petabytes
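The two figures on this slide imply a compression ratio, which is easy to check. The short calculation below assumes decimal (SI) units, which the slide does not specify; the variable names are illustrative.

```python
# Decimal (SI) units are an assumption; the slide does not say which it uses.
PETABYTE = 10**15
EXABYTE = 10**18

raw_per_day = 1 * EXABYTE       # "an exabyte a day"
stored_per_day = 10 * PETABYTE  # "compressed to 10 petabytes"

ratio = raw_per_day // stored_per_day
saved_fraction = 1 - stored_per_day / raw_per_day

print(f"compression ratio: {ratio}:1")            # compression ratio: 100:1
print(f"volume reduction: {saved_fraction:.0%}")  # volume reduction: 99%
```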

Page 15

Map of all devices on the Internet

August 2, 2014

Page 16

Robotic ocean-borne sensor platforms increase productivity

LIQUID ROBOTICS WAVE GLIDER

Page 17

Example: Improved Ocean Operations. Avoiding ship collisions with North Atlantic right whales

Whale detection model: detection rate improved from 72% to 98%

Page 18 (slide dated March 2015)

R/V Jean Charcot

Interactive reporting

Bravante: helping to deliver offshore data reports 80% faster

Page 19

Prediction models are applied to local sensors and are domain-specific

Page 20

Unmanned vehicles are estimated to grow 10x in 10 years

Image Credit: Northrop Grumman

Page 21

Satellites are getting smaller and cheaper.

150 launched since 2011 (3x the market estimate)

SPIRE, A SAN FRANCISCO STARTUP BUILDING NON-IMAGING LOW-ORBIT NANOSATELLITES USING RF SENSORS

Page 22

Intelligent sensors in our homes and cities

Page 23

Sources: http://www.wired.com/images_blogs/beyond_the_beyond/2012/11/ge-industrial.jpg

Image Credit: General Electric

Page 24

History of Big Data Technologies

2003: Google File System
2004: MapReduce
2005: Bigtable

2005: Open source started
2011: Stable release

Page 25

Choose the Right Tool for the Right Job

Relational database
Columnar storage
Array database
Graph storage
Key-value store
Object storage
In-memory storage
Hierarchical data format

Page 26

Crowdsourcing, Data Labeling

Fog computing next to cloud computing

Page 29

Google Trends, Interest Over Time: Big Data

Page 30

Rainer Sternfeld, CE September 2014
