Introduction to Big Data

Transcript of Introduction to Big Data

Page 1: Introduction to Big Data

big data So What?

12 October 2016

Page 2: Introduction to Big Data

Who am I?

• Software guy

• Technology leader with experience in software development as CTO and development manager of mid-sized teams

• Doing big data hands-on since 2009

• Running http://meetup.com/bigdatabe since 2011 (1700 members!)

@wimvanleuven

Page 5: Introduction to Big Data

What is big data?

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.”

–Edd Dumbill, O’Reilly

http://radar.oreilly.com/2012/01/what-is-big-data.html

Page 6: Introduction to Big Data

… too big …

Page 7: Introduction to Big Data

… moves too fast …

Page 9: Introduction to Big Data

… doesn’t fit …

Page 10: Introduction to Big Data

What is Big Data not?

• not a delivery model (on-premise vs hosted vs cloud vs IaaS/PaaS/SaaS vs serverless)

• not a deployment model (private, public, hybrid)

• not a revenue model (license vs subscription vs Pay-as-you-Go)

• not software architecture

Page 11: Introduction to Big Data

What is Big Data? Revisited

“We don’t do Hadoop because we have Big Data; we do Big Data because we have Hadoop.”

–Unknown developer, Facebook

Page 12: Introduction to Big Data

What is Big Data? Revisited

New tools and technologies to capture and process data on a cluster of commodity hardware so that the system acts as one, is resilient to failures and scales linearly.

Page 13: Introduction to Big Data

Big Data is no panacea

• First decide what problem you want to solve; pick a real business problem to add immediate value

• Start small: the technology is made for linear scalability (a 3-node cluster is a cluster!)

• Then become lean: learn through experimentation

Page 14: Introduction to Big Data

Big Data challenges

• Beware of hype, Big Data-washing and fads

• Tech infancy

• IT | Biz

• Data is hard

• Lack of skills!

Page 15: Introduction to Big Data

Benefits

• Scalability, of course

• Collect more and more data

• Robustness inherent to the setup

• More predictable performance

Page 16: Introduction to Big Data

Questions?

Page 17: Introduction to Big Data

Co-existence

Figure: Big Data, View, ESB, App, ETL

Page 18: Introduction to Big Data

DFS

File blocks: 1 2 3 4 5

Node A: 2 4 5
Node B: 1 2 5
Node C: 1 3 4
Node D: 2 3 5
Node E: 1 3 4
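
The DFS slide shows a file split into five blocks, with each block stored on three of the five nodes. The Python sketch below illustrates that idea under stated assumptions: the round-robin placement and the helper names `place_blocks` and `readable_blocks` are hypothetical, chosen for illustration, and do not reproduce the placement policy of any particular distributed file system.

```python
def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {node: [] for node in nodes}
    for i, block in enumerate(blocks):
        for r in range(replication):
            # Consecutive nodes (wrapping around) hold the replicas of one block.
            node = nodes[(i + r) % len(nodes)]
            placement[node].append(block)
    return placement


def readable_blocks(placement, failed):
    """Return the set of blocks still held by at least one surviving node."""
    return {
        block
        for node, held in placement.items()
        if node not in failed
        for block in held
    }


blocks = [1, 2, 3, 4, 5]
nodes = ["Node A", "Node B", "Node C", "Node D", "Node E"]
placement = place_blocks(blocks, nodes)

# With three replicas per block, losing any two nodes loses no data.
survivors = readable_blocks(placement, failed={"Node A", "Node D"})
print(sorted(survivors))
```

With three replicas on distinct nodes, a block only becomes unreadable when all three of its nodes fail at once, which is the sense in which the cluster is resilient to failures.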

Page 19: Introduction to Big Data

MapReduce

Figure: blocks 4, 5, 3, 2, 1 on Node A, Node B, Node C, Node D, Node E; phases Map, Shuffle, Reduce; x y z
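
The three phases named on the MapReduce slide can be sketched as a single-process word count in Python. This is a minimal illustration, not a distributed implementation: the function names are hypothetical, the five input strings stand in for the five blocks, and treating x, y and z as the keys being counted is an assumption made for the example.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle_phase(mapped):
    """Shuffle: group the emitted values by key across all mapper outputs."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(blocks):
    # On a cluster, each block would be mapped on the node that stores it;
    # here the mappers simply run one after another.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle_phase(mapped))


blocks = ["x y x", "y z", "x z z", "y", "x"]
counts = word_count(blocks)
print(counts)
```

Because each call to `map_phase` only looks at its own block, the map step can run in parallel on every node; only the shuffle needs to move data between nodes.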

Page 20: Introduction to Big Data

𝛌

Page 21: Introduction to Big Data

𝛋

Page 22: Introduction to Big Data

Q&A