Introduction to Big Data
Uploaded by: wim-van-leuven
Category: Data & Analytics
Transcript of Introduction to Big Data
Big Data: So What?
12 October 2016
Who am I?
• Software guy
• Technology leader with experience in software development as CTO and development manager of mid-sized teams
• Doing big data hands-on since 2009
• Running http://meetup.com/bigdatabe since 2011 (1700 members!)
@wimvanleuven [email protected]
What is big data?
“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.”
–Edd Dumbill, O’Reilly
http://radar.oreilly.com/2012/01/what-is-big-data.html
… too big …
… moves too fast …
… doesn’t fit …
What is Big Data not?
• not a delivery model (on-premise vs hosted vs cloud vs IaaS/PaaS/SaaS vs serverless)
• not a deployment model (private, public, hybrid)
• not a revenue model (license vs subscription vs Pay-as-you-Go)
• not software architecture
“We don’t do Hadoop because we have Big Data; we do Big Data because we have Hadoop.”
–Unknown developer, Facebook
What is Big Data? — revisited
New tools and technologies to capture and process data on a cluster of commodity
hardware so that the system acts as one, is resilient to failures and scales linearly.
What is Big Data? — revisited
Big Data is no panacea
• First decide what problem you want to solve; pick a real business problem to add immediate value
• Start small: the technology is made for linear scalability (a 3-node cluster is a cluster!)
• Then become lean: learn through experimentation
Big Data challenges
• Beware of hype, Big Data-washing and fads
• Tech infancy
• IT | Biz
• Data is hard
• Lack of skills!
Benefits
• Scalability of course
• Collect more and more data
• Robustness inherent to the setup
• More predictable performance
Questions?
Co-existence
[Diagram labels: BigData, View, ESB, App, ETL]
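DFS
[Diagram: a file split into blocks 1 to 5, each block stored on three of five nodes]
Node A: blocks 2, 4, 5
Node B: blocks 1, 2, 5
Node C: blocks 1, 3, 4
Node D: blocks 2, 3, 5
Node E: blocks 1, 3, 4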
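The DFS slide shows a file split into five blocks, with each block stored on three of the five nodes. The sketch below illustrates why that layout survives node failures. It assumes a simple round-robin placement, which is not the placement policy of HDFS or any other real file system, and the names place_blocks and readable_blocks are hypothetical.

```python
from collections import defaultdict


def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Illustrative assumption only: real distributed file systems use
    more elaborate placement policies.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = defaultdict(list)  # node name -> blocks stored on it
    for i, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(i + r) % len(nodes)]
            placement[node].append(block)
    return dict(placement)


def readable_blocks(placement, failed_nodes):
    """Return the set of blocks still available after some nodes fail."""
    return {
        block
        for node, stored in placement.items()
        if node not in failed_nodes
        for block in stored
    }


nodes = ["A", "B", "C", "D", "E"]
placement = place_blocks([1, 2, 3, 4, 5], nodes)
# Lose two of the five nodes; every block still has a surviving copy.
survivors = readable_blocks(placement, failed_nodes={"A", "B"})
```

With three copies of every block, any two nodes can fail without data loss, which is the resilience the talk refers to.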
MapReduce
[Diagram: blocks 1 to 5 on Node A to Node E; phases Map, Shuffle, Reduce; labels x, y, z]
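The three phases named on the MapReduce slide can be sketched in a single process. This is a minimal word-count illustration, not the Hadoop API: the function names and the five input blocks are hypothetical, and a real cluster would run the map and reduce steps in parallel on the nodes holding the data.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: group all emitted values by key across every mapper."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input: five blocks, standing in for one block per node.
blocks = [
    "big data",
    "data moves fast",
    "big cluster",
    "data is hard",
    "fast cluster",
]
counts = reduce_phase(shuffle_phase(map_phase(b) for b in blocks))
```

Each map call only sees its own block, so the map work can be spread over as many nodes as there are blocks. Only the shuffle moves data between nodes.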
𝛌
𝛋
Q&A