Hadoop. An introduction for SQL Server DBAs.
Product Manager exploring Big Data
Red Gate Ventures
@andrewdenty
Andrew Denty
1. What is Hadoop?
2. Why you should care
3. How to get started
What we’re not going to talk about.
• Replacing your existing servers with Hadoop
• How Hadoop compares to other databases
• How to write MapReduce or Java
Who has used Hadoop?
What is Hadoop?
• Open source Apache project
• Written in Java
• Distributed system:
  – Shares large workloads
  – Commodity servers
  – Scales effectively
MapReduce (Java-based distributed programming model)
YARN (Yet Another Resource Negotiator)
HDFS (Hadoop Distributed File System)
Storage: HDFS. Compute: MapReduce, scheduled by YARN.
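To make the MapReduce programming model a little more concrete, here is a toy word count in plain Python. It is a single-process sketch of the map, shuffle and reduce steps, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for illustration, and a real job would run the map and reduce steps on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, each chunk of lines could be mapped on a different node.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is just bytes"]
    print(word_count(sample))
```

The point of the model is that mappers and reducers only ever see their own slice of the data, which is what lets the work be spread across commodity servers.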
JBOD (just a bunch of disks)
It’s just bytes
Scalable
Fault tolerant
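One way to picture why "just bytes" on plain disks can still be scalable and fault tolerant: a file is cut into blocks, and each block is copied to several nodes. The sketch below is a simplified illustration, not how HDFS is implemented. The node names, the tiny block size and the round-robin placement are assumptions for the example; real HDFS uses much larger blocks and a rack-aware placement policy, with a default replication factor of 3.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a toy placement rule, chosen only to keep the example short.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a healthy node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    blocks = split_into_blocks(b"0110100101", block_size=4)
    placement = place_replicas(len(blocks), nodes, replication=3)
    print(placement)
    print(readable_after_failure(placement, "node3"))
```

Adding more nodes gives more places to put blocks (scalable), and losing one node leaves other copies of every block available (fault tolerant).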
Why should you care?
• Never again throw away any data!
• Once you’ve kept EVERYTHING, you can then derive insights from all of that data.
http://priceonomics.com/why-ups-trucks-dont-turn-left/
Salary
The things you can’t do with SQL Server
• Distributed processing
• Generating insight from vast quantities of structured and unstructured data
The Hadoop Journey
1. Sandbox
2. 2-3 node cluster
3. Something in production
How to get started now:
• Download & install a sandbox:
  – Hortonworks Sandbox - http://bit.ly/1gkkCte
  – Cloudera QuickStart VM - http://bit.ly/19eOwR3
  – MapR Sandbox - http://bit.ly/TWZynR
• Fire it up, import some data with HDFS Explorer - http://bit.ly/1ivuSz5
• Create a table
• Run a query…
To sum up…
• Hadoop is a distributed data storage and computation engine
• Hadoop enables you to do things which were impossible with SQL Server… (and get paid more!)
• Get started by downloading a Sandbox – it’s easy!