![Page 1: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/1.jpg)
BIG DATA TRAINING
Ranga Vadlamudi March 2014
![Page 2: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/2.jpg)
![Page 3: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/3.jpg)
What is Big Data
• Volume: Large Amounts of Data at rest
• Velocity: milliseconds to seconds to respond
• Variety: Data in many forms (Structured, Unstructured, Multimedia, Text etc.)
• Veracity: Data in doubt
![Page 4: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/4.jpg)
• 30 billion pieces of content a month
• 1 petabyte of content every day
• 2 billion videos watched every day
• 3 billion people will be online
• Sharing 8 zettabytes of data
![Page 5: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/5.jpg)
![Page 6: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/6.jpg)
CAP THEOREM (Consistency, Availability, Partition Tolerance)
![Page 7: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/7.jpg)
Big Data Solutions
Big Data
Real Time Querying
Batch Querying
Mining & Analytics
Machine Learning
Storage
![Page 8: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/8.jpg)
Technology
![Page 9: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/9.jpg)
Background
• Underlying technology invented by Google
• Google Bigtable & Google File System
• Doug Cutting created Nutch, and Hadoop was spun off at Yahoo
• Yahoo played a key role in developing Hadoop for enterprise applications
![Page 10: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/10.jpg)
Hadoop
• Is a framework
• Built on commodity hardware
• Implements a computational paradigm called MapReduce
• Provides a distributed file system called HDFS to store data
• Node failures are automatically handled
![Page 11: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/11.jpg)
Data Becomes Bottleneck
• Getting data to processors is expensive
• Typical disk data transfer rate 75MB/sec
• 100GB data transfer: 22 mins approx.
• New approach is needed
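The 22-minute figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 GB = 1000 MB); the function name and the `disks` parameter are illustrative and not from the slides. The `disks` case is idealized: it assumes the data is split evenly and every disk reads its share in parallel at the full rate.

```python
def transfer_time_minutes(size_gb: float, rate_mb_per_s: float, disks: int = 1) -> float:
    """Minutes needed to read size_gb at rate_mb_per_s per disk.

    Assumes 1 GB = 1000 MB and, when disks > 1, a perfectly even split
    with all disks reading in parallel (an idealized assumption).
    """
    size_mb = size_gb * 1000
    seconds = size_mb / (rate_mb_per_s * disks)
    return seconds / 60


# One disk, as on the slide: 100 GB at 75 MB/sec.
print(f"1 disk:   {transfer_time_minutes(100, 75):.1f} minutes")

# Hypothetical: the same data spread over 10 disks read in parallel.
print(f"10 disks: {transfer_time_minutes(100, 75, disks=10):.1f} minutes")
```

Spreading the read over many disks is what makes a distributed approach attractive: the transfer time shrinks roughly in proportion to the number of disks working at once.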
![Page 12: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/12.jpg)
Hadoop Solves
• Problems where you have a lot of data
• Mixture of complex and structured data
• Speeds up computations by distribution
• Mantra: take the computation to the data, don’t bring the data to the computation
![Page 13: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/13.jpg)
Hadoop Distributions
![Page 14: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/14.jpg)
Hadoop Architecture
• Master-slave philosophy
• Designed to run on a large number of machines
• Machines don’t share memory or disk
• Rack them up and run Hadoop on each machine
![Page 15: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/15.jpg)
Hadoop Architecture
• Data is divided and spread across servers
• Hadoop keeps track of where the data is
• Hadoop replicates data into multiple copies to avoid a single point of failure
• MapReduce is a programming model to process large sets of data in parallel
• Map the operation out to all servers
• Shuffle the results
• Reduce the results back into one result set
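The map, shuffle and reduce steps above can be sketched in plain Python with a word count. This is only an in-memory illustration of the programming model, not Hadoop's API: the function names and the sample chunks are made up for the example, and each list entry stands in for a piece of data held on a different server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: group the emitted values by key across all map outputs."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a chunk of data stored on a separate server.
chunks = ["big data big cluster", "data moves to cluster", "big results"]

mapped = [map_phase(chunk) for chunk in chunks]  # runs per chunk, independently
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Because each call to `map_phase` only looks at its own chunk, the map step can run on whichever machine already holds that chunk; only the much smaller intermediate pairs need to travel during the shuffle.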
![Page 16: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/16.jpg)
Hadoop Components
![Page 17: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/17.jpg)
HDFS (Hadoop Distributed File System)
![Page 18: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/18.jpg)
HDFS
• Distributed file system
• Highly fault tolerant
• An HDFS instance can span many servers
• Holds large datasets, from terabytes to petabytes
• Moving computation is cheaper than moving data
• Large block sizes (128MB for example)
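To make the block size concrete, the sketch below estimates how a single file would be split into blocks. It uses the 128MB block size from the slide and assumes a replication factor of 3, which the slides do not state (they only mention multiple copies); the function name and the 1000MB file are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # example block size from the slide
REPLICATION = 3      # assumption: the slides only say "multiple copies"


def block_layout(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate block count, replica count and raw storage for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # The final block only occupies the bytes it actually holds,
        # so raw storage scales with the file size, not the block count.
        "raw_storage_mb": file_size_mb * replication,
    }


layout = block_layout(1000)  # a hypothetical 1000 MB file
print(layout)
```

Large blocks keep the number of pieces per file small, which limits how much bookkeeping is needed to track where the data lives.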
![Page 19: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/19.jpg)
![Page 20: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/20.jpg)
HDFS Layout
![Page 21: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/21.jpg)
Cloudera Manager
• Management software to manage the Hadoop ecosystem
• Helps install, manage and maintain a cluster
• Resource consumption tracking
• Proactive health checks
• Alerting
• Config changes
![Page 22: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/22.jpg)
Cloudera Capabilities
![Page 23: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/23.jpg)
• Demo Cloudera
• Demo Cassandra
• Demo MongoDB
![Page 24: Big datatraining ranga_1](https://reader034.fdocuments.in/reader034/viewer/2022051819/54c6baad4a7959a6418b45f6/html5/thumbnails/24.jpg)
Questions?