Hadoop on Azure 101 What is the Big Deal?
![Page 1: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/1.jpg)
Hadoop on Azure 101 What is the Big Deal?
Dennis Mulder
Solution Architect – Global Windows Azure Center of Excellence
Microsoft Corporation
![Page 2: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/2.jpg)
Agenda
Why Big Data?
Understanding the Basics
Microsoft and Hadoop
![Page 3: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/3.jpg)
Why Big Data?
![Page 4: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/4.jpg)
![Page 5: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/5.jpg)
![Page 6: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/6.jpg)
![Page 7: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/7.jpg)
Example Scenarios
![Page 8: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/8.jpg)
The Potential: Solving Specific Industry Problems
• eCommerce: mining web logs: collaborative filtering, user experience optimisation…
• Manufacturing: detecting trends and anomalies in sensor data: predicting and understanding faults
• Capital Markets: joining market and external data: correlation detection for investment strategy identification, risk calculations…
• Retail Banking: historical transaction mining: fraud detection, customer segmentation…
Industry-specific data sets are leveraged to improve decision making and generate new revenue streams
![Page 9: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/9.jpg)
Traditional E-Commerce Data Flow
OPERATIONAL DATA
NEW USER REGISTRY
NEW PURCHASE
NEW PRODUCT
Logs
ETL Some Data
Data Warehouse
Excess Data
![Page 10: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/10.jpg)
New E-Commerce Big Data Flow
OPERATIONAL DATA
NEW USER REGISTRY
NEW PURCHASE
NEW PRODUCT
Logs
Raw Data
“Store it All” Cluster
Data Warehouse
How much do views for certain products increase when our TV ads run?
![Page 11: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/11.jpg)
Understanding the Basics
Move the Compute to the Data
![Page 12: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/12.jpg)
So How Does It Work?
FIRST, STORE THE DATA
Files
Server, Server, Server, Server
![Page 13: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/13.jpg)
So How Does It Work?
SECOND, TAKE THE PROCESSING TO THE DATA

    // Map Reduce function in JavaScript
    // map: split each input line into words and emit (word, 1) for every non-empty word
    var map = function (key, value, context) {
        var words = value.split(/[^a-zA-Z]/);
        for (var i = 0; i < words.length; i++) {
            if (words[i] !== "") {
                context.write(words[i].toLowerCase(), 1);
            }
        }
    };

    // reduce: add up all the counts emitted for one word
    var reduce = function (key, values, context) {
        var sum = 0;
        while (values.hasNext()) {
            sum += parseInt(values.next(), 10);
        }
        context.write(key, sum);
    };

Code
RUNTIME
Server, Server, Server, Server
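The word-count functions on this slide rely on the cluster runtime to supply `context` and the `values` iterator. To try them without a cluster, a minimal sketch of a local stand-in could look like the following. `runLocally` and the two context objects are hypothetical helpers written for this illustration, not part of Hadoop or of the JavaScript framework shown in the talk; they only mimic the `write`, `hasNext` and `next` calls that the slide's code uses.

```javascript
// Word-count map and reduce, restated from the slide so this block runs on its own.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local stand-in for the runtime (an assumption, not a Hadoop API).
function runLocally(lines) {
  // Map phase: collect every emitted value under its key.
  var grouped = {};
  var mapContext = {
    write: function (k, v) {
      if (!Object.prototype.hasOwnProperty.call(grouped, k)) {
        grouped[k] = [];
      }
      grouped[k].push(v);
    }
  };
  lines.forEach(function (line, offset) {
    map(offset, line, mapContext);
  });

  // Reduce phase: call reduce once per key with an iterator over its values.
  var result = {};
  var reduceContext = {
    write: function (k, v) { result[k] = v; }
  };
  Object.keys(grouped).sort().forEach(function (k) {
    var i = 0;
    var values = {
      hasNext: function () { return i < grouped[k].length; },
      next: function () { return grouped[k][i++]; }
    };
    reduce(k, values, reduceContext);
  });
  return result;
}

var counts = runLocally([
  "The code is brought to the data",
  "Move the compute to the data"
]);
console.log(counts);
```

On a real cluster the runtime does the grouping across many servers; here it all happens in one process, which is enough to check the logic of `map` and `reduce`.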
![Page 14: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/14.jpg)
MapReduce – Workflow
![Page 15: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/15.jpg)
Map tasks
Scenario: Get sum sales grouped by zipCode
Blocks of the Sales file in HDFS, held on DataNode1, DataNode2 and DataNode3, with records of the form (custId, zipCode, amount):
5 53705 $15, 6 44313 $10
5 53705 $65, 0 54235 $22
9 02115 $15, 6 44313 $25
3 10025 $95, 8 44313 $55
2 53705 $30, 1 02115 $15
4 54235 $75, 7 10025 $60
Map: each Mapper runs a GroupBy on zipCode and writes one output bucket per reduce task.
Mapper output buckets: 53705 $65 | 54235 $75, 54235 $22 | 10025 $95, 44313 $55, 10025 $60
Mapper output buckets: 53705 $30, 53705 $15 | 02115 $15, 02115 $15 | 44313 $10, 44313 $25
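The map step on this slide can be sketched in JavaScript as follows. This is an illustration under assumptions, not the runtime's implementation: `mapTask`, `bucketForZip` and `NUM_REDUCE_TASKS` are hypothetical names, the zipCode-to-bucket table is read off the diagram, and so is the split of records between the two Mappers. Hadoop's default partitioner instead picks the bucket from a hash of the key modulo the number of reduce tasks.

```javascript
// Number of reduce tasks, and therefore buckets per Mapper (taken from the diagram).
var NUM_REDUCE_TASKS = 3;

// Assumed partitioning table, read off the slide: which bucket each zipCode goes to.
var bucketForZip = {
  "53705": 0,
  "54235": 1,
  "02115": 1,
  "10025": 2,
  "44313": 2
};

// One map task: drop custId, keep (zipCode, amount), and place each pair
// in the bucket of the reduce task responsible for that zipCode.
function mapTask(records) {
  var buckets = [];
  for (var r = 0; r < NUM_REDUCE_TASKS; r++) {
    buckets.push([]);
  }
  records.forEach(function (rec) {
    var zipCode = rec[1];
    var amount = rec[2];
    buckets[bucketForZip[zipCode]].push([zipCode, amount]);
  });
  return buckets;
}

// Records as (custId, zipCode, amount); the assignment to Mappers is an assumption.
var mapper1 = mapTask([
  [5, "53705", 65], [0, "54235", 22],
  [3, "10025", 95], [8, "44313", 55],
  [4, "54235", 75], [7, "10025", 60]
]);
var mapper2 = mapTask([
  [5, "53705", 15], [6, "44313", 10],
  [9, "02115", 15], [6, "44313", 25],
  [2, "53705", 30], [1, "02115", 15]
]);

console.log(JSON.stringify(mapper1));
console.log(JSON.stringify(mapper2));
```

Because every Mapper uses the same table, all amounts for a given zipCode end up in the same bucket number, which is what lets a single reduce task see every sale for that zipCode.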
![Page 16: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/16.jpg)
Reduce tasks
Shuffle: each Reducer collects its bucket from every Mapper.
Reducer input: 53705 $65, 53705 $30, 53705 $15
Reducer input: 44313 $10, 44313 $25, 10025 $95, 44313 $55, 10025 $60
Reducer input: 54235 $75, 54235 $22, 02115 $15, 02115 $15
Sort: each Reducer sorts its input by zipCode.
Reducer, sorted: 53705 $65, 53705 $30, 53705 $15
Reducer, sorted: 44313 $10, 44313 $25, 44313 $55, 10025 $95, 10025 $60
Reducer, sorted: 54235 $75, 54235 $22, 02115 $15, 02115 $15
SUM: each Reducer adds up the amounts per zipCode.
53705 $110
10025 $155, 44313 $90
54235 $97, 02115 $30
Done!
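The shuffle, sort and SUM steps on this slide can be sketched in JavaScript as follows. This is an illustration, not the Hadoop runtime: `shuffle` and `reduceSorted` are hypothetical helper names, and `mapOutputs` simply restates the Mapper buckets from the slide as (zipCode, amount) pairs.

```javascript
// Map output from the slide: one entry per Mapper, each with one bucket per reduce task.
var mapOutputs = [
  [
    [["53705", 65]],
    [["54235", 75], ["54235", 22]],
    [["10025", 95], ["44313", 55], ["10025", 60]]
  ],
  [
    [["53705", 30], ["53705", 15]],
    [["02115", 15], ["02115", 15]],
    [["44313", 10], ["44313", 25]]
  ]
];

// Shuffle: a reduce task collects its own bucket from every Mapper.
function shuffle(outputs, taskIndex) {
  var input = [];
  outputs.forEach(function (buckets) {
    input = input.concat(buckets[taskIndex]);
  });
  return input;
}

// Sort by zipCode so equal keys are adjacent, then sum each run of equal keys.
function reduceSorted(pairs) {
  var sorted = pairs.slice().sort(function (a, b) {
    return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
  });
  var totals = [];
  sorted.forEach(function (pair) {
    var last = totals[totals.length - 1];
    if (last && last[0] === pair[0]) {
      last[1] += pair[1];
    } else {
      totals.push([pair[0], pair[1]]);
    }
  });
  return totals;
}

var reducerOutputs = [];
for (var task = 0; task < 3; task++) {
  reducerOutputs.push(reduceSorted(shuffle(mapOutputs, task)));
}
console.log(JSON.stringify(reducerOutputs));
```

The totals agree with the slide: $110 for 53705, $97 for 54235, $30 for 02115, $155 for 10025 and $90 for 44313. Sorting first means each Reducer only has to compare a record with the one before it, rather than keep a lookup table of every key.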
![Page 17: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/17.jpg)
Hadoop
![Page 18: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/18.jpg)
Hadoop Architecture
![Page 19: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/19.jpg)
Traditional RDBMS vs. MapReduce
| | TRADITIONAL RDBMS | MAPREDUCE |
|---|---|---|
| Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes) |
| Access | Interactive and Batch | Batch |
| Updates | Read / Write many times | Write once, Read many times |
| Structure | Static Schema | Dynamic Schema |
| Integrity | High (ACID) | Low |
| Scaling | Nonlinear | Linear |
| DBA Ratio | 1:40 | 1:3000 |
Reference: Tom White’s Hadoop: The Definitive Guide
![Page 20: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/20.jpg)
The Hadoop Ecosystem
ETL Tools, BI Reporting, RDBMS
Reference: Tom White’s Hadoop: The Definitive Guide
![Page 21: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/21.jpg)
Microsoft and Hadoop
![Page 22: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/22.jpg)
Hadoop on Azure
Azure Blob Storage
Name Node
Data Node
Data Node
Data Node
Data Node
S3
HDFS
On Premise Enterprise Content
• Transactional DBs
• On Prem logs
• Internal sensors
Cloud Enterprise Content
• Generated in Azure
3rd Party Content
• Azure Datamarket
• Generated/stored elsewhere
• Public content
• Delivered online
Azure Blob Storage
SQL Azure
Application end point
What does Hadoop in the Cloud mean?
• Where is HDFS?
• Where is my data stored?
• Azure Blob Storage vs. HDFS
![Page 23: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/23.jpg)
Detailed Offerings
• Hive ODBC Driver & Hive Add-in for Excel
• Integration with Microsoft PowerPivot
• Hadoop based distribution for Windows Server & Azure
• Strategic Partnership with Hortonworks
• JavaScript framework for Hadoop
• RTM of Hadoop connectors for SQL Server and PDW
![Page 24: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/24.jpg)
Microsoft Big Data Solution
Power View, Excel with PowerPivot, Embedded BI, Predictive Analytics
SSAS, SSRS
Microsoft EDW
Hadoop On Windows Server
Hadoop On Windows Azure
CRM, ERP, LOB Apps
Devices, Crawlers, Sensors, Bots
![Page 25: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/25.jpg)
Demo: Deploying and Interacting With a Hadoop Cluster on Azure
![Page 26: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/26.jpg)
Hadoop on Windows
Insights to all users by activating new types of data
• Integrate with Microsoft Business Intelligence
Differentiation
• Choice of deployment on Windows Server + Windows Azure
• Integrate with Windows Components (AD, Systems Center)
• Easy installation and configuration of Hadoop on Windows
• Simplified programming with .NET & JavaScript integration
• Integrate with SQL Server Data Warehousing
![Page 27: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/27.jpg)
Summary
• Hadoop is about massive compute and massive data
• The code is brought to the data
• Map -> Split the work
• Reduce -> Combine the results
• Relational databases vs Hadoop? Wrong question: they serve different needs
![Page 28: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/28.jpg)
Resources
http://www.hadooponazure.com/
http://hadoop.apache.org/
![Page 29: Hadoop on Azure 101 What is the Big Deal?](https://reader036.fdocuments.in/reader036/viewer/2022062520/568165d2550346895dd8df68/html5/thumbnails/29.jpg)
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.