Apache Hadoop for Windows Server and Windows Azure

ELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDS

Brad Sarsfield, Engineering Architect, Microsoft Big Data | Hadoop
March 2012 | revision 1.02

APACHE HADOOP ON AZURE AND WINDOWS

MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE

ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD

“The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago”

Ted Kummert, CVP Business Platforms
SQL PASS, October 2011

BIG DATA IS HERE AND HADOOP IS CENTER STAGE

ECONOMIC CONTEXT AND EXEMPLAR

140,000-190,000 more deep analytical talent positions

1.5 million more data-savvy managers in the US alone

$300 billion: potential annual value to US healthcare

15 out of 17 sectors in the US have more data stored per company than the US Library of Congress

€250 billion: potential annual value to Europe’s public sector

50-60% increase in the number of Hadoop developers within organizations already using Hadoop within a year

Learn how large corporations are coping with the increasing flow of unstructured data by using a free software program called Hadoop

Special Report: The CEO’s Guide to Hadoop

http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html

THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETY

• Isotope is designed to enable solution building with all key dimensions in mind
• Deep integration and coordination with existing Microsoft enterprise, cloud, and BI tools

VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFT

Cassandra, Hive, Scribe, Hadoop

Hadoop, Oozie, Pig Latin…

BackType, Hadoop, Pig, HBase, Cassandra

MR/GFS, Bigtable, Dremel…

SimpleDB, Dynamo, EC2/EMR/S3…

Internal [Dryad | Cosmos] and External [Isotope | Azure | Excel | BI | SQL DW | LTH]

• Scalable machine learning and data mining [Mahout]
• Statistical modeling and analysis [R]
• Coordination and workflow [Oozie, Cascading]
• Data integration and transformation [Sqoop, Flume]
• Social network analytics and petascale graph learning [Pegasus]
• Real-time stream analytics and business intelligence merged with petascale computation [HStreaming]
• Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3]
• Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]

ENTER ISOTOPE

Isotope is the internal codename for Microsoft’s suite of products to support Hadoop on Windows and Azure

OUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISE

• Self-service business intelligence at any scale, on-premises or in the cloud
• Complete integration of information assets, from log files to collaboration artifacts to enterprise data stores
• Familiar and integrated tools for analytics, insight, exploration, modeling, and strategic decision making
• Transparent, federated identity and security management for all big data services
• High-availability data protection and recovery services for enterprises through the cloud
• Enterprise-grade support for all services, frameworks, and tools

[Diagram: data sources (sensors, devices, apps, bots, crawlers, ERP, LOB, CRM, EIS); data types (structured; un- and semi-structured); platforms (Hadoop, SQL Data Warehousing, SQL Analysis, SQL Reporting); business-user tools (interactive reports with Crescent, Excel with PowerPivot, embedded BI apps)]

A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS

[Diagram: sources (EIS/ERP, RDBMS, file system, OData (RSS), Azure Storage); Hadoop (Azure and Enterprise) over HDFS, alongside NoSQL and ETL; an ocean of data (unstructured, semi-structured, structured); programming surfaces (Java OM, Streaming OM, HiveQL, Pig Latin, (T)SQL, .NET/C#/F#)]
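The "Streaming OM" surface listed above presumably refers to Hadoop Streaming, in which the mapper and reducer are ordinary programs that read lines on standard input and emit tab-separated key/value lines on standard output, with the framework sorting mapper output by key before the reducer reads it. The sketch below expresses a word count in that style as pure Node.js functions so the logic can be checked locally. It is an illustration under stated assumptions, not Microsoft's implementation: `mapLines` and `reduceLines` are hypothetical names, and in an actual Streaming job each stage would be a separate script wired to `process.stdin` and `process.stdout`.

```javascript
// Word count in the Hadoop Streaming style (illustrative sketch).
// mapLines and reduceLines are hypothetical helper names; a real Streaming
// job would run each stage as its own script over stdin/stdout.

// Mapper: turn each input line into "word\t1" records
function mapLines(lines) {
  const out = [];
  for (const line of lines) {
    // Split on any non-letter character, as the slide's map function does
    for (const word of line.split(/[^a-zA-Z]/)) {
      if (word !== "") {
        out.push(`${word.toLowerCase()}\t1`);
      }
    }
  }
  return out;
}

// Reducer: sum counts over records already sorted by key,
// so all records for one key arrive next to each other
function reduceLines(sortedLines) {
  const out = [];
  let currentKey = null;
  let sum = 0;
  for (const line of sortedLines) {
    const tab = line.indexOf("\t");
    const key = line.slice(0, tab);
    const count = parseInt(line.slice(tab + 1), 10);
    if (key !== currentKey) {
      // A new key starts: flush the total for the previous one
      if (currentKey !== null) {
        out.push(`${currentKey}\t${sum}`);
      }
      currentKey = key;
      sum = 0;
    }
    sum += count;
  }
  // Flush the final key
  if (currentKey !== null) {
    out.push(`${currentKey}\t${sum}`);
  }
  return out;
}

// Local simulation: map, sort (standing in for Hadoop's sort phase), reduce
const input = ["Hadoop on Azure", "hadoop on Windows"];
const mapped = mapLines(input);
const reduced = reduceLines(mapped.slice().sort());
console.log(reduced.join("\n"));
```

Because the reducer depends on equal keys being adjacent, the local simulation sorts the mapped records before reducing; on a cluster, Hadoop performs that sort between the two stages.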

PROJECT ISOTOPE OFFERINGS

• Bi-directional connectors between Hadoop, SQL Server, and PDW
• ODBC driver for Hadoop
• Hive plug-in for Excel
• Hosted elastic Hadoop service on Azure
• Microsoft’s Apache Hadoop-based solution for Windows Azure
• Microsoft’s Apache Hadoop-based solution for Windows Server
• JavaScript support for Hadoop, with a web-based interactive environment
• Contributions back to the open source community via the Apache Software Foundation

HIVE PLUG-IN FOR EXCEL

• Connect Excel directly to Hive
• Browse Hive objects (tables, columns, etc.)
• Construct and issue queries

HOSTED ELASTIC HADOOP SERVICE ON AZURE

• Elastic MapReduce, Hive, Pig Latin, .NET, JavaScript, and integration with BI, DW, and Office collaboration tools
• Simple management UI
• Full Hadoop compatibility
• Native support for Azure Blob Storage from HDFS

MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE

• One-click deployment of a Hadoop cluster on Azure

MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS SERVER

• All standard Hadoop modules supported: Hadoop | HDFS | Pig | Hive | Monitoring Pages
• One-click installer
• Simplified cluster configuration
• Integration with the Microsoft ecosystem: System Center | Active Directory | etc.

ISOTOPE.JS: OUR VB MOMENT FOR BIG DATA

• Write MapReduce jobs in JavaScript
• Interactive development environment
• Interactive data query and analytics of petascale datasets
• Hive command line for interactive Hive queries
• Charting and graphing for insight and analytics visualization

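// Map Reduce function in JavaScript
// ------------------------------------------------------------------

var map = function (key, value, context) {
    // Split the input line on any non-letter character
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        // Skip empty tokens produced by adjacent separators
        if (words[i] !== "") {
            // Emit (word, 1), normalizing case so "Hadoop" and "hadoop" match
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    // Add up every count emitted for this word
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};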
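The map and reduce functions above rely on a `context` object with a `write` method and a `values` iterator with `hasNext()` and `next()`, both supplied by the hosted runtime. To try them without a cluster, a minimal in-memory driver can stand in for that runtime. This is a sketch under stated assumptions: `runJob` and `makeIterator` are hypothetical helpers, the driver passes the line index where Hadoop's text input would pass a byte offset, and it runs on Node.js rather than in the hosted Isotope.js environment.

```javascript
// Minimal in-memory driver for map/reduce functions written in the slide's
// style. The context object (write) and the values iterator (hasNext/next)
// are hypothetical stand-ins for what the hosted runtime provides.

// The slide's functions, copied unchanged apart from the parseInt radix
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Wrap an array in the hasNext()/next() interface that reduce expects
function makeIterator(items) {
  let index = 0;
  return {
    hasNext: () => index < items.length,
    next: () => items[index++],
  };
}

function runJob(lines, mapFn, reduceFn) {
  // Map phase: collect emitted pairs, grouping values by key (the "shuffle")
  const groups = new Map();
  const mapContext = {
    write: (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    },
  };
  // The line index stands in for the byte offset Hadoop would pass as the key
  lines.forEach((line, index) => mapFn(index, line, mapContext));

  // Reduce phase: visit keys in sorted order and collect the output pairs
  const results = [];
  const reduceContext = { write: (k, v) => results.push([k, v]) };
  for (const key of [...groups.keys()].sort()) {
    reduceFn(key, makeIterator(groups.get(key)), reduceContext);
  }
  return results;
}

const counts = runJob(["The quick fox", "the lazy dog, the end"], map, reduce);
console.log(counts);
```

Grouping values by key before calling `reduce` mirrors Hadoop's shuffle; a real cluster also partitions keys across many reducers running in parallel, which this single-process sketch omits.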

GIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITY

Eric Baldeschwieler, CEO, Hortonworks

“We are excited to work with Microsoft to help make Apache Hadoop a compelling platform for storing and processing data. Hortonworks welcomes Microsoft to the Hadoop ecosystem and looks forward to lending our deep domain expertise to help accelerate the delivery of Microsoft’s Apache Hadoop-based solution for Windows Server and service for Windows Azure.”

• Microsoft will be working with the community to contribute significant code back to the Apache Software Foundation
• Microsoft has announced a partnership with Hortonworks to help accelerate our open source support

SUMMARY

• Please visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache Hadoop
• Please visit www.microsoft.com/bigdata to learn more about project codename “Isotope” and the broader ecosystem of products and services Microsoft is delivering in 2012 and beyond
