Post on 20-Jan-2015
ELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDS
Brad Sarsfield, Engineering Architect, Microsoft Big Data | Hadoop
March 2012 | revision 1.02
APACHE HADOOP ON AZURE AND WINDOWS
MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE
ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD
“The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago”
Ted Kummert, CVP Business Platforms
SQL PASS, October 2011
BIG DATA IS HERE AND HADOOP IS CENTER STAGE
ECONOMIC CONTEXT AND EXEMPLAR
140,000-190,000 more deep analytical talent positions
1.5 million more data-savvy managers in the US alone
$300 billion potential annual value to US healthcare
15 out of 17 sectors in the US have more data stored per company than the US Library of Congress
€250 billion potential annual value to Europe’s public sector
50-60% increase in the number of Hadoop developers within organizations already using Hadoop within a year
Learn how large corporations are coping with the increasing flow of unstructured data by using a free software program called Hadoop
Special Report: The CEO’s Guide to Hadoop
http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETY
Isotope is designed to enable solution building with all key dimensions in mind
Deep integration and coordination with existing Microsoft enterprise, cloud, and BI tools
VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFT
Cassandra, Hive, Scribe, Hadoop
Hadoop, Oozie, PigLatin, …
BackType, Hadoop, Pig, HBase, Cassandra
MR/GFS, Bigtable, Dremel, …
SimpleDB, Dynamo, EC2/EMR/S3, …
Internal [Dryad | Cosmos] and External [Isotope | Azure | Excel | BI | SQL DW | LTH]
Scalable machine learning and data mining [Mahout]
Statistical modeling and analysis [R]
Coordination and workflow [Oozie, Cascading]
Data integration and transformation [SQOOP, Flume]
Social network analytics and petascale graph learning [Pegasus]
Real-time stream analytics and business intelligence merged with petascale computation [HStreaming]
Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3]
Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]
ENTER ISOTOPE
Isotope is the internal codename for Microsoft’s suite of products to support Hadoop in Windows and Azure
OUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISE
Self-service business intelligence at any scale, on premise or in the cloud
Complete integration of information assets, from log files to collaboration artifacts to enterprise data stores
Familiar and integrated tools for analytics, insight, exploration, modeling, and strategic decision making
Transparent, federated identity and security management for all big data services
High-availability data protection and recovery services for enterprises through the cloud
Enterprise-grade support for all services, frameworks, and tools
[Diagram labels]
Data sources: Sensors, Devices, Apps, Bots, Crawlers, ERP, LOB, CRM
Data types: Structured; Un- and Semi-Structured
Platform: HADOOP, EIS, SQL DATA WAREHOUSING, SQL ANALYSIS, SQL REPORTING
Delivery: Interactive Reports with Crescent, Excel with PowerPivot, Embedded BI Apps
Audience: Business Users
A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS
[Diagram labels]
Data sources: EIS / ERP, RDBMS, File System, OData [RSS], Azure Storage
HADOOP [Azure and Enterprise]
OCEAN OF DATA [unstructured, semi-structured, structured]
Programming models: Java OM, Streaming OM, HiveQL, PigLatin, (T)SQL, .NET/C#/F#
Storage and processing: HDFS, NOSQL, ETL
PROJECT ISOTOPE OFFERINGS
• Bi-directional connectors between Hadoop and SQL and PDW
• ODBC driver for Hadoop
• Hive plug-in for Excel
• Hosted elastic Hadoop service on Azure
• Microsoft’s Apache Hadoop-based solution for Windows Azure
• Microsoft’s Apache Hadoop-based solution for Windows Server
• JavaScript support for Hadoop, with web-based interactive environment
• Contributions back to the open source community via the Apache Foundation
HIVE PLUG-IN FOR EXCEL
• Connect Excel directly to Hive
• Browse Hive objects – tables, columns, etc.
• Construct and issue queries
HOSTED ELASTIC HADOOP SERVICE ON AZURE
• Elastic MapReduce, Hive, PigLatin, .NET, JavaScript, and integration with BI, DW, and Office Collaboration tools
• Simple management UI
• Full Hadoop compatibility
• Native support for Azure Blob Storage from HDFS
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE
• One-click deployment of Hadoop on Azure cluster
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS
• All standard Hadoop modules supported: Hadoop | HDFS | Pig | Hive | Monitoring Pages
• One-click installer
• Simplified cluster configuration
• Integration with Microsoft ecosystem: System Center | Active Directory | etc.
ISOTOPE.JS: OUR VB MOMENT FOR BIG DATA
• Write MapReduce jobs in JavaScript
• Interactive development environment
• Interactive data query and analytics of petascale datasets
• Hive command line for interactive Hive
• Charting and graphing for insight and analytics visualization
// Map Reduce function in JavaScript
// ------------------------------------------------------------------
// Map: split each input line into words and emit (word, 1) for each one.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Reduce: add up the counts emitted for one word and emit (word, total).
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10); // explicit radix for decimal counts
    }
    context.write(key, sum);
};
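To see what the word-count job above computes without a cluster, the following minimal sketch runs the same map and reduce functions locally in plain JavaScript (for example under Node.js). The `runLocal` driver, its in-memory grouping step, and the `makeIterator` helper are hypothetical stand-ins written for this illustration, not part of Isotope. The only thing assumed about the hosted runtime is what the slide code itself shows: a `context.write(key, value)` call and a `values` object exposing `hasNext()` and `next()`.

```javascript
// Word-count map and reduce, as on the slide.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical helper: wraps an array in the hasNext()/next() shape
// that the reduce function expects.
function makeIterator(items) {
    var pos = 0;
    return {
        hasNext: function () { return pos < items.length; },
        next: function () { return items[pos++]; }
    };
}

// Hypothetical local driver: map every line, group the emitted values
// by key in memory (a stand-in for the shuffle), then reduce each group.
function runLocal(lines, mapFn, reduceFn) {
    var groups = Object.create(null);
    var mapContext = {
        write: function (k, v) {
            if (!(k in groups)) { groups[k] = []; }
            groups[k].push(v);
        }
    };
    lines.forEach(function (line, offset) {
        mapFn(offset, line, mapContext);
    });

    var results = Object.create(null);
    var reduceContext = {
        write: function (k, v) { results[k] = v; }
    };
    Object.keys(groups).sort().forEach(function (k) {
        reduceFn(k, makeIterator(groups[k]), reduceContext);
    });
    return results;
}

var counts = runLocal(
    ["Big data is here", "Hadoop is center stage for big data"],
    map,
    reduce
);
console.log(counts.big, counts.data, counts.hadoop); // prints: 2 2 1
```

The driver keeps every intermediate value in memory, which is fine for a few lines of text but is exactly the step a Hadoop cluster distributes across machines.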
GIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITY
Eric Baldeschwieler, CEO
“We are excited to work with Microsoft to help make Apache Hadoop a compelling platform for storing and processing data. Hortonworks welcomes Microsoft to the Hadoop ecosystem and looks forward to lending our deep domain expertise to help accelerate the delivery of Microsoft’s Apache Hadoop-based solution for Windows Server and service for Windows Azure.”
Microsoft will be working with the community to contribute back significant code to the Apache Foundation
Microsoft has announced a partnership with Hortonworks to help accelerate our open source support
SUMMARY
Please visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache Hadoop
Please visit www.microsoft.com/bigdata to learn more about project codename “Isotope” and the broader ecosystem of products and services Microsoft is delivering in 2012 and beyond