Azure HDInsight Hadoop Meets the Cloud Microsoft’s managed Hadoop as a Service 100% open source...
Azure HDInsight
Hadoop Meets the Cloud
Microsoft’s managed Hadoop as a Service
100% open source Apache Hadoop
Built on the latest releases across Hadoop (2.4)
Up and running in minutes with no hardware to deploy
Supported by Microsoft
Scenario: Sensor processing and reporting
1) Sensor data from heating, ventilation, and air conditioning (HVAC) systems is loaded into blob storage as comma-separated values (CSV)
2) Hive queries are used to expose the data in CSV files as Hive tables. Additional tables are created by enriching this data
3) Excel connects to HDInsight using the Hive ODBC driver, and visualizes the data using Power View
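The "expose CSV as a table, then enrich it" step of the sensor scenario can be pictured with a small in-memory Java sketch. It is not Hive and not the deck's own query: the three-column layout (buildingId, targetTemp, actualTemp), the class name HvacReport, and the sample rows are all assumptions made for illustration. The enrichment here is a derived temperature difference, averaged per building, roughly what a GROUP BY query would return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for "expose CSV as a table, then enrich it". Not Hive itself. */
final class HvacReport {

    /** One parsed CSV row; the three-column layout is hypothetical. */
    static final class Reading {
        final String buildingId;
        final double targetTemp;
        final double actualTemp;

        Reading(String buildingId, double targetTemp, double actualTemp) {
            this.buildingId = buildingId;
            this.targetTemp = targetTemp;
            this.actualTemp = actualTemp;
        }

        /** Enrichment column: how far the reading is from the set point. */
        double tempDiff() {
            return actualTemp - targetTemp;
        }
    }

    /** Parses lines of the assumed form buildingId,targetTemp,actualTemp; skips the header. */
    static List<Reading> parse(List<String> csvLines) {
        List<Reading> readings = new ArrayList<>();
        for (String line : csvLines) {
            String[] cols = line.split(",");
            if (cols.length != 3 || cols[0].trim().equals("buildingId")) {
                continue; // header row or wrong column count
            }
            readings.add(new Reading(cols[0].trim(),
                    Double.parseDouble(cols[1].trim()),
                    Double.parseDouble(cols[2].trim())));
        }
        return readings;
    }

    /** Roughly what a GROUP BY buildingId with AVG(actualTemp - targetTemp) would return. */
    static Map<String, Double> averageDiffByBuilding(List<Reading> readings) {
        Map<String, double[]> sums = new TreeMap<>(); // value holds {sum, count}
        for (Reading r : readings) {
            double[] acc = sums.computeIfAbsent(r.buildingId, k -> new double[2]);
            acc[0] += r.tempDiff();
            acc[1] += 1;
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "buildingId,targetTemp,actualTemp",
                "B1,70,72",
                "B1,70,74",
                "B2,68,65");
        System.out.println(averageDiffByBuilding(parse(csv)));
    }
}
```

On a real cluster the same logic would live in a Hive query over the files in blob storage, and Excel would read the resulting table through the ODBC driver.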
Scenario: Analyzing Service Exhaust
1) Website log data is loaded into blob storage
2) Hive queries are used to expose the data in blob storage as Hive tables. Additional tables are created by enriching this data
3) Excel connects to HDInsight using the Hive ODBC driver, and generates reports from the data
Scenarios
  Data Warehousing analytics at scale
    Data cleansing
    First level aggregations
  Advanced Analytics
    Machine Learning
    Graph processing
Programming Models
  Pig: Data scripting language
  Hive: SQL-like set-oriented language
  Pegasus, Giraph: Graph processing
  Cascading: Dataflow API in Java
Tools for Query
  Cluster-based Query Console
  HDInsight Tools for Visual Studio
  REST-based job submission & management
  Azure Data Factory for scheduling and orchestration
Sentiment
Clickstream
Machine/Sensor
Server Logs
Geo-location
Common Scenarios
Monitor real-time data to…

Industry        Prevent                                    Optimize
Finance         Securities fraud, compliance violations    Order routing, pricing
Telco           Security breaches, network outages         Bandwidth allocation, customer service
Retail          ---                                        Offers, pricing
Manufacturing   Machine failures                           Supply chain
Transportation  Driver & fleet issues                      Routes, pricing
Web             Application failures, operational issues   Site content
Core Components of Apache Storm
  Tuples: core unit of data; an immutable set of key/value pairs
  Spouts: source of streams; a spout wraps a streaming data source and emits tuples
  Bolts: core functions of a streaming computation; a bolt receives tuples, processes them, and optionally emits additional tuples
  Topology: arrangement of spouts and bolts; the unit of deployment and management
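How the four Storm concepts fit together can be shown with a toy model in plain Java. None of the types below are Storm classes: MiniTopology, Spout, Bolt, and the word-splitting example are hypothetical names invented for this sketch. It also pushes a finite list through the bolts one stage at a time, whereas a real topology runs continuously and in parallel across a cluster.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model of spouts, bolts, tuples and a topology. These are not Storm's own classes. */
final class MiniTopology {

    /** Spout: wraps a data source and emits tuples (here, immutable key/value maps). */
    interface Spout {
        Iterator<Map<String, Object>> open();
    }

    /** Bolt: receives one tuple and optionally emits zero or more new tuples. */
    interface Bolt {
        List<Map<String, Object>> execute(Map<String, Object> tuple);
    }

    private final Spout spout;
    private final List<Bolt> bolts = new ArrayList<>();

    MiniTopology(Spout spout) {
        this.spout = spout;
    }

    /** Appends a bolt to the chain; the arrangement of spout and bolts is the topology. */
    MiniTopology then(Bolt bolt) {
        bolts.add(bolt);
        return this;
    }

    /** Pushes every tuple from the spout through the bolts in order. */
    List<Map<String, Object>> run() {
        List<Map<String, Object>> current = new ArrayList<>();
        spout.open().forEachRemaining(current::add);
        for (Bolt bolt : bolts) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> tuple : current) {
                next.addAll(bolt.execute(tuple));
            }
            current = next;
        }
        return current;
    }

    /** Two-bolt example: split sentences into words, then drop words of three letters or fewer. */
    static MiniTopology example(List<String> sentences) {
        Spout sentenceSpout = () -> sentences.stream()
                .map(s -> Map.<String, Object>of("sentence", s))
                .iterator();
        Bolt splitter = tuple -> {
            List<Map<String, Object>> out = new ArrayList<>();
            for (String word : ((String) tuple.get("sentence")).split("\\s+")) {
                out.add(Map.of("word", word));
            }
            return out;
        };
        Bolt longWordsOnly = tuple ->
                ((String) tuple.get("word")).length() > 3 ? List.of(tuple) : List.of();
        return new MiniTopology(sentenceSpout).then(splitter).then(longWordsOnly);
    }

    public static void main(String[] args) {
        System.out.println(example(List.of("storm processes streams", "a bolt emits")).run());
    }
}
```

The splitter shows a bolt emitting several tuples per input, and the filter shows a bolt emitting none, which covers the "optionally emit" part of the definition.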
Trident: Fluent, Stream-Oriented API
TridentTopology topology = new TridentTopology();
FixedBatchSpout spout = new FixedBatchSpout(…);
Stream stream = topology.newStream("words", spout);
stream.each(…, new MyFunction())
      .groupBy(…)
      .each(…, new MyFilter())
      .persistentAggregate(…);
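For readers who have not seen a fluent stream API, the shape of a Trident chain (apply a function, filter, group, aggregate) resembles a java.util.stream pipeline. The sketch below is only an analogy written against the JDK, not Trident code; FluentWordCount and its word-count logic are assumptions chosen for illustration. One important difference: Trident's persistentAggregate keeps updating state as batches arrive, while this pipeline computes a single result over a finite list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** java.util.stream analogy for a function / filter / group / aggregate chain. Not Trident. */
final class FluentWordCount {

    /** Counts case-insensitive word occurrences across all sentences. */
    static Map<String, Long> count(List<String> sentences) {
        return sentences.stream()
                // comparable to applying a splitting function to each tuple
                .flatMap(sentence -> Arrays.stream(sentence.split("\\s+")))
                // comparable to applying a filter; drops the empty token a leading space produces
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                // comparable to grouping by a field and aggregating with a count
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("the cat the dog", "The end")));
    }
}
```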
What is HBase
  Distributed, non-relational database
  Columnar, schema-free data model
  NoSQL on top of Hadoop
Large scale
  Linear scalability
  Billions of rows X millions of columns
  Many deployments with 1000+ nodes, PBs of data
Low latency
  Real-time random read/writes
Open Source
  Modeled after Google’s BigTable
  Started in 2006
Integration features
  Integration with Hadoop MapReduce, Hive, Tez
  Bulk import of large amount of data
  Replication across clusters
Client APIs
  Java, REST, python, node.js, php, .NET
Data Model
  Scale-out architecture
  Automatic sharding of tables
  Automatic failover
  Strong consistency for reads and writes
Performance
  Column Families
  In-memory caching on read
  High throughput streaming writes
APIs
  Get/Put
  Scan
  Coprocessors
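The Get/Put/Scan operations and the column-family model are easier to picture with a toy, single-process table. The sketch below is not the HBase client API: MiniTable, its method signatures, and the sensor row keys are hypothetical. It uses String keys in a sorted map where HBase stores byte arrays sorted lexicographically, and it has none of the sharding, failover, or caching listed above.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy table: sorted row keys -> "family:qualifier" -> value. Not the HBase client API. */
final class MiniTable {

    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Put: writes one cell; rows and columns are created on demand (schema-free). */
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
                .put(family + ":" + qualifier, value);
    }

    /** Get: random read of one cell by row key; returns null if the cell is absent. */
    String get(String rowKey, String family, String qualifier) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }

    /** Scan: rows with startRow <= key < stopRow, returned in sorted key order. */
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        MiniTable table = new MiniTable();
        // Hypothetical time-series keys: sensor id plus a zero-padded timestamp.
        table.put("sensor1#0001", "reading", "temp", "70");
        table.put("sensor1#0002", "reading", "temp", "71");
        table.put("sensor1#0003", "reading", "temp", "73");
        table.put("sensor2#0001", "reading", "temp", "65");

        System.out.println(table.get("sensor1#0002", "reading", "temp"));
        System.out.println(table.scan("sensor1#0002", "sensor2#").keySet());
    }
}
```

Because rows are kept sorted by key, a key that starts with the sensor id and ends with a timestamp turns "all readings for one sensor in a time range" into one contiguous scan, which is the idea behind the time-series use case.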
Use case #1: key value store
  Key value store
  Message systems
  Content management systems
Examples
  Facebook Messages
  Twitter-like messages
  Webtable – web crawler/indexer
Use case #2: sensor data
  Sensor data
  Social analytics
  Time series databases
  Interactive dashboards with trends, counters, etc.
  Audit log systems
Examples
  Bloomberg trader terminal
  OpenTSDB
Use case #4: HBase as a platform
  Running on top of HBase using it as a datastore: Phoenix, OpenTSDB, Kiji, Tephra, Titan
  Integrated with HBase: Hive, Pig, Storm, Spark, Flume, Solr, Ganglia
Conclusion
  Hadoop enables big data processing across query, NoSQL, and streaming
  HDInsight makes it easy to run a Hadoop cluster
  No single tool does everything; you have a set of tools at your disposal, so pick the ones that work best
Resources – Query
  Map/Reduce Tutorial
  Hive (source, docs)
  Pig (source, docs)
  Scalding (source, docs)
  Hive on Tez
  Mahout on HDInsight
Resources – Storm
  Apache Storm (source)
  HDInsight Storm documentation
  HBase + Storm sample
  Using C# with Storm via SCP.NET
Resources – HBase
  HBase Book
    HBase: The Definitive Guide
    Online HBase Book
  C# HBase SDK
    https://github.com/hdinsight/hbase-sdk-for-net
  Tweet Sentiment App
    Tutorial
    Source code on github
  Documentation
    Get started using HBase with Hadoop in HDInsight
    HDInsight HBase overview
DBI Track resources
  27 Hands on Labs + 8 Instructor Led Labs in Hall 7
  Free SQL Server 2014 Technical Overview e-book
    microsoft.com/sqlserver and Amazon Kindle Store
  Free online training at Microsoft Virtual Academy
    microsoftvirtualacademy.com
  Try new Azure data services previews!
    Azure Machine Learning, DocumentDB, and Stream Analytics
Resources
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
Developer Network
http://developer.microsoft.com
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.