Introduction to Microsoft’s Hadoop solution (HDInsight)

Introduction to HDInsight James Serra Big Data Evangelist Microsoft [email protected]

Transcript of Introduction to Microsoft’s Hadoop solution (HDInsight)

Page 1: Introduction to Microsoft’s Hadoop solution (HDInsight)

Introduction to HDInsight

James Serra
Big Data Evangelist
Microsoft
[email protected]

Page 2: Introduction to Microsoft’s Hadoop solution (HDInsight)

About Me
• Microsoft, Big Data Evangelist
• In IT for 30 years, worked on many BI and DW projects
• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
• Been perm employee, contractor, consultant, business owner
• Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
• Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
• Blog at JamesSerra.com
• Former SQL Server MVP
• Author of book “Reporting with Microsoft SQL Server 2012”

Page 3: Introduction to Microsoft’s Hadoop solution (HDInsight)

Agenda
• What Is Hadoop?
• Why Deploy To the Cloud?
• Microsoft’s Solution
• How Do I Get Started?

Page 4: Introduction to Microsoft’s Hadoop solution (HDInsight)

What if you could handle big data?

[Figure: data volume (Megabytes, Gigabytes, Terabytes, Petabytes) plotted against data complexity (variety and velocity)]
• ERP/CRM: Payables, Payroll, Inventory, Contacts, Deal Tracking, Sales Pipeline
• Web 2.0: Web Logs, Digital Marketing, Search Marketing, Recommendations, Advertising, Mobile, Collaboration, eCommerce
• Big Data: Log files, Spatial & GPS coordinates, Data market feeds, eGov feeds, Weather, Text/image, Click stream, Wikis/blogs, Sensors/RFID/devices, Social sentiment, Audio/video

Page 5: Introduction to Microsoft’s Hadoop solution (HDInsight)

Introducing Apache Hadoop
• Apache Open Source Project
• Highly scalable distributed file system (HDFS)
• Distributed processing on data nodes

Page 6: Introduction to Microsoft’s Hadoop solution (HDInsight)

Data volume
Hadoop stores files in a distributed file system
• Storage and computation are distributed across many servers
• Files can be spread out over multiple nodes

Hadoop can store very large amounts of data
• Combined storage resource can grow with demand from a few nodes to thousands of nodes
• Scales out linearly
• Very large files supported, including those larger than the capacity of a single node
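To make the idea of one file spread over many nodes concrete, here is a minimal sketch in Python. It is not HDFS code: the 128 MB block size and 3 replicas are assumed values, the node names are made up, and the round-robin placement is a simplification chosen only for illustration.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MB)
REPLICATION = 3                 # assumed number of replicas per block


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into blocks and assign each block to `replication` distinct nodes.

    Round-robin placement is a simplification for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
layout = place_blocks(file_size=1024 * 1024 * 1024, nodes=nodes)  # a 1 GB file
for block, replicas in layout.items():
    print(block, replicas)
```

Because each block is placed independently, a file can be larger than any single node's disk, and adding nodes adds capacity.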

Page 7: Introduction to Microsoft’s Hadoop solution (HDInsight)

Data variety
Hadoop stores files (non-relational store)
• Files could have a variety of semi-structured or unstructured data
• Previously, these files may not have been seen as providing value or insights
• Today, new business questions and insights are being uncovered through data science

• Sentiment: Understand how your customers feel about your brand and products, right now
• Clickstream: Capture and analyze website visitors’ data trails and optimize your website
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines
• Geographic: Analyze location-based data to manage operations where they occur
• Server logs: Research logs to diagnose process failures and prevent security breaches
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents

Page 8: Introduction to Microsoft’s Hadoop solution (HDInsight)

Data velocity
• Hadoop can stream live data and process it in real time
• Hadoop can act as scalable event stream ingestion
• Hadoop can do near real-time in-stream processing

[Figure: incoming data input (applications, devices, HTTP) flows to an event broker, then to stream processing, then to outgoing results]
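As a rough illustration of in-stream processing, the sketch below counts events per key in fixed, non-overlapping time windows. It is plain Python, not a Storm or Spark Streaming API; the function name and the sample events are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns {window_start: Counter({key: count})}.
    """
    windows = {}
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


# Hypothetical sensor events: (seconds since start, sensor id)
events = [(0, "sensor-a"), (3, "sensor-b"), (9, "sensor-a"), (12, "sensor-a"), (19, "sensor-b")]
counts = tumbling_window_counts(events, window_seconds=10)
for start in sorted(counts):
    print(start, dict(counts[start]))
```

A real stream processor would emit each window as soon as it closes instead of holding every window in memory.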

Page 9: Introduction to Microsoft’s Hadoop solution (HDInsight)

Hadoop is a platform with a portfolio of projects

[Figure: Hadoop project stack running on nodes 1 to N]
• Governance and integration: data workflow, lifecycle and governance (Falcon, Atlas); Sqoop, Flume, NFS, WebHDFS
• Data access: Batch (MapReduce), Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Others (Spark, in-memory, ISV engines), all on YARN: data operating system
• Data management: HDFS (Hadoop Distributed File System)
• Security: authentication, authorization, accounting, data protection (Ranger, Knox, Atlas, HDFS Encryption)
• Operations: provision, manage, and monitor (Ambari, Zookeeper, Cloudbreak); scheduling (Oozie)

• Governed by the Apache Software Foundation (ASF)
• Comprises core services of MapReduce, HDFS, and YARN
• In addition to the core, includes data services, which allow you to manipulate and move data (Hive, HBase, Pig, Flume, Sqoop)
• Also includes operational services, which help manage the cluster (Ambari, Falcon, and Oozie)
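The MapReduce model named among the core services can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases using a word count; it does not use the Hadoop API, and the function names are chosen for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big insight", "data at scale"]
mapped = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(result)
```

On a cluster, each map call would run on the node holding that part of the input, and the shuffle would move pairs across the network to the reducers.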

Page 10: Introduction to Microsoft’s Hadoop solution (HDInsight)

A Hadoop distribution is a package of projects
• Tested for consistency across entire package

Project groups: Data management, Data access, Governance and integration, Operations, Security
Projects: Knox, Tez, Pig, Hive, Phoenix, Accumulo, Storm, Mahout, Solr, Falcon, Sqoop, Flume, Ambari, Oozie, Zookeeper, HBase, Hadoop and YARN

Component versions by HDP release:
HDP 2.4 (May 2016): 0.7.0 0.15.0 0.2.1 1.1.2 4.4.0 1.7.0 0.10.0 0.9.0+ 5.2.1 0.6.1 1.4.6 1.5.2 2.2.1 4.2.0 3.4.6 0.6.0 2.7.1
HDP 2.3 (July 2015): 0.7.0 0.15.0 0.2.1 1.1.1 4.4.0 1.7.0 0.10.0 0.9.0+ 5.2.1 0.6.1 1.4.6 1.5.2 2.1.0 4.2.0 3.4.6 0.6.0 2.7.1
HDP 2.2 (Dec 2014): .5.2 0.14.0 0.2.1 .98.4 4.2.0 1.6.1 0.9.3 0.9.0+ 4.10.2 0.6.0 1.4.5 1.5.2 2.0.0 4.1.0 3.4.6 0.5.0 2.6.0
HDP 2.1 (April 2014): 0.4.0 0.12.1 0.13.0 0.98.0 4.0.0 1.5.1 0.9.1 0.9.0 4.7.2 0.5.0 1.4.4 1.4.0 1.5.1 4.0.0 3.4.5 .0.4.0 2.4.0
HDP 2.0 (October 2013): 2.2.0 0.12.0 0.12.0 0.96.1 0.8.0 1.4.4 1.3.0 1.4.4 3.3.2 3.4.5 .0.4.0
HDP 1.3 (May 2013): 1.1.2 011.0 0.11.0 0.94.6 0.7.0 1.4.3 1.3.1 1.2.5 3.3.2 3.4.5 .0.4.0

Page 11: Introduction to Microsoft’s Hadoop solution (HDInsight)

Business applications of Hadoop

• Retail: 360° view of the customer; Analyze brand sentiment; Localized, personalized promotions; Website optimization; Optimal store layout
• Financial services: New account risk screens; Fraud prevention; Trading risk; Maximize deposit spread; Insurance underwriting; Accelerate loan processing
• Telecom: Call detail records (CDRs); Infrastructure investment; Next product to buy (NPTB); Real-time bandwidth allocation; New product development
• Utilities, oil, and gas: Smart meter stream analysis; Slow oil well decline curves; Optimize lease bidding; Compliance reporting; Proactive equipment repair; Seismic image processing
• Public sector: Analyze public sentiment; Protect critical networks; Prevent fraud and waste; Crowd source reporting for repairs to infrastructure; Fulfill open records requests
• Manufacturing: Supplier consolidation; Supply chain and logistics; Assembly line quality assurance; Proactive maintenance; Crowd source quality assurance
• Healthcare: Genomic data for medical trials; Monitor patient vitals; Reduce re-admittance rates; Store medical research data; Recruit cohorts for pharmaceutical trials

Page 12: Introduction to Microsoft’s Hadoop solution (HDInsight)

New analytic applications from new data

Columns: Industry, Use case, Sentiment and web, Clickstream and behavior, Machine and sensor, Geographic, Server logs, Structured and unstructured

Financial services
• New account risk screens ✔ ✔
• Trading risk ✔
• Insurance underwriting ✔ ✔ ✔

Telecom
• Call detail records (CDR) ✔ ✔
• Infrastructure investment ✔ ✔
• Real-time bandwidth allocation ✔ ✔ ✔

Retail
• 360° view of the customer ✔ ✔ ✔
• Localized, personalized promotions ✔
• Website optimization ✔

Manufacturing
• Supply chain and logistics ✔
• Assembly line quality assurance ✔
• Crowd-sourced quality assurance ✔

Healthcare
• Use genomic data in medical trials ✔ ✔ ✔
• Monitor patient vitals in real-time

Pharmaceuticals
• Recruit and retain patients for drug trials ✔ ✔
• Improve prescription adherence ✔ ✔ ✔ ✔

Oil and gas
• Unify exploration and production data ✔ ✔ ✔ ✔
• Monitor rig safety in real-time ✔ ✔ ✔

Government
• ETL offload/federal budgetary pressures ✔ ✔
• Sentiment analysis for government programs ✔

Page 13: Introduction to Microsoft’s Hadoop solution (HDInsight)

Main differences vs RDBMS/NoSQL
Pros
• Not a type of database, but rather an open-source software ecosystem that allows for massively parallel computing
• No inherent structure (no conversion to relational or JSON needed)
• Good for batch processing, large files, volume writes, parallel scans, sequential access
• Great for large, distributed data processing tasks where time isn’t a constraint (e.g., end-of-day reports, scanning months of historical data)
• Tradeoff: In order to make deep connections between many data points, the technology sacrifices speed
• Some NoSQL databases such as HBase are built on top of HDFS

Page 14: Introduction to Microsoft’s Hadoop solution (HDInsight)

Main differences vs RDBMS/NoSQL
Cons
• File system, not a database
• Not good for millions of users, random access, fast individual record lookups or updates (OLTP)
• Not so great for real-time analytics
• Lacks: indexing, metadata layer, query optimizer, memory management
• Same cons as non-relational: no ACID support, data integrity, limited indexing, weak SQL, etc.
• Security limitations

Page 15: Introduction to Microsoft’s Hadoop solution (HDInsight)

Agenda
• What Is Hadoop?
• Why Deploy To the Cloud?
• Microsoft’s Solution
• How Do I Get Started?

Page 16: Introduction to Microsoft’s Hadoop solution (HDInsight)

Challenges with implementing Hadoop
• Up-front HW costs
• Capacity planning
• Hadoop expertise

Page 17: Introduction to Microsoft’s Hadoop solution (HDInsight)

Why Cloud + Big Data?
• Speed
• Scale
• Economics
• Always Up, Always On
• Open and flexible
• Time to value
• Data of all Volume, Variety, Velocity
• Massive Compute and Storage
• Deployment expertise

Page 18: Introduction to Microsoft’s Hadoop solution (HDInsight)

Why Hadoop in the Cloud?
• No HW costs ($0)
• Unlimited scale
• Pay what you need
• Deployed in minutes

Page 19: Introduction to Microsoft’s Hadoop solution (HDInsight)

Scenarios For Deploying Hadoop As Hybrid

On-premises Hadoop (software, appliances) paired with the cloud for:
• Develop/POC
• Bursting
• Backup/archive

Page 20: Introduction to Microsoft’s Hadoop solution (HDInsight)

Agenda
• What Is Hadoop?
• Why Deploy To the Cloud?
• Microsoft’s Solution
• How Do I Get Started?

Page 21: Introduction to Microsoft’s Hadoop solution (HDInsight)

Introducing Azure HDInsight

Page 22: Introduction to Microsoft’s Hadoop solution (HDInsight)

Microsoft contributions to Hadoop
• Hadoop 2.2 and 2.4
• Hadoop on Windows
• 80% data compression with ORC
• Hive 100x query speed up
• HDFS in Cloud (Azure)
• REEF for Machine Learning
• 30,000+ code line contributions
• 10,000+ engineering hours
• Committers to Hadoop

Page 23: Introduction to Microsoft’s Hadoop solution (HDInsight)

Microsoft + Hortonworks

Promoting Open Hadoop

• Engineering alignment
• Corporate alignment
• Field alignment

Page 24: Introduction to Microsoft’s Hadoop solution (HDInsight)

HDInsight Built for Windows or Linux
Customer Choice
• Managed & supported by Microsoft
• Familiarity of Windows
• Re-use common tools, documentation, samples from Hadoop/Linux ecosystem
• Add Hadoop projects that were authored on Linux to HDInsight
• Easier transition from on-premises to cloud

Page 25: Introduction to Microsoft’s Hadoop solution (HDInsight)

HDInsight Supports Hive
SQL-like queries on Hadoop data in HDInsight
• HDInsight provides easy-to-use graphical query interface for Hive
• HiveQL is a SQL-like language (subset of SQL)
• Hive structures include well-understood database concepts such as tables, rows, columns, partitions
• Compiled into MapReduce jobs that are executed on Hadoop

Dramatic performance gains with Stinger/Tez
• Stinger is a Microsoft, Hortonworks and OSS driven initiative to bring interactive queries with Hive
• Brings query execution engine technology from Microsoft SQL Server to Hive
• Performance gains up to 100x

[Figure: Microsoft contribution to Apache code (Hadoop 2.0). Sample query run time: Hive 10: 1400s; HDP 1.3 / Hive 11: 44.3s (32x speedup); HDP 2.0: 35.1s (40x speedup); HDP 2.1: 15s (100x speedup)]

Page 26: Introduction to Microsoft’s Hadoop solution (HDInsight)

HDInsight Supports HBase

[Figure: cluster layout with Name Node, Job Tracker, HMaster and Coordination on the head side, and four worker nodes each running Data Node, Task Tracker and Region Server]

NoSQL database on data in HDInsight
• Columnar, NoSQL database
• Runs on top of the Hadoop Distributed File System (HDFS)
• Provides flexibility in that new columns can be added to column families at any time
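The point about adding columns to a column family at any time can be pictured with a toy in-memory model. This is not the HBase client API; the class, method names, and sample data are hypothetical and exist only to show the row key, column family, and qualifier layering.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy in-memory table: row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed when the table is created
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        """Store a value; any qualifier is allowed inside a known family."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family):
        """Return a copy of all columns stored for one row in one family."""
        return dict(self.rows[row_key][family])


table = ColumnFamilyTable(["profile", "activity"])
table.put("user1", "profile", "name", "Ada")
table.put("user2", "profile", "name", "Lin")
table.put("user2", "profile", "twitter", "@lin")  # new column, no schema change
print(table.get("user1", "profile"))
print(table.get("user2", "profile"))
```

Rows in the same family can carry different columns, which is what lets the schema evolve without a table rebuild.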

Page 27: Introduction to Microsoft’s Hadoop solution (HDInsight)

HDInsight Supports Mahout
Machine learning library
• A library of machine learning algorithms to execute on data in HDFS
• Algorithms are not dependent on size of data and can scale with large datasets
• Library includes: Collaborative Filtering, Classification, Clustering, Dimensionality Reduction, Topic Models

Page 28: Introduction to Microsoft’s Hadoop solution (HDInsight)

HDInsight Supports Storm
Stream analytics for near-real-time processing
• Consumes millions of real-time events from a scalable event broker (e.g., Apache Kafka, Azure Event Hubs)
• Performs time-sensitive computation
• Output to persistent stores, dashboards or devices
• Customizable with Java + .NET
• Deeply integrated with Visual Studio

[Figure: Apache Storm on HDInsight in a streaming pipeline]
• Event producers: applications, devices, sensors, web and social; cloud gateways (web APIs), field gateways
• Collection / event queuing system: Event Hubs, Kafka / RabbitMQ / ActiveMQ
• Transformation (stream processing): Apache Storm on HDInsight, Azure Stream Analytics
• Long-term storage (via storage adapters): HDFS, Azure DBs, Azure storage, HBase
• Presentation and action: search and query, data analytics (Excel), web/thick client dashboards, live dashboards, devices to take action

Page 29: Introduction to Microsoft’s Hadoop solution (HDInsight)

Spark for Azure HDInsight
In-memory processing on multiple workloads
• Single execution model for multiple tasks
• Up to 100x faster processing
• Developer friendly (Java, Python, Scala)
• BI tool of choice (Power BI, Tableau, Qlik, SAP)
• Notebook experience (Jupyter/iPython, Zeppelin)

[Figure: Azure HDInsight workloads: Script (Pig), SQL (Hive), NoSQL (HBase), Streaming (Storm), Batch (MapReduce), In Memory (Spark). Spark components on the core engine: Spark SQL, Spark Streaming, Machine Learning, Graph]

Page 30: Introduction to Microsoft’s Hadoop solution (HDInsight)

HDInsight Allows You To Add Hadoop Projects
Add Hadoop Projects to HDInsight
• Modify HDInsight clusters with custom script
• Add Apache Hadoop projects to HDInsight
• Documented for Spark, R, Giraph, Solr

Page 31: Introduction to Microsoft’s Hadoop solution (HDInsight)

Easy for Developers
Deep Visual Studio Integration
• Debug Hive jobs through Yarn logs or troubleshoot Storm topologies
• Visualize Hadoop clusters, tables, and storage
• Submit Hive queries, Storm topologies (C# or Java spouts/bolts)
• IntelliSense

IntelliJ Integration
• Integration with Spark
• Remote debugging
• Native authoring support for Scala and Java

Page 32: Introduction to Microsoft’s Hadoop solution (HDInsight)

Easy for Data Scientists
Out-of-the-box notebook integration
• Most popular OSS notebook, Jupyter, out-of-the-box
• Worked with Jupyter community to enhance kernel to allow Spark execution through REST endpoint

Designed for Data Scientists
• Combine code, statistical equations and visualizations
• Tell a story with the data

Page 33: Introduction to Microsoft’s Hadoop solution (HDInsight)

Easy for Business Analysts

Integration with BI tools
• Power BI, Tableau, SAP Lumira and Qlik have integration with Spark
• Power BI offers streaming connector with Spark Streaming
• Do interactive BI with big data

Page 34: Introduction to Microsoft’s Hadoop solution (HDInsight)

R Server for HDInsight
Only managed, cloud solution for doing R
• Familiarity of R (most popular language for data scientists)
• Scalability of Hadoop and Spark
• Up to 7x faster using Spark engine
• Train and run ML models on datasets of any size
• Cloud managed solution (easy setup, elastic, SLA)

Page 35: Introduction to Microsoft’s Hadoop solution (HDInsight)

Introducing Azure HDInsight

Page 36: Introduction to Microsoft’s Hadoop solution (HDInsight)

Hyper scale Infrastructure is the enabler
32 Regions Worldwide, 24 Generally Available…
• 100+ datacenters
• Top 3 networks in the world
• 2.5x AWS, 7x Google DC Regions
• G Series: Largest VM in World, 32 cores, 448GB RAM, SSD…

[Figure: world map of Azure regions; legend: Operational, Announced/Not Operational]
• Central US: Iowa
• East US: Virginia
• East US 2: Virginia
• US Gov Virginia
• North Central US: Illinois
• US Gov Iowa
• South Central US: Texas
• West US: California
• US DoD East: TBD
• US DoD West: TBD
• Canada East: Quebec City
• Canada Central: Toronto
• Brazil South: Sao Paulo State
• North Europe: Ireland
• West Europe: Netherlands
• United Kingdom Regions
• Germany North East **: Magdeburg
• Germany Central **: Frankfurt
• China North *: Beijing
• China South *: Shanghai
• Japan East: Tokyo, Saitama
• Japan West: Osaka
• Korea (2): Seoul
• East Asia: Hong Kong
• SE Asia: Singapore
• India South: Chennai
• India Central: Pune
• India West: Mumbai
• Australia East: New South Wales
• Australia South East: Victoria

* Operated by 21Vianet ** Data Stewardship by Deutsche Telekom

Page 37: Introduction to Microsoft’s Hadoop solution (HDInsight)

Why Microsoft Azure?

[Figure: Azure services alongside on-premises Hadoop (software, appliances): Azure Storage, HDInsight, Data Factory, ML, Stream Analytics, Database, DocumentDB, Search, Event Hubs]

Azure Facts
• >4 trillion objects in Azure
• 300,000-1M+ requests per second
• Double compute and storage every 6 months

Page 38: Introduction to Microsoft’s Hadoop solution (HDInsight)

Azure Blob Storage

Page 39: Introduction to Microsoft’s Hadoop solution (HDInsight)

Azure Data Lake Store

Page 40: Introduction to Microsoft’s Hadoop solution (HDInsight)

No hardware challenges
HDInsight in the Cloud bypasses hardware costs (no HW costs: $0)
• Hardware acquisition
• Hardware maintenance
• Performance tuning

HDInsight in the Cloud bypasses capacity planning (unlimited scale)
• Spin up any number of Hadoop nodes on-demand
• Go from tens of nodes to thousands of nodes

Page 41: Introduction to Microsoft’s Hadoop solution (HDInsight)

Deployed in minutes
HDInsight in the Cloud bypasses deployment expertise
• Hadoop is non-trivial to install and get up and running on multiple nodes
• Education gap in IT community regarding Hadoop

HDInsight is deployed in minutes
• Spin up any number of Hadoop nodes on-demand
• Up and running in a few clicks (and within minutes)

Page 42: Introduction to Microsoft’s Hadoop solution (HDInsight)

Mission Critical Hadoop
Mission Critical, Enterprise Ready
Managed Hadoop, Backed By An SLA
• Three nines of availability
• 99.9% uptime

HDInsight Auto Replicates Data
• Automatic geo-replication of data
• Data only replicates within the same geopolitical boundary (i.e., country or region)
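As a quick worked check of what three nines means in practice, the sketch below converts the 99.9% uptime figure into allowed downtime. The 30-day month and 365-day year are assumptions made for the arithmetic, and the function name is hypothetical; the actual SLA terms define the measurement period.

```python
def allowed_downtime_minutes(availability, period_hours):
    """Minutes of downtime permitted in a period at a given availability fraction."""
    return (1 - availability) * period_hours * 60


monthly = allowed_downtime_minutes(0.999, 30 * 24)   # assumed 30-day month
yearly = allowed_downtime_minutes(0.999, 365 * 24)   # assumed 365-day year
print(f"{monthly:.1f} minutes per 30-day month")
print(f"{yearly / 60:.2f} hours per 365-day year")
```

Under those assumptions, 99.9% works out to roughly 43 minutes a month or a little under 9 hours a year.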

Page 43: Introduction to Microsoft’s Hadoop solution (HDInsight)

Maintenance done for you
Minimal IT resources for upgrades/patching
• OS patching and security updates done automatically (O/S upgrades, O/S patching)

Minimal IT resources to update Hadoop versions
• Hadoop versions are rapidly releasing throughout the year
• Always be on the latest version of Hadoop with no effort

HDInsight adds latest version of Hadoop for you
• Oct 2013: HDInsight on Hadoop 1.1.2
• April 2014: HDInsight on Hadoop 2.2
• June 2014: HDInsight on Hadoop 2.4
• Feb 2015: HDInsight on Hadoop 2.6
• March 2016: HDInsight on Hadoop 2.7.1

Page 44: Introduction to Microsoft’s Hadoop solution (HDInsight)

Low Cost
HDInsight is billed by usage
• Billed for usage
• Clusters can be deleted when no longer used

No additional price for support
• Azure Support includes Hadoop support
• What usually costs thousands of dollars per node is included

63% Lower Total Cost of Ownership*
• 418% 5 year ROI*
• 3.9 month payback period*
• 63% TCO savings versus on-premises Hadoop*

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

Page 45: Introduction to Microsoft’s Hadoop solution (HDInsight)

Introducing Azure HDInsight

Page 46: Introduction to Microsoft’s Hadoop solution (HDInsight)

Scalable, manageable, trusted

Bringing Hadoop to a billion people
• 1 Billion Microsoft Office users
• Office 365 is our fastest-growing commercial product ever
• Excel as the BI tool for everyone
• Power BI for collaboration & new experiences
• Connect to HDInsight, Analyze, Visualize
• Share, Ask, Access

Page 47: Introduction to Microsoft’s Hadoop solution (HDInsight)

Making advanced analytics accessible to Hadoop
Microsoft Azure Machine Learning

[Figure: Azure Machine Learning workflow]
• Data: historical data from HDInsight, SQL DB, Blobs & tables, SQL Server VM (cloud, desktop)
• ML Studio (web): workspace; test model types, run & refine, results; easily make changes
• Microsoft Azure Portal: publish API in minutes
• ML API Service: consumed by devices, applications, dashboards

Page 48: Introduction to Microsoft’s Hadoop solution (HDInsight)

HDInsight vs HDP on Azure VM

HDInsight | HDP on Azure VM
PaaS (setup, scale, manage, patch, etc.) | IaaS
Managed by Microsoft | Managed by customer
Storage separate (Blob or ADLS) | Storage in VM (local disk), but can also have storage in Azure Blob or ADLS
Delete VM keeps data | Delete VM deletes data (unless external)
Up to 30 days behind latest HDP version | Latest HDP version
Limited Hadoop projects | Unlimited Hadoop projects
Microsoft supports VM and Hadoop | Microsoft: VM, HDP: Hadoop
No on-prem version | On-prem version

Page 49: Introduction to Microsoft’s Hadoop solution (HDInsight)

Azure Data Lake Analytics
Azure Services for big data analytics
• Distributed, parallel analytics framework
• U-SQL (based on C# and SQL)
• Dial for scale
• Hides infrastructure complexity
• Visual Studio integration
• Instant scale on demand
• Reduced learning curve

[Figure: data sources (clickstream, sensors, video, social, web, devices, relational, applications) feed the Store (HDFS); on top of YARN run the Analytics Service (U-SQL), HDInsight, and Partners]

Page 50: Introduction to Microsoft’s Hadoop solution (HDInsight)

Agenda
• What Is Hadoop?
• Why Deploy To the Cloud?
• Microsoft’s Solution
• How Do I Get Started?

Page 51: Introduction to Microsoft’s Hadoop solution (HDInsight)

Get Started
• Read documentation: http://azure.microsoft.com/en-us/documentation/services/hdinsight/
• Learning Map: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map/
• Microsoft Virtual Academy: http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data
• Channel 9 Data Exposed Show: http://channel9.msdn.com/Shows/Data-Exposed
• Try 30 day trial: http://azure.microsoft.com/en-us/pricing/free-trial/

Page 52: Introduction to Microsoft’s Hadoop solution (HDInsight)

Azure getting started
• Free Azure account, $200 in credit, https://azure.microsoft.com/en-us/free/
• Startups: BizSpark, $750/month free Azure, BizSpark Plus - $120k/year free Azure, https://www.microsoft.com/bizspark/
• MSDN subscription, Data Platform MVP, $150/month free Azure, https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
• Microsoft Educator Grant Program, faculty - $250/month free Azure for a year, students - $100/month free Azure for 6 months, https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
• Microsoft Azure for Research Grant, http://research.microsoft.com/en-us/projects/azure/default.aspx
• DreamSpark for students, https://www.dreamspark.com/Student/Default.aspx
• DreamSpark for academic institutions: https://www.dreamspark.com/Institution/Subscription.aspx
• Various Microsoft funds so you can learn the technologies or build a client solution

Page 53: Introduction to Microsoft’s Hadoop solution (HDInsight)

Pricing for HDInsight

Columns: Capabilities, Standard, Premium Preview

Big Data Workloads
• Standard Hadoop and Open Source Projects (Core Hadoop & YARN, Hive & HCatalog, Tez, Pig, Sqoop, Oozie, Zookeeper, Phoenix)
• Columnar NoSQL (HBase)
• Stream processing (Storm)
• Interactive processing, real-time stream processing & ML (Spark)
• Big Data statistics, predictive modeling, and machine learning with R Server

Enterprise Readiness
• Administration: manage, monitor & troubleshoot clusters
• Hadoop version upgrades and patching: automatic patching and upgrades
• Encryption of data at rest

Price
• Standard: standard price per node
• Premium Preview: HDInsight Standard price + $0.02/core-hour for each core used in the cluster during preview (75% discount)
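The Premium preview pricing rule can be worked through with a small calculation. Only the $0.02 per core-hour surcharge comes from the slide. The per-node-hour billing unit, the $0.50 node price, and the cluster shape below are placeholder assumptions for the arithmetic, not published prices.

```python
PREMIUM_SURCHARGE_PER_CORE_HOUR = 0.02  # from the slide (preview pricing)


def cluster_cost(nodes, cores_per_node, hours, standard_price_per_node_hour, premium=False):
    """Estimate cluster cost; Premium adds a per-core-hour surcharge to the Standard price."""
    cost = nodes * hours * standard_price_per_node_hour
    if premium:
        cost += nodes * cores_per_node * hours * PREMIUM_SURCHARGE_PER_CORE_HOUR
    return cost


# Hypothetical cluster: 4 nodes, 8 cores each, 10 hours, placeholder $0.50 per node-hour
standard = cluster_cost(4, 8, 10, 0.50)
premium = cluster_cost(4, 8, 10, 0.50, premium=True)
print(f"Standard: ${standard:.2f}")
print(f"Premium:  ${premium:.2f}")
```

Because the surcharge scales with total cores and hours, deleting idle clusters reduces both parts of the bill.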

Page 54: Introduction to Microsoft’s Hadoop solution (HDInsight)

Resources
• What is HDInsight? http://bit.ly/1WpS0at
• Hadoop and Microsoft http://bit.ly/20Cg2hA
• Introduction to Hadoop http://bit.ly/1WpTstq

Page 55: Introduction to Microsoft’s Hadoop solution (HDInsight)

Q & A ?
James Serra, Big Data Evangelist
• Email me at: [email protected]
• Follow me at: @JamesSerra
• Link to me at: www.linkedin.com/in/JamesSerra
• Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)