Hortonworks Presentation at Big Data London
Transcript of Hortonworks Presentation at Big Data London
© Hortonworks Inc. 2013
Hortonworks Enterprise Apache Hadoop
March 5, 2013
Page 1
Hortonworks
• Who is Hortonworks
• Our Approach
• Customer Use Cases
Page 2
Housekeeping Items
• Restrooms on 2nd and 4th Floors
• Hadoop Summit – March 20-21 in Amsterdam – PreConference Training on March 18-19
– Discount Code Amst13Spon20
• Download Sandbox – QR code on postcard on table
Page 3
A Brief History of Apache Hadoop
Page 4
• Focus on INNOVATION – 2005: Yahoo! creates team under E14 to work on Hadoop
• Focus on OPERATIONS – 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters
• STABILITY – 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo
Timeline axis: 2004, 2006, 2008, 2010, 2012, 2013
Timeline milestones: Apache Project Established; Yahoo! begins to Operate at scale; Enterprise Hadoop; Hortonworks Data Platform
Hortonworks Snapshot
Page 5
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop Distribute Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
• Headquarters: Palo Alto, CA
• Employees: 180+ and growing
• Investors: Benchmark, Index, Yahoo
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of Use
Page 6
Apache Community Leadership
Page 7
Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy
Key Roles held by Hortonworkers
• VP & PMC Members
– Arun Murthy (Hadoop), Daniel Dai (Pig), Mahadev Konar (ZooKeeper)
• Release Managers
– Matt Foley (Hadoop 1.x), Arun Murthy (Hadoop 2.x), Ashutosh Chauhan (Hive), Daniel Dai (Pig), Alan Gates (HCatalog), Mahadev Konar (Ambari)
• Committers
– 54 across all Hadoop-related projects
Diagram: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and Other Apache Projects, each cycling through Design & Develop, Test & Patch, Release
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”
- Jeff Kelly: Wikibon
Leadership that Starts at the Core
Page 8
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice nearest contributor
• Deeply integrating w/ecosystem
– Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (ex. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache
Driving Enterprise Hadoop Innovation
Page 9
Chart: Lines Of Code By Company (Source: Apache Software Foundation)
• Horizontal axis: 0% to 100%
• Rows: AMBARI, HBASE, HCATALOG, HIVE, PIG, HADOOP CORE
• Series: Hortonworks, Yahoo!, Cloudera, Other
Committers per project (Hortonworks / Cloudera), in slide order: 19 / 9, 5 / 1, 1 / 0, 5 / 0, 3 / 7, 14 / 0
Hortonworks Process for Enterprise Hadoop
Page 10
Diagram: Upstream Community Projects and Downstream Enterprise Product
• Upstream (Apache Hadoop, Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari, Other Apache Projects): Design & Develop, Test & Patch, Release
• Downstream (Hortonworks Data Platform): Design & Develop, Integrate & Test, Package & Certify, Distribute
• Stable Project Releases flow downstream; Fixed Issues flow upstream
No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
Virtuous cycle: development and issue fixes are done upstream, and stable project releases flow downstream
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring NO LOCK-IN: 100% Open Source
• Patterns of use
Page 11
Enhancing the Core of Apache Hadoop
Deliver high-scale storage & processing with enterprise-ready platform services
Unique Focus Areas:
• Bigger, faster, more flexible
– Continued focus on speed & scale and enabling near-real-time apps
• Tested & certified at scale
– Run ~1300 system tests on large Yahoo clusters for every release
• Enterprise-ready services
– High availability, disaster recovery, snapshots, security, …
Page 12
Hortonworkers are the architects, operators, and builders of core Hadoop
Diagram:
• HADOOP CORE: Distributed Storage & Processing
• PLATFORM SERVICES: Enterprise Readiness
Data Services for Full Data Lifecycle
Page 13
Provide data services to store, process & access data in many ways
Unique Focus Areas:
• Apache HCatalog
– Metadata services for consistent table access to Hadoop data
• Apache Hive
– Explore & process Hadoop data via SQL & ODBC-compliant BI tools
Hortonworks enables Hadoop data to be accessed via existing tools & systems
Diagram:
• DATA SERVICES: Store, Process and Access Data
• HADOOP CORE: Distributed Storage & Processing
• PLATFORM SERVICES: Enterprise Readiness
Operational Services for Ease of Use
Page 14
Include complete operational services for productive operations & management
Unique Focus Area:
• Apache Ambari
– Provision, manage & monitor a cluster
– Complete REST APIs to integrate with existing operational tools
– Job & task visualizer to diagnose issues
Only Hortonworks provides a complete open source Hadoop management tool
OPERATIONAL SERVICES
Manage & Operate at Scale
DATA SERVICES
Store, Process and Access Data
HADOOP CORE Distributed Storage & Processing
PLATFORM SERVICES Enterprise Readiness
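As a rough illustration of what integrating an operational tool with Ambari's REST APIs might look like, the sketch below builds (but does not send) an authenticated GET request for a cluster listing. The slides do not show any endpoint, so the `/api/v1/clusters` path, the port 8080, the use of HTTP Basic authentication, the host name and the credentials are all assumptions made for illustration, and `build_ambari_request` is a hypothetical helper, not part of Ambari.

```python
import base64
import urllib.request


def build_ambari_request(host, path, user, password, port=8080):
    """Build (but do not send) a GET request for an Ambari-style REST API.

    Assumption: the API is served under /api/v1/ and accepts HTTP Basic auth.
    """
    url = f"http://{host}:{port}/api/v1/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical host and credentials, for illustration only.
request = build_ambari_request("ambari.example.com", "clusters", "admin", "admin")
print(request.full_url)
```

An existing monitoring or provisioning tool could pass such a request to `urllib.request.urlopen` and parse the JSON response; the request is only constructed here so the sketch runs without a live cluster.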
Deployable Across a Range of Options
Page 15
Only Hortonworks allows you to deploy seamlessly across any deployment option
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances
Diagram: HORTONWORKS DATA PLATFORM (HDP)
• OPERATIONAL SERVICES: Manage & Operate at Scale
• DATA SERVICES: Store, Process and Access Data
• HADOOP CORE: Distributed Storage & Processing
• PLATFORM SERVICES: Enterprise Readiness
• Deployment options: OS, Cloud, VM, Appliance
HDP: Enterprise Hadoop Distribution
Page 16
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Diagram: HORTONWORKS DATA PLATFORM (HDP)
• OPERATIONAL SERVICES: Manage & Operate at Scale
• DATA SERVICES: Store, Process and Access Data
• HADOOP CORE: Distributed Storage & Processing
• PLATFORM SERVICES: Enterprise Readiness
• Deployment options: OS, Cloud, VM, Appliance
HDP 1.2: Data Services Improvements
Page 17
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Diagram: HORTONWORKS DATA PLATFORM (HDP)
• Layers: OPERATIONAL SERVICES, DATA SERVICES, HADOOP CORE, PLATFORM SERVICES
• Components shown: HCATALOG, HIVE, PIG, HBASE, OOZIE, AMBARI, HDFS, YARN (in 2.0), WEBHDFS, MAP REDUCE, SQOOP, FLUME
• PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Snapshots, Security, etc…)
• Deployment options: OS, Cloud, VM, Appliance
Latest Hortonworks Announcements
Two releases in January 2013
• January 15 – Hortonworks Data Platform 1.2: Hortonworks Brings Enterprise Manageability to 100% Open Source Apache Hadoop Distribution
• January 22 – Hortonworks Sandbox: Hortonworks accelerates Hadoop skills development with an easy-to-use, flexible and extensible platform to learn, evaluate and use Apache Hadoop
Page 18
Latest Hortonworks Announcements
February 2013
• February 20 – New Apache projects: Hortonworks fuels open source by releasing three new projects: Knox, Tez and Stinger
• February 25 – HDP available on Microsoft Windows: to help Hadoop adoption, Hortonworks releases HDP on Microsoft Windows
Page 19
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-in: 100% Open Source
• Patterns of use
Page 20
Existing Data Architecture
Page 21
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
An Emerging Data Architecture
Page 22
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS SYSTEMS; MOBILE DATA
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
Interoperating With Your Tools
Page 23
Diagram layers:
• APPLICATIONS: Microsoft Applications
• DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS SYSTEMS; MOBILE DATA
• DEV & DATA TOOLS
• OPERATIONAL TOOLS: Viewpoint
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of use
Page 24
Hortonworks
• Who is Hortonworks
• Our approach
• Patterns of use
Page 25
Operational Data Refinery
Page 26
Collect data and apply a known algorithm to it in trusted operational process
1. Capture: Capture all data
2. Process: Parse, cleanse, apply structure & transform
3. Exchange: Push to existing data warehouse for use with existing analytic tools
Patterns: Refine, Explore, Enrich
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
• DATA SOURCES:
Traditional Sources (RDBMS, OLTP, OLAP)
New Sources (web logs, email, sensor data, social media)
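The three refinery steps (Capture, Process, Exchange) can be sketched in miniature as a plain Python pipeline. This is only an illustration under stated assumptions: the log format, the field names, the sample records and the choice of CSV as the hand-off format to the warehouse are all hypothetical, and a real deployment would run these stages on Hadoop rather than in a single process.

```python
import csv
import io

# Hypothetical raw web-log records; the third one is deliberately malformed.
RAW_LOG_LINES = [
    "2013-03-05 10:01:22 GET /products/42 200",
    "2013-03-05 10:01:23 get /cart 500 ",
    "corrupted line",
    "2013-03-05 10:01:25 POST /checkout 200",
]

FIELDS = ["date", "time", "method", "path", "status"]


def capture(lines):
    """Step 1 (Capture): keep every raw record, including malformed ones."""
    return list(lines)


def process(raw_records):
    """Step 2 (Process): parse, cleanse, apply structure and transform."""
    rows = []
    for line in raw_records:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # cleanse: drop records that do not parse
        row = dict(zip(FIELDS, parts))
        row["method"] = row["method"].upper()  # transform: normalise case
        row["status"] = int(row["status"])  # apply structure: typed column
        rows.append(row)
    return rows


def exchange(rows):
    """Step 3 (Exchange): emit CSV text that a warehouse loader could ingest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


refined = process(capture(RAW_LOG_LINES))
warehouse_csv = exchange(refined)
print(warehouse_csv)
```

Keeping the capture step lossless and doing all cleansing in the process step mirrors the slide's ordering: raw data is retained in full, and only the structured result is pushed to the existing warehouse.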
Big Data Exploration & Visualization
Page 27
Collect data and perform iterative investigation for value
1. Capture: Capture all data
2. Process: Parse, cleanse, apply structure & transform
3. Exchange: Explore and visualize with analytics tools supporting Hadoop
Patterns: Refine, Explore, Enrich
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
Application Enrichment
Page 28
Collect data, analyze and present salient results for online apps
1. Capture: Capture all data
2. Process: Parse, cleanse, apply structure & transform
3. Exchange: Incorporate data directly into applications
Patterns: Refine, Explore, Enrich
Diagram layers:
• APPLICATIONS: Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); NOSQL; HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
Key 2013 “Enterprise Hadoop” Initiatives
Page 29
Invest In:
– Platform Services – DR, Snapshot, …
– Data Services – In support of Refine, Explore, Enrich
– Operational Services – Manageability, Security, …
Initiatives:
• Tez / “Stinger”: Interactive Query
• “Gateway”: Secure Access
• “Continuum”: Biz Continuity
• Ambari: Manage & Operate
• “Herd”: Data Integration
• HBase: Online Data
Diagram: HORTONWORKS DATA PLATFORM (HDP) layers: OPERATIONAL SERVICES, DATA SERVICES, HADOOP CORE, PLATFORM SERVICES
Stinger: Make Hive Best for All Needs
Page 30
Hive use cases by query latency (vertical axis: Data Size):
• Interactive (5s – 1m): Parameterized Reports, Drilldown, Visualization, Exploration
• Non-Interactive (1m – 1h): Data preparation, Incremental batch processing, Dashboards / Scorecards
• Batch (1h+): Operational batch processing, Enterprise Reports, Data Mining
Improve Latency & Throughput
• Query engine improvements
• New “Optimized RCFile” column store
• Next-gen runtime (eliminates M/R latency)
Extend Deep Analytical Ability
• Analytics functions
• Improved SQL coverage
• Continued focus on core Hive use cases
Flexible Support Subscription Programs
Leverage Hortonworks Expertise: Subscription and Support delivered and backed by Hadoop experts; subscriptions based on nodes or storage
Page 31
• Developer Support: “How to” guidance for developers and archs
– 12 x 5, Web only
– All Sev: 1 business day
– Application Design Advice, Code Review
– 1 seat
• Essential Support*: Operations support for small research clusters
– 12 x 5, Web only
– All Sev: 1 business day
– Cluster Design, Install, Maintain, Performance
– 3 Contacts
– Patches & Updates
• Standard Support: Operations support for dev & test clusters
– 12 x 5, Web only
– All Sev: 1 business day
– Cluster Design, Install, Maintain, Performance
– 3 Contacts
– Patches & Updates
• Enterprise Support: Operations support for critical clusters
– 24 x 7, Phone & Web
– Sev 1: 1 Hour; Sev 2: 4 Bus Hour
– Cluster Design, Install, Maintain, Performance
– 5 Contacts
– Patches & Updates
– Additional Options
* Limited in size and no expansion
Hortonworks: Best In Class Hadoop Support
• Experienced enterprise support team
– Experience supporting enterprise clients in production
– Core engineers have real operational experience: built and supported 44+K nodes in production
– Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere
• Global 24x7 operation
– Support based in Sunnyvale, UK & India
• Stringent case management processes ensure high quality customer service & responsiveness
Page 32
Transferring Our Hadoop Expertise to You
The expert source for Apache Hadoop training & certification
• World class training programs designed to help you learn fast
– Role-based hands-on classes with 50% lab time
• Expert consulting services
– Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox program
– Fastest way to learn Apache Hadoop
– Multi-level tutorials for wide applicability
– Customizable and updateable
Page 33
Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
– www.hortonworks.com
– http://hortonworks.com/hadoop-training/
Page 34