OOP 2014
Hortonworks: We Do Hadoop.
Our mission is to enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop
Emil A. Siemes
Solution Engineer
January 2014
Our Mission:
Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop
Our Commitment
Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process
Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind
Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills
Headquarters: Palo Alto, CA
Employees: 300+ and growing
Trusted Partners
A Traditional Approach Under Pressure
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
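As a rough check on the IDC figures above, the growth from 2.8 ZB in 2012 to 40 ZB by 2020 can be turned into an implied yearly rate. The sketch below assumes steady compound growth over the eight years, which is an assumption for illustration and not something the slide states; the function name is hypothetical.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate that turns start_zb into end_zb over `years`.

    Assumes the same growth rate every year (illustrative assumption only).
    """
    return (end_zb / start_zb) ** (1.0 / years) - 1.0


# Figures from the slide: 2.8 ZB in 2012, 40 ZB by 2020 (8 years apart).
growth = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied annual growth: {growth:.1%}")
```

Under that assumption the data volume grows by a little under 40% per year, which is the kind of pressure the traditional approach has to absorb.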
Emerging Modern Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS: Manage & Monitor
DEV & DATA TOOLS: Build & Test
Drivers of Hadoop Adoption
From NEW types of Data (or existing types for longer)
New Business Applications
Most Common NEW TYPES OF DATA
1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
Value
+ Keep existing data longer!
Drivers of Hadoop Adoption
A Modern Data Architecture: Complement your existing data systems: the right workload in the right place
Architectural
New Business Applications
Let’s build a Data Lake…
Instructions on: hadoopwrangler.com
Knox – Perimeter Level Security
compute & storage . . . compute & storage
YARN
Data Lake HDP Grid
AMBARI
HDP Data Lake Solution Architecture
Step 1: Extract & Load
SOURCE DATA: ClickStream Data, Sales Transaction/Data, Product Data, Marketing/Inventory, Social Data, EDW
Channels: File, JMS, REST, HTTP, Streaming
Ingestion: SQOOP, FLUME, Web HDFS, NFS
Step 2: Model/Apply Metadata
HCATALOG (table & user-defined metadata)
Step 3: Transform, Aggregate & Materialize
Data processing: HIVE, PIG, Mahout (on TEZ, MR2)
Step 4: Schedule and Orchestrate
Oozie (Batch scheduler)
Manage Steps 1-4: Data Lifecycle with Falcon
FALCON (data pipeline & flow management)
Use Case Type 1: Materialize & Exchange
Exchange: HBase Client, Sqoop/Hive
Downstream Data Sources: OLTP, HBase, EDW (Teradata)
YARN Apps: Stream Processing, Real-time Search, MPI (Storm, SAS, Elastic Search)
Opens up Hadoop to many new use cases
Use Case Type 2: Explore/Visualize
INTERACTIVE: Hive Server (Tez/Stinger)
Query/Analytics/Reporting Tools: Tableau/Excel, Datameer/Platfora/SAP
Store all data in a single place, interact in multiple ways
Hadoop 2: The Introduction of YARN
1st Gen of Hadoop: Single Use System (Batch Apps)
HDFS (redundant, reliable storage)
MapReduce (cluster resource management & data processing)
HADOOP 2: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
Redundant, Reliable Storage (HDFS)
Efficient Cluster Resource Management & Shared Services (YARN)
Standard Query Processing: Hive, Pig
Batch: MapReduce
Interactive: Tez
Online Data Processing: HBase, Accumulo
Real Time Stream Processing: Storm
others …
Let’s start simple…
• A solution unifying all data sources of a mobile App
– Allowing analytics over all data in one place
– In real time and long term
• Mobile Apps have multiple channels for data:
– Data created on the handset (e.g. geo location)
– Data created on servers accessed by the mobile app (e.g. app data, logs)
– Data from backend services (e.g. RDBMS)
– Store data (e.g. iTunes Connect, Google Play)
– Social data (Twitter, App Reviews, etc.)
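One way to picture "all data in one place" is a common record shape that every channel is mapped into before analysis. The sketch below is only an illustration: the `UnifiedEvent` schema, the channel names, and the helper functions are hypothetical and do not come from the talk or from any Hadoop API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Iterable


@dataclass
class UnifiedEvent:
    """One record in a common shape, whatever channel it came from (hypothetical schema)."""
    channel: str  # e.g. "handset", "server", "backend", "store", "social"
    timestamp: datetime
    payload: dict[str, Any] = field(default_factory=dict)


def unify(*streams: Iterable[UnifiedEvent]) -> list[UnifiedEvent]:
    """Merge events from all channels into one list ordered by time."""
    merged = [event for stream in streams for event in stream]
    return sorted(merged, key=lambda event: event.timestamp)


def count_by_channel(events: Iterable[UnifiedEvent]) -> dict[str, int]:
    """Count how many events each channel contributed."""
    counts: dict[str, int] = {}
    for event in events:
        counts[event.channel] = counts.get(event.channel, 0) + 1
    return counts
```

Once every channel produces the same record type, both short-term and long-term questions can be asked against a single collection instead of one store per channel.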
Why Should We Care?
• How much revenue did I make? (Not as easy to answer as one might think)
• Where are my customers now?
• Can you fulfill requirements from the business like: "Tell me when our customers are in a coffee shop so we can offer them e.g. Wifi"
• What are my customers thinking about my app/brand?
• Are the ones complaining really using it (correctly)?
• How can I support marketing activities?
• How can I evaluate local marketing activities?
• Does positive/negative sentiment affect my downloads?
• Will my servers be able to deal with the load in 3 months?
• …
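The sentiment question in the list above could, for example, be approached by correlating a daily sentiment score with daily download counts once both series sit in the same place. The sketch below computes a plain Pearson correlation; it is an assumed approach for illustration, not the method used in the talk, and a correlation alone would not show that sentiment causes downloads.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series.

    xs could be daily sentiment scores and ys daily downloads
    (hypothetical inputs; neither series is defined in the talk).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 or -1 would suggest the two series move together or in opposition; a value near 0 would suggest no linear relationship.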
Design Goals
• Use as much as we have in our stack as possible
• Minimize dependencies on stacks beyond Hadoop
– Still make it useful and complete
• Make it fit into an 8GB MacBook/Laptop
• Release early & release often
iiCaptain
Types Of Data For iiCaptain
• Geo location data
• Store Data
• iTunes Connect, Google Play, Amazon via AppAnnie
• Twitter
• RDBMS (Sqoop)
• Logs
iiCaptain’s Data Ocean / Data Lake
More Details
Analytics
SQL Interactive Query & Apache Hive
Key Services: Platform, operational and data services essential for the enterprise
Skills: Leverage your existing skills: development, analytics, operations
Integration: Interoperable with existing data center investments
Stinger Initiative: Broad, community based effort to deliver the next generation of Apache Hive
Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
SQL: Support broadest range of SQL semantics for analytic applications against Hadoop
Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
Apache Hive
• The de facto standard for Hadoop SQL access
• Used by your current data center partners
• Built for batch AND interactive query
Build Process, Shining With Savanna
Roadmap
- Servlet Engine in YARN
- Project Savanna: Continuous Delivery end-2-end
- Sentiment Analysis with Flume/Hive and App Reviews
- Knox
- Falcon
- Phoenix
HDP 2.0: Enterprise Hadoop Platform
Hortonworks Data Platform (HDP)
• The ONLY 100% open source and most current platform
• Integrates full range of enterprise-ready services
• Certified and tested at scale
• Engineered for deep ecosystem interoperability
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: AMBARI, FALCON*, OOZIE
DATA SERVICES: HIVE & HCATALOG, PIG, HBASE
LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS, KNOX*
CORE SERVICES: HDFS, YARN, MAPREDUCE, TEZ
Functions: Schedule, Storage, Resource Management, Process, Data Movement, Cluster Mgmnt, Dataset Mgmnt, Data Access
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
OS/VM, Cloud, Appliance
Hortonworks: The Value of “Open” for You
Validate & Try
1. Download the Hortonworks Sandbox
2. Learn Hadoop using the technical tutorials
3. Investigate a business case using the step-by-step business case scenarios
4. Validate YOUR business case using your data in the sandbox
Connect With the Hadoop Community: We employ a large number of Apache project committers & innovators so that you are represented in the open source community
Avoid Vendor Lock-In: Hortonworks Data Platform remains as close to the open source trunk as possible and is developed 100% in the open so you are never locked in
The Partners You Rely On, Rely On Hortonworks: We work with partners to deeply integrate Hadoop with data center technologies so you can leverage existing skills and investments
Certified for the Enterprise: We engineer, test and certify the Hortonworks Data Platform at scale to ensure the reliability and stability you require for enterprise use
Support from the Experts: We provide the highest quality of support for deploying at scale. You are supported by hundreds of years of Hadoop experience
Engage
1. Execute a Business Case Discovery Workshop with our architects
2. Build a business case for Hadoop today