HP Converged Systems and Hortonworks - Webinar Slides
Page 1 © Hortonworks Inc. 2014
Delivering Apache Hadoop for the Modern Data Architecture
HP & Hortonworks. We do Hadoop Together
Page 2 © Hortonworks Inc. 2014
Your speakers…
Raghu Thiagarajan, Director, Partner Product Management, Hortonworks
Chris Daly, Chief Outbound Engineer, CSS and Big Data Systems, HP
Page 3 © Hortonworks Inc. 2014
Why Hadoop: Traditional Data Architecture Pressured
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
Data source: IDC
SOURCES
OLTP, ERP, CRM
Documents, Emails
Web Logs, Click Streams
Social Networks
Machine Generated
Sensor Data
Geolocation Data
Page 4 © Hortonworks Inc. 2014
What: Business Applications of Hadoop
Data types: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔
Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔
Retail
360° View of the Customer ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
Page 5 © Hortonworks Inc. 2014
What: Business Applications of Hadoop
Data types: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Manufacturing
Supply Chain and Logistics ✔
Preventive Maintenance ✔
Crowd-sourced Quality Assurance ✔
Healthcare
Use Genomic Data in Medical Trials ✔ ✔
Monitor Patient Vitals in Real-Time
Pharmaceuticals
Recruit & Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔
Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔
Government
ETL Offload in Response to Budgetary Pressures ✔
Sentiment Analysis for Gov’t Programs ✔
Page 6 © Hortonworks Inc. 2014
OPERATIONS TOOLS
Provision, Manage & Monitor
DEV & DATA TOOLS
Build & Test
DATA SYSTEMS
APPLICATIONS
Repositories
ROOMS
Statistical Analysis
BI / Reporting,
Ad Hoc Analysis
Interactive Web & Mobile Apps
Enterprise
Applications
RDBMS EDW MPP
How: Modern Data Architecture with Hadoop
Governance & Integration
Security
Operations
Data Access
Data Management
ENTERPRISE HADOOP
SOURCES
OLTP, ERP, CRM
Documents, Emails
Web Logs, Click Streams
Social Networks
Machine Generated
Sensor Data
Geolocation Data
Page 7 © Hortonworks Inc. 2014
YARN Transforms Hadoop’s Architecture
Enables deep insight across a large, broad, diverse set of data at efficient scale
Multi-Use Data Platform: Store all data in one place, process in many ways
Batch, Interactive, Iterative, Streaming
Store any/all raw data sources and processed data over extended periods of time.
YARN: Data Operating System
Page 8 © Hortonworks Inc. 2014
Designing Hadoop Cluster
§ Cluster Storage Capacity
§ Server Specification
§ Cluster Size
§ Factoring Performance
Key Considerations
§ Any piece of hardware can and will fail
§ More nodes means less impact on failure
§ Resiliency and fault tolerance improve with scale
§ Build resiliency through scale
§ Still use modern hardware
§ Software handles hardware failures
Page 9 © Hortonworks Inc. 2014
Storage Capacity
§ Key Input
  § Initial Data Size
  § 3 year YOY growth
  § Compression ratio
  § Intermediate and materialized views
  § Replication Factor
§ Note
  § Hard to accurately predict the size of intermediate & materialized views at the start of a project
  § Be conservative with compression ratio. Mileage varies by data type
  § Hadoop needs temp space to store intermediate files
Hadoop Cluster
Raw Data
Work In Process Data
Master Data
Materialized Views
Page 10 © Hortonworks Inc. 2014
Storage Capacity
Total Storage Required = ((Initial Size + YOY Growth + Intermediate Data Size) x Replication Count x 1.2) / Compression Ratio
Good Rule of Thumb
§ Replication Count = 3
§ Compression Ratio = 4-5
§ Intermediate Data Size = 50%-100% of Raw Data Size
Note
§ 1.2 factor is included in the sizing estimator to account for the temp space requirement of Hadoop
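The sizing formula on this slide can be sketched as a small function. This is only an illustration: the function name, parameter names, and the input figures (100 TB initial, 50 TB growth, 50 TB intermediate) are hypothetical, while the defaults follow the rule of thumb above, taking the low end of the compression range.

```python
def total_storage_required(initial_size_tb, yoy_growth_tb, intermediate_tb,
                           replication_count=3, compression_ratio=4.0,
                           temp_space_factor=1.2):
    """Return total cluster storage in TB per the slide's sizing formula.

    temp_space_factor is the 1.2 multiplier that accounts for Hadoop temp space.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    data_tb = initial_size_tb + yoy_growth_tb + intermediate_tb
    return data_tb * replication_count * temp_space_factor / compression_ratio


# Hypothetical inputs: 100 TB initial data, 50 TB of growth,
# intermediate data at 50% of the initial raw size.
example_tb = total_storage_required(100, 50, 50)
print(f"Total storage required: {example_tb:.1f} TB")
```

Being conservative with the compression ratio, as the earlier slide advises, means passing a lower value, which raises the estimate.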
Page 11 © Hortonworks Inc. 2014
Server Specification
§ Master Nodes – NameNode, Resource Manager, HBase Master
  § Dual Intel Xeon E5-26xx series processors
  § 128GB or 256GB RAM per chassis
  § 4+ – 1TB NL-SAS/SATA Drives RAID10+ Spares
§ Worker Nodes – DataNode, Node Manager and Region Server
  § Dual Intel Xeon E5-26xx series processors
  § 128GB RAM or 256GB RAM
  § 12 – 1-4 TB NL-SAS/SATA Drives
§ Gateway Nodes / Edge Nodes
  § Mirror of Master Nodes configuration
Page 12 © Hortonworks Inc. 2014
Cluster Size
Number of Data Nodes = Total Storage Required / Storage Per Server
Number of Master Nodes
§ Name Node, Zookeeper
§ Resource Manager, Zookeeper
§ Failover Name Node, HBase Master, Hive Server, Zookeeper
  § In a half-rack cluster, this would be combined with Resource Manager
§ Management Node (Ambari, Ganglia, Nagios)
  § In a half-rack cluster, this would be combined with the Name Node
Note
§ Large clusters may need more than 4 master nodes
§ Start at 2/4 and grow based on usage
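The data node count on this slide is the total storage required divided by the storage per server. A minimal sketch of that arithmetic follows, rounding up because a partial server cannot be bought. The function name and the inputs (180 TB required, 24 TB per server from twelve 2 TB drives) are hypothetical.

```python
import math


def data_nodes_needed(total_storage_tb, storage_per_server_tb):
    """Return the number of data nodes needed, rounded up to a whole server."""
    if storage_per_server_tb <= 0:
        raise ValueError("storage_per_server_tb must be positive")
    return math.ceil(total_storage_tb / storage_per_server_tb)


# Hypothetical inputs: 180 TB required, 12 drives x 2 TB = 24 TB per server.
nodes = data_nodes_needed(180, 24)
print(f"Data nodes needed: {nodes}")
```

Master nodes are counted separately, starting from the 2 to 4 suggested in the note above.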
Page 13 © Hortonworks Inc. 2014
Factoring Performance
§ Data Nodes
  § 1 TB drives for performance clusters
  § 4 TB drives for archive clusters
§ Meeting SLA Requirements
  § Hadoop workloads are varied
  § Difficult to assess cluster size based on SLAs without actual testing
  § Good News: Hadoop performs linearly with scale
    § Enables one to design experiments using a fraction of data
§ Best Practice Guidance
  § Create a test configuration with a rack of servers
  § Load a slice of data
  § Run tests with real-life queries to measure performance & fine tune the system
  § Scale cluster size based on observed performance
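If the linear-scaling claim holds for a workload, a test on a slice of data can be extrapolated to a production cluster size. The sketch below assumes runtime is proportional to data volume and inversely proportional to node count, which real workloads only approximate. The function name and all figures are hypothetical.

```python
import math


def nodes_for_sla(test_nodes, test_data_tb, test_runtime_min,
                  full_data_tb, target_runtime_min):
    """Estimate nodes needed to process full_data_tb within target_runtime_min.

    Assumes perfectly linear scaling, so the work is a fixed number of
    node-minutes per TB, measured on the test configuration.
    """
    if min(test_data_tb, target_runtime_min) <= 0:
        raise ValueError("test_data_tb and target_runtime_min must be positive")
    node_minutes_per_tb = test_nodes * test_runtime_min / test_data_tb
    return math.ceil(node_minutes_per_tb * full_data_tb / target_runtime_min)


# Hypothetical test: an 18-node rack processes a 10 TB slice in 30 minutes.
# Hypothetical target: 100 TB processed within a 60-minute SLA.
estimate = nodes_for_sla(18, 10, 30, 100, 60)
print(f"Estimated nodes for SLA: {estimate}")
```

The estimate is a starting point for the final step above: scale the cluster, then re-measure with real-life queries.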
Page 14 © Hortonworks Inc. 2014
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE
HDP and HP are deeply integrated in the data center
SOURCES
EXISTING Systems
Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
DATA SYSTEM
RDBMS EDW MPP HANA
APPLICATIONS
BusinessObjects BI
Deep Partnerships
Hortonworks and HP are engaged in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, & SAP
Broad Partnerships
Over 600 partners work with Hortonworks to certify their applications to work with Hadoop so they can extend big data to their users
HDP 2.1
Governance & Integration
Security
Operations
Data Access
Data Management
YARN
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Delivering Apache Hadoop for the Modern Data Architecture
HP + Hortonworks Validated Design
Christopher Daly
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 16
The HP Approach to Apache Hadoop
Why a Reference Architecture?
• Provides a starting point or baseline
• Maximum flexibility
• Customizable to fit YOUR needs
• Adopt the parts you want
• Replace the parts you don’t
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 17
Solution components
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 18
Pre-deployment considerations / system selection
• Operating system
• Computation
• Memory
• Storage
• Network
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 19
High-availability considerations
• Hadoop NameNode HA
• ResourceManager HA
• OS availability and reliability
• Network reliability
• Power supply
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 20
Server selection
Management nodes – The HP ProLiant DL360p Gen8
The Management node and head nodes, as tested in the Reference Architecture, contain the following base configuration:
• 2 x Eight-Core Intel E5-2650 v2 Processors
• Smart Array P420i Controller with 512MB FBWC
• 3.6 TB – 4 x 900GB SFF SAS 10K RPM disks
• 128 GB DDR3 Memory – 8 x 16GB 2Rx4 PC3-14900R-13
• 10GbE 2P NIC 561FLR-T card
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 21
Server selection
Worker nodes – ProLiant DL380p Gen8
The ProLiant DL380p Gen8 (2U), as configured for the Reference Architecture as a worker node, has the following configuration:
• Dual 10-Core Intel Xeon E5-2670 v2 Processors with Hyper-Threading
• Twelve 2TB 3.5” 7.2K LFF SATA MDL (22 TB for Data)
• 128 GB DDR3 Memory (8 x HP 16GB), 4 channels per socket
• 1 x 10GbE 2 Port NIC FlexibleLOM (Bonded)
• 1 x Smart Array P420i Controller with 512MB FBWC
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 22
Switch selection
Top of Rack (ToR) switches
The 5900AF-48XGT-4QSFP+10GbE is an ideal ToR switch with forty-eight 10GbE ports and four 40GbE uplinks providing resiliency, high availability and scalability support. In addition, this model comes with support for CAT6 cables (copper wires) and Software defined networking (SDN).
Aggregation switches
The FlexFabric 5930-32QSFP+40GbE switch is an ideal aggregation switch, as it is well suited to handle very large volumes of inter-rack traffic such as can occur during shuffle and sort operations, or large scale block replication to recreate a failed node.
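The ToR port counts quoted above (forty-eight 10GbE ports, four 40GbE uplinks) imply a worst-case oversubscription ratio for inter-rack traffic. The sketch below works through that arithmetic. It assumes every server-facing port and every uplink is in use at line rate, which the slides do not state, and the function name is hypothetical.

```python
def oversubscription_ratio(downlink_ports, downlink_gbps, uplink_ports, uplink_gbps):
    """Return server-facing bandwidth divided by uplink bandwidth."""
    uplink_total = uplink_ports * uplink_gbps
    if uplink_total <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return (downlink_ports * downlink_gbps) / uplink_total


# Port counts from the ToR switch description: 48 x 10GbE down, 4 x 40GbE up.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"Oversubscription ratio: {ratio:.1f}:1")
```

A lower ratio leaves more headroom for the shuffle, sort, and re-replication traffic that crosses racks.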
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 23
HP Insight CMU – pushbutton scale-out management
Provision, monitor, and control: thousands of nodes instantly
Push-button roll out: provisioning via cloning for seamless scaling
Rest easy: battle-tested at Top500 sites for over a decade
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 24
Historical analysis and job recording
HP Insight CMU – GUI Monitoring at a Cluster level
• Designed for Big Data customers
• Multi-petal aggregated, 3D RT, and time series views of cluster metrics
• “Click & zoom” analysis at both solution and component levels
• Proactively identify and isolate performance issues
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 25
Single Rack Reference Architecture
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 26
Multi-Rack Reference Architecture
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 27
Capacity and sizing
Here is a general guideline on data inventory:
• Sources of data
• Frequency of data
• Raw storage
• Processed HDFS storage
• Replication factor
• Default compression turned on
• Space for intermediate files
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 28
System configuration guidance
Machine Type | Workload Pattern/Cluster Type | Storage | Processor (# of Cores) | Memory (GB)
Slaves | Balanced workload | Four to six 1-2 TB disks | Dual 6/8/10 cores | 48-96
Slaves | Compute intensive workload | Four to six 1-2 TB disks | Dual 8/10/12 cores | 48-128
Slaves | IO intensive workload | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-96
Slaves | HBase clusters | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-128
Masters | All workload patterns/HBase clusters | Four to six 1-2 TB disks | Dual 6/8/10 cores | Depends on number of file system objects to be created by NameNode.
Network: Dual 10 GB links for all nodes in a 20 node rack and min 2x10 / 2 x 40 GB interconnect links per rack going to a pair of central switches
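For scripting against this guidance, the slave-node rows can be captured in a small lookup structure. The sketch below transcribes the values from the table. The type name, dictionary keys, and function name are hypothetical and not part of the Reference Architecture.

```python
from typing import NamedTuple


class NodeGuidance(NamedTuple):
    storage: str
    processor: str
    memory_gb: str


# Values transcribed from the guidance table; the keys are hypothetical labels.
SLAVE_GUIDANCE = {
    "balanced": NodeGuidance("Four to six 1-2 TB disks", "Dual 6/8/10 cores", "48-96"),
    "compute_intensive": NodeGuidance("Four to six 1-2 TB disks", "Dual 8/10/12 cores", "48-128"),
    "io_intensive": NodeGuidance("Twelve 1-2 TB disks", "Dual 8/10/12 cores", "48-96"),
    "hbase": NodeGuidance("Twelve 1-2 TB disks", "Dual 8/10/12 cores", "48-128"),
}


def slave_guidance(workload: str) -> NodeGuidance:
    """Look up the slave-node guidance for a workload label."""
    try:
        return SLAVE_GUIDANCE[workload]
    except KeyError:
        raise ValueError(
            f"unknown workload {workload!r}; expected one of {sorted(SLAVE_GUIDANCE)}"
        ) from None


print(slave_guidance("io_intensive").storage)
```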
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 29
For More Information
Get the Reference Architecture at http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA5-4975ENW
Hortonworks: www.hortonworks.com
HP Solutions for Apache Hadoop: hp.com/go/Hadoop
HP ProLiant servers: hp.com/go/proliant
HP Insight Cluster Management Utility (CMU): hp.com/go/cmu
HP Networking: hp.com/go/networking
Or Contact Me: [email protected]
Page 30 © Hortonworks Inc. 2014
Next Steps...
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about HP & Hortonworks http://hortonworks.com/partner/HP
Contact us: [email protected]
Page 31 © Hortonworks Inc. 2014
THANK YOU