Oracle Solaris 11 as a Big Data Platform: Apache Hadoop Use Case
Orgad Kimchi, Principal Software Engineer
Oracle ISV Engineering
Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Oracle Confidential, Proprietary Information
Disclaimer
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle Corporation.
Agenda
Hadoop Overview
The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster
What is Big Data?
Big Data is both large and variable datasets and a new set of technologies.
Large and variable datasets:
• Extremely large files of unstructured or semi-structured data
• Large and highly distributed datasets that are otherwise difficult to manage as a single unit of information
A new set of technologies:
• Technologies that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions
Introduction To Hadoop
What is Hadoop?
• Based on designs published by Google, starting with the Google File System paper in 2003
• Originally used for generating search indexes and web scores
• A top-level Apache project that consists of two key services:
1. Hadoop Distributed File System (HDFS): a highly scalable, fault-tolerant, distributed file system
2. MapReduce API (Java): jobs can also be scripted in other languages
Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
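To make the MapReduce model concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) as a word count. It does not use the Hadoop API; the function names and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on Solaris", "Hadoop on Solaris zones"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run as separate tasks on many machines, and the shuffle moves data between them over the network; the sketch only shows how the data flows.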
Components of Hadoop
HDFS
HDFS is the file system responsible for storing data on the cluster.
• Written in Java (based on Google’s GFS)
• Sits on top of a native file system (ext3, ext4, xfs, ZFS)
• POSIX-like file permissions model
• Provides redundant storage for massive amounts of data
• Optimized for large, streaming reads of files
The Five Hadoop Daemons
Hadoop consists of five separate daemons:
• NameNode: holds the metadata for HDFS
• Secondary NameNode: performs housekeeping functions for the NameNode
• DataNode: stores the actual HDFS data blocks
• JobTracker: manages MapReduce jobs, distributes individual tasks to machines running the TaskTracker, and coordinates MapReduce stages
• TaskTracker: responsible for instantiating and monitoring individual Map and Reduce tasks
Hadoop Architecture
The benefits of using Oracle Solaris technologies for a Hadoop cluster
Solaris Zones Hadoop Architecture
Built-in Virtualization: Oracle Solaris 11 Zones
• Secure, lightweight virtualization
• Scales to hundreds of zones per node
• Built-in, no-cost virtualization
• Combines isolation with resource management
• Widely used for:
  • Consolidation
  • Legacy OS support
  • Rapid application deployment
  • Securely protecting applications
Co-engineered with installation, security, ZFS, networking, IPS, and the SPARC and x86 hypervisors
One out of three Oracle Solaris systems runs Oracle Solaris Zones
Oracle Solaris Zones Benefits
The benefits of using Oracle Solaris Zones for a Hadoop cluster:
• Fast provisioning of new cluster members using the Solaris Zones cloning feature
• Very high network throughput between the zones for DataNode replication
Oracle Solaris Zones
Source: http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen
Oracle Solaris 11: Storage Virtualization
Secure Datasets for Each Tenant
• Virtual flash-enabled storage pools for speed
• Built-in data services save storage software costs
• File and block sharing
• Wire-speed encryption on disk and over the wire
• Extreme data integrity
• Unlimited scale
10x storage savings for virtualization
2x storage compression
[Figure: Finance, HR, and Sales zones, each with its own dataset]
Oracle Solaris ZFS Benefits
The benefits of using Oracle Solaris ZFS for a Hadoop cluster:
• Immense data capacity: a 128-bit file system, well suited to large datasets
• Optimized disk I/O utilization and better I/O performance with ZFS built-in compression
• Secure data at rest using ZFS encryption
Performance analysis
• Each Oracle Solaris Zone can have a different workload: disk I/O, network I/O, CPU, memory, or a combination of these. In addition, a single Oracle Solaris Zone can overload the resources of the entire system.
• DTrace: a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time.
zonestat
The zonestat command lets us monitor all the Solaris Zones running in our environment and provides real-time statistics for CPU, memory, and network utilization.
root@global_zone:~# zonestat 10 10
Interval: 1, Duration: 0:00:10
SUMMARY        Cpus/Online: 128/12  PhysMem: 256G  VirtMem: 259G
                 ---CPU----   --PhysMem--   --VirtMem--   --PhysNet--
ZONE            USED  %PART   USED  %USED   USED  %USED  PBYTE  %PUSE
[total]       118.10  92.2%  24.6G  9.62%  60.0G  23.0%  18.4E   100%
[system]        0.00  0.00%  9684M  3.69%  40.5G  15.5%      -      -
data-node3     42.13  32.9%  4897M  1.86%  6146M  2.30%  18.4E   100%
data-node1     41.49  32.4%  4891M  1.86%  6173M  2.31%  18.4E   100%
data-node2     33.97  26.5%  4851M  1.85%  6145M  2.30%  18.4E   100%
global          0.34  0.27%   283M  0.10%   420M  0.15%   2192  0.00%
name-node       0.15  0.11%   419M  0.15%   718M  0.26%    126  0.00%
sec-name-node   0.00  0.00%   205M  0.07%   363M  0.13%      0  0.00%
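The per-zone rows of a zonestat report can be post-processed to spot the zones that dominate the CPU. The following is a minimal sketch, assuming the nine-column row layout shown on this slide; the function names and the 25% threshold are hypothetical, and the sample rows are copied from the output above.

```python
# Per-zone rows copied from the zonestat output on this slide.
ZONESTAT_ROWS = """\
[total] 118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E 100%
[system] 0.00 0.00% 9684M 3.69% 40.5G 15.5% - -
data-node3 42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E 100%
data-node1 41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E 100%
data-node2 33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E 100%
global 0.34 0.27% 283M 0.10% 420M 0.15% 2192 0.00%
name-node 0.15 0.11% 419M 0.15% 718M 0.26% 126 0.00%
sec-name-node 0.00 0.00% 205M 0.07% 363M 0.13% 0 0.00%
"""


def parse_zonestat_rows(text):
    """Return one dict per zone, skipping the [total] and [system] rows."""
    zones = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: nine whitespace-separated columns per zone row.
        if len(fields) != 9 or fields[0].startswith("["):
            continue
        zones.append({
            "zone": fields[0],
            "cpu_used": float(fields[1]),
            "cpu_part_pct": float(fields[2].rstrip("%")),
        })
    return zones


def busiest_zones(zones, threshold_pct):
    """Zones at or above the CPU %PART threshold, busiest first."""
    hot = [z for z in zones if z["cpu_part_pct"] >= threshold_pct]
    return sorted(hot, key=lambda z: z["cpu_part_pct"], reverse=True)


if __name__ == "__main__":
    for z in busiest_zones(parse_zonestat_rows(ZONESTAT_ROWS), 25.0):
        print(z["zone"], z["cpu_part_pct"])
```

With the sample above, only the three data-node zones cross the threshold, which matches the picture of the DataNodes doing the heavy lifting while the NameNode zones stay nearly idle.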
DISK I/O Performance Monitoring
fsstat
The fsstat command allows us to monitor file system I/O activity per file system or per Solaris Zone.
For example, to monitor reads and writes to all ZFS file systems, per zone, at 10-second intervals:
root@global_zone:~# fsstat -Z zfs 10 10
   new   name   name   attr   attr lookup  rddir   read   read  write  write
  file  remov   chng    get    set    ops    ops    ops  bytes    ops  bytes
     0      0      0    744      0  11.4K      0  6.01K  5.87M      0      0 zfs:global
     0      0      0    151      0  3.27K      0  1.41K  1.94M      7  1.42K zfs:data-node1
     0      0      0    359      0  8.72K      0  2.75K  3.95M     22  4.06K zfs:data-node2
     0      0      0    413      0  9.03K      0  2.98K  4.22M     21  4.34K zfs:data-node3
     0      0      0     14      0     51      0      0      0      0      0 zfs:name-node
     0      0      0     14      0     51      0      0      0      0      0 zfs:sec-name-node
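To compare zones by I/O volume, the human-readable values in the fsstat rows first have to be converted to plain numbers. The following is a minimal sketch, assuming the twelve-column layout shown on this slide and assuming that the K, M, G, and T suffixes are powers of 1024; the function names are hypothetical, and the sample rows are copied from the output above.

```python
# Per-zone rows copied from the fsstat -Z output on this slide.
FSSTAT_ROWS = """\
0 0 0 744 0 11.4K 0 6.01K 5.87M 0 0 zfs:global
0 0 0 151 0 3.27K 0 1.41K 1.94M 7 1.42K zfs:data-node1
0 0 0 359 0 8.72K 0 2.75K 3.95M 22 4.06K zfs:data-node2
0 0 0 413 0 9.03K 0 2.98K 4.22M 21 4.34K zfs:data-node3
0 0 0 14 0 51 0 0 0 0 0 zfs:name-node
0 0 0 14 0 51 0 0 0 0 0 zfs:sec-name-node
"""

# Assumption: suffixes denote powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(token):
    """Convert a value such as '5.87M' or '51' to a float."""
    if token[-1] in UNITS:
        return float(token[:-1]) * UNITS[token[-1]]
    return float(token)


def parse_fsstat_rows(text):
    """Return read/write counters for every row with twelve columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 12:
            continue
        rows.append({
            "name": fields[11],
            "read_ops": parse_size(fields[7]),
            "read_bytes": parse_size(fields[8]),
            "write_ops": parse_size(fields[9]),
            "write_bytes": parse_size(fields[10]),
        })
    return rows


def rank_by_read_bytes(rows):
    """Order the rows by bytes read, largest first."""
    return sorted(rows, key=lambda r: r["read_bytes"], reverse=True)


if __name__ == "__main__":
    for row in rank_by_read_bytes(parse_fsstat_rows(FSSTAT_ROWS)):
        print(row["name"], int(row["read_bytes"]))
```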
DISK I/O - Cont'd
Run the DTrace iopattern script, as shown, to determine whether the disk I/O workload is random or sequential.
root@global_zone:~# /usr/dtrace/DTT/iopattern
%RAN %SEQ  COUNT    MIN      MAX     AVG      KR     KW
  69   31    236   1024  1048576  448830  103441      0
  75   25    577    512  1048576  327938  184306    479
  92    8    598    512  1048576  198293  114275   1525
  74   26    379    512  1048576  330296  121954    294
  66   34    281   1024  1048576  500550  137358      0
  80   20    346   1024  1048576  332114  112218      0
  81   19    444    512  1048576  290734  124694   1366
  65   35    337    512  1048576  490375  161139    244
  75   25    704    512  1048576  353086  241105   1642
  75   25    444   1024  1048576  386634  167642      0
  77   23    666   1024  1048576  397105  258274      0
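A run of iopattern samples can be reduced to a single figure by weighting each sample's %RAN value by its I/O count. The following is a minimal sketch, assuming the eight-column layout shown on this slide; the function names and the 50% cut-off used to label a workload are hypothetical, and the sample rows are copied from the output above.

```python
# Sample rows copied from the iopattern output on this slide.
IOPATTERN_SAMPLE = """\
%RAN %SEQ COUNT MIN MAX AVG KR KW
69 31 236 1024 1048576 448830 103441 0
75 25 577 512 1048576 327938 184306 479
92 8 598 512 1048576 198293 114275 1525
74 26 379 512 1048576 330296 121954 294
66 34 281 1024 1048576 500550 137358 0
80 20 346 1024 1048576 332114 112218 0
81 19 444 512 1048576 290734 124694 1366
65 35 337 512 1048576 490375 161139 244
75 25 704 512 1048576 353086 241105 1642
75 25 444 1024 1048576 386634 167642 0
77 23 666 1024 1048576 397105 258274 0
"""


def parse_iopattern(text):
    """Return (random_pct, io_count) for every numeric sample row."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not an 8-column row.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        samples.append((int(fields[0]), int(fields[2])))
    return samples


def weighted_random_pct(samples):
    """Average %RAN across samples, weighted by each sample's I/O count."""
    total_ios = sum(count for _, count in samples)
    if total_ios == 0:
        return 0.0
    return sum(pct * count for pct, count in samples) / total_ios


def classify(samples, cutoff_pct=50.0):
    """Label the workload; the 50% cut-off is an arbitrary assumption."""
    if weighted_random_pct(samples) >= cutoff_pct:
        return "random"
    return "sequential"


if __name__ == "__main__":
    samples = parse_iopattern(IOPATTERN_SAMPLE)
    print(round(weighted_random_pct(samples), 1), classify(samples))
```

Weighting by COUNT keeps a quiet interval with only a few I/Os from skewing the overall figure.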
Visualization
For more information about dim_STAT: http://dimitrik.free.fr
Flame Graphs
For more information: http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs
Hadoop on an Oracle SPARC T4-2 Server
Source: https://blogs.oracle.com/taylor22
For more information
• How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
• How to Build Native Hadoop Libraries for Oracle Solaris 11
• How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-On Lab)
• Performance Analysis in a Multitenant Cloud Environment Using Hadoop Cluster and Oracle Solaris 11
• My Blog
Questions