IBM PureData Systems

© 2012 IBM Corporation

IBM PureData Systems
Meeting data challenges with simplicity, speed & lower cost

Artur Wroński
IBM Software Group

© Copyright IBM Corporation 2012. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

IBM, the IBM logo, ibm.com, Information Management, IMS, CICS, DB2, WebSphere, PureSystems, pureScale, PureExperience, PureApplication, PureData and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Other company, product, or service names may be trademarks or service marks of others.

Disclaimer

The PureSystems family delivers greater simplicity, speed & lower cost

Expert Integrated System

General Purpose Components

Custom Built System

Today’s problem: Time and effort is spent tuning general purpose components

The PureSystems solution: Simplifying the entire IT project lifecycle

Reduced Time, Cost and Risk

Lifecycle stages: Design, Deploy, Manage, Maintain (combined as Design/Deploy and Manage/Maintain)

Optimized for data services:
• Transactional
• Analytics

Expert integrated:
• Data platform
• Infrastructure
• Unified platform management
• Built-in expertise

IBM PureData System: Optimized exclusively for data services

Workload optimized performance

Data load ready in hours

Integrated management

Single point of support

Automated maintenance in hours, not days

Data Platform

Delivering Data Services

Different types of workloads require different data services

Transaction Processing (example: E-commerce)
• I/O pattern: Random reads & random updates
• Data access: Shared access to all data
• Workload: Many transactions with narrow data scope accessing the same database
• Data service: Scalable Transactional Database

Reporting and Analytics (example: Customer Analysis)
• I/O pattern: Sequential reads & sequential data loads
• Data access: Partitioned data access
• Workload: Analytics with broad data scope, split into many parts across data partitions to run in parallel
• Data service: Analytics Data Warehouse

Operational Analytics (example: Real Time Fraud Detection)
• I/O pattern: Random and sequential reads & data loads + continuous ingest
• Data access: Partitioned data access
• Workload: Analytics split into many parts and narrow scope operations, all running in parallel
• Data service: Operational Data Warehouse

Different data workloads have different characteristics

System for Transactions
For apps like E-commerce: database cluster services optimized for transactional throughput and scalability

System for Analytics (powered by Netezza technology)
For apps like Customer Analysis: data warehouse services optimized for high-speed, peta-scale analytics and simplicity

System for Operational Analytics
For apps like Real-time Fraud Detection: operational data warehouse services optimized to balance high performance analytics and real-time operational throughput
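The workload-to-system mapping on these slides can be read as a simple decision rule. The sketch below is only an illustration of that reading: the function name `recommend_system` and its two boolean inputs are hypothetical simplifications, not an IBM sizing tool.

```python
def recommend_system(broad_analytic_scans: bool, continuous_ingest: bool) -> str:
    """Hypothetical helper: map coarse workload traits to a PureData model."""
    if broad_analytic_scans and continuous_ingest:
        # Mixed random and sequential access plus continuous ingest,
        # e.g. real-time fraud detection.
        return "PureData System for Operational Analytics"
    if broad_analytic_scans:
        # Sequential reads over partitioned data, e.g. customer analysis.
        return "PureData System for Analytics"
    # Many narrow-scope transactions against shared data, e.g. e-commerce.
    return "PureData System for Transactions"


print(recommend_system(broad_analytic_scans=True, continuous_ingest=True))
```

A real selection would also weigh data volume, concurrency and availability requirements, which this two-flag sketch deliberately ignores.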

Database Architectures: Single, Shared Storage, Shared Nothing

Core DB2 (single database, optimized for OLTP)
• Ideal for OLTP and data marts
• A transaction (Tran) runs on one DB2 instance with its own log and database

DB2 pureScale Data Sharing (shared storage, optimized for OLTP)
• Ideal for active/active OLTP/ERP scale out
• Transactions (Tran 1, Tran 2, Tran 3) run on separate DB2 members with shared data access and a single database view

DB2 InfoSphere Warehouse, aka DPF (shared nothing, optimized for analytics)
• Ideal for data warehousing with MPP scale out for near linear scalability and query processing
• A query (SQL 1) is split into SQL 1’, SQL 1’’ and SQL 1’’’, which run on separate DB2 partitions (Part 1, Part 2, Part 3), each with its own log, behind a single database view
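The shared-nothing idea (one query split into per-partition pieces whose results are combined) can be sketched in a few lines. This is an illustrative Python analogy, not DB2 code: the function names, the modulo-hash partitioning rule and the sample rows are all assumptions, and the partitions are processed sequentially here rather than in parallel.

```python
def partition_rows(rows, key, n_parts):
    """Distribute rows across n_parts partitions by hashing the key column."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def partial_sum(part, column):
    """The per-partition piece of the query (SQL 1', SQL 1'', ...)."""
    return sum(row[column] for row in part)


def scatter_gather_sum(parts, column):
    """Coordinator: run the piece on every partition, then combine the results."""
    return sum(partial_sum(part, column) for part in parts)


# Hypothetical sample data: six customers with amounts 10, 20, ..., 60.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 7)]
parts = partition_rows(rows, "customer_id", 3)
total = scatter_gather_sum(parts, "amount")
print(total)
```

Because each partition only touches its own rows, adding partitions spreads the scan work, which is the intuition behind the "near linear scalability" claim on the slide.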

IBM PureData System for Transactions highlights
Optimized exclusively for transactional data workloads

Delivering data services for transactions

Speed
• Industry leading DB2 performance
• Database node recovery in seconds [1]

Simplicity
• Database deployment in minutes, not hours [1]
• Capable of running multiple database software versions
• Handles more than 100 databases on 1 system [2]
• No planned system downtime for firmware / OS upgrades [1]

Scalability
• Scaling up to 30x [3]
• Designed to expand from small to medium & medium to large configuration with no planned system downtime required

Smart
• Supports Oracle Database apps with minimal change; supports DB2 applications unchanged
• Clients have experienced cases of 10x storage space savings via Adaptive Compression [4]

Footnotes:
1. Based on IBM internal tests and system design for normal operation under expected typical workload. Individual results may vary.
2. Based on one large configuration.
3. Based on the designed minimum and maximum processor and memory resources required for a single database.
4. Based on client testing in the DB2 10 Early Access Program.

Optimized solution stack
Reduce your system integration costs

[Diagram labels: Servers, Storage, Networking, Management, Data, Management, Deployment, Databases]

• Factory integrated and optimized
  • Server, storage, networking and software
• High data availability
  • Automatic failure detection and online recovery
• Solid State exploitation
  • Automatic management of hot, warm and cold data for faster performance
• Optimized database patterns
  • Database patterns pre-tuned and pre-configured for performance
• Integrated fixes
  • Zero down time for system maintenance

Simplified deployment with high availability
Uninterrupted access to data with consistent performance

PureData System for Transactions - built-in
In minutes:
1. Just specify cluster name, description and topology pattern

Traditional systems - build it yourself
Over several days/weeks:
1. Define High Availability topology
2. Configure HW/SW/Network
3. Set up storage pools
4. Install multiple operating systems
5. Install database instances
6. Set up primary and secondary management systems
7. Set up database members
8. Set up backup processes
9. Test, tune, reconfigure...

[Diagram: 6-node database cluster instance]

Simplified, pattern based database deployment
Captures your expertise in patterns for consistent, reliable deployment of critical databases as a service

[Diagram: deploying a database and a cluster topology of virtual appliances, each made up of metadata, an application server or HTTP server, and an operating system]

• Consolidate 100s of database servers to a single system for optimal resource efficiency and easier administration
• Optimize storage resources for up to 10x storage space savings
• Innovate faster by deploying new databases in minutes
• Accelerate deployment of new database services in cloud environments using patterns of expertise

Simplified Application Development

• Higher scalability provided by adding more nodes with no application changes required

• Dev/Test/Staging/Production database repeatability through database patterns

• Integrated data movement tools to speed creation of test databases

• Built-in Oracle compatibility mode for minimal to no application changes

• Higher utilization and lower costs provided through shared resource management

[Diagram: Applications on top of four Database Management Software instances]

PureData for Transactions
Three standard configurations to choose from

Configuration                       Small (¼ Rack)   Medium (½ Rack)   Large (Full Rack)
Chassis                             1                1                 2
Blacktip ITEs (16 cores per ITE)    6                12                24
Cores                               96               192               384
Memory                              1.5 TB           3.1 TB            6.1 TB
V7000                               1                2                 4
V7000 Exp                           1                2                 4
User Capacity                       18.6 TB          37.2 TB           74.4 TB
Raw SSD Storage (400 GB drives)     4.8 TB           9.6 TB            19.2 TB
Raw HDD Storage (900 GB drives)     32.4 TB          64.0 TB           128.0 TB

Upgrade path: Small to Medium, Medium to Large
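The core counts in the configuration table follow directly from the stated 16 cores per ITE, and the raw SSD figures are consistent with whole numbers of 400 GB drives. The short check below works through that arithmetic. The drive counts it prints are inferred, assuming raw capacity is simply drive count times drive size; they are not stated on the slide, and the dictionary layout is purely illustrative.

```python
CORES_PER_ITE = 16   # from the slide: "16 cores per ITE"
SSD_DRIVE_TB = 0.4   # from the slide: 400 GB SSD drives

# Values copied from the configuration table.
configs = {
    "Small":  {"ites": 6,  "cores": 96,  "raw_ssd_tb": 4.8},
    "Medium": {"ites": 12, "cores": 192, "raw_ssd_tb": 9.6},
    "Large":  {"ites": 24, "cores": 384, "raw_ssd_tb": 19.2},
}


def derived_cores(ites: int) -> int:
    """Cores implied by the number of ITEs."""
    return ites * CORES_PER_ITE


def implied_ssd_drives(raw_ssd_tb: float) -> int:
    """Inferred drive count, assuming raw capacity = drives x drive size."""
    return round(raw_ssd_tb / SSD_DRIVE_TB)


for name, cfg in configs.items():
    print(name, derived_cores(cfg["ites"]), implied_ssd_drives(cfg["raw_ssd_tb"]))
```

Each step up the upgrade path doubles the ITE count, the core count and the raw SSD capacity.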

Simplified database administration

• Self-optimizing: Best data placement and access automatically selected based on usage statistics for optimal performance
• Self-healing: Failed database nodes are isolated and recovered automatically
• Self-balancing: Data access requests automatically load balanced for optimal performance
• Self-monitoring: Based on thresholds and alerts, system will monitor and automatically make changes as needed to improve performance
• Self-tuning: Memory management dynamically balances resources
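To make the self-balancing and self-healing ideas concrete, here is a minimal sketch of round-robin request routing that skips nodes marked as failed and resumes using them once recovered. It is a generic illustration under stated assumptions, not how the PureData system implements these features; the `Cluster` class and its method names are hypothetical.

```python
class Cluster:
    """Hypothetical router: round-robin over nodes, skipping failed ones."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # position of the next node to try

    def mark_failed(self, node):
        """Isolate a failed node so it receives no further requests."""
        self.healthy.discard(node)

    def mark_recovered(self, node):
        """Return a recovered node to the rotation."""
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        """Pick the next healthy node in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("unreachable: healthy set is non-empty")


cluster = Cluster(["node1", "node2", "node3"])
first_round = [cluster.route() for _ in range(3)]
cluster.mark_failed("node2")
degraded_round = [cluster.route() for _ in range(2)]
cluster.mark_recovered("node2")
recovered_round = [cluster.route() for _ in range(2)]
```

Applications keep calling `route()` throughout; the failure and recovery of `node2` changes only which nodes are returned.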

Simplified and integrated system management

• Single console to manage all resources and work running on the system

• Role-based security and tasks

• management

• monitoring

• maintenance

• Easy integration with broader enterprise monitoring tools and processes

• Consistent IBM PureSystems console

Simplified maintenance with pre-integrated fixes

• All hardware firmware and OS software patches integrated and tested together at the factory

• Can apply hardware and OS maintenance with zero downtime

• Single line of support

• Integrated stack support

Reduce risk and eliminate manual errors when applying maintenance

IBM PureData System for Operational Analytics
Optimized exclusively for operational analytic data workloads

Delivering data services for operational analytics

Speed
• Designed for 1000+ concurrent operational queries
• Continuous ingest of operational data
• MPP analytics (Massively Parallel Processing)

Simplicity
• Fast time-to-value
• Automatic workload management
• Integrated backup on the system
• Integrated management and support

Scalability
• Multiple sizes with data capacity up to a Petabyte

Smart
• In-database analytics for leading applications
• Supports DB2 applications unchanged and Oracle Database apps with minimal change
• Clients have experienced cases of 10x storage space savings via Adaptive Compression

IBM PureData System for Operational Analytics
Optimized for a mix of interactive and analytic queries

• Preset and configured for top performance, throughput, and efficient resource utilization
• Continuous ingest of operational data
• Balanced throughput and performance through dynamic self-tuning
• Policy-based automatic workload management
• Adaptive compression that delivers storage savings and improved query performance

Real-time operational analytics with continuous data ingest
Analytics are delivered to decision makers faster

[Diagram: Data Sources feed the DW through ETL (Extract, Transform, Load); up to the second loads, faster loads]

The Time-Value Curve: How does business value change through time?*
[Figure: Value plotted against Time, marking Business event, Data ready for analytics, Analytics delivered and Decision / Action taken; labels: Decision latency, Faster decision / action, Value gained]

• Complete, built-in support for continuous feed of data into the warehouse
• Parallel processing, and support for multiple connections
• Minimum impact on query performance

* Above diagram adapted from TDWI Best Practices Report - Operational Data Warehousing by Philip Russom, 4Q 2010

PureData System for Operational Analytics hardware and capacity sizes

Size          Capacity*
Extra Small   64.8 TB
Small         151.2 TB
Medium        237.6 TB
Large         324 TB
Scales to PB+ capacity*

• IBM POWER7 P740 & P730 16 Core servers @ 3.55GHz
• Blade Network Technologies 10G and 1G Ethernet switches
• Brocade SAN switches (SAN48B-5)
• IBM Storwize® V7000 with 900GB drives
• Ultra SSD I/O Drawers, each with six 387GB SSD

*Unformatted raw disk capacity

PureData System for Operational Analytics hardware

IBM PureData System for Analytics
Optimized exclusively for analytic data workloads

Delivering data services for analytics

Speed
• 10-100x faster than traditional custom systems*
• Patented MPP hardware acceleration (Massively Parallel Processing)

Simplicity
• Data load ready in hours
• No database indexes
• No tuning
• No storage administration

Scalability
• Peta-scale data capacity

Smart
• Designed to run complex analytics in minutes, not hours
• Richest set of in-database analytics

* Based on IBM customers' reported results. "Traditional custom systems" refers to systems that are not professionally pre-built, pre-tested and optimized. Individual results may vary.

Inside the IBM PureData System for Analytics

Optimized Hardware + Software
• Hardware accelerated AMPP
• Purpose-built for high performance analytics
• Requires no tuning

Snippet Blades™
• Hardware-based query acceleration with FPGAs
• Blistering fast results
• Complex analytics executed as the data streams from disk

Disk Enclosures
• User data, mirror, swap partitions
• High speed data streaming

SMP Hosts
• SQL Compiler
• Query Plan
• Optimize
• Admin
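The phrase "analytics executed as the data streams from disk" describes filtering and aggregating rows while they are being read, rather than loading everything first. The sketch below is a pure-Python analogy using generators. It is not Netezza's FPGA implementation, and the function names and sample blocks are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def read_blocks(blocks: Iterable[List[Row]]) -> Iterator[Row]:
    """Yield rows one at a time as each block arrives from storage."""
    for block in blocks:
        yield from block


def restrict(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Drop rows that fail the predicate as early as possible."""
    return (row for row in rows if predicate(row))


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keep only the columns the query needs."""
    return ({c: row[c] for c in columns} for row in rows)


def streaming_sum(blocks, predicate, column):
    """Aggregate on the fly; no intermediate row list is materialized."""
    rows = project(restrict(read_blocks(blocks), predicate), [column])
    total = 0
    for row in rows:
        total += row[column]
    return total


# Hypothetical sample data: two storage blocks of sales rows.
disk_blocks = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
    [{"region": "EU", "amount": 7}],
]
eu_total = streaming_sum(disk_blocks, lambda r: r["region"] == "EU", "amount")
print(eu_total)
```

Filtering and projecting before aggregation means later stages see less data, which is the general motivation for pushing this work close to the disk.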

Snippet-Blade™ (S-Blade) Components

[Diagram labels: IBM BladeCenter Server, Netezza DB Accelerator, Intel Quad-Core, Dual-Core FPGA, DRAM, SAS Expander Module (x2)]

PureData System for Analytics Takes Analytics Beyond Reporting

BI Reporting and Ad-Hoc Analysis
• What happened?
• When and where?
• How much?

Predictive Analytics
• What will happen?
• What will the impact be?

Optimization
• What is the best choice?

Pre-Built In-Database Analytics

Transformations
• Data Profiling / Descriptive Statistics+
• General Diagnostics
• Statistics+
• Sampling
• Data prep

Mathematical
• Basic Math*
• Permutation and Combination*
• Greatest Common Divisor and Least Common Multiple*
• Conversion of Values*
• Exponential and Logarithm*
• Gamma and Beta Functions
• Matrix Algebra+
• Area Under Curve*
• Interpolation Methods*

Statistics
• Descriptive Statistics+
• Distance Measures*
• Hypothesis Testing*
• Chi-Square & Contingency Tables*
• Univariate & Multivariate Distributions+
• Monte Carlo Simulation*

Time Series
• Autoregressive+
• Forecasting*

Data Mining
• Association Rules+
• Clustering+
• Feature Extraction+
• Discriminant Analysis*

Predictive
• Linear Regression+
• Logistic Regression+
• Classification
• Bayesian
• Sampling
• Model Testing

Geospatial
• Geospatial Data Type
• Geometric Functions
• Geometric Analysis

* Fuzzy Logix DB Lytix capabilities
+ Netezza Analytics and Fuzzy Logix DB Lytix capabilities

What’s New?
Improved Concurrency, Performance, I/O efficiency and Manageability

Better Performance; Improved Management & Efficiency
• More than half a dozen performance improvements in:
  • Optimizer efficiency
  • Memory management
  • Communications protocols
  • Workload management
• Faster, Better, and completely transparent to the end-user
• 20x greater throughput and concurrency for tactical queries than previous generations of IBM Netezza appliances [1]
• Up to 200 queries/second for micro analytic workloads
• Directed Data Processing increases throughput for tactical queries

Improved Resiliency and Fault Tolerance
• Blade level resilience for continuous high performance
• Enhanced automatic system software resilience for enterprise level requirements

1. Based on IBM internal performance benchmarking of previous version to current version.

IBM PureData System is unique
Different models pre-optimized exclusively for different data workloads, saving clients time, effort and cost to tune on their own

System for Transactions
• Pattern based database deployment in minutes, not hours [1]
• Handles more than 100 databases on 1 system [2]

System for Operational Analytics
• Continuous ingest of operational data
• Designed to handle 1000+ concurrent operational queries [3]
• Up to 10x storage savings with adaptive compression [6]

System for Analytics (powered by Netezza technology)
• 10-100x faster than traditional custom systems [4]
• 20x greater concurrency and throughput for tactical queries than previous Netezza technology [5]

Footnotes:
1. Based on IBM internal tests and system design for normal operation under expected typical workload. Individual results may vary.
2. Based on one large configuration.
3. Based on IBM internal tests of prior generation system, and on system design for normal operations under expected typical workload. Individual results may vary.
4. Based on IBM customers' reported results. "Traditional custom systems" refers to systems that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
5. Based on IBM internal performance benchmarking.
6. Based on client testing in the DB2 10 Early Access Program.