Cloud Storage: Key to the Future of IT Infrastructure
Anil Vasudeva
President & Chief Analyst [email protected]
408-268-0800
Agenda
(1) The Need for Cloud Storage
• Data Center Infrastructure: IT Roadmap to Clouds
• Cloud Computing Ecosystem & Architectural Stack
• Storage Usage in Data Centers & in Cloud Mega-Centers
• Applications Mapped by Key Workload & Storage Metrics
(2) Cloud Infrastructure Technology Enablers
• Virtualization: Key to Cloud Storage Infrastructure
• SSDs: A New SCM Filling Price/Performance Gaps
• Networks: Fast March to 100Gb/sec
• Hadoop: Meeting the Big Data/BI/Database Paradigm
(3) Drivers for Cloud Storage Adoption
• Storage Efficiencies and Cost Reduction
• Tier 4 Storage for Data Protection
• Real-Time Online Analytics
• Technology Innovations HW, SW 2005-2015
(4) Cloud Segments: Challenges & Solutions
• Adopting Public/Private/Hybrid Clouds: Pros & Cons
(5) Cloud Storage Requirements
• Best Practices: Getting Ready for the Cloud
(6) Key Take-Aways
Data Centers & Cloud Infrastructure
IT Industry Journey - Roadmap
• Standardization: standard IT infrastructure with volume economics for HW/system SW (servers, storage, networking devices; system software: OS, middleware & data management SW)
• Integration/Consolidation: integrate physical infrastructure/blades to meet CAPSIMS: Cost, Availability, Performance, Scalability, Interoperability, Manageability & Security
• Virtualization: pools resources; provisions, optimizes, monitors and shuffles resources to optimize delivery of various business services
• Automation: automatically maintains application SLAs (self-configuration, self-healing, self-accounting of charges etc.)
• Cloudization: on-premises > private clouds > public clouds; DC to cloud-aware infrastructure & apps; cascade migration to SPs/public clouds
• Analytics: BI and predictive analytics on unstructured data; from dashboard visualization to prediction engines using Big Data
Source: IMEX Research
• SaaS (Software-as-a-Service): examples include eMail (Yahoo!, Google), collaboration (Facebook, Twitter) and business apps (SalesForce, Google Apps, Intuit).
• PaaS (Platform-as-a-Service): platform tools & services; deploy developed platforms ready for application SW on cloud-aware infrastructure. Examples: Amazon EC2, Force.com, Navitaire.
• IaaS (Infrastructure-as-a-Service): infrastructure HW & services (servers, network, storage; management, reporting). Examples: Amazon S3, Nirvanix.
• Cloud Services Providers: public (multitenancy, on demand); private (on premises, enterprise); hybrid (interoperable P2P). Examples: public: BT, Telstra, T-Systems, France Telecom; private/hybrid: IBM CloudBurst.
Cloud Computing Ecosystem
[Diagram: virtualized cloud infrastructure stack. Virtualization resources (servers, storage, networks) form IaaS; operating systems and platform tools & services (.NET, Python, EJB, Ruby, PHP …) form PaaS; SaaS applications run on top, with management spanning all layers. Delivered through public cloud service providers, private cloud (enterprise) and hybrid cloud, each application with its own SLA.]
Each application's SLA dictates the resources required to meet its specific requirements for availability, performance, cost, security, manageability, etc.
Data Storage Usage – In Corporate Data Centers
[Chart: I/O access frequency vs. percent of corporate data. X-axis: % of corporate data (1%, 2%, 10%, 50%, 100%); Y-axis: % of I/O accesses (65%, 75%, 95%, 100%). Tiers: Cache/SSD (logs, journals, temp tables, hot tables); Disk arrays (tables, indices, hot data); Tape libraries (backup data, archived data, offsite DataVault).]
Source: IMEX Research - Cloud Infrastructure Report ©2009-12
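The chart's point is that a small share of corporate data receives most I/O accesses. As a minimal Python sketch of how that skew can be measured, assuming per-object access counts are available and all objects are the same size (the counts below are hypothetical, not IMEX data):

```python
def io_share_of_hottest(access_counts, data_fraction):
    """Fraction of all I/O that lands on the hottest `data_fraction` of objects.

    Assumes every object (block, extent, table) is the same size, so a share
    of objects equals a share of data.
    """
    if not 0 < data_fraction <= 1:
        raise ValueError("data_fraction must be in (0, 1]")
    ranked = sorted(access_counts, reverse=True)  # hottest objects first
    n_hot = max(1, round(len(ranked) * data_fraction))
    total = sum(ranked)
    return sum(ranked[:n_hot]) / total if total else 0.0


# Hypothetical access counts for 100 equal-sized extents:
# 2 very hot, 8 warm and 90 cold.
counts = [5000] * 2 + [400] * 8 + [10] * 90

for frac in (0.02, 0.10, 0.50):
    share = io_share_of_hottest(counts, frac)
    print(f"hottest {frac:.0%} of data -> {share:.1%} of I/O")
```

With these made-up counts the hottest 2% of extents receive roughly 71% of the I/O, which is the kind of skew that makes a small SSD tier worthwhile.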
Applications Best Suited for Cloud
• Temporary Storage – Test & Dev Data, Excess Storage from Data Centers
• Internet Web Content – Web 2.0 user-generated content, Digital Catalogs, Public Files, iPhone App Content
• Data Archiving & Records Retention – Regulatory compliance, Corporate governance, Audits & searches, Legal, Intellectual Property, Risk Management
• Data Back Up & Restore – Offsite data protection, Historical Versioning, Online Access, Automated Retrieval
• Intranet File Storage – Employee Portals, Digital Libraries, Document Collaboration, Training Media, File Shares, Historical Versioning, Online Access, Automated Retrieval
Source: IMEX Research SSD Industry Report ©2010-12
NextGen Data Storage Usage in Cloud
[Chart: I/O access frequency vs. percent of corporate data. X-axis: % of corporate data (1%, 2%, 10%, 50%, 100%); Y-axis: % of I/O accesses (65%, 75%, 95%, 100%). Tiers: SSD (logs, journals, temp tables, hot tables); FCoE/SAS arrays (tables, indices, hot data); cloud storage on SATA (backup data, archived data, offsite DataVault).]
Source: IMEX Research - Cloud Infrastructure Report ©2009-12
Market Segments by Applications
[Chart: application segments plotted by IOPS* (10 to 1000K) vs. bandwidth (1 to 500 MB/sec), with RAID 0, 3 and RAID 1, 5, 6 annotations. Segments: Data Warehousing, OLAP, Business Intelligence; Web 2.0, Audio, Video, Scientific Computing, Imaging, HPC; OLTP, eCommerce, Transaction Processing, TP HPC.]
*IOPS for a required response time (ms): $\text{IOPS} = N_{\text{channels}} \times \text{Latency}^{-1}$
Source: IMEX Research - Cloud Infrastructure Report ©2009-12
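The footnote's relation treats each channel as completing one request per latency interval, so the IOPS ceiling is the number of concurrent channels divided by the per-request latency (the same relationship Little's Law expresses between concurrency, throughput and latency). A small Python sketch; the channel counts and latencies in the example are illustrative assumptions, not measured device figures:

```python
def max_iops(channels: int, latency_ms: float) -> float:
    """Idealized IOPS ceiling: IOPS = #channels x latency^-1.

    channels:   number of I/Os that can be in flight at once.
    latency_ms: service time of one I/O in milliseconds.
    Ignores queueing, controller overhead and mixed read/write effects.
    """
    if channels <= 0 or latency_ms <= 0:
        raise ValueError("channels and latency_ms must be positive")
    return channels / (latency_ms / 1000.0)  # convert ms to seconds


# Illustrative only: 8 HDD spindles at ~5 ms vs. 32 flash channels at ~0.1 ms.
print(f"HDD array: {max_iops(8, 5.0):,.0f} IOPS")
print(f"SSD:       {max_iops(32, 0.1):,.0f} IOPS")
```

The formula leaves two levers for raising the IOPS ceiling: more channels working in parallel, or lower latency per I/O.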
Cloud Storage Requirements
Source: IMEX Research SSD Industry Report ©2011
Storage media tiers: DRAM, Flash SSD, Performance Disk, Capacity Disk, Tape
• Auto Tiered Storage: auto-tiering system using SSDs; data class (Tiers 0, 1, 2, 3); storage media type (Flash/Disk/Tape); policy engines (workload management); transparent migration (data placement); file virtualization (uninterrupted application operations during migration)
• Data Protection: replication; RAID 0, 1, 5, 6, 10; virtual tape; backup/archive/DR
• Storage Efficiency: storage virtualization; MAID; deduplication; thin provisioning
Workloads Consolidation using VZ
Source: Dan Olds & IMEX Research 2009
• A single server 1.5x larger than a standard 2-way server can handle the consolidated load of 6 servers.
• VZ manages the workloads, and important apps get the compute resources they need automatically, without operator intervention.
• Physical consolidation of 15-20:1 is easily possible.
• A reasonable goal for virtualized x86 servers is 40-50% utilization on large systems (>4-way), rising as dual/quad-core processors become available.
• Savings result in real estate, power & cooling, high availability, hardware and management.
[Chart: TCO Savings with Virtualization]
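The 6:1 consolidation and 40-50% utilization figures above are linked by simple arithmetic. A minimal Python sketch, assuming each legacy server averages about 10% busy (a hypothetical figure) and that loads add linearly with no hypervisor overhead:

```python
def consolidated_utilization(num_servers, avg_util, host_capacity):
    """Utilization of one host that absorbs `num_servers` workloads.

    avg_util:      average busy fraction (0-1) of each source server,
                   measured against a standard 2-way server.
    host_capacity: size of the consolidation host relative to that 2-way
                   server (e.g. 1.5 for "1.5x larger").
    """
    return num_servers * avg_util / host_capacity


def servers_per_host(avg_util, host_capacity, target_util):
    """Largest number of source servers that keeps the host at or below target_util."""
    # The small epsilon guards against float results such as 5.999999999.
    return int(target_util * host_capacity / avg_util + 1e-9)


# Hypothetical: six 2-way servers, each ~10% busy, moved onto a 1.5x host.
print(f"{consolidated_utilization(6, 0.10, 1.5):.0%}")  # resulting host utilization
print(servers_per_host(0.10, 1.5, 0.50))                # servers that fit at a 50% target
```

Under these assumptions the six-server case lands at 40% utilization, the low end of the stated goal; real sizing would use measured peak rather than average load.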
SSD Technology – Filling Price/Perf Gaps
[Chart: performance (I/O access latency) vs. price ($/GB) for CPU SDRAM, DRAM, SCM (NOR, NAND PCIe SSD, SATA SSD), HDD and Tape.]
• HDDs are becoming cheaper, not faster.
• DRAM is getting faster (to feed faster CPUs) and larger (to feed multi-cores and multiple VMs from virtualization).
• SSDs are segmenting into PCIe SSD cache, as a backend to DRAM, and SATA SSD, as a front end to HDD.
Source: IMEX Research SSD Industry Report ©2010-12
SSD in SAN Storage - Perf & TCO
Cost ($K) by component:
Component          HDD Only   HDD/SSD
Power & Cooling    14.2       5.2
Rack Space         75         28
SSDs               0          64
HDD SATA           0          36
HDD FC             145        0
[Chart: performance (IOPS) and $/IOPS, FC-HDD only vs. SSD/SATA-HDD. IOPS improvement: 475%; $/IOPS improvement: 800%.]
Source: IMEX Research SSD Industry Report ©2010-12
SAN Storage – Performance & TCO Improvements using SSD
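As a sketch of the TCO comparison, the Python snippet below totals the cost components read from the cost chart and computes the percentage saving. The pairing of each value with the HDD-only or HDD/SSD configuration is an assumption made while reading the flattened chart, and the component key names are illustrative:

```python
# Cost components in $K read from the chart. Pairing values to the two
# configurations is an assumption made when reconstructing the chart.
COSTS_K = {
    "HDD only": {"power_cooling": 14.2, "rack_space": 75, "ssd": 0,
                 "hdd_sata": 0, "hdd_fc": 145},
    "HDD/SSD":  {"power_cooling": 5.2, "rack_space": 28, "ssd": 64,
                 "hdd_sata": 36, "hdd_fc": 0},
}


def total_cost(config: str) -> float:
    """Sum all cost components for one configuration ($K)."""
    return sum(COSTS_K[config].values())


def savings_pct(baseline: str, candidate: str) -> float:
    """Percentage reduction in total cost when moving from baseline to candidate."""
    base, cand = total_cost(baseline), total_cost(candidate)
    return 100.0 * (base - cand) / base


for name in COSTS_K:
    print(f"{name:8s}: ${total_cost(name):.1f}K")
print(f"TCO reduction: {savings_pct('HDD only', 'HDD/SSD'):.1f}%")
```

On these readings the HDD/SSD configuration totals about $133K against about $234K for HDD only, roughly a 43% reduction; the 475% IOPS and 800% $/IOPS figures come from the second chart and are not recomputed here.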
SSD Challenges & Solutions: Goals & Best Practices
Best Practices: by leveraging error-avoidance algorithms and verification-testing best practices, keep the total functional failure rate at or below 3% (defects and wear-out issues combined).
– In practice, endurance ratings are likely to be significantly higher than typical use, so data errors and failures will be even fewer.
– Capacity over-provisioning provides large increases in random performance and endurance.
– Select SSDs based on confirmed EVT (endurance verification testing) ratings; use MLC within its endurance limits.
[Chart: % of drives failing (AFR %, 0-20%) vs. years of use (0-5), with the JEDEC SSD standard target of <=3%.]
Concerned about SSD adoption in your enterprise? Be aware of the tools and best practices, and you should be OK.
Using best-of-breed controllers to achieve <=3% AFR, together with JEDEC endurance verification testing, should allow enterprise-capable SSDs.
Source: Intel IDF’10 & IMEX Research SSD Industry Report 2011 ©IMEX 2010-12
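The <=3% AFR target can be checked against field data. AFR is commonly computed as failures per drive-year of operation; the Python sketch below does that, with a hypothetical fleet as the example:

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """AFR in percent: failures per drive-year x 100 (drive-years = drive-days / 365)."""
    drive_years = drive_days / 365.0
    if drive_years <= 0:
        raise ValueError("observation period must be positive")
    return 100.0 * failures / drive_years


def meets_afr_target(afr_pct: float, target_pct: float = 3.0) -> bool:
    """True if the observed AFR is within the <=3% goal (or another target)."""
    return afr_pct <= target_pct


# Hypothetical fleet: 1,000 SSDs observed for one year, 18 failures.
afr = annualized_failure_rate(failures=18, drive_days=1000 * 365)
print(f"AFR = {afr:.2f}%, meets target: {meets_afr_target(afr)}")
```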
I/O Forensics for Auto Storage-Tiering
Source: IBM & IMEX Research SSD Industry Report 2011 ©IMEX 2010-12
LBA Monitoring and Tiered Placement
• Every workload has a unique I/O access signature.
• Historical performance data for a LUN can identify performance skews and hot data regions by LBA.
[Diagram: storage-tiering at LBA/sub-LUN level with storage-tiered virtualization. A logical volume maps onto physical storage; hot data is placed on SSD arrays and cold data on HDD arrays, with automatic, policy-based migration between them.]
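The LBA-monitoring idea can be sketched in Python: bucket historical I/O addresses into fixed-size extents (a stand-in for sub-LUN regions), rank the extents by access count, and mark the hottest ones for the SSD tier. The extent size, SSD capacity and trace below are hypothetical, and real tiering software also weighs recency, I/O size and migration cost:

```python
from collections import Counter


def plan_tiering(io_trace_lbas, extent_size_blocks, ssd_extents):
    """Return (hot, cold) sets of extent ids from a historical I/O trace.

    io_trace_lbas:      logical block addresses seen in past I/Os.
    extent_size_blocks: blocks per extent (sub-LUN granularity).
    ssd_extents:        how many extents fit in the SSD tier.
    Extents that never appear in the trace are implicitly cold.
    """
    heat = Counter(lba // extent_size_blocks for lba in io_trace_lbas)
    ranked = [extent for extent, _ in heat.most_common()]  # hottest first
    return set(ranked[:ssd_extents]), set(ranked[ssd_extents:])


# Hypothetical trace: LBAs 0-3 (extent 0) are hit most, 1000-1001 (extent 3)
# are warm, 5000 (extent 19) is touched once; 256-block extents, 1 SSD slot.
trace = [0, 1, 2, 3, 1000, 1001, 1000, 5000]
hot, cold = plan_tiering(trace, extent_size_blocks=256, ssd_extents=1)
print("migrate to SSD:", sorted(hot))
print("keep on HDD:   ", sorted(cold))
```

This only produces a placement plan; the automatic, policy-based migration itself would be a separate step that moves the selected extents.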
AutoSmart Storage-Tiering SW: Enhancing Database Throughput
Source: IBM & IMEX Research SSD Industry Report 2011 ©IMEX 2010-12
• DB Throughput Optimization
– Every workload has a unique I/O access signature and historical behavior: identify hot "database objects" and place them in the right tier.
– Scalable throughput improvement: 300%
– Substantial I/O-bound transaction response-time improvement: 45%-75%
• Productivity (Response Time) Improvement
– Using automated reallocation of hot-spot data to SSDs (typically 5-10% of total data), significant performance improvements are achieved: a response-time reduction of around 70+%, or an IOPS increase of 200% for any I/O-intensive load.
– Verticals benefitting from online transactions: airline reservations, investment banking, Wall St. stock transactions, financial institutions, hedge funds etc., plus low-latency-seeking HPC clustered systems.
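The response-time gains follow from a weighted average: if a fraction of I/Os is served from SSD after hot data is moved, average latency is that fraction times the SSD latency plus the remainder times the HDD latency. A minimal Python sketch; the 75% hit ratio and the 0.2 ms / 8 ms latencies are assumptions for illustration, and queueing effects are ignored:

```python
def effective_latency_ms(ssd_hit_ratio: float, ssd_latency_ms: float,
                         hdd_latency_ms: float) -> float:
    """Average response time when `ssd_hit_ratio` of I/Os are served from SSD."""
    if not 0.0 <= ssd_hit_ratio <= 1.0:
        raise ValueError("ssd_hit_ratio must be between 0 and 1")
    return ssd_hit_ratio * ssd_latency_ms + (1.0 - ssd_hit_ratio) * hdd_latency_ms


def response_time_reduction_pct(ssd_hit_ratio: float, ssd_latency_ms: float,
                                hdd_latency_ms: float) -> float:
    """Percent reduction versus serving every I/O from HDD."""
    after = effective_latency_ms(ssd_hit_ratio, ssd_latency_ms, hdd_latency_ms)
    return 100.0 * (hdd_latency_ms - after) / hdd_latency_ms


# Assumed: the hot 5-10% of data attracts 75% of I/Os; 0.2 ms SSD, 8 ms HDD.
print(f"{effective_latency_ms(0.75, 0.2, 8.0):.2f} ms average")
print(f"{response_time_reduction_pct(0.75, 0.2, 8.0):.1f}% response-time reduction")
```

With these assumed inputs the reduction comes out near 73%, close to the roughly 70% figure above, but the result depends heavily on how much of the I/O the relocated data actually attracts.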
Big Data – Driving New Storage: Data Stored by Large US Enterprises
Industry                            Share   TB/firm
Discrete Manufacturing              14%     967
Government                          12%     1,312
Communications and Media            10%     1,792
Process Manufacturing               10%     831
Banking                             9%      1,931
Health Care Providers               6%      370
Securities & Investment Services    6%      3,866
Professional Services               6%      278
Retail                              5%      697
Education                           4%      319
Insurance                           4%      870
Transportation                      3%      801
Wholesale                           3%      536
Utilities                           3%      1,507
Resource Industries                 2%      825
Consumer and Recreational Services  2%      150
Construction                        1%      231
Share: stored data by industry (US, 2009, % of total). TB/firm: stored data per firm (US firms with >1K employees).
Harnessing Big Data for Business Insights
• The majority of data growth is being driven by unstructured data and billions of large objects.
• Information is at the center of a new wave of opportunity.
• 80% of the world's data is unstructured, driven by the rise in mobile devices, collaboration and machine-generated data.
[Diagram: Data Sources > Big Data Infrastructure > Business Insights]
Type of Data Organizations Analyze (% of organizations)
Data type                                 Hadoop   Non-Hadoop
Customer/member data                      59%      68%
Transactional data from applications      44%      68%
Application logs                          69%      37%
Other types of event data                 64%      23%
Network monitoring/network traffic        41%      33%
Online retail transactions                33%      34%
Other log files                           51%      26%
Call data records                         28%      32%
Web logs                                  46%      21%
Text data from social media and online    36%      15%
Search logs                               36%      11%
Trade/quote data                          18%      15%
Intelligence/defense data                 18%      11%
Multimedia (audio/video/images)           21%      9%
Weather                                   8%       3%
Smartmeter data                           3%       6%
Other                                     3%       5%
NextGen Drivers: Big Data Cloud
[Diagram: Smart Mobile Devices (instant-on boot-ups; rugged, low power; 1 GB/s, __ms); Commercial Visualization (rendering of textures & polygons; very read-intensive, small-block I/O; 10 GB/s, __ms); Bio/Health Analytics (4 GB/s, __ms); Predictive Business Intelligence (data warehousing; random I/O, high OLTPM); Entertainment On Demand (most-accessed videos; very read-intensive; 10 GB/s, __ms).]
Data: IMEX Research & Panasas
Big Data – Architectural Goals
Big Data Platform:
• Meet enterprise criteria
• Meet the requirements of the 3 Vs (volume, velocity, variety)
• Analyze data in its native format
Rise of Big Data Analytics
Demand for Real-Time Analytics
Technology Innovations: HW & DB SW, 1985-2015
• OLTP transactions DB SW: row locking, optimizer, parallel query, clustering, XML, grid, open source/Hadoop
• OLAP/analytics DB SW: indexing, partitioning, columnar, materialized views, bitmapped indexes, in-memory, query binding
• Hardware: 32-bit SMP, NUMA, 64-bit, multi-core/blades, flash, MPP
• Big Data: multi-core, columnar, in-memory, MPP, visualization
OLTP Database Innovation Progress
[Chart: $/TPMc and TPMc per processor, 1985-2015, logarithmic scales.]
Big Data: Analytics DB Technology Impact
[Chart: data warehouse size (TB) and $/GB, 1985-2015, logarithmic scales.]
Big Data – Product Segments
Big Data - Product Metrics
• Data set size: PB, TB, GB
• Data structure: transaction, machine, unstructured, other
• Access/use: transaction, search, analytics
• Parallel processing: appliance, cluster < 1K, cluster > 1K
• Memory: in-memory, flash
• DB technique: columnar, zero sharing, NoSQL
• Data cataloging SW: text, image, audio, video
Big Data Ecosystem
Big Data Stack
• Servers
• Operating system, hypervisor/VMs
• Big Data storage framework (HDFS)
• Big Data processing framework (MapReduce)
• Big Data access framework: Pig, Hive, Sqoop
• Big Data connectors
• Big Data orchestration framework: HBase, Avro, Flume, ZooKeeper
• BI applications (query, analytics, reporting, statistics)
Alongside the stack: EDW, backup & recovery, management, security, network
BI Framework - Interoperable with Enterprise Data Warehousing
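To make the processing-framework layer concrete, here is the MapReduce programming model in plain Python: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. This is an in-memory illustration of the model only, not Hadoop's Java API, and it runs on one machine rather than across an HDFS cluster:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data cloud", "Cloud storage"]))
```

In a real cluster the mappers run next to the HDFS blocks that hold their input, and the shuffle moves data across the network to the reducers.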
Cloud Types - Benefits & Risks
Private Cloud (On Premises)
• Characteristics: IT acts as a provider of services, available to internal users only; bidirectional scalability; reservationless auto-allocation of resources; pay-per-use metering mechanism; application & SW agnostic.
• Benefits: instant provisioning and agility; full physical control & security, plus user access to the infrastructure; customizable services & solutions to meet the user's specific needs; predictable, lower-cost solutions for large/midrange sites.
• Risks: demands that extra capacity be available for unforeseen demand in provisioning and management; building private cloud infrastructure is difficult and requires new expertise; new companies are now moving in as private cloud infrastructure builders vs. CSPs; requires monitoring, DR solutions, change management, testing, SLA/charge-back mechanisms etc.
Public Cloud (Remote)
• Characteristics: owned and delivered by a 3rd party; the application runs on the user side, while app services and resources are on the cloud services provider side; the cloud provider sells services to anyone; data is stored in a shared facility serving multiple tenants, which can affect the performance, BW and availability of the apps served.
• Benefits: lower costs using shared offsite storage; instant provisioning; infinite scalability; efficient data sharing, particularly when information is shared by users dispersed across multiple locations; storage costs: CAPEX >> OPEX.
• Risks: SLA managed as an outsourced contract; full understanding of performance, availability and costs required per app before migration; shared-infrastructure awareness for sensitive data despite encryption; infrastructure failures or hacks exposing sensitive data become visible in the press; the cloud provider can abruptly exit the business; lower HA than achievable in a private cloud.
Hybrid Cloud (Co-Oper)
• Characteristics: agreed-upon cooperation between on-premises infrastructure owned by the user and a cloud services provider (CSP) on the cloud side; onsite gateway appliance with an abstraction layer between the user location and the CSP's public cloud site.
• Benefits: improves performance by serving hot I/Os on premises; deduped data sent to the cloud lowers required BW and monthly networking costs; only encrypted data is sent, ensuring security for MC apps.
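The hybrid-cloud benefit of sending only deduplicated data can be sketched as a gateway that fingerprints fixed-size chunks and uploads only chunks the cloud side has not already stored. The class below is hypothetical; it uses SHA-256 from Python's standard `hashlib`, and the encryption step the slide mentions is left as a comment because it would need a separate crypto library:

```python
import hashlib


def chunks(data: bytes, size: int):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


class DedupGateway:
    """Hypothetical on-premises gateway that sends only chunks the cloud lacks."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.stored = set()  # fingerprints of chunks already in the cloud

    def upload(self, data: bytes) -> int:
        """Return the number of bytes that actually cross the WAN."""
        sent = 0
        for chunk in chunks(data, self.chunk_size):
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.stored:
                # A real gateway would encrypt the chunk here before sending it.
                self.stored.add(fingerprint)
                sent += len(chunk)
        return sent


gw = DedupGateway(chunk_size=4)
print(gw.upload(b"AAAABBBBAAAA"))  # repeated chunk "AAAA" is sent once
print(gw.upload(b"AAAABBBB"))      # everything already stored
```

Fixed-size chunking keeps the sketch short; production dedup often uses variable-size, content-defined chunks so that an insertion does not shift every later chunk boundary.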
Cloud Storage Requirements
Expect more from your storage:
• Hyper-efficient storage
• Storage that analyzes, adapts and improves performance
• Cloud-ready storage, with features you can use today
Technology & best practices enable storage efficiency:
• Real-time compression: up to 80% less space
• Auto tiering: 3x more performance with just 2% solid-state
• Storage virtualization: up to 30% more utilization
• Thin provisioning: up to 35% more utilization
Data protection/HA:
• Fileset-level snapshots and writeable file clones
• Capability for remote mirror
• File replication and file-level snapshots for business continuity and DR
Manageability:
• Unified SAN & NAS (block and file) storage system with a tightly integrated management console
• Support for NFS/CIFS/FTP/HTTPS/SCP file protocols
• Storage administration with a single user interface
• Policy-based management of files with user-defined policies
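Thin provisioning's utilization gain comes from allocating physical capacity only when data is first written. A minimal Python sketch of that idea; the class name, extent granularity and pool size are hypothetical:

```python
class ThinVolume:
    """Volume that advertises `virtual_extents` but draws physical extents
    from a shared pool only on first write to each virtual extent."""

    def __init__(self, virtual_extents: int, pool: list):
        self.virtual_extents = virtual_extents
        self.pool = pool   # free physical extent ids, shared by volumes
        self.mapping = {}  # virtual extent -> physical extent

    def write(self, vext: int) -> int:
        """Return the physical extent backing `vext`, allocating it if needed."""
        if not 0 <= vext < self.virtual_extents:
            raise IndexError("write beyond virtual size")
        if vext not in self.mapping:
            if not self.pool:
                raise RuntimeError("thin pool exhausted (over-committed)")
            self.mapping[vext] = self.pool.pop()
        return self.mapping[vext]

    def allocated(self) -> int:
        """Number of physical extents this volume actually consumes."""
        return len(self.mapping)


# Two 100-extent volumes over-committed on a 10-extent physical pool.
pool = list(range(10))
vol_a, vol_b = ThinVolume(100, pool), ThinVolume(100, pool)
vol_a.write(0)
vol_a.write(0)   # second write to the same extent allocates nothing new
vol_a.write(5)
vol_b.write(99)
print(vol_a.allocated(), vol_b.allocated(), len(pool))
```

The 200 advertised extents against 10 physical ones is the over-commitment that makes pool monitoring essential; the `RuntimeError` marks the point where a real array would alert or expand the pool.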
Cloud Storage – Issues & Solutions
Virtualization (VZ) requires shared storage for VMotion, Storage VMotion, HA/DRS and Fault Tolerance. Additional capacity is consumed by VZ snapshots, the VMkernel, etc. The following techniques reduce storage costs:
• Snapshots: ~75%
• Thin Provisioning: ~30%
• Deduplication: ~25-95%
• Auto-Tiering: 65-95%
• Thin Replication: ~95%
• RAID-DP: ~40% vs. RAID 10
• Virtual Clones: ~80%
• CAPACITY SAVINGS: ~ xx %
[Chart: storage costs reduction vs. capacity requirements under traditional data growth.]
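The RAID-DP saving relative to RAID 10 is capacity arithmetic: mirroring needs twice the usable capacity in raw disk, whereas double parity needs only two parity disks per group. A Python sketch, assuming parity groups of 12 to 16 disks (the group sizes are assumptions, not from the slide):

```python
def raw_capacity_raid10(usable_tb: float) -> float:
    """RAID 10 mirrors every block, so raw capacity is twice the usable capacity."""
    return 2.0 * usable_tb


def raw_capacity_double_parity(usable_tb: float, group_size: int) -> float:
    """Double parity (e.g. RAID-DP or RAID 6): 2 of every `group_size` disks hold parity."""
    if group_size < 3:
        raise ValueError("double parity needs at least 3 disks per group")
    return usable_tb * group_size / (group_size - 2)


def savings_vs_raid10(usable_tb: float, group_size: int) -> float:
    """Percent less raw capacity than RAID 10 for the same usable capacity."""
    r10 = raw_capacity_raid10(usable_tb)
    dp = raw_capacity_double_parity(usable_tb, group_size)
    return 100.0 * (r10 - dp) / r10


for disks in (12, 14, 16):
    print(f"{disks}-disk groups: {savings_vs_raid10(100, disks):.1f}% less raw capacity")
```

With 12- to 16-disk groups the saving works out to about 40-43%, consistent with the ~40% figure above.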
Cloud Storage Requirements
• Thin provisioning & others: simplifying and increasing capacity utilization
• Automatic avoidance of disk hotspots
• Extent-based volume management
• Efficient cache management at the disk system (controller & disk arrays)
• Optimized performance through automated data placement
• Dynamic QoS management
• Efficient storage backup
• Automated DR replication
Private Cloud - Requirements
[Chart: storage costs vs. operational flexibility, progressing from App Silos to VZ, Private Cloud and Public Cloud.]
Private Cloud Storage
• Automation: automated provisioning, moving data & processes seamlessly; allocation and self-tuning of resources to meet workload requirements
• Self-Service: access resources on demand to speed deployment and delivery; scale resources up/down to optimize their usage, and release them when not needed
• Service Catalog: choose pre-defined IT services per user/dept.; define SLAs to efficiently meet the services
• Service Analytics: monitor & analyze usage for charge-back; interactively auto-tune performance and availability to SLA requirements
Private Cloud Storage – Service Catalog by User/Workload
• Storage: initial capacity (TB); storage protocol (NFS/SCSI …)
• Availability: downtime/year, max. unplanned (minutes); copy retention time (hrs/yrs)
• Operational Recovery: RPO (recovery point); RTO (recovery time); recovery consistency; recovery assurance (%); min. # of recovery points
• Disaster Recovery: RPO (recovery point); RTO (recovery time); recovery consistency; recovery assurance (%); min. # of recovery points
• Performance: I/O response time, ms (max/avg); IOPS (min/assured); bandwidth, MB/sec (min/assured)
• Security: encryption; DLP; copy retention time (hrs/yrs); copy destruction method (quick/secure erase); authenticity guarantee
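A service catalog entry like this can be expressed as a data structure and checked against observed figures. A Python sketch under stated assumptions: the class name, the "Gold" values and the 525,600-minute year (leap years ignored) are illustrative, and only a few of the listed metrics are modeled:

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def max_downtime_minutes(availability_pct: float) -> float:
    """Maximum unplanned downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


@dataclass
class StorageServiceClass:
    name: str
    availability_pct: float  # e.g. 99.99
    rpo_minutes: float       # maximum tolerable data-loss window
    rto_minutes: float       # maximum time to restore service
    min_iops: int            # assured IOPS floor

    def is_met(self, downtime_min, rpo_min, rto_min, iops) -> bool:
        """Compare one year's observed figures against this class's targets."""
        return (downtime_min <= max_downtime_minutes(self.availability_pct)
                and rpo_min <= self.rpo_minutes
                and rto_min <= self.rto_minutes
                and iops >= self.min_iops)


# Hypothetical "Gold" class for an OLTP workload.
gold = StorageServiceClass("Gold", 99.99, rpo_minutes=15, rto_minutes=60, min_iops=20000)
print(f"{max_downtime_minutes(gold.availability_pct):.2f} min/year allowed")
print(gold.is_met(downtime_min=30, rpo_min=10, rto_min=45, iops=25000))
```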
Cloud Storage - Requirements
Enable policy-based, efficient file management:
Locally
• Identify files for backup or replication
• Move files to the right tier of the storage hierarchy, including the cloud
• Delete expired or unwanted files
Globally
• Localize data to improve file access performance and reduce network costs
• Allow multiple sites to collaborate on information exchange
• Virtualize to a single/global namespace, so multiple users can view the same files from universal locations
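The local policy steps (back up or replicate, move to the right tier, delete) can be sketched as a small rule function. Everything here is hypothetical: the `FileInfo` fields, the action names and the 30- and 90-day thresholds stand in for whatever user-defined policies a real file manager exposes:

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    days_since_access: int
    expired: bool = False  # e.g. past its retention period


def choose_action(f: FileInfo, warm_days: int = 30, cloud_days: int = 90) -> str:
    """Map one file to a storage action using simple, ordered rules."""
    if f.expired:
        return "delete"
    if f.days_since_access >= cloud_days:
        return "move_to_cloud_tier"
    if f.days_since_access >= warm_days:
        return "move_to_capacity_disk"
    return "keep_local_and_replicate"


files = [FileInfo("/proj/report.docx", 2), FileInfo("/proj/q1.csv", 45),
         FileInfo("/proj/2009.tar", 400), FileInfo("/tmp/old.log", 10, expired=True)]
for f in files:
    print(f"{f.path:20s} -> {choose_action(f)}")
```

Rule order matters: expiry is checked first so that an expired file is deleted rather than migrated.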
Collaboration in Cloud
• Local or long-distance users share a dedicated home file system, each with an individual home directory.
• Each site/system has its own file set at its local cluster.
• The data center holds all home directories and manages backups/HSM.
• Remote sites periodically prefetch (via policy) or pull data on demand.
• The data center maintains backups and an archive copy of the data.
• Each location maintains logs/records for its geography.
IT Infrastructure in the Cloud
[Diagram: Cloud IT Services Provider linking Assets/Location, Usage (Dept/User) and SLA.]
• Assets/Location: cloud services hosting; platform (OS, processors: number, speed, type); pooled infrastructure resources by application metrics; pooled capacity provisioning (processing, bandwidth, storage, repository)
• Usage (Dept/User): usage profiles; users/services/workloads; applications (OLTP/BI/HPC/data streaming); execution (rules-driven, adaptive provisioning); services abstraction
• SLA: business priorities; cost of IT ops/charge-back methods; response time/availability/throughput, QoS; transactions/sessions/events/analysis/reporting; business services managed & charged
For a copy of the case study on how a major financial institution implemented cloud infrastructure, email [email protected]
Key Takeaways
• Cloud storage is increasingly being adopted
– IT-as-a-Service is being delivered through virtualized, shared IT infrastructure, automation and service efficiencies using private and hybrid cloud infrastructure
– Savings come through storage efficiency, using automation that is being integrated
– Due diligence (careful planning, a three-year roadmap, and selection and testing of vendors/products) is required before starting migration from existing DC operations
• IT infrastructure must meet the needs of your business applications/SLAs
– Let your business needs determine your cloud strategy.
– Identify and define specific SLA metrics per user/dept/division.
– Strategize for the long term, but take an incremental approach to implementing clouds.
– Identify and screen promising cloud service providers: their enterprise-class capabilities, security and compliance adherence, change controls, and referenced high availability & performance results.
• Be aware of the technology and best practices available
– Engage the expertise of consultants and system integrators to validate your strategy and implementation plans.