A-IMEX- Solving IO Bottleneck
Transcript of A-IMEX- Solving IO Bottleneck
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
Are SSDs Ready for Enterprise Storage Systems
Anil Vasudeva, President & Chief Analyst, IMEX Research
Solving the IO Bottleneck in NextGen DataCenters &
Cloud Computing
Anil VasudevaPresident & Chief [email protected]
© 2007-11 IMEX ResearchAll Rights Reserved
Copying ProhibitedPlease write to IMEX for authorization
IMEXRESEARCH.COM
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
22
Abstract
Solving the I/O Bottleneck in NextGen Data Centers & Cloud Computing
Virtualization brings tremendous advantages to data centers of all sizes – large and small. But the rapid proliferation of virtual machines (VMs) per physical server, in its wake, creates a highly randomized I/O problem, raising performance bottlenecks. To address this I/O issue, the IT industry and storage administrators in particular, have begun the adoption of a slew of newer technology solutions - ranging from faster and larger memories in cache, SSDs for higher IOPS cost effectively, high bandwidth (10GbE) networks using NPIVs for virtualized networks between VMs and shared storage systems along with embedded intelligence software to optimize various VM workloads – all in order to meet various SLA metrics of performance, availability, cost etc..Are you aware of the side-effects that get created from Server Virtualization and prepared to ask pertinent questions of your suppliers of IT Infrastructure Equipment, Storage Virtualization and Data Storage Management software as well as in their implementation to achieve targeted results in performance, availability, scalability, interoperability and data management in their virtualized data centersThis presentation provides an illustrative view of the impact of Server Virtualization on existing storage I/O solutions and best practices. It delineates the roles, capabilities and cost effectiveness of emerging technologies in mitigating the I/O bottlenecks so the IT infrastructure implementers can achieve their targeted performance under various workloads, from their storage systems in virtualized data centers.
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
33
Agenda
• Data Centers & Cloud Infrastructure• Cloud Computing Architecture• Performance Metrics by Workload• Anatomy of Data Access• Data Center Performance Bottlenecks• Improving Query Response Time in OLTP• Role of SSD in Improving I/O Perf. Gap• SCM: A New Storage Class Memory SSDs • Price Erosion & IOPS/GB • Choosing SSD vs. Memory to Improve TPS• New Storage Usage Hierarchy in NGDC & Clouds • IO Bottleneck Mitigation in Virtualized Servers• I/O Forensics for Auto Storage-Tiering• Apps Benefitting from Improved I/O• Key Takeaways• Acknowledgements
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
44
Data Centers & Cloud Infrastructure
Enterprise VZ Data CenterOn-Premise Cloud
Home Networks
Web 2.0Social Ntwks.
Facebook,Twitter, YouTube…
Cable/DSL…Cellular
Wireless
InternetISP
CoreOptical
Edge ISP
ISPISP
ISP
ISP
Supplier/Partners
Remote/Branch Office
Public CloudCenter©
Servers VPNIaaS, PaaS SaaS
Vertical Clouds
ISP
Tier-3Data Base
ServersTier-2 Apps
ManagementDirectory Security PolicyMiddleware Platform
Switches: Layer 4-7,Layer 2, 10GbE, FC Stg
Caching, Proxy, FW, SSL, IDS, DNS,
LB, Web Servers
Application ServersHA, File/Print, ERP, SCM, CRM Servers
Database Servers, Middleware, Data Mgmt
Tier-1 Edge Apps
FC/ IPSANs
Source:: IMEX Research - Cloud Infrastructure Report ©2009-11
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
55
IT Industry’s Journey - Roadmap
CloudizationCloudizationOn-Premises > Private Clouds > Public CloudsDC to Cloud-Aware Infrast. & Apps. Cascade migration to SPs/Public Clouds.
Integrate Physical Infrast./Blades to meet CAPSIMS ®IMEX
Cost, Availability, Performance, Scalability, Inter-operability, Manageability & Security
Integration/ConsolidationIntegration/Consolidation
Standard IT Infrastructure- Volume Economics HW/Syst SW(Servers, Storage, Networking Devices, System Software (OS, MW & Data Mgmt SW)
StandardizationStandardization
VirtualizationVirtualizationPools Resources. Provisions, Optimizes, MonitorsShuffles Resources to optimize Delivery of various Business Services
Automatically Maintains Application SLAs (Self-Configuration, Self-Healing©IMEX, Self-Acctg. Charges etc)
AutomationAutomationSIVAC SIVAC
®IMEX
Source:: IMEX Research - Cloud Infrastructure Report ©2009-11
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
Platform Tools & Services
Operating Systems
Cloud ComputingPublic Cloud Service
Providers Private Cloud
Enterprise
AppApp SLA
SLA
SaaS Applications
…...Net
Python
EJB
Ruby
PHP …..PaaS
IaaS
SaaS
Virtualization
Resources (Servers, Storage, Networks)
HybridCloud
AppApp SLA
SLA AppApp SLA
SLA AppApp SLA
SLA AppApp SLA
SLA
Man
agem
ent
Cloud Computing Architecture
Application’s SLA dictates the Resources Required to meet specific requirements of Availability, Performance, Cost, Security, Manageability etc.
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
*IOPS for a required response time ( ms)*=(#Channels*Latency-1)
(RAID - 0, 3)
500100MB/sec
101 505
Data Warehousing
OLAP
BusinessIntelligence(RAID - 1, 5, 6)
IOPS
* (*L
aten
cy-1
)
Web 2.0Web 2.0AudioAudio
VideoVideo
Scientific ComputingScientific Computing
ImagingImagingHPCHPC
TPHPCHPC
Performance Metrics by Workload
10K
100 K
1K
100
10
1000 K OLTP
eCommerceeCommerceTransaction Transaction ProcessingProcessing
Source:: IMEX Research - Cloud Infrastructure Report ©2009-11
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
Anatomy of Data Access
Disk I/O
Server
Processor
Perf
orm
ance
1980 1990 2000 2010
A 7.2K/15k rpm HDD can do 100/140 IOPS
For the time it takes to do each Disk Operation:- Millions of CPU Operations can be done- Hundreds of Thousands of Memory Operations can be accomplished
I/O Gap
Anatomy of Data Access
Time taken by CPU, Memory, Network, Disk
for a typical I/O Operation during a
Data Access
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
Data Center Performance Bottlenecks
ServersWeb Servers Application Servers Database Servers
ClientsWindows, Linux, Unix
StorageWeb, Application, Database
LAN Access Networks
Storage I/O Access
Device BottlenecksDevice I/O Hotspots Cache Flush Lack of Storage Capacity
Server BottlenecksLack of Srvr Power IO Wait & Queuing CPU Overhead I/O Timeouts
User BottlenecksConnectivity Timeouts, Workload Surges
Storage I/O ConnectLack of Bandwidth Overloaded PCIe Connect Storage Device Contention
ApplicationsExcessive Locking Data Contention I/O Delays/Errors
Network I/ONetwork Congestion Dropped packets Data Retransmissions Timeouts Component Failures
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
1010
• Improving Query Response Time• Cost effective way to improve Query response time for a given number
of users or servicing an increased number of users at a given response time is best served with use of SSDs or Hybrid (SSD + HDDs) approach, particularly for Database and Online Transaction Applications
HDDs14 Drives
HDDs 112 Drives
w short stroking SSDs
12 Drives$$$$$$$$
$$
$$$
IOPS (or Number of Concurrent Users)
Que
ry R
espo
nse
Tim
e (m
s)
0
8
2
4
6
0 20,000 40,00010,000 30,000
Conceptual Only -Not to Scale
Improving Query Response Time in OLTP
Hybrid HDDs
36 Drives + SSDs$$$$
Source: IMEX Research SSD Industry Report ©2011
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
1111
Role of SSD in Improving I/O Perf. Gap
Performance I/O Access Latency
Price$/GB
HDD
Tape
DRAM
CPUSDRAM
HDD becoming Cheaper, not faster
DRAM getting Faster (to feed faster CPUs) & Larger (to feed Multi-cores & Multi-VMs from Virtualization)
SCM
NOR
NANDPCIe SSD
SATASSD
SSD segmenting into PCIe SSD Cache- as backend to DRAM &SATA SSD- as front end to HDD
Source: IMEX Research SSD Industry Report ©2011
Servers
Hybr. Storage
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
1212
SCM: A New Storage Class Memory
• SCM (Storage Class Memory)Solid State Memory filling the gap between DRAMs & HDDsMarketplace segmenting SCMs into SATA and PCIe based SSDs
• Key Metrics Required of Storage Class MemoriesDevice - Capacity (GB), Cost ($/GB),
• Performance - Latency (Random/Block RW Access-ms); Bandwidth BW(R/W- GB/sec)
• Data Integrity - BER (Better than 1 in 10^17)• Reliability - Write Endurance (30K PE Cycles No. of writes before
death);• - Data Retention (5 Years); MTBF (2 millions of Hrs),• Environment - Power Consumption (Watts); • Volumetric Density (TB/cu.in.); Power On/Off Time
(sec),• Resistance - Shock/Vibration (g-force); Temp./Voltage Extremes
4-Corner (oC,V); Radiation (Rad)
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
SSDs - Price Erosion & IOPS/GB
Note: 2U storage rack, • 2.5” HDD max cap = 400GB / 24 HDDs, de-stroked to 20%, • 2.5” SSD max cap = 800GB / 36 SSDs
Source: IMEX Research SSD Industry Report ©2011
0
300
600
2009 2010 2011 2012 2013 2014
Uni
ts (M
illio
ns)
0
4
8
IOPS
/GB
Enterprise HDD Units (M)
SSD Units (M)
IOPS/GB SSDs
IOPS/GB HDDs
HDD
SSD
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
2500
2000
1500
1000
500
0
TPS
Buffer Pool (Memory) GB0 2 4 6 8 10 12 14 16 18 20
SSD PCIe
SSD SATA
HDD RAID 10
2.5 x4.0 x
2.5 x
4.0 x
Choosing SSD vs. Memory to Improve TPS
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
PerformanceData Protection
80% of IOPsScaleCost
Data Reduction
80% of TB
Dat
a A
cces
s
Age of Data1 Day 2 Mo.1 Month 3 Mo. 1 Year1 Week 2 Yrs6 Mo.
Stor
age
Gro
wth
SSDs
Data Storage Usage Patterns –Data Access vs. Age of Data
Source:: IMEX Research - Cloud Infrastructure Report ©2009-11
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
1616
I/O Access Frequency vs. Percent of Corporate Data
SSD• Logs• Journals• Temp Tables• Hot Tables
SSD• Logs• Journals• Temp Tables• Hot Tables
FCoE/ SAS
Arrays• Tables• Indices
• Hot Data
FCoE/ SAS
Arrays• Tables• Indices
• Hot Data
Cloud Storage
SATA• Back Up Data• Archived Data
• Offsite DataVault
Cloud Storage
SATA• Back Up Data• Archived Data
• Offsite DataVault
2% 10% 50% 100%1%% of Corporate Data
65%
75%
95%
% o
f I/O
Acc
esse
s
New Storage Hierarchy in NGDC & Clouds
Source:: IMEX Research - Cloud Infrastructure Report ©2009-11
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
IO Bottleneck Mitigation in Virtualized Servers
Primary Storage
VM Client 1
I/O Reg.
XCLDriver
Disk Controller. SSD
Driver
VM Client 2
I/O Reg.
XCLDriver
VM Client n
I/O Reg.
XCLDriver
XCLVLUN
vSphere ESX
SSD Drive w/ESX Driver
ESX Kernel
XCLMgr.
Offloading IOPS from Primary Storage
Both Applications & Storage Run Faster
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
1818
I/O Forensics for Auto Storage-Tiering
LBA Monitoring and Tiered Placement• Every workload has unique I/O access signature• Historical performance data for a LUN can identify
performance skews & hot data regions by LBAs
Source: IBM & IMEX Research SSD Industry Report 2011 ©IMEX 2010-11
Storage-Tiering at LBA/Sub-LUN Level
Storage-Tiered Virtualization
Physical Storage Logical Volume
SSDs Arrays
HDDs Arrays
Hot Data
Cold Data
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
1919
Dat
a: IM
EX R
esea
rch
& Pa
nasa
s
Smart Mobile Devices
Commercial Visualization
Bioinformatics& Diagnostics
Decision Support Bus. Intelligence
Entertainment-VoD / U-Tube
Instant On Boot UpsRugged, Low Power
1GB/s, __ms
Rendering (Texture & Polygons)Very Read Intensive, Small Block I/O
10 GB/s, __ms 4 GB/s, __ms
Data WarehousingRandom IO, High OLTPM
Most Accessed VideosVery Read Intensive
1GB/s, __ms
Apps Benefitting from Improved I/O
IMEXRESEARCH.COM
© 2010‐11 IMEX Research, Copying prohibited. All rights reserved.
Key Takeaways
• Solving I/O Problems• I/O Bottlenecks occur at multiple places in the Compute Stack, the largest
being at Storage I/O• SSD comes out cheaper/IOP for IO Intensive Apps• To get of Reads – Improve Indexing, archive out old data• Minimize the impact of writes – Get rid of temp tables/filesorts on slow
disks.• Compress big varchar/text/blobs
• Data Forensics and Tiered Placement• Every workload has unique I/O access signature• Historical performance data for a LUN can identify performance skews &
hot data regions by LBAs• Use Smart Tiering to identify hot LBA regions and non-disruptively migrate
hot data from HDD to SSDs.• Typically 4-8% of data becomes a candidate and when migrated to SSDs
can provide response time reduction of ~65% at peak loads