IBM Systems and Technology Group Technical Symposium Melbourne, Australia || August 20 – 23, 2013...
IBM Systems and Technology Group Technical Symposium
Melbourne, Australia || August 20 – 23, 2013
sBC06
Architect’s 2013 Guide to Designing Integrated Multi-Product HA/DR/BC
John Sing
John Sing: 31 years of experience with IBM in high-end servers, storage, and software
– 2009 - Present: IBM Executive Strategy Consultant: IT Strategy and Planning, Enterprise Large Scale Storage, Internet Scale Workloads and Data Center Design, Big Data Analytics, HA/DR/BC
– 2002-2008: IBM IT Data Center Strategy, Large Scale Systems, Business Continuity, HA/DR/BC, IBM Storage
– 1998-2001: IBM Storage Subsystems Group - Enterprise Storage Server Marketing Manager, Planner for ESS Copy Services (FlashCopy, PPRC, XRC, Metro Mirror, Global Mirror)
– 1994-1998: IBM Hong Kong, IBM China Marketing Specialist for High-End Storage
– 1989-1994: IBM USA Systems Center Specialist for High-End S/390 processors
– 1982-1989: IBM USA Marketing Specialist for S/370, S/390 customers (including VSE and VSE/ESA)
IBM colleagues may access my intranet webpage:
– http://snjgsa.ibm.com/~singj/
You may follow my daily IT research blog:
– http://www.delicious.com/atsf_arizona
You may follow me on Slideshare.net:
– http://www.slideshare.net/johnsing1
My LinkedIn:
– http://www.linkedin.com/in/johnsing
© 2013 IBM Corporation
IBMTECHU.COM
IBM Technical Symposium web portal:
http://www.ibmtechu.com/au
download password: au2013
KEY FEATURES...
– Create a personal agenda using the agenda planner
– View the agenda and agenda changes
– Use the agenda search to find the sessions and/or
– Download presentations
– Submit Session and Conference Evaluations
Win prizes by submitting evaluations online. The more evaluations submitted, the greater the chance of winning.
Today’s Goals
Understand today’s challenges and best practices
– for IT High Availability and IT Business Continuity
What has changed? What is the same?
– Traditional IT
– Internet-scale Design for Fail IT
Strategies for:
– Requirements, design, implementation
Step by step approach
– Essential role of automation
– Accommodating petabyte scale
– Exploiting Cloud
[Figure: 2013 Cloud deployment options]
Agenda
1. Solving Today’s HA-DR-BC Challenges
2. Guiding HA-DR-BC Principles to mitigate chaos
3. Traditional Workloads vs. Internet Scale Workloads
4. Master Vision and Best Practices Methodology
Recovering today’s real-time massive streaming workflows is challenging
Chart in public domain: IEEE Massive File Storage presentation, author: Bill Kramer, NCSA: http://storageconference.org/2010/Presentations/MSST/1.Kramer.pdf
Today’s Data and Data Recovery Conundrum:
Many options, including many non-traditional alternatives for user deployments, workload hosting, and recovery models
Traditional alternatives:
– Other platforms
– Other vendors
Non-traditional alternatives:
– The Cloud, the Developing World
Illustrative Cloud examples only. No endorsement is implied or expressed.
Interdisciplinary
Finally, we have this ‘little’ problem regarding Mobile proliferation
From an IT standpoint, we are clearly seeing “consumerization of IT”
Key is to recognize and exploit the hyper-pace reality of BYOD’s associated data
Not just the technology
Also the recovery model (“cloud”), the business model, and the required ecosystem
Clayton Christensen, Harvard Business School
http://en.wikipedia.org/wiki/Disruptive_innovation
So how do we affordably architect HA / BC / DR in 2013?
What has remained the same?
Data Protection, Service Management, Storage Efficiency
(Continued good Guiding Principles that mitigate HA/DR/BC chaos)
The Business Process is still the Recoverable Unit
[Diagram: Infrastructure, Application, and Business layers, showing db2, SQL, MQseries, WebSphere, Application 1, Application 2, Application 3, Analytics report, management reports, decision point, http://xyz.xml, and Business processes A through G]
1. An error occurs on a storage device that correspondingly corrupts a database
2. The error impacts the ability of two or more applications to share critical data
3. The loss of both applications affects two distinctly different business processes
IT Business Continuity must recover at the business process level
Cloud does not change business process; still the recovery unit
[Diagram: Infrastructure, Application, and Business layers, showing db2, SQL, WebSphere, Application 1, Application 2, Application 3, Analytics report, management reports, decision point, http://xyz.xml, and Business processes A through G, with a STOP sign marking the outage]
1. Data input to the cloud
2. Cloud provider outage
3. The loss of Cloud output affects two distinctly different business processes
Cloud is simply another deployment option
But it doesn’t change the fundamental HA/BC approach
When can Cloud recovery provide extremely fast time to project completion?
Where entire business process recoverable units can be out-sourced to a Cloud provider
– Production example: out-sourcing production, or backup/restore, or an integrated, standalone application to a provider
– Cloud application-as-a-service (AaaS) example: Salesforce.com, etc.
[Diagram: Technical, Application, and Business layers, showing db2, SQL, MQseries, WebSphere, Application 1, Application 2, Application 3, Analytics report, management reports, decision point, http://xyz.xml, and Business processes A through G]
The trick to leveraging Cloud is:
Understanding that Cloud is simply another
(albeit powerful) deployment choice
Good news:
Fundamental principles for HA/DR/BC haven’t changed
It’s only the deployment options that have changed
Still true: synergistic overlap of valid data protection techniques
IT Data Protection:
1. High Availability: fault-tolerant, failure-resistant streamlined infrastructure with affordable cost foundation
2. Continuous Operations: non-disruptive backups and system maintenance coupled with continuous availability of applications
3. Disaster Recovery: protection against unplanned outages such as disasters through reliable, predictable recovery
Outcomes:
– Protection of critical business data
– Operations continue after a disaster
– Costs are predictable and manageable
– Recovery is predictable and reliable
Four Stages of Data Center Efficiency: (pre-req’s for HA/BC/DR)
http://public.dhe.ibm.com/common/ssi/ecm/en/rlw03007usen/RLW03007USEN.PDF http://www-935.ibm.com/services/us/igs/smarterdatacenter.html
April 2012
Still true: Timeline of an IT Recovery
[Timeline diagram, left to right:
– Production, then Outage!
– Recovery Point Objective (RPO): how much data must be recreated? Assess RPO.
– Operations Staff and Network Staff execute hardware, operating system, and data integrity recovery (Management Control, Physical Facilities, Telecom Network, Operating System, Data). This span is the Recovery Time Objective (RTO) of hardware data integrity. Done?
– Applications Staff perform application transaction integrity recovery (Applications). This span is the Recovery Time Objective (RTO) of transaction integrity. Now we're done!]
Telecom bandwidth is still the major limiting factor for any fast recovery
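To make the bandwidth point concrete, the following is a minimal arithmetic sketch, not something taken from the slides. The function name `transfer_hours`, the decimal-terabyte convention, and the default 70% link efficiency are all illustrative assumptions.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move `data_tb` terabytes over a WAN link.

    Assumptions (hypothetical, for illustration only):
    - terabytes are decimal (1 TB = 1e12 bytes)
    - `efficiency` is the fraction of nominal bandwidth that is usable
    """
    if data_tb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("need data_tb >= 0, link_mbps > 0 and 0 < efficiency <= 1")
    total_bits = data_tb * 1e12 * 8                      # terabytes -> bits
    usable_bits_per_sec = link_mbps * 1e6 * efficiency   # Mbit/s -> usable bit/s
    return total_bits / usable_bits_per_sec / 3600       # seconds -> hours


if __name__ == "__main__":
    # Same data volume, three hypothetical link speeds.
    for link in (100, 1_000, 10_000):
        print(f"10 TB over {link} Mbit/s: {transfer_hours(10, link):.1f} hours")
```

Comparing the result with the target RTO shows quickly whether restoring over the network is feasible at all, or whether the data has to be at the recovery site before the outage.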
Still true: value of Automation for real-time failover
[Timeline diagram, same stages as the IT recovery timeline but compressed: Production, Outage!, Recovery Point Objective (RPO): how much data must be recreated?, Assess RPO, RTO H/W (Operations Staff, Network Staff: Management Control, Physical Facilities, Telecom Network, Operating System, Data), RTO trans. integrity (Applications Staff: Applications, Trans. Recov.), Now we're done!]
Value of automation:
• Reliability
• Repeatability
• Scalability
• Frequent Testing
Still true: Organize High Availability, Business Continuity Technologies
Balancing recovery time objective with cost / value
[Chart: Cost / Value plotted against Recovery Time Objective (guidelines only): 15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days. Regions marked: Recovery from a disk image; Recovery from tape copy.]
BC Tier 7 – Add Server or Storage replication with end-to-end automated server recovery
BC Tier 6 – Add real-time continuous data replication, server or storage
BC Tier 5 – Add Application/database integration to Backup/Restore
BC Tier 4 – Add Point in Time replication to Backup/Restore
BC Tier 3 – VTL, Data De-Dup, Remote vault
BC Tier 2 – Tape libraries + Automation
BC Tier 1 – Restore from Tape
Still true: Replication Technology Drives RPO
For example:
[Chart: Recovery Point axis (Wks, Days, Hrs, Mins, Secs) and Recovery Time axis (Secs, Mins, Hrs, Days, Wks), positioning Tape Backup, Periodic Replication, Asynchronous replication, and Synchronous replication / HA]
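The link between replication technology and RPO can be sketched as a small lookup. This is a simplified illustration under stated assumptions, not a formula from the presentation: the function `worst_case_rpo` and its technology names are hypothetical, synchronous replication is treated as losing no committed data, asynchronous loss is bounded by the replication lag, and periodic or tape copies can lose up to one full interval.

```python
from datetime import timedelta


def worst_case_rpo(technology: str,
                   interval: timedelta = timedelta(0),
                   lag: timedelta = timedelta(0)) -> timedelta:
    """Return a rough worst-case recovery point for a replication technology.

    Simplifying assumptions (illustrative only):
    - "synchronous": committed writes exist at both sites, so no data loss
    - "asynchronous": loss is bounded by the current replication `lag`
    - "periodic" / "tape_backup": loss is up to one full copy `interval`
    """
    tech = technology.lower()
    if tech == "synchronous":
        return timedelta(0)
    if tech == "asynchronous":
        return lag
    if tech in ("periodic", "tape_backup"):
        return interval
    raise ValueError(f"unknown technology: {technology!r}")


if __name__ == "__main__":
    print(worst_case_rpo("tape_backup", interval=timedelta(hours=24)))
    print(worst_case_rpo("periodic", interval=timedelta(hours=4)))
    print(worst_case_rpo("asynchronous", lag=timedelta(seconds=30)))
    print(worst_case_rpo("synchronous"))
```

The interval and lag figures in the usage lines are made-up inputs; real values depend on the product, the link, and the write rate.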
Still true: Recovery Automation Drives Recovery Time
Recovery Time includes:
– Fault detection
– Recovering data
– Bringing applications back online
– Network access
For example:
[Chart: Recovery Point axis (Wks, Days, Hrs, Mins, Secs) and Recovery Time axis (Secs, Mins, Hrs, Days, Wks), positioning Manual Tape Restore, Storage automation, and End to end automated clustering]
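The four components of recovery time listed above can be added up and compared with a target RTO. The sketch below assumes the components run one after another, which is a simplification. The class `RecoveryPlan` and every duration in the usage lines are hypothetical, not measurements from the presentation.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryPlan:
    """Elapsed time of each recovery stage, assumed to run sequentially."""
    fault_detection: timedelta
    data_recovery: timedelta
    application_restart: timedelta
    network_access: timedelta

    def recovery_time(self) -> timedelta:
        # Total recovery time is the sum of the four stages.
        return (self.fault_detection + self.data_recovery
                + self.application_restart + self.network_access)

    def meets(self, rto: timedelta) -> bool:
        # True when the plan finishes within the recovery time objective.
        return self.recovery_time() <= rto


if __name__ == "__main__":
    # Made-up figures contrasting a manual and an automated plan.
    manual = RecoveryPlan(timedelta(hours=1), timedelta(hours=20),
                          timedelta(hours=4), timedelta(hours=2))
    automated = RecoveryPlan(timedelta(minutes=1), timedelta(minutes=5),
                             timedelta(minutes=10), timedelta(minutes=2))
    rto = timedelta(hours=4)
    for name, plan in (("manual", manual), ("automated", automated)):
        print(name, plan.recovery_time(), "meets RTO:", plan.meets(rto))
```

Writing the stages down separately makes it visible that shortening only data recovery does not help if fault detection or application restart is still manual.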
Still true: “ideal world” construct for IT High Availability and Business Continuity
[Diagram: continuity program cycle with phases Business Prioritization, Integration into IT, Strategy Design, and Manage. Elements shown: risk assessment (Risks, Vulnerabilities and Threats); business impact analysis (Impacts of Outage); RTO/RPO; program assessment (Current Capability); Maturity Model, Measure ROI, Roadmap for Program; Program Design; Implement; program validation (Estimated Recovery Time); Resilience Program Management (Awareness, Regular Validation, Change Management, Quarterly Management Briefings); crisis team, business resumption, disaster recovery, high availability; Database and Software design, High Availability Servers, Storage, Data Replication, High Availability design]
1. People 2. Processes 3. Plans 4. Strategies 5. Networks 6. Platforms 7. Facilities
Business processes drive strategies and they are integral to the Continuity of Business Operations. A company cannot be resilient without having strategies for alternate workspace, staff members, call centers and communications channels.
Source: IBM STG, IBM Global Services
The 2013 Bottom line: (IT Business Continuity Planning Steps)
For today’s real world environment...
[Diagram: the “ideal world” continuity program cycle (Business Prioritization, Integration into IT, Strategy Design, Manage), repeated from the previous chart]
i.e. how to streamline this “ideal” process?
1. Collect information for prioritization
2. Vulnerability, risk assessment, scope
3. Define BC targets based on scope
4. Solution option design and evaluation
5. Recommend solutions and products
6. Recommend strategy and roadmap
2013 key #1: need a basic Data Strategy
2013 key #2: Workload type
Need faster way than even this simplified 2007 version:
Streamlined BC Actions, with Input and Output (2005 version)
1. Collect info for prioritization
– Input: Business processes, Key Perf. Indicators, IT inventory
– Output: Scope, Resource Business Impact, Component effect on business processes
2. Vulnerability / Risk Assessment
– Input: List of vulnerabilities
– Output: Defined vulnerabilities
3. Define desired HA/BC targets based on scope
– Input: Existing BC capability, KPIs, targets, and success rate
– Output: Defined BC baseline targets, architecture, decision and success criteria
4. Solution design and evaluation
– Input: Technologies and solution options
– Output: Business process segments and solutions
5. Recommend solutions and products
– Input: Generic solutions that meet criteria
– Output: Recommended IBM Solutions and benefits
6. Recommend strategy and roadmap
– Input: Budget, major project milestones, resource availability, business process priority
– Output: Baseline Bus. Cont. strategy, roadmap, benefits, challenges, financial implications and justification
Streamlined BC Actions, with Input and Output (2013 version)
The six actions, with their inputs and outputs, are unchanged from the 2005 version. Two callouts are added for 2013:
– Do basic HA/DR Data Strategy
– Exploit Workload Type
How do we get there in 2013?
Bottom line #1: have a basic Data Strategy
Bottom line #2: Exploit Workload type
Data Protection, Service Management, Storage Efficiency
i.e. #1: It’s all about the Data
Now, what do I mean by that?
What is a basic Data Strategy? Specify data usage over its lifespan
[Chart: Frequency of Access and Use plotted against Time. Stages shown: Applications create data; Information and data Management; Information Archive / Retain / Delete]
Data strategy = collecting information, prioritizing, vulnerability/risk, scope
[Diagram: the “ideal world” continuity program cycle (Business Prioritization, Integration into IT, Strategy Design, Manage), with a Data Strategy callout]
Business processes drive strategies and they are integral to the Continuity of Business Operations. A company cannot be resilient without having strategies for alternate workspace, staff members, call centers and communications channels.
Source: IBM STG, IBM Global Services
Data Strategy: relationship to Business, IT Strategies
[Diagram: Business Strategy (Business Scope, Distinct Competencies, Business Governance) and IT Strategy (Technology Scope, System Competencies, IT Governance), above Organization, Infrastructure, Process (Process, Skills, Tools) and IT Infrastructure and processes (IT Infrastructure, Processes, Skills)]
[Diagram: Data Strategy Defined. Layers shown: Business Strategies, IT Strategy, Data Strategy, Enterprise IT Architecture, IT Infrastructure. Dimensions shown: People, Process, Structure, Data, Technology]
The role of the basic “Data Strategy” for HA / BC purposes
You have to know your data, and have a basic “good enough” strategy for it
[Diagram: Data Strategy Defined. Layers shown: Business Strategies, IT Strategy, Data Strategy, Enterprise IT Architecture, IT Infrastructure. Dimensions shown: People, Process, Structure, Data, Technology]
Define major data types “good enough”
– i.e. by major application, by business line...
– An ongoing journey
For each data type:
– Usage
– Performance and measurement
– Security
– Availability
– Criticality
– Organizational role
– Who manages
– What standards for this data
• What type of storage it is deployed on
• What database
• What virtualization
Be pragmatic
– Create a basic, “good enough” data strategy for HA/BC purposes
Acquire tools that help you know your data
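One pragmatic way to record a “good enough” data strategy is a simple profile per data type whose fields mirror the attributes listed for each data type. The following is only a sketch: the class `DataTypeProfile`, its field names, and the helper `missing_fields` are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataTypeProfile:
    """One record per major data type; None means not yet documented."""
    name: str
    usage: Optional[str] = None
    performance: Optional[str] = None
    security: Optional[str] = None
    availability: Optional[str] = None
    criticality: Optional[str] = None
    organizational_role: Optional[str] = None
    managed_by: Optional[str] = None
    storage_type: Optional[str] = None
    database: Optional[str] = None
    virtualization: Optional[str] = None


def missing_fields(profile: DataTypeProfile) -> List[str]:
    """List the attributes that still need to be filled in for this data type."""
    return [f.name for f in fields(profile) if getattr(profile, f.name) is None]


if __name__ == "__main__":
    # Invented example: an order-entry data type that is only partly documented.
    orders = DataTypeProfile(name="order entry", usage="online transactions",
                             criticality="high", database="db2")
    print("still to document:", missing_fields(orders))
```

Starting with a handful of profiles by major application or business line, and filling gaps over time, matches the “ongoing journey” framing rather than requiring a complete inventory up front.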
Many choices for cloud high availability, replication architectures
[Diagram: a Production Site and Other Site(s), each with Site Load Balancer, Web Server Clusters, Application / DB Server Clusters, and Server Clusters, plus Disk. The sites are connected by Geographic Load Balancers and a Workload balancer. Replication choices shown: Application or database Replication, Server Replication, Storage Replication. Also shown: Local backup; PIT Image, Tape B/U]
Today there are two major types of IT workloads: Transactional IT and Internet Scale Workloads
For Transactional IT:
– Cloud, High Availability, Resiliency, Disaster Recovery characteristics: can be done “agnostic / after the fact” using replication
– Data Strategy: use traditional tools/concepts to understand / know data; storage/server virtualization and pooling
– Automation: end to end automation of server / storage virtualization
– Commonality: apply master vision and lessons learned from internet scale data centers
Therefore, there are two major types of IT HA/DR/BC approaches, depending on workload type:
Cloud, High Availability, Resiliency, Disaster Recovery characteristics
– Transactional IT: can be designed “agnostic / after the fact” using server or storage virtualization, replication
– Internet Scale Workloads: must be “designed into software stack from the beginning”
Data Strategy
– Transactional IT: use traditional tools/concepts to understand / know data; storage/server virtualization and pooling
– Internet Scale Workloads: proven Open Source toolset to implement failure tolerance and redundancy in the application stack
Automation
– Transactional IT: end to end automation of server / storage virtualization and replication
– Internet Scale Workloads: end to end automation of the application software stack providing failure tolerance
Commonality
– Both: apply master vision and lessons learned from internet scale data centers
Principles for Internet Scale Workloads
Two different Cloud types
Source: http://it20.info/2012/02/the-cloud-magic-rectangle-tm/
[Figure labels: Transactional IT, Internet scale workloads]
Today’s two major IT workload types
Source: http://it20.info/2012/02/the-cloud-magic-rectangle-tm/
[Figure labels: Transactional IT, Internet scale workloads]
IT architecture at internet scale
Internet scale architectures fundamental assumptions:
– Distributed aggregation of data
– High Availability, failure tolerance functionality is in software on the server
– Time to Market is everything
• Breakage = “OK” if I can insulate that from user
– Affordability is everything
– Use open source software wherever possible
– Expect that something somewhere in infrastructure will always be broken
– Infrastructure is designed top-to-bottom to address this
All other criteria are driven off of these
Criteria:
– Cost
– Extreme: Scale, Parallelism, Performance, Real time, Time to Market
Internet Scale Workload Characteristics - 1
Embarrassingly parallel Internet workload
– Immense data sets, but relatively independent records being processed
• Example: billions of web pages, billions of log / cookie / click entries
– Web requests from different users essentially independent of each other
• Creating natural units of data partitioning and concurrency
• Lends itself well to cluster-level scheduling / load-balancing
– Independence = peak server performance not important
– What’s important is aggregate throughput of 100,000s of servers, i.e. very low inter-process communication
Workload Churn
– Well-defined, stable high level API’s (i.e. simple URLs)
– Software release cycles on the order of every couple of weeks
• Means Google’s entire core of search services rewritten in 2 years
– Great for rapid innovation
• Expect significant software re-writes to fix problems on an ongoing basis
– New products hyper-frequently emerge
• Often with workload-altering characteristics, example = YouTube
*The Data Center as a Computer: Introduction to Warehouse Scale Computing, p.81 Barroso, Holzle
http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006
Internet scale Workload presentation by John Sing: http://www.slideshare.net/johnsing1/s-bd03-infinitybeyond2internetscaleworkloadsdatacenterdesignv6speaker
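Because requests from different users are essentially independent, records can be split by user into partitions that are processed with no coordination between them. The sketch below illustrates that idea with hash partitioning. It is a generic technique chosen for illustration, not a description of any particular provider's implementation, and the names `partition_for` and `partition_records` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key (for example a user id) to a stable partition number."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_records(records: Iterable[Tuple[str, str]],
                      num_partitions: int) -> Dict[int, List[Tuple[str, str]]]:
    """Group (user_id, payload) records so each partition can run on its own server."""
    buckets: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for user_id, payload in records:
        buckets[partition_for(user_id, num_partitions)].append((user_id, payload))
    return dict(buckets)


if __name__ == "__main__":
    # Invented click-log entries; each partition could be handled by a different worker.
    clicks = [("alice", "/home"), ("bob", "/cart"), ("alice", "/pay"), ("carol", "/home")]
    for partition, items in sorted(partition_records(clicks, 4).items()):
        print(partition, items)
```

All records for one user land in the same partition, so no inter-process communication is needed while processing them, which is the property the slide ties to aggregate throughput.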
Internet Scale Workload Characteristics - 2
Platform Homogeneity
– Single company owns, has technical capability, runs entire platform end-to-end including an ecosystem
– Most Web applications more homogeneous than traditional IT
– With immense number of independent worldwide users
1% - 2% of all Internet requests fail*
– Users can’t tell difference between Internet down and your system down
– Hence 99% good enough
Fault-free operation via application middleware
– Some type of failure every few hours, including software bugs
– All hidden from users by fault-tolerant middleware
– Means hardware, software doesn’t have to be perfect
Immense scale:
– Workload can’t be held within 1 server, or within max size tightly-clustered memory-shared SMP
– Requires clusters of 1000s, 10000s of servers with corresponding PBs storage, network, power, cooling, software
– Scale of compute power also makes possible apps such as Google Maps, Google Translate, Amazon Web Services EC2, Facebook, etc.
*The Data Center as a Computer: Introduction to Warehouse Scale Computing, p.81 Barroso, Holzle
http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006
Internet scale Workload presentation by John Sing: http://www.slideshare.net/johnsing1/s-bd03-infinitybeyond2internetscaleworkloadsdatacenterdesignv6speaker
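The idea that fault-tolerant middleware hides failures from users can be illustrated with a small failover wrapper. This is a minimal sketch under loud assumptions: replicas are modeled as plain Python callables, failures are treated as independent, and the names `call_with_failover` and `combined_failure_rate` are hypothetical rather than taken from any real middleware.

```python
from typing import Callable, Optional, Sequence


def combined_failure_rate(per_attempt_failure: float, attempts: int) -> float:
    """Probability that every attempt fails, assuming independent failures."""
    if not (0 <= per_attempt_failure <= 1) or attempts < 1:
        raise ValueError("need 0 <= per_attempt_failure <= 1 and attempts >= 1")
    return per_attempt_failure ** attempts


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in turn and return the first successful response."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # the caller never sees individual replica failures
            last_error = exc
    raise RuntimeError("all replicas failed") from last_error


if __name__ == "__main__":
    def broken(request: str) -> str:
        raise ConnectionError("replica down")

    def healthy(request: str) -> str:
        return f"ok:{request}"

    print(call_with_failover([broken, healthy], "GET /search"))
    # With a 1% per-attempt failure rate and three independent attempts:
    print(combined_failure_rate(0.01, 3))
```

The arithmetic shows why imperfect components can be acceptable: if each attempt fails 1% of the time and attempts really are independent, three attempts all fail roughly once in a million requests. Correlated failures, such as a whole site going down, break that assumption.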
How You (Provider) Build These Clouds
Source: http://it20.info/2012/02/the-cloud-magic-rectangle-tm/
[Figure labels: Transactional IT; Internet scale, new-gen workloads]
What You (Consumer) Get with These Clouds:
Source: http://it20.info/2012/02/the-cloud-magic-rectangle-tm/
[Figure labels: Transactional IT, Internet scale workloads]
Policy-based Clouds and Design-for-fail Clouds are workload optimized architectural choices
Policy-based Clouds
• Purpose optimized for longer-lived virtual machines managed by Server Administrator
• Centralizes enterprise server virtualization administration tasks
• High degree of flexibility designed to accommodate virtualization of all workloads
• Significant focus on managing availability and QoS for long-lived workloads with level of isolation
• Characteristics derived from exploiting enterprise class hardware
• Legacy applications
Design-for-fail Clouds
• Purpose optimized for shorter-term virtual machines managed via end-user or automated process
• Decentralized control, embraces eventual consistency, focus on making “good enough” decisions
• High degree of standardization
• Significant focus on ensuring availability of control plane
• Characteristics driven by software
• New applications
[Figure labels: Transactional IT, Internet scale workloads]
Two Cloud workload types
Source: http://it20.info/2012/02/the-cloud-magic-rectangle-tm/
[Figure labels: Transactional IT, Internet scale workloads]
For more reading on Internet Scale Architectures: the following 2008 Google public domain book
Today’s Internet Scale Data Center landscape
– Where are they? How big? How fast growing?
– What are they being used for? Cloud impact?
– Why understand them?
What is internet data center / warehouse-scale computing?
– How is it different? Workloads?
– Hardware and software?
– How the same?
How best to meld with it / use it / exploit?
– Lessons we can apply from Internet scale computing
• Resources to help you on this journey
See John Sing’s other presentation:
– sCL02 State of the Cloud - Internet Scale Datacenters, Workloads, Traditional IT vs. Design for Fail
Download a copy of Google’s seminal book on Internet Scale Architectures at: http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006
Download those charts here: http://www.slideshare.net/johnsing1/s-bd03-infinitybeyond2internetscaleworkloadsdatacenterdesignv6speaker
Summary: two major types of HA/DR/BC approaches depending on workload type:
Cloud, High Availability, Resiliency, Disaster Recovery characteristics
– Transactional IT: can be designed “agnostic / after the fact” using server or storage virtualization, replication
– Internet Scale Workloads: must be “designed into software stack from the beginning”
Data Strategy
– Transactional IT: use traditional tools/concepts to understand / know data; storage/server virtualization and pooling
– Internet Scale Workloads: proven Open Source toolset to implement failure tolerance and redundancy in the application stack
Automation
– Transactional IT: end to end automation of server / storage virtualization and replication
– Internet Scale Workloads: end to end automation of the application software stack providing failure tolerance
Commonality
– Both: apply master vision and lessons learned from internet scale data centers
Principles for Architecting IT HA / DR / Business Continuity
Key strategy: segment data into logical storage pools by appropriate Data Protection characteristics (animated chart)
Know and categorize your data - provides foundation for affordable data protection
Enabled by virtualization
[Chart axis: from Mission Critical to Lower cost]
Continuous Availability (CA) – E2E automation enhances RDR
– RTO = near continuous, RPO = small as possible (Tier 7)
– Priority = uptime, with high value justification
Rapid Data Recovery (RDR) – enhance backup/restore
– For data that requires it
– RTO = minutes, to (approx. range): 2 to 6 hours
– BC Tiers 6, 4
– Balanced priorities = Uptime and cost/value
Backup/Restore (B/R) – assure efficient foundation
– Standardize base backup/restore foundation
– Provide universal 24 hour - 12 hour (approx) recovery capability
– Address requirements for archival, compliance, green energy
– Priority = cost
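The segmentation into three pools can be sketched as a classification by required RTO. The thresholds below are assumptions layered on the approximate ranges in the chart: 15 minutes stands in for “near continuous” (borrowed from the first mark on the RTO guideline axis), and 6 hours is the upper end of the approximate Rapid Data Recovery range. The function `storage_pool_for` and the sample workloads are hypothetical.

```python
from datetime import timedelta

# Assumed cut-offs; the slides give approximate ranges only.
CA_MAX_RTO = timedelta(minutes=15)   # stand-in for "near continuous"
RDR_MAX_RTO = timedelta(hours=6)     # upper end of the "2 to 6 hours" range


def storage_pool_for(rto: timedelta) -> str:
    """Pick a data protection pool from the required recovery time objective."""
    if rto < timedelta(0):
        raise ValueError("rto must not be negative")
    if rto <= CA_MAX_RTO:
        return "Continuous Availability"
    if rto <= RDR_MAX_RTO:
        return "Rapid Data Recovery"
    return "Backup/Restore"


if __name__ == "__main__":
    # Invented workloads and RTO requirements.
    requirements = {
        "payments": timedelta(minutes=5),
        "reporting": timedelta(hours=3),
        "archive": timedelta(hours=24),
    }
    for workload, rto in requirements.items():
        print(f"{workload}: {storage_pool_for(rto)}")
```

In practice the decision also weighs RPO and cost / value, so RTO alone is a first-pass filter rather than the whole categorization.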
For traditional IT - Virtualization is fundamental to addressing today’s IT diversity
For traditional IT - Consolidated virtualized systems become the Recoverable Units for IT Business Continuity
Virtualized systems become the resource pools that enable recoverability
[Diagram labels: Business Processes, Virtualized IT infrastructure, Virtualization]
High Availability, Business Continuity: step by step virtualization journey
Balancing recovery time objective with cost / value
[Chart: Cost / Value versus Recovery Time Objective (15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days); bands: recovery from a disk image, recovery from tape copy]
BC Tier 7 – Add Server or Storage replication with end-to-end automated server recovery
BC Tier 6 – Add real-time continuous data replication, server or storage
BC Tier 5 – Add Application/database integration to Backup/Restore
BC Tier 4 – Add Point in Time replication to Backup/Restore
BC Tier 3 – VTL, Data De-Dup, Remote vault
BC Tier 2 – Tape libraries + Automation
BC Tier 1 – Restore from Tape
Foundation: Storage pools
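Because each tier builds on the ones below it, the journey can be represented as an ordered ladder. The sketch below is illustrative only: the tier descriptions are taken from the chart, while the `BC_TIERS` table and the `journey_to` helper are hypothetical names introduced for this example.

```python
from typing import List

# Tier descriptions as worded on the chart, lowest tier first.
BC_TIERS = {
    1: "Restore from tape",
    2: "Tape libraries + automation",
    3: "VTL, data de-dup, remote vault",
    4: "Add point-in-time replication to backup/restore",
    5: "Add application/database integration to backup/restore",
    6: "Add real-time continuous data replication, server or storage",
    7: "Add server or storage replication with end-to-end automated server recovery",
}


def journey_to(target_tier: int) -> List[str]:
    """List the steps from Tier 1 up to and including target_tier."""
    if target_tier not in BC_TIERS:
        raise ValueError(f"unknown BC tier: {target_tier}")
    return [f"BC Tier {t}: {BC_TIERS[t]}" for t in range(1, target_tier + 1)]


if __name__ == "__main__":
    # Print the steps needed to reach Tier 4 from the foundation.
    for step in journey_to(4):
        print(step)
```

Treating the tiers as cumulative is one reading of the "step by step" framing; the chart itself does not state how each tier maps to a specific recovery time.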
Storage pools: apply appropriate server, storage technology
- Real-time replication (storage, server, or software): add automated failover to replicated storage
- Periodic point-in-time (PiT) replication (file, application, or disk-to-disk periodic replication): file system, point-in-time disk, VTL to VTL with de-dup
- Foundation backup/restore (removable media): physical or electronic transport
- Petabyte unstructured: due to usage and large scale, typically uses application-level intelligent redundancy and failure-toleration design
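As a rough sketch of how a pool might be chosen and its techniques looked up, consider the following. The `Pool` enum, the `choose_pool` function, and its boolean flags are hypothetical names for illustration; the technique lists paraphrase the slide labels, and the rule that petabyte unstructured data takes precedence is an assumption based on the slide's note that such data typically relies on application-level redundancy.

```python
from enum import Enum


class Pool(Enum):
    REAL_TIME = "real-time replication"
    POINT_IN_TIME = "periodic point-in-time replication"
    FOUNDATION = "foundation backup/restore"
    PETABYTE_UNSTRUCTURED = "petabyte unstructured"


# Techniques per pool, paraphrased from the slide labels.
TECHNIQUES = {
    Pool.REAL_TIME: [
        "storage, server, or software replication",
        "automated failover to replicated storage",
    ],
    Pool.POINT_IN_TIME: [
        "file system replication",
        "point-in-time disk copy",
        "VTL to VTL with de-dup",
    ],
    Pool.FOUNDATION: [
        "backup/restore",
        "physical or electronic transport of removable media",
    ],
    Pool.PETABYTE_UNSTRUCTURED: [
        "application-level redundancy and failure-toleration design",
    ],
}


def choose_pool(needs_real_time: bool,
                needs_periodic_copy: bool,
                petabyte_unstructured: bool = False) -> Pool:
    """Pick a storage pool from coarse requirements (hypothetical rule order)."""
    if petabyte_unstructured:
        # Assumption: scale and usage pattern override the other criteria.
        return Pool.PETABYTE_UNSTRUCTURED
    if needs_real_time:
        return Pool.REAL_TIME
    if needs_periodic_copy:
        return Pool.POINT_IN_TIME
    return Pool.FOUNDATION


if __name__ == "__main__":
    pool = choose_pool(needs_real_time=False, needs_periodic_copy=True)
    print(pool.value, TECHNIQUES[pool])
```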
Methodology, traditional IT: HA / BC / DR in stages, from bottom up
[Chart: Cost versus Recovery Time Objective; diagram labels: SAN, Disk, VTL/De-Dup]
Foundation: standardized, automated tape backup (Tier 2, 1)
Foundation: electronic vaulting, automation, tape lib (Tier 3)
Add: Point-in-time copy, disk to disk, tiered storage (Tier 4)
Technologies listed:
•IBM FlashCopy, SnapShot
•IBM XIV, SVC, DS, SONAS
•IBM Tivoli Storage Productivity Center 5.1
•IBM ProtecTIER
•IBM Virtual Tape Library
•IBM Tivoli Storage Manager backup/restore
•VTL, de-dup, remote replication at tape level
Methodology, traditional IT: HA / BC / DR in stages, from bottom up
[Chart: Cost versus Recovery Time Objective; diagram labels: SAN, Disk, VTL/De-Dup]
Foundation: standardized, automated tape backup (Tier 2, 1)
Foundation: electronic vaulting, automation, tape lib (Tier 3)
Add: Point-in-time copy, disk to disk for backup/restore (Tier 4)
Application integration: automate applications, database for replication and automation (Tier 5)
Data replication: consolidate and implement real-time data availability (Tier 6)
Dynamic end-to-end automated failover (server, storage, applications): end-to-end automated site failover of servers, storage, applications (Tier 7)
Technologies listed:
•If storage: Metro Mirror, Global Mirror, Hitachi UR
•XIV, SVC, DS, other storage
•TPC 5.1
•VMware
•PowerHA on p
•Tivoli FlashCopy Manager
•Server virtualization
Technology Deployment Options in a modern Cloud world
1. Private Cloud (enterprise data center)
• Client-managed cloud
• Internal or partner implementation services
2. Managed Private Cloud (enterprise data center, co-lo operated)
3. Hosted Private Cloud (co-lo owned and operated)
Options 2 and 3 (operated or co-located):
• Consumption models including client-owned and provider-owned assets
• Delivery options including client premise & hosted
• Strategic Outsourcing clients with standardized services
4. Shared Cloud Services (Enterprise A, Enterprise B, Enterprise C)
• Standardized, multi-tenant service
• Pay-per-usage model with provider-owned assets
5. Public Cloud Services (User A to User E; compute cloud, persistent storage; pay-per-usage)
• Supporting compute-centric workloads
• Finer granularity in multi-tenancy model
• Provider-owned assets
Cloud as remote site deployment options
[Diagram: production site, with recovery in cloud]
- Real-time replication (storage, server, or software)
- Periodic PiT replication: file system, point-in-time disk, VTL to VTL with de-dup
- Point-in-time copies: physical or electronic transport
- Petabyte unstructured: petabyte-level storage typically uses intelligent file or application replication, due to large scale and usage patterns
Data strategy: remote cloud
[Diagram: virtualized storage pools; labels: real-time replication, point in time, removable media, disk-to-disk replication, automated failover]
- Real-time replication (storage, server, or software)
- Periodic PiT replication: file system, point-in-time disk, VTL to VTL with de-dup
- Point-in-time copies: physical or electronic transport
- Petabyte unstructured: petabyte-level storage typically uses intelligent file or application replication, due to large scale and usage patterns
Local Cloud deployment from data standpoint
[Diagram label: Petabyte unstructured]
Cloud provider responsibility for HA and BC
[Diagram: your production in cloud; recovery by cloud provider]
- Real-time replication (storage, server, or software)
- Periodic PiT replication: file system, point-in-time disk, VTL to VTL with de-dup
- Point-in-time copies: physical or electronic transport
- Petabyte unstructured: petabyte-level storage typically uses intelligent file or application replication, due to large scale and usage patterns
Today’s world: High Availability, Business Continuity is a step by step data strategy / workload journey
Balancing recovery time objective with cost / value
[Chart: Cost / Value versus Recovery Time Objective (15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days); bands: recovery from a disk image, recovery from tape copy]
BC Tier 7 – Add Server or Storage replication with end-to-end automated server recovery
BC Tier 6 – Add real-time continuous data replication, server or storage
BC Tier 5 – Add Application/database integration to Backup/Restore
BC Tier 4 – Add Point in Time replication to Backup/Restore
BC Tier 3 – VTL, Data De-Dup, Remote vault
BC Tier 2 – Tape libraries + Automation
BC Tier 1 – Restore from Tape
Foundation: workload types, data strategy, cloud deployment if needed
Step by step Virtualization, High Availability, Business Continuity data strategy
Balancing recovery time objective with cost / value
[Chart: Cost / Value versus Recovery Time Objective (15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days); bands: recovery from a disk image, recovery from tape copy]
BC Tier 7 – Add Server or Storage replication with end-to-end automated server recovery
BC Tier 6 – Add real-time continuous data replication, server or storage
BC Tier 5 – Add Application/database integration to Backup/Restore
BC Tier 4 – Add Point in Time replication to Backup/Restore
BC Tier 3 – VTL, Data De-Dup, Remote vault
BC Tier 2 – Tape libraries + Automation
BC Tier 1 – Restore from Tape
Segments overlaid on the tiers: Continuous Availability, Rapid Data Recovery, Backup/Restore
Foundation: workload types, data strategy, cloud deployment if needed
Summary
Understand today’s best practices
– for IT High Availability and IT Business Continuity
What has changed? What is the same?
– Principles for requirements = no change
• Data Strategy
– Deployment for true internet scale workloads:
• Application level redundancy
Strategies for:
– Requirements, design, implementation
– In-house vs. out-sourcing
Step by step approach
– Automation, virtualization essential
– Segment workloads: traditional vs. petabyte scale
– Exploiting Cloud
[Diagram labels: Data Strategy, Workload types, Cloud deployment options]
IBM Redbook documents fundamental methodologies discussed today
SG24-6547-03
IBM System Storage Business Continuity: Part 1 Planning Guide
See chapters 3, 6, and 7
John Sing is architect and co-author of this book
http://www.redbooks.ibm.com/abstracts/sg246547.html