Agile Data Platform: Revolutionizing Database Cloning
Kyle Hailey, http://kylehailey.com
• Mind-meld with the illustrious members of the OakTable
• 32 talks over 2 days, right next door to Oracle OpenWorld
More info. and registration: http://oaktableworld.com
OakTable World: Sept 23 & 24, San Francisco
NOW
Problem in IT
Get the right data to the right people at the right time
Part I : Cloning Technology
Virtual Thin Clone Physical
Part II : Agile Data Acceleration
Database Cloning Challenge
If you can’t satisfy the business demands then your process is broken.
Problem
Developers
QA and UAT
Reports
First copy
Production
• CERN - European Organization for Nuclear Research
• 145 TB database
• 75 TB growth each year
• Dozens of developers want copies
Tradeoff: Speed, Quality, Cost
What We’ve Seen
1. Inefficient QA: Higher costs of QA
2. QA Delays: Greater re-work of code
3. Sharing DB Environments: Bottlenecks
4. Using DB Subsets: More bugs in Prod
5. Slow Environment Builds: Delays
“if you can't measure it you can’t manage it”
1. Inefficient QA: Long Build times
Build Time
QA Test
96% of QA time was spent building the environment
$0.04 of every $1.00 went to actual testing vs. setup
Build
2. QA Delays: bugs found late require more code re-work
[Figure: timeline across Sprints 1-3 with repeated Build / QA Env / QA cycles; a bug in the code is found late]
[Figure: cost to correct (y-axis, 0 to 70) vs. delay in fixing the bug (x-axis, 1 to 7)]
Software Engineering Economics – Barry Boehm (1981)
3. Full Copy Shared : Bottlenecks
Frustration Waiting
Old Unrepresentative Data
4. Subsets : cause bugs
Classic problem is that queries that run fast on subsets hit the wall in production.
Developers are unable to test against all data
The Production ‘Wall’
5. Slow Environment Builds: 3-6 Months to Deliver Data
Management
DBA
System Admin
Storage Admin
Developers Submit Request
Disk Capacity?
Approve Request $$ (2 Weeks)
Approve Request $$
(1 Week)
Request Additional Storage?
Provision Capacity
File System Configured?
Configure LUNS & Build File System
Coordinate Replication w/ Infrastructure
Re-Parameterize & Configure DB
Mount Recovery DB to Specific PIT
Begin Work
Approve Request $$ (2 Weeks)
(3 Days)
(3 Days)
(2 Days)
(3 Days)
(3 Days)
…….1-2 Weeks of Approvals, Delays, and Provisioning……
5. Slow Environment Builds: culture of no
DBA Developer
Never enough environments
bottlenecks
What We’ve Seen
1. Inefficient QA: Higher costs
2. QA Delays: Increased re-work
3. Sharing DB: Bottlenecks
4. Subset DB: Bugs
5. Slow Environment Builds: Delays
Clone 1 Clone 2 Clone 3
99% of blocks are identical
Clone 1 Clone 2 Clone 3
Thin Clone
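Why sharing identical blocks matters can be shown with a little arithmetic. A rough sketch, assuming the 145 TB database from the CERN slide, three clones, and 1% of blocks changed per clone (the function names and the simplification that each clone stores only its changed blocks are assumptions for illustration):

```python
def physical_clone_storage(db_tb: float, clones: int) -> float:
    """Every clone is a full copy of the source."""
    return db_tb * clones


def thin_clone_storage(db_tb: float, clones: int, changed_fraction: float) -> float:
    """One shared copy plus the blocks each clone has changed."""
    return db_tb + db_tb * changed_fraction * clones


physical = physical_clone_storage(145, 3)
thin = thin_clone_storage(145, 3, 0.01)
print(f"physical: {physical:.2f} TB, thin: {thin:.2f} TB")
```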
2. Thin Cloning
I. Clonedb (Oracle)
II. EMC
• Copy on first write (COFW)
III. Netapp
• Write Anywhere File Layout (WAFL)
• & EMC VNX redirect on write (ROW)
IV. ZFS
I. clonedb
[Figure: sparse file accessed over dNFS, layered on an RMAN backup]
CloneDB
1. dNFS 11.2.0.2+
– cd $ORACLE_HOME/rdbms/lib
– make -f ins_rdbms.mk dnfs_on
2. Clonedb.pl initSOURCE.ora output.sql
– MASTER_COPY_DIR="/rman_backup"
– CLONE_FILE_CREATE_DEST="/nfs_mount"
– CLONEDB_NAME="clone"
3. sqlplus / as sysdba @output.sql
– startup nomount PFILE=initclone.ora
– Create control file backup location
– dbms_dnfs.clonedb_renamefile ('/backup/file.dbf' , '/clone/file.dbf');
– alter database open resetlogs;
Tim Hall, www.oracle-base.com/articles/11g/clonedb-11gr2.php
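The rename step in part 3 has to be repeated for every data file in the backup. A small sketch of how those calls could be generated, assuming a flat clone directory and an anonymous PL/SQL block as the wrapper (the helper name `clonedb_rename_block` is hypothetical; only `dbms_dnfs.clonedb_renamefile` and the example paths come from the slide):

```python
import posixpath


def clonedb_rename_block(backup_files, clone_dir):
    """Build a PL/SQL block that points each backup data file at a sparse clone file."""
    if not backup_files:
        raise ValueError("at least one backup data file is required")
    lines = ["begin"]
    for src in backup_files:
        # Assumption: the clone file keeps the backup file's name in clone_dir.
        dest = posixpath.join(clone_dir, posixpath.basename(src))
        lines.append(f"  dbms_dnfs.clonedb_renamefile('{src}', '{dest}');")
    lines += ["end;", "/"]
    return "\n".join(lines)


script = clonedb_rename_block(["/backup/file.dbf"], "/clone")
print(script)
```

In practice Clonedb.pl writes these calls into output.sql itself; the sketch only makes the per-file pattern visible.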
II. EMC Copy on Write
[Figure: active file system and snapshot over blocks A, B, C, and D]
• Write penalty (read and two writes)
• Limit 16 snapshots
• No branching (snapshots of snapshots)
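The write penalty can be seen in a toy model of copy on first write. This is a sketch of the general idea only, not EMC's implementation; the class and its counters are hypothetical, and it assumes a single snapshot over a fixed set of blocks:

```python
class CopyOnFirstWriteVolume:
    """Toy volume: the first overwrite after a snapshot copies the old block aside."""

    def __init__(self, blocks):
        self.active = dict(blocks)   # block id -> current contents
        self.saved = {}              # originals preserved for the snapshot
        self.has_snapshot = False
        self.reads = 0
        self.writes = 0

    def take_snapshot(self):
        self.has_snapshot = True
        self.saved = {}

    def write(self, block_id, data):
        first_overwrite = (
            self.has_snapshot
            and block_id in self.active
            and block_id not in self.saved
        )
        if first_overwrite:
            self.reads += 1                              # read the original block
            self.saved[block_id] = self.active[block_id]
            self.writes += 1                             # write it to the snapshot area
        self.active[block_id] = data
        self.writes += 1                                 # write the new data in place

    def read_snapshot(self, block_id):
        if block_id in self.saved:
            return self.saved[block_id]
        return self.active[block_id]                     # unchanged since the snapshot


vol = CopyOnFirstWriteVolume({"A": "a0", "B": "b0", "C": "c0", "D": "d0"})
vol.take_snapshot()
vol.write("D", "d1")
first = (vol.reads, vol.writes)    # first overwrite: one read, two writes
vol.write("D", "d2")
second = (vol.reads, vol.writes)   # later overwrites: one extra write only
```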
III. Netapp and EMC VNX
[Figure: root pointing to data blocks]
• 255 snapshots
• Branching possible
IV. ZFS Allocate on Write
[Figure: snapshot root and live root; ZIL (intent log)]
• Unlimited instantaneous snapshots
• Unlimited instantaneous clones
• Branching easy and unlimited
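A toy model shows why snapshots, clones, and branches are cheap when new data is always written to a fresh block. This is a sketch of the concept only and says nothing about the on-disk format of ZFS or WAFL; the `Pool` and `FileSystem` classes are hypothetical:

```python
class Pool:
    """Shared block store: blocks are appended and never overwritten."""

    def __init__(self):
        self.blocks = []

    def allocate(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1


class FileSystem:
    """A root map from block name to pool address."""

    def __init__(self, pool, root=None):
        self.pool = pool
        self.root = dict(root or {})

    def write(self, name, data):
        # Allocate a new block and repoint the root; the old block is untouched.
        self.root[name] = self.pool.allocate(data)

    def read(self, name):
        return self.pool.blocks[self.root[name]]

    def snapshot(self):
        # A snapshot is just a copy of the root map, so no data blocks are copied.
        return dict(self.root)


pool = Pool()
prod = FileSystem(pool)
for name in "ABCD":
    prod.write(name, name.lower() + "0")

dev = FileSystem(pool, prod.snapshot())   # clone: a writable root built from a snapshot
dev.write("D", "d-dev")
qa = FileSystem(pool, dev.snapshot())     # branch: a clone of a clone
qa.write("A", "a-qa")
```

Each write costs one block allocation and no read of the old block, which is the contrast with copy on first write.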
FS vs. ZFS
Traditional FS:
• FS per Volume
• FS limited bandwidth
• Storage stranded
ZFS:
• Many FS in a pool
• Grow automatically
• All bandwidth
[Figure: one FS per volume vs. many ZFS file systems in one storage pool]
I. Clonedb Oracle II. EMC III. Netapp IV. ZFS
2. Thin Cloning
[Figure: database LUNs on the Production Filer; snapshot clones mounted by Targets A, B, and C]
1. Put database in hot backup
2. Take Snapshot
3. Clone Snapshot (ZFS & Netapp)
4. Export Clone
5. Mount on target host
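The five steps above can be sketched as an ordered plan for the ZFS case. This only builds the command strings and runs nothing; the dataset, host, and mount-point names are hypothetical, the clone is assumed to use its default mountpoint, and ending hot backup mode right after the snapshot is an added assumption not shown on the slide:

```python
def thin_clone_plan(dataset, snap_name, clone_dataset, filer_host, mount_point):
    """Return (where, command) pairs for a snapshot-based thin clone on ZFS."""
    snapshot = f"{dataset}@{snap_name}"
    return [
        ("sql", "ALTER DATABASE BEGIN BACKUP;"),             # 1. hot backup mode
        ("filer", f"zfs snapshot {snapshot}"),               # 2. take snapshot
        ("sql", "ALTER DATABASE END BACKUP;"),               # assumption: end backup mode
        ("filer", f"zfs clone {snapshot} {clone_dataset}"),  # 3. clone snapshot
        ("filer", f"zfs set sharenfs=on {clone_dataset}"),   # 4. export clone over NFS
        ("target", f"mount -t nfs {filer_host}:/{clone_dataset} {mount_point}"),  # 5. mount
    ]


plan = thin_clone_plan(
    dataset="tank/oradata",
    snap_name="before_clone",
    clone_dataset="tank/oradata_dev1",
    filer_host="filer1",
    mount_point="/u02/oradata",
)
for where, command in plan:
    print(f"{where:>6}: {command}")
```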
Problem: How do you get data off Production?
[Figure: Source database LUNs and snapshot clones on the Production Filer; a separate Development Filer serving Targets A, B, and C]
Three Core Parts
[Figure: Production instance and file system; Development instances and storage]
1. Copy, Sync, Snapshots, Purge, Time Flow
2. Clone (snapshot), Compress, Share Cache, Storage Agnostic
3. Mount, recover, rename; Self Service, Roles & Security; Rollback & Refresh; Branch & Tag
Three Core Parts: 1
• Copy, Sync, Snapshots, Purge, Time Flow
Netapp
[Figure: NetApp components between the production and development filers: SnapManager for Oracle with its repository database, Protection Manager, SnapDrive, SnapMirror, FlexClone, and the RMAN repository; database LUNs on the production filer, clones for Targets A, B, and C on the development filer; roles: DBA and Storage Admin. Source: tr-3761.pdf]
Three Core Parts: 3
• Mount, recover, rename
• Self service, refresh & rollback
• Branch & tag
• Roles & security
Oracle EM 12c Snap Clone
• Register Netapp or ZFS with Storage Credentials
• Install agents on a Linux machine to manage the Netapp or ZFS storage
• Register test master database
• Enable Snap Clone for the test master database
• Set up a zone – set max CPU and memory and the roles that can see these zones
• Set up a pool – a pool is a set of machines where databases can be provisioned
• Set up a profile – a source database that can be used for thin cloning
• Set up a service template – init.ora values
[Figure: EM 12c and its agents; Source instance, Test Master, and Clone instances on Linux backed by ZFS or NetApp; Pool, Profile, Zone, Template]
Where we Are
[Figure: Production, Development, QA, and UAT, each with its own instances, database, and file systems]
Where we want to be
[Figure: Production, Development, QA, and UAT instances and databases backed by snapshots]
EM 12c: Snap Clone
Production Development
Flexclone Flexclone
Netapp Snap Manager for Oracle
Thin Cloning
3. Database Virtualization
Three Physical Copies
Three Virtual Copies
Data Virtualization Appliance
Choose your virtualization layer:
• Delphix and Oracle SMU
SMU
ZFS Storage Appliance
Oracle 12c SMU
Oracle Snap Management Utility for ZFS Appliance
• Requires ZFS Appliance
• Supports Linux, Solaris 10+, Windows 2008+
• GUI
– snapshot source databases
– provision virtual databases
Install Delphix on x86 hardware
Intel hardware
Allocate Any Storage to Delphix
Allocate storage, any type
One time backup of source database
Database
Production
Instance
File system
File system
Beta or in Development
Production
DxFS (Delphix) Compress Data
Database
Production
Instance
Data is typically compressed to 1/3 of its original size
File system
Incremental forever change collection
Database
Production
Instance
File system
Changes
• Collected incrementally forever
• Old data purged
File system Time Window
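The time window can be pictured as a retention filter over the collected change points. A minimal sketch, assuming daily collection and a 14-day window (both values, the date, and the function name are illustrative assumptions; a real system would also have to keep the blocks that retained points still depend on):

```python
from datetime import datetime, timedelta


def retained_points(snapshot_times, now, window):
    """Return the snapshot times still inside the retention window, oldest first."""
    cutoff = now - window
    return sorted(t for t in snapshot_times if t >= cutoff)


now = datetime(2013, 9, 23)
daily = [now - timedelta(days=d) for d in range(30)]   # 30 days of daily collection
kept = retained_points(daily, now, window=timedelta(days=14))
print(len(kept), "points kept; oldest is", kept[0].date())
```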
Typical Architecture
[Figure: Production, Development, QA, and UAT, each with its own instances, database, and file systems]
With Delphix
[Figure: Production, Development, QA, and UAT instances and databases on one shared file system]
Three Core Parts
[Figure: Production instance, time window, and Development instances]
1. Source Syncing
2. Storage (DxFS)
3. Self Service
Fast, Fresh, Full
[Figure: Source instance and time window feeding a Development VDB]
Free
[Figure: Source time window feeding several instances; gif by Steve Karam]
Self Service
Branching
[Figure: Dev VDB from the Source time window; QA VDB (branched from Dev) at the end of a sprint or a code freeze]
Federated Cloning
[Figure: instances provisioned from the time windows of Source1 and Source2]
“I looked like a hero”
Tony Young, CIO Informatica
DevOps
DevOps With Delphix
1. Efficient QA: Low cost, high utilization
2. Quick QA: Fast Bug Fix
3. Every Dev gets DB: Parallelized Dev
4. Full DB: Fewer Bugs
5. Fast Builds: Fast Dev, Culture of Yes
1. Efficient QA: Lower cost
Build Time
QA Test
1% of QA time was spent building the environment
$0.99 of every $1.00 went to actual testing vs. setup
Build Time
QA Test
Build
Rapid QA via Branching
2. QA Immediate: bugs found fast and fixed
[Figure: timelines across Sprints 1-3 comparing immediate QA with the Build / QA Env / QA cycle; bug marked in the code]
3. Private Copies: Parallelize
4. Full Size DB : Eliminate bugs
Production
5. Self Service: Fast, Efficient. Culture of Yes!
[Figure: the request, approval, and provisioning workflow from "5. Slow Environment Builds", repeated]
Quality
• Forensics
• A/B testing
• Recovery
Investigate Production Bugs
• Anomaly on Prod: possible code bug at noon yesterday
• Spin up VDB of Prod as it was during anomaly
[Figure: Prod time window and Development instance]
Rewind for patch and QA testing
[Figure: Development instance positioned on the Prod time window]
A/B testing
• Keep tests for compare
• Production vs Virtual
– invisible index on Prod
– Creating index on virtual
• Flashback vs Virtual
[Figure: Test A with Index 1; Test B with Index 2]
Surgical recovery of Production
• Problem on Prod: table dropped accidentally
• Spin VDB up before drop
• Surgical or full recovery on VDB
[Figure: Source time window and Development instance]
[Figure: Dev1 VDB from the Source time window; Dev2 VDB branched]
Virtual to Physical
[Figure: Source time window and VDB]
• Spin VDB up before drop
• Corruption
• Recovery
Business Intelligence
ETL and Refresh Windows
[Figure: refresh window on a daily timeline (1pm, 10pm, 8am, noon)]
ETL and DW refreshes taking longer
[Figure: the same timeline by year, 2011 to 2015]
Database going Global
Globalization Reduces Windows
[Figure: three timelines by year, 2011 to 2015 (1pm, 10pm, 8am, noon; 10pm, 8am, noon, 9pm; 6am, 8am, 10pm)]
ETL and DW Refreshes
[Figure: Prod instance and DW & BI instance]
• Data Guard – requires full refresh if used
• Active Data Guard – read only, most reports don’t work
Fast Refreshes
• Collect only changes
• Refresh in minutes
Instance Instance
BI / DW
Prod
Temporal Data
BI
a) Fast refreshes
b) Temporal queries
c) Confidence testing
Review: Use Cases
1. Development Acceleration
a) Full, Fresh, Fast, Self Serve
b) QA Branching
c) Federated
2. Quality
a) Forensics
b) Testing: A/B, upgrade, patch
c) Recovery: logical, physical
3. BI
a) Fast refresh
b) Temporal Data
c) Confidence testing
over 10 times
“perhaps the single largest storage consolidation opportunity in history”
Oracle 12c
[Figure: transactions per minute and latency vs. number of users (1, 5, 10, 20, 30, 60, 100, 200) with an 80 MB buffer cache and a 200 GB cache; axis maxima 5000 and 8000 Tnxs/min, 300 ms and 600 ms latency]
Database Virtualization
Memory Prices
• EMC sells memory at $1000/GB
• x86 memory costs $30/GB
• 1 TB of RAM on x86 costs around $32,000
• 1 TB of RAM on a VMAX 40K costs around $1,000,000
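The per-terabyte figures follow from the per-gigabyte prices. A quick check, assuming 1024 GB per TB (that conversion is an assumption; the prices are the slide's), which lands close to the slide's rounded numbers:

```python
GB_PER_TB = 1024  # assumption: binary terabyte


def cost_per_tb(price_per_gb: int) -> int:
    """Cost in dollars of one terabyte of RAM at a given price per gigabyte."""
    return price_per_gb * GB_PER_TB


x86_tb = cost_per_tb(30)     # x86 memory at $30/GB
emc_tb = cost_per_tb(1000)   # array cache at $1000/GB
ratio = emc_tb / x86_tb
print(f"x86: ${x86_tb:,}  array: ${emc_tb:,}  ratio: {ratio:.1f}x")
```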
$1,000,000
$6,000
ERP Project Failures 2011
• NYC CityTime : delays $63 M => $760 M
• Montclair Uni: delays, sues PeopleSoft
• Idaho: delays, ERP cost millions
Standish : IT Project Failure Rate
Year:          1994  1996  1998  2000  2002  2004  2009
Failure rate:  31%   40%   28%   23%   15%   18%   24%
★ http://www.galorath.com/wp/software-project-failure-costs-billions-better-estimation-planning-can-help.php
* http://www.pcworld.com/article/246647/10_biggest_erp_software_failures_of_2011.html
[Figure: Dev, QA, UAT, and Production copies at versions v2.6, v2.7, and v2.8]
Source Control for the database data
[Figure: build-up in which Dev, QA, and UAT copies are branched from Prod for releases 2.6, 2.7, and 2.8]
Data Control = Source Control for the Database
[Figure: Dev, QA, and UAT branches for releases 2.6, 2.7, and 2.8 along the Production Time Flow]
Who is Kyle Hailey
• 1990 Oracle
– 90 support
– 92 Ported v6
– 93 France
– 95 Benchmarking
– 98 ST Real World Performance
• 2000 Dot.Com
• 2001 Quest
• 2002 Oracle OEM 10g
– Success! First successful OEM design
• 2005 Embarcadero
– DB Optimizer
• 2010 Delphix
When not being a Geek
– Have a little 4 year old boy who takes up all my time
NoCOUG board
IOUG Liaison
About Delphix
• Founded in 2008, launched in 2010
• CEO Jedidiah Yueh (founder of Avamar: >$1B revenue)
• Based in Silicon Valley, Global Operations
• 10% of Fortune 500
[Figure: values $40M, $75M, $850M, and $27,000M; categories Storage, IT, Develop, and Business]
Good, Cheap, Fast : choose two
DevOps With Delphix
1. Efficient QA: Low cost, high utilization
2. Quick QA: Fast Bug Fix
3. Every Dev gets DB: Parallelized Dev
4. Full DB: Fewer Bugs
5. Fast Builds: Fast Dev, Culture of Yes