Posted on 24-Dec-2015
1© Copyright 2010 EMC Corporation. All rights reserved.
EMC Data Domain: Data Protection and Deduplication
Why Backup?
Goals
– Backups are done for restores
– Operational and disaster recovery
  – Disaster recovery requires offsite backup
  – Operational recovery requires onsite backup
  – Need both onsite and offsite copies on disk
  – Need quick restores: no time for moving physical assets
– Protection of personal data & intellectual property
Why So Much Interest in Data Deduplication?
Backup & Archive processes have been overwhelmed by information growth
Primary storage efficiency has become a necessity to cope with massive growth
ROI drives the compelling appeal of dedupe:
– Reduced Storage Capacities
– Lower Infrastructure Costs
– Improved SLAs
– Efficient Replication for Business Continuance/DR
[Chart: Deduplication, one of the top 10 technology considerations (59%). Deploying deduplication: in use 24%; evaluating / in near- to long-term plan 55%; not in plan 21%.]
Source: TheInfoPro Wave 11 Storage Study, 2008
Why Do Enterprises Still Use Tape?
• Low upfront cost
• Tape can store the massive amount of redundant data created by backups
• Transportable for offsite DR
[Diagram: primary storage on disk; backup storage on tape, at 5x–10x the size of primary storage]
EMC Data Domain: Leadership and Innovation
• Deduplication storage systems
  – More than 12,000 systems installed
  – More than 4,300 customers
  – More than 2,600 PB under Data Domain protection worldwide
• A history of industry firsts, 2003–2010
  – First Deduplication NAS
  – First Deduplication Volume Replication
  – Largest Deduplication Array
  – First Deduplication Directory Replication
  – First Deduplication Virtual Tape Library
  – First Deduplication Nearline Storage
  – Fastest Backup Controller
  – Cascaded Replication
  – First Deduplication Encryption
  – First Distributed Processing
Data Domain – works with what you have
• Database
• Archive
• Backup
• VMware
Deduplication Principles
Unique segments (4 KB–12 KB); segment size varies “on the fly”
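Segment sizes that vary on the fly between 4 KB and 12 KB describe content-defined chunking. The sketch below illustrates that idea with a Gear-style rolling hash. The hash, the boundary rule, and names such as `segment` and `BOUNDARY_BITS` are illustrative assumptions, not Data Domain's actual segmentation algorithm.

```python
import random

# Illustrative parameters; only the 4 KB-12 KB range comes from the slide.
MIN_SIZE = 4 * 1024    # never cut a segment shorter than this
MAX_SIZE = 12 * 1024   # force a cut at this length
BOUNDARY_BITS = 12     # after MIN_SIZE, a cut is expected roughly every 4 KiB
MASK64 = (1 << 64) - 1

# Gear table: one fixed pseudo-random 64-bit value per possible byte value.
_rng = random.Random(2010)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def segment(data: bytes) -> list[bytes]:
    """Split data into variable-size, content-defined segments (hypothetical helper)."""
    segments = []
    start = 0
    while start < len(data):
        end = min(start + MAX_SIZE, len(data))
        cut = end  # default: forced cut at MAX_SIZE or at the end of the data
        h = 0
        for i in range(start, end):
            # Gear rolling hash: once 64 bytes are in, h depends only on the last 64 bytes.
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            # Content-defined boundary: the top BOUNDARY_BITS bits of the hash are zero.
            if i + 1 - start >= MIN_SIZE and (h >> (64 - BOUNDARY_BITS)) == 0:
                cut = i + 1
                break
        segments.append(data[start:cut])
        start = cut
    return segments


if __name__ == "__main__":
    original = random.Random(7).randbytes(256 * 1024)
    edited = b"new header" + original  # small insertion at the front
    a, b = segment(original), segment(edited)
    shared = set(a) & set(b)
    print(f"{len(a)} segments before, {len(b)} after, {len(shared)} unchanged")
```

Boundaries depend on content rather than fixed offsets. As a result, inserting a few bytes near the start disturbs only the segments around the insertion. Later segments re-align and can still be deduplicated.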
Data Deduplication: Technology Overview
Store more backups in a smaller footprint
[Diagram: each backup is broken into segments; only segments not already stored are written]
Friday full backup: A B C D A E F G
Mon incremental: A B H
Tues incremental: C B I
Weds incremental: E G J
Thurs incremental: A C K
Second Friday full backup: B C D E F L G H
Unique segments stored: A B C D E F G H I J K L

Backup | Logical | Estimated data reduction | Physical
Friday full | 1 TB | 2–4x | 250 GB
Monday incremental | 100 GB | 7–10x | 10 GB
Tuesday incremental | 100 GB | 7–10x | 10 GB
Wednesday incremental | 100 GB | 7–10x | 10 GB
Thursday incremental | 100 GB | 7–10x | 10 GB
Second Friday full | 1 TB | 50–60x | 18 GB
Total | 2.4 TB | 7.8x | 308 GB
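The sketch below makes the slide's arithmetic concrete. It replays the segment streams through an in-memory fingerprint index, then recomputes the table totals. Two simplifying assumptions apply: each letter serves as its own fingerprint, and 1 TB is treated as 1,000 GB. `ingest`, `BACKUPS`, and `TABLE` are hypothetical names.

```python
# Segment streams from the slide; each letter stands for one segment
# and doubles as its fingerprint (a simplification).
BACKUPS = [
    ("Friday full", "ABCDAEFG"),
    ("Mon incremental", "ABH"),
    ("Tues incremental", "CBI"),
    ("Weds incremental", "EGJ"),
    ("Thurs incremental", "ACK"),
    ("Second Friday full", "BCDEFLGH"),
]


def ingest(backups: list[tuple[str, str]]) -> tuple[int, list[str]]:
    """Return (logical segment count, unique segments in the order stored)."""
    index: set[str] = set()  # fingerprints already in the store
    store: list[str] = []
    logical = 0
    for _name, stream in backups:
        for seg in stream:
            logical += 1
            if seg not in index:  # only never-seen segments consume capacity
                index.add(seg)
                store.append(seg)
    return logical, store


# (logical GB, physical GB) per backup from the table, assuming 1 TB = 1,000 GB.
TABLE = [(1000, 250)] + [(100, 10)] * 4 + [(1000, 18)]
logical_gb = sum(logical for logical, _ in TABLE)
physical_gb = sum(physical for _, physical in TABLE)

if __name__ == "__main__":
    count, stored = ingest(BACKUPS)
    print(f"{count} logical segments, {len(stored)} stored: {' '.join(stored)}")
    print(f"{logical_gb} GB logical, {physical_gb} GB physical, "
          f"{logical_gb / physical_gb:.1f}x reduction")
```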
Deduplication Dramatically Reduces Storage Capacity Requirements
Deduplication: 10–30 times less data stored versus fulls + incrementals with typical retention policies
[Chart: data stored (0–30) versus weeks in use (1–20), comparing traditional storage with deduplication storage]
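A toy model shows why the gap widens with retention. It assumes that every week repeats the overview slide's pattern (a 1 TB full plus four 100 GB incrementals) and that all weeks are kept. The per-backup stored sizes are borrowed from that table: 250 GB for the first full, 18 GB for later fulls, and 10 GB per incremental. Here they are assumptions, not measurements, and `stored_after` is a hypothetical helper.

```python
# Assumed weekly pattern: one 1 TB full plus four 100 GB incrementals, all retained.
FULL_GB, INCR_GB, INCRS_PER_WEEK = 1000, 100, 4
# Assumed stored (post-deduplication) sizes, borrowed from the overview table.
FIRST_FULL_STORED_GB, LATER_FULL_STORED_GB, INCR_STORED_GB = 250, 18, 10


def stored_after(weeks: int) -> tuple[int, int]:
    """Return (traditional GB, deduplicated GB) stored after `weeks` weeks."""
    traditional = weeks * (FULL_GB + INCRS_PER_WEEK * INCR_GB)
    dedup = FIRST_FULL_STORED_GB + (weeks - 1) * LATER_FULL_STORED_GB
    dedup += weeks * INCRS_PER_WEEK * INCR_STORED_GB
    return traditional, dedup


if __name__ == "__main__":
    for week in (1, 5, 10, 15, 20):
        trad, dedup = stored_after(week)
        print(f"week {week:2d}: traditional {trad:6d} GB, "
              f"dedup {dedup:5d} GB, {trad / dedup:4.1f}x less")
```

Under these assumptions, the ratio climbs from under 5x in week 1 to about 20x by week 20, inside the slide's 10–30x range. A different change rate or retention policy would shift the curve.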
Data Domain SISL™ Scalable Architecture: CPU-Centric
[Chart: Data Domain scale, plotting throughput (GB/sec.) against addressable capacity (TB, post-RAID physical) from the DD200 (2004) to 2011 (est.). Stages: single-controller systems, distributed processing for single-controller systems, and multi-controller systems with global deduplication. The DD880 (7/09) is labeled the industry’s fastest backup storage controller.]
6-year improvement:
• Throughput: ~90x
• Capacity: ~225x
Inline vs. Post-Process Deduplication: Provisioning & Admin
Inline: Deduplication Before Storing
– Other activities unimpeded
– Predictable
– Simpler
Post-Process: Deduplication After Storing
– Process contention increases with the number of processes:
  − Copy to tape: too slow to stream tape
  − Recovery: less predictable SLAs
  − Replication: poor time-to-DR
  − Deduplication itself, if interleaved with backup or restore
– More admin needed to fight these issues
– At least 3x disk accesses to shared store
[Diagram: store, dedupe, restore, and replicate steps for each approach]
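The "at least 3x disk accesses" point follows from counting passes over the shared store. The sketch below uses a deliberately simplified model, an assumption rather than a measured I/O profile. Inline writes each new segment once. Post-process lands every segment, reads each one back to deduplicate it, and then writes the new ones. A stream never has more unique segments than total segments, so the post-process count is always at least three times the inline count. `disk_ops` is a hypothetical helper.

```python
def disk_ops(segments: list[str]) -> tuple[int, int]:
    """Return (inline, post_process) shared-store operations for one stream.

    Assumes an initially empty store and one operation per segment read or write.
    """
    total = len(segments)
    unique = len(set(segments))
    # Inline: fingerprints are checked before storing, so only new segments hit disk.
    inline = unique
    # Post-process: write everything, read everything back, then write new segments.
    post_process = total + total + unique
    return inline, post_process


if __name__ == "__main__":
    stream = list("ABCDAEFG")  # the Friday full backup from the overview slide
    inline, post = disk_ops(stream)
    print(f"inline={inline}, post-process={post}, ratio={post / inline:.1f}x")
```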
Data Integrity: Data Invulnerability Architecture
Trust but verify: “hope” is not a strategy
• Other: RAID 6, NVRAM, snapshots
• Data verification: checksum; deduplication, write to disk; verify
• Self-healing file system: cleaning (expired data, defrag); verify
[Diagram: write path through global compression, local compression, RAID, and file system layers; a checksum is generated on write and the data is verified afterward]
• Verify the file system metadata integrity
• Verify user data integrity
• Verify stripe integrity
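The generate-checksum, write, verify loop can be sketched with an in-memory store. SHA-256 from Python's `hashlib` stands in for whatever checksum the file system actually uses, since the slide does not say. `VerifyingStore` and its `corrupt` method are hypothetical; `corrupt` exists only to demonstrate a failed verification.

```python
import hashlib


class VerifyingStore:
    """Toy store: checksum each segment on write and verify it afterward."""

    def __init__(self) -> None:
        self._disk: dict[str, tuple[bytes, bytes]] = {}  # key -> (data, checksum)

    def write(self, key: str, data: bytes) -> None:
        checksum = hashlib.sha256(data).digest()  # generate checksum before writing
        self._disk[key] = (bytes(data), checksum)
        if not self.verify(key):  # read back and verify right after the write
            raise OSError(f"verification failed for segment {key!r}")

    def verify(self, key: str) -> bool:
        """Recompute the checksum of the stored data and compare it with the saved one."""
        data, checksum = self._disk[key]
        return hashlib.sha256(data).digest() == checksum

    def corrupt(self, key: str) -> None:
        """Flip one bit of (non-empty) stored data to simulate silent media corruption."""
        data, checksum = self._disk[key]
        self._disk[key] = (bytes([data[0] ^ 0x01]) + data[1:], checksum)


if __name__ == "__main__":
    store = VerifyingStore()
    store.write("seg-1", b"backup segment")
    store.corrupt("seg-1")
    print("intact" if store.verify("seg-1") else "corruption detected")
```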
Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements
• Source: remote sites, each with a Data Domain system
• Destination: data center hub (Data Domain DDX Array with DD880s); supports hundreds of remote sites
• Each remote site sends only 1–5% of its backup and archive data across the WAN: 95–99% cross-site bandwidth reduction
• Flexible replication: one-to-many, many-to-one, bi-directional, system-to-system, cascaded
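The bandwidth reduction comes from sending only the segments the destination does not already hold. The sketch below assumes SHA-256 fingerprints and models the destination as a dictionary from fingerprint to segment. `replicate` and `fingerprint` are hypothetical functions, and the small fingerprint exchange itself is not counted in the bytes sent.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


def replicate(segments: list[bytes], destination: dict[str, bytes]) -> int:
    """Copy missing segments to destination; return segment bytes sent over the WAN."""
    # Step 1: source offers fingerprints, destination reports which it lacks.
    missing = {fingerprint(s) for s in segments} - set(destination)
    sent = 0
    # Step 2: ship each missing segment exactly once.
    for seg in segments:
        fp = fingerprint(seg)
        if fp in missing:
            destination[fp] = seg
            missing.discard(fp)
            sent += len(seg)
    return sent


if __name__ == "__main__":
    segs = [i.to_bytes(4, "big") * 2 for i in range(100)]  # 100 distinct 8-byte segments
    hub = {fingerprint(s): s for s in segs[:97]}  # hub already holds 97 of them
    sent = replicate(segs, hub)
    total = sum(len(s) for s in segs)
    print(f"sent {sent} of {total} bytes ({1 - sent / total:.0%} reduction)")
```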
Industry’s Most Scalable Inline Deduplication Systems
Product lines: DD140 Remote Office Appliance; DD600 Appliance Series (DD610, DD630, DD660, DD690); DD880; Global Deduplication Array (new); DDX Array Series (up to 16 controllers)
Software options: DD Boost, DD Virtual Tape Library, DD Replicator, Retention Lock, and DD Encryption

Model | Speed (other) | Speed (DD Boost) | Logical capacity | Raw capacity | Usable capacity
DD140 | 450 GB/hr | 490 GB/hr | 17–43 TB | 1.5 TB | 0.86 TB
DD610 | 675 GB/hr | 1.3 TB/hr | 75–195 TB | Up to 6 TB | Up to 3.98 TB
DD630 | 1.1 TB/hr | 2.1 TB/hr | 165–420 TB | Up to 12 TB | Up to 8.4 TB
DD660 | 2.0 TB/hr | 2.7 TB/hr | 0.52–1.31 PB | Up to 36 TB | Up to 26.1 TB
DD690 | 2.7 TB/hr | 3.9 TB/hr | 0.71–1.7 PB | Up to 48 TB | Up to 35.3 TB
DD880 | 5.4 TB/hr | 8.8 TB/hr | 2.8–7.1 PB | Up to 192 TB | Up to 142.5 TB
Global Deduplication Array | not listed | 12.8 TB/hr | 5.7–14.2 PB | Up to 384 TB | Up to 285 TB
DDX Array | 86.4 TB/hr | 140 TB/hr | 45.6–114 PB | Up to 3.07 PB | Up to 2.28 PB
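A little arithmetic checks two relationships in the table. First, the DDX Array figures are 16 times the DD880 figures, matching "up to 16 controllers". Second, dividing logical capacity by usable capacity exposes the deduplication ratio the logical figures assume, roughly 20x to 50x. The sketch below assumes decimal units (1 PB = 1,000 TB) and uses hypothetical variable names.

```python
import math

# DD880 figures from the table (decimal units: 1 PB = 1,000 TB).
DD880_OTHER_TB_HR = 5.4
DD880_BOOST_TB_HR = 8.8
DD880_RAW_TB = 192
DD880_USABLE_TB = 142.5
DD880_LOGICAL_PB = (2.8, 7.1)
DDX_CONTROLLERS = 16  # "Up to 16 controllers"


def tb_per_hr_to_gb_per_s(tb_per_hr: float) -> float:
    """Convert a throughput in TB/hr to GB/s."""
    return tb_per_hr * 1000 / 3600


# A fully populated DDX Array scales the DD880 figures linearly.
ddx_other_tb_hr = DDX_CONTROLLERS * DD880_OTHER_TB_HR      # table lists 86.4 TB/hr
ddx_boost_tb_hr = DDX_CONTROLLERS * DD880_BOOST_TB_HR      # 140.8; table lists 140 TB/hr
ddx_raw_pb = DDX_CONTROLLERS * DD880_RAW_TB / 1000         # table lists 3.07 PB
ddx_usable_pb = DDX_CONTROLLERS * DD880_USABLE_TB / 1000   # table lists 2.28 PB

# Logical capacity divided by usable capacity = assumed reduction ratio.
low_ratio, high_ratio = (pb * 1000 / DD880_USABLE_TB for pb in DD880_LOGICAL_PB)

if __name__ == "__main__":
    print(f"DD880: {tb_per_hr_to_gb_per_s(DD880_OTHER_TB_HR):.2f} GB/s")
    print(f"DDX: {ddx_other_tb_hr:.1f} TB/hr, {ddx_usable_pb:.2f} PB usable")
    print(f"logical capacity assumes ~{low_ratio:.0f}x to ~{high_ratio:.0f}x reduction")
```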
Why Data Domain?
• Less disk to resource, less to manage
  – CPU-centric deduplication
  – Inline
  – Green
• Simple, mature, and flexible
  – Simple, mature appliance
  – Nearline tier: any fabric, any software, backup or nearline applications
• Resilience and disaster recovery
  – Storage of last resort
  – Cross-site global compression: data center or remote office