An Overview of Database Archiving
Toronto DAMA Chapter Meeting, 16 September, 2009
Jack E. Olson
www.svaltech.com
SvalTech
“Database Archiving: How to Keep Lots of Data for a Long Time,” Jack E. Olson, Morgan Kaufmann, 2008
Copyright SvalTech, Inc., 2009
Topics
• Database Archiving Definitions
• Database Archiving Application Profiles
• Elements of a Successful Implementation
• Solution Comparisons
• Business Case Basics
Database Archiving Definitions
Definition
• Document Archiving: Word, PDF, Excel, XML
• File Archiving: structured files, source code, reports
• Email Archiving: Outlook, Lotus Notes
• Database Archiving: DB2, IMS, Oracle, SAP, PeopleSoft
• Physical Documents: application forms, mortgage papers, prescriptions
• Multi-media files: pictures, sound, telemetry
Database archiving: the process of removing from operational databases selected data items that are not expected to be referenced again, and storing them in an archive database where they can be retrieved if needed.
Data Sources
Data created and maintained by either custom applications or packaged applications that store data in structured database management systems or structured records in file systems.
• Kinds of data: transaction data, reference data
• Examples: DB2, SAP, IMS, Oracle Financials, ADABAS, Siebel, IDMS, PeopleSoft, Oracle, SQL Server, VSAM
Data Domain – Business Records
A business record is the data captured and maintained for a single business event or to describe a single real-world object.
Databases are collections of Business Records.
Database Archiving is Records Retention.
Examples: customer, employee, stock trade, purchase order, deposit, loan payment
Drivers
• Overloaded operational databases
• Longer data retention requirements
• Expanded business
• Mergers and acquisitions
• Operational problems
• Data governance, e-records retention, and e-discovery readiness concerns
• Application changes
Data Retention
The requirement to keep data for a business object for a specified period of time. The object cannot be destroyed until after the time for all such requirements applicable to it has passed.
• Business Requirements
• Regulatory Requirements
The Data Retention requirement for an object is the longest of all the requirement lines that apply to it.
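The "longest of all requirement lines" rule can be sketched in a few lines of Python. This is an illustration only: the requirement names and year counts are hypothetical, and it assumes every requirement is measured from the record's creation date, which the slides do not state.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_discard_date(created: date, requirement_years: dict[str, int]) -> date:
    """The governing retention requirement is the longest one that applies."""
    return add_years(created, max(requirement_years.values()))


# Hypothetical requirement lines for one business object.
requirements = {"business": 3, "tax regulation": 7, "industry regulation": 10}
print(earliest_discard_date(date(2009, 9, 16), requirements))  # 2019-09-16
```

Adding a new regulation only adds an entry to the dictionary; the discard date moves only if the new line is the longest.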
Data Retention
• Retention requirements vary by business object type
• Retention requirements from regulations are exceeding business requirements
• Retention requirements will vary by country
• Retention requirements imply the obligation to maintain the authenticity of the data throughout the retention period
• Retention requirements imply the requirement to faithfully render the data on demand in a common business form understandable to the requestor
• The most important business objects tend to have the longest retention periods
• The data with the longest retention periods tends to accumulate the largest number of instances
• Retention requirements often exceed 10 years. Requirements exist for 25, 50, 70 and more years for some applications
Data Time Lines
For a single instance of a business record:
create event → operational phase → reference phase → inactive phase → discard event
• Operational phase: can be updated, can be deleted, may participate in processes that create or update other data
• Reference phase: used for business reporting, extracted into business intelligence or analytic databases, anticipated queries
• Inactive phase: no expectation of being used again, no known business value, being retained solely for the purpose of satisfying retention requirements. Must be available on request in the rare event a need arises.
Data Time Lines
For a single instance of a business record:
• Operational: Create PO, Update PO, Create Invoice, Backorder, Create Financial Record, Update on Ship, Update on Ack
• Reference: Weekly Sales Report, Quarterly Sales Report, Extract for data warehouse, Extract for business analysis, Common customer queries, Common business queries
• Inactive: Ad hoc requests, Lawsuit e-Discovery requests, Investigation data gathering
Retention requirement: spans the operational, reference, and inactive phases
Data Time Lines
• Some objects exit the operational phase almost immediately (financial records)
• Some objects never exit the operational phase (customer name and address)
• Most transaction data has an operational phase of less than 10% of the retention requirement and a reference phase of less than 20% of the retention requirement
• Inactive data generally does not require access to application programs: only access to ad hoc search and extract tools
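To see what these proportions imply, the following sketch computes the share of the retention period a record spends in each phase. The function name and the sample figures (a 10-year retention requirement, 6 months operational, 18 months reference) are assumptions chosen to fit the "less than 10%" and "less than 20%" bounds above; they are not taken from the presentation.

```python
def phase_shares(retention_years: float, operational_years: float,
                 reference_years: float) -> dict[str, float]:
    """Fraction of the retention period spent in each phase of the time line."""
    inactive_years = retention_years - operational_years - reference_years
    if inactive_years < 0:
        raise ValueError("active phases are longer than the retention period")
    return {
        "operational": operational_years / retention_years,
        "reference": reference_years / retention_years,
        "inactive": inactive_years / retention_years,
    }


# Hypothetical transaction record: 10-year retention requirement.
for phase, share in phase_shares(10, 0.5, 1.5).items():
    print(f"{phase}: {share:.0%}")
```

With these sample figures the record is inactive for most of its retained life, which is the portion a database archive is meant to hold.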
Application Segments
An application segment is a set of business objects generated from a single version of an application where all business records in the segment have data consistent with a single metadata definition.
A metadata break is a point in the life of the operational database where a change in metadata is implemented that changes the structure of the data or the manner in which data is encoded.
• An application will have many segments over time
• Minor changes in metadata can sometimes be implemented without forcing a segment change
• Major metadata changes will always generate a segment change where data created in the previous segment cannot be recast to the new metadata definition without some compromise in the data
• Application segments can be generated in parallel with one operational implementation using one version of the application at the same time that another operational instance is using a different version of the application
Application Segments
[Figure: segment timelines for the application "customer stock transactions"; markers on each timeline denote major metadata breaks.]
• Case 1: one operational system (OS1) with segment string S1. Source 1 = Trades – All Stock Trades
• Case 2: two operational systems (OS1, OS2) with segment strings S1 and S2. Source 1 = Stock Trades – North American Division; Source 2 = Stock Trades – Western Europe Division
Application Segments
[Figure: case 3 segment timelines for operational systems OS1 to OS4 of the application "customer stock transactions"; markers on each timeline denote major metadata breaks.]
• Source 1 = Stock Trades – North American Division – application X
• Source 2 = Stock Trades – Western Europe Division – application Y
• Source 3 = acquisition of Trader Joe: merged with Source 1 on 7/15/2009
• Source 4 = acquisition of Trader Pete: merged with Source 1 on 8/15/2009
Application Segments
• A well designed database archive preserves application segments
– Data is always kept in segment format
– Metadata is preserved at the segment level
– The archive administrative catalog shows
  • Segments
    – Segment version number
    – Time period covered
    – System generated from
  • Time order of consecutive segment strings
  • Parallel segment strings for the same application
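One way to picture the catalog entries listed above is as a small record type plus a grouping step. This is a minimal sketch, not the design of any particular product: the `Segment` fields, the function name, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Segment:
    application: str
    source: str           # system the segment was generated from
    version: int          # segment version number; a new one starts at each metadata break
    start: date           # time period covered
    end: Optional[date]   # None while the segment is still receiving data


def segment_strings(catalog: list[Segment]) -> dict[str, list[Segment]]:
    """Group segments by source system and keep each string in time order."""
    strings: dict[str, list[Segment]] = {}
    for seg in sorted(catalog, key=lambda s: s.start):
        strings.setdefault(seg.source, []).append(seg)
    return strings


app = "customer stock transactions"
catalog = [
    Segment(app, "OS2", 1, date(2005, 1, 1), None),
    Segment(app, "OS1", 2, date(2007, 6, 1), None),
    Segment(app, "OS1", 1, date(2004, 1, 1), date(2007, 5, 31)),
]
for source, segs in segment_strings(catalog).items():
    print(source, [s.version for s in segs])
```

Each key of the result is one segment string; several keys for the same application correspond to parallel segment strings.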
Database Archiving Application Profiles
Overloaded Operational Database
• Transaction data
• Lots of data
– Hundreds of millions of rows
– High daily transaction rate
• 24/7 operational availability requirement
• Long retention period (7 years or more)
• Short useful active life (less than 2 years)
• Low access requirements during the inactive period
– Very low access frequency
– Response time not critical
– Access requirements are simple, easily satisfied with ad hoc tools
If You Don’t Archive
Continue to keep all data in the operational database.
• Inactive data will impact operational performance
– Harder to tune
– Scans take longer
• Utility functions will take longer to execute
– Backups
– Database reorganizations
• Recovery operations take longer
– Outage recoveries
– Disaster recoveries
• System costs will escalate
– Need more expensive online storage
– Need system upgrades
– Pay more for application and DBMS software
• Older data will become less reliable
Retired Application
• Merger of companies results in an operational application being duplicated
• Data Structures are not compatible
– One keeps data elements not in other
– One encodes data elements differently
– One designed for different OS/DBMS than other
• Decision is made to use one system and abandon the other one
• Meets all requirements of an operational application
If You Don’t Archive
• Must retain old application environment to access data
– Old system
– Old application program
– Old DBMS
– Must keep knowledgeable staff to access
  • Application experts
  • System experts
  • DBA function
– Pay the high cost of the old application environment and staff until the last record reaches the end of its retention period ($$$$)
• Or, must merge data into active application
– Higher cost and time of conversion
  • Data conversion problems
  • Data loss
  • Resolution of data quality issues
– Resulting database is huge
  • Operational problems
  • Lengthy utility runs
  • Lengthy recovery periods
  • Escalating system costs
Application Renovation Project
• Application is undergoing major change
– Replaced with packaged application
– Legacy modernization
– Legacy termination
– Rewritten to be web-centric
– Need to satisfy new requirements
• Old data structures are out of date
– Legacy DBMS
– Legacy file system
• Data meets all other requirements for archiving an operational application
If You Don’t Archive
• Must convert all data in one system to the other system
• More expensive and complex design phase
– Must accommodate old data in new design
– May compromise new design
• Costlier and longer conversion period
– Data conversion problems
– Data loss
– Resolution of data quality issues
• Resulting data is less reliable
Elements of a Successful Implementation
Archive Staff
• Database Archive Specialist
– Received education on database archive design and implementation
– Knows tools available
– Experienced
– Full time job
• Database Archive Administrator
– Received education on database archiving administration
– Full time job
• Supporting Roles
– Storage Administrators
– Database Administrators
– Data Stewards
– Security Administrators
– Compliance staff
– IT management
– Business Unit Management
Architecture of Database Archiving
[Figure: architecture diagram.]
• Operational System: application program, operational database (OP DB), archive extractor
• Archive Server: Archive Administrator, Archive Designer, Archive Data Manager, Archive Access Manager, archive catalog, archive storage
Archive Designer
• Metadata
– Capture current metadata
– Validate it
– Enhance it
– Design archive storage format
• Data
– Define business records to be archived
– Define source of data
– Define data structures within operational system
– Define reference data needed to include with it
– Define archive format of data
• Policies
– Define extract policy (when a record becomes inactive)
– Define operational disposal policy (when to remove from operational database)
– Define storage policy (how to protect data in archive)
– Define discard policy (when to remove from archive)
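The policies above can be thought of as declarative settings that the extractor and the data manager evaluate later. The sketch below is an assumption about how such a definition might look: `ArchivePolicy`, its fields, and the thresholds are hypothetical, and the storage policy is left out because it concerns protection rather than timing.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivePolicy:
    inactive_after_days: int   # extract policy: when a record becomes inactive
    delete_from_source: bool   # operational disposal policy: delete (True) or only mark (False)
    retention_days: int        # discard policy: when to remove from the archive

    def ready_to_extract(self, last_activity: date, today: date) -> bool:
        return today - last_activity >= timedelta(days=self.inactive_after_days)

    def ready_to_discard(self, created: date, today: date) -> bool:
        return today - created >= timedelta(days=self.retention_days)


# Hypothetical policy: inactive after 2 years, discard after roughly 7 years.
policy = ArchivePolicy(inactive_after_days=730, delete_from_source=True,
                       retention_days=7 * 365)
today = date(2009, 9, 16)
print(policy.ready_to_extract(date(2007, 1, 1), today))  # True
print(policy.ready_to_discard(date(2005, 1, 1), today))  # False
```

Keeping the rules in one immutable object makes it easier to record which policy version was in force when a record was extracted or discarded.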
Archive Extractors
• Extractor process
– Verify consistency with design metadata
– Extract data as defined in designer
– Mark or delete from operational database as defined in designer
– Pass data to archive data manager
– Keep audit records on everything done
– Do not impact operational performance
– Support interruptions with transaction level recovery
– Support restart
– Finish scans within acceptable time periods
• Scheduling
– Establish periodic executions
– Find non-disruptive periods
– Be consistent
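Several of the extractor duties listed above (small units of work, audit records, safe restart) can be illustrated with Python's built-in `sqlite3` module. Everything specific here is assumed for the example: the `orders` and `archive_audit` tables, the cutoff date, and the `store` callback standing in for the archive data manager. The sketch also assumes `store` tolerates receiving the same batch twice, since a crash between the hand-off and the delete would resend it on restart.

```python
import sqlite3
from typing import Callable


def extract_inactive_orders(op_db: sqlite3.Connection, cutoff: str,
                            store: Callable[[list], None], batch_size: int = 2) -> int:
    """Move orders closed before `cutoff` to the archive in small batches."""
    moved = 0
    while True:
        rows = op_db.execute(
            "SELECT id, customer, closed_on FROM orders "
            "WHERE closed_on < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        ).fetchall()
        if not rows:
            return moved
        store(rows)  # hand the batch to the archive first; delete only afterwards
        ids = [(row[0],) for row in rows]
        with op_db:  # one transaction per batch: commit on success, roll back on error
            op_db.executemany("DELETE FROM orders WHERE id = ?", ids)
            op_db.executemany(
                "INSERT INTO archive_audit (order_id, action) VALUES (?, 'extracted')", ids)
        moved += len(rows)


op_db = sqlite3.connect(":memory:")
op_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, closed_on TEXT)")
op_db.execute("CREATE TABLE archive_audit (order_id INTEGER, action TEXT)")
op_db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "acme", "2006-03-01"), (2, "acme", "2007-01-15"),
    (3, "zenith", "2007-05-20"), (4, "zenith", "2009-08-30"),
])
op_db.commit()

archive: list = []
moved = extract_inactive_orders(op_db, "2007-09-16", archive.extend)
remaining = op_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(moved, remaining)  # 3 1
```

Because each pass re-selects whatever still qualifies, an interrupted run can simply be started again; the small batch size keeps each transaction short so operational work is not blocked for long.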
Archive Extractors: Physical vs. Application Extractors
[Figure: operational system containing the application program, the operational database (OP DB), and the two kinds of archive extractor.]
• Physical Extractor: gets/deletes data directly from the database (tables, rows, columns)
• Application Extractor: gets/deletes data through an application API exposed by the application program (virtual tables, rows, columns)
Archive Data Manager
• Put data away
– Receive data from extractors
– Format into archive segment files
– Determine metadata version affinity
– Format and store metadata files if new
– Build or update segment indexes, both internal and external
• Execute storage policies
– Encryption/signatures
– Backup copies created and stored
– Geographic dispersion of backups
– Register archive files with archive catalog
– Enter audit trail information
• Fetch metadata on request
– Return to accessing programs
• Fetch data on request
– Scan archive segments
– Search through indexes
• Execute archive discard process
– Periodic scheduling
– Delete qualifying business records
– Update archive catalog
Archive Access
• Query Capability
– Determine applicability based on archive segment versions of metadata
– SQL based is best, if possible
– Employ external indexes to determine which archive segments to look into
– Employ internal indexes to avoid reading all of an archive segment
• Support standard access tools
– Report generation (such as Crystal Reports)
– Generic query tools
– JDBC interface
• Support metadata version browsing
• Support generation of load files based on query results
• Support generation of load files for the original data source based on query results
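The role of an external index, deciding which archive segments a query needs to open at all, can be sketched as a range-overlap check. The index layout, segment names, and dates below are hypothetical; a real archive would likely index more than one attribute.

```python
from datetime import date

# Hypothetical external index: segment id -> (earliest, latest) trade date it holds.
external_index = {
    "trades-v1": (date(2001, 1, 1), date(2004, 12, 31)),
    "trades-v2": (date(2005, 1, 1), date(2007, 6, 30)),
    "trades-v3": (date(2007, 7, 1), date(2009, 9, 16)),
}


def segments_to_scan(index: dict[str, tuple[date, date]],
                     start: date, end: date) -> list[str]:
    """Return ids of segments whose date range overlaps [start, end]."""
    return [seg for seg, (lo, hi) in index.items() if lo <= end and start <= hi]


print(segments_to_scan(external_index, date(2004, 6, 1), date(2005, 3, 31)))
# ['trades-v1', 'trades-v2']
```

Only the segments returned here would then be searched through their internal indexes, each according to its own metadata version.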
Archive Administration
• Manage Archive Catalog
– Application archive designs
– Audit trails
– Results logs
• Manage Archive Storage Systems
– Ensure periodic readability checks
– Maintain access audit trails
• Manage Archive Access
– Authorizations for users
– Authorizations for specific events
  • Unloads
– Ensure audit records are created for all access
• Manage e-Discovery requests
• Ensure Extract and Discard processes are run when scheduled
• Manage Metadata Change Process
Solution Comparisons
How Archive Data is Stored
• Database LOAD files, saved image copies
• Parallel databases, partitions of operational db
• Reformatted archive segments stored as files: load files, XML files, special files
The slide groups these approaches under two labels: typically homegrown solutions and typically vendor solutions.
Storage Comparisons
• DB Solutions (parallel, partitioned, db arrays)
– Don’t get $$$$ savings
– Requires database administration
– Problems handling metadata changes
• Backup Solutions (image copies, unload files)
– Requires restaging data to access
– Not searchable in archive
– Problems handling metadata changes
• Non-DB Special files (XML, load files plus, proprietary)
– Indexed
– Direct access via SQL
– Separated by archive segments
– Metadata resolution across archive segments
– Can exploit storage subsystem capabilities
– Can use hosted storage
Data Structure Comparisons
Things to Look for
• Is metadata maintained in archive
• Is metadata validated
• Is metadata enhanced
• Is data restructured to achieve source independence
– from application programs
– from DBMS type
– from source OS/hardware
• Is reference information captured in archive
• Is data maintained in original form in archive forever
• Can user see data form prior to conversions
Data Access in the Archive
Things to Look for
• Can requests be satisfied directly from the archive
• Can common generic tools be used
– JDBC
– Report generators
• Can data be unloaded in forms for re-platforming
• Can data be accessed efficiently
– Is it indexed
• Are representations consistent
• Are metadata differences accounted for
Administration of the Archive
Things to Look for
• Is there a full time administrator
• Is there an archive catalog database
– what is in the archive
– where is it stored
• Is security maintained
– different from operational
• Are actions and events logged
A Myth
Homegrown Solutions are good enough.
Truth:
They do solve the problem of getting inactive data out of operational databases
However,
They do not realize maximum cost savings
They generally do not realize any cost savings
They generally cannot be directly accessed
They often require original application environment
They are never indexed
They often compromise data integrity across metadata changes
They often offer less protection from data loss
A Myth
Homegrown Solutions are cheaper and faster to implement.
Truth:
A good vendor solution will guide you through the process and get it done quickly
Managing the archive is easier and cheaper than managing databases
Business Case Basics
An Assertion
To get the benefits from database archiving
• improved operational efficiency
• better data governance
• lower risks
it does not need to cost a penny.
If done properly, database archiving can realize cost benefits larger than the cost of implementing and maintaining the archive. In most cases the savings alone can justify database archiving.
Reason for Archiving
[Figure: size today. One operational database holding all data, compared with a smaller operational database plus an archive.]
• All data in operational db: most expensive system, most expensive storage, most expensive software
• Inactive data in archive db: least expensive system, least expensive storage, least expensive software
• In a typical operational db, 60-80% of data is inactive, and this percentage is growing
Cost Saving Elements
Look for and compute difference in storage costs
front-line vs archive storage
byte counts differences between operational and archive
Look for and compute difference in system costs
operational vs archive systems
are operational system upgrades avoided
are software upgrades avoided
can systems be eliminated for application
can software be eliminated for application
Look for savings on people costs
can people be eliminated or redirected for retired applications
Potential savings on changes/ application renovations
simplification of design
elimination of data conversions
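The storage part of this comparison is simple arithmetic, sketched below. All figures are invented for illustration (database size, per-GB costs, and the byte-count ratio between archive and operational form); only the 70% inactive share is picked from within the 60-80% range quoted earlier. System, software, and people costs would be added in the same way.

```python
def annual_storage_savings(total_gb: float, inactive_fraction: float,
                           frontline_cost_per_gb: float, archive_cost_per_gb: float,
                           archive_size_ratio: float = 1.0) -> float:
    """Yearly saving from moving inactive data off front-line storage.

    archive_size_ratio is the archive byte count divided by the operational
    byte count for the same data (below 1.0 if the archive format is more compact).
    """
    inactive_gb = total_gb * inactive_fraction
    cost_if_kept = inactive_gb * frontline_cost_per_gb
    cost_in_archive = inactive_gb * archive_size_ratio * archive_cost_per_gb
    return cost_if_kept - cost_in_archive


# Hypothetical figures: 1 TB database, 70% inactive, costs per GB per year.
saving = annual_storage_savings(total_gb=1000, inactive_fraction=0.7,
                                frontline_cost_per_gb=20.0, archive_cost_per_gb=5.0,
                                archive_size_ratio=0.5)
print(f"{saving:,.0f} per year")
```

Running the same function with an organization's own measurements gives the storage line of the business case; a negative result would mean storage alone does not justify the archive.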
Operational Efficiency Impacts
Will operational performance be enhanced with less data
Will utility time periods be reduced (backup, reorganization)
fewer occurrences needed
less data to process each time
Will recovery times be reduced and what is that worth
interruption recoveries
disaster recoveries
Will implementation of data structure changes be improved
avoided
reduced amount of data to unload/modify/reload
Risk Factors
Will the saved data have better authenticity
not changed in archive
shielded from updates or damage
traceable back to original form
Will e-Discovery benefit from archiving
can locate and process data outside of operational environment
can easily create legal-hold archive units
Will exposure of data be reduced
fewer authorized users against the archive
complete audit trails of all access
Business Case Summary
• Database Archiving solutions generally provide for lower cost software, can use lower cost storage more efficiently, and run on smaller machines.
• Each business case is different
– Many factors can be used in building a business case
– Seen an application justified on storage costs alone
– Seen an application justified on disaster recovery time alone
– Seen an application justified on better data security alone
• Each organization will have many potential applications
• Having a database archiving practice can create synergies across many applications, thus adding more value
Final Thoughts
• Database Archiving is coming
• Database Archiving is good
– Reduces cost
– Improves operational efficiency
– Reduces risk
• Need a complete solution to be effective
• Need professional staff
– Educated
– Full time