2003
Swiss Federal Archives 1
Archiving Snapshots or Transactions
Extracting the right data at the right time from temporal databases
Niklaus Bütikofer, Swiss Federal Archives, 10 April 2003
Types of Databases
• Database with additions only
• Database with amendments ("dynamic databases")
  o Snapshot database
  o Temporal database
    - Valid-time database
    - Transaction-time database
    - Bitemporal database
  o Mixed snapshot and temporal database
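The three kinds of temporal database differ in which time dimension each row carries: valid time records when a fact was true in the real world, transaction time records when the database held the row, and a bitemporal row carries both. The sketch below is a minimal illustration using Python dataclasses; apart from PERS_ID, ADDRESS, FROM_DATE and TO_DATE, the field names (`recorded_from`, `recorded_to`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative row layouts. Only PERS_ID, ADDRESS, FROM_DATE and TO_DATE come
# from the slides; recorded_from/recorded_to are hypothetical names.
# None in an end field stands for "now" (still open).

@dataclass(frozen=True)
class SnapshotRow:
    pers_id: int
    address: str                  # only the current state is stored

@dataclass(frozen=True)
class ValidTimeRow:
    pers_id: int
    address: str
    from_date: date               # fact became true in the real world
    to_date: Optional[date]       # fact stopped being true (None = "now")

@dataclass(frozen=True)
class TransactionTimeRow:
    pers_id: int
    address: str
    recorded_from: date           # row was stored in the database
    recorded_to: Optional[date]   # row was overwritten or deleted (None = still current)

@dataclass(frozen=True)
class BitemporalRow:
    pers_id: int
    address: str
    from_date: date               # valid time
    to_date: Optional[date]
    recorded_from: date           # transaction time
    recorded_to: Optional[date]

row = ValidTimeRow(215, "Rathausgasse 6", date(1990, 8, 1), date(1995, 9, 30))
```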
Snapshot and valid-time databases
Snapshot Database
PERS_ID  ADDRESS
215      Sonnegg 11
309      Thunstrasse 1
Valid-time Database
PERS_ID  ADDRESS          FROM_DATE   TO_DATE
215      Rathausgasse 6   1990-08-01  1995-09-30
215      Sonnegg 11       1995-10-01  "now"
309      Kramgasse 9      1990-08-01  1994-12-31
309      Belpstrasse 23   1995-01-01  1998-04-30
309      Thunstrasse 1    1998-05-01  "now"
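The snapshot table is the time-slice of the valid-time table taken on one particular day. A minimal sketch of that relationship, assuming Python tuples for the rows, `None` for "now", and inclusive FROM_DATE/TO_DATE bounds (the adjacent dates 1995-09-30 and 1995-10-01 suggest inclusive bounds, but the slides do not say so); `time_slice` is a hypothetical helper name.

```python
from datetime import date

# Rows of the valid-time table; None stands for "now".
VALID_TIME = [
    (215, "Rathausgasse 6", date(1990, 8, 1), date(1995, 9, 30)),
    (215, "Sonnegg 11",     date(1995, 10, 1), None),
    (309, "Kramgasse 9",    date(1990, 8, 1), date(1994, 12, 31)),
    (309, "Belpstrasse 23", date(1995, 1, 1), date(1998, 4, 30)),
    (309, "Thunstrasse 1",  date(1998, 5, 1), None),
]

def time_slice(rows, t):
    """Return {PERS_ID: ADDRESS} for rows valid on day t (bounds inclusive)."""
    return {
        pers_id: address
        for pers_id, address, from_date, to_date in rows
        if from_date <= t and (to_date is None or t <= to_date)
    }

print(time_slice(VALID_TIME, date(2003, 4, 10)))
# {215: 'Sonnegg 11', 309: 'Thunstrasse 1'}
```

The result for 2003-04-10 matches the snapshot table, which is why a snapshot can always be derived from a valid-time database but not the other way round.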
Archiving snapshot databases
• Snapshots do not tell us when changes occurred
• Certain facts can disappear completely
[Figure: timeline with Snapshot 1 taken at t1 and Snapshot 2 taken at t2, over the address history (PERS_ID, ADDRESS) of person 215 (Rathausgasse 6, Sonnegg 11) and person 309 (Kramgasse 9, Belpstrasse 23, Thunstrasse 1)]
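The second point can be checked against the valid-time rows. The slide does not give t1 and t2, so the dates below are hypothetical: with snapshots taken on 1994-06-30 and 2000-01-01, person 309's address Belpstrasse 23 (valid 1995-01-01 to 1998-04-30) appears in neither. `time_slice` is a hypothetical helper returning the rows valid on a given day.

```python
from datetime import date

VALID_TIME = [
    (215, "Rathausgasse 6", date(1990, 8, 1), date(1995, 9, 30)),
    (215, "Sonnegg 11",     date(1995, 10, 1), None),
    (309, "Kramgasse 9",    date(1990, 8, 1), date(1994, 12, 31)),
    (309, "Belpstrasse 23", date(1995, 1, 1), date(1998, 4, 30)),
    (309, "Thunstrasse 1",  date(1998, 5, 1), None),
]

def time_slice(rows, t):
    """Snapshot of the rows valid on day t; None as TO_DATE means "now"."""
    return {p: a for p, a, f, to in rows if f <= t and (to is None or t <= to)}

# Hypothetical snapshot dates (not given on the slide).
t1, t2 = date(1994, 6, 30), date(2000, 1, 1)
seen = set(time_slice(VALID_TIME, t1).items()) | set(time_slice(VALID_TIME, t2).items())

# Facts that existed in the history but were caught by neither snapshot.
lost = [(p, a) for p, a, _, _ in VALID_TIME if (p, a) not in seen]
print(lost)  # [(309, 'Belpstrasse 23')]
```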
Archiving logfiles?
• DBMSs record all transactions in system logs (journals)
• Purpose: recovery and auditing
[Figure: timeline with Logfile 1, Logfile 2 and Logfile 3 between Snapshot 1 at t1 and Snapshot 2 at t2; arrows labelled "Roll forward" and "Roll backward"]
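Rolling forward and backward works because each journal entry records the state before and after a change. The sketch below assumes a simplified, hypothetical journal format of `(PERS_ID, before_image, after_image)` tuples applied to an in-memory dictionary; real DBMS journals are system-specific and are replayed by the DBMS itself.

```python
# Hypothetical journal entry: (pers_id, before_image, after_image).
# before_image None = insert; after_image None = delete.

def roll_forward(snapshot, log):
    """Redo every change in order, starting from the older snapshot."""
    state = dict(snapshot)
    for pers_id, _before, after in log:
        if after is None:
            state.pop(pers_id, None)
        else:
            state[pers_id] = after
    return state

def roll_backward(snapshot, log):
    """Undo every change in reverse order, starting from the newer snapshot."""
    state = dict(snapshot)
    for pers_id, before, _after in reversed(log):
        if before is None:
            state.pop(pers_id, None)
        else:
            state[pers_id] = before
    return state

snapshot_1 = {215: "Rathausgasse 6", 309: "Kramgasse 9"}
logfile = [
    (309, "Kramgasse 9", "Belpstrasse 23"),
    (215, "Rathausgasse 6", "Sonnegg 11"),
    (309, "Belpstrasse 23", "Thunstrasse 1"),
]
snapshot_2 = roll_forward(snapshot_1, logfile)
print(snapshot_2)  # {215: 'Sonnegg 11', 309: 'Thunstrasse 1'}
assert roll_backward(snapshot_2, logfile) == snapshot_1
```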
Using logfiles for archives
• Even when standard logfiles are not binary but consist of SQL statements written in ASCII, they depend on how SQL is implemented in a given system.
• Standard logfiles can only be used for automatic rollback or roll forward in their original system.
• In archives, standard logfiles are usable only for "manual" verification of single facts.
• How well would standard SQL logfiles work?
Archiving temporal databases
• Temporal databases contain the complete history of valid states and/or transactions.
• For reasons of performance and/or compliance (e.g. with privacy regulations), data must periodically be deleted or archived.
• Solutions:
  1. Archive all rows (tuples) that are non-current at a given time (all rows with TO_DATE before YYYY-MM-DD), then delete them from the database.
  2. Archive snapshots combined with a delete procedure.
Archiving temporal databases (1)
[Figure: timeline; at t1, all rows (tuples) with TO_DATE before YYYY-MM-DD are archived. Rows shown (PERS_ID, ADDRESS): person 215 (Rathausgasse 6, Sonnegg 11), person 309 (Kramgasse 9, Belpstrasse 23, Thunstrasse 1) and person 547 (Archivstrasse 24)]
• No complete time-slice is possible within the archived package
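Solution 1 and its caveat can be reproduced with the valid-time rows plus person 547. The cutoff 1999-01-01 stands in for the slide's YYYY-MM-DD and is hypothetical, as are the helper names. The archived rows alone yield a time-slice for 1994-06-30 that lacks person 547, whose row is still current and therefore remains in the live database.

```python
from datetime import date

VALID_TIME = [
    (215, "Rathausgasse 6",   date(1990, 8, 1), date(1995, 9, 30)),
    (215, "Sonnegg 11",       date(1995, 10, 1), None),
    (309, "Kramgasse 9",      date(1990, 8, 1), date(1994, 12, 31)),
    (309, "Belpstrasse 23",   date(1995, 1, 1), date(1998, 4, 30)),
    (309, "Thunstrasse 1",    date(1998, 5, 1), None),
    (547, "Archivstrasse 24", date(1990, 8, 1), None),
]

def time_slice(rows, t):
    """Rows valid on day t as {PERS_ID: ADDRESS}; None as TO_DATE means "now"."""
    return {p: a for p, a, f, to in rows if f <= t and (to is None or t <= to)}

def archive_non_current(rows, cutoff):
    """Split rows into (archived, kept): archived rows have TO_DATE before cutoff."""
    archived = [r for r in rows if r[3] is not None and r[3] < cutoff]
    kept = [r for r in rows if r[3] is None or r[3] >= cutoff]
    return archived, kept

archived, kept = archive_non_current(VALID_TIME, date(1999, 1, 1))  # hypothetical cutoff
print(sorted(a for _, a, _, _ in archived))
# ['Belpstrasse 23', 'Kramgasse 9', 'Rathausgasse 6']
print(time_slice(archived, date(1994, 6, 30)))
# {215: 'Rathausgasse 6', 309: 'Kramgasse 9'}   (person 547 is missing)
print(time_slice(VALID_TIME, date(1994, 6, 30)))
# {215: 'Rathausgasse 6', 309: 'Kramgasse 9', 547: 'Archivstrasse 24'}
```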
Archiving temporal databases (2)
[Figure: timeline with Snapshot 1 at t1 and Snapshot 2 at t2, over the address history (PERS_ID, ADDRESS) of person 215 (Rathausgasse 6, Sonnegg 11), person 309 (Kramgasse 9, Belpstrasse 23, Thunstrasse 1) and person 547 (Archivstrasse 24)]
Archiving snapshots from temporal databases (II)
Valid-time Database
PERS_ID  ADDRESS           FROM_DATE   TO_DATE
215      Rathausgasse 6    1990-08-01  1995-09-30
215      Sonnegg 11        1995-10-01  "now"
309      Kramgasse 9       1990-08-01  1994-12-31
309      Belpstrasse 23    1995-01-01  1998-04-30
309      Thunstrasse 1     1998-05-01  "now"
547      Archivstrasse 24  1990-08-01  "now"
S1 / delS1
S1 = Snapshot 1
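One reading of "S1 / delS1", stated here as an assumption: S1 archives a full copy of the temporal table, including rows that are still current, and delS1 then deletes the rows whose TO_DATE lies before the snapshot time. Under that reading each archived package contains a complete time-slice for every date it covers, unlike solution 1. A minimal sketch with hypothetical helper names and a hypothetical snapshot date:

```python
from datetime import date

def time_slice(rows, t):
    """Rows valid on day t as {PERS_ID: ADDRESS}; None as TO_DATE means "now"."""
    return {p: a for p, a, f, to in rows if f <= t and (to is None or t <= to)}

def snapshot_and_delete(table, t):
    """S1: archive a full copy of the table; delS1: delete rows with TO_DATE before t."""
    snapshot = list(table)  # package holds the history AND the current rows
    table[:] = [r for r in table if r[3] is None or r[3] >= t]
    return snapshot

table = [
    (215, "Rathausgasse 6",   date(1990, 8, 1), date(1995, 9, 30)),
    (215, "Sonnegg 11",       date(1995, 10, 1), None),
    (309, "Kramgasse 9",      date(1990, 8, 1), date(1994, 12, 31)),
    (309, "Belpstrasse 23",   date(1995, 1, 1), date(1998, 4, 30)),
    (309, "Thunstrasse 1",    date(1998, 5, 1), None),
    (547, "Archivstrasse 24", date(1990, 8, 1), None),
]

s1 = snapshot_and_delete(table, date(1999, 1, 1))  # hypothetical snapshot date
print(len(s1), len(table))  # 6 3
print(time_slice(s1, date(1994, 6, 30)))
# {215: 'Rathausgasse 6', 309: 'Kramgasse 9', 547: 'Archivstrasse 24'}
```

The price is redundancy: current rows are copied into every package until they become non-current.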
When should snapshots be extracted?
• Snapshot databases:
  o The frequency of snapshots depends on the frequency of data modifications and deletions
  o When legal or business requirements necessitate major deletions
• Temporal databases:
  o When appropriate archival "packages" are complete (size, time period covered)
  o Before major schema changes
  o When legal or business requirements necessitate major deletions
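For temporal databases, the criteria above can be expressed as a simple decision rule. This is a hypothetical sketch: the criteria themselves come from the list, but the slides give no thresholds, so `MAX_PENDING_ROWS`, `MAX_PERIOD_DAYS` and the `PendingPackage` fields are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds for "size" and "time period covered".
MAX_PENDING_ROWS = 100_000
MAX_PERIOD_DAYS = 365

@dataclass
class PendingPackage:
    rows: int                    # non-current rows not yet archived
    period_start: date           # earliest date covered by those rows
    schema_change_planned: bool
    legal_deletion_due: bool

def should_extract_snapshot(pkg: PendingPackage, today: date) -> bool:
    """Extract when the package is complete, before a schema change, or for legal deletions."""
    package_complete = (
        pkg.rows >= MAX_PENDING_ROWS
        or (today - pkg.period_start).days >= MAX_PERIOD_DAYS
    )
    return package_complete or pkg.schema_change_planned or pkg.legal_deletion_due
```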
Mixed snapshot and temporal database
Master-Data (snapshot database), e.g. PERSON
Business transactions (temporal or pseudo-temporal database):
Business transaction 1  1995-03-05  1996-07-23
Business transaction 2  1996-01-10  1998-12-03
Business transaction 3  1997-05-21  "now"
• Archiving and deletion times must often comply with legal and business requirements
Archiving mixed snapshot and temporal databases
[Figure: Master-Data (snapshot database) and business transactions (temporal or pseudo-temporal database): Business transaction 1 (1995-03-05 to 1996-07-23) and Business transaction 2 (1996-01-10 to 1998-12-03). Labels: "Completely archived 2001-01-01", "Snapshot archived 2001-01-01", "Completely archived 2003-01-01", "Snapshot archived 2003-01-01"]
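One possible reading of this archiving pattern, stated as an assumption: on each archiving date, a snapshot of the master data is archived, and every business transaction already closed by that date is archived completely; open transactions stay in the live database. The master-data contents and all names below are hypothetical; the transaction dates come from the mixed-database example.

```python
from datetime import date

MASTER_DATA = {215: "Sonnegg 11", 309: "Thunstrasse 1"}  # hypothetical PERSON rows
TRANSACTIONS = [
    ("Business transaction 1", date(1995, 3, 5), date(1996, 7, 23)),
    ("Business transaction 2", date(1996, 1, 10), date(1998, 12, 3)),
    ("Business transaction 3", date(1997, 5, 21), None),  # still open ("now")
]

def archive_mixed(master, transactions, archive_date):
    """Archive a master-data snapshot plus every transaction closed before archive_date."""
    closed = [t for t in transactions if t[2] is not None and t[2] < archive_date]
    still_open = [t for t in transactions if t[2] is None or t[2] >= archive_date]
    package = {
        "archive_date": archive_date,
        "master_data": dict(master),   # snapshot: current state only
        "transactions": closed,        # complete, closed business transactions
    }
    return package, still_open

package, remaining = archive_mixed(MASTER_DATA, TRANSACTIONS, date(2001, 1, 1))
print([name for name, _, _ in package["transactions"]])
# ['Business transaction 1', 'Business transaction 2']
print([name for name, _, _ in remaining])
# ['Business transaction 3']
```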
Archiving mixed snapshot and temporal databases (1)
[Figure: timeline with Action 1, Action 2 and Action 3, a point in time t1, and archiving/deletion times marked A]
• No complete time-slice is possible within the archived package
• Schema changes may prevent backward assembly of archived elements
A = archiving/deletion time
Archiving mixed snapshot and temporal databases (2)
[Figure: timeline with Business transaction 1, 2 and 3, Snapshot 1 taken at t1, and required deletion times marked D]
• Snapshots allow synchronous research only for the point in time at which the snapshot was taken.
• Synchronous and diachronous research require both snapshots and ongoing archiving of action data "packages".
D = required deletion time
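Synchronous research asks what held at one point in time; diachronous research asks how things changed across time. A snapshot answers the first kind of question only for its own date, whereas archived action-data packages that keep their date ranges can answer both. The sketch below uses the business transaction dates from the mixed-database example; the mid-year sampling date (30 June) and the function names are hypothetical choices.

```python
from datetime import date

# Archived action-data package; None as end date means still open ("now").
TRANSACTIONS = [
    ("Business transaction 1", date(1995, 3, 5), date(1996, 7, 23)),
    ("Business transaction 2", date(1996, 1, 10), date(1998, 12, 3)),
    ("Business transaction 3", date(1997, 5, 21), None),
]

def open_on(transactions, t):
    """Synchronous question: which transactions were open on day t?"""
    return [n for n, start, end in transactions if start <= t and (end is None or t <= end)]

def open_per_year(transactions, years):
    """Diachronous question: how many transactions were open each year (sampled on 30 June)?"""
    return {y: len(open_on(transactions, date(y, 6, 30))) for y in years}

print(open_on(TRANSACTIONS, date(1996, 3, 1)))
# ['Business transaction 1', 'Business transaction 2']
print(open_per_year(TRANSACTIONS, range(1995, 2000)))
# {1995: 1, 1996: 2, 1997: 2, 1998: 2, 1999: 1}
```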
Conclusions (1)
• Temporal databases are best suited for archiving. Archived snapshots of them allow both synchronous and diachronous research, but queries may become complex.
• For other databases, neither snapshots nor ongoing archiving of (database or business) transactions can fully satisfy all usage requirements.
Conclusions (2)
• Archivists or preservers need to involve themselves in the database design process in order to get the archival function implemented appropriately. Good implementation approaches:
  o Build fully temporal databases.
  o Build in triggers that write all modifications of the database to an archival store, which mirrors the current database as a kind of temporal database.
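The trigger approach can be sketched with SQLite through Python's standard `sqlite3` module. The table names `person` and `person_archive` are hypothetical; each `AFTER INSERT/UPDATE/DELETE` trigger appends a row to the archive table, which thus accumulates the full modification history while `person` holds only the current state. This sketch records only the operation type and a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    pers_id INTEGER PRIMARY KEY,
    address TEXT NOT NULL
);
-- Archival store: append-only, one row per modification.
CREATE TABLE person_archive (
    pers_id    INTEGER NOT NULL,
    address    TEXT,
    operation  TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER person_ins AFTER INSERT ON person BEGIN
    INSERT INTO person_archive (pers_id, address, operation)
    VALUES (NEW.pers_id, NEW.address, 'INSERT');
END;
CREATE TRIGGER person_upd AFTER UPDATE ON person BEGIN
    INSERT INTO person_archive (pers_id, address, operation)
    VALUES (NEW.pers_id, NEW.address, 'UPDATE');
END;
CREATE TRIGGER person_del AFTER DELETE ON person BEGIN
    INSERT INTO person_archive (pers_id, address, operation)
    VALUES (OLD.pers_id, OLD.address, 'DELETE');
END;
""")

conn.execute("INSERT INTO person VALUES (215, 'Rathausgasse 6')")
conn.execute("UPDATE person SET address = 'Sonnegg 11' WHERE pers_id = 215")
conn.commit()

for row in conn.execute(
    "SELECT pers_id, address, operation FROM person_archive ORDER BY rowid"
):
    print(row)
# (215, 'Rathausgasse 6', 'INSERT')
# (215, 'Sonnegg 11', 'UPDATE')
```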
Open questions
• How to deal with schema changes?
• How to deal with partial snapshots? Or, how to deal with referenced data that is in neither the snapshot nor the archival "package"?