Refactoring database
Agenda
• Evolutionary Database Development
• The Process of Database Refactoring
• Database Refactoring Strategies
• Database Refactoring Patterns
Evolutionary Database Development
Evolutionary Data Modeling
The Agile Model-Driven Development (AMDD) life cycle
Database Regression Testing
1. Quickly add a test, basically just enough code so that your tests now fail.
2. Run your tests - often the complete test suite, although for the sake of speed you may decide to run only a subset - to ensure that the new test does in fact fail.
3. Update your functional code so that it passes the new test.
4. Run your tests again. If the tests fail, return to Step 3; otherwise, start over again.
Configuration Management of Database Artifacts
• Data definition language (DDL) scripts to create the database schema
• Data load/extract/migration scripts
• Data model files
• Object/relational mapping meta data
• Reference data
• Stored procedure and trigger definitions
• View definitions
• Referential integrity constraints
• Other database objects like sequences, indexes, and so on
• Test data
• Test data generation scripts
• Test scripts
Developer Sandboxes
A "sandbox" is a fully functioning environment in which a system may be built, tested, and/or run.
The Process of Database Refactoring
The two categories of database architecture
• Single-Application Database Environments
• Multi-Application Database Environments
Database Smells
• Multipurpose column
• Multipurpose table
• Redundant data
• Tables with too many columns
• Tables with too many rows
• "Smart" columns
• Fear of change
How Database Refactoring Fits In
Potential development activities on an evolutionary development project
Why DB Refactoring is Hard
Databases are highly coupled to external programs.
The database refactoring process
• Verify that a database refactoring is appropriate.
• Choose the most appropriate database refactoring.
• Deprecate the original database schema.
• Test before, during, and after.
• Modify the database schema.
• Migrate the source data.
• Modify external access program(s).
• Run regression tests.
• Version control your work.
• Announce the refactoring.
Database Refactoring Strategies
• Smaller changes are easier to apply.
• Uniquely identify individual refactorings.
• Implement a large change by many small ones.
• Have a database configuration table.
• Prefer triggers over views or batch synchronization.
• Choose a sufficient deprecation period.
• Simplify your database change control board (CCB) strategy.
• Simplify negotiations with other teams.
• Encapsulate database access.
• Be able to easily set up a database environment.
• Do not duplicate SQL.
• Put database assets under change control.
• Beware of politics.
Version your database
• Uniquely identify individual refactorings.
• Have a database configuration table.
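One way these two strategies might fit together is sketched below, using Python's built-in sqlite3 module as a stand-in for the real database. The table name database_configuration, the schema_version column, and the list of numbered refactorings are all assumptions made for the example, not something prescribed by the slides.

```python
import sqlite3

# Hypothetical refactorings, each with a unique, ordered identifier.
REFACTORINGS = [
    (1, "CREATE TABLE person (id INTEGER PRIMARY KEY, gender TEXT)"),
    (2, "ALTER TABLE person ADD COLUMN sex TEXT"),
    (3, "UPDATE person SET sex = gender"),
]


def current_version(conn):
    # The configuration table holds a single row with the schema version.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS database_configuration ("
        " schema_version INTEGER NOT NULL)"
    )
    row = conn.execute(
        "SELECT schema_version FROM database_configuration"
    ).fetchone()
    if row is None:
        conn.execute("INSERT INTO database_configuration VALUES (0)")
        return 0
    return row[0]


def upgrade(conn):
    # Apply, in order, only the refactorings newer than the recorded version.
    version = current_version(conn)
    for number, sql in REFACTORINGS:
        if number > version:
            conn.execute(sql)
            conn.execute(
                "UPDATE database_configuration SET schema_version = ?",
                (number,),
            )
    conn.commit()
    return current_version(conn)


conn = sqlite3.connect(":memory:")
first_run = upgrade(conn)
second_run = upgrade(conn)   # nothing left to apply
print(first_run, second_run)
```

Because the version is stored in the database itself, the same upgrade routine can bring any sandbox, test, or production copy up to date regardless of which refactorings it has already received.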
Database Refactoring Patterns
Database Refactoring Categories
Category | Description | Examples
Structural | A change to the definition of one or more tables or views. | Rename Column, Drop Table, Introduce Surrogate Key
Data Quality | A change that improves the quality of the information contained within a database. | Add Lookup Table, Consolidate Key Strategy, Make Column Non-Nullable
Referential Integrity | A change that ensures that a referenced row exists within another table and/or that ensures that a row that is no longer needed is removed appropriately. | Add Foreign Key Constraint, Introduce Soft Delete, Introduce Trigger For History
Architectural | A change that improves the overall manner in which external programs interact with a database. | Introduce Read-Only Table, Encapsulate Table With View, Introduce Index
Method | A change to a method (a stored procedure, stored function, or trigger) that improves its quality. Many code refactorings are applicable to database methods. | Add Parameter, Rename Method, Extract Method
Non-Refactoring Transformation | A change to your database schema that changes its semantics. | Insert Data, Introduce New Column, Introduce New Table
Drop Column (1)
Remove a column from an existing table
Drop Column (2)
Motivation
• Refactor a database table design
• Refactor external applications, e.g. the column is no longer used
Potential Tradeoffs
• The column being dropped may contain valuable data
• Tables containing many rows (the drop can take a long time)
Schema Update Mechanics
• Choose a remove strategy
• Drop the column
• Rework foreign keys
Phase I:
COMMENT ON COLUMN person.gender IS 'Drop date = May 11 2010';
Phase II:
ALTER TABLE person DROP COLUMN gender;
Drop Column (3)
Data-Migration Mechanics
• Preserve data
Phase II (before drop column):
CREATE TABLE person_gender AS SELECT id, gender FROM person;
Access Program Update Mechanics
• Refactor code to use alternate data sources
• Slim down SELECT statements
• Refactor database inserts and updates
Drop Table (1)
Remove an existing table from the database
Drop Table (2)
Motivation
• A table is no longer required and/or used
• The table has been replaced by another similar data source
Potential Tradeoffs
• May need to preserve some or all of the data
Schema Update Mechanics
• Resolve data-integrity issues
Phase I:
COMMENT ON TABLE person IS 'Drop date = May 11 2010';
Phase II:
DROP TABLE person;
Data-Migration Mechanics
Phase II (before drop table):
CREATE TABLE person_backup AS SELECT * FROM person;
Access Program Update Mechanics
• Any external programs referencing this table must be refactored to access the alternative data source(s).
Rename Column (1)
Rename an existing table column
Rename Column (2)
Motivation
• Increase the readability of your database schema
• Enable database porting, e.g. resolve a reserved keyword conflict
Potential Tradeoffs
• The cost of refactoring the external applications
Schema Update Mechanics
• Introduce the new column
• Introduce a synchronization trigger
• Rename other columns
Phase I:
ALTER TABLE person ADD sex VARCHAR2(10);
COMMENT ON COLUMN person.gender IS 'Renamed to sex, drop date = June 6 2010';
UPDATE person SET sex = gender;
Rename Column (3)
CREATE OR REPLACE TRIGGER SynchronizeSex
BEFORE INSERT OR UPDATE ON person
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF INSERTING THEN
    IF :NEW.sex IS NULL THEN
      :NEW.sex := :NEW.gender;
    END IF;
    IF :NEW.gender IS NULL THEN
      :NEW.gender := :NEW.sex;
    END IF;
  END IF;
  IF UPDATING THEN
    IF NOT(:NEW.sex=:OLD.sex) THEN
      :NEW.gender := :NEW.sex;
    END IF;
    IF NOT(:NEW.gender=:OLD.gender) THEN
      :NEW.sex := :NEW.gender;
    END IF;
  END IF;
END;
/
Rename Column (4)
Phase II:
DROP TRIGGER SynchronizeSex;
ALTER TABLE person DROP COLUMN gender;
Data-Migration Mechanics
• Copy all the data from the original column into the new column
Access Program Update Mechanics
• External programs that reference this column must be updated to reference it by its new name
• Update any embedded SQL and/or mapping metadata; in this case, the JPA entity must be updated
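The transition period of Rename Column, during which old and new programs write to different columns, can be tried out with the sketch below. It uses Python's built-in sqlite3 module instead of Oracle, so the triggers are a loose translation rather than the PL/SQL shown above: SQLite triggers cannot assign to NEW, so they issue follow-up UPDATE statements. The trigger names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, gender TEXT);
INSERT INTO person (id, gender) VALUES (1, 'MALE');

-- Phase I: add the new column and copy the existing data.
ALTER TABLE person ADD COLUMN sex TEXT;
UPDATE person SET sex = gender;

-- Keep both columns in step while old and new programs coexist.
CREATE TRIGGER sync_person_insert AFTER INSERT ON person
WHEN NEW.sex IS NULL OR NEW.gender IS NULL
BEGIN
    UPDATE person
       SET sex = COALESCE(NEW.sex, NEW.gender),
           gender = COALESCE(NEW.gender, NEW.sex)
     WHERE id = NEW.id;
END;

CREATE TRIGGER sync_gender_to_sex AFTER UPDATE OF gender ON person
WHEN NEW.gender IS NOT NEW.sex
BEGIN
    UPDATE person SET sex = NEW.gender WHERE id = NEW.id;
END;

CREATE TRIGGER sync_sex_to_gender AFTER UPDATE OF sex ON person
WHEN NEW.sex IS NOT NEW.gender
BEGIN
    UPDATE person SET gender = NEW.sex WHERE id = NEW.id;
END;
""")

# A legacy program still writes the old column...
conn.execute("INSERT INTO person (id, gender) VALUES (2, 'FEMALE')")
# ...while an already refactored program writes the new one.
conn.execute("UPDATE person SET sex = 'UNKNOWN' WHERE id = 1")

rows = conn.execute(
    "SELECT id, gender, sex FROM person ORDER BY id"
).fetchall()
print(rows)
```

The WHEN clauses stop the two update triggers from bouncing changes back and forth: once both columns agree, neither trigger has anything left to do.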
Rename Table (1)
Rename an existing table
Rename Table (2)
Motivation
• Clarify the table's meaning and intent
• Conform to accepted database naming conventions
Potential Tradeoffs
• The cost of refactoring the external applications that access the table versus the improved readability and/or consistency provided by the new name
Schema Update Mechanics
Phase I:
CREATE TABLE people(
  id NUMBER NOT NULL,
  firstname VARCHAR2(30),
  lastname VARCHAR2(20),
  gender VARCHAR2(10),
  lastchange DATE,
  CONSTRAINT pk_people PRIMARY KEY (id)
);
COMMENT ON TABLE people IS 'Renaming of person, final date = May 11 2010';
COMMENT ON TABLE person IS 'Renamed to people, drop date = June 6 2010';
Rename Table (3)
CREATE OR REPLACE TRIGGER SynchronizePeople
BEFORE INSERT OR UPDATE OR DELETE ON person
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    findAndUpdateIfNotFoundCreatePeople;
  END IF;
  IF inserting THEN
    createNewIntoPeople;
  END IF;
  IF deleting THEN
    deleteFromPeople;
  END IF;
END;
/
CREATE OR REPLACE TRIGGER SynchronizePerson
BEFORE INSERT OR UPDATE OR DELETE ON people
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    findAndUpdateIfNotFoundCreatePerson;
  END IF;
  IF inserting THEN
    createNewIntoPerson;
  END IF;
  IF deleting THEN
    deleteFromPerson;
  END IF;
END;
/
Rename Table (4)
Phase II:
DROP TRIGGER SynchronizePeople;
DROP TRIGGER SynchronizePerson;
DROP TABLE person;
Data-Migration Mechanics
• Must first copy the data
INSERT INTO people SELECT * FROM person;
Access Program Update Mechanics
• External access programs must be refactored to work with the new table rather than the old table
Add Lookup Table (1)
Create a lookup table for an existing column
Add Lookup Table (2)
Motivation
• Introduce referential integrity
• Provide code lookup
• Replace a column constraint
• Provide detailed descriptions
Potential Tradeoffs
• Need to be able to provide valid data to populate the lookup table
• There will be a performance impact resulting from the addition of a foreign key constraint
Schema Update Mechanics
• Determine the table structure
• Introduce the table
• Determine lookup data
• Introduce referential constraint
CREATE TABLE state(
  State CHAR(2) NOT NULL,
  Name CHAR(50),
  CONSTRAINT pk_state PRIMARY KEY (state)
);
Add Lookup Table (3)
ALTER TABLE address ADD CONSTRAINT fk_address_state
FOREIGN KEY (state) REFERENCES state DEFERRABLE;
Data-Migration Mechanics
• Ensure that the data values in the column have corresponding values in the lookup table
• Normalize the existing values first, so that only two-character codes are copied into the lookup table
UPDATE address SET state = 'CA' WHERE UPPER(state) IN ('CA', 'CALIFORNIA');
INSERT INTO state(state)
SELECT DISTINCT UPPER(state) FROM address;
UPDATE state SET name = 'California' WHERE state = 'CA';
Access Program Update Mechanics
• Ensure that external programs now use the data values from the lookup table
• Some programs may choose to cache the data values, whereas others will access them as needed
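The data-migration step for Add Lookup Table can be rehearsed with a small script such as the one below. It is a sketch using Python's built-in sqlite3 module as a stand-in for Oracle; the CANONICAL mapping and the function names are assumptions made for the example, and a real migration would need a mapping for every spelling actually present in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, state TEXT);
INSERT INTO address (id, state)
VALUES (1, 'ca'), (2, 'California'), (3, 'TX');
CREATE TABLE state (state TEXT PRIMARY KEY, name TEXT);
""")

# Hypothetical mapping from spellings found in the data to two-letter codes.
CANONICAL = {"CA": "CA", "CALIFORNIA": "CA", "TX": "TX"}


def normalize_states(conn):
    # Rewrite every known spelling in address.state to its canonical code.
    spellings = conn.execute(
        "SELECT DISTINCT state FROM address WHERE state IS NOT NULL"
    ).fetchall()
    for (raw,) in spellings:
        code = CANONICAL.get(raw.upper())
        if code is not None:
            conn.execute(
                "UPDATE address SET state = ? WHERE state = ?", (code, raw)
            )


def populate_lookup(conn):
    # Seed the lookup table from the (now normalized) column values.
    conn.execute(
        "INSERT INTO state (state) "
        "SELECT DISTINCT state FROM address WHERE state IS NOT NULL"
    )


def orphaned_states(conn):
    # Values that would violate the foreign key: no matching lookup row.
    return conn.execute(
        "SELECT DISTINCT a.state FROM address a "
        "LEFT JOIN state s ON s.state = a.state "
        "WHERE a.state IS NOT NULL AND s.state IS NULL"
    ).fetchall()


normalize_states(conn)
populate_lookup(conn)
orphans = orphaned_states(conn)
codes = [r[0] for r in conn.execute("SELECT state FROM state ORDER BY state")]
print(codes, orphans)
```

An empty orphan list is the signal that the foreign key constraint can be enabled without rejecting existing rows.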
Introduce Column Constraint (1)
Introduce a column constraint in an existing table
Introduce Column Constraint (2)
Motivation
• Ensure that all applications interacting with your database persist valid data in the column
Potential Tradeoffs
• Individual applications may have their own unique version of a constraint for this column
Schema Update Mechanics
ALTER TABLE person ADD CONSTRAINT ck_person_gender CHECK (gender IN ('MALE', 'FEMALE', 'UNKNOWN'));
Data-Migration Mechanics
• Make sure that existing data conforms to the constraint that is being applied on the column
UPDATE person SET gender = 'UNKNOWN' WHERE gender IS NULL;
Access Program Update Mechanics
• Ensure that the access programs can handle any errors thrown by the database when the data being written to the column does not conform to the constraint
Introduce Default Value (1)
Let the database provide a default value for an existing table column
Introduce Default Value (2)
Motivation
• Want the value of a column to have a default value populated when a new row is added to a table
Potential Tradeoffs
• Identifying a true default can be difficult
• Unintended side effects
• Confused context
Schema Update Mechanics
ALTER TABLE person MODIFY lastchange DEFAULT SYSDATE;
Data-Migration Mechanics
• The existing rows may already have null values in this column, rows that will not be automatically updated as a result of adding a default value
UPDATE person SET lastchange = sysdate WHERE lastchange IS NULL;
Access Program Update Mechanics
• Invariants are broken by the new value
• Code exists to apply default values
• Existing source code assumes a different default value
Make Column Not-Nullable (1)
Change an existing column such that it does not accept any null values
Make Column Not-Nullable (2)
Motivation
• Every application updating this column is forced to provide a value for it
• Remove repetitious logic within applications that implement a not-null check
Potential Tradeoffs
• Some programs may currently assume that the column is nullable and therefore not provide such a value
Schema Update Mechanics
ALTER TABLE person MODIFY lastname NOT NULL;
Data-Migration Mechanics
• May need to clean the existing data if there are existing rows with a null value in the column
UPDATE person SET lastname = '???' WHERE lastname IS NULL;
Access Program Update Mechanics
• Refactor all the external programs to provide an appropriate value to this column whenever they modify a row within the table
• Must also detect and then handle any new exceptions that are thrown by the database
Add Foreign Key Constraint (1)
Add a foreign key constraint to an existing table to enforce a relationship to another table
Add Foreign Key Constraint (2)
Motivation
• Enforce data dependencies at the database level
Potential Tradeoffs
• Reduced performance within your database
• Must be aware of the table dependencies in the database
Schema Update Mechanics
• Choose a constraint checking strategy: immediate/deferred
• Create the foreign key constraint
• Introduce an index for the PK of the foreign table (optional)
ALTER TABLE address ADD CONSTRAINT fk_person_state
FOREIGN KEY (state) REFERENCES state DEFERRABLE;
Add Foreign Key Constraint (3)
Data-Migration Mechanics
• Ensure the referenced data exists
• Ensure that the foreign table contains all required rows
• Ensure that the source table's foreign key column contains valid values
• Introduce a default value for the foreign key column
Access Program Update Mechanics
• Identify and then update any external programs that modify data in the table where the foreign key constraint was added (Similar/Different/Nonexistent RI code)
• All external programs must be updated to handle any exception(s) thrown by the database as the result of the new foreign key constraint
Introduce Soft Delete (1)
Introduce a flag to an existing table that indicates that a row has been deleted
Introduce Soft Delete (2)
Motivation
• Preserve all application data, typically for historical means
Potential Tradeoffs
• Performance is potentially impacted
Schema Update Mechanics
• Introduce the identifying column
• Determine how to update the flag
• Develop deletion code
• Develop insertion code
ALTER TABLE person ADD is_deleted BOOLEAN;
ALTER TABLE person MODIFY is_deleted DEFAULT FALSE;
Data-Migration Mechanics
UPDATE person SET is_deleted = FALSE;
Access Program Update Mechanics
• Change read queries to ensure that data read from the database has not been marked as deleted
• All external programs must change physical deletes to updates
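The two access-program changes for Introduce Soft Delete (filtering reads and turning deletes into updates) might look like the sketch below. It uses Python's built-in sqlite3 module rather than Oracle; soft_delete and active_people are hypothetical helper names, and 0/1 stands in for FALSE/TRUE because SQLite has no BOOLEAN type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, lastname TEXT);
INSERT INTO person (id, lastname) VALUES (1, 'Smith'), (2, 'Jones');

-- Schema update: existing rows pick up the default, i.e. "not deleted".
ALTER TABLE person ADD COLUMN is_deleted INTEGER NOT NULL DEFAULT 0;
""")


def soft_delete(conn, person_id):
    # Replaces a physical DELETE: the row stays, only the flag changes.
    conn.execute(
        "UPDATE person SET is_deleted = 1 WHERE id = ?", (person_id,)
    )


def active_people(conn):
    # Read queries must now exclude rows that are marked as deleted.
    return conn.execute(
        "SELECT id, lastname FROM person WHERE is_deleted = 0 ORDER BY id"
    ).fetchall()


soft_delete(conn, 1)
active = active_people(conn)
total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(active, total)
```

Centralizing the filter in one helper (or a view) reduces the risk that a single forgotten WHERE clause shows deleted rows to users.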
Introduce Index (1)
Introduce a new index of either unique or nonunique type
Introduce Index (2)
Motivation
• Increase query performance on your database reads
Potential Tradeoffs
• Too many indexes on a table will degrade insert, update, and delete performance
• Duplicates must be removed before a unique index can be applied
Schema Update Mechanics
• Determine type of index
• Add a new index
• Provide more disk space
CREATE UNIQUE INDEX unq_person_ssn ON person(ssn);
Data-Migration Mechanics
• Check for duplicate values if introducing a unique index
• Duplicate values must be updated, or use a nonunique index instead
Access Program Update Mechanics
• Analyze dependencies to determine which external programs to update
• Change your queries to make use of this new index
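The duplicate check that precedes a unique index could be scripted roughly as follows. This is a sketch with Python's built-in sqlite3 module standing in for Oracle; duplicate_ssns, introduce_index, and the fallback index name idx_person_ssn are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, ssn TEXT);
INSERT INTO person (id, ssn) VALUES (1, '111'), (2, '222'), (3, '111');
""")


def duplicate_ssns(conn):
    # Values that occur more than once would make a unique index fail.
    return conn.execute(
        "SELECT ssn, COUNT(*) FROM person "
        "GROUP BY ssn HAVING COUNT(*) > 1 ORDER BY ssn"
    ).fetchall()


def introduce_index(conn):
    # Fall back to a nonunique index while duplicates are still present.
    unique = not duplicate_ssns(conn)
    kind = "UNIQUE INDEX" if unique else "INDEX"
    name = "unq_person_ssn" if unique else "idx_person_ssn"
    conn.execute(f"CREATE {kind} {name} ON person(ssn)")
    return name


duplicates = duplicate_ssns(conn)
created = introduce_index(conn)
print(duplicates, created)
```

In practice the duplicate list would more likely be handed to someone for cleanup than silently worked around; the fallback here only illustrates the second option named on the slide.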
Introduce Read-Only Table (1)
Create a read-only data store based on existing tables in the database
Introduce Read-Only Table (2)
Motivation
• Improve query performance
• Summarize data for reporting
• Create redundant data
• Replace redundant reads
• Data security
• Improve database readability
Potential Tradeoffs
• The users of the read-only table need to understand both the timeliness of the copied data as well as the volatility of the source data to determine whether the read-only table is acceptable
Schema Update Mechanics
• Introduce the new table/materialized view
• Determine a population strategy
Introduce Read-Only Table (3)
Via materialized view:
CREATE MATERIALIZED VIEW person_mv
BUILD IMMEDIATE
REFRESH FORCE ON COMMIT
WITH PRIMARY KEY
AS
SELECT p.id, p.firstname, p.lastname, p.birthday, a.line1 || a.line2 || s.name || a.zipcode AS address
FROM person p, address a, state s
WHERE p.address_id = a.id
AND a.state = s.state;
Via new table:
CREATE TABLE person_mv (
id NUMBER NOT NULL,
firstname VARCHAR2(30),
lastname VARCHAR2(20),
birthday DATE,
address VARCHAR2(255),
CONSTRAINT person_mv_id PRIMARY KEY (id)
);
COMMENT ON TABLE person_mv IS 'read-only table';
Introduce Read-Only Table (4)
Data-Migration Mechanics
• Copy all the relevant source data into the read-only table
• Apply your population strategy (real-time or periodic batch)
  • Periodic refresh
  • Materialized views
  • Use trigger-based synchronization
  • Use real-time application updates
INSERT INTO person_mv(id,firstname,lastname,birthday,address)
SELECT p.id, p.firstname, p.lastname, p.birthday, a.line1 || a.line2 || s.name || a.zipcode
FROM person p, address a, state s
WHERE p.address_id = a.id
AND a.state = s.state;
Access Program Update Mechanics
• Make sure that the application uses this table for read-only purposes
• Must change all the places where you currently access the source tables and rework them to use this table instead
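Of the population strategies listed above, the periodic batch refresh is the simplest to sketch. The example below uses Python's built-in sqlite3 module rather than Oracle, with a cut-down version of the person, address, and state tables; refresh_person_mv is a hypothetical name, and a real job would be run by a scheduler rather than called inline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (state TEXT PRIMARY KEY, name TEXT);
CREATE TABLE address (id INTEGER PRIMARY KEY, line1 TEXT, state TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, lastname TEXT,
                     address_id INTEGER);
CREATE TABLE person_mv (id INTEGER PRIMARY KEY, lastname TEXT,
                        address TEXT);
INSERT INTO state VALUES ('CA', 'California');
INSERT INTO address VALUES (10, '1 Main St', 'CA');
INSERT INTO person VALUES (1, 'Smith', 10);
""")


def refresh_person_mv(conn):
    # Rebuild the read-only table in one transaction so that readers
    # never observe a half-populated copy.
    with conn:
        conn.execute("DELETE FROM person_mv")
        conn.execute(
            "INSERT INTO person_mv (id, lastname, address) "
            "SELECT p.id, p.lastname, a.line1 || ', ' || s.name "
            "FROM person p "
            "JOIN address a ON p.address_id = a.id "
            "JOIN state s ON a.state = s.state"
        )


def mv_count(conn):
    return conn.execute("SELECT COUNT(*) FROM person_mv").fetchone()[0]


refresh_person_mv(conn)
conn.execute("INSERT INTO person VALUES (2, 'Jones', 10)")
stale = mv_count(conn)    # the copy lags behind the source until refreshed
refresh_person_mv(conn)
fresh = mv_count(conn)
print(stale, fresh)
```

The gap between the stale and fresh counts is the timeliness tradeoff named earlier: users of the read-only table see source changes only after the next refresh.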
References
• http://www.agiledata.org/
• Refactoring Databases: Evolutionary Database Design
Questions?
Gossip?
Rumor?
Thanks