Refactoring Database

Perficient China
Lancelot Zhu

Agenda

• Evolutionary Database Development
• The Process of Database Refactoring
• Database Refactoring Strategies
• Database Refactoring Patterns

Evolutionary Database Development

Evolutionary Data Modeling

The Agile Model-Driven Development (AMDD) life cycle

Database Regression Testing

1. Quickly add a test, basically just enough code so that your tests now fail.

2. Run your tests - often the complete test suite, although for the sake of speed you may decide to run only a subset - to ensure that the new test does in fact fail.

3. Update your functional code so that it passes the new test.

4. Run your tests again. If the tests fail, return to Step 3; otherwise, start over again.
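
The four steps above can be sketched against a database schema as follows. This is a minimal illustration using Python's built-in sqlite3 module and an in-memory database, not the tooling used in the original talk; the person table, the lastchange column, and all function names are assumptions made for the example.

```python
import sqlite3


def create_schema(conn):
    """Functional code under test: builds a hypothetical person table."""
    conn.execute(
        "CREATE TABLE person ("
        " id INTEGER PRIMARY KEY,"
        " lastname TEXT NOT NULL,"
        " lastchange TEXT DEFAULT CURRENT_TIMESTAMP)"
    )


def column_names(conn, table):
    """Return the column names of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def test_person_has_lastchange():
    """Step 1: the new test, written before the column existed."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    assert "lastchange" in column_names(conn, "person")


def test_lastname_is_mandatory():
    """Existing regression test: a NULL lastname must be rejected."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    try:
        conn.execute("INSERT INTO person (id, lastname) VALUES (1, NULL)")
    except sqlite3.IntegrityError:
        return
    raise AssertionError("NULL lastname was accepted")
```

Removing the lastchange column from create_schema makes the first test fail (Step 2); restoring it makes the suite pass again (Steps 3 and 4).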

Configuration Management of Database Artifacts

• Data definition language (DDL) scripts to create the database schema
• Data load/extract/migration scripts
• Data model files
• Object/relational mapping meta data
• Reference data
• Stored procedure and trigger definitions
• View definitions
• Referential integrity constraints
• Other database objects like sequences, indexes, and so on
• Test data
• Test data generation scripts
• Test scripts

Developer Sandboxes

A "sandbox" is a fully functioning environment in which a system may be built, tested, and/or run.

The Process of Database Refactoring

The two categories of database architecture

• Single-Application Database Environments
• Multi-Application Database Environments

Database Smells

• Multipurpose column
• Multipurpose table
• Redundant data
• Tables with too many columns
• Tables with too many rows
• "Smart" columns
• Fear of change

How Database Refactoring Fits In

Potential development activities on an evolutionary development project

Why DB Refactoring is Hard

Databases are highly coupled to external programs.

The database refactoring process

• Verify that a database refactoring is appropriate.
• Choose the most appropriate database refactoring.
• Deprecate the original database schema.
• Test before, during, and after.
• Modify the database schema.
• Migrate the source data.
• Modify external access program(s).
• Run regression tests.
• Version control your work.
• Announce the refactoring.

Database Refactoring Strategies

• Smaller changes are easier to apply.
• Uniquely identify individual refactorings.
• Implement a large change by many small ones.
• Have a database configuration table.
• Prefer triggers over views or batch synchronization.
• Choose a sufficient deprecation period.
• Simplify your database change control board (CCB) strategy.
• Simplify negotiations with other teams.
• Encapsulate database access.
• Be able to easily set up a database environment.
• Do not duplicate SQL.
• Put database assets under change control.
• Beware of politics.

Version your database

• Uniquely identify individual refactorings.
• Have a database configuration table.
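
One way these two strategies might fit together is sketched below in Python with the built-in sqlite3 module. The table name database_configuration, the schema_version column, and the numbered list of refactorings are assumptions made for illustration; the slides do not prescribe a particular layout.

```python
import sqlite3

# Hypothetical, ordered list of refactorings: (unique id, SQL statement).
REFACTORINGS = [
    (1, "CREATE TABLE person (id INTEGER PRIMARY KEY, gender TEXT)"),
    (2, "ALTER TABLE person ADD COLUMN sex TEXT"),
    (3, "UPDATE person SET sex = gender"),
]


def current_version(conn):
    """Read the schema version from the configuration table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS database_configuration "
        "(schema_version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT schema_version FROM database_configuration").fetchone()
    if row is None:
        conn.execute("INSERT INTO database_configuration VALUES (0)")
        return 0
    return row[0]


def upgrade(conn, refactorings=REFACTORINGS):
    """Apply every refactoring newer than the recorded version, in id order."""
    version = current_version(conn)
    for ref_id, sql in sorted(refactorings):
        if ref_id > version:
            conn.execute(sql)
            conn.execute(
                "UPDATE database_configuration SET schema_version = ?", (ref_id,)
            )
            version = ref_id
    conn.commit()
    return version
```

Because each refactoring has a unique number and the database records the last one applied, running upgrade twice is harmless: the second call finds nothing newer than the stored version.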

Database Refactoring Patterns

Database Refactoring Categories

Category: Structural
Description: A change to the definition of one or more tables or views.
Examples:
• Rename Column
• Drop Table
• Introduce Surrogate Key

Category: Data Quality
Description: A change that improves the quality of the information contained within a database.
Examples:
• Add Lookup Table
• Consolidate Key Strategy
• Make Column Non-Nullable

Category: Referential Integrity
Description: A change that ensures that a referenced row exists within another table and/or that ensures that a row that is no longer needed is removed appropriately.
Examples:
• Add Foreign Key Constraint
• Introduce Soft Delete
• Introduce Trigger For History

Database Refactoring Categories (Continued)

Category: Architectural
Description: A change that improves the overall manner in which external programs interact with a database.
Examples:
• Introduce Read-Only Table
• Encapsulate Table With View
• Introduce Index

Category: Method
Description: A change to a method (a stored procedure, stored function, or trigger) that improves its quality. Many code refactorings are applicable to database methods.
Examples:
• Add Parameter
• Rename Method
• Extract Method

Category: Non-Refactoring Transformation
Description: A change to your database schema that changes its semantics.
Examples:
• Insert Data
• Introduce New Column
• Introduce New Table

Drop Column (1)

Remove a column from an existing table

Drop Column (2)

Motivation
• Refactor a database table design
• Refactor external applications, e.g. when the column is no longer used

Potential Tradeoffs
• The column being dropped may contain valuable data
• Dropping a column from a table containing many rows may take a long time

Schema Update Mechanics
• Choose a remove strategy
• Drop the column
• Rework foreign keys

Phase I:

COMMENT ON COLUMN person.gender IS 'Drop date = May 11 2010';

Phase II:

ALTER TABLE person DROP COLUMN gender;

Drop Column (3)

Data-Migration Mechanics
• Preserve data

Phase II (before drop column):

CREATE TABLE person_gender AS SELECT id, gender FROM person;

Access Program Update Mechanics
• Refactor code to use alternate data sources
• Slim down SELECT statements
• Refactor database inserts and updates

Drop Table (1)

Remove an existing table from the database

Drop Table (2)

Motivation
• A table is no longer required and/or used
• The table has been replaced by another similar data source

Potential Tradeoffs
• May need to preserve some or all of the data

Schema Update Mechanics
• Resolve data-integrity issues

Phase I:

COMMENT ON TABLE person IS 'Drop date = May 11 2010';

Phase II:

DROP TABLE person;

Data-Migration Mechanics

Phase II (before drop table):

CREATE TABLE person_backup AS SELECT * FROM person;

Access Program Update Mechanics
• Any external programs referencing this table must be refactored to access the alternative data source(s).

Rename Column (1)

Rename an existing table column

Rename Column (2)

Motivation
• Increase the readability of your database schema
• Enable database porting, e.g. when the name conflicts with a reserved keyword

Potential Tradeoffs
• The cost of refactoring the external applications

Schema Update Mechanics
• Introduce the new column
• Introduce a synchronization trigger
• Rename other columns

Phase I:

ALTER TABLE person ADD sex VARCHAR2(10);

COMMENT ON COLUMN person.gender IS 'Renamed to sex, drop date = June 6 2010';

UPDATE person SET sex = gender;

Rename Column (3)

CREATE OR REPLACE TRIGGER SynchronizeSex
BEFORE INSERT OR UPDATE ON person
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  -- On insert, fill whichever column the writing program left empty.
  IF INSERTING THEN
    IF :NEW.sex IS NULL THEN
      :NEW.sex := :NEW.gender;
    END IF;
    IF :NEW.gender IS NULL THEN
      :NEW.gender := :NEW.sex;
    END IF;
  END IF;

  -- On update, copy whichever column changed into the other one.
  IF UPDATING THEN
    IF NOT(:NEW.sex=:OLD.sex) THEN
      :NEW.gender := :NEW.sex;
    END IF;
    IF NOT(:NEW.gender=:OLD.gender) THEN
      :NEW.sex := :NEW.gender;
    END IF;
  END IF;
END;
/

Rename Column (4)

Phase II:

DROP TRIGGER SynchronizeSex;

ALTER TABLE person DROP COLUMN gender;

Data-Migration Mechanics
• Copy all the data from the original column into the new column

Access Program Update Mechanics
• External programs that reference this column must be updated to reference it by its new name
• Update any embedded SQL and/or mapping meta data; in this case, the JPA entity has to be updated
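
The transition period, in which old programs still write gender while updated programs write sex, can be tried out locally with the sketch below. It uses Python's built-in sqlite3 module rather than Oracle, so the trigger syntax differs from the slides: SQLite triggers cannot assign to NEW, so AFTER triggers issue follow-up UPDATE statements instead. The trigger names and the helper functions are assumptions made for this example.

```python
import sqlite3

SYNC_SQL = """
CREATE TABLE person (id INTEGER PRIMARY KEY, gender TEXT, sex TEXT);

-- New rows: fill whichever column the writing program left empty.
CREATE TRIGGER sync_insert AFTER INSERT ON person
BEGIN
    UPDATE person
       SET sex = COALESCE(NEW.sex, NEW.gender),
           gender = COALESCE(NEW.gender, NEW.sex)
     WHERE id = NEW.id;
END;

-- Old programs still write gender: copy the change into sex.
CREATE TRIGGER sync_gender AFTER UPDATE OF gender ON person
WHEN NEW.gender IS NOT OLD.gender
BEGIN
    UPDATE person SET sex = NEW.gender WHERE id = NEW.id;
END;

-- Updated programs write sex: copy the change back into gender.
CREATE TRIGGER sync_sex AFTER UPDATE OF sex ON person
WHEN NEW.sex IS NOT OLD.sex
BEGIN
    UPDATE person SET gender = NEW.sex WHERE id = NEW.id;
END;
"""


def open_transition_db():
    """Create an in-memory database in the Phase I (transition) state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SYNC_SQL)
    return conn


def read_row(conn, person_id):
    """Return (gender, sex) for one person."""
    return conn.execute(
        "SELECT gender, sex FROM person WHERE id = ?", (person_id,)
    ).fetchone()
```

The WHEN clauses use the null-safe IS NOT comparison, so a change from or to NULL is also propagated, and they stop the two update triggers from bouncing a value back and forth.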

Rename Table (1)

Rename an existing table

Rename Table (2)

Motivation
• Clarify the table's meaning and intent
• Conform to accepted database naming conventions

Potential Tradeoffs
• The cost of refactoring the external applications that access the table versus the improved readability and/or consistency provided by the new name

Schema Update Mechanics

Phase I:

CREATE TABLE people (
  id NUMBER NOT NULL,
  firstname VARCHAR2(30),
  lastname VARCHAR2(20),
  gender VARCHAR2(10),
  lastchange DATE,
  CONSTRAINT pk_people PRIMARY KEY (id)
);

COMMENT ON TABLE people IS 'Renaming of person, final date = May 11 2010';

COMMENT ON TABLE person IS 'Renamed to people, drop date = June 6 2010';

Rename Table (3)

CREATE OR REPLACE TRIGGER SynchronizePeople
BEFORE INSERT OR UPDATE OR DELETE ON person
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  -- The called procedures are placeholders for the synchronization logic.
  IF updating THEN
    findAndUpdateIfNotFoundCreatePeople;
  END IF;
  IF inserting THEN
    createNewIntoPeople;
  END IF;
  IF deleting THEN
    deleteFromPeople;
  END IF;
END;
/

CREATE OR REPLACE TRIGGER SynchronizePerson
BEFORE INSERT OR UPDATE OR DELETE ON people
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  -- The called procedures are placeholders for the synchronization logic.
  IF updating THEN
    findAndUpdateIfNotFoundCreatePerson;
  END IF;
  IF inserting THEN
    createNewIntoPerson;
  END IF;
  IF deleting THEN
    deleteFromPerson;
  END IF;
END;
/

Rename Table (4)

Phase II:

DROP TRIGGER SynchronizePeople;

DROP TRIGGER SynchronizePerson;

DROP TABLE person;

Data-Migration Mechanics
• Must first copy the data

INSERT INTO people SELECT * FROM person;

Access Program Update Mechanics
• External access programs must be refactored to work with the new table rather than the old table

Add Lookup Table (1)

Create a lookup table for an existing column


Motivation
• Introduce referential integrity
• Provide code lookup
• Replace a column constraint
• Provide detailed descriptions

Potential Tradeoffs
• Need to be able to provide valid data to populate the lookup table
• There will be a performance impact resulting from the addition of a foreign key constraint

Schema Update Mechanics
• Determine the table structure
• Introduce the table
• Determine lookup data
• Introduce referential constraint

CREATE TABLE state (
  state CHAR(2) NOT NULL,
  name CHAR(50),
  CONSTRAINT pk_state PRIMARY KEY (state)
);

ALTER TABLE address ADD CONSTRAINT fk_address_state
  FOREIGN KEY (state) REFERENCES state DEFERRABLE;

Data-Migration Mechanics

• Ensure that the data values in the column have corresponding values in the lookup table

INSERT INTO state(state)
  SELECT DISTINCT UPPER(state) FROM address;

UPDATE address SET state = 'CA' WHERE UPPER(state) IN ('CA', 'CALIFORNIA');

UPDATE state SET name = 'California' WHERE state = 'CA';

Access Program Update Mechanics
• Ensure that external programs now use the data values from the lookup table
• Some programs may choose to cache the data values, whereas others will access them as needed
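
The data-migration step can also be scripted. The sketch below, in Python with the built-in sqlite3 module, normalizes the free-form values first and only then fills the lookup table, returning whatever it could not map. The STATE_CODES and STATE_NAMES mappings and the function name are hypothetical, and the simplified address and state tables are assumed to exist already.

```python
import sqlite3

# Hypothetical mapping from free-form spellings to two-letter codes.
STATE_CODES = {"CALIFORNIA": "CA", "CA": "CA", "NEW YORK": "NY", "NY": "NY"}
STATE_NAMES = {"CA": "California", "NY": "New York"}


def migrate_states(conn):
    """Normalize address.state, then fill the state lookup table from it.

    Returns the distinct values that could not be mapped, so they can be
    cleaned by hand before the foreign key constraint is introduced.
    """
    unmapped = set()
    rows = conn.execute("SELECT id, state FROM address").fetchall()
    for address_id, raw in rows:
        code = STATE_CODES.get((raw or "").strip().upper())
        if code is None:
            unmapped.add(raw)
            continue
        conn.execute(
            "UPDATE address SET state = ? WHERE id = ?", (code, address_id)
        )
        # The primary key on state.state makes repeated inserts a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO state (state, name) VALUES (?, ?)",
            (code, STATE_NAMES[code]),
        )
    conn.commit()
    return unmapped
```

Normalizing before populating keeps long spellings such as "CALIFORNIA" out of the two-character lookup key.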

Introduce Column Constraint (1)

Introduce a column constraint in an existing table

Introduce Column Constraint (2)

Motivation
• Ensure that all applications interacting with your database persist valid data in the column

Potential Tradeoffs
• Individual applications may have their own unique version of a constraint for this column

Schema Update Mechanics

ALTER TABLE person ADD CONSTRAINT ck_person_gender CHECK (gender IN ('MALE', 'FEMALE', 'UNKNOWN'));

Data-Migration Mechanics
• Make sure that existing data conforms to the constraint that is being applied on the column

UPDATE person SET gender = 'UNKNOWN' WHERE gender IS NULL;

Access Program Update Mechanics
• Ensure that the access programs can handle any errors thrown by the database when the data being written to the column does not conform to the constraint
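
On the access-program side, handling the new error might look like the following sketch in Python with the built-in sqlite3 module. The function names are assumptions, and the exception class is specific to sqlite3; an Oracle driver would raise its own error type for a violated check constraint.

```python
import sqlite3


def open_constraint_db():
    """Create an in-memory person table that carries the check constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        " id INTEGER PRIMARY KEY,"
        " gender TEXT CONSTRAINT ck_person_gender"
        " CHECK (gender IN ('MALE', 'FEMALE', 'UNKNOWN')))"
    )
    return conn


def save_person(conn, person_id, gender):
    """Insert a person; return False when the database rejects the value."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO person (id, gender) VALUES (?, ?)",
                (person_id, gender),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A caller can then report the rejected value to the user instead of failing with an unhandled database error.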

Introduce Default Value (1)

Let the database provide a default value for an existing table column

Introduce Default Value (2)

Motivation
• Want a column to be populated with a default value when a new row is added to a table

Potential Tradeoffs
• Identifying a true default can be difficult
• Unintended side effects
• Confused context

Schema Update Mechanics

ALTER TABLE person MODIFY lastchange DEFAULT SYSDATE;

Data-Migration Mechanics
• The existing rows may already have null values in this column; these rows will not be automatically updated as a result of adding a default value

UPDATE person SET lastchange = SYSDATE WHERE lastchange IS NULL;

Access Program Update Mechanics
• Invariants are broken by the new value
• Code exists to apply default values
• Existing source code assumes a different default value

Make Column Not-Nullable (1)

Change an existing column such that it does not accept any null values

Make Column Not-Nullable (2)

Motivation
• Every application updating this column is forced to provide a value for it
• Remove repetitious logic within applications that implement a not-null check

Potential Tradeoffs
• Some programs may currently assume that the column is nullable and therefore not provide a value

Schema Update Mechanics

ALTER TABLE person MODIFY lastname NOT NULL;

Data-Migration Mechanics
• May need to clean the existing data if there are rows with a null value in the column

UPDATE person SET lastname = '???' WHERE lastname IS NULL;

Access Program Update Mechanics
• Refactor all the external programs to provide an appropriate value for this column whenever they modify a row within the table
• The programs must also detect and then handle any new exceptions that are thrown by the database

Add Foreign Key Constraint (1)

Add a foreign key constraint to an existing table to enforce a relationship to another table


Motivation
• Enforce data dependencies at the database level

Potential Tradeoffs
• Reduced performance within your database
• Must be aware of the table dependencies in the database

Schema Update Mechanics
• Choose a constraint checking strategy: immediate/deferred
• Create the foreign key constraint
• Introduce an index for the PK of the foreign table (optional)

ALTER TABLE address ADD CONSTRAINT fk_person_state
  FOREIGN KEY (state) REFERENCES state DEFERRABLE;

Data-Migration Mechanics
• Ensure the referenced data exists
• Ensure that the foreign table contains all required rows
• Ensure that the source table's foreign key column contains valid values
• Introduce a default value for the foreign key column

Access Program Update Mechanics
• Identify and then update any external programs that modify data in the table where the foreign key constraint was added (they may contain similar, different, or nonexistent RI code)
• All external programs must be updated to handle any exception(s) thrown by the database as the result of the new foreign key constraint
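
Before the constraint is created, it can help to list the rows that would violate it. The following is a minimal sketch in Python with the built-in sqlite3 module; the function name is an assumption, and the address and state tables are taken to look like the ones in the earlier slides.

```python
import sqlite3


def orphaned_states(conn):
    """Return address.state values that have no matching row in the state table."""
    rows = conn.execute(
        "SELECT DISTINCT a.state FROM address a "
        "LEFT JOIN state s ON a.state = s.state "
        "WHERE a.state IS NOT NULL AND s.state IS NULL "
        "ORDER BY a.state"
    )
    return [r[0] for r in rows]
```

An empty result suggests the referenced data exists and the foreign key column holds only valid values; anything else needs to be cleaned or added to the foreign table first.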

Introduce Soft Delete (1)

Introduce a flag to an existing table that indicates that a row has been deleted

Introduce Soft Delete (2)

Motivation
• Preserve all application data, typically for historical purposes

Potential Tradeoffs
• Performance is potentially impacted

Schema Update Mechanics
• Introduce the identifying column
• Determine how to update the flag
• Develop deletion code
• Develop insertion code

ALTER TABLE person ADD is_deleted BOOLEAN;

ALTER TABLE person MODIFY is_deleted DEFAULT FALSE;

Data-Migration Mechanics

UPDATE person SET is_deleted = FALSE;

Access Program Update Mechanics
• Change read queries to ensure that data read from the database has not been marked as deleted
• All external programs must change physical deletes to updates
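
The two access-program changes can be sketched as follows in Python with the built-in sqlite3 module. The function names and the lastname column are assumptions for the example, and because SQLite has no BOOLEAN type the flag is stored as 0 or 1 here.

```python
import sqlite3


def open_soft_delete_db():
    """Create an in-memory person table that carries the deletion flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        " id INTEGER PRIMARY KEY,"
        " lastname TEXT,"
        " is_deleted INTEGER NOT NULL DEFAULT 0)"  # 0 = live row, 1 = deleted
    )
    return conn


def delete_person(conn, person_id):
    """Formerly a physical DELETE; now only flags the row."""
    conn.execute("UPDATE person SET is_deleted = 1 WHERE id = ?", (person_id,))


def active_lastnames(conn):
    """Read query that skips rows marked as deleted."""
    rows = conn.execute(
        "SELECT lastname FROM person WHERE is_deleted = 0 ORDER BY id"
    )
    return [r[0] for r in rows]
```

The row stays in the table for historical queries, while ordinary reads no longer see it.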

Introduce Index (1)

Introduce a new index of either unique or nonunique type

Introduce Index (2)

Motivation
• Increase query performance on your database reads

Potential Tradeoffs
• Too many indexes on a table will degrade the performance of inserts, updates, and deletes
• Duplicates must be removed before applying a unique index

Schema Update Mechanics
• Determine type of index
• Add a new index
• Provide more disk space

CREATE UNIQUE INDEX unq_person_ssn ON person(ssn);

Data-Migration Mechanics
• Check for duplicate values if introducing a unique index
• Duplicate values must be updated, or a nonunique index used instead

Access Program Update Mechanics
• Analyze dependencies to determine which external programs to update
• Change your queries to make use of this new index
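
The duplicate check from the data-migration step might be automated along these lines, again as a sketch in Python with the built-in sqlite3 module. The function names and the fallback index name idx_person_ssn are assumptions.

```python
import sqlite3


def duplicate_ssns(conn):
    """Return ssn values that occur more than once and would block a unique index."""
    rows = conn.execute(
        "SELECT ssn FROM person WHERE ssn IS NOT NULL "
        "GROUP BY ssn HAVING COUNT(*) > 1 ORDER BY ssn"
    )
    return [r[0] for r in rows]


def introduce_ssn_index(conn):
    """Create a unique index when the data allows it, else a nonunique one."""
    unique = not duplicate_ssns(conn)
    kind = "UNIQUE INDEX unq_person_ssn" if unique else "INDEX idx_person_ssn"
    conn.execute(f"CREATE {kind} ON person(ssn)")
    return unique
```

Falling back to a nonunique index keeps the read-performance benefit while the duplicate values are still being cleaned up.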

Introduce Read-Only Table (1)

Create a read-only data store based on existing tables in the database


Motivation
• Improve query performance
• Summarize data for reporting
• Create redundant data
• Replace redundant reads
• Data security
• Improve database readability

Potential Tradeoffs
• The users of the read-only table need to understand both the timeliness of the copied data and the volatility of the source data to determine whether the read-only table is acceptable

Schema Update Mechanics
• Introduce the new table/materialized view
• Determine a population strategy


Via materialized view:

CREATE MATERIALIZED VIEW person_mv
  BUILD IMMEDIATE
  REFRESH FORCE ON COMMIT
  WITH PRIMARY KEY
AS
  SELECT p.id, p.firstname, p.lastname, p.birthday,
         a.line1 || a.line2 || s.name || a.zipcode AS address
  FROM person p, address a, state s
  WHERE p.address_id = a.id
  AND a.state = s.state;

Via new table:

CREATE TABLE person_mv (
  id NUMBER NOT NULL,
  firstname VARCHAR2(30),
  lastname VARCHAR2(20),
  birthday DATE,
  address VARCHAR2(255),
  CONSTRAINT person_mv_id PRIMARY KEY (id)
);

COMMENT ON TABLE person_mv IS 'read-only table';


Data-Migration Mechanics
• Copy all the relevant source data into the read-only table
• Apply your population strategy (real-time or periodic batch)
• Population strategies: periodic refresh, materialized views, trigger-based synchronization, real-time application updates

INSERT INTO person_mv(id,firstname,lastname,birthday,address)
  SELECT p.id, p.firstname, p.lastname, p.birthday,
         a.line1 || a.line2 || s.name || a.zipcode
  FROM person p, address a, state s
  WHERE p.address_id = a.id
  AND a.state = s.state;

Access Program Update Mechanics
• Make sure that the application uses this table for read-only purposes
• Must change all the places where you currently access the source tables and rework them to use the read-only table instead
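
The periodic-refresh strategy could be implemented as a small batch job. The sketch below uses Python with the built-in sqlite3 module; the simplified table definitions (no line2 or birthday columns) and the function names are assumptions made to keep the example short, and the job would be scheduled by whatever mechanism the team already uses.

```python
import sqlite3


def create_schema(conn):
    """Hypothetical, simplified versions of the tables used on the slides."""
    conn.executescript("""
        CREATE TABLE state (state TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE address (id INTEGER PRIMARY KEY, line1 TEXT,
                              zipcode TEXT, state TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT,
                             lastname TEXT, address_id INTEGER);
        CREATE TABLE person_mv (id INTEGER PRIMARY KEY, firstname TEXT,
                                lastname TEXT, address TEXT);
    """)


def refresh_person_mv(conn):
    """Rebuild the read-only copy from the source tables in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM person_mv")
        conn.execute(
            "INSERT INTO person_mv (id, firstname, lastname, address) "
            "SELECT p.id, p.firstname, p.lastname, "
            "       a.line1 || ', ' || s.name || ' ' || a.zipcode "
            "FROM person p "
            "JOIN address a ON p.address_id = a.id "
            "JOIN state s ON a.state = s.state"
        )
    return conn.execute("SELECT COUNT(*) FROM person_mv").fetchone()[0]
```

Between two refreshes the copy is stale, which is the timeliness tradeoff that readers of the table need to accept.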

References

• Refactoring Databases: Evolutionary Database Design
• http://www.agiledata.org/



Questions?

Gossip?

Rumor?

Thanks
