Post on 01-Jan-2016
Oracle conversion example: Initial approaches for UCM source data
Index
Data Modeling
Current Datafile
Current Dataload
Data Overview
Two Approaches
First Approach
Data Distribution
Advantages
Disadvantages
Second Approach
Basic Modeling
Advantages
Advance Work
Care Needed
Our Recommendation
Tasks
Next Steps
Questions?
Data modeling
Conversion of data from legacy (Fortran) to RDBMS (Oracle)
Hardware/software: Sun E6900, Solaris 5.10, 12 CPUs, 96 GB RAM
Database: Oracle 10g
Modeling tools: Oracle Designer / Erwin
Current datafile
[Flow diagram] One big datafile (Geo, Census, and Base Data) feeds the legacy process; after data modeling it is loaded into the Oracle db, which serves Reports, Data Feeds, and Data Updates via PL/SQL, shell, C, and an ETL tool.
Current DataloadCurrent Dataload
UCM dataUCM data Fortran formatFortran format One big file w/ 180 M recordsOne big file w/ 180 M records Record length is 1543 bytesRecord length is 1543 bytes Most of the fields are varchar2Most of the fields are varchar2 Many fields are blank/no dataMany fields are blank/no data Performance in Oracle inadequate without Performance in Oracle inadequate without
schema redesign to leverage RDMS schema redesign to leverage RDMS capabilitiescapabilities
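A fixed-length file like this would typically be staged with SQL*Loader before any redesign. A minimal control-file sketch, assuming a staging table UCM_STAGE and illustrative field names and positions (the real 1543-byte record layout is not shown in these slides):

```sql
-- Hypothetical SQL*Loader control file for the 1543-byte fixed records.
-- UCM_STAGE and all field names/positions are illustrative assumptions.
LOAD DATA
INFILE 'ucm_source.dat' "fix 1543"
INTO TABLE ucm_stage
(
  mafid   POSITION(1:10)   CHAR,
  state   POSITION(11:12)  CHAR,
  field3  POSITION(13:112) CHAR
  -- ... remaining fields up to byte 1543
)
```

The "fix 1543" clause tells SQL*Loader to treat the input as fixed-length records with no record terminators, which matches a Fortran-style flat file.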
Data Overview (approx)

State                  Records   Size
State of NY            20 M      31 G
State of CA            34 M      52 G
State of TX            25 M      38 G
District of Columbia   500 K     750 M
Delaware               1 M       1.5 G
Connecticut            1 M       1.5 G
Two approaches
First Approach: break the datafile on the basis of data, e.g. at RO level (12 tables) or state level (54-56 tables, including DC, Puerto Rico, etc.)
Second Approach: break the datafile into multiple tables, with changes in field definitions, using a relational model
First approach: break the datafile on the basis of data
[Diagram] The current datafile is split into one table per state: Table_CA, Table_NY, ..., Table_54
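As a sketch, the per-state split could be done with CREATE TABLE ... AS SELECT from a staging table. UCM_STAGE and its STATE column are assumed names, since the slides do not show the actual layout:

```sql
-- Illustrative only: one table per state, carved out of a staging table.
-- UCM_STAGE and the STATE column are assumed names.
CREATE TABLE table_ca AS SELECT * FROM ucm_stage WHERE state = 'CA';
CREATE TABLE table_ny AS SELECT * FROM ucm_stage WHERE state = 'NY';
-- ... repeated for each of the 54-56 state-level codes
```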
Data distribution
Uneven data distribution
Big-state tables will be 30+ G
Small-state tables will be under 1 G
Advantages of this kind of segmenting/partitioning:
State-level queries will be faster than they are currently
If the data is separated by RO instead, it will be more evenly distributed across fewer tables (close to 12 instead of 54-56)
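The same segmenting could also be expressed as a single list-partitioned table rather than dozens of separate tables; a minimal sketch with assumed names (UCM_DATA, MAFID, STATE), not from the slides:

```sql
-- Illustrative alternative: one table, list-partitioned by state.
-- Table and column names are assumptions.
CREATE TABLE ucm_data (
  mafid NUMBER,
  state VARCHAR2(2)
  -- ... remaining fields
)
PARTITION BY LIST (state) (
  PARTITION p_ca    VALUES ('CA'),
  PARTITION p_ny    VALUES ('NY'),
  PARTITION p_other VALUES (DEFAULT)
);
```

Partitioning keeps existing queries pointed at one table name while still pruning to a single state's segment.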
Disadvantages
Too many tables
Many fields are empty yet defined as varchar2(100)
No normalization technique is used
Existing queries need to change substantially
Queries on the small tables will run fast, but the big tables will carry a lot of overhead
The number of operational tables stays the same
Queries become complicated and may confuse users when joining main and operational tables
Second approach: break the datafile into a few relational tables with changes in field definitions
[Diagram] The current datafile is split into Table1, Table2, Table3, and Table4, each linked by the MAFID key
Basic Modeling
Database design, logical and physical
Relations will be defined on a primary key; in this case it is MAFID, which is unique
varchar2(100) fields can be converted to smaller fields based on actual field lengths
Every source field will map to at least one field in the new tables
Data will be inserted into multiple efficient tables based on the updated data model, using relational database design principles
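A minimal DDL sketch of the second approach, keyed on MAFID as the slides describe; all table and column names other than MAFID are illustrative assumptions, not the actual schema:

```sql
-- Illustrative sketch of the relational model keyed on MAFID.
-- All names except MAFID are assumptions.
CREATE TABLE base_record (
  mafid NUMBER PRIMARY KEY,
  state VARCHAR2(2)   -- trimmed from varchar2(100) to its real length
);

CREATE TABLE geo_detail (
  mafid    NUMBER REFERENCES base_record (mafid),
  geo_code VARCHAR2(10)
);

CREATE TABLE census_detail (
  mafid      NUMBER REFERENCES base_record (mafid),
  census_val VARCHAR2(20)
);
```

The foreign keys on MAFID are what let existing queries be rewritten as simple joins back to the base table.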
Advantages
Faster queries, updates, deletes, and additions
Less maintenance
The same approach can be used for transactional/operational data
Advance work
Identify each and every field of the UCM data
Check/define the length of each field
Map every field to the new schema
Decide whether some fields can be merged
Identify and remove duplicate data elements in the model
Define tables and relationships and create the new schema
Break and load the data into these tables
Care needed
The current datafile will be broken into multiple datafiles for data processing
Load the datafiles into tables one by one
Test and demonstrate the completeness of the new model
Craft a comparison to prove that the source and the new schema both include all of the Census data
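One way to sketch that comparison, assuming the raw load sits in a staging table UCM_STAGE and the new model's base table is keyed on MAFID (names are illustrative):

```sql
-- Illustrative completeness checks between source and new schema.
-- UCM_STAGE and BASE_RECORD are assumed names.

-- Every source record should be represented in the new model:
SELECT COUNT(*) AS missing_in_new
  FROM ucm_stage s
 WHERE NOT EXISTS (SELECT 1 FROM base_record b WHERE b.mafid = s.mafid);

-- And row counts should reconcile:
SELECT (SELECT COUNT(*) FROM ucm_stage)   AS source_rows,
       (SELECT COUNT(*) FROM base_record) AS new_rows
  FROM dual;
```

A zero in missing_in_new, plus matching row counts, is the kind of evidence the slide's "prove" step calls for.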
Our Recommendation: **** Second Approach ****
Why?
Data distribution will be uniform
Unwanted data is moved out to separate tables, which reduces overhead on queries and updates
Existing queries can be used with few modifications
Ongoing data maintenance will be more efficient in the RDBMS
Additional data such as RPS can easily be loaded with the same queries
Tasks
Design the database using a data modeling tool (Oracle Designer, Erwin, etc.)
Create test data from the original datafile
Load the test data into database tables
Create test scripts to check data consistency
Check indexes for the required queries
Test old data vs. new data
Continued…
Break the data into small files
Load the full data into tables
Unit test the data for consistency
Run queries on the database
Fine-tune the database if needed
Use the same approach for transactional data such as RPS
Next steps…
Continued collaboration with the Census team to improve domain understanding for new team members
Access to Oracle database tools on the team's workstations
Access to an operational Oracle instance to begin development of the approach