Fundamentals of Relational Database Design and Database Planning

J. Trumbo
Fermilab
CSS-DSG

Outline

  • Definitions
  • Selecting a dbms
  • Selecting an application layer
  • Relational Design
  • Planning
  • A very few words about Replication
  • Space

Definitions
What is a database?

A database is the implementation of freeware or commercial software that provides a means to organize and retrieve data. The database is the set of physical files in which all the objects and database metadata are stored. These files can usually be seen at the operating system level. This talk will focus on the organizing aspect of data storage and retrieval.

Commercial vendors include Microsoft and Oracle.
Freeware products include MySQL and Postgres.
For this discussion, all points/issues apply to both commercial and freeware products.

Definitions
Instance

A database instance, or an ‘instance’, is made up of the background processes needed by the database software.

These processes usually include a process monitor, session monitor, lock monitor, etc. They will vary from database vendor to database vendor.

Definitions
What is a schema?

A SCHEMA IS NOT A DATABASE, AND A DATABASE IS NOT A SCHEMA.

A database instance controls 0 or more databases.
A database contains 0 or more database application schemas.
A database application schema is the set of database objects that apply to a specific application. These objects are relational in nature, and are related to each other within a database to serve a specific functionality, for example payroll, purchasing, calibration, trigger, etc. A database application schema is not a database. Usually several schemas coexist in a database.

A database application is the code base to manipulate and retrieve the data stored in the database application schema.

Definitions Cont.
Primary Definitions

Table, a set of columns that contain data. In the old days, a table was called a file.

Row, a set of columns from a table reflecting a record.

Index, an object that allows for fast retrieval of table rows. Every primary key and foreign key should have an index for retrieval speed.

Primary key, often designated pk, is 1 or more columns in a table that make a record unique.
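The four terms above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for whichever dbms you select, and the table, column and index names (run, event, idx_event_run) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE run (
        run_id   INTEGER PRIMARY KEY,   -- primary key: makes each record unique
        run_name TEXT NOT NULL
    );
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,
        run_id   INTEGER NOT NULL REFERENCES run(run_id),
        payload  TEXT
    );
    -- index on the foreign key column, for retrieval speed
    CREATE INDEX idx_event_run ON event(run_id);
""")
conn.execute("INSERT INTO run VALUES (1, 'cosmic')")
conn.executemany("INSERT INTO event (run_id, payload) VALUES (?, ?)",
                 [(1, "a"), (1, "b")])

# A row is one record of the table.
row = conn.execute("SELECT run_id, run_name FROM run WHERE run_id = 1").fetchone()

# The primary key refuses a second record with the same key value.
try:
    conn.execute("INSERT INTO run VALUES (1, 'duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ask SQLite how it would run the lookup; the last column describes each step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM event WHERE run_id = 1").fetchall()
uses_index = any("idx_event_run" in str(step[-1]) for step in plan)
```

Other database products expose query plans and index definitions differently, so treat the plan check as SQLite-specific.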

Definitions Cont.
Primary Definitions

Foreign key, often designated fk, is a column common to 2 tables that defines the relationship between those 2 tables.

Foreign keys are either mandatory or optional. Mandatory forces a child to have a parent by creating a not null column at the child. Optional allows a child to exist without a parent, allowing a nullable column at the child table (not a common circumstance).
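A minimal sketch of mandatory versus optional foreign keys, again using SQLite through Python's sqlite3 module as a stand-in. The tables (owner, module, note) and the helper try_insert are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other products enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
    CREATE TABLE owner (
        owner_id INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    CREATE TABLE module (
        module_id INTEGER PRIMARY KEY,
        owner_id  INTEGER NOT NULL REFERENCES owner(owner_id)  -- mandatory fk
    );
    CREATE TABLE note (
        note_id  INTEGER PRIMARY KEY,
        owner_id INTEGER REFERENCES owner(owner_id)            -- optional fk
    );
""")
conn.execute("INSERT INTO owner VALUES (1, 'alice')")


def try_insert(sql):
    """Return True if the dbms accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Mandatory: the child must point at an existing parent.
mandatory_with_parent = try_insert("INSERT INTO module VALUES (1, 1)")
mandatory_without_parent = try_insert("INSERT INTO module VALUES (2, NULL)")
mandatory_unknown_parent = try_insert("INSERT INTO module VALUES (3, 99)")

# Optional: the child may exist without a parent, but not with a bogus one.
optional_without_parent = try_insert("INSERT INTO note VALUES (1, NULL)")
optional_unknown_parent = try_insert("INSERT INTO note VALUES (2, 99)")
```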

Definitions Cont.
Primary Definitions

Entity Relationship Diagram, or ER, is a pictorial representation of the application schema.

ER Example

[ER diagram. Entities and attributes, with the #, * and o markers as drawn:]

STATUS: # STAT_ID, o STATUS_NAME, * CREATE_DATE, * CREATE_USER, ...
REVISION: # REV_ID, o REV_NAME, * REV_DATE, * CREATE_DATE, * CREATE_USER, o UPDATE_DATE, o UPDATE_USER
HISTORY: # HIST_ID, * DATE_CHANGED, * REASON, * CREATE_DATE, * CREATE_USER, o UPDATE_DATE, o UPDATE_USER
UNIT: # UNIT_ID, * UNIT_NAME, * CREATE_DATE, * CREATE_USER, * UPDATE_DATE, o UPDATE_USER
PARAMETER: # PAR_ID, * PAR_NAME, * TEXT, * VALUE, o UPPER_LIMIT, o LOWER_LIMIT, * SRC, * DOCUMENTATION, o DRAWINGS, * CREATE_DATE, * CREATE_USER, ...
OWNER: # OWNER_ID, * FIRST_NAME, * LAST_NAME, * PASSWORD, * EMAIL, * USERNAME, * CREATE_DATE, * CREATE_USER, ...
MODULE: # MODULE_ID, * MODULE_NAME, * CREATE_DATE, * CREATE_USER, ...

Relationship labels in the diagram: describes, has, associated with, may have, creates, have, part of, own.

Definitions Cont.
Primary Definitions

Constraints are rules residing in the database’s data dictionary governing relationships and dictating the ways records are manipulated, what is a legal move vs. what is an illegal move. These are of the utmost importance for a secure and consistent set of data.
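As a rough illustration of legal versus illegal moves, the sketch below declares a few constraints in SQLite (via Python's sqlite3) and lets the dbms reject the rows that break them. The parameter table loosely follows the ER example; the specific rules (unique name, lower limit not above upper limit) and the helper is_legal are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parameter (
        par_id      INTEGER PRIMARY KEY,
        par_name    TEXT NOT NULL UNIQUE,
        lower_limit REAL,
        upper_limit REAL,
        -- assumed business rule: limits, when both present, must be ordered
        CHECK (lower_limit IS NULL OR upper_limit IS NULL
               OR lower_limit <= upper_limit)
    )
""")


def is_legal(name, low, high):
    """Try to store a parameter; the dbms, not the application, decides."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO parameter (par_name, lower_limit, upper_limit) "
                "VALUES (?, ?, ?)", (name, low, high))
        return True
    except sqlite3.IntegrityError:
        return False


legal_row = is_legal("voltage", 0.0, 5.0)
reversed_limits = is_legal("current", 9.0, 1.0)   # breaks the CHECK rule
duplicate_name = is_legal("voltage", 1.0, 2.0)    # breaks UNIQUE
missing_name = is_legal(None, 0.0, 1.0)           # breaks NOT NULL
stored_rows = conn.execute("SELECT COUNT(*) FROM parameter").fetchone()[0]
```

Because the rules live in the database, every application that touches the table is held to them.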

Definitions Cont.
Primary Definitions

Data Manipulation Language, or DML, sql statements that insert, update or delete data in a database.

Data Definition Language, or DDL, sql used to create and modify database objects used in an application schema.
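The split between the two can be sketched as follows, using SQLite through Python's sqlite3 module. The unit table and its values are hypothetical; the statements themselves are ordinary SQL, though DDL syntax varies somewhat between products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create and modify database objects.
conn.execute("CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, unit_name TEXT NOT NULL)")
conn.execute("ALTER TABLE unit ADD COLUMN create_user TEXT")
columns = [info[1] for info in conn.execute("PRAGMA table_info(unit)")]

# DML: insert, update and delete the data held in those objects.
inserted = conn.executemany(
    "INSERT INTO unit (unit_name, create_user) VALUES (?, ?)",
    [("volt", "dev"), ("amp", "dev"), ("ohm", "dev")]).rowcount
updated = conn.execute(
    "UPDATE unit SET unit_name = 'ampere' WHERE unit_name = 'amp'").rowcount
deleted = conn.execute("DELETE FROM unit WHERE unit_name = 'ohm'").rowcount
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT unit_name FROM unit ORDER BY unit_id")]
```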

Definitions Cont.
Primary Definitions

A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit. The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all rolled back (undone from the database), ensuring data consistency.
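A small sketch of the all-or-nothing behaviour, assuming SQLite via Python's sqlite3. The account table, the two account names and the transfer function are invented for the example; the `with conn:` block commits if every statement succeeds and rolls back if any fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES ('payroll', 100), ('purchasing', 0)")
conn.commit()


def transfer(amount):
    """Move funds from payroll to purchasing as one unit of work."""
    try:
        with conn:  # commit both updates, or roll both back
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'purchasing'",
                (amount,))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'payroll'",
                (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


committed = transfer(30)        # both statements succeed and are committed
after_commit = balances()
rolled_back = not transfer(500) # second statement fails, first one is undone
after_rollback = balances()
```

In the failing case the credit to purchasing had already been applied inside the transaction; the rollback removes it, so the data never shows a half-finished transfer.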

Definitions Cont.
Primary Definitions

A view is a selective presentation of the structure of, and data in, one or more tables (or other views). A view is a ‘virtual table’, having predefined columns and joins to one or more tables, reflecting a specific facet of information.
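For illustration, the sketch below defines a view over two hypothetical tables (owner and module, loosely after the ER example) in SQLite via Python's sqlite3. The view name module_owner is an assumption. It exposes only selected columns and stores no data of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (
        owner_id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        password TEXT NOT NULL
    );
    CREATE TABLE module (
        module_id   INTEGER PRIMARY KEY,
        module_name TEXT NOT NULL,
        owner_id    INTEGER NOT NULL REFERENCES owner(owner_id)
    );
    INSERT INTO owner VALUES (1, 'alice', 'secret');
    INSERT INTO module VALUES (10, 'tracker', 1), (11, 'calorimeter', 1);

    -- predefined columns and join; the password column is deliberately left out
    CREATE VIEW module_owner AS
        SELECT m.module_name, o.username
        FROM module m JOIN owner o ON o.owner_id = m.owner_id;
""")

cursor = conn.execute("SELECT * FROM module_owner ORDER BY module_name")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()

# The view is virtual: new base-table rows show up without touching the view.
conn.execute("INSERT INTO module VALUES (12, 'muon', 1)")
row_count_after_insert = conn.execute("SELECT COUNT(*) FROM module_owner").fetchone()[0]
```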

Definitions Cont.
Primary Definitions

Database triggers are PL/SQL, Java, or C procedures that run implicitly whenever a table or view is modified or when some user actions or database system actions occur. Database triggers can be used in a variety of ways for managing your database. For example, they can automate data generation, audit data modifications, enforce complex integrity constraints, and customize complex security authorizations. Trigger methodology differs between databases.
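A minimal sketch of the audit use case. Since trigger methodology differs between databases, note the assumption: this uses SQLite (through Python's sqlite3), where trigger bodies are written in SQL rather than PL/SQL, Java or C. The parameter and history tables and the trigger name parameter_audit are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parameter (
        par_id    INTEGER PRIMARY KEY,
        par_name  TEXT NOT NULL,
        par_value REAL NOT NULL
    );
    CREATE TABLE history (
        hist_id      INTEGER PRIMARY KEY,
        par_id       INTEGER NOT NULL,
        old_value    REAL,
        new_value    REAL,
        date_changed TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- runs implicitly after every update of par_value; no application code involved
    CREATE TRIGGER parameter_audit
    AFTER UPDATE OF par_value ON parameter
    BEGIN
        INSERT INTO history (par_id, old_value, new_value)
        VALUES (OLD.par_id, OLD.par_value, NEW.par_value);
    END;
""")

conn.execute("INSERT INTO parameter VALUES (1, 'voltage', 5.0)")
conn.execute("UPDATE parameter SET par_value = 5.5 WHERE par_id = 1")
conn.commit()

audit_trail = conn.execute(
    "SELECT par_id, old_value, new_value FROM history").fetchall()
```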

Definitions Cont.
Primary Definitions

Replication is the process of copying and maintaining database objects, such as tables, in multiple databases that make up a distributed database system.

Backups are copies of the database data in a format specific to the database. Backups are used to recover one or more files that have been physically damaged as the result of a disk failure. Media recovery requires the restoration of the damaged files from the most recent operating system backup of a database. It is of the utmost importance to perform regularly scheduled backups.
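As a toy illustration of backup and restore, the sketch below uses the `Connection.backup` method of Python's sqlite3 module. This is specific to SQLite; commercial products and other freeware ship their own backup and recovery tools, and the in-memory databases and the run table here are stand-ins chosen only to keep the example self-contained.

```python
import sqlite3


def run_count(connection):
    return connection.execute("SELECT COUNT(*) FROM run").fetchall()[0][0]


# The "production" database.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE run (run_id INTEGER PRIMARY KEY, run_name TEXT)")
production.execute("INSERT INTO run VALUES (1, 'cosmic'), (2, 'calibration')")
production.commit()

# Scheduled backup: copy the whole database into a separate one.
backup_copy = sqlite3.connect(":memory:")
production.backup(backup_copy)

# Simulated damage after the backup was taken.
production.execute("DELETE FROM run")
production.commit()
rows_after_damage = run_count(production)

# Recovery: restore the most recent backup into a fresh database.
restored = sqlite3.connect(":memory:")
backup_copy.backup(restored)
rows_after_restore = run_count(restored)
```

Anything written after the last backup is not in the copy, which is why the schedule matters.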

Definitions Cont.

Mission Critical Applications

An application is defined as mission critical, imho, if

1. there are legal implications or financial loss to the institution if the data is lost or unavailable.
2. there are safety issues if the data is lost or unavailable.
3. no data loss can be tolerated.
4. uptime must be maximized (98%+).
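To put the 98% figure in concrete terms, the short calculation below converts an uptime percentage into the downtime it still permits over a year. The function name and the choice of a 365-day year are assumptions for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent):
    """Hours per year a system may be down and still meet the uptime target."""
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


downtime_98 = allowed_downtime_hours(98)   # 2% of 8760 hours
downtime_98_days = downtime_98 / 24
```

At 98% uptime the budget is roughly a week of downtime per year, which has to cover both planned maintenance and unplanned outages.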

Definitions Cont.

‘large’ or ‘very large’ or ‘a lot’

Seems odd, but ‘large’ is a hard definition to determine. VLDB is an acronym for very large databases. Its definition varies depending on the database software one selects. Very large normally indicates data that is reaching the limits of capacity for the database software, or data that needs extraordinary measures for operations such as backup, recovery, storage, etc.

Definitions Cont.

Commercial databases do not have a practical limit to the size of the load. Issues will be backup strategies for large databases.

Freeware does limit the size of the databases, and the number of users. Documentation on these issues varies widely from the freeware sites to the user sites. MySQL supposedly can support 8 TB and 100 users. However, you will find arguments on the users lists that these numbers cannot be met.

Selecting a DBMS

Many options, many decisions, planning, costs, criticality.

For lots of good information, please refer to the urls on the last slides. Many examples of people choosing a product.

Selecting a DBMS
How do I Choose?

Which database product is appropriate for my application? You must make a requirements assessment.

  • Does your database need 24x7 availability?
  • Is your database mission critical, and no data loss can be tolerated?
  • Is your database large? (backup recovery methods)
  • What data types do I need? (binary, large objects?)
  • Do I need replication? What level of replication is required? Read only? Read/Write? Read/Write is very expensive, so can I justify it?

Selecting a DBMS
How do I Choose? Cont.

If your answer to any of the above is ‘yes’, I would strongly suggest purchasing and using a commercial database with support. Support includes:

  • 24x7 assistance with technical issues
  • Patches for bugs and security
  • The ability to report bugs, and get them resolved in a timely manner.
  • Priority for production issues
  • Upgrades/new releases
  • Assistance with and use of proven backup/recovery methods

Selecting a DBMS
The Freeware Choice

Freeware is an alternative for applications. However, be forewarned, support for these databases is done via email to an ad hoc support group. The level of support via these groups may vary over the life of your database. Be prepared. Also expect less functionality than any commercial product. See http://www-css.fnal.gov/dsg/external/freeware/

Selecting a DBMS
The Freeware Choice

  • Freeware is free.
  • Freeware is open source.
  • Freeware functionality is improving.
  • Freeware is good for smaller non-mission critical applications.

Selecting an Application Layer

Again, planning takes center stage. In the end you want stability and dependability.

  • How many users need access?
  • What will the security requirements be?
  • Are there software licensing issues that need consideration?
  • Is platform portability a requirement?
  • Two tier or three tier architecture?

Selecting an Application Layer

  • Direct access to the database layer? (probably should be avoided)
  • Are you replicating? How? Where? With what?
  • There are no utilities that will port data from 1 database to another (e.g., postgres to mysql). If database portability is a requirement, independent code must be written to satisfy this requirement.
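What such independent code might look like, in its simplest form, is sketched below: read the rows from one database connection and insert them through another. Both ends are SQLite here purely so the example runs on its own; a real port between two products would use each product's own driver and would also have to deal with differing data types, which this sketch ignores. The function copy_table and the unit table are hypothetical.

```python
import sqlite3


def copy_table(source, target, table, columns):
    """Copy all rows of one table between two connections.

    Table and column names are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {column_list} FROM {table}").fetchall()
    with target:  # load everything in one transaction
        target.executemany(
            f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", rows)
    return len(rows)


ddl = "CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, unit_name TEXT NOT NULL)"

source_db = sqlite3.connect(":memory:")
source_db.execute(ddl)
source_db.execute("INSERT INTO unit VALUES (1, 'volt'), (2, 'ampere')")
source_db.commit()

target_db = sqlite3.connect(":memory:")
target_db.execute(ddl)  # the schema must already exist on the target

copied = copy_table(source_db, target_db, "unit", ["unit_id", "unit_name"])
target_rows = target_db.execute("SELECT * FROM unit ORDER BY unit_id").fetchall()
```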

Selecting an Application Layer Cont.

Application maintenance issues

  • People availability, working with users as a team, talent, and turnover? (historically a huge issue)
  • A ‘known’ or ‘common’ language?
  • Freeware? Bug fixes, patches…are they important and timely?
  • Documentation? Set standards, procedures, code reviews making sure the documentation exists and is clear.
  • Is the application flexible enough to easily accommodate business rule changes that mandate modifications?
  • The availability of an ER diagram at this stage is invaluable. We consider it a must have.
  • There are no utilities to port data from 1 type of db to another. This lack of portability means a method to move data between databases must be written independently.

Selecting an Application Layer

Misc. application definitions…

This presentation is not an application presentation, but I will mention a few terms you may hear.

SQL, the query language for relational databases. A must learn.

ODBC, open database connectivity. The software that allows a database to talk to an application.

JDBC, java database connectivity.

Relational Design

The design of the application schema will determine the usability and query ability of the application. Done incorrectly, the application and users will suffer until someone else is forced to rewrite it.

Relational Design
The Setup

The database group has a standard 3 tier infrastructure for developing and deploying production databases and applications. This infrastructure provides 3 database instances: development, integration and production. This infrastructure is applicable to any application schema, mission critical or not. It is designed to ensure development, testing, feedback, signoff, and a protected production environment.

Each of these instances contains 1 or more applications.

Relational Design
The Setup

The 3 instances are used as follows:

1. Development instance. Developers’ playground. Small in size compared to production. Much of the data is ‘invented’ and input by the developers. Usually there is not enough disk space to ever ‘refresh’ with production data.

Relational Design Cont.
The Setup

2. The integration instance is used for moving what is thought to be ‘complete’ functionality to a pre production implementation. Power users and developers work in concert in integration to make sure the specs were followed. The users should use integration as their sign off area. Cuts from dev to int are frequent and common to maintain the newest releases in int for user testing.

Relational Design Cont.
The Setup

3. The production instance, real data. Needs to be kept pure. NO testing allowed. Very few logons. The optimal setup of a production database server machine has ~3 operating system logons: root, the database logon (e.g. oracle), and a monitoring tool. In a critical 24x7 supported database, developers, development tools, web servers, log files, all should be kept off the production database server.

Relational Design Cont.
The Setup

Let’s talk about mission critical & 24x7 a bit.

1. To optimize a mission critical 24/7 database, the database server machine should be dedicated to running the database, nothing else.
2. All software products need maintenance and downtime. Resist putting software products on the db server machine so that their maintenance does not inhibit the running of the database. Further, if the product breaks, it could inhibit access to the database for a long period. Example: a logging application monitoring users on the db goes wild, fills all available space and halts the database. If this logging app were not on the db server machine, the db would be unaffected by the malfunction.

Relational Design Cont.
The Setup

3. All database applications and database software require modifications. Most times these modifications require down time because the schema or data modifications need to lock entire tables exclusively. If you are sharing your database instance with many other applications, and 1 of those applications needs the database for an upgrade, all apps may have to take the down time. Avoid this by ensuring your 24/7 database application is segregated from all other software that is not absolutely needed. In that way you ensure any down times are specific to your cause.

Our 1st relational example

[Diagram: the CPU (d0ora2) hosts the databases d0ofprd1 and d0ofint1; the schema applications in d0ofprd1 are sam, runs, calib, and the schema applications in d0ofint1 are sam, runs, calib.]

  • A cpu can house 1 or more databases.
  • A database can accommodate 1 or more instances.
  • An instance may contain 1 or more application schemas.

What is a schema?

One implements a schema by running scripts. These scripts can be run against multiple servers and should be archived.

It is
  • Tables (columns/datatypes) having
      Constraints (not null, unique, foreign & primary keys)
      Triggers
      Indexes
      etc.
  • Accounts
  • Privileges & Roles
  • Server side processes

It is not
  • The environment (servers, OS)
  • The results of queries, i.e. objects
  • Application Code
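Implementing a schema by running one archived script against several servers can be sketched as follows. Three in-memory SQLite databases stand in for development, integration and production servers; the script contents and the names SCHEMA_SCRIPT and schema_objects are assumptions made for the example.

```python
import sqlite3

# The archived script: one source of truth for the schema objects.
SCHEMA_SCRIPT = """
    CREATE TABLE run (
        run_id   INTEGER PRIMARY KEY,
        run_name TEXT NOT NULL
    );
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,
        run_id   INTEGER NOT NULL REFERENCES run(run_id)
    );
    CREATE INDEX idx_event_run ON event(run_id);
"""

# Stand-ins for three separate servers.
servers = {name: sqlite3.connect(":memory:")
           for name in ("development", "integration", "production")}

for connection in servers.values():
    connection.executescript(SCHEMA_SCRIPT)


def schema_objects(connection):
    """List (type, name) of the objects the script created (SQLite catalog)."""
    return sorted(connection.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%'").fetchall())


objects_by_server = {name: schema_objects(c) for name, c in servers.items()}
all_identical = len({tuple(objs) for objs in objects_by_server.values()}) == 1
```

Running the same script everywhere is what keeps the three environments structurally identical; the catalog query is SQLite-specific, and other products expose their data dictionary differently.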

Relational Design
Getting Started

Using your design tool, you will begin by relating objects that will eventually become tables. All the other schema objects will fall out of this design.

You will spend LOADS of time in your design tool, honing, redoing, reacting to modifications, etc.

The end users and the designers need to be working almost at the same desk for this process. If the end user is the designer, the end user should involve additional users to ensure an unbiased and general design.

It is highly suggested that the design be kept up to date for future documentation and maintainers.

Tables are related, most frequently in a 0 to many relationship. Example, 1 run will result in 0 or more events. Analyzing and defining these relationships results in an application schema.
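The run and event example can be sketched as two tables, with SQLite (via Python's sqlite3) as a stand-in dbms. The column names and sample data are hypothetical. The LEFT JOIN keeps runs that have no events, which is the ‘0’ in 0 to many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE run (
        run_id   INTEGER PRIMARY KEY,
        run_name TEXT NOT NULL
    );
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,
        run_id   INTEGER NOT NULL REFERENCES run(run_id)  -- each event belongs to 1 run
    );
    INSERT INTO run VALUES (1, 'cosmic'), (2, 'calibration');
    INSERT INTO event (run_id) VALUES (1), (1), (1);       -- run 2 has no events yet
""")

# Count events per run, including runs with zero events.
events_per_run = conn.execute("""
    SELECT r.run_name, COUNT(e.event_id)
    FROM run r LEFT JOIN event e ON e.run_id = r.run_id
    GROUP BY r.run_id
    ORDER BY r.run_id
""").fetchall()
```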

What will a good schema design buy you?

I am afraid the 80% planning 20% implementation rule applies. Gather requirements.

  • Discovery of data that needs to be gathered.
  • Fast query results
  • Limited application code maintenance
  • Data flexibility
  • Less painful turnover of application to new maintainers.
  • Fewer long term maintenance issues.

Relational Design
Let’s get started

  • Write a requirements document.
  • You will not be able to anticipate all requirements, but a document will be a start. A well designed schema naturally allows for additional functionality.
  • Who are the users? What is their mission?
  • Identify objects that need to be stored/tracked.
  • Think about how objects relate to each other.
  • Do not be afraid to argue/debate the relationships with others.

Relational Design
So how do you get there?

Design tools are available, however, they do not think for you. They will give you a clue that you are doing something stupid, but they won’t stop you. It is highly recommended you use a design tool.

A picture says 1000 words. Create ER, entity relationship, diagrams.

Get a commitment from the developer(s) to see the application through to implementation. We have seen several applications redone multiple times. A string of developers tried, left the project, and left a mess. A new developer started from scratch because there was no documentation or design.

41

Relational Design: How do I get there?

Adhere to the recommendations of your database vendor for setup and architecture.

Don’t be afraid to ask for help or to see other examples.

Don’t be afraid to pilfer others’ design work. If it is good and closely fits your requirements, then use it.

Ask questions; schedule reviews with experts and users.

Work with your hardware system administrators to ensure you have the hardware you need for the proposed job.

42

Relational Design: Common Mistakes

Mistakes we see ALL the time:

Do not design your schema around your favorite query. A relational design will enable all queries to be speedy, not only your favorite.

Don’t design the schema around your narrow view of the application. Get other users involved from the start; ask for input and review.

43

Relational Design: Common Mistakes

Create a relational structure, not a hierarchical structure. The ER diagram should not necessarily resemble a tree or a circle. It is the logical building of relationships between data. Relationships flow between subsets of data. The resulting ER diagram’s ‘look’ is not a standard by which one can judge the quality of the design.

44

Relational Design: Common Mistakes

Do not create 1 huge table to hold 99% of the data. We have seen a table with 1100+ columns: unusable and unqueryable. It required an entire application rewrite, which took over a year and made 80 tables from the 1 table.

Do not create separate schemas for the same application or for functions within an application.

Use indices and constraints. This is a MUST!

45

Relational Design: Examples of Common Mistakes

Using a timestamp as the primary key assumes that within a second, no other record will be inserted. In practice this was not the case, and an insert operation failed. Use database-generated sequences as primary keys and a NON-UNIQUE index on the timestamp.

A table with more than 900 columns. Such a design will cause chaining, since a record is not going to fit in one block. One record spanning many blocks means chaining, and hence bad performance.
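The timestamp-key failure described on this slide can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module rather than the databases discussed in the talk; the table and column names (reading_bad, reading_good, taken_at) are hypothetical, and SQLite's AUTOINCREMENT stands in for a database sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Anti-pattern: the timestamp itself is the primary key.
conn.execute("CREATE TABLE reading_bad (taken_at TEXT PRIMARY KEY, value REAL)")

# Recommended: a database-generated key plus a NON-UNIQUE index on the timestamp.
conn.execute(
    "CREATE TABLE reading_good ("
    " reading_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " taken_at TEXT NOT NULL,"
    " value REAL)"
)
conn.execute("CREATE INDEX reading_good_taken_at_idx ON reading_good (taken_at)")

same_second = "2004-02-01 12:00:00"
rows = [(same_second, 1.0), (same_second, 2.0)]  # two readings within one second

bad_insert_failed = False
try:
    conn.executemany("INSERT INTO reading_bad VALUES (?, ?)", rows)
except sqlite3.IntegrityError:
    bad_insert_failed = True  # the second row collides on the timestamp key

# The same two rows go in without trouble when the database generates the key.
conn.executemany("INSERT INTO reading_good (taken_at, value) VALUES (?, ?)", rows)
good_ids = [r[0] for r in conn.execute(
    "SELECT reading_id FROM reading_good ORDER BY reading_id")]

print(bad_insert_failed, good_ids)
```

The timestamp remains searchable through the ordinary index; it simply no longer has to be unique.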

46

Relational Design: Examples of Common Mistakes

Do not let the application control a generated sequence. We have seen locking issues and duplicate value issues when the application increments the sequence. Have the database increment/lock/constrain the sequence/primary key. That is why databases have sequence mechanisms; use them.

Use indices! An Atlas table with 200,000 rows halted during a query. Reason? No indices. Adding a primary key index gave instantaneous query response. Indices are not wasted space!
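The effect of a missing index can be seen by asking the database how it plans to run a query. The sketch below is an illustration only: it uses Python's built-in sqlite3 module, a hypothetical event table with 200,000 rows, and a secondary index on a hypothetical run_number column (the Atlas case above involved a primary key index). The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (event_id INTEGER PRIMARY KEY, run_number INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO event (run_number, payload) VALUES (?, ?)",
    ((i % 1000, "x") for i in range(200_000)),
)

def plan(sql):
    """Return the query planner's description of how it will run sql."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM event WHERE run_number = 42"

plan_before = plan(query)  # no index on run_number: every row is scanned

conn.execute("CREATE INDEX event_run_number_idx ON event (run_number)")
plan_after = plan(query)   # the planner can now search the index instead

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it names event_run_number_idx.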

47

Relational Design: Examples of Common Mistakes

USE DATABASE CONSTRAINTS!!!!!!

We have examples where constraints were not used, but were ‘implemented’ via the api. Bugs in the api allowed data to be deleted that should not have been deleted; constraints would have prevented the error. We have also seen apis fail with ‘cannot delete’ errors. They were trying to force an invalid delete; luckily, the database constraints saved the data.
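A small sketch of the ‘cannot delete’ case: a foreign key constraint refuses a delete that buggy application code attempts. It uses Python's built-in sqlite3 module; the parent and child tables and the buggy_api_delete_parent function are hypothetical stand-ins, not code from any of the applications mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE child ("
    " child_id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL REFERENCES parent (parent_id))"
)
conn.execute("INSERT INTO parent (parent_id, name) VALUES (1, 'run A')")
conn.execute("INSERT INTO child (child_id, parent_id) VALUES (10, 1)")

def buggy_api_delete_parent(parent_id):
    """Stand-in for application code that forgets to check for child rows."""
    conn.execute("DELETE FROM parent WHERE parent_id = ?", (parent_id,))

delete_blocked = False
try:
    buggy_api_delete_parent(1)
except sqlite3.IntegrityError:
    delete_blocked = True  # the database refused the invalid delete

parents_left = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
print(delete_blocked, parents_left)
```

Without the REFERENCES clause the delete would succeed and leave the child row orphaned.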

48

Entity Relationship Diagrams: 1 to many

[Figure: ER diagrams showing four one-to-many relationships among the entities PARENT, CHILD, A, B, C, D, E and F. Each entity has a primary key (#) named <ENTITY>_ID. The two ends of each relationship are labeled “have” and “belong to”.]

49

Entity Relationship Diagrams: many to many

[Figure: ER diagrams of many-to-many relationships. Entities shown: G, H, I, J, G2, H2, I2 and J2, each with a primary key (#) named <ENTITY>_ID, plus the entities G2H2 and I2J2. Relationship labels: “map to”, “define”, “relate to”, “owned by”.]
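One way a many-to-many relationship is usually implemented is with an intersection table holding one row per pairing. The sketch below assumes that G2H2 in the diagram plays that role between G2 and H2; that reading, the column names, and the sample rows are all assumptions made for illustration. It uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE g2 (g2_id INTEGER PRIMARY KEY);
    CREATE TABLE h2 (h2_id INTEGER PRIMARY KEY);

    -- Intersection table: one row per (g2, h2) pairing.
    CREATE TABLE g2h2 (
        g2_id INTEGER NOT NULL REFERENCES g2 (g2_id),
        h2_id INTEGER NOT NULL REFERENCES h2 (h2_id),
        PRIMARY KEY (g2_id, h2_id)
    );

    INSERT INTO g2 (g2_id) VALUES (1), (2);
    INSERT INTO h2 (h2_id) VALUES (10), (20);
    INSERT INTO g2h2 (g2_id, h2_id) VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One g2 row maps to many h2 rows, and one h2 row maps to many g2 rows.
h2_for_g2_1 = [r[0] for r in conn.execute(
    "SELECT h2_id FROM g2h2 WHERE g2_id = 1 ORDER BY h2_id")]
g2_for_h2_10 = [r[0] for r in conn.execute(
    "SELECT g2_id FROM g2h2 WHERE h2_id = 10 ORDER BY g2_id")]

print(h2_for_g2_1, g2_for_h2_10)
```

The composite primary key keeps the same pairing from being recorded twice.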

50

Entity Relationship Diagrams: 1 to 1

[Figure: ER diagrams showing three one-to-one relationships among the entities K, L, M, N, O and P. Each entity has a primary key (#) named <ENTITY>_ID. The two ends of each relationship are labeled “relate to” and “define”.]

51

Relational Design: The Good

[Figure: ER diagram with two entities. CALIB_TYPE has # CALIB_TYPE_ID and * DESCRIPTION. CALIBRATION has # CALIBRATION_ID, * TSTART and o TEND. The two ends of the relationship are labeled “define” and “be defined by”.]

Calibration type might have 3 rows: drift, pedestal, & gain. This is a parent table.

Each calibration record will be defined by drift, pedestal or gain, in addition to start and end times. This is a child table.
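The two-table design above might be written out as follows. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, text columns for the times, and a calib_type_id foreign key column in CALIBRATION (implied by the relationship in the diagram but not listed among its attributes). The add_calibration helper and the sample dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Parent table: one row per kind of calibration.
    CREATE TABLE calib_type (
        calib_type_id INTEGER PRIMARY KEY,
        description   TEXT NOT NULL
    );
    -- Child table: every calibration record points at its type.
    CREATE TABLE calibration (
        calibration_id INTEGER PRIMARY KEY,
        calib_type_id  INTEGER NOT NULL REFERENCES calib_type (calib_type_id),
        tstart         TEXT NOT NULL,
        tend           TEXT
    );
    INSERT INTO calib_type (description) VALUES ('drift'), ('pedestal'), ('gain');
    """
)

def add_calibration(kind, tstart, tend=None):
    """Insert one calibration row of the given type, looked up by description."""
    conn.execute(
        "INSERT INTO calibration (calib_type_id, tstart, tend) "
        "SELECT calib_type_id, ?, ? FROM calib_type WHERE description = ?",
        (tstart, tend, kind),
    )

add_calibration("gain", "2004-01-01", "2004-01-31")
add_calibration("drift", "2004-02-01")
add_calibration("gain", "2004-02-01")

# One query covers every calibration type; no per-type tables or code paths.
counts = dict(conn.execute(
    "SELECT t.description, COUNT(c.calibration_id) "
    "FROM calib_type t LEFT JOIN calibration c USING (calib_type_id) "
    "GROUP BY t.description"
))
print(counts)
```

Supporting a fourth kind of calibration is then a single INSERT into calib_type, with no new tables.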

52

Relational Design: The Bad

[Figure: ER diagram with four entities: CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND), GAIN_CALIB (# GAIN_CALIB_ID, * TSTART, o TEND), PEDESTAL_CALIB (# PEDESTAL_CALIB_ID, * TSTART, o TEND) and DRIFT_CALIB (# DRIFT_CALIB_ID, * TSTART, o TEND). There are three relationships, each with ends labeled “relate to” and “define”.]

You have now created 3 different children, all reporting the same information, when 1 child would suffice. Code will have to be written, tested, and maintained for 4 tables now instead of 2.

53

Relational Design: The Ugly

[Figure: ER diagram with six entities: CALIBRATION, CALIBRATION(2) and CALIBRATION(3) (each with # CALIBRATION_ID, * TSTART, o TEND), and GAIN_CALIB, PEDESTAL_CALIB and DRIFT_CALIB (each with its own # ..._ID, * TSTART, o TEND). There are three relationships, each with ends labeled “relate to” and “defines”.]

Now you have created 3 different applications, using 6 tables, all of which could be managed with 2 tables. Extra code, extra testing, extra maintenance.

54

Relational Design: The Good…let’s recap

[Figure: ER diagram with two entities. CALIB_TYPE has # CALIB_TYPE_ID and * DESCRIPTION. CALIBRATION has # CALIBRATION_ID, * TSTART and o TEND. The two ends of the relationship are labeled “define” and “be defined by”.]

AHHH, back to normal, or normalization as we refer to it.

55

Relational Design: What to expect from a design tool

An entity relationship diagram

The ability to create the ddl (data definition language) needed

The ability to project disk space usage

Ddl in a format that allows you to enter the code into a code library (cvs) and to run it against your database

56

Relational Design: Why bother? Experience from RunII

TO SAVE TIME AND PRECIOUS PEOPLE RESOURCES!

Personnel consistency does not exist. Application developers come and go regularly. The documentation that a design product provides will give the next developer an immediate understanding of the application in picture format.

Application sharing is enhanced when others can look at your design and determine whether the application is reusable in their environment. Sam is a good example of an application that 3 experiments are now using.

57

Relational Design: Why bother? Cont.

When an application is under construction, the ER diagram goes to every application meeting, and quite possibly the wallet of the application leader. It is the pictorial answer to many issues.

Planning for disk space has been an issue; the designer tool should assist with this task.

58

Planning: Overall

What do I need to plan for?
People, hardware, software, obsolescence, maintenance, emergencies.

How far out do I need to plan?
Initially 2-4 years.

How often do I need to review the plans?
Annually.

What if my plan fails or looks undoable?
Nip it in the bud, be proactive, come up with options.

59

Planning: Overall

Disk space requirements. My experience is that all the wags (wild guesses) fall short of what is needed. It is hard to predict the number of rows in a table. It would be easier if we knew the amount and results of the science ahead of time! Remember: 10x what you think the data will take.

Hardware requirements. Experience tells us that the database machine should serve 1 master (if it is a large database or mission critical): the database, nothing else. Ideally there will be root, a database monitor user and a database user, oracle for example. No apache, no log file areas, no applications, etc.

60

Planning: Overall

Growth and obsolescence. Plan for 3-4 years before needing to replace hardware. Hardware and software become obsolete. New/upgraded software gives additional functionality that you will want/need.

Maintenance. Do you change the oil in your car? Plan on 1 morning per month of downtime for caring for the hardware and software. Security patches could mandate additional stoppages. I cannot stress how important this is. Firewalling will not protect you from bugs and obsolescence. If the downtime is not needed, it will not be taken. Planning maintenance time is as important as planning to buy disks.

61

Planning: User Requirements

Will user requirements influence your hardware & software decisions?

Do you need replication?

What architecture is your api going to be?

How many users will be loading the database and hardware?

62

Planning: Maintenance

Database and operating system software need upgrades. One always hopes one can get on a stable version of something and not upgrade. That is a fallacy. Major version upgrades provide needed and new functionality. Bug patches and security patches are a never ending fact of life.

63

Planning: Backup and Recovery

Backup and recovery procedures for vldbs (very large databases) are difficult at best. A vldb is normally defined as a multiple gigabyte or terabyte database. This is probably the most sensitive area when choosing a freeware database.

Hardware plays a part here as well. When planning for hardware, ensure there is a plan for backup and recovery. Disk and tape may be needed.

64

Planning: Good Practices with a Hammer

Make a standards document and enforce its use. When dbas and developers are always on the same page, life is easier for both. Expectations are clear and defined. Anger and disappointment are lessened.

System as well as database standards need to be followed and enforced.

65

Planning: Failover

Yikes, we are down!

Everyone always wants 24x7 scheduled uptime. Until they see the cost.

Make anyone who insists on real 100% uptime justify it (and pay for it?). 98-99% uptime can be realized at a much lower cost.

Uptime requirements will influence, possibly dictate, database choices, hardware choices, and fte requirements.
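To put the uptime percentages above into hours, a small worked calculation helps. This is plain arithmetic on an 8760 hour year (leap years ignored); the function name is ours, and the 99.9% row is added only for comparison.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(uptime_percent):
    """Hours per year a system may be down at the given uptime percentage."""
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100

for pct in (98, 99, 99.9, 100):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows {hours:.1f} hours of downtime per year")
```

At 98% that is about 175 hours a year and at 99% about 88 hours, while 100% leaves no room for maintenance or failures at all, which is where the cost comes from.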

66

Planning: Failover

The cheapest method of addressing a failure is proactive planning.

Make sure your database and database software are backed up. Unless you are using a commercial database with roll forward recovery, assume you will lose all dml since your last backup if you need to recover. This should dictate your backup schedule.

Do not forget tape backups as a catastrophic recovery method.

Practice recovery on your integration and development databases. Practice different scenarios: delete a datafile, delete the entire database.

67

Replication

Replication is the process of copying and maintaining database objects in multiple databases that make up a distributed database system. Replication can improve the performance and protect the availability of applications because alternate data access options exist.

68

Replication Cont.

Oracle supports 3 types of replication: READ ONLY snapshots (materialized views), Advanced Replication, and Streams based replication.

Streams allows ddl modifications made to the master to be propagated automatically.

Streams can be configured as uni-directional (a single source and one or more targets) or as master to master, where updates can happen on any participant database.

Advanced Replication also supports master to master, but Streams based replication is recommended.

READ ONLY snapshot replication from a Sun box to Sun and Linux boxes is being done in CDF. When a replica is under maintenance there is failover to another replica. The replicas are up and running in read only mode if the master is down for maintenance.

69

Replication cont.

Oracle master to master replication allows for updates on both the master and replica sides.

Master to master is a complex and high maintenance form of replication. It seems to be the 1st option the unwitting opt for. Both Cern and Fermi dbas have requested firm justification before considering this type of replication request.

Every link in the multi master would be required to be fully staffed, as downtime will be critical.

70

Replication cont.

1. Disk Space for Archives. If the receiving site is down for an extended period of time, the source db should be tuned to hold the archive logs; otherwise, one has to reinstantiate the replication. Reasonable downtime for the target depends upon the archive area being generated on the source. Space, space and more space.

2. Conflict Resolution. In master to master, conflict resolution may be a challenge. Rules should be well defined to resolve the data conflicts.

3. Design of Data Model. If primary keys are populated by sequences, there is a high chance of the sequences overlapping, which will cause integrity constraint violations. The data model should be designed very carefully.

4. DB Support. In master to master replication, all master sites should be in 24x7 support mode. Otherwise, syncing up the data will be a challenge, or one may be led to reinstantiate the replication. Reinstantiation is not an unplug and play type of situation.
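For the overlapping-sequence problem in item 3, one commonly used approach (our illustration, not something prescribed in this talk) is to give every master the same increment and a different starting value, so the key ranges interleave and never collide. In Oracle this corresponds to creating each site's sequence with its own START WITH and a shared INCREMENT BY. The sketch below simulates the idea in plain Python; the number of sites is hypothetical.

```python
from itertools import count, islice

def site_sequence(site_index, site_count):
    """Key generator for one master: starts at site_index, steps by site_count."""
    if not 1 <= site_index <= site_count:
        raise ValueError("site_index must be between 1 and site_count")
    return count(start=site_index, step=site_count)

site_count = 3  # hypothetical number of master sites

# Draw the first five keys each site would hand out.
keys_by_site = {
    site: list(islice(site_sequence(site, site_count), 5))
    for site in range(1, site_count + 1)
}
all_keys = [key for keys in keys_by_site.values() for key in keys]

print(keys_by_site)
print("overlap:", len(all_keys) != len(set(all_keys)))
```

The cost is that the increment must be chosen large enough up front, since adding a master later means the existing sequences have to be redefined.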

71

Freeware Replication

MySQL has replication in the last stable version (3.23.32, v4.1 is out). It is master-slave replication using a binary log of operations on the server side. It is possible to build star or chain type structures.

There is a PostgreSQL replication tool. We have not tested it yet.

72

Lost in Space

Space is the 1 area consistently underestimated in every application I have seen. Imho, initial data volume estimates were consistently undersized by a factor of 2 or 3. For example, RunII events were estimated at 1 billion rows. This estimate was surpassed in Feb. 2004. We will probably end up with 4-5 billion event rows. That is a lot of disk space.

Disk hardware becomes unsupported and obsolete in what seems to be the blink of an eye.

73

Lost In Space cont.

All databases use disk to store data.

Good rule of thumb: You need 10x the disk to hold a given amount of data in an RDB.

Operate in 2 year cycles:
• First 2 years storage available on day 1.
• Evaluate growth at end of year 1, begin prep of next 2 yr.

[Figure: N Gb of data becomes 8 x N Gb on disk. Components shown: data, index, data mirror, index mirror, redo, rollback, backup, replication, and “unexpected?”.]
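The 10x rule of thumb can be turned into a quick planning calculation. In the sketch below, the row counts are the RunII figures quoted earlier (a 1 billion row estimate against a projected 4-5 billion); the 100 byte average row size and the function name are hypothetical, so substitute measurements from your own data.

```python
def disk_needed_gb(rows, bytes_per_row, overhead_factor=10):
    """Disk to provision (in GB) for a table, applying the 10x rule of thumb."""
    raw_gb = rows * bytes_per_row / 1e9
    return raw_gb * overhead_factor

BYTES_PER_ROW = 100  # hypothetical average row size; measure your own data

initial = disk_needed_gb(1_000_000_000, BYTES_PER_ROW)  # original 1 billion row estimate
revised = disk_needed_gb(5_000_000_000, BYTES_PER_ROW)  # upper end of the 4-5 billion projection

print(f"initial plan: {initial:.0f} GB, revised plan: {revised:.0f} GB")
```

An underestimate in row count is multiplied by the same factor of 10 in disk, which is why reviewing growth at the end of year 1 matters.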

74

Lost in Space, cont.

You will use as much disk space as you purchase, and then some.

Database indices will take at least as much space as the tables, probably considerably more.

Give WIDE lead time to purchase disk storage. New disks are not installed and configured overnight. They require planning, downtime and $.

75

Additional References

**WARNING: some of these may be database specific.

Intro to database design: http://www.cc.gatech.edu/classes/AY2000/cs4400_spring/cs4400a/

Intro to Oracle tutorial: http://w2.syronex.com/jmr/edu/db/

Evolutionary Database Design: http://www.martinfowler.com/articles/evodb.html (mentions 1 dba for atlas)

Sql course: http://sqlcourse.com/

76

Additional References

***Highly recommended reading, db comparatives: http://www-css.fnal.gov/dsg/external/freeware/

db infrastructure standard, support levels, etc. for fermi computing: http://www-css.fnal.gov/dsg/external/oracle_admin/

77

Additional References

Oracle Designer tutorial: http://www-css.fnal.gov/dsg/internal/ora_adm/index.htm#designer (choose Oracle Designer tutorial or Oracle Designer Short Cuts and Lessons Learned)

Btev specific additional information: http://www-css.fnal.gov/dsg/external/BTeV/index.html