Normalization Summit Training

download Normalization Summit Training

of 143

Transcript of Normalization Summit Training

  • 7/27/2019 Normalization Summit Training

    1/143

    Summit 97Normalization Is a Nice

    Theoryby David Adams & Dan Beckett

    1997 David Adams & Dan Beckett. All rights reserved.

    Content adapted from Programming 4th Dimension: TheUltimate Guide, by David Adams & Dan Beckett, published byForesight Technology, Inc.

  • 7/27/2019 Normalization Summit Training

    2/143

  • 7/27/2019 Normalization Summit Training

    3/143

    Normalization Is a Nice Theo

  • 7/27/2019 Normalization Summit Training

    4/143

    y

  • 7/27/2019 Normalization Summit Training

    5/143

    Quick Look

    Designing a normalized database structure is the .rst stepwhen building a database that is meant to last.Normalization is a simple, common-sense, process thatleads to .exible, ef.cient, maintainable database structures.

    Well examine the major principles and objectives ofnormalization and denormalization, then take a look at somepowerful optimization techniques that can break the rules ofnormalization.

    What is Normalization?

    Simply put, normalization is a formal process fordetermining which .elds belong in which tables in arelational database. Normalization follows a set of rulesworked out at the time relational databases were born. Anormalized relational database provides several bene.ts:

    Elimination of redundant datastorage. Close modeling of real world entities, processes,and their relationships.

    Structuring of data so that the model is.exible.

    Normalization ensures that you get the bene.ts relationaldatabases offer. Time spent learning about normalizationwill begin paying for itself immediately.

    Why Do They Talk Like

    That?Some people are intimidated by the language ofnormalization. Here is a quote from a classic text onrelational database design:

  • 7/27/2019 Normalization Summit Training

    6/143

  • 7/27/2019 Normalization Summit Training

    7/143

    A relation is in third normal form (3NF) if and only if it is in2NF and every nonkey attribute is nontransitively dependenton the primary key.

  • 7/27/2019 Normalization Summit Training

    8/143

    C.J. DateAn Introduction to Database Systems

    Huh? Relational database theory, and the principles ofnormalization, were .rst constructed by people intimately

    acquainted with set theory and predicate calculus. Theywrote about databases for like-minded people. Because ofthis, people sometimes think that normalization is hard.Nothing could be more untrue. The principles ofnormalization are simple, common-sense ideas that areeasy to apply. Here is another authors description of thesame principle:

    A table should have a field that uniquely identifies each of itsrecords, and each field in the table should describe the subjectthat the table represents.Michael J. Hernandez

    Database Design for Mere Mortals

    That sounds pretty sensible. A table should have somethingthat uniquely identi.es each record, and each .eld in the

    record should be about the same thing. We can summarizethe objectives of normalization even more simply:

    Eliminate redundant fields.

    Avoid merging tables.

    Youve probably intuitively followed many normalizationprinciples all along. The purpose of formal normalization isto ensure that your common sense and intuition are appliedconsistently to the entire database design.

    Design VersusImplementation

    Designing a database structure and implementing adatabase structure are different tasks. When you design astructure it should be described without reference to thespeci.c database tool you will use to implement the system,or what concessions you plan to make for performancereasons. These steps come later. After youve designed thedatabase structure abstractly, then you implement it in aparticular environment4D in our case. Too often peoplenew to database design combine design andimplementation in one step. 4D makes this tempting

    because the structure editor is so easy to use. Implementinga structure without designing it quickly leads to .awedstructures that are dif.cult and costly to modify. Design .rst,implement second, and youll .nish faster and cheaper.

  • 7/27/2019 Normalization Summit Training

    9/143

  • 7/27/2019 Normalization Summit Training

    10/143

    Normalized Design: Pros andCons

  • 7/27/2019 Normalization Summit Training

    11/143

    ProsofNormalizing

    Cons ofNormalizing

    More ef.cientdatabasestructure.Betterunderstanding

    of your data.More .exibledatabasestructure.Easier tomaintaindatabasestructure. Few(if any) costlysurprises downthe road.Validates yourcommon senseand intuition.Avoidredundant.elds. Insurethat distincttables existwhen nec-essary.

    You cant startbuilding thedatabase beforeyou know whatthe user needs.

    Weve implied that there are various advantages toproducing a properly normalized design before youimplement your system. Lets look at a detailed list of thepros and cons:

    We think that the pros outweigh the cons.

    Terminology

    There are a couple terms that are central to a discussion ofnormalization: key and dependency. These are probablyfamiliar concepts to anyone who has built relationaldatabase systems, though they may not be using these

    words. We de.ne and discuss them here as necessarybackground for the discussion of normal forms that follows.

    Primary Key

    The primary key is a fundamental concept in relationaldatabase design. Its an easy concept: each record shouldhave something that identi.es it uniquely. The primary keycan be a single .eld, or a combination of .elds. A tablesprimary key also serves as the basis of relationships withother tables. For example, it is typical to relate invoices to aunique customer ID, and employees to a unique department

    ID.Note: 4D does not implicitly support multi-.eld primary keys, thoughmulti-.eld keys are common in other client-server databases. The simplestway to implement a multi-.eld key in 4D is by maintaining an additional .eldthat stores the concatenation of the components of your multi-.eld key into asingle .eld. A concatenated key of this kind is easy to maintain using a 4D V6trigger.

  • 7/27/2019 Normalization Summit Training

    12/143

  • 7/27/2019 Normalization Summit Training

    13/143

    A primary key should be unique, mandatory, andpermanent. A clas

  • 7/27/2019 Normalization Summit Training

    14/143

  • 7/27/2019 Normalization Summit Training

    15/143

    sic mistake people make when learning to create relationaldatabases is to use a volatile

  • 7/27/2019 Normalization Summit Training

    16/143

    .

  • 7/27/2019 Normalization Summit Training

    17/143

    eld as the primary key. For example, consider this table:

  • 7/27/2019 Normalization Summit Training

    18/143

    [Companies]

    Company Name

    Address

    Company Name is an obvious candidate for the primary key.Yet, this is a bad idea, even if the Company Name is unique.What happens when the name changes after a merger? Notonly do you have to change this record, you have to updateevery single related record since the key has changed.

    Another common mistake is to select a .eld that is usuallyunique and unchanging. Consider this small table:

    [People] Social Security

    Number First Name LastName Date of birth

    In the United States all workers have a Social SecurityNumber that uniquely identi.es them for tax purposes. Ordoes it? As it turns out, not everyone has a Social SecurityNumber, some peoples Social Security Numbers change,and some people have more than one. This is an appealingbut untrustworthy key.The correct way to build a primary key is with a unique andunchanging value. 4Ds Sequence number function, or a

    unique ID generating function of your own, is the easiestway to produce synthetic primary key values.

    Functional Dependency

    Closely tied to the notion of a key is a special normalizationconcept called functional dependence or functionaldependency. The second and third normal forms verify thatyour functional dependencies are correct. So what is afunctional dependency? It describes how one .eld (orcombination of .elds) determines another .eld. Consider an

    example:[ZIP Codes] ZIPCode City County

    State Abbreviation

    State Name

  • 7/27/2019 Normalization Summit Training

    19/143

  • 7/27/2019 Normalization Summit Training

    20/143

    ZIP Code is a unique 5-digit key. What makes it a key? It is akey because it determines the other

  • 7/27/2019 Normalization Summit Training

    21/143

    .

  • 7/27/2019 Normalization Summit Training

    22/143

    elds. For each ZIP Code there is a single city, county, andstate abbreviation. These

  • 7/27/2019 Normalization Summit Training

    23/143

    .

  • 7/27/2019 Normalization Summit Training

    24/143

    elds are function

  • 7/27/2019 Normalization Summit Training

    25/143

  • 7/27/2019 Normalization Summit Training

    26/143

    ally dependent on the ZIP Code

  • 7/27/2019 Normalization Summit Training

    27/143

    .

  • 7/27/2019 Normalization Summit Training

    28/143

    eld. In other words, they belong with this key. Look at thelast two

  • 7/27/2019 Normalization Summit Training

    29/143

    .

  • 7/27/2019 Normalization Summit Training

    30/143

    elds, State Abbreviation and State Name. State Abbreviationdetermines State Name, in other words, State Name isfunctionally dependent on State Abbreviation. StateAbbreviation is acting like a key for the State Name

  • 7/27/2019 Normalization Summit Training

    31/143

    .

  • 7/27/2019 Normalization Summit Training

    32/143

    eld. Ah ha! State Abbreviation is a key, so it belongs inanother table. As we

  • 7/27/2019 Normalization Summit Training

    33/143

  • 7/27/2019 Normalization Summit Training

    34/143

    ll see, the third normal form tells us to create a new Statestable and move State Name into it.

  • 7/27/2019 Normalization Summit Training

    35/143

    Normal Forms

    The principles of normalization are described in a series ofprogressively stricter normal forms. First normal form(1NF) is the easiest to satisfy, second normal form (2NF),

    more dif.cult, and so on. There are 5 or 6 normal forms,depending on who you read. It is convenient to talk aboutthe normal forms by their traditional names, since thisterminology is ubiquitous in the relational database industry.It is, however, possible to approach normalization withoutusing this language. For example, Michael Hernandezshelpful Database Design for Mere Mortals uses plainlanguage. Whatever terminology you use, the mostimportant thing is that you go through the process.Note: We advise learning the traditional terms to simplify communicatingwith other database designers. 4D and ACI do not use this terminology, butnearly all other database environments and programmers do.

    First Normal Form (1NF)

    The .rst normal form is easy to understand and apply:

    A table is in first normal form if it contains no repeatinggroups.

    What is a repeating group, and why are they bad? When youhave more than one .eld storing the same kind ofinformation in a single table, you have a repeating group.Repeating groups are the right way to build a spreadsheet,the only way to build a .at-.le database, and the wrong wayto build a relational database. Here is a common example of

    a repeating group:[Customers]Customer ID

    Customer Name

    Contact Name 1

    Contact Name 2

    Contact Name 3

  • 7/27/2019 Normalization Summit Training

    36/143

  • 7/27/2019 Normalization Summit Training

    37/143

    What

  • 7/27/2019 Normalization Summit Training

    38/143

  • 7/27/2019 Normalization Summit Training

    39/143

    s wrong with this approach? Well, what happens when youhave a fourth contact? You have to add a new

  • 7/27/2019 Normalization Summit Training

    40/143

    .

  • 7/27/2019 Normalization Summit Training

    41/143

    eld, modify your forms, and rebuild your routines. Whathappens when you want to query or report based on allcontacts across all customers? It takes a lot of custom code,and may prove too dif

  • 7/27/2019 Normalization Summit Training

    42/143

    .

  • 7/27/2019 Normalization Summit Training

    43/143

    cult in practice. The structure we

  • 7/27/2019 Normalization Summit Training

    44/143

  • 7/27/2019 Normalization Summit Training

    45/143

    ve just shown makes perfect sense in a spreadsheet, butalmost no sense in a relational database. All of the dif

  • 7/27/2019 Normalization Summit Training

    46/143

    .

  • 7/27/2019 Normalization Summit Training

    47/143

    culties we

  • 7/27/2019 Normalization Summit Training

    48/143

  • 7/27/2019 Normalization Summit Training

    49/143

    ve described are resolved by moving contacts into a relatedtable.

  • 7/27/2019 Normalization Summit Training

    50/143

    [Customers]

    Customer ID

    Customer Name

    [Contacts] Customer ID (this field relates [Contacts] and

    [Customers]) Contact ID Contact Name

    Second Normal Form (2NF)

    The second normal form helps identify when youvecombined two tables into one. Second normal form dependson the concepts of the primary key, and functionaldependency. The second normal form is:

    A relation is in second normal form (2NF) if and only if it is in1NF and every nonkey attribute is fully dependent on the

    primary key.C.J. DateAn Introduction to Database Systems

    In other words, your table is in 2NF if: 1) It doesnt have anyrepeating groups. 2) Each of the .elds that isnt a part of the key is

    functionally dependent on the entire key.

    If a single-.eld key is used, a 1NF table is already in 2NF.

  • 7/27/2019 Normalization Summit Training

    51/143

  • 7/27/2019 Normalization Summit Training

    52/143

    Third Normal Form (3NF)

  • 7/27/2019 Normalization Summit Training

    53/143

    Third normal form performs an additional level of veri.cationthat you have not combined tables. Here are two differentde.nitions of the third normal form:

    A table should have a field that uniquely identifies each of its

    records, and each field in the table should describe the subjectthat the table represents.Michael J. HernandezDatabase Design for Mere Mortals

    To test whether a 2NF table is also in 3NF, we ask, Are any ofthe non-key columns dependent on any other non-keycolumns?Chris Gane

    Computer Aided Software Engineering

    When designing a database it is easy enough to accidentallycombine information that belongs in different tables. In theZIP Code example mentioned above, the ZIP Code tableincluded the State Abbreviation and the State Name. The

    State Name is determined by the State Abbreviation, so thethird normal form reminds you to move this .eld into a newtable. Heres how these tables should be set up:

    [ZIP Codes] ZIP

    Code City County

    State Abbreviation

    [States] State

    Abbreviation State

    Name

    Higher Normal Forms

    There are several higher normal forms, including 4NF, 5NF,BCNF, and PJ/NF. Well leave our discussion at 3NF, which isadequate for most practical needs. If you are interested inhigher normal forms, consult a book like DatesAnIntroduction to Database Systems.

  • 7/27/2019 Normalization Summit Training

    54/143

  • 7/27/2019 Normalization Summit Training

    55/143

    Learning More AboutNormalization

  • 7/27/2019 Normalization Summit Training

    56/143

    ACI University in the United States offers a course ondesigning relational databases that covers normalization indetail. If you cannot attend a course, here are somesuggestions for books that treat these concepts in greater

    detail.Michael J. HernandezDatabase Design for Mere Morals

    A Hands-On Guide to Relational Database

    Design Addison Wesley 1997 ISBN 0-201-

    69471-9

    We recommend Michael Hernandezs very approachablebook on the database design process. (And at $27.95 youcant go wrong.) Unfortunately, he does not correlate hisdiscussion with traditional normalization terminology.

    C. J. DateAn Introduction to Database Systems, Volume

    IFifth Edition Addison-Wesley Systems

    Programming Series 1990 ISBN: 0-201-

    51381-1

    This is the classic textbook on database systems. It is widelyavailable in libraries and technical bookstores. Its worthchecking out of a library for reference, but if it makes youreyes cross put it down.

    Chris Gane

    Computer-Aided Software EngineeringThe Methodologies, The Products, and The

    Future Prentice Hall 1990 ISBN: 0-13-

    176231-1

    Gane, one of the pioneers of RAD (Rapid ApplicationDevelopment), is a widely published author on the bene.tsof CASE (Computer Aided Software Engineering). Hisdiscussions of database normalization are at anintermediate level.

  • 7/27/2019 Normalization Summit Training

    57/143

  • 7/27/2019 Normalization Summit Training

    58/143

    Denormalizatio

  • 7/27/2019 Normalization Summit Training

    59/143

    n

  • 7/27/2019 Normalization Summit Training

    60/143

    Summary Data

    Compound Fields

    Denormalization is the process of modifying a perfectlynormalized database design for performance reasons.Denormalization is a natural and necessary part of databasedesign, but must follow proper normalization. Here are a few

    words from Date on denormalization:The general idea of normalization...is that the databasedesigner should aim for relations in the ultimate normalform (5NF). However, this recommendation should not beconstrued as law. Sometimes there are good reasons forflouting the principles of normalization.... The only hard re-quirement is that relations be in at least first normal form.Indeed, this is as good a place as any to make the point thatdatabase design can be an extremely complex task....Normalization theory is a useful aid in the process, but it isnot a panacea; anyone designing a database is certainly ad-vised to be familiar with the basic techniques ofnormalization...but we do not mean to suggest that the designshould necessarily be based on normalization principles alone.

    Date, pages 528-529

    Deliberate denormalization is commonplace when youreoptimizing performance. If you continuously draw data froma related table, it may make sense to duplicate the dataredundantly. Denormalization always makes your systempotentially less ef.cient and .exible, so denormalize asneeded, but not frivolously.

    There are techniques for improving performance thatinvolve storing redundant or calculated data. Some of thesetechniques break the rules of normalization, other do not.Sometimes real world requirements justify breaking therules. Intelligently and consciously breaking the rules ofnormalization for performance purposes is an acceptedpractice, and should only be done when the bene.ts of thechange justify breaking the rule.Lets examine some powerful summary data techniques:

    CompoundFields SummaryFields SummaryTables

    A compound .eld is a .eld whose value is the combination of two or more.elds in the same record. Compound .elds optimize certain 4D databaseoperations signi.cantly, including queries, sorts,

  • 7/27/2019 Normalization Summit Training

    61/143

  • 7/27/2019 Normalization Summit Training

    62/143

    reports, and data display. The cost of using compound

  • 7/27/2019 Normalization Summit Training

    63/143

    .

  • 7/27/2019 Normalization Summit Training

    64/143

    elds is the space they occupy and the code needed to maintain them. (Com

  • 7/27/2019 Normalization Summit Training

    65/143

  • 7/27/2019 Normalization Summit Training

    66/143

    pound

  • 7/27/2019 Normalization Summit Training

    67/143

    .

  • 7/27/2019 Normalization Summit Training

    68/143

    elds typically violate 2NF or 3NF.)

  • 7/27/2019 Normalization Summit Training

    69/143

    Save Combined Values

    Combining two or more .elds into a single .eld is thesimplest application of the time-space trade-off. Forexample, if your database has a table with addressesincluding city and state, you can create a compound .eld(call it City_State) that is made up of the concatenation ofthe city and state .elds. Sorts and queries on City_State aremuch faster than the same sort or query using the twosource .eldssometimes even 40 times faster.

    City_State stores the city and state in one field.

    The downside of compound .elds for the developer is thatyou have to write code to make sure that the City_State .eldis updated whenever either the city or the state .eld valuechanges. This is not dif.cult to do, but it is important thatthere are no leaks, or situations where the source datachanges and, through some oversight, the compound .eld

    value is not updated.Conditions for Building CompoundFieldsCompound .elds optimize situations that meet any ofthese conditions:

    The database uses multiple .elds in a sequentialoperation. The database combines two or more .eldsroutinely for any purpose.

    The database requires a compoundkey.

    Lets look at each of these situations in detail.USING MULTIPLE FIELDS IN A SEQUENTIALOPERATION

    Sorting on two indexed .elds takes about twenty times longer than sorting on oneindexed .eld. This is a case where you can give your system a huge speedimprovement by using a compound .eld. If, for example, you routinely sort on stateand city, you can improve

  • 7/27/2019 Normalization Summit Training

    70/143

  • 7/27/2019 Normalization Summit Training

    71/143

    sort speeds dramatically by creating an indexed, compound

  • 7/27/2019 Normalization Summit Training

    72/143

    .

  • 7/27/2019 Normalization Summit Training

    73/143

    eld that contains the concatenated state and city

  • 7/27/2019 Normalization Summit Training

    74/143

    .

  • 7/27/2019 Normalization Summit Training

    75/143

    eld values, and sort on this combination.

  • 7/27/2019 Normalization Summit Training

    76/143

    THE SOURCE FIELDS ARE COMBINEDROUTINELY

    If you have .elds that are routinely combined on forms,

    reports and labels, you can optimize these operations bypre-combining the .elds; then the combined values aredisplayed without further calculation. This makes theseoperations noticeably faster. An additional bene.t is thatcombined values makes it easier for end users to constructreports and labels with the information they need.

    COMPOUND KEYS

    A compound key ensures uniqueness based on acombination of two or more .elds. An example of acompound key is the unique combination of an employeesidenti.cation number, .rst name, and last name. In 4D,

    ensuring that these values are unique in combinationrequires a programmed multiple-.eld query or acompound .eld. Once you have created a compound .eld tostore the key, you can use 4Ds built-in uniqueness-veri.cation feature and save the time spent querying. Or youcan query on the compound .eld and .nd the results morequickly than with an equivalent multi-.eld query.When to Create Compound Fields

    The three cases discussed here show obvious examples ofwhere compound .elds improve speed. Compound .elds canalso provide a signi.cant improvement in ease of use that

    justi.es their use in less obvious examples. Your users caneasily query, sort, and report on compound .elds. They donthave to learn how to combine these values correctly in eachof 4Ds editors, and they dont have to ask you to do this forthem. Ease of use for end users almost always counts infavor of this technique even when the time savings aresmall.

    Summary Fields

    A summary .eld is a .eld in a one table record whose value isbased on data in related-many table records. Summary .elds

    eliminate repetitive and time-consuming cross-tablecalculations and make calculated results directly availablefor end-user queries, sorts, and reports without newprogramming. One-table .elds that summarize values inmultiple related records are a powerful optimization tool.Imagine tracking invoices without maintaining the invoicetotal!

  • 7/27/2019 Normalization Summit Training

    77/143

  • 7/27/2019 Normalization Summit Training

    78/143

    Summary

  • 7/27/2019 Normalization Summit Training

    79/143

    .

  • 7/27/2019 Normalization Summit Training

    80/143

    elds like this do not violate the rules of normalization.Normalization is often misconceived as forbidding thestorage of cal

  • 7/27/2019 Normalization Summit Training

    81/143

  • 7/27/2019 Normalization Summit Training

    82/143

    culated values, leading people to avoid appropriate summary

  • 7/27/2019 Normalization Summit Training

    83/143

    .

  • 7/27/2019 Normalization Summit Training

    84/143

    elds.

  • 7/27/2019 Normalization Summit Training

    85/143

    There are two costs to consider when contemplating using asummary .eld: the coding time required to maintainaccurate data and the space required to store the summary.eld.

    Example

    Saving summary account data is such a common practicethat it is easy not to notice that the underlying principle isusing a summary .eld. Consider the following tablestructure:

    A very simple accounting system structure.

    The Total_sales .eld in an Accounts record summarizesvalues found in one or more invoice records. You dont needthe summary .eld in Accounts since the value can becalculated at any point through inspection of the Invoicestable. But, by saving this critical summary data, you obtainthe following advantages:

    A user does not need to wait for an accounts totalto be calculated.

    4Ds built in editorsQuick Report, Order by,Query, and Graphcan use this data directly withoutperforming any special calculations.

    Reports print quickly because there are nocalculations performed.

    You can export the results to another program foranalysis.

    What to Summarize

    Summary .elds work because you, as a designer, had theforesight to create a special .eld to optimize a particularoperation or set of operations. Users may, from time totime, come to you with a new

  • 7/27/2019 Normalization Summit Training

    86/143

  • 7/27/2019 Normalization Summit Training

    87/143

    requirement that is not already optimized. This does notmean that your summary system is not effective, it simplymeans you need to consider expanding it. When consideringwhat to optimize using summary

  • 7/27/2019 Normalization Summit Training

    88/143

    .

  • 7/27/2019 Normalization Summit Training

    89/143

    elds, choose operations that users frequently need, com

  • 7/27/2019 Normalization Summit Training

    90/143

  • 7/27/2019 Normalization Summit Training

    91/143

    plain about, or neglect because they are too slow. Becausesummary

  • 7/27/2019 Normalization Summit Training

    92/143

    .

  • 7/27/2019 Normalization Summit Training

    93/143

    elds require extra space and maintenance, you should usethem

  • 7/27/2019 Normalization Summit Training

    94/143

    only

  • 7/27/2019 Normalization Summit Training

    95/143

    when there is a de

  • 7/27/2019 Normalization Summit Training

    96/143

    .

  • 7/27/2019 Normalization Summit Training

    97/143

    nite user bene

  • 7/27/2019 Normalization Summit Training

    98/143

    .

  • 7/27/2019 Normalization Summit Training

    99/143

    t.

  • 7/27/2019 Normalization Summit Training

    100/143

    Summary Tables

    A summary table is a table whose records summarize largeamounts of related data or the results of a series ofcalculations. The entire table is maintained to optimizereporting, querying, and generating cross-table selections.Summary tables contain derived data from multiple recordsand do not necessarily violate the rules of normalization.People often overlook summary tables based on the

    misconception that derived data is necessarilydenormalized.Lets discusses the use of summary tables by examining adetailed example.

    The ACME Software Company*

    Consider a problem and corresponding database solutionthat uses a summary table. The ACME Software Companyuses a 4D Server database to keep track of customerproduct registrations. Here is the table structure:

    ACMEs registration system structure

    * Any resemblance between ACME Software Company and persons orcompanies living or dead is purely coincidental.

  • 7/27/2019 Normalization Summit Training

    101/143

  • 7/27/2019 Normalization Summit Training

    102/143

    ACME uses this system to generate mailings, ful

  • 7/27/2019 Normalization Summit Training

    103/143

    .

  • 7/27/2019 Normalization Summit Training

    104/143

    ll upgrades, and track registration trends. Each customercan have many registrations, and each product can havemany registrations. In this situation you can easily end upwith record counts like these:

  • 7/27/2019 Normalization Summit Training

    105/143

    Table

    Count

    Customers

    10,000

    Products

    100

    Registrations

    50,000

    Original Record Counts

    With even a modest number of products and customers, theregistrations table quickly becomes very large. A largeregistration table is desirable, as it means that ACME willremain pro.table. However, it makes a variety of other taskstime-consuming:1) Finding customers who own the product SuperGoldPro Classic.

    2) Finding customers who own three or more copies of

    SuperGoldPro Workgroup. 3) Finding customers who own three ormore copies of SuperGoldPro Workgroup and exactly two copies ofSuperGoldPro ReportWriterPlus. 4) Showing a registration trend

    report that tells, for each product, how many customers have onecopy registered, how many have two copies, and so on.

    You can answer these questions with the existing structure.The problem is that it is too dif.cult for users to dothemselves, requires custom programming, and can take along time to execute.In a case like ACMEs, keeping a summary table to simplifyand optimize querying and reporting is well worth the extraspace and effort. Here is a modi.ed table structure:

    ACMEs registration system structure with a summarytable

  • 7/27/2019 Normalization Summit Training

    106/143

  • 7/27/2019 Normalization Summit Training

    107/143

    The Summary table forms a many-to-many relationshipbetween Customers and Products, just like the Registrationstable. From this structure view it does not seem that theSummary table offers any bene

  • 7/27/2019 Normalization Summit Training

    108/143

    .

  • 7/27/2019 Normalization Summit Training

    109/143

    t. Consider, however, these representative record counts:

  • 7/27/2019 Normalization Summit Training

    110/143

    Table

    Count

    Customers

    10,000

    Products

    100

    Registrations

    50,000

    Summary

    13,123

    Reg

    istrations

    Sum

    mary

    Reason

    50,000

    1 One customerregisteredevery productsold. (Try todouble yourcustomers!)

    50,000

    10,000

    Multiple-copyregistrationsare common.

    50,000

    37,500

    Multiple-copyregistrationsaccount forroughly a half(25,000) of allregistrations.

    50,000

    50,000

    No customerowns twocopies of anyproduct. (Timeto increasethe marketingbudget!)

    Record Counts with Summary Table

    The Summary table stores only one record for each productthat a customer owns. In other words, if the customer ownsten copies of a particular product they have ten registrationrecords and one registration summary record. Therelationship between the number of records in the Summary

    table and the Registrations table is data-dependent. Youneed to examine your data to .nd when a summary table isef.cient. Consider the range of possible values:

    Possible summary table record counts

    In cases where the record count is lower in Summary than inRegistrations, an equivalent join from Summary is faster. Notonly this, you can now .nd customers based on complexregistration conditions with simple indexed queries.

    Lets look at how you would satisfy the requests listed at thebeginning of this example with and without the Summary

    table.

  • 7/27/2019 Normalization Summit Training

    111/143

  • 7/27/2019 Normalization Summit Training

    112/143

    T

  • 7/27/2019 Normalization Summit Training

    113/143

    ASK

  • 7/27/2019 Normalization Summit Training

    114/143

    1: A S

  • 7/27/2019 Normalization Summit Training

    115/143

    IMPLE

  • 7/27/2019 Normalization Summit Training

    116/143

    Q

  • 7/27/2019 Normalization Summit Training

    117/143

    UERY

  • 7/27/2019 Normalization Summit Training

    118/143

    WithoutSummarytable

    WithSummarytable

    QueryRegistrationsforSuperGoldProClassic.

    QuerySummary forSuperGoldProClassic.

    Join fromRegistrationsto Custom-ers.

    Join fromSummary toCustomers.

    WithoutSummarytable

    WithSummarytable

    QueryRegistrationsforSuperGoldPro

    QuerySummary forrecords withSuperGoldProand three ormoreregistrations

    SortRegistrationsbyCustomer_IDCreate anempty set inCustomersStep througheach recordin Registra-tions,counting up

    how manycopies acustomer has.If a customerowns three ormore copies,add them toyour set.

    Join fromSummary toCustomers.

    Find customers who own the product SuperGoldPro Classic.

    Steps required to complete task 1.

    The steps are exactly the same in this example. Thedifference is that the Summary table never contains morerecords than the Registration table, and normally containsfar fewer. The speed of a join is dependent on the size of theinitial selection, so reducing the initial selection directlyreduces the time required to complete the join.

    TASK2: A MORE COMPLEX QUERY

    Find customers who own three or more copies of SuperGoldPro.

    Steps required to complete task 2.

    With the summary table in place it is no more dif.cult or timeconsuming to answer this more precise question. A simpletwo-.eld indexed query is followed by a join. An interfacethat gives end users access to this feature is easy toconstruct. Without a summary table you have to perform asequential operation on every matching registrationalaborious and time-consuming operation.

  • 7/27/2019 Normalization Summit Training

    119/143

  • 7/27/2019 Normalization Summit Training

    120/143

    T

  • 7/27/2019 Normalization Summit Training

    121/143

    ASK

  • 7/27/2019 Normalization Summit Training

    122/143

    3: A C

  • 7/27/2019 Normalization Summit Training

    123/143

    OMPOUND

  • 7/27/2019 Normalization Summit Training

    124/143

    Q

  • 7/27/2019 Normalization Summit Training

    125/143

    UERY

  • 7/27/2019 Normalization Summit Training

    126/143

    WithoutSummarytable

    WithSummarytable

    QueryRegistrationsforSuperGold-Pro

    Workgroup.

    QuerySummary forSuperGoldProWorkgroupregistrations

    with a reg-istration countof three ormore.

    SortRegistrationsbyCustomer_ID.Create anempty set inCustomers.Step througheach recordin Regis-trations,counting uphow manycopies acustomerhas. If acustomerowns three ormore copies,add them toyour set.

    Join fromSummary toCustomers.

    Create a setof the foundcustomers

    Create a set ofthe foundcustomers.

    Create an

    empty set inCustomersQueryRegistrationsforSuperGold-ProReportWriterPlus.

    QuerySummary forSuperGoldProWorkgroupregistrationswith a reg-istration countof exactly two.

    Step througheach recordin Regis-trations,counting uphow manycopies acustomerhas. If acustomerowns twocopiesexactly, thenadd them toyour set.

    Join fromSummary toCustomers.

    Create a set ofthe foundcustomers.

    Intersect the

    twoCustomersets to locate

    Intersect the

    two Customersets to locatethe correct

    Find customers who own three or more copies ofSuperGoldPro Workgroup and exactly two copies ofSuperGoldPro Report WriterPlus.

    Steps required to complete task 3.

  • 7/27/2019 Normalization Summit Training

    127/143

    the correctcustomers.

    customers.

    Each of these solutions appears complex, but with thesummary table in place the routine needed is short(commands are shown without full parameters) and quick:

    QUERY([Summary]) RELATE ONE SELECTION([Summary];[Customers]) CREATE SET([Customers])QUERY([Summary]) RELATE ONE SELECTION([Summary];[Customers]) CREATE SET([Customers]) INTERSECT USESET

    The queries and joins are all indexed, non-sequentialoperations, and set operations are extremely fast regardlessof selection size. The alternative method, without thesummary table, requires error-prone custom programming,direct record access, sequential inspection of records, andexcessive network activity under 4D Server.

  • 7/27/2019 Normalization Summit Training

    128/143

  • 7/27/2019 Normalization Summit Training

    129/143

    Notice in the previous examples that after the complicatedcalcula

  • 7/27/2019 Normalization Summit Training

    130/143

  • 7/27/2019 Normalization Summit Training

    131/143

    tions needed without a summary table are completed,

  • 7/27/2019 Normalization Summit Training

    132/143

    all the work is lost

  • 7/27/2019 Normalization Summit Training

    133/143

    . You have found the correct customers for that query, but ifanother user wants the same query

  • 7/27/2019 Normalization Summit Training

    134/143

    the entire process is repeated

  • 7/27/2019 Normalization Summit Training

    135/143

    . If the same user wants a slightly different set of conditionsthey have to wait through another long query. This gives theuser the impres

  • 7/27/2019 Normalization Summit Training

    136/143

  • 7/27/2019 Normalization Summit Training

    137/143

    sion that the system is slow, hard to work with, and hard toget data out of. By saving these calculations in a table forquicker and easier access you improve life for your users. Inaddition, you can now eas

  • 7/27/2019 Normalization Summit Training

    138/143

  • 7/27/2019 Normalization Summit Training

    139/143

    ily generate a variety of reports on the summary data itselfusing the Quick Report editor, or any other system you like.

  • 7/27/2019 Normalization Summit Training

    140/143

    TASK4: A TREND REPORT

    Show a registration trend report that tells us for eachproduct how many customers have one copy registered,how many have two copies, and so on.

    The easiest way to provide this report with a summary table

    in place is with a Quick Report:

    A sample Quick Report showing registration totals andtrends.

    This report requires no code, extra calculations, or special

    programming. All it requires is the time to sort and countthe records, like any sorted Quick Report. A user can create,modify, or export this simple report.

  • 7/27/2019 Normalization Summit Training

    141/143

  • 7/27/2019 Normalization Summit Training

    142/143

    Cost of Maintaining a SummaryTable

  • 7/27/2019 Normalization Summit Training

    143/143

    In order for a summary table to be useful itneeds to be accurate. This means you need toupdate summary records whenever source records change. In our example, this meansthat summary records need to be changed every time a registration is added, deleted, ormodi.ed. This task can be taken care of in the validation script of the record, in the Afterphase of the form (4D 3.x), in a trigger (preferred), or in batch mode. You must also makesure to update summary records if you change source data in your code. Keeping thedata valid requires extra work and introduces the possibility of coding errors, so youshould factor this cost in when deciding if you are going to use this technique.