Normalization Summit Training
-
Upload
dadar-guru -
Category
Documents
-
view
215 -
download
0
Transcript of Normalization Summit Training
-
7/27/2019 Normalization Summit Training
1/143
Summit 97Normalization Is a Nice
Theoryby David Adams & Dan Beckett
1997 David Adams & Dan Beckett. All rights reserved.
Content adapted from Programming 4th Dimension: TheUltimate Guide, by David Adams & Dan Beckett, published byForesight Technology, Inc.
-
7/27/2019 Normalization Summit Training
2/143
-
7/27/2019 Normalization Summit Training
3/143
Normalization Is a Nice Theo
-
7/27/2019 Normalization Summit Training
4/143
y
-
7/27/2019 Normalization Summit Training
5/143
Quick Look
Designing a normalized database structure is the .rst stepwhen building a database that is meant to last.Normalization is a simple, common-sense, process thatleads to .exible, ef.cient, maintainable database structures.
Well examine the major principles and objectives ofnormalization and denormalization, then take a look at somepowerful optimization techniques that can break the rules ofnormalization.
What is Normalization?
Simply put, normalization is a formal process fordetermining which .elds belong in which tables in arelational database. Normalization follows a set of rulesworked out at the time relational databases were born. Anormalized relational database provides several bene.ts:
Elimination of redundant datastorage. Close modeling of real world entities, processes,and their relationships.
Structuring of data so that the model is.exible.
Normalization ensures that you get the bene.ts relationaldatabases offer. Time spent learning about normalizationwill begin paying for itself immediately.
Why Do They Talk Like
That?Some people are intimidated by the language ofnormalization. Here is a quote from a classic text onrelational database design:
-
7/27/2019 Normalization Summit Training
6/143
-
7/27/2019 Normalization Summit Training
7/143
A relation is in third normal form (3NF) if and only if it is in2NF and every nonkey attribute is nontransitively dependenton the primary key.
-
7/27/2019 Normalization Summit Training
8/143
C.J. DateAn Introduction to Database Systems
Huh? Relational database theory, and the principles ofnormalization, were .rst constructed by people intimately
acquainted with set theory and predicate calculus. Theywrote about databases for like-minded people. Because ofthis, people sometimes think that normalization is hard.Nothing could be more untrue. The principles ofnormalization are simple, common-sense ideas that areeasy to apply. Here is another authors description of thesame principle:
A table should have a field that uniquely identifies each of itsrecords, and each field in the table should describe the subjectthat the table represents.Michael J. Hernandez
Database Design for Mere Mortals
That sounds pretty sensible. A table should have somethingthat uniquely identi.es each record, and each .eld in the
record should be about the same thing. We can summarizethe objectives of normalization even more simply:
Eliminate redundant fields.
Avoid merging tables.
Youve probably intuitively followed many normalizationprinciples all along. The purpose of formal normalization isto ensure that your common sense and intuition are appliedconsistently to the entire database design.
Design VersusImplementation
Designing a database structure and implementing adatabase structure are different tasks. When you design astructure it should be described without reference to thespeci.c database tool you will use to implement the system,or what concessions you plan to make for performancereasons. These steps come later. After youve designed thedatabase structure abstractly, then you implement it in aparticular environment4D in our case. Too often peoplenew to database design combine design andimplementation in one step. 4D makes this tempting
because the structure editor is so easy to use. Implementinga structure without designing it quickly leads to .awedstructures that are dif.cult and costly to modify. Design .rst,implement second, and youll .nish faster and cheaper.
-
7/27/2019 Normalization Summit Training
9/143
-
7/27/2019 Normalization Summit Training
10/143
Normalized Design: Pros andCons
-
7/27/2019 Normalization Summit Training
11/143
ProsofNormalizing
Cons ofNormalizing
More ef.cientdatabasestructure.Betterunderstanding
of your data.More .exibledatabasestructure.Easier tomaintaindatabasestructure. Few(if any) costlysurprises downthe road.Validates yourcommon senseand intuition.Avoidredundant.elds. Insurethat distincttables existwhen nec-essary.
You cant startbuilding thedatabase beforeyou know whatthe user needs.
Weve implied that there are various advantages toproducing a properly normalized design before youimplement your system. Lets look at a detailed list of thepros and cons:
We think that the pros outweigh the cons.
Terminology
There are a couple terms that are central to a discussion ofnormalization: key and dependency. These are probablyfamiliar concepts to anyone who has built relationaldatabase systems, though they may not be using these
words. We de.ne and discuss them here as necessarybackground for the discussion of normal forms that follows.
Primary Key
The primary key is a fundamental concept in relationaldatabase design. Its an easy concept: each record shouldhave something that identi.es it uniquely. The primary keycan be a single .eld, or a combination of .elds. A tablesprimary key also serves as the basis of relationships withother tables. For example, it is typical to relate invoices to aunique customer ID, and employees to a unique department
ID.Note: 4D does not implicitly support multi-.eld primary keys, thoughmulti-.eld keys are common in other client-server databases. The simplestway to implement a multi-.eld key in 4D is by maintaining an additional .eldthat stores the concatenation of the components of your multi-.eld key into asingle .eld. A concatenated key of this kind is easy to maintain using a 4D V6trigger.
-
7/27/2019 Normalization Summit Training
12/143
-
7/27/2019 Normalization Summit Training
13/143
A primary key should be unique, mandatory, andpermanent. A clas
-
7/27/2019 Normalization Summit Training
14/143
-
7/27/2019 Normalization Summit Training
15/143
sic mistake people make when learning to create relationaldatabases is to use a volatile
-
7/27/2019 Normalization Summit Training
16/143
.
-
7/27/2019 Normalization Summit Training
17/143
eld as the primary key. For example, consider this table:
-
7/27/2019 Normalization Summit Training
18/143
[Companies]
Company Name
Address
Company Name is an obvious candidate for the primary key.Yet, this is a bad idea, even if the Company Name is unique.What happens when the name changes after a merger? Notonly do you have to change this record, you have to updateevery single related record since the key has changed.
Another common mistake is to select a .eld that is usuallyunique and unchanging. Consider this small table:
[People] Social Security
Number First Name LastName Date of birth
In the United States all workers have a Social SecurityNumber that uniquely identi.es them for tax purposes. Ordoes it? As it turns out, not everyone has a Social SecurityNumber, some peoples Social Security Numbers change,and some people have more than one. This is an appealingbut untrustworthy key.The correct way to build a primary key is with a unique andunchanging value. 4Ds Sequence number function, or a
unique ID generating function of your own, is the easiestway to produce synthetic primary key values.
Functional Dependency
Closely tied to the notion of a key is a special normalizationconcept called functional dependence or functionaldependency. The second and third normal forms verify thatyour functional dependencies are correct. So what is afunctional dependency? It describes how one .eld (orcombination of .elds) determines another .eld. Consider an
example:[ZIP Codes] ZIPCode City County
State Abbreviation
State Name
-
7/27/2019 Normalization Summit Training
19/143
-
7/27/2019 Normalization Summit Training
20/143
ZIP Code is a unique 5-digit key. What makes it a key? It is akey because it determines the other
-
7/27/2019 Normalization Summit Training
21/143
.
-
7/27/2019 Normalization Summit Training
22/143
elds. For each ZIP Code there is a single city, county, andstate abbreviation. These
-
7/27/2019 Normalization Summit Training
23/143
.
-
7/27/2019 Normalization Summit Training
24/143
elds are function
-
7/27/2019 Normalization Summit Training
25/143
-
7/27/2019 Normalization Summit Training
26/143
ally dependent on the ZIP Code
-
7/27/2019 Normalization Summit Training
27/143
.
-
7/27/2019 Normalization Summit Training
28/143
eld. In other words, they belong with this key. Look at thelast two
-
7/27/2019 Normalization Summit Training
29/143
.
-
7/27/2019 Normalization Summit Training
30/143
elds, State Abbreviation and State Name. State Abbreviationdetermines State Name, in other words, State Name isfunctionally dependent on State Abbreviation. StateAbbreviation is acting like a key for the State Name
-
7/27/2019 Normalization Summit Training
31/143
.
-
7/27/2019 Normalization Summit Training
32/143
eld. Ah ha! State Abbreviation is a key, so it belongs inanother table. As we
-
7/27/2019 Normalization Summit Training
33/143
-
7/27/2019 Normalization Summit Training
34/143
ll see, the third normal form tells us to create a new Statestable and move State Name into it.
-
7/27/2019 Normalization Summit Training
35/143
Normal Forms
The principles of normalization are described in a series ofprogressively stricter normal forms. First normal form(1NF) is the easiest to satisfy, second normal form (2NF),
more dif.cult, and so on. There are 5 or 6 normal forms,depending on who you read. It is convenient to talk aboutthe normal forms by their traditional names, since thisterminology is ubiquitous in the relational database industry.It is, however, possible to approach normalization withoutusing this language. For example, Michael Hernandezshelpful Database Design for Mere Mortals uses plainlanguage. Whatever terminology you use, the mostimportant thing is that you go through the process.Note: We advise learning the traditional terms to simplify communicatingwith other database designers. 4D and ACI do not use this terminology, butnearly all other database environments and programmers do.
First Normal Form (1NF)
The .rst normal form is easy to understand and apply:
A table is in first normal form if it contains no repeatinggroups.
What is a repeating group, and why are they bad? When youhave more than one .eld storing the same kind ofinformation in a single table, you have a repeating group.Repeating groups are the right way to build a spreadsheet,the only way to build a .at-.le database, and the wrong wayto build a relational database. Here is a common example of
a repeating group:[Customers]Customer ID
Customer Name
Contact Name 1
Contact Name 2
Contact Name 3
-
7/27/2019 Normalization Summit Training
36/143
-
7/27/2019 Normalization Summit Training
37/143
What
-
7/27/2019 Normalization Summit Training
38/143
-
7/27/2019 Normalization Summit Training
39/143
s wrong with this approach? Well, what happens when youhave a fourth contact? You have to add a new
-
7/27/2019 Normalization Summit Training
40/143
.
-
7/27/2019 Normalization Summit Training
41/143
eld, modify your forms, and rebuild your routines. Whathappens when you want to query or report based on allcontacts across all customers? It takes a lot of custom code,and may prove too dif
-
7/27/2019 Normalization Summit Training
42/143
.
-
7/27/2019 Normalization Summit Training
43/143
cult in practice. The structure we
-
7/27/2019 Normalization Summit Training
44/143
-
7/27/2019 Normalization Summit Training
45/143
ve just shown makes perfect sense in a spreadsheet, butalmost no sense in a relational database. All of the dif
-
7/27/2019 Normalization Summit Training
46/143
.
-
7/27/2019 Normalization Summit Training
47/143
culties we
-
7/27/2019 Normalization Summit Training
48/143
-
7/27/2019 Normalization Summit Training
49/143
ve described are resolved by moving contacts into a relatedtable.
-
7/27/2019 Normalization Summit Training
50/143
[Customers]
Customer ID
Customer Name
[Contacts] Customer ID (this field relates [Contacts] and
[Customers]) Contact ID Contact Name
Second Normal Form (2NF)
The second normal form helps identify when youvecombined two tables into one. Second normal form dependson the concepts of the primary key, and functionaldependency. The second normal form is:
A relation is in second normal form (2NF) if and only if it is in1NF and every nonkey attribute is fully dependent on the
primary key.C.J. DateAn Introduction to Database Systems
In other words, your table is in 2NF if: 1) It doesnt have anyrepeating groups. 2) Each of the .elds that isnt a part of the key is
functionally dependent on the entire key.
If a single-.eld key is used, a 1NF table is already in 2NF.
-
7/27/2019 Normalization Summit Training
51/143
-
7/27/2019 Normalization Summit Training
52/143
Third Normal Form (3NF)
-
7/27/2019 Normalization Summit Training
53/143
Third normal form performs an additional level of veri.cationthat you have not combined tables. Here are two differentde.nitions of the third normal form:
A table should have a field that uniquely identifies each of its
records, and each field in the table should describe the subjectthat the table represents.Michael J. HernandezDatabase Design for Mere Mortals
To test whether a 2NF table is also in 3NF, we ask, Are any ofthe non-key columns dependent on any other non-keycolumns?Chris Gane
Computer Aided Software Engineering
When designing a database it is easy enough to accidentallycombine information that belongs in different tables. In theZIP Code example mentioned above, the ZIP Code tableincluded the State Abbreviation and the State Name. The
State Name is determined by the State Abbreviation, so thethird normal form reminds you to move this .eld into a newtable. Heres how these tables should be set up:
[ZIP Codes] ZIP
Code City County
State Abbreviation
[States] State
Abbreviation State
Name
Higher Normal Forms
There are several higher normal forms, including 4NF, 5NF,BCNF, and PJ/NF. Well leave our discussion at 3NF, which isadequate for most practical needs. If you are interested inhigher normal forms, consult a book like DatesAnIntroduction to Database Systems.
-
7/27/2019 Normalization Summit Training
54/143
-
7/27/2019 Normalization Summit Training
55/143
Learning More AboutNormalization
-
7/27/2019 Normalization Summit Training
56/143
ACI University in the United States offers a course ondesigning relational databases that covers normalization indetail. If you cannot attend a course, here are somesuggestions for books that treat these concepts in greater
detail.Michael J. HernandezDatabase Design for Mere Morals
A Hands-On Guide to Relational Database
Design Addison Wesley 1997 ISBN 0-201-
69471-9
We recommend Michael Hernandezs very approachablebook on the database design process. (And at $27.95 youcant go wrong.) Unfortunately, he does not correlate hisdiscussion with traditional normalization terminology.
C. J. DateAn Introduction to Database Systems, Volume
IFifth Edition Addison-Wesley Systems
Programming Series 1990 ISBN: 0-201-
51381-1
This is the classic textbook on database systems. It is widelyavailable in libraries and technical bookstores. Its worthchecking out of a library for reference, but if it makes youreyes cross put it down.
Chris Gane
Computer-Aided Software EngineeringThe Methodologies, The Products, and The
Future Prentice Hall 1990 ISBN: 0-13-
176231-1
Gane, one of the pioneers of RAD (Rapid ApplicationDevelopment), is a widely published author on the bene.tsof CASE (Computer Aided Software Engineering). Hisdiscussions of database normalization are at anintermediate level.
-
7/27/2019 Normalization Summit Training
57/143
-
7/27/2019 Normalization Summit Training
58/143
Denormalizatio
-
7/27/2019 Normalization Summit Training
59/143
n
-
7/27/2019 Normalization Summit Training
60/143
Summary Data
Compound Fields
Denormalization is the process of modifying a perfectlynormalized database design for performance reasons.Denormalization is a natural and necessary part of databasedesign, but must follow proper normalization. Here are a few
words from Date on denormalization:The general idea of normalization...is that the databasedesigner should aim for relations in the ultimate normalform (5NF). However, this recommendation should not beconstrued as law. Sometimes there are good reasons forflouting the principles of normalization.... The only hard re-quirement is that relations be in at least first normal form.Indeed, this is as good a place as any to make the point thatdatabase design can be an extremely complex task....Normalization theory is a useful aid in the process, but it isnot a panacea; anyone designing a database is certainly ad-vised to be familiar with the basic techniques ofnormalization...but we do not mean to suggest that the designshould necessarily be based on normalization principles alone.
Date, pages 528-529
Deliberate denormalization is commonplace when youreoptimizing performance. If you continuously draw data froma related table, it may make sense to duplicate the dataredundantly. Denormalization always makes your systempotentially less ef.cient and .exible, so denormalize asneeded, but not frivolously.
There are techniques for improving performance thatinvolve storing redundant or calculated data. Some of thesetechniques break the rules of normalization, other do not.Sometimes real world requirements justify breaking therules. Intelligently and consciously breaking the rules ofnormalization for performance purposes is an acceptedpractice, and should only be done when the bene.ts of thechange justify breaking the rule.Lets examine some powerful summary data techniques:
CompoundFields SummaryFields SummaryTables
A compound .eld is a .eld whose value is the combination of two or more.elds in the same record. Compound .elds optimize certain 4D databaseoperations signi.cantly, including queries, sorts,
-
7/27/2019 Normalization Summit Training
61/143
-
7/27/2019 Normalization Summit Training
62/143
reports, and data display. The cost of using compound
-
7/27/2019 Normalization Summit Training
63/143
.
-
7/27/2019 Normalization Summit Training
64/143
elds is the space they occupy and the code needed to maintain them. (Com
-
7/27/2019 Normalization Summit Training
65/143
-
7/27/2019 Normalization Summit Training
66/143
pound
-
7/27/2019 Normalization Summit Training
67/143
.
-
7/27/2019 Normalization Summit Training
68/143
elds typically violate 2NF or 3NF.)
-
7/27/2019 Normalization Summit Training
69/143
Save Combined Values
Combining two or more .elds into a single .eld is thesimplest application of the time-space trade-off. Forexample, if your database has a table with addressesincluding city and state, you can create a compound .eld(call it City_State) that is made up of the concatenation ofthe city and state .elds. Sorts and queries on City_State aremuch faster than the same sort or query using the twosource .eldssometimes even 40 times faster.
City_State stores the city and state in one field.
The downside of compound .elds for the developer is thatyou have to write code to make sure that the City_State .eldis updated whenever either the city or the state .eld valuechanges. This is not dif.cult to do, but it is important thatthere are no leaks, or situations where the source datachanges and, through some oversight, the compound .eld
value is not updated.Conditions for Building CompoundFieldsCompound .elds optimize situations that meet any ofthese conditions:
The database uses multiple .elds in a sequentialoperation. The database combines two or more .eldsroutinely for any purpose.
The database requires a compoundkey.
Lets look at each of these situations in detail.USING MULTIPLE FIELDS IN A SEQUENTIALOPERATION
Sorting on two indexed .elds takes about twenty times longer than sorting on oneindexed .eld. This is a case where you can give your system a huge speedimprovement by using a compound .eld. If, for example, you routinely sort on stateand city, you can improve
-
7/27/2019 Normalization Summit Training
70/143
-
7/27/2019 Normalization Summit Training
71/143
sort speeds dramatically by creating an indexed, compound
-
7/27/2019 Normalization Summit Training
72/143
.
-
7/27/2019 Normalization Summit Training
73/143
eld that contains the concatenated state and city
-
7/27/2019 Normalization Summit Training
74/143
.
-
7/27/2019 Normalization Summit Training
75/143
eld values, and sort on this combination.
-
7/27/2019 Normalization Summit Training
76/143
THE SOURCE FIELDS ARE COMBINEDROUTINELY
If you have .elds that are routinely combined on forms,
reports and labels, you can optimize these operations bypre-combining the .elds; then the combined values aredisplayed without further calculation. This makes theseoperations noticeably faster. An additional bene.t is thatcombined values makes it easier for end users to constructreports and labels with the information they need.
COMPOUND KEYS
A compound key ensures uniqueness based on acombination of two or more .elds. An example of acompound key is the unique combination of an employeesidenti.cation number, .rst name, and last name. In 4D,
ensuring that these values are unique in combinationrequires a programmed multiple-.eld query or acompound .eld. Once you have created a compound .eld tostore the key, you can use 4Ds built-in uniqueness-veri.cation feature and save the time spent querying. Or youcan query on the compound .eld and .nd the results morequickly than with an equivalent multi-.eld query.When to Create Compound Fields
The three cases discussed here show obvious examples ofwhere compound .elds improve speed. Compound .elds canalso provide a signi.cant improvement in ease of use that
justi.es their use in less obvious examples. Your users caneasily query, sort, and report on compound .elds. They donthave to learn how to combine these values correctly in eachof 4Ds editors, and they dont have to ask you to do this forthem. Ease of use for end users almost always counts infavor of this technique even when the time savings aresmall.
Summary Fields
A summary .eld is a .eld in a one table record whose value isbased on data in related-many table records. Summary .elds
eliminate repetitive and time-consuming cross-tablecalculations and make calculated results directly availablefor end-user queries, sorts, and reports without newprogramming. One-table .elds that summarize values inmultiple related records are a powerful optimization tool.Imagine tracking invoices without maintaining the invoicetotal!
-
7/27/2019 Normalization Summit Training
77/143
-
7/27/2019 Normalization Summit Training
78/143
Summary
-
7/27/2019 Normalization Summit Training
79/143
.
-
7/27/2019 Normalization Summit Training
80/143
elds like this do not violate the rules of normalization.Normalization is often misconceived as forbidding thestorage of cal
-
7/27/2019 Normalization Summit Training
81/143
-
7/27/2019 Normalization Summit Training
82/143
culated values, leading people to avoid appropriate summary
-
7/27/2019 Normalization Summit Training
83/143
.
-
7/27/2019 Normalization Summit Training
84/143
elds.
-
7/27/2019 Normalization Summit Training
85/143
There are two costs to consider when contemplating using asummary .eld: the coding time required to maintainaccurate data and the space required to store the summary.eld.
Example
Saving summary account data is such a common practicethat it is easy not to notice that the underlying principle isusing a summary .eld. Consider the following tablestructure:
A very simple accounting system structure.
The Total_sales .eld in an Accounts record summarizesvalues found in one or more invoice records. You dont needthe summary .eld in Accounts since the value can becalculated at any point through inspection of the Invoicestable. But, by saving this critical summary data, you obtainthe following advantages:
A user does not need to wait for an accounts totalto be calculated.
4Ds built in editorsQuick Report, Order by,Query, and Graphcan use this data directly withoutperforming any special calculations.
Reports print quickly because there are nocalculations performed.
You can export the results to another program foranalysis.
What to Summarize
Summary .elds work because you, as a designer, had theforesight to create a special .eld to optimize a particularoperation or set of operations. Users may, from time totime, come to you with a new
-
7/27/2019 Normalization Summit Training
86/143
-
7/27/2019 Normalization Summit Training
87/143
requirement that is not already optimized. This does notmean that your summary system is not effective, it simplymeans you need to consider expanding it. When consideringwhat to optimize using summary
-
7/27/2019 Normalization Summit Training
88/143
.
-
7/27/2019 Normalization Summit Training
89/143
elds, choose operations that users frequently need, com
-
7/27/2019 Normalization Summit Training
90/143
-
7/27/2019 Normalization Summit Training
91/143
plain about, or neglect because they are too slow. Becausesummary
-
7/27/2019 Normalization Summit Training
92/143
.
-
7/27/2019 Normalization Summit Training
93/143
elds require extra space and maintenance, you should usethem
-
7/27/2019 Normalization Summit Training
94/143
only
-
7/27/2019 Normalization Summit Training
95/143
when there is a de
-
7/27/2019 Normalization Summit Training
96/143
.
-
7/27/2019 Normalization Summit Training
97/143
nite user bene
-
7/27/2019 Normalization Summit Training
98/143
.
-
7/27/2019 Normalization Summit Training
99/143
t.
-
7/27/2019 Normalization Summit Training
100/143
Summary Tables
A summary table is a table whose records summarize largeamounts of related data or the results of a series ofcalculations. The entire table is maintained to optimizereporting, querying, and generating cross-table selections.Summary tables contain derived data from multiple recordsand do not necessarily violate the rules of normalization.People often overlook summary tables based on the
misconception that derived data is necessarilydenormalized.Lets discusses the use of summary tables by examining adetailed example.
The ACME Software Company*
Consider a problem and corresponding database solutionthat uses a summary table. The ACME Software Companyuses a 4D Server database to keep track of customerproduct registrations. Here is the table structure:
ACMEs registration system structure
* Any resemblance between ACME Software Company and persons orcompanies living or dead is purely coincidental.
-
7/27/2019 Normalization Summit Training
101/143
-
7/27/2019 Normalization Summit Training
102/143
ACME uses this system to generate mailings, ful
-
7/27/2019 Normalization Summit Training
103/143
.
-
7/27/2019 Normalization Summit Training
104/143
ll upgrades, and track registration trends. Each customercan have many registrations, and each product can havemany registrations. In this situation you can easily end upwith record counts like these:
-
7/27/2019 Normalization Summit Training
105/143
Table
Count
Customers
10,000
Products
100
Registrations
50,000
Original Record Counts
With even a modest number of products and customers, theregistrations table quickly becomes very large. A largeregistration table is desirable, as it means that ACME willremain pro.table. However, it makes a variety of other taskstime-consuming:1) Finding customers who own the product SuperGoldPro Classic.
2) Finding customers who own three or more copies of
SuperGoldPro Workgroup. 3) Finding customers who own three ormore copies of SuperGoldPro Workgroup and exactly two copies ofSuperGoldPro ReportWriterPlus. 4) Showing a registration trend
report that tells, for each product, how many customers have onecopy registered, how many have two copies, and so on.
You can answer these questions with the existing structure.The problem is that it is too dif.cult for users to dothemselves, requires custom programming, and can take along time to execute.In a case like ACMEs, keeping a summary table to simplifyand optimize querying and reporting is well worth the extraspace and effort. Here is a modi.ed table structure:
ACMEs registration system structure with a summarytable
-
7/27/2019 Normalization Summit Training
106/143
-
7/27/2019 Normalization Summit Training
107/143
The Summary table forms a many-to-many relationshipbetween Customers and Products, just like the Registrationstable. From this structure view it does not seem that theSummary table offers any bene
-
7/27/2019 Normalization Summit Training
108/143
.
-
7/27/2019 Normalization Summit Training
109/143
t. Consider, however, these representative record counts:
-
7/27/2019 Normalization Summit Training
110/143
Table
Count
Customers
10,000
Products
100
Registrations
50,000
Summary
13,123
Reg
istrations
Sum
mary
Reason
50,000
1 One customerregisteredevery productsold. (Try todouble yourcustomers!)
50,000
10,000
Multiple-copyregistrationsare common.
50,000
37,500
Multiple-copyregistrationsaccount forroughly a half(25,000) of allregistrations.
50,000
50,000
No customerowns twocopies of anyproduct. (Timeto increasethe marketingbudget!)
Record Counts with Summary Table
The Summary table stores only one record for each productthat a customer owns. In other words, if the customer ownsten copies of a particular product they have ten registrationrecords and one registration summary record. Therelationship between the number of records in the Summary
table and the Registrations table is data-dependent. Youneed to examine your data to .nd when a summary table isef.cient. Consider the range of possible values:
Possible summary table record counts
In cases where the record count is lower in Summary than inRegistrations, an equivalent join from Summary is faster. Notonly this, you can now .nd customers based on complexregistration conditions with simple indexed queries.
Lets look at how you would satisfy the requests listed at thebeginning of this example with and without the Summary
table.
-
7/27/2019 Normalization Summit Training
111/143
-
7/27/2019 Normalization Summit Training
112/143
T
-
7/27/2019 Normalization Summit Training
113/143
ASK
-
7/27/2019 Normalization Summit Training
114/143
1: A S
-
7/27/2019 Normalization Summit Training
115/143
IMPLE
-
7/27/2019 Normalization Summit Training
116/143
Q
-
7/27/2019 Normalization Summit Training
117/143
UERY
-
7/27/2019 Normalization Summit Training
118/143
WithoutSummarytable
WithSummarytable
QueryRegistrationsforSuperGoldProClassic.
QuerySummary forSuperGoldProClassic.
Join fromRegistrationsto Custom-ers.
Join fromSummary toCustomers.
WithoutSummarytable
WithSummarytable
QueryRegistrationsforSuperGoldPro
QuerySummary forrecords withSuperGoldProand three ormoreregistrations
SortRegistrationsbyCustomer_IDCreate anempty set inCustomersStep througheach recordin Registra-tions,counting up
how manycopies acustomer has.If a customerowns three ormore copies,add them toyour set.
Join fromSummary toCustomers.
Find customers who own the product SuperGoldPro Classic.
Steps required to complete task 1.
The steps are exactly the same in this example. Thedifference is that the Summary table never contains morerecords than the Registration table, and normally containsfar fewer. The speed of a join is dependent on the size of theinitial selection, so reducing the initial selection directlyreduces the time required to complete the join.
TASK2: A MORE COMPLEX QUERY
Find customers who own three or more copies of SuperGoldPro.
Steps required to complete task 2.
With the summary table in place it is no more dif.cult or timeconsuming to answer this more precise question. A simpletwo-.eld indexed query is followed by a join. An interfacethat gives end users access to this feature is easy toconstruct. Without a summary table you have to perform asequential operation on every matching registrationalaborious and time-consuming operation.
-
7/27/2019 Normalization Summit Training
119/143
-
7/27/2019 Normalization Summit Training
120/143
T
-
7/27/2019 Normalization Summit Training
121/143
ASK
-
7/27/2019 Normalization Summit Training
122/143
3: A C
-
7/27/2019 Normalization Summit Training
123/143
OMPOUND
-
7/27/2019 Normalization Summit Training
124/143
Q
-
7/27/2019 Normalization Summit Training
125/143
UERY
-
7/27/2019 Normalization Summit Training
126/143
WithoutSummarytable
WithSummarytable
QueryRegistrationsforSuperGold-Pro
Workgroup.
QuerySummary forSuperGoldProWorkgroupregistrations
with a reg-istration countof three ormore.
SortRegistrationsbyCustomer_ID.Create anempty set inCustomers.Step througheach recordin Regis-trations,counting uphow manycopies acustomerhas. If acustomerowns three ormore copies,add them toyour set.
Join fromSummary toCustomers.
Create a setof the foundcustomers
Create a set ofthe foundcustomers.
Create an
empty set inCustomersQueryRegistrationsforSuperGold-ProReportWriterPlus.
QuerySummary forSuperGoldProWorkgroupregistrationswith a reg-istration countof exactly two.
Step througheach recordin Regis-trations,counting uphow manycopies acustomerhas. If acustomerowns twocopiesexactly, thenadd them toyour set.
Join fromSummary toCustomers.
Create a set ofthe foundcustomers.
Intersect the
twoCustomersets to locate
Intersect the
two Customersets to locatethe correct
Find customers who own three or more copies ofSuperGoldPro Workgroup and exactly two copies ofSuperGoldPro Report WriterPlus.
Steps required to complete task 3.
-
7/27/2019 Normalization Summit Training
127/143
the correctcustomers.
customers.
Each of these solutions appears complex, but with thesummary table in place the routine needed is short(commands are shown without full parameters) and quick:
QUERY([Summary]) RELATE ONE SELECTION([Summary];[Customers]) CREATE SET([Customers])QUERY([Summary]) RELATE ONE SELECTION([Summary];[Customers]) CREATE SET([Customers]) INTERSECT USESET
The queries and joins are all indexed, non-sequentialoperations, and set operations are extremely fast regardlessof selection size. The alternative method, without thesummary table, requires error-prone custom programming,direct record access, sequential inspection of records, andexcessive network activity under 4D Server.
-
7/27/2019 Normalization Summit Training
128/143
-
7/27/2019 Normalization Summit Training
129/143
Notice in the previous examples that after the complicatedcalcula
-
7/27/2019 Normalization Summit Training
130/143
-
7/27/2019 Normalization Summit Training
131/143
tions needed without a summary table are completed,
-
7/27/2019 Normalization Summit Training
132/143
all the work is lost
-
7/27/2019 Normalization Summit Training
133/143
. You have found the correct customers for that query, but ifanother user wants the same query
-
7/27/2019 Normalization Summit Training
134/143
the entire process is repeated
-
7/27/2019 Normalization Summit Training
135/143
. If the same user wants a slightly different set of conditionsthey have to wait through another long query. This gives theuser the impres
-
7/27/2019 Normalization Summit Training
136/143
-
7/27/2019 Normalization Summit Training
137/143
sion that the system is slow, hard to work with, and hard toget data out of. By saving these calculations in a table forquicker and easier access you improve life for your users. Inaddition, you can now eas
-
7/27/2019 Normalization Summit Training
138/143
-
7/27/2019 Normalization Summit Training
139/143
ily generate a variety of reports on the summary data itselfusing the Quick Report editor, or any other system you like.
-
7/27/2019 Normalization Summit Training
140/143
TASK4: A TREND REPORT
Show a registration trend report that tells us for eachproduct how many customers have one copy registered,how many have two copies, and so on.
The easiest way to provide this report with a summary table
in place is with a Quick Report:
A sample Quick Report showing registration totals andtrends.
This report requires no code, extra calculations, or special
programming. All it requires is the time to sort and countthe records, like any sorted Quick Report. A user can create,modify, or export this simple report.
-
7/27/2019 Normalization Summit Training
141/143
-
7/27/2019 Normalization Summit Training
142/143
Cost of Maintaining a SummaryTable
-
7/27/2019 Normalization Summit Training
143/143
In order for a summary table to be useful itneeds to be accurate. This means you need toupdate summary records whenever source records change. In our example, this meansthat summary records need to be changed every time a registration is added, deleted, ormodi.ed. This task can be taken care of in the validation script of the record, in the Afterphase of the form (4D 3.x), in a trigger (preferred), or in batch mode. You must also makesure to update summary records if you change source data in your code. Keeping thedata valid requires extra work and introduces the possibility of coding errors, so youshould factor this cost in when deciding if you are going to use this technique.