DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd...

36
DATABASE DESIGN I - 1DL300 Spring 2011 An introductory course on database systems http://www.it.uu.se/edu/course/homepage/dbastekn/vt11/ 2011-02-16 1 Manivsakan Sabesan - UDBL - IT - UU Manivasakan Sabesan Uppsala Database Laboratory Department of Information Technology, Uppsala University, Uppsala, Sweden

Transcript of DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd...

Page 1: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

DATABASE DESIGN I - 1DL300

Spring 2011

An introductory course on database systems

http://www.it.uu.se/edu/course/homepage/dbastekn/vt11/

2011-02-16 1Manivsakan Sabesan - UDBL - IT - UU

Manivasakan SabesanUppsala Database Laboratory

Department of Information Technology, Uppsala University, Uppsala, Sweden

Page 2: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Introduction to Normalization

Elmasri/Navathe ch 14Padron-McCarthy/Risch ch 11

2011-02-16 2Manivasakan Sabesan - UDBL - IT - UU

Manivasakan Sabesan

Department of Information Technology

Uppsala University, Uppsala, Sweden

Page 3: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Normalization and problems in schema design

• Normalization - relational database design– Normalization of relational schemas or “we want to have good relations . . .”

– Functional dependencies

– Normal forms

• Problems in schema design

2011-02-16 3Manivsakan Sabesan - UDBL - IT - UU

• Problems in schema design – Unclear semantics

– Redundancy• Modification problems (updates, insertions deletions)

– Null values

– Spurious tuples

– Multi-valued dependencies

Page 4: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Clear Semantics of Relations

• Semantics of a relation – Meaning resulting from interpretation of attribute values in a tuple

• Easier to explain semantics of relation– Indicates better schema design

2011-02-16 4Manivsakan Sabesan - UDBL - IT - UU

– Indicates better schema design

Page 5: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Guideline 1

• Design relation schema so that it is easy to explain its meaning

• Do not combine attributes from multiple entity types and relationship types into a single relation

2011-02-16 5Manivsakan Sabesan - UDBL - IT - UU

Page 6: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Guideline 1 (cont’d.)

Ename Ssn Bdate Address Dnumber Dname Dmgr_ssn

EMP_DEPT

EMP_PROJ

2011-02-16 6Manivsakan Sabesan - UDBL - IT - UU

Ssn Pnumber Hours Ename Pname Plocation

EMP_PROJ

Page 7: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Example: design of a relational database

• To design a relational database we start from an E-R model and design a number of relational schemas as e.g.:

• emps(ename, salary, dept, dname, dept#, mgr)• supp_info(sname, iname, saddr, price)

2011-02-16 7Manivsakan Sabesan - UDBL - IT - UU

• supp_info(sname, iname, saddr, price)• items(iname, item#, dept)• customers(cname, addr, balance, o#, date)

• includes(o#, iname, quantity)

Page 8: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Ex. continues - bad design causes problems

• We have here chosen to combine the schemas for SUPPLIES and SUPPLIERS and exchange them with one schema that contain all information about suppliers:

SUPPLIERS(SNAME, SADDR)SUPPLIES(SNAME, INAME, PRICE)SUPP_INFO(SNAME, INAME, SADDR, PRICE)

• Due to this decision we will run into several problems with:

2011-02-16 8Manivsakan Sabesan - UDBL - IT - UU

• Due to this decision we will run into several problems with:– Unclear semantics– Redundancy– Modification anomalies

• Update anomalies• Insertion anomalies• Deletion anomalies

Page 9: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

RedundancySUPP_INFO(SNAME, INAME, SADDR, PRICE)

• Redundancy:– When each supplier supplies several products, the address will be repeated for each

product.

2011-02-16 9Manivsakan Sabesan - UDBL - IT - UU

– Storing natural joins of base relations leads to update anomalies

Page 10: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Modification anomalies SUPP_INFO(SNAME, INAME, SADDR, PRICE)

• Update anomaly - inconsistency– When you have redundant data you risk, for example, when updating the address in the

table, to miss updating all occurrences of the redundant values (here several occurrences of address values for the same supplier).

– The result will be that we get inconsistencies among data in the database.

• Insertion anomaly– We can not insert the address for a supplier that do not currently supply any product.

2011-02-16 10Manivsakan Sabesan - UDBL - IT - UU

– We can not insert the address for a supplier that do not currently supply any product.• One can of course assign null values INAME and PRICE, but the one must remember to

remove these null values later when additional information exist. Hence, no tractable solution!

– Furthermore, sinceINAME is part of the key for this relation this is not a good solution. It is probably not even allowed to assign null values to key attributes.

• Deletion anomaly– If all tuples are removed, that corresponds to items that a specific supplier provide,

then all information about that supplier will disappear.

Page 11: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

How to avoid these problems?

• In order to avoid all these problems, we divide the relation (table) in the following manner:

SUPPLIERS(SNAME, SADDR)SUPPLIES(SNAME, INAME, PRICE)

• Disadvantages: – To retrieve addresses for suppliers that provide bananas, we must perform an

2011-02-16 11Manivsakan Sabesan - UDBL - IT - UU

– To retrieve addresses for suppliers that provide bananas, we must perform an expensive join operation, SUPPLIERS * SUPPLIES, that we could get away with if we instead had the relation:SUPP_INFO(SNAME, INAME, SADDR, PRICE)

– The operations SELECT and PROJECT that is much cheaper will do the job in the latter case.

Page 12: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Guideline 2

• Design base relation schemas so that no update anomalies are present in the relations

• If any anomalies are present:– Note them clearly

– Make sure that the programs that update the database will operate correctly

2011-02-16 12Manivsakan Sabesan - UDBL - IT - UU

– Make sure that the programs that update the database will operate correctly

Page 13: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

NULL Values in Tuples

• many attributes can end up with many NULLs in a tuple

• Problems with NULLs– Wasted storage space

– Problems understanding meaning

2011-02-16 13Manivsakan Sabesan - UDBL - IT - UU

– Problems understanding meaning

Page 14: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Guideline 3

• Avoid placing attributes in a base relation whose values may frequently be NULL

• If NULLs are unavoidable:– Make sure that they apply in exceptional cases only, not to a majority of tuples

2011-02-16 14Manivsakan Sabesan - UDBL - IT - UU

Page 15: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Ename Plocation

Smith, John Uppsala

Ssn Pnumber Hours Ename Pname Plocation

123456789 1 32.5 Smith, John SSPI Uppsala

123456789 2 7.5 Smith, John Vortex Luleå

453453453 1 20.0 English, Joyce SSPI Uppsala

EMP_PROJ

EMP_LOCS

Spurious Tuples

2011-02-16 15Manivsakan Sabesan - UDBL - IT - UU

Smith, John Uppsala

Smith, John Luleå

English, Joyce Uppsala

Ssn Pnumber Hours Pname Plocation

123456789 1 32.5 SSPI Uppsala

123456789 2 7.5 Vortex Luleå

453453453 1 20.0 SSPI Uppsala

EMP_PROJ1

Page 16: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

EMP_LOCS

Ssn Pnumber Hours Pname Plocation

123456789 1 32.5 SSPI Uppsala

123456789 2 7.5 Vortex Luleå

453453453 1 20.0 SSPI Uppsala

EMP_PROJ1

Ename Plocation

Smith, John Uppsala

Smith, John Luleå

English, Joyce Uppsala

2011-02-16 16Manivsakan Sabesan - UDBL - IT - UU

Ssn Pnumber Hours Ename Pname Plocation

123456789 1 32.5 Smith, John SSPI Uppsala

453453453 1 20.0 Smith, John SSPI Uppsala

123456789 1 32.5 English, Joyce SSPI Uppsala

453453453 1 20.0 English, Joyce SSPI Uppsala

123456789 2 7.5 Smith, John Vortex Luleå

EMP_PROJ

453453453 1 20.0 SSPI Uppsala

Page 17: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Guideline 4

• Design relation schemas to be applied natural join that are appropriately related – Guarantees that no spurious tuples are generated

• Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations

2011-02-16 17Manivsakan Sabesan - UDBL - IT - UU

key) combinations

Page 18: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Normalization

• What principles do we have to follow to systematically end up with a good schema design.– In 1972, Ted Codd defined a set of conditions that put various requirements on a

relation.

– To fulfill these conditions, the relation schema is in several steps divided into smaller schemas.

2011-02-16 18Manivsakan Sabesan - UDBL - IT - UU

smaller schemas.

• This process is called normalization.

• We need to study the following concepts for performing the normalization: – Functional dependencies

– Normal forms for relations

Page 19: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Functional dependency (FD)

• Let R be a relation schema with attributes A1, ..., An and let X and Y be subsets of {A1, ..., An}.

• Let r(R) determine a relation (instance) of the schema R.

We say that X functionally determines Y, and we write

X ⇒ Y

2011-02-16 19Manivsakan Sabesan - UDBL - IT - UU

• The attribute set Y is said to be dependent of the attribute set X.

• There is a functional dependencybetween X and Y.

X ⇒ Y

if for every pair of tuples t1, t2 ∈ r(R) and for all r(R) the following hold:

If t1[X] = t2[X] it holds that t1[Y] = t2[Y]

Page 20: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Functional dependency - properties

• If X is a candidate key for R we have:– X ⇒ Y for all Y ⊆ {A1, ..., An}.– X ∩ Y do not need to be the empty set.

– Assume that R represents an N:1 relation between the entity types E1 and E2.

– It is further assumed that X ⊆ {A1, ..., An} constitutes a key for E1 and Y constitutes a key for E2,

2011-02-16 20Manivsakan Sabesan - UDBL - IT - UU

– Then we have the dependency X ⇒ Y but not Y ⇒ X.

– If R would be a 1:1 relation, then it also holds that Y ⇒ X.

• A functional dependency is a property of R and not of any relation r(R). The semantic for the participating attributes in a relation decides if there exists a FD or not.

Page 21: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Armstrong’s axioms(inference rules)

• 1. If X ⊇ Y, then X → Y (reflexive rule )

• 2. If X → Y, then XZ → YZ (augmentation rule)

• 3. If X → Y and Y → Z, then X → Z (transitive rule)

2011-02-16 21Manivsakan Sabesan - UDBL - IT - UU

Additional rules:

• 4. If X → YZ, then X → Y (decomposition, or projection, rule)

• 5. If X → Y and X → Z, then X → YZ (union, or additive, rule)

• 6. If X → Y and WY → Z, then WX → Z (psuedotransitive rule)

Page 22: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Example

• The simplest form of FD is between the key attributes and the other attributes in a schema:

{SNAME} ⇒ {SADDR} SUPPLIERS{SNAME, INAME} ⇒ {PRICE} SUPPLIES{CNAME} ⇒ {CADDR, BALANCE} CUSTOMERS{SNAME} ⇒ {SNAME}

2011-02-16 22Manivsakan Sabesan - UDBL - IT - UU

{SNAME} ⇒ {SNAME}

• A slightly more complex FD:{SNAME, INAME} ⇒ {SADDR, PRICE}

– This dependency is the combination of the first two above. We are able to combine these two because we understand the concepts supplier, item, address and price and the relationships among them.

Page 23: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Full functional dependency - FFD

• A functional dependency X ⇒ Y is termed FFD if there is no attribute A ∈ X such that (X- {A}) ⇒ Y holds.

• In other words, a FFD is a dependency that do not contain any unnecessary attributes in its determinant (i.e. the left-hand side of the dependency).

SUPP_INFO(SNAME, INAME, SADDR, PRICE)

2011-02-16 23Manivsakan Sabesan - UDBL - IT - UU

{SNAME, INAME} ⇒ {PRICE} FFD{SNAME, INAME} ⇒ {SADDR} Not FFD{SNAME} ⇒ {SADDR} FFDcompare: f(x,y) = 3x + 2 Not FFD

f(x) = 3x + 2 FFD

Page 24: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Prime attribute

• Definition: an attribute that is a member in any of the candidate keys is called a prime attribute of R.

• For example:– In the schema R(CITY,STREET,ZIPCODE) all attributes are prime attribute,

since we have the dependencies CITY,STREET⇒ ZIPCODE and ZIPCODE⇒CITY and both CITY,STREET and STREET,ZIPCODE are keys.

2011-02-16 24Manivsakan Sabesan - UDBL - IT - UU

CITY and both CITY,STREET and STREET,ZIPCODE are keys.

– In the schema R(A,B,C,D) with the dependencies AB ⇒ C, B ⇒ D and BC ⇒A, are both AB and BC candidate keys. Therefore are A, B and C prime attribute but not D.

• Attributes that is not part of any candidate key is called non-prime or non-key attributes.

Page 25: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

First normal form - 1NF

• Only atomic values are allowed as attribute values in the relational model.– The relations that fulfill this condition are said to be in 1NF

Ssn Ename Pnumber Hours

1234 Smith 1 12

2011-02-16 25Manivsakan Sabesan - UDBL - IT - UU

1234 Smith 1 12

2 7

4534 Wong 3 40

2 26

Page 26: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Second normal form - 2NF

• A relation schema R is in 2NF if:– It is in 1NF

– Every non-key attribute A in R is FFD of each candidate key in R.

2011-02-16 26Manivsakan Sabesan - UDBL - IT - UU

Page 27: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Third normal form - 3NF

• A relation schema R is in 3NF if:– It is in 2NF

– No non-key attribute A in R is allowed to be FFD of any other non-key attribute.

2011-02-16 27Manivsakan Sabesan - UDBL - IT - UU

Page 28: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Example

2011-02-16 28Manivsakan Sabesan - UDBL - IT - UU

Page 29: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Example

2011-02-16 29Manivsakan Sabesan - UDBL - IT - UU

Page 30: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Boyce -Codd normal form - BCNF

• A relation schema R is in BCNF if:– It is in 1NF

– Every determinant X is a candidate key.

• The difference between BCNF and 3NF is that in BCNF, a prime attribute can not be FFD of a non-key (or non-prime) attribute or of a prime attribute (i.e. a partial

2011-02-16 30Manivsakan Sabesan - UDBL - IT - UU

key). Therefore BCNF is a more strict condition.

Page 31: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Example_BCNF

2011-02-16 31Manivsakan Sabesan - UDBL - IT - UU

Page 32: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Additional normal forms

• There are a number of other normal forms available

• These explores additional data dependencies such as:– multi-valued dependencies (4NF)

– join dependencies (5NF)

2011-02-16 32Manivsakan Sabesan - UDBL - IT - UU

Page 33: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Multi-valued dependencies

• Consider the relation classes(course, teacher, book)

course

databasedatabasedatabase

teacher

ToreTore

Martin

book

ElmasriUllmanElmasri

2011-02-16 33Manivsakan Sabesan - UDBL - IT - UU

• The book attribute specifies required literature independent of teacher.

• The relation fulfills BCNF, but still every teacher must be supplied two times.

database database

operating systemoperating system

Martin Martin MartinMartin

ElmasriUllman

SilberschatzShaw

Page 34: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Lossless join

• We decompose a schema R with functional dependencies D into schemas R1 and R2

• R1 and R2 is called a lossless-join decomposition (in relation to D) if R1 * R2 contain the same information as R.– no spurious tuples are generated when we perform the join between the decomposed

relations.

2011-02-16 34Manivsakan Sabesan - UDBL - IT - UU

relations.

– no information is lost when the relation is decomposed.

Page 35: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

3NF/BCNF

• BCNF is a strong condition. It is not always possible to transform (through decomposition) a schema to BCNF and keep the dependencies.

• 3NF has the most of BCNF's advantages and can still be fulfilled without giving up dependencies or the lossless-join property for the relation schema.

• Observe that 3NF admits some sort of redundance that BCNF do not allow. BCNF

2011-02-16 35Manivsakan Sabesan - UDBL - IT - UU

• Observe that 3NF admits some sort of redundance that BCNF do not allow. BCNF requires that one eliminates redundance caused by functional dependencies.

Page 36: DATABASE DESIGN I - Uppsala · PDF fileDATABASE DESIGN I -1DL300 Spring 2011 ... Boyce-Codd normal form ... – It is in 1NF – Every determinant X is a candidate key.Authors: Vincent

Normal forms and normalization• We assume we have an enterprise that buys products from different supplying

companies, and we would like to keep track of our data by means of a database.

• We would like to keep track of what kind of products (e.g. cars) we buy, and from which supplier (e.g. Volvo) we buy them. We can buy each product from several suppliers.

• Further, we want to know the price of the products. The price of a product is naturally dependent on which supplier we bought it from.

2011-02-16 36Manivsakan Sabesan - UDBL - IT - UU

• We would also like to store the address of each supplier, i.e. the city where the supplier is located. Here, we assume that each supplier is only located at one place.

• Perhaps we suddenly need to order a large number of cars from Volvo. It can then be good to know how many people that live in the city where Volvo is located. Thus, we can know if Volvo can employ a sufficient amount of people to produce the cars. Hence we also store the population for each city.