User Profiling in Ego-network : Co-profiling Attributes and Relationships
Richard Merritt1 Data Modelling Entities, Attributes and Relationships.
-
Upload
iris-freeman -
Category
Documents
-
view
213 -
download
0
Transcript of Richard Merritt1 Data Modelling Entities, Attributes and Relationships.
1Richard Merritt
Data Data ModellingModelling
Entities,
Attributesand
Relationships
2Richard Merritt
Data ModellingData Modelling Technique for describing
information structures Information models represent:
things - entities properties of things - attributes associations between things -
relationships
3Richard Merritt
EntitiesEntities Abstractions of real world things
e.g. CUSTOMERdoes not relate to specific customers
...any distinguishable person, place, thing, event or concept about which information is kept. (Bruce 1992)
Specificcustomer
A
4Richard Merritt
AttributesAttributes The elements of data belonging to an
entity are known as its attributes
Customer
A/C No.NameAddressTel No.Credit limit
5Richard Merritt
RelationshipsRelationships Imagine two entities:
Lecturer and Student Lecturers teach students Teaching is the “relationship”
between the two abstract entities
6Richard Merritt
Logical Data ModelLogical Data Model
An Entity Type has a set of attributes
e.g. Customer has attributes ofAccount Number
Name Address
Telephone NumberCredit Limit
7Richard Merritt
Logical Data ModelLogical Data ModelAn Entity Type may have a number of occurrences.
Each Entity Occurrence has a unique set of values for the attributes.
CB
CustomerEntity Type
A/C No.NameAddressTel No.Credit limit
A
CustomerEntity Occurrences
8Richard Merritt
Logical Data ModelLogical Data Model
Customer EntityAttribute ValueAccount Number BL032
Name Bloggs & Son
Address 117 Acacia RdBirmingham 7
Telephone Number 0121 345678Credit Limit £2500
9Richard Merritt
A Table or RelationA Table or Relation
Reg. No. Surname Forename
45632
45315
44355
45422
Jones David
James Sarah
Jones Mary
Hill Simon
Each row of the table is unique.
10Richard Merritt
Entities, Tables & Entities, Tables & RelationsRelations
An Entity Type is represented as a Table (Relation)
Each Row (Tuple) of the Table is an Occurrence of the Entity
Each Column (Domain) of the Table contains the Values of one Attribute of the Entity
11Richard Merritt
Physical Data Physical Data OrganisationOrganisation
An Entity Type is usually implemented as a File in the Physical Storage Medium
Each Entity Occurrence is a Record in the File
The Value of an Attribute of the Entity Occurrence is stored in a Field within the Record
12Richard Merritt
Physical Physical OrganisationOrganisation
physical mediumfile
record
field
13Richard Merritt
one record
order no.customer
itemqty.
itemqty.
itemqty.
itemqty.
order no.customer
repeating fields
Repeating Attributes / Repeating Attributes / FieldsFields
Are repeating attributes (fields)
really attributes of this entity?
OrderFile / Entity
14Richard Merritt
Attributes / Fields of an Attributes / Fields of an OrderOrder
only one item
but …
One Entity Occurrence / Record
Order Customer Item QuantityNumber
1234 Ace XY345 8
Another Occurrence / Record
Order Customer Item 1 Quantity 1 Item 2 Quantity2...........Number
2156 Williams CCD3 2 TR805 5 ...........
15Richard Merritt
Entities or Entities or Attributes?Attributes?
When is data an Attribute of
an Entityand when is it a separate Entity?
Can one Entity ever be considered to be simply Attributes of another?
16Richard Merritt
Diagrammatic Diagrammatic RepresentationRepresentation
customer
Symbols for entities
hard box
soft box
student
17Richard Merritt
Diagrammatic Diagrammatic RepresentationRepresentation
order
Relationship between entities
item
crow’s footone
order can be
for many items
master
detail
18Richard Merritt
Degrees of Degrees of RelationshipRelationship
order
delivery
one to one
order
delivery
one to many
order
delivery
many to many
19Richard Merritt
Optional Optional RelationshipsRelationships
An Order must be for a
Customer
Customer
Sales Order
but
a Customer may not have any orders
optional at the customer end
20Richard Merritt
Consultant SystemConsultant System
A Client has an Account Consultants have a Grade and a
number of Skills Consultants are active on various
Projects Each Project is for one Account
21Richard Merritt
Possible Entities for Consultant Possible Entities for Consultant SystemSystem
Grade
Consultant
SkillProject
Account
Client
22Richard Merritt
Grade
Consultant SkillProject
Account
Client
Initial Attempt at Initial Attempt at RelationshipsRelationships
23Richard Merritt
Resolving One to One Resolving One to One RelationshipsRelationships
Client Account
The account is an attribute of the client and NOT a separate
entity.
Simply merge the entities which have a one to one relationship.
A Client can only have one Account.
24Richard Merritt
One to one relationships One to one relationships resolvedresolved
Grade
Consultant SkillProject
Client
25Richard Merritt
Resolving Many to Many Resolving Many to Many RelationshipsRelationships
Actor Scene
What is the nature of the relationship between an Actor
and a Scene?
Difficult to implement so how can they be replaced?
26Richard Merritt
Resolving Many to Many Resolving Many to Many RelationshipsRelationships
Actor and Scene are both masters to the new linking
entity of Appearance.
Actor Scene
Appearance
Create a linking entity which is a detail to both the original
entities.
27Richard Merritt
Resolving Many to Many Resolving Many to Many RelationshipsRelationships
What would make suitable entities and what attributes
might they have?
Patient Drug
?
Consultant Skill
?
28Richard Merritt
Many to many relationships Many to many relationships resolved resolved
Grade
Consultant Skill
Consultant SkillProject
Activity
ClientAvoid crossed relationshipsResuscitate dead crows!
29Richard Merritt
Developing a Developing a Logical Data Logical Data
StructureStructure identify possible entities
draw initial entity relationship diagram
resolve 1:1 and many:many relationships & check for further entities and relationships
remove redundant relationships
show optionality
30Richard Merritt
Cross-checking the Cross-checking the LDSLDS
The LDS is derived using a “Top Down” approach.
It can be cross-checked by using a “Bottom Up” approach, building up the entities from their
attributes. This technique is called Normalisation
which is the subject of the next lecture.
31Richard Merritt
Relational Data Relational Data AnalysisAnalysis
Normalisation
With thanks to Codd & Date.
32Richard Merritt
Attributes / Fields of an Attributes / Fields of an OrderOrder
only one item
but …
One Entity Occurrence / Record
Order Customer Item QuantityNumber
1234 Ace XY345 8
Another Occurrence / Record
Order Customer Item 1 Quantity 1 Item 2 Quantity2...........Number
2156 Williams CCD3 2 TR805 5 ...........
33Richard Merritt
NormalisationNormalisationobjectivesobjectives
to reduce data redundancy
to hold each data item (attribute) with as few occurrences as possible
to identify and remove any dependencies between data items stored together (in the same table)
34Richard Merritt
A RelationA RelationA two-dimensional A two-dimensional
tabletable
row (tuple)
Reg. No. Surname Forename4563245315
4435545422
Jones DavidJames SarahJones MaryHill Simon
column (attribute)
unique primary key
35Richard Merritt
A Compound A Compound KeyKey
Consultant No. Project No. Time (days)
001001003003
C232C979A176C232
3.010.55.0
20.0
36Richard Merritt
Attributes Attributes Listed Listed
Consultant No.
Project No.
Time (days)
Reg. No.
Surname
Forename
key attribute underlined
compound key
37Richard Merritt
Un-Normalised Un-Normalised DataData
Consultant DetailsNo. Name Address Grade Salary Scale
004 Mary Wheeler 236 Fore Street D S2
Ivybridge
DevonSkills
Code Description Qualification
SK01 Accounting IMA
SK10 CAD / CAM 3 yrs. AutoCAD
SK15 SSADM V4 Certificate
38Richard Merritt
AssumptionsAssumptions Consultant No., Skill Code and Grade are
unique. A consultant can have many skills each
identified by a Skill Code. For each skill only the consultant’s highest
Qualification is recorded. Other consultants may have the same skills
(and Skill Code) but not necessarily the same Qualification.
Each Skill Code has one Description. Each Grade belongs to one Salary Scale.
39Richard Merritt
Un-Normalised Form Un-Normalised Form (UNF)(UNF)
Consultant No.
Name
Address
Grade
Salary Scale
Skill Code
Description
Qualification
key should uniquely identify a row in the table
40Richard Merritt
UNFUNF
list all data attributes allocate primary key identify repeating
group(s)(optional)
UNF
Consultant No.NameAddressGradeSalary Scale (Skill Code Description Qualification)
41Richard Merritt
First Normal Form (1 First Normal Form (1 NF)NF)RuleRule
remove repeating data
Consultant No.
Name
Address
Grade
Salary Scale
Skill Code
Description
Qualification
42Richard Merritt
First Normal Form (1 First Normal Form (1 NF)NF)
Consultant No.
Name
Address
Grade
Salary Scale
Consultant No.
Skill Code
Description
Qualification
The compound key of the new table which contains the repeating group consists of the original key plus the attribute(s) which uniquely identify a single set of repeating values given a single value of the original key.
43Richard Merritt
1NF1NF separate repeating
group copy non-repeating
group unchanged add initial primary
key to repeating group & identify compound key
UNF
Consultant No.NameAddressGradeSalary Scale (Skill Code Description Qualification)
1NF
Consultant No.NameAddressGradeSalary Scale
Consultant No.Skill CodeDescriptionQualification
44Richard Merritt
Second Normal Form (2 Second Normal Form (2 NF)NF)RuleRule
remove part-key dependencies
Consultant No.
Skill Code
DescriptionQualification
45Richard Merritt
Second Normal Form Second Normal Form (2 NF)(2 NF)
Consultant No.
Name
Address
Grade
Salary Scale
Consultant No.
Skill Code
Qualification
Skill Code
Description
46Richard Merritt
2NF2NF
separate part-key dependencies
all other groups are copied across unchanged
do not omit key only groups
1NF
Consultant No.NameAddressGradeSalary Scale
Consultant No.Skill CodeDescriptionQualification
2NF
Consultant No.NameAddressGradeSalary Scale
Consultant No.Skill CodeQualification
Skill CodeDescription
47Richard Merritt
Third Normal Form (3 Third Normal Form (3 NF)NF)RuleRule
remove inter-data (and inter-key) dependenciesConsultant No.
Name
Address
Grade
Salary Scale
48Richard Merritt
Inter-Data Inter-Data DependencyDependency
Salary Scale Grade
S1
S2
S3
ABCDEFG
49Richard Merritt
Third Normal Form (3 Third Normal Form (3 NF)NF)
Consultant No.
Name
Address
Grade
Grade
Salary Scale
Consultant No.
Skill Code
Qualification
Skill Code
Description
*
* denotes a Foreign Key
50Richard Merritt
3NF3NF separate inter-data
(non-key) dependencies
identify foreign keys
2NF
Consultant No.NameAddressGradeSalary Scale
Consultant No.Skill CodeQualification
Skill CodeDescription
3NF
Consultant No.NameAddressGrade
GradeSalary Scale
Consultant No.Skill CodeQualification
Skill CodeDescription
*
51Richard Merritt
Normalisation of Consultant Normalisation of Consultant DataData
1NF
Consultant No.NameAddressGradeSalary Scale
Consultant No.Skill CodeDescriptionQualification
2NF
Consultant No.NameAddressGradeSalary Scale
Consultant No.Skill CodeQualification
Skill CodeDescription
3NF
Consultant No.NameAddressGrade
GradeSalary Scale
Consultant No.Skill CodeQualification
Skill CodeDescription
*
UNF
Consultant No.NameAddressGradeSalary Scale (Skill Code Description Qualification)
52Richard Merritt
NormalisationNormalisation StagesStages
UNF - list attributes & allocate primary key
1NF - remove repeating groups
2NF - remove part key dependencies
3NF - remove inter-data dependencies
53Richard Merritt
Third Normal Form Third Normal Form TestsTests
Given a value for the key of a 3NF relation, is there only one possible occurrence of the associated data (row)?
Is each attribute (column) of the relation dependent on the key, the whole key and nothing but the key?
54Richard Merritt
Consultant No.NameAddressGrade
Grade
Salary Scale
*
CONSULTANT
GRADE
Consultant No.
Skill Code
Qualification
Skill Code
Description
CONSULTANT SKILL
SKILL
Relation (Table) Relation (Table) NamesNames
55Richard Merritt
Summary of the Summary of the NormalisationNormalisation
UNF 1NF 2NF 3NF
CONSULTANT
Consultant No.NameAddressGradeSalary Scale
CONSULTANT SKILL
Consultant No.Skill CodeDescriptionQualification
CONSULTANT
Consultant No.NameAddressGradeSalary ScaleSkill CodeDescriptionQualification
CONSULTANT
Consultant No.NameAddressGradeSalary Scale
CONSULTANT SKILL
Consultant No.Skill CodeQualification
SKILL
Skill CodeDescription
CONSULTANT
Consultant No.NameAddressGrade
GRADE
GradeSalary Scale
CONSULTANT SKILL
Consultant No.Skill CodeQualification
SKILL
Skill CodeDescription
*
56Richard Merritt
3NF Entity-Relationship 3NF Entity-Relationship DiagramDiagram
3NF Relations are Entities
Entities / Relations are linked by their common attributes
Relational Model
57Richard Merritt
Relationships from Relationships from 3NF3NF
Entities are linked by their keysKeys and Foreign Keys are the common attributes.
An Entity / Relation with a Compound Key is a DetailIts Masters are Relations that have as their Key a part of the
Compound Key
An Entity / Relation with a Foreign Key is a DetailIts Master is the Relation which has the Foreign Key as its whole
Key
58Richard Merritt
3NF Relations are 3NF Relations are EntitiesEntities
Consultant Skill
Grade
Consultant Skill
59Richard Merritt
Consultant Skill
Grade
Consultant Skill
Compound Keys are Compound Keys are DetailsDetails
master Skill Code
masterConsultant No.
detail withcompound key
Consultant No.
Skill Code
60Richard Merritt
Foreign Keys are Foreign Keys are DetailsDetails
Consultant Skill
Grade
Consultant Skill
masterGrade
detail with
foreign key
* Grade
61Richard Merritt
Cross-checking the Cross-checking the LDSLDS
The LDS is derived using a “Top Down” approach.
Normalisation is used to cross-check the LDS by using a “Bottom Up” approach, building up the entities and relationships from their attributes.
62Richard Merritt
Compare to LDSCompare to LDSGrade
Consultant Skill
Consultant SkillProject
Activity
Client
63Richard Merritt
Why are the diagrams not the same?What do they have in common?What are the differences? Why?
What do we need to resolve the
differences?
Normalise other documents / data sources.
LDS versus 3NF LDS versus 3NF Entity Relationship Entity Relationship
DiagramDiagram
64Richard Merritt
What is the What is the key?key?
Activity No
12121
Project No
C232C232C979C979A176
Time (days)
0.11.50.32.01.0
Activity Type
write letterdraw DFDinterviewplan projectvisit site
Activity No. is only unique within one project.
Consultant No
003001004004002
65Richard Merritt
A Composite A Composite KeyKey
Project No.
Activity No.
Consultant No.
Activity Type
Time (days)
composite key
a key of more than one attribute where part of the key does not have a unique meaning
66Richard Merritt
Data ModellingData Modelling Derive an LDS - “top down” Normalise all relevant data (documents etc.)
apply 3NF tests Merge relations with the same key
check for synonyms and homonymsapply 3NF tests again
Give each Relation (Entity) a namenames should be consistent with LDS
Draw a 3NF Entity Relationship Diagram Compare LDS & 3NF ERD and resolve
differences
67Richard Merritt
Another Another NormalisationNormalisation
A sample purchase order document.
Assumptions
Identify data items and work out rules and assumptions. It is critical to document the rules or assumptions of the data prior to normalisation. Assumptions may change over the life of the system.
Supplier No., Purchase Order No. and Part No. are unique.Cost Each depends only on Part No.
Supplier No. 54Name Wizcorp Ltd.
Purchase Order No. 46Order Date 15/2/99
Description Cost each Quantity
stainless bolt 10mm x 4cmstainless nut 10mm
3m x 8cm stainless round
0.500.20
82.50
100100
5
Part No.345763457934121
68Richard Merritt
Normalisation of Purchase Normalisation of Purchase
OrderOrder 1NF
Purchase Order No.DateSupplier No.Name
Purchase Order No.Part No.DescriptionCost eachQuantity
2NF
Purchase Order No.DateSupplier No.Name
Purchase Order No.Part No.Quantity
Part No.DescriptionCost each
UNF
Purchase Order No.DateSupplier No.Name (Part No. Description Cost each Quantity)
3NF
*
Purchase Order No.DateSupplier No.
Supplier No.Name
Purchase Order No.Part No.Quantity
Part No.DescriptionCost each
69Richard Merritt
Purchase Order Purchase Order Document 3NFDocument 3NF
*
Purchase Order No.DateSupplier No.
Supplier No.Name
Purchase Order No.Part No.Quantity
Part No.DescriptionCost each
Purchase Order
Supplier
Purchase Order Item
Part
apply 3NF tests
name entities (relations)
70Richard Merritt
Assumptions
Customer No, Order No and Part No are unique but Item is only unique within one order. Price each depends on Part No.. The same Part No can be used for more then one Item on the same order (but the Date Required will vary).
Sales OrderSales OrderCustomer No 54
Name J. Thomas
Order No 76
Date 5/2/99
Description
widget
assembly
rod
Price each
1.50
3.20
1.20
Part No
41235
23467
89965
Value
6.00
12.80
7.20
Item
1
2
3
Order Total 58.00
Date Required
4 23467 assembly 3.20
Quantity
4
4
6
10
32.00
19/2/99
19/2/99
19/2/9926/3/99
71Richard Merritt
Normalisation of Sales Normalisation of Sales
OrderOrder UNF
Sales Order No.DateCustomer No.NameOrder Total (Item Part No. Description Price each Quantity Value Date Required)
1NF
Sales Order No.DateCustomer No.NameOrder Total
Sales Order No.ItemPart No.DescriptionPrice eachQuantityValueDate Required
( )
2NF
Sales Order No.DateCustomer No.NameOrder Total
Sales Order No.ItemPart No.DescriptionPrice eachQuantityValueDate Required
)(
3NF
*
Sales Order No.DateCustomer No.Order Total
Customer No.Name
Sales Order No.ItemPart No.QuantityValueDate Required
Part No.DescriptionPrice each
( )*
72Richard Merritt
Sales Order Document Sales Order Document 3NF3NF
Customer No.Name
Part No.DescriptionPrice each
Customer
Part
Sales Order
Sales Order Item
*
Sales Order No.DateCustomer No.Order Total
Sales Order No.ItemPart No.QuantityValueDate Required
*
( )
73Richard Merritt
Merged 3NFsMerged 3NFsPURCHASE ORDER
Purchase Order No.Purchase Order DateSupplier No.
SUPPLIER
Supplier No.Supplier Name
PURCHASE ORDER ITEM
Purchase Order No.Part No.Quantity
PARTPart No.DescriptionCost eachPrice each
*
SALES ORDER
Sales Order No.Sales Order DateCustomer No.Order Total
CUSTOMER
Customer No.Customer Name
SALES ORDER ITEM
Sales Order No.ItemPart No.QuantityValueDate Required
*
*
( )
74Richard Merritt
3NF Relations are 3NF Relations are EntitiesEntities
Sales Order Part
Customer
Sales OrderItem
Supplier
Purchase OrderItem
Purchase Order
75Richard Merritt
Sales Order Part
Customer
Sales OrderItem
Supplier
Purchase OrderItem
Purchase Order
Compound Keys are Compound Keys are DetailsDetails
masterPurchase Order No.
masterPart No.
detail withcompound keyPurchase Order No.Part No.
composite keySales Order No.Item
not compound
( )
76Richard Merritt
Sales Order Part
Customer
Sales OrderItem
Supplier
Purchase OrderItem
Purchase Order
Foreign Keys are Foreign Keys are DetailsDetails
Sales Order No.is foreign key
77Richard Merritt
LDS from Tutorial LDS from Tutorial ExampleExample
Stock Item
Supplier
Purchase Order
W arehouseSection Customer
Sales Order
DespatchPurchase OrderItem
Sales OrderItem
Despatch Item
78Richard Merritt
LDS versus 3NF LDS versus 3NF Entity Relationship Entity Relationship
DiagramDiagram
What are the differences?
What do we need to resolve these?
Normalise a despatch document.