Year 11 DATA PROCESSING 1st Term

Subject: DATA PROCESSING
Term: 1ST
Session: 2014-2015
School: CHRISLAND HIGH SCHOOL IKEJA
Class: YEAR 11
Educator: ISAAC-JOSEPH O. O.

SCHEME OF WORK
Week 1: Data models
Week 2: Data modelling
Week 3: Normalization
Week 4: Normalization
Week 5: Database using Microsoft Access
Week 6: Mid term
Week 7: Data models
Week 8: Relational model
Week 9: File organisation
Week 10: Revision
Week 10: End of term examination

WEEK 1

DATA MODELS

Data types
When setting up a database, one needs to think about the data type to be used for each field. The most common data types are:
1. Alphanumeric/text
2. Numeric
3. Date and time
4. Currency
5. Boolean/logical
6. Auto number

Alphanumeric or Text

This allows you to type in text, numbers and symbols.
Examples:
• Name: James
• Surname: Smith
• Address: 73, High Street
• Postcode: CV34 5TR
• Car Registration: EP06 5TV
• Telephone Number: 01926 123456

Number
This allows a whole number or a decimal number. Only numbers can be entered, no letters or symbols.
Examples: 1521.35

Currency
This automatically formats the data to have a currency symbol such as £, $ or € in front of the data and also ensures there are two decimal places.
Examples:
• =N=50
• £5.75
• $54.99

Date/Time
This restricts data entry to 1-31 for day (28, 29 or 30 in the appropriate months) and 1-12 for month. It checks that a date can actually exist; for example, it would not allow 31/02/06 to be entered. It formats the data into long, medium or short date/time.
Examples:
• Long Date: 20 February 2006
• Medium Date: 20-Feb-06
• Short Date: 20/02/06
• Long Time: 18:21:35
• Medium Time: 06:21 PM
• Short Time: 18:21
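The date check described on this slide can be tried out with a short Python sketch. It uses Python's standard datetime module rather than a database package, and the function name is_valid_date is only an illustration.

```python
from datetime import datetime

def is_valid_date(text, fmt="%d/%m/%y"):
    """Return True if text is a date that can actually exist (day/month/year)."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        # strptime refuses impossible dates such as 31 February
        return False

print(is_valid_date("20/02/06"))
print(is_valid_date("31/02/06"))

# The same moment shown in some of the formats listed on the slide
moment = datetime(2006, 2, 20, 18, 21, 35)
print(moment.strftime("%d-%b-%y"))   # medium date
print(moment.strftime("%d/%m/%y"))   # short date
print(moment.strftime("%H:%M:%S"))   # long time
print(moment.strftime("%H:%M"))      # short time
```

The first call accepts 20/02/06 and the second rejects 31/02/06, which is the behaviour a Date/Time field is expected to give.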

AUTONUMBER
This data type will automatically increase by 1 as records are added to the database: 1, 2, 3, 4, 5, ...

Logical, Boolean, Yes/No
This data type goes by several names: you may hear it called 'logical', 'Boolean' or 'yes/no'. All it means is that the data is restricted to one of only two choices.
Examples:
• Yes/No
• Male/Female
• Hot/Cold
• On/Off

Assignment

Give examples of the following types of data:
1. Numeric
2. Alphanumeric
3. Date and time

WEEK 2

DATA MODELLING

PROCESS AND DATA MODELLING
• Process modelling: Involves the design of the different modules of the system, each of which is a process with clearly defined inputs and outputs and a transformation process. Dataflow diagrams are often used to define processes in the system.
• Data modelling: Involves considering how to represent data objects within a system, both logically and physically. The entity relationship diagram is used to model the data.

DATA MODELLING
A data model can be thought of as a diagram or flowchart that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, it is an important step and shouldn't be rushed. Well-documented models allow stakeholders to identify errors and make changes before any programming code has been written.

Components of a Data Model
The data model gets its inputs from the planning and analysis stage. Here the modeler, along with analysts, collects information about the requirements of the database by reviewing existing documentation and interviewing end-users.
The data model has two outputs. The first is an entity-relationship diagram, which represents the data structures in a pictorial form.

IMPORTANCE OF DATA MODELLING
The goal of the data model is to make sure that all the data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end-users.

Summary
A data model is a plan for building a database. To be effective, it must be simple enough to communicate to the end user the data structure required by the database, yet detailed enough for the database designer to use to create the physical structure.

WEEK 3 & 4

NORMALIZATION IN DATABASES

What is Normalization?
• Unnormalised data exists in flat files.
• Normalization is the process of moving data into related tables.
• It is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.

Normalization works through a series of stages called normal forms:
• FIRST NORMAL FORM (1NF)
• SECOND NORMAL FORM (2NF)
• THIRD NORMAL FORM (3NF)

First normal form (1NF)
First Normal Form is part of the definition of a relation (table) itself. This rule requires that all the attributes in a relation have atomic domains. The values in an atomic domain are indivisible units.

We re-arrange the relation (table) as below, to convert it to First Normal Form.

Each attribute must contain only a single value from its pre-defined domain.

A design that complies with 1NF
A design that is unambiguously in first normal form makes use of two tables: a Customer Name table and a Customer Telephone Number table.

Customer name

Customer ID   First Name   Surname
123           Robert       Ingram
456           Jane         Wright
789           Maria        Fernandez

Customer telephone number

Customer ID   Telephone Number
123           555-861-2025
456           555-403-1659
456           555-776-4100
789           555-808-9633
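As a rough sketch, the same two tables can be built with Python's standard sqlite3 module. The table and column names (customer_name, customer_telephone and so on) are chosen here for illustration; only the data comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database held in memory
conn.executescript("""
CREATE TABLE customer_name (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT,
    surname     TEXT
);
CREATE TABLE customer_telephone (
    customer_id INTEGER,
    telephone   TEXT
);
""")
conn.executemany(
    "INSERT INTO customer_name VALUES (?, ?, ?)",
    [(123, "Robert", "Ingram"), (456, "Jane", "Wright"), (789, "Maria", "Fernandez")],
)
conn.executemany(
    "INSERT INTO customer_telephone VALUES (?, ?)",
    [(123, "555-861-2025"), (456, "555-403-1659"),
     (456, "555-776-4100"), (789, "555-808-9633")],
)

def phones_for(customer_id):
    """Return every telephone number stored for one customer."""
    rows = conn.execute(
        "SELECT telephone FROM customer_telephone "
        "WHERE customer_id = ? ORDER BY telephone",
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(phones_for(456))  # Jane Wright has two numbers, each in its own row
```

Each telephone number sits in its own row, so every field holds a single value, which is what 1NF asks for.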

Second normal form (2NF)
Before we learn about the second normal form, we need to understand the following:
• Prime attribute: An attribute that is part of the prime key (a candidate key) is known as a prime attribute.
• Non-prime attribute: An attribute that is not part of the prime key is said to be a non-prime attribute.
A table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key.
If we follow second normal form, then every non-prime attribute should be fully functionally dependent on the prime key. That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds.

2nd Normal Form Example
Consider the following example:

This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is only part of the primary key. Therefore, this table does not satisfy second normal form.

To bring this table to second normal form, we break the table into two tables, and now we have the following:

What we have done is to remove the partial functional dependency that we initially had. Now, in the table [TABLE_STORE], the column [Purchase Location] is fully dependent on the primary key of that table, which is [Store ID].
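The split into two tables can be sketched in Python. The rows below are made up for illustration, and the name table_purchase is an assumption (the notes name only [TABLE_STORE]).

```python
# Hypothetical rows of the original table: (customer_id, store_id, purchase_location)
purchases = [
    (1, 1, "Los Angeles"),
    (1, 3, "San Francisco"),
    (2, 1, "Los Angeles"),
    (3, 2, "New York"),
    (4, 3, "San Francisco"),
]

# First table (assumed name): keeps only the composite key [Customer ID, Store ID]
table_purchase = sorted({(customer, store) for customer, store, _ in purchases})

# TABLE_STORE: each store's purchase location is stored exactly once
table_store = {}
for _, store_id, location in purchases:
    # If one store appeared with two locations, Store ID -> Purchase Location
    # would not hold and this split would lose information.
    assert table_store.setdefault(store_id, location) == location

print(table_purchase)
print(table_store)
```

After the split, "Los Angeles" and "San Francisco" are each stored once instead of once per purchase.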

Third Normal Form (3NF)
For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must hold:
• No non-prime attribute is transitively dependent on the prime key.
• For any non-trivial functional dependency X → A, either X is a superkey or A is a prime attribute.

Third Normal Form (3NF)

In the Student_detail relation, Stu_ID is the key and the only prime attribute. City can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey, and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there is a transitive dependency.

To bring this relation into third normal form, we break the relation into two relations as follows
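A minimal Python sketch of this decomposition is given here. Only Stu_ID, Zip and City are named in the notes; the name column, the relation names student and zip_codes, and all the values are hypothetical.

```python
# Hypothetical Student_detail rows: (stu_id, name, city, zip_code)
student_detail = [
    (1, "Ada", "Town A", "Z1"),
    (2, "Bola", "Town A", "Z1"),
    (3, "Chidi", "Town B", "Z2"),
]

# Relation 1: facts that depend directly on Stu_ID
student = [(stu_id, name, zip_code) for stu_id, name, _city, zip_code in student_detail]

# Relation 2: Zip -> City is stored once per zip code
zip_codes = {zip_code: city for _id, _name, city, zip_code in student_detail}

def city_of(stu_id):
    """Find a student's city by following Stu_ID -> Zip -> City."""
    for sid, _name, zip_code in student:
        if sid == stu_id:
            return zip_codes[zip_code]
    return None

print(zip_codes)
print(city_of(3))
```

The city for a zip code is now kept in one place, so changing it needs one update instead of one per student.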

Referential Integrity
Referential integrity is a property of data which, when satisfied, requires every value of one attribute (column) of a relation (table) to exist as a value of another attribute in a different (or the same) relation (table).
For referential integrity to hold in a relational database, any field in a table that is declared a foreign key can contain either a null value, or only values from a parent table's primary key or a candidate key. In other words, when a foreign key value is used it must reference a valid, existing primary key in the parent table.
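The rule can be seen in action with a small sketch using Python's standard sqlite3 module. The store and purchase tables and their values are made up for illustration. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY, location TEXT)")
conn.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        store_id    INTEGER REFERENCES store(store_id)
    )
""")
conn.execute("INSERT INTO store VALUES (1, 'Ikeja')")

def try_insert(purchase_id, store_id):
    """Return True if the row is accepted, False if referential integrity rejects it."""
    try:
        conn.execute("INSERT INTO purchase VALUES (?, ?)", (purchase_id, store_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(10, 1))     # store 1 exists in the parent table
print(try_insert(11, None))  # a null foreign key is allowed
print(try_insert(12, 99))    # store 99 does not exist, so the row is rejected
```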

Denormalization and Unnormalization
Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization is a means of addressing performance or scalability in relational database software.

An unnormalized table is a table that does not meet the definition of a relation:
– it contains rows with multiple values for an attribute (repeating groups), or
– it contains duplicate rows.

• A table is said to be in first normal form if it meets the definition of a relation. Generally this means it contains no repeating groups of attributes.

Assignment

1. What do you mean by referential integrity?

2. What are second and third normal forms?

Types of Data Model
1. Database Model
A database model is a specification describing how a database is structured and used. Several database models have been suggested. Some common ones include:
1. Flat
2. Hierarchical
3. Network
4. Relational
5. Object oriented models
6. Star schema

Flat Model
This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.

Hierarchical model
In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.

Network Model
This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.

Relational Model
This is a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.

Object-Relational Model
The object-relational model is similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.

Star schema
This is the simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.

2. Entity-Relationship Model
An entity-relationship model (ERM) is an abstract conceptual data model (or semantic data model) used in software engineering to represent structured data. There are several notations used for ERMs.

3. Generic Data Model
Generic data models are developed as an approach to solve some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration.

4. Semantic data model
A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. A semantic data model is sometimes called a conceptual data model.

CHARACTERISTICS OF A SUITABLE SET OF RELATIONS IN A DATA MODEL
• Minimal number of attributes necessary to support the data requirements of the enterprise
• Attributes with a close logical relationship found in the same relation
• Minimal redundancy, with each attribute represented once except for attributes that form all or part of foreign keys

WEEK 5

Star Schema Model

WEEK 7

Database using Microsoft Access

WEEK 8

Data Models

Data Models
• Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey.

• It is a conceptual representation of the data structures that are required by a database. The data structures include the data objects, the association between the data objects and the rules which govern operations on the objects.

What is a Database?
A database is an organized collection of related data. It can manage very large amounts of data and support efficient, concurrent access to that data. Examples: a bank and its ATMs, a filing cabinet, an address book, a telephone directory, a timetable, etc.

Database Management System (DBMS)
A Database Management System (DBMS) is a collection of software programs which provide management of databases, control access to data and contain a query language to retrieve information easily.

Examples include:
1. Microsoft Access
2. FileMaker
3. Lotus Notes
4. Oracle Database
5. Microsoft SQL Server

RDBMS

A relational database management system (RDBMS) is a type of DBMS that stores data in the form of related tables.

Data vs. Information

• Data
Data is a collection of raw facts made up of text, numbers and dates:

Murray 35000 7/18/86

• Information
This is the result of data that has been processed in a meaningful way:

Mr. Murray is a sales person whose annual salary is $35,000 and whose hire date is July 18, 1986.
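The step from data to information can be illustrated with a short Python sketch. The function name to_information is made up, and the title "Mr." and the job "sales person" are context supplied by the program, since they are not part of the raw data.

```python
from datetime import datetime

raw = ("Murray", 35000, "7/18/86")  # the raw facts: text, a number and a date

def to_information(record):
    """Process one raw record into a meaningful sentence."""
    name, salary, hired = record
    hire_date = datetime.strptime(hired, "%m/%d/%y")  # month/day/two-digit year
    return (
        f"Mr. {name} is a sales person whose annual salary is ${salary:,} "
        f"and whose hire date is {hire_date.strftime('%B')} "
        f"{hire_date.day}, {hire_date.year}."
    )

print(to_information(raw))
```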

Basic Database Concepts

• Table
– A table is a set of related records.

• Record
– A record is a collection of data about an individual item.
Example: Name: Barry Harris, College: Medicine, Tel: 392-5555

• Field
– A field is a single item of data common to all records.
Example: Name: Barry Harris

• Queries
A database "query" is basically a "question" that you ask the database in order to get information back from it. It is the way of retrieving information from the database.
• Reports
Database reports are the formatted results of database queries and contain useful data for decision-making and analysis.

Primary Keys & Foreign Keys

Name      User       Phone      College
Graff     rgraff     392-3900   Pharmacy
Harris    bharris    392-5555   Medicine
Ipswich   zipswich   846-5656   PHHP

To ensure that each record is unique in each table, we can set one field to be a Primary Key field.

A Primary Key is a field that will contain no duplicates and no blank values.

Foreign Keys link to data in other tables
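The primary key rule and a simple query can be tried with Python's standard sqlite3 module. In this sketch the table name staff and the choice of the User column (called username here) as the primary key are assumptions; the three rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staff (
        name     TEXT,
        username TEXT NOT NULL PRIMARY KEY,  -- no duplicates, no blanks
        phone    TEXT,
        college  TEXT
    )
""")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [("Graff", "rgraff", "392-3900", "Pharmacy"),
     ("Harris", "bharris", "392-5555", "Medicine"),
     ("Ipswich", "zipswich", "846-5656", "PHHP")],
)

def add_staff(row):
    """Return True if the record is accepted, False if the key rule rejects it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

def names_in(college):
    """A query: which staff belong to the given college?"""
    rows = conn.execute(
        "SELECT name FROM staff WHERE college = ? ORDER BY name", (college,)
    )
    return [row[0] for row in rows]

print(add_staff(("Harris", "bharris", "392-0000", "Medicine")))  # duplicate key
print(add_staff(("Nobody", None, "392-1111", "PHHP")))           # blank key
print(names_in("Medicine"))
```

Both inserts are refused, so the table still holds exactly one record per username.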

Types of Databases
Relational databases
In relational databases, fields can be used in a number of ways (and can be of variable length), provided that they are linked in tables.

Non-relational databases
Non-relational databases place information in field categories that we create so that information is available for sorting and disseminating the way we need it. The data can only be "copied and pasted". Example: a spreadsheet

File Organization

File Organization
File organization is the physical arrangement of the records of a file on secondary storage devices. An efficient file organization should be chosen for each base relation. For example, if we want to retrieve student records in alphabetical order of name, sorting the file by student name is a good file organization. However, if we want to retrieve all students whose marks are in a certain range, a file ordered by student name would not be a good file organization. Some file organizations are efficient for bulk loading data into the database but inefficient for retrieval and other activities.

Common file organizations:
1. Sequential
2. Linked List
3. Indexed
4. Hashed

Physical Design

1. Volume and Usage analysis

2. Distribution Strategy

3. File Organizations

4. Indexes and Access Methods

5. Integrity Constraints

Physical Design Issues
1. Size
2. Speed of access
3. Speed of update
4. Growth issues: performance and degradation
5. Security
6. Maintenance

DBMS Organization

Structured
1. Relationships: physical address pointers
2. Links generated when data is entered
3. Efficient but not flexible
4. Ad hoc design
5. Query dependent on specific DBMS (may support SQL)

Relational
1. Relationships: logical data references
2. Links generated when data is retrieved
3. Flexible but not efficient
4. Theoretical base
5. SQL

DBMS Technology
1. CPU
• Components
• Operation
2. DASD (direct access storage device)
• Technology
• Organization
3. Data Transfer
4. Access methods

Physical Design
Data Distribution
1. Centralized
2. Partitioned
– Horizontal
– Vertical
3. Replicated
4. Hybrid

Methods of organizing files
Different methods of organizing files:
1. Heap
2. Sequential
3. Indexed-sequential
4. Inverted list
5. Direct access

Choosing a file organization is a design decision, so it must be made with good performance for the most likely usage of the file in mind. The criteria usually considered important are:
1. Fast access to single record or collection of related records.
2. Easy record adding/update/removal, without disrupting.
3. Storage efficiency.
4. Redundancy as a warranty against data corruption.

HEAP FILES (UNORDERED)
Heap files are unordered files. This is the simplest and most basic type of file organization: the records are stored in no particular order. The operations we can perform on the records are insert, retrieve and delete. The features of the heap file (or pile file) organisation are:

1. New records can be inserted in any empty space that can accommodate them.
2. When old records are deleted, the occupied space becomes empty and available for any new insertion.
3. If updated records grow, they may need to be relocated (moved) to a new empty space. This requires keeping a list of empty spaces.
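These features can be modelled with a small Python class. This is only a sketch that keeps the records in a list in memory; the class name HeapFile and the sample records are made up for illustration.

```python
class HeapFile:
    """Unordered file: records go into any free slot, searches scan every slot."""

    def __init__(self):
        self.slots = []  # each slot holds a (key, data) record or None if empty
        self.free = []   # list of empty slot numbers that can be reused

    def insert(self, record):
        if self.free:                  # reuse space left by a deleted record
            pos = self.free.pop()
            self.slots[pos] = record
        else:                          # otherwise add at the end of the file
            pos = len(self.slots)
            self.slots.append(record)
        return pos

    def find(self, key):
        for record in self.slots:      # linear search: no order to exploit
            if record is not None and record[0] == key:
                return record
        return None

    def delete(self, key):
        for pos, record in enumerate(self.slots):
            if record is not None and record[0] == key:
                self.slots[pos] = None
                self.free.append(pos)  # remember the empty space
                return True
        return False


heap = HeapFile()
heap.insert((30, "Ada"))
heap.insert((10, "Bola"))
heap.insert((20, "Chidi"))
heap.delete(10)
reused = heap.insert((40, "Dayo"))  # goes into the slot freed by key 10
print(reused, heap.slots)
```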

Advantages and disadvantages of HEAP FILES
Advantages
1. This is a simple file organisation method.
2. Insertion is fairly efficient.
3. Good for bulk-loading data into a table.
4. Best if file scans are common or insertions are frequent.

Disadvantages
1. Retrieval requires a linear search and is inefficient.
2. Deletion can result in unused space and a need for reorganisation.

Heap file organization
In the figure below, we can see a sample heap file organization for an EMPLOYEE relation consisting of 8 records stored in 3 contiguous blocks, each of which can contain at most 3 records.

Sequential file organization
1. Stored in key sequence.
2. Adding/deleting requires making a new file.
3. Used as master file.
4. Records in these files can only be read or written sequentially.

Sequential file organization
• Records are also in sequence within each block. To access a record, previous records within the block are scanned. Thus sequential record design is best suited for "get next" activities, reading one record after another without a search delay.

• Records can be added only at the end of the file.
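A short Python sketch can show both ideas: new records are merged with the old master file to make a new file, and a search has to read the records one after another. The function names and the sample records are made up for illustration.

```python
def update_master(master, additions):
    """Merge two key-sorted lists of (key, data) records into a NEW master file."""
    new_master = []
    i = j = 0
    while i < len(master) and j < len(additions):
        if master[i][0] <= additions[j][0]:
            new_master.append(master[i])
            i += 1
        else:
            new_master.append(additions[j])
            j += 1
    new_master.extend(master[i:])      # copy whatever is left of either file
    new_master.extend(additions[j:])
    return new_master

def sequential_search(records, key):
    """Read records in order; return (record or None, number of records read)."""
    reads = 0
    for record in records:
        reads += 1
        if record[0] == key:
            return record, reads
        if record[0] > key:            # passed the place where the key would be
            break
    return None, reads


master = [(101, "A"), (103, "C"), (107, "G")]
additions = sorted([(105, "E"), (102, "B")])  # transactions are sorted first
new_master = update_master(master, additions)
print(new_master)
print(sequential_search(new_master, 105))
```

Finding key 105 takes four reads because the three records before it must be read first.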

Advantages and disadvantages of Sequential file
ADVANTAGES
1. Simple file design
2. Very efficient when most of the records must be processed, e.g. payroll
3. Very efficient if the data has a natural order
4. Can be stored on inexpensive devices like magnetic tape.

DISADVANTAGES
1. Entire file must be processed even if a single record is to be searched.
2. Transactions have to be sorted before processing
3. Overall processing is slow.

Indexed-sequential organization
1. Each record of a file has a key field which uniquely identifies that record.
2. An index consists of keys and addresses.
3. An indexed sequential file is a sequential file (i.e. sorted into order of a key field) which has an index.
4. A full index to a file is one in which there is an entry for every record.
5. When a record is inserted or deleted in a file, the data can be added at any location in the data file. Each index must also be updated to reflect the change. For a simple sequential index this may mean rewriting the index for each insertion.
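The idea of an index of keys and addresses can be sketched in Python. Note that this sketch uses a partial index with one entry per block (the lowest key in the block) rather than the full index described in point 4; the block size and the records are made up for illustration.

```python
from bisect import bisect_right

BLOCK_SIZE = 3
# Records sorted into key order, as in a sequential file
records = [(key, f"rec{key}") for key in (5, 12, 18, 23, 31, 40, 47, 52)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# The index: lowest key of each block; the position in this list is the block address
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    """Random access: use the index to pick one block, then scan only that block."""
    block_no = bisect_right(index_keys, key) - 1
    if block_no < 0:
        return None                    # key is smaller than every key in the file
    for record in blocks[block_no]:
        if record[0] == key:
            return record
    return None

def read_all():
    """Sequential access: read every block in order."""
    return [record for block in blocks for record in block]

print(index_keys)
print(lookup(31))
print(lookup(20))
```

Looking up key 31 reads the index and then one block only, instead of scanning the whole file.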

Indexed-sequential organization
Indexed sequential files are important for applications where data needs to be accessed both sequentially and randomly using the index.

An indexed sequential file can only be stored on a random access device, e.g. magnetic disc, CD.

ADVANTAGES AND DISADVANTAGES

Advantages

1. Provides flexibility for users who need both types of access with the same file.
2. Faster than sequential.

Disadvantages

Extra storage space for the index is required

Inverted list organization
Like the indexed-sequential storage method, the inverted list organization maintains an index. The two methods differ, however, in the index level and record storage. The indexed-sequential method has a multiple index for a given key, whereas the inverted list method has a single index for each key type. The records are not necessarily stored in a sequence. They are placed in the data storage area, but indexes are updated for the record keys and location.
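A rough Python sketch of an inverted list is shown here: one index is built for each key type (field), and each index maps a value to the positions of the records that hold it. The records are illustrative, and the fourth one is made up so that one value occurs twice.

```python
from collections import defaultdict

# Records are stored in no particular order
records = [
    {"name": "Graff", "college": "Pharmacy"},
    {"name": "Harris", "college": "Medicine"},
    {"name": "Ipswich", "college": "PHHP"},
    {"name": "Jones", "college": "Medicine"},  # hypothetical extra record
]

def build_index(field):
    """Build one index for one key type: value -> list of record positions."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index

by_college = build_index("college")
by_name = build_index("name")

def find(index, value):
    """Fetch records through the index instead of scanning the whole file."""
    return [records[position] for position in index.get(value, [])]

print(dict(by_college))
print([record["name"] for record in find(by_college, "Medicine")])
```

Every new record would have to be added to both indexes, which is why updating is slower with this organization.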

ADVANTAGES AND DISADVANTAGES

Advantages
1. The benefits are apparent immediately because searching is fast.

Disadvantages
1. Inverted list files use more media space, and the storage devices get full quickly with this type of organization.
2. Updating is much slower.

Advantages and disadvantages of direct access
Advantages
1. Any record can be directly accessed.
2. Speed of record processing is very fast.
3. Up-to-date file because of online updating.
4. Concurrent processing is possible.
5. Transactions need not be sorted.

Disadvantages
1. More complex than sequential.
2. Does not fully use memory locations.
3. More security and backup problems.
4. Expensive hardware and software are required.
5. System design is complex and costly.
6. File updating is more difficult as compared to sequential files.
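One common way to get direct access is to calculate a record's address from its key (hashing). The notes do not say which method is used, so the sketch below is an assumption: it takes the remainder of the key divided by the number of slots, and moves to the next slot when two keys clash. The keys and data are made up.

```python
NUM_SLOTS = 7
slots = [None] * NUM_SLOTS  # the file: a fixed number of record positions

def address(key):
    """Calculate the home address of a record directly from its key."""
    return key % NUM_SLOTS

def store(key, data):
    start = address(key)
    for step in range(NUM_SLOTS):
        probe = (start + step) % NUM_SLOTS   # on a clash, try the next slot
        if slots[probe] is None or slots[probe][0] == key:
            slots[probe] = (key, data)
            return probe
    raise RuntimeError("file is full")

def fetch(key):
    start = address(key)
    for step in range(NUM_SLOTS):
        probe = (start + step) % NUM_SLOTS
        if slots[probe] is None:             # an empty slot ends the search
            return None
        if slots[probe][0] == key:
            return slots[probe]
    return None


print(store(15, "A"))  # 15 % 7 = 1
print(store(22, "B"))  # 22 % 7 = 1 as well, so it moves on to slot 2
print(store(10, "C"))  # 10 % 7 = 3
print(fetch(22))
print(slots)           # several slots stay empty
```

The empty slots left in the file illustrate the disadvantage that direct access does not fully use the available locations.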

Comparison

Quiz
1. Different types of files are:
a) Master, Transaction, Backup
b) Archive, Table, Report
c) Dump, Library

2. Major criteria for selecting a file organization are:
1. Method of processing of file
2. Size of data
3. File inquiry capability
4. File volatility
5. Response time
6. Activity ratio