Unit 1 Data Structure

Escuela Superior de Informática de Ciudad Real

Universidad de Castilla-La Mancha

Data Structures

Unit 1

General Concepts

© Escuela Superior de Informática de Ciudad Real (UCLM)

Objectives and competencies


Competencies

BA3 Ability to understand and master the basic concepts of discrete mathematics, logic, algorithmics and computational complexity, and to apply them to solve engineering problems.

CO6 Knowledge and application of the basic algorithmic procedures of computing technologies to design solutions to problems, analysing the suitability and complexity of the proposed algorithms.

CO7 Knowledge, design and efficient use of the data types and data structures best suited to solving a problem.

INS1 Capacity for analysis, synthesis and evaluation.

INS4 Ability to solve problems by applying engineering techniques.

SIS1 Critical reasoning.

SIS3 Autonomous learning.

UCLM2 Ability to use Information and Communication Technologies.

Learning outcomes

Know how to handle data types, data structures and abstract data types correctly and appropriately for the problem at hand, including the formal specification, implementation and use of linear and nonlinear abstract data types.

Design solutions to problems, analysing the suitability and complexity of the proposed algorithms.


References


Basic references

B. Meyer, “Construcción de Software Orientado a Objetos”, Chapter 6: “Tipos Abstractos de Datos” (Abstract Data Types), 2nd edition, Prentice-Hall (1999).

M. T. Goodrich & R. Tamassia, “Data Structures and Algorithms in Java”, Chapter 4, 6th edition, International Student Version, John Wiley (2011).

PowerPoint PDF Handouts:

http://bcs.wiley.com/he-bcs/Books?action=chapter&bcsId=8950&itemId=1118808576&chapterId=101146

Specific references

J. V. Guttag, E. Horowitz and D. R. Musser, “Abstract Data Types and the Development of Data Structures”, Communications of the ACM, 20, 396-404 (1977).

J. V. Guttag, E. Horowitz and D. R. Musser, “Abstract Data Types and Software Validation”, Communications of the ACM, 21, 1048-1064 (1978).



Outline


1. Introduction

2. Data structures and Abstract Data Types (ADTs)

3. ADTs specification

3.1. UML based specification

3.2. Algebraic specification

3.2.1. Syntactic specification

3.2.2. Semantic specification

4. Analysis of algorithms

4.1. RAM model

4.2. Primitive operations

4.3. Big-Oh notation

4.4. Asymptotic analysis

5. Useful tips (reminder)


1. Introduction (I)

This is a course about what computer science calls data structures. What are they?

A data structure is a data container (it stores and supplies data).

A data structure is related to the concepts of data type and abstract data type.

Remember, a data type represents a set of values (recall the primitive data types).

Structured data is a set of variables of possibly different data types.

An abstract data type (ADT) comprises structured data plus the set of operations available on it.

A data structure corresponds to an ADT.


1. Introduction (II)

Data structures are fundamental components of algorithms (which are covered specifically in “Programming Methodology/Metodología de la Programación” in the second semester).

In this course, we will consider the most common data structures:

– Linear: lists, stacks and queues

– Nonlinear: trees and graphs


1. Introduction (III)

Very important: THIS COURSE IS NOT FOCUSED ON PROGRAMMING.

Programming will mainly be a tool for implementing examples that use data structures.

The goal is not, specifically, to implement data structures.

In fact, most data structures are already available in standard packages.

Let’s consider both data structures and abstract data types in more detail.


2. Data structures and ADTs (I)

It is convenient to start with the concept of Abstract Data Type (ADT), introduced in the 1970s. So, what is an Abstract Data Type?

Aho, Hopcroft & Ullman definition: An ADT is a mathematical model, together with several operations defined on the model.

NIST (National Institute of Standards and Technology) definition: An ADT is a set of data values and associated operations that are precisely specified independent of any particular implementation.

In very practical terms, an ADT is a set of data (variables) and functions (methods) acting on that data.


2. Data structures and ADTs (II)

An ADT is an abstract model. How, then, do we use an ADT in a computer?

Data structures can be described as ADTs.

A data structure is a set of variables that respond to the set of methods representing an ADT.

Remember: an ADT corresponds to the concept of class in object orientation.


2. Data structures and ADTs (III)

From the object orientation standpoint, the data structure corresponds to a class that implements the ADT methods.

In lay terms, a data structure is (or corresponds to) the practical realization of an ADT.

Thus, to describe a data structure we must specify the corresponding ADT.


2. Data structures and ADTs (IV)

Extremely important: The ADT (or the data structure) is defined by its specification, not by its implementation.

The key point is that the operations define the ADT. The ADT is NOT defined by the data it contains.

Therefore, the way to work with an ADT is through its operations (just by calling the methods of a class).

You NEVER work with an ADT (data structure) by accessing its data directly, because those data depend on the implementation used.

The operations, however, do not depend on the implementation.
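As a minimal sketch of this idea, the Java fragment below defines one small ADT as an interface and gives it two different internal representations. The names `AdtDemo`, `IntStack`, `ArrayIntStack`, `LinkedIntStack` and `sumAndDrain` are hypothetical, chosen only for illustration, and popping an empty stack is deliberately left unhandled to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class AdtDemo {
    /** The ADT: a set of operations, with no data exposed. */
    interface IntStack {
        void push(int e);
        int pop();
        boolean isEmpty();
    }

    /** Implementation 1: elements kept in a growable array. */
    static class ArrayIntStack implements IntStack {
        private final List<Integer> items = new ArrayList<>();
        public void push(int e) { items.add(e); }
        public int pop() { return items.remove(items.size() - 1); }
        public boolean isEmpty() { return items.isEmpty(); }
    }

    /** Implementation 2: elements kept in singly linked nodes. */
    static class LinkedIntStack implements IntStack {
        private static class Node {
            final int value;
            final Node next;
            Node(int value, Node next) { this.value = value; this.next = next; }
        }
        private Node head;
        public void push(int e) { head = new Node(e, head); }
        public int pop() { int v = head.value; head = head.next; return v; }
        public boolean isEmpty() { return head == null; }
    }

    /** Client code: uses only the operations, never the stored data. */
    static int sumAndDrain(IntStack s) {
        int total = 0;
        while (!s.isEmpty()) {
            total += s.pop();
        }
        return total;
    }

    public static void main(String[] args) {
        IntStack a = new ArrayIntStack();
        IntStack b = new LinkedIntStack();
        for (int i = 1; i <= 3; i++) { a.push(i); b.push(i); }
        // Same result for both, although the internal data differ.
        System.out.println(sumAndDrain(a));
        System.out.println(sumAndDrain(b));
    }
}
```

Because `sumAndDrain` depends only on the interface, either implementation can be swapped in without changing the client.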


3. ADTs specification

How can we describe (specify) the behavior of an ADT?

The basic idea is to describe the operations of the ADT (remember, the ADT is defined by its operations, not by its data).

There is no single specification approach.

We can use different approaches: the syntax of a programming language, UML notation, or the traditional algebraic specification.

Let’s present a simple specification based on UML.


3.1. UML based specification

To describe the behavior of the operations in the ADT we can use the method specification notation of UML:

method1(parameter1: type, parameter2: type, …): return type

Thus, we can specify the data that the method acts on (the formal parameters) and the result produced (the return type).

We should specify every method (operation) associated with the ADT.

However, the most complete (sometimes “too” complete) specification method is the traditional algebraic specification.


3.2. Algebraic specification (I)

Algebraic specification was initially developed in the late 1970s by groups in Europe and the US.

The idea was to use algebraic specification as a formal specification technique for abstract data types.

An algebraic specification of a data type consists of a syntactic and a semantic specification.

The syntactic specification defines the names, domains and ranges of the ADT’s operations (how to use the operation: form).

The semantic specification contains a set of axioms in the form of equations that relate the operations of the ADT to each other (what the operation does: meaning).


3.2. Algebraic specification (II)

The syntactic specification corresponds to the previously presented UML specification.

The semantic specification is needed for formal verification techniques (to prove that the ADT does what it is supposed to do).

Since we are not performing formal verifications, in the rest of the course we will restrict ourselves to the syntactic specification (algebraic or other) of the ADTs considered.


3.2.1. Syntactic specification (I)

We will use algebraic notation (some authors use some kind of programming notation).

In the syntactic specification (or simply syntax) we identify the type and signature of the operations.

We must specify:

– The ADT (as a general entity, for instance a stack, not “a stack of integers” or “a stack of doubles”)

– Auxiliary types used in the specification

– The operations. We define each operation using the concept (and syntax) of functions. We specify the name of the operation, the set to which each argument belongs and the set to which the result belongs.

Example: consider the addition (add) of natural numbers:

add: N x N → N

– add is the name of the operation, N represents the natural numbers, x the Cartesian product, and the arrow (→) points to the set of the result.


3.2.1. Syntactic specification (II)

A syntactic specification for the stack ADT looks like this:

Type: stack

Sorts: el (element), boolean, natural

Operations (syntax):

  init    :            → stack

  push    : stack x el → stack

  pop     : stack      ↛ stack

  top     : stack      ↛ el

  isEmpty : stack      → boolean

  size    : stack      → natural

Partial functions (marked ↛, here pop and top): in some cases the operation is not defined (exception).
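One possible way to carry this syntax over to Java is sketched below. It assumes, as an illustration only, that init maps to a constructor, that the stack argument of each operation becomes the receiver object, and that the partial functions pop and top signal the undefined case with `java.util.NoSuchElementException`. The names `StackSyntaxDemo`, `Stack` and `DequeStack` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

class StackSyntaxDemo {
    /** Generic in E: "a stack", not "a stack of integers". */
    interface Stack<E> {
        void push(E e);      // push    : stack x el -> stack
        void pop();          // pop     : stack (partial) -> stack
        E top();             // top     : stack (partial) -> el
        boolean isEmpty();   // isEmpty : stack -> boolean
        int size();          // size    : stack -> natural
    }

    /** One implementation; "new DequeStack<>()" plays the role of init. */
    static class DequeStack<E> implements Stack<E> {
        private final Deque<E> items = new ArrayDeque<>();
        public void push(E e) { items.push(e); }
        // ArrayDeque.pop and getFirst throw NoSuchElementException when empty.
        public void pop() { items.pop(); }
        public E top() { return items.getFirst(); }
        public boolean isEmpty() { return items.isEmpty(); }
        public int size() { return items.size(); }
    }

    public static void main(String[] args) {
        Stack<String> s = new DequeStack<>();
        s.push("a");
        s.push("b");
        System.out.println(s.top() + " " + s.size());
        s.pop();
        s.pop();
        try {
            s.top(); // undefined on the empty stack
        } catch (NoSuchElementException ex) {
            System.out.println("top is not defined on an empty stack");
        }
    }
}
```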


3.2.2. Semantic specification (I)

With the syntax alone we could have data structures with the same specification but different behavior (a stack and a queue, for instance). The semantics resolves the ambiguity.

Here, in the semantics, we specify the behavior of each operation using axioms.

Remember, in traditional logic, an axiom or postulate is a proposition that is not proved or demonstrated but considered to be self-evident. An axiom is a logical statement that is assumed to be true.

For each operation, we define one or more axioms (in the stack example that follows, one for init and one for push).


3.2.2. Semantic specification (II)

A semantic specification looks like this:

∀ p ∈ stack, e ∈ element

Axioms:

  isEmpty(init) = true

  isEmpty(push(p, e)) = false

  top(init) = error

  top(push(p, e)) = e

  size(init) = 0

  size(push(p, e)) = size(p) + 1

  pop(init) = error

  pop(push(p, e)) = p

From now on we will focus just on the syntactic specification.
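Although the course does not pursue formal verification, the stack axioms can be read as executable checks. The sketch below is only an illustration under stated assumptions: `PStack` is a hypothetical immutable stack in which every operation returns a value (as in the algebra), "error" is modelled as an `IllegalStateException`, and equality of stacks in pop(push(p, e)) = p is checked by object identity, which works here because pop returns the very object that push wrapped.

```java
class StackAxiomsDemo {
    /** Immutable stack: operations return new values instead of mutating. */
    static final class PStack<E> {
        private final E head;          // top element (unused when empty)
        private final PStack<E> tail;  // rest of the stack (null when empty)
        private final int size;

        private PStack(E head, PStack<E> tail, int size) {
            this.head = head;
            this.tail = tail;
            this.size = size;
        }

        static <E> PStack<E> init() { return new PStack<>(null, null, 0); }
        PStack<E> push(E e) { return new PStack<>(e, this, size + 1); }
        PStack<E> pop() {
            if (isEmpty()) throw new IllegalStateException("pop(init) = error");
            return tail;
        }
        E top() {
            if (isEmpty()) throw new IllegalStateException("top(init) = error");
            return head;
        }
        boolean isEmpty() { return size == 0; }
        int size() { return size; }
    }

    /** True if running op raises the exception that models "error". */
    static boolean raisesError(Runnable op) {
        try {
            op.run();
            return false;
        } catch (IllegalStateException ex) {
            return true;
        }
    }

    /** Checks the eight axioms for one concrete stack p and element e. */
    static boolean axiomsHold(PStack<String> p, String e) {
        PStack<String> empty = PStack.init();
        return empty.isEmpty()                        // isEmpty(init) = true
            && !p.push(e).isEmpty()                   // isEmpty(push(p, e)) = false
            && raisesError(empty::top)                // top(init) = error
            && p.push(e).top().equals(e)              // top(push(p, e)) = e
            && empty.size() == 0                      // size(init) = 0
            && p.push(e).size() == p.size() + 1       // size(push(p, e)) = size(p) + 1
            && raisesError(empty::pop)                // pop(init) = error
            && p.push(e).pop() == p;                  // pop(push(p, e)) = p
    }

    public static void main(String[] args) {
        PStack<String> p = PStack.<String>init().push("a").push("b");
        System.out.println(axiomsHold(p, "c"));
    }
}
```

Checking a few concrete values is testing, not proof; a formal verification would establish the axioms for every p and e.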


4. Analysis of algorithms (I)

An algorithm can be analyzed in terms of time efficiency or space utilization. We will consider only the former right now.

The running time of an algorithm typically grows with the input size.

Types of analysis

– Best case: Lower bound on cost. Determined by the “easiest” input.

– Worst case: Upper bound on cost. Determined by the “most difficult” input.

– Average case: Expected cost for a random input. Requires a model defining what a “random input” is.

Average case time is often difficult to determine. We focus on the worst case running time.

[Figure: running time versus input size (1000 to 4000), showing the best case, average case and worst case]


4. Analysis of algorithms (II)

Example: Comparing the growth of the running time, as the input grows, to the growth of known functions.

Time efficiency: log n < n < n log n < n² < n³ < 2ⁿ

Input size (n)   n       log n   n log n   n²     n³     2ⁿ

5                5       3       15        25     125    32

10               10      4       33        100    10³    10³

100              100     7       664       10⁴    10⁶    10³⁰

1000             1000    10      10⁴       10⁶    10⁹    10³⁰⁰

10000            10000   13      10⁵       10⁸    10¹²   10³⁰⁰⁰
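The values in the table can be approximated with a few lines of Java. This is a sketch under assumptions: logarithms are taken in base 2, the class and method names (`GrowthTable`, `log2`, `pow2Exponent`) are hypothetical, and 2ⁿ is reported only through its decimal exponent, n · log₁₀ 2, because 2¹⁰⁰⁰⁰ does not fit in a `double`. The table above rounds its entries, so the printed figures will not match digit for digit.

```java
class GrowthTable {
    /** Base-2 logarithm (assumed base for the "log n" column). */
    static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    /** Decimal exponent of 2^n, i.e. log10(2^n) = n * log10(2). */
    static double pow2Exponent(int n) {
        return n * Math.log10(2);
    }

    public static void main(String[] args) {
        int[] sizes = {5, 10, 100, 1000, 10000};
        System.out.printf("%8s %8s %12s %12s %16s %10s%n",
                "n", "log n", "n log n", "n^2", "n^3", "2^n");
        for (int n : sizes) {
            System.out.printf("%8d %8.1f %12.0f %12.0f %16.0f   10^%.0f%n",
                    n, log2(n), n * log2(n),
                    Math.pow(n, 2), Math.pow(n, 3), pow2Exponent(n));
        }
    }
}
```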


A picture is worth a thousand words


4. Analysis of algorithms (III)

To analyze an algorithm we have two options:

– To perform an experimental study.

– To perform a theoretical analysis.

Problems with the experimental approach:

– It is necessary to implement the algorithm, which may be difficult.

– Results may not be indicative of the running time on other inputs not included in the experiment.

– In order to compare two algorithms, the same hardware and software environments must be used.
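An experimental study of this kind might look like the following sketch, which times a simple maximum search for a few input sizes using `System.nanoTime`. The class and method names (`TimingSketch`, `arrayMax`, `timeNanos`) are hypothetical, and the measured times illustrate the problems listed above: they depend on the machine, the Java runtime and the particular inputs chosen.

```java
class TimingSketch {
    /** Returns the largest element of a non-empty array. */
    static int arrayMax(int[] a) {
        int currentMax = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > currentMax) {
                currentMax = a[i];
            }
        }
        return currentMax;
    }

    /** Times one run of arrayMax on an ascending array of n elements. */
    static long timeNanos(int n) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i; // ascending order: the maximum is updated every time
        }
        long start = System.nanoTime();
        int max = arrayMax(data);
        long elapsed = System.nanoTime() - start;
        if (max != n - 1) {
            throw new IllegalStateException("unexpected maximum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            System.out.println("n = " + n + ": " + timeNanos(n) + " ns");
        }
    }
}
```

A single run per size is noisy; a more careful experiment would repeat each measurement several times and discard the first runs.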


4. Analysis of algorithms (IV)

Advantages of the theoretical approach:

– Only needs a high-level description of the algorithm instead of an implementation.

– Characterizes running time as a function of the input size, n.

– Takes into account all possible inputs.

– Allows us to evaluate the speed of an algorithm independently of the hardware/software environment.

We present here the theoretical approach.

In it, we describe the algorithm using pseudo-code.

First of all, we need a model for counting the operations the algorithm performs.


4.1. RAM model (I)

The Random Access Machine (RAM) model is a simple model to quantify algorithms.

The model considers:

– A single (sequential) CPU

– A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character.

– Memory cells are numbered (0, 1, 2, …) and accessing any cell in memory takes unit time.

The goal is counting operations, so we must now consider which basic, primitive operations we want to count.


4.1. RAM model (II)

What are the basic operations?

– Basic computations performed by an algorithm

– Identifiable in pseudo-code

– Largely independent from the programming language

– Exact definition not important (we will see why later)

– Assumed to take a constant amount of time in the RAM model. The time is the same for every operation.


4.1. RAM model (III)

Examples of basic operations:

– Evaluating an expression

– Assigning a value to a variable

– Indexing into an array

– Calling a method

– Returning from a method

By inspecting the pseudo-code, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.

Let’s see an example:


4.1. RAM model (IV)

Example: find the maximum element of an array (the worst case is when the maximum value is the last element of the array and every comparison updates currentMax, e.g. an array sorted in ascending order)

Algorithm arrayMax(A, n)                # operations

  currentMax ← A[0]                     2

  for i ← 1 to n − 1 do                 2n + 1

    if A[i] > currentMax then           2(n − 1)

      currentMax ← A[i]                 2(n − 1)

    { increment counter i }             2(n − 1)

  return currentMax                     1

                              Total     8n − 2

Best case? We never execute currentMax ← A[i]

                              Total     6n
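The totals can be reproduced by instrumenting the algorithm with a counter. The sketch below is one reading of the accounting, not necessarily the only one: it assumes 1 operation to initialise i and 2 operations per loop test (n tests in all), which gives the 2n + 1 of the loop line. The names `ArrayMaxOps` and `countOps` are hypothetical.

```java
class ArrayMaxOps {
    /**
     * Runs arrayMax on a non-empty array and returns the number of
     * primitive operations counted under the assumptions stated above.
     */
    static long countOps(int[] a) {
        int n = a.length;
        long ops = 0;

        int currentMax = a[0];
        ops += 2;                      // index into array + assignment

        int i = 1;
        ops += 1;                      // loop counter initialisation (assumed 1)
        while (true) {
            ops += 2;                  // loop test (assumed 2 per test)
            if (!(i <= n - 1)) {
                break;
            }
            ops += 2;                  // index into array + comparison
            if (a[i] > currentMax) {
                currentMax = a[i];
                ops += 2;              // index into array + assignment
            }
            i++;
            ops += 2;                  // addition + assignment
        }

        ops += 1;                      // return currentMax
        return ops;
    }

    public static void main(String[] args) {
        int[] ascending = {1, 2, 3, 4, 5};   // worst case: every test updates the maximum
        int[] descending = {5, 4, 3, 2, 1};  // best case: the maximum is never updated
        System.out.println(countOps(ascending));   // 8n - 2 with n = 5
        System.out.println(countOps(descending));  // 6n with n = 5
    }
}
```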


4.3. Big-Oh notation (I)

[Figure: log-log plot of n, 2n + 10 and 3n against n, with both axes ranging from 1 to 10,000]

Big-Oh notation characterizes functions according to their growth rates (how fast the function grows).

Let’s go back to the worst case scenario.

Given functions f(n) and g(n), we say that f(n) is O(g(n)) (“f(n) is order of g(n)”) if there are positive constants c and n₀ such that

f(n) ≤ c · g(n) for n ≥ n₀

We say f(n) “is big-Oh of” g(n)

Example: 2n + 10 is O(n) since:

– 2n + 10 ≤ c · n, then

– (c − 2) n ≥ 10, then

– n ≥ 10/(c − 2), then

– Pick c = 3, then n₀ = 10 and that’s it

From n = 10 onwards, 2n + 10 never exceeds 3n, so the two functions have the same (linear) growth rate.
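The choice of constants in the example can be sanity-checked numerically. The sketch below, with the hypothetical names `BigOhCheck` and `boundHolds`, tests the inequality 2n + 10 ≤ c · n for every n from n₀ up to a chosen limit. A finite check like this is not a proof (the algebra above is), but it quickly exposes a wrong choice of c or n₀.

```java
class BigOhCheck {
    /** True if 2n + 10 <= c * n for every n in [n0, upTo]. */
    static boolean boundHolds(int c, int n0, int upTo) {
        for (int n = n0; n <= upTo; n++) {
            if (2L * n + 10 > (long) c * n) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(boundHolds(3, 10, 1_000_000)); // c = 3, n0 = 10
        System.out.println(boundHolds(3, 9, 1_000_000));  // fails at n = 9: 28 > 27
        System.out.println(boundHolds(2, 1, 1_000_000));  // c = 2 can never work
    }
}
```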


4.3. Big-Oh notation (II)

The big-Oh notation gives an upper bound (remember, the worst case) on the growth rate of a function.

The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n). In other words, f(n) does not grow faster than g(n).

We can use the big-Oh notation to rank functions according to their growth rate.

                     f(n) is O(g(n))    g(n) is O(f(n))

g(n) grows more      Yes                No

f(n) grows more      No                 Yes

Same growth          Yes                Yes


4.4. Asymptotic analysis

Now, to compare algorithms, we apply the big-Oh notation to the running time (# operations). This is called asymptotic analysis. With this, the running time is expressed in big-Oh notation.

To perform the asymptotic analysis

– We find the worst-case number of primitive operations executed as a function of the input size, T(n).

– We are interested in what happens with T(n) when n → ∞ (the limit)

– We express the resulting function with big-Oh notation.

Example:

– We determine that algorithm arrayMax executes at most 8n − 2 primitive operations.

– T(n) ≈ 8n when n → ∞ (the lower-order term −2 becomes negligible)

– We say that algorithm arrayMax “runs in O(n) time”

Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations.
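The step from the operation count to O(n) for arrayMax can be written out against the definition of big-Oh. The constants c = 8 and n₀ = 1 below are one valid choice among many, and the block assumes the amsmath package for the align* environment.

```latex
\begin{align*}
T(n) &= 8n - 2 \\
     &\le 8n            && \text{for all } n \ge 1 \\
     &=   c \cdot g(n)  && \text{with } c = 8,\ g(n) = n,\ n_0 = 1
\end{align*}
% Hence T(n) is O(n): the constant factor 8 and the lower-order term -2
% do not affect the growth rate.
```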


5. Useful tips (Reminder)

We collect here the concepts needed in this course from “Programming Fundamentals I” and “Programming Fundamentals II”.

Programming Fundamentals I (Fundamentos de Programación I):

– Recursion

Programming Fundamentals II (Fundamentos de Programación II):

– Class definition

– Inheritance

– Polymorphism

– Interfaces

– Exceptions

– Generics

– Linked list/variables



Recommended activities


Recommended readings

http://www.informatik.uni-bremen.de/agbkb/forschung/formal_methods/completed_projects/compass/7years_e.htm

http://en.wikipedia.org/wiki/Exponential_growth

http://en.wikipedia.org/wiki/Wheat_and_chessboard_problem

Recommended activities

Invent some ADTs, and develop the corresponding specification.

Solve the unsolved problems from the list proposed.

Apply the asymptotic analysis to your own algorithms.

