Data Structures: Alan, Tam Siu Lung 96397999 Tam@SiuLung.com 99967891

Data Structures

Alan, Tam Siu Lung 96397999

Tam@SiuLung.com 99967891

Prerequisite

• Familiarity with Pascal/C/C++
• Asymptotic Complexity
• Techniques learnt
  – Recursion
  – Divide and Conquer
  – Exhaustion
  – Greedy
  – [Dynamic Programming exempted]
• Algorithms learnt
  – Bubble / Insertion / Selection / Shell / Merge / Quick / Bucket / Radix Sorting
  – Linear / Binary / Interpolation Searching

What Does Our Programming Language Provide?

• Built-in Data Types
  – Character/String (length limit?)
  – Integral (signed/unsigned 8 [?], 16, 32, 64 [?] bit)
  – Floating Point (32, 64, 80 [?] bit)
  – Fixed Point [?]
  – Complex [?]
  – Pointer/Reference
  – Function Pointer/Reference

What Does Our Programming Language Provide?

• Aggregate Data Types
  – Array [base-definable?]
    • Multiple values of the same type
    • Access by numeric index
  – Record/Struct/Class
    • Multiple values of different types
    • Function Aggregation + Inheritance + Polymorphism [?]
  – Unions [?]

What Does Our Programming Language Provide?

• Built-in Language Constructs
  – Branching (If, Else)
  – Loops (For, While, Until)
  – Function/Procedure Calling
    • In C++'s view, operators can be read as functions as well (conceptual signatures):
    • a = b → int &operator=(int &a, const int &b)
    • a > b → bool operator>(const int &a, const int &b)
    • *a → int &operator*(int *a)
    • a[b] → string &operator[](string a[], int b)
  – Recursion
  – Even more in more sophisticated languages!

For most of the remaining time

• We concentrate on
  – Pointer
  – Array
  – Record
  – and how they interact
• We will use a C++-like notation
  – array<int> means an array of integers
  – int* is shorthand for pointer<int>
  – Records are written as: struct<int, int, string>
  – Capitalized type names are type "variables": each can be replaced by any type

Formal Definition: Pointer

• Concept:
  – pointer<Type> p; (Type *p) [p : ^Type in Pascal]
• Operations:
  – *p → Type &operator*(Type *p) [p^ in Pascal]
    • Returns the pointed-to value
    • Error if p is null/nil
  – &y → Type *operator&(Type &y) [@y in Pascal]
    • Returns the address of a value
  – p = x → Type *&operator=(Type *&p, Type *x)
    • Pointer assignment

Formal Definition: Pointer

• More Operators
  – p < q → bool operator<(Type *p, Type *q)
    • Returns whether p points to an earlier element than q (meaningful within one array)
  – ++p → Type *&operator++(Type *&p) [inc(p) in Turbo Pascal]
    • Point to the next element (in an array)
  – --p → Type *&operator--(Type *&p) [dec(p) in Turbo Pascal]
    • Point to the previous element (in an array)
  – p + n → Type *operator+(Type *p, int n) [not in Turbo Pascal]
    • Point to the nth next element (in an array)

Programming Syntax: Pointer

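C++:

int main() {
    int a[10];
    int *b = &a[1];   // b points at the second element of a
    *b = 1;           // same as a[1] = 1
    b = new int(2);   // b now points at a newly allocated int holding 2
    delete b;         // release it
    b = 0;            // clear the dangling pointer
}

Pascal:

var
  a : array[1..10] of integer;
  b : ^integer;
begin
  b := @a[2];         { second element of a }
  b^ := 1;
  new(b);
  b^ := 2;
  dispose(b);
  b := nil;
end.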

Array

• Concept
  – array<Type, Size : int>
  – array<Type, Lower : int, Upper : int>
• Operations
  – Type &operator[](Type a[], int index)
    • Requires 0 <= index < Size (first form)
    • Requires Lower <= index <= Upper (second form)
• Analysis
  – a[x] is equivalent to *(a + x)
  – which is equivalent to *(Type *)((char *)a + x * sizeof(Type))
  – The multiply-and-add is repeated on every access, so indexing is sometimes slower than necessary!
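The equivalence between a[x] and *(a + x) can be checked directly. The following is a minimal sketch; the function names sumByIndex and sumByPointer are made up for illustration and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>

// Sum using indexing: each a[i] is conceptually *(a + i).
int sumByIndex(const int a[], std::size_t n) {
    assert(a != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

// Sum by stepping a pointer: no index-to-address computation per access.
int sumByPointer(const int a[], std::size_t n) {
    assert(a != nullptr || n == 0);
    int total = 0;
    for (const int *p = a, *end = a + n; p != end; ++p)
        total += *p;
    return total;
}
```

Both functions return the same sum. Optimizing compilers usually produce similar machine code for the two loops, so the pointer form matters mostly on simple compilers or in hand-tuned inner loops.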

Example: Prime Finding

• primes[] stores all primes found

primes[0] = 2;
for each i
  for each v in primes[]
    if (v * v > i) then begin
      primes.add(i);
      break;
    end;
    if (i mod v = 0) then break;

Solution

C++:

#include <iostream>
using namespace std;

int main() {
    int primes[100], *last = primes;    // last points one past the stored primes
    cout << (*last++ = 2) << endl;
    for (int i = 3; i < 100; ++i) {
        int *j = primes;
        do {
            if (*j * *j > i) {          // no prime factor up to sqrt(i): i is prime
                cout << (*last++ = i) << endl;
                break;
            }
            if (i % *j == 0) break;     // i has a prime factor: composite
        } while (++j < last);
    }
}

Pascal:

var
  primes : array[1..100] of integer;
  i : integer;
  last, j : ^integer;
begin
  last := @primes[1];
  last^ := 2; inc(last);
  writeln(2);
  for i := 3 to 100 do begin
    j := @primes[1];
    repeat
      if j^ * j^ > i then begin
        last^ := i; inc(last);
        writeln(i);
        break;
      end;
      if i mod j^ = 0 then break;
      inc(j);
    until j = last;
  end;
end.

Record

• Like arrays, a record groups several values
• Fields are identified by names instead of indices
• Each name is associated with a type
• A Pair is a special record with 2 elements, Key and Value
  – Keys are unique (i.e. keys identify records)
  – Keys are comparable (i.e. sortable) [sometimes]
  – Since Value can itself be a record, every record with a unique portion can be represented as a pair

Programming Syntax: Record

C++:

struct Point {
    double x, y;
};
struct Rect {
    Point tl, br;
    int color;
};
int main() {
    Rect rect;
    rect.color = 255;
    rect.tl.x = 0.0;
}

Pascal:

type
  Point = record
    x, y : real;
  end;
  Rect = record
    tl, br : Point;
    color : integer;
  end;
var
  rect : Rect;
begin
  rect.color := 255;
  rect.tl.x := 0.0;
  with rect do begin
    color := 255;
    tl.x := 0.0;
  end;
end.

Linked List

• Combining Pointer and Record
• linkedlist<string>:

type
  pNode = ^Node;
  Node = record
    value : string;
    next : pNode;
  end;
var
  head : pNode;

Linked List

• Operations
  – void Add(linkedlist<Type> &p, Type &v)
    • Add an element to the Linked List
  – Node *Search(linkedlist<Type> &p, Type &v)
    • Returns null/nil if not found
  – void InsertAfter(Node *node, Type &v)
    • Insert an element after another
  – void Remove(Node *node)
    • How to implement?
• C++: x->y == (*x).y

Linked List Implementation

C++:

struct Node { int value; Node *next; };
Node *list = 0;

void Add(int v) {
    Node *old = list;
    list = new Node();
    list->next = old;
    list->value = v;
}

Node *Search(int v) {
    for (Node *p = list; p; p = p->next)
        if (p->value == v) return p;
    return 0;
}

void InsertAfter(Node *n, int v) {
    Node *old = n->next;
    n->next = new Node();
    n->next->next = old;
    n->next->value = v;
}

Pascal (here Node.value is an integer):

var list : pNode;

procedure Add(v : integer);
var old : pNode;
begin
  old := list;
  new(list);
  list^.next := old;
  list^.value := v;
end;

function Search(v : integer) : pNode;
var n : pNode;
begin
  n := list;
  while (n <> nil) and (n^.value <> v) do
    n := n^.next;
  Search := n;
end;

{ InsertAfter is similar to Add }
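The slides leave Remove as a question. One possible answer is sketched below; it is not taken from the slides, and the names removeValue and removeAfter are made up for illustration. Removing by value needs a walk to find the link that leads to the node, while removing the node after a known node needs no search.

```cpp
#include <cassert>

struct Node {
    int value;
    Node *next;
};

// Insert v at the front of the list; O(1).
void add(Node *&head, int v) {
    head = new Node{v, head};
}

// Unlink and free the first node holding v; returns false if v is absent.
// 'link' always points at the pointer that leads to the current node
// (either 'head' itself or some node's 'next'), so removing the first
// node needs no special case. O(n) because of the walk.
bool removeValue(Node *&head, int v) {
    for (Node **link = &head; *link != nullptr; link = &(*link)->next) {
        if ((*link)->value == v) {
            Node *victim = *link;
            *link = victim->next;
            delete victim;
            return true;
        }
    }
    return false;
}

// Remove the node that follows n; O(1) because no search is needed.
void removeAfter(Node *n) {
    assert(n != nullptr && n->next != nullptr);
    Node *victim = n->next;
    n->next = victim->next;
    delete victim;
}
```

In a singly linked list the O(1) removal is only available when the preceding node is already known, which is why removeAfter takes the predecessor rather than the node to delete.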

Array Implementation

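Each state shows two list pointers, read here as (head, free), and the node table. K = index, N = next, V = value, / = nil or empty.

Initially: head = /, free = 1

K  N  V
1  2  /
2  3  /
3  4  /
4  5  /
5  /  /

After Add 2: head = 1, free = 2

K  N  V
1  /  2
2  3  /
3  4  /
4  5  /
5  /  /

After Add 3: head = 2, free = 3

K  N  V
1  /  2
2  1  3
3  4  /
4  5  /
5  /  /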

Array Implementation

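The two leading numbers of each state are read here as (head, free). K = index, N = next, V = value, / = nil or empty.

Starting state: head = 2, free = 3

K  N  V
1  /  2
2  1  3
3  4  /
4  5  /
5  /  /

After Remove 3 (from the starting state): head = 1, free = 2

K  N  V
1  /  2
2  3  /
3  4  /
4  5  /
5  /  /

After Remove 2 (from the starting state): head = 2, free = 1

K  N  V
1  3  /
2  /  3
3  4  /
4  5  /
5  /  /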

Abstraction

• Both implementations have the same complexity
  – O(1) Addition
  – O(n) Searching
  – O(1) Insertion
  – O(1) Removal (once the preceding node is known)
• Sometimes we don't care how it gets implemented
  – We only want a data structure which provides the operations we want.
• We define an Abstract Data Type (ADT) as a collection of Data Structures providing certain operations, e.g.
  – Plane
  – Polynomial
  – Graph
• We don't even care how fast the operations in an ADT are, though practically we do

Dictionary (Map, Associative Array)

• A Dictionary is an unordered container of key-value pairs
• map<Key, Value>
  – void Insert(map<Key, Value> &c, Key &key, Value &value)
  – int Size(map<Key, Value> &c)
  – Value &Search(map<Key, Value> &c, Key &key)
  – void Delete(map<Key, Value> &c, Key &key)
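As a rough illustration of these operations, the sketch below uses the standard std::map as a ready-made dictionary. The helper names countWords and lookup are made up for this example. Note that std::map keeps its keys sorted, which is more than the Dictionary ADT asks for.

```cpp
#include <cassert>
#include <map>
#include <string>

// Count how often each word occurs; the map plays the role of
// Dictionary<string, int>.
std::map<std::string, int> countWords(const std::string words[], int n) {
    assert(n >= 0);
    std::map<std::string, int> counts;
    for (int i = 0; i < n; ++i)
        ++counts[words[i]];   // operator[] inserts 0 on first sight, then we add 1
    return counts;
}

// Search without inserting: returns 0 if the key is absent.
int lookup(const std::map<std::string, int> &counts, const std::string &key) {
    std::map<std::string, int>::const_iterator it = counts.find(key);
    return it == counts.end() ? 0 : it->second;
}
```

Here size() corresponds to Size, find() to Search, and erase(key) to Delete.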

List ADT

• A List ADT is an ordered container of key-value pairs
• list<Key, Value>
  – void Insert(list<Key, Value> &c, int pos, Type &value)
  – Type &Find-ith(list<Key, Value> &c, int pos)
  – void Delete-ith(list<Key, Value> &c, int pos)
  – int Size(list<Key, Value> &c)
  – Type &Search(list<Key, Value> &c, Key &key)
  – void Delete(list<Key, Value> &c, Key &key)
  – …
• A List can be implemented by an array (Vector/Table), a linked list (LinkedList), etc.
• A List is also a Dictionary

Time Complexity

Average Case    Add     Remove  Search
Array           O(1)    O(n)    O(n)
Sorted Array    O(n)    O(n)    O(lg n)
Linked List     O(1)    O(n)    O(n)

• We seldom remove anyway
• None of these makes both Add and Search fast
• In general, doing better is difficult unless we depend on features of the Key
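The Sorted Array row can be illustrated with a short sketch built on the standard algorithms std::lower_bound and std::binary_search. The function names sortedInsert and sortedContains are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insert v keeping the vector sorted: O(lg n) to find the slot,
// but O(n) to shift the elements that come after it.
void sortedInsert(std::vector<int> &a, int v) {
    assert(std::is_sorted(a.begin(), a.end()));
    a.insert(std::lower_bound(a.begin(), a.end(), v), v);
}

// Binary search on the sorted vector: O(lg n).
bool sortedContains(const std::vector<int> &a, int v) {
    return std::binary_search(a.begin(), a.end(), v);
}
```

The shifting inside insert() is what makes Add linear even though finding the position is logarithmic.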

Direct Addressing Implementation

[Figure: a table with Ant at index 0, Boy at index 5 and Car at index 99]

• Use the Vector ADT
• The key is the location
• Efficient: O(1) for all operations
• Infeasible: if the key can range from 1 to 20000000000, if the key is not numeric ...

Hash Function

• Hash Function: h_m(k)
• Maps all keys "by calculation" into an integer domain, e.g. 0 to m − 1
• E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32)
  – Alan: 1598313570
  – Max: 3452409927
  – Man: 943766770
  – On: 2246271074

Hash Table Implementation

• Use a Table<int, Value> ADT of size m
• Use h_m(Key) as the key
• All operations can be done as with a Table
• Solved, except for
  – Collision: what to do if two different keys k have the same h(k)
  – How to find a suitable hash function
• If good hash functions are used, hash tables provide near O(1) insertion, searching and removal
  – But it is difficult to get it right
  – And it is not easy to code
  – C++: hash_map<Key, Value, hash_func> (a pre-standard extension; std::unordered_map since C++11)
• Read 2003 Advanced Notes on Hash Table if you are motivated enough
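One common way to handle collisions is separate chaining, where each table slot holds a short list of the pairs that hash there. The slides do not say which method they have in mind, so the sketch below is only an illustration: the class name ChainedHashTable and its toy polynomial hash are assumptions, not something from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedHashTable {
    typedef std::list<std::pair<std::string, int> > Bucket;
    std::vector<Bucket> buckets_;

    // Toy polynomial hash, reduced modulo the table size m.
    std::size_t slot(const std::string &key) const {
        std::size_t h = 0;
        for (std::size_t i = 0; i < key.size(); ++i)
            h = h * 31 + static_cast<unsigned char>(key[i]);
        return h % buckets_.size();
    }

public:
    explicit ChainedHashTable(std::size_t m) : buckets_(m) { assert(m > 0); }

    // Insert a new pair, or overwrite the value stored under an existing key.
    void insert(const std::string &key, int value) {
        Bucket &b = buckets_[slot(key)];
        for (Bucket::iterator it = b.begin(); it != b.end(); ++it)
            if (it->first == key) { it->second = value; return; }
        b.push_back(std::make_pair(key, value));
    }

    // Returns a pointer to the stored value, or nullptr if key is absent.
    const int *search(const std::string &key) const {
        const Bucket &b = buckets_[slot(key)];
        for (Bucket::const_iterator it = b.begin(); it != b.end(); ++it)
            if (it->first == key) return &it->second;
        return nullptr;
    }

    // Returns true if an entry was removed.
    bool remove(const std::string &key) {
        Bucket &b = buckets_[slot(key)];
        for (Bucket::iterator it = b.begin(); it != b.end(); ++it)
            if (it->first == key) { b.erase(it); return true; }
        return false;
    }
};
```

With a table of size 1 every key collides and the structure degrades to a linked list, which is a simple way to see why the quality of the hash function and the choice of m matter.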

Binary Search Tree Implementation

• A Sorted Array is fast for searching
  – But inserting at the front is slow
• Idea
  – Pick a value v and store two separate arrays
  – If value < v, insert into the left array
  – If value >= v, insert into the right array
• Now we have a Data Structure with
  – Worst-case N / 2 + 1 insertion (N in the past)
  – lg(N) + 1 searching

[Figure: a left array and a right array on either side of v]

Binary Search Tree Implementation

• Now we have a Data Structure with
  – N / 2 + 1 insertion (N in the past)
  – lg(N) + 1 searching
• If we store "N / 2" elements in this DS
  – N / 4 + 1 insertion
  – lg(N) searching
• If both the left and right arrays use this DS [Recursion]
  – N / 4 + 2 insertion
  – lg(N) + 1 searching
• Continue this process lg(N) times
  – lg(N) + 2 insertion
  – lg(N) + 1 searching
  – What will it look like?

Binary Search Tree Implementation

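C++:

struct Node {
    Node *left, *right;
    int value;
};

Pascal:

type
  pNode = ^Node;
  Node = record
    left, right : pNode;
    value : integer;
  end;

Example tree (layout reconstructed from the search-tree ordering):

        6
      /   \
     3     8
    / \   / \
   1   4 7   9
          \
          7.5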

Introduction to Tree

• Node

• Root

• Leaf / Internal

• Parent / Children

• [Proper] Ancestors / Descendants

• Siblings

Binary Search Tree Implementation

• Operations
  – Searching
    • If target < current, go to left
    • If target > current, go to right
  – Insertion
    • Search
    • Insert it there
  – Removal
    • If it is a leaf, just remove it.
    • Otherwise, take the smallest element larger than it (the leftmost node of its right subtree), which is a leaf or has only a right child. Replace!
    • If it has no right subtree, its left child simply takes its place.
• Worst Case
  – If input is sorted, the tree will become …
  – What can we do?
  – C++: map<Key, Value, comparator>
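The search and insertion steps above can be sketched as follows. This is a minimal illustration, not the slides' own code; the names TreeNode, bstSearch, bstInsert and height are made up for this example, and removal is left out.

```cpp
#include <cassert>

struct TreeNode {
    int value;
    TreeNode *left;
    TreeNode *right;
};

// Walk down from the root; returns the node holding target, or nullptr.
TreeNode *bstSearch(TreeNode *root, int target) {
    while (root != nullptr && root->value != target)
        root = target < root->value ? root->left : root->right;
    return root;
}

// Search for the empty link where v belongs and hang a new leaf there.
// Values equal to a node go right, matching the "value >= v" rule.
void bstInsert(TreeNode *&root, int v) {
    TreeNode **link = &root;
    while (*link != nullptr)
        link = v < (*link)->value ? &(*link)->left : &(*link)->right;
    *link = new TreeNode{v, nullptr, nullptr};
}

// Number of nodes on the longest root-to-leaf path.
int height(const TreeNode *root) {
    if (root == nullptr) return 0;
    int l = height(root->left), r = height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 6, 3, 8, 1, 4, 7, 9 in that order gives a tree of height 3, while inserting 1 to 7 in sorted order gives height 7, which is the degenerate case the Worst Case bullet hints at.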

Recess

Have a break!

Stack ADT

• Something your compiler has implemented for you.

int pow(int x, int n) {
    if (n == 0) return 1;
    int v = pow(x, n / 2);          // the call stack remembers x and n for us
    if (n % 2 == 0) return v * v;
    return x * v * v;
}

• pow(3, 5)→pow(3, 2)→pow(3, 1)→pow(3, 0)

Stack ADT

• But
  – It dictates what is put on the stack
  – It couples control flow with data flow
• So we will still implement our own stack
• Last-in-first-out
  – When do we need this behavior?
• Array?
  – Fast, but fixed size
  – C++: stack<Type>

Array Implementation of Stack

C++:

int stack[100];
int top = 0;                 // number of stored elements

void push(int v) {           // no overflow check
    stack[top++] = v;
}

int pop() {                  // no underflow check
    return stack[--top];
}

Pascal:

var
  stack : array[1..100] of integer;
  top : integer;             { set top := 0 before the first push }

procedure push(v : integer);
begin
  inc(top);
  stack[top] := v;
end;

function pop : integer;
begin
  pop := stack[top];
  dec(top);
end;

Queue ADT

• First-in-first-out
  – When do we need this behavior?
  – Major use is Breadth First Search in Graph
• Array?
  – Fast, but fixed size
  – Circular?
  – C++: queue<Type>

Array Implementation of Queue

C++:

int queue[100];
int head = 0, tail = 0;

void enqueue(int v) {        // at most 100 enqueues in total: no wrap-around
    queue[tail++] = v;
}

int dequeue() {
    return queue[head++];
}

Pascal:

var
  queue : array[1..100] of integer;
  head, tail : integer;      { set both to 0 before use }

procedure enqueue(v : integer);
begin
  inc(tail);
  queue[tail] := v;
end;

function dequeue : integer;
begin
  inc(head);
  dequeue := queue[head];
end;
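The "Circular?" idea from the Queue ADT slide removes the limit on the total number of enqueues: indices wrap around the end of the array, so only the number of elements held at one time is bounded. A minimal sketch follows; the names CircularQueue, CAPACITY, init, isEmpty and isFull are made up for this example.

```cpp
#include <cassert>

const int CAPACITY = 100;

struct CircularQueue {
    int data[CAPACITY];
    int head;    // index of the oldest element
    int count;   // number of stored elements
};

void init(CircularQueue &q) { q.head = 0; q.count = 0; }
bool isEmpty(const CircularQueue &q) { return q.count == 0; }
bool isFull(const CircularQueue &q) { return q.count == CAPACITY; }

void enqueue(CircularQueue &q, int v) {
    assert(!isFull(q));
    q.data[(q.head + q.count) % CAPACITY] = v;   // wrap past the end
    ++q.count;
}

int dequeue(CircularQueue &q) {
    assert(!isEmpty(q));
    int v = q.data[q.head];
    q.head = (q.head + 1) % CAPACITY;            // wrap past the end
    --q.count;
    return v;
}
```

Storing a count instead of a tail index avoids the usual ambiguity where head == tail could mean either an empty or a full queue.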

Priority Queue ADT

• PriorityQueue<Priority, Value>
  – void Push(Priority &p, Value &v)
    • Add an element
  – Value &Top()
    • Returns the element with maximum priority
  – void Pop()
    • Remove the element with maximum priority
• Again, both Array and Linked List can do it, but suboptimally. A maximum heap can finish Push and Pop in O(lg n) and Top in O(1).
• C++: priority_queue<Type, container, comparator>
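As a small illustration of the standard std::priority_queue, the sketch below pushes a set of values and pops them back in priority order. The helper names drainMaxFirst and drainMinFirst are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Push every value, then pop until empty: values come out largest first.
std::vector<int> drainMaxFirst(const std::vector<int> &values) {
    std::priority_queue<int> pq;   // default comparator gives a max-heap
    for (std::size_t i = 0; i < values.size(); ++i) pq.push(values[i]);
    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());   // Top
        pq.pop();                  // Pop
    }
    return out;
}

// With std::greater as the comparator, the smallest value has top priority.
std::vector<int> drainMinFirst(const std::vector<int> &values) {
    std::priority_queue<int, std::vector<int>, std::greater<int> > pq;
    for (std::size_t i = 0; i < values.size(); ++i) pq.push(values[i]);
    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```

The second template argument is the underlying container and the third is the comparator, so changing only the comparator turns the maximum queue into a minimum queue.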

Heap

• In an array with N elements
  – We can obtain the maximum value in O(1) time if every Add() updates this value.
  – But removing the maximum destroys this knowledge and requires N – 1 operations to recalculate.
• If we have 2 arrays of N / 2 elements
  – We only need N / 2 time, because only the array from which the maximum was extracted is recalculated.

[3 1 5 7 8 5 4]  maximum 8
[2 6 3 4 2 5 3]  maximum 6

[Figures, slides 40 to 49: the arrays are split recursively until every maximum sits above the maxima of its two halves, giving a binary tree with 8 at the root; the remaining slides step through removing the maximum from that tree. The diagrams did not survive extraction. The array listing on the final Heap slide records the removal steps.]

Heap

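• Left Complete Binary Tree, stored in an array (the children of index i are at 2i and 2i + 1)
• Index: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
• [8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4]
• [4, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8 (8 removed; the last element, 4, moves to the root)
• [7, 4, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8 (4 swaps with its larger child, 7)
• [7, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3, 1] 8 (4 swaps with its larger child, 5)
• [1, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8 (7 removed; the last element, 1, moves to the root)
• [6, 5, 1, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 5, 4, 4, 1, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 5, 4, 4, 3, 5, 2, 3, 2, 3, 1] 7, 8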
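The array steps above correspond to the usual sift-down procedure. The following is a minimal sketch of an array-based maximum heap; the names heapPush and heapPop are made up for this example, and it uses 0-based indices, so the children of i are at 2i + 1 and 2i + 2 rather than the 1-based 2i and 2i + 1 of the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Add v at the end, then swap it upward while it beats its parent: O(lg n).
void heapPush(std::vector<int> &h, int v) {
    h.push_back(v);
    std::size_t i = h.size() - 1;
    while (i > 0 && h[(i - 1) / 2] < h[i]) {
        std::swap(h[i], h[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

// Remove and return the maximum: move the last element to the root,
// then swap it downward with its larger child until neither child is
// larger. On a tie between children the left one is chosen. O(lg n).
int heapPop(std::vector<int> &h) {
    assert(!h.empty());
    int top = h[0];
    h[0] = h.back();
    h.pop_back();
    std::size_t i = 0;
    for (;;) {
        std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < h.size() && h[l] > h[largest]) largest = l;
        if (r < h.size() && h[r] > h[largest]) largest = r;
        if (largest == i) break;
        std::swap(h[i], h[largest]);
        i = largest;
    }
    return top;
}
```

Starting from the 14-element array on the slide, two calls to heapPop return 8 and then 7 and leave the array in the same states as the listing.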