Data Structures
Alan, Tam Siu Lung 96397999 [email protected]
Prerequisite
• Familiarity with Pascal/C/C++
• Asymptotic Complexity
• Techniques learnt
  – Recursion
  – Divide and Conquer
  – Exhaustion
  – Greedy
  – [Dynamic Programming exempted]
• Algorithms learnt
  – Bubble / Insertion / Selection / Shell / Merge / Quick / Bucket / Radix Sorting
  – Linear / Binary / Interpolation Searching
What does our Programming Language provide?
• Built-in Data Types
  – Character/String (length limit?)
  – Integral (signed/unsigned 8 [?], 16, 32, 64 [?] bit)
  – Floating Point (always signed; 32, 64, 80 [?] bit)
  – Fixed Point [?]
  – Complex [?]
  – Pointer/Reference
  – Function Pointer/Reference
What does our Programming Language provide?
• Aggregate Data Types
  – Array [base-definable?]
    • Multiple values of the same type
    • Access by numeric index
  – Record/Struct/Class
    • Multiple values of different types
    • Function Aggregation + Inheritance + Polymorphism [?]
  – Unions [?]
What does our Programming Language provide?
• Built-in Language Constructs
  – Branching (If, Else)
  – Loops (For, While, Until)
  – Function/Procedure Calling
    • In C++’s view, operators are functions as well
    • a = b     int &operator=(int &a, const int &b)
    • a > b     bool operator>(const int &a, const int &b)
    • *a        int &operator*(int *a)
    • a[b]      string &operator[](string a[], int b)
  – Recursion
  – Even more for more sophisticated languages!
For most of the remaining time
• We concentrate on
  – Pointer
  – Array
  – Record
  – and how they interact
• We will use a C++-like notation
  – array<int> means an array of integers
  – int* is shorthand for pointer<int>
  – Records are written as: struct<int, int, string>
  – Capitalized types (e.g. Type, Key, Value) are type “variables”, which means they can be replaced by any type
Formal Definition: Pointer
• Concept:
  – pointer<Type> p; (Type *p) [p : ^Type in Pascal]
• Operations:
  – *p      Type &operator*(Type *p) [p^ in Pascal]
    • Returns the pointed value
    • Error if p is null/nil
  – &y      Type *operator&(Type &y) [@y in Pascal]
    • Returns the address of a value
  – p = x   Type *operator=(Type *p, Type *x)
    • Pointer assignment: p now points where x points
Formal Definition: Pointer
• More Operators
  – p < q   bool operator<(Type *p, Type *q)
    • Returns whether p points to an earlier element than q (meaningful only within the same array)
  – ++p     Type *operator++(Type *p) [inc(p) in Turbo Pascal]
    • Point to next element (in an array)
  – --p     Type *operator--(Type *p) [dec(p) in Turbo Pascal]
    • Point to previous element (in an array)
  – p + n   Type *operator+(Type *p, int n) [not in Turbo Pascal]
    • Point to nth next element (in an array)
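A minimal C++ sketch of these operators in use, assuming an ordinary int array; the function names SumByPointer and LastByPointer are hypothetical. It computes the end with p + n, walks forward with ++p while p < q, and steps back with --p.

```cpp
#include <cassert>

// Sums n ints by walking a pointer from the first element up to one past
// the last, using p + n, p < q, ++p and *p.
int SumByPointer(const int *first, int n) {
    assert(n >= 0);
    const int *end = first + n;               // p + n: one past the last element
    int total = 0;
    for (const int *p = first; p < end; ++p)  // compare, then step forward
        total += *p;                          // read the pointed value
    return total;
}

// Returns the last of n ints by stepping back once from one-past-the-end.
int LastByPointer(const int *first, int n) {
    assert(n > 0);
    const int *p = first + n;
    --p;                                      // now points at the last element
    return *p;
}
```

Pointing one past the last element is allowed in C++ as long as that pointer is never dereferenced.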
Programming Syntax: Pointer
int main() {
int a[10];
int *b = &a[1];
*b = 1;
b = new int(2);
delete b;
b = 0;
}
var
a : array[1..10] of integer;
b : ^integer;
begin
b := @a[2];
b^ := 1;
new(b);
b^ := 2;
dispose(b);
b := nil;
end.
Array
• Concept
  – array<Type, Size : int>
  – array<Type, Lower : int, Upper : int>
• Operations
  – Type &operator[](Type a[], int index)
    • Requires 0 <= index < Size (first form)
    • Requires Lower <= index <= Upper (second form)
• Analysis
  – a[x] is equivalent to *(a + x)
  – which is equivalent to *(Type *)((char *)a + x * sizeof(Type)), i.e. the start address plus x element sizes
  – Every access costs a multiplication and an addition, so it is sometimes slower than necessary!
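To check the equivalence above, here is a small C++ sketch (readThreeWays is a hypothetical name) that reads the same element by index, by pointer arithmetic, and by explicit byte arithmetic with sizeof(int), asserting that all three agree.

```cpp
#include <cassert>
#include <cstddef>

// Reads a[x] three ways and checks they agree:
//   a[x], *(a + x), and start address + x * sizeof(int) bytes.
int readThreeWays(int *a, std::size_t x) {
    int byIndex = a[x];
    int byPointer = *(a + x);
    int byBytes = *reinterpret_cast<int *>(
        reinterpret_cast<char *>(a) + x * sizeof(int));
    assert(byIndex == byPointer && byPointer == byBytes);
    return byIndex;
}
```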
Example: Prime Finding
• primes[] stores all primes found so far
primes[0] = 2;
for each i from 3 upwards
  for each v in primes[]
    if (v * v > i) then begin
      primes.add(i);
      break;
    end;
    if (i mod v = 0) then break;
Solution
#include <iostream>
using namespace std;

int main() {
    int primes[100], *last = primes;   // last points one past the final prime found
    cout << (*last++ = 2) << endl;
    for (int i = 3; i < 100; ++i) {
        int *j = primes;
        do {
            if (*j * *j > i) {         // no prime factor up to sqrt(i): i is prime
                cout << (*last++ = i) << endl;
                break;
            }
            if (i % *j == 0) break;    // divisible: i is not prime
        } while (++j < last);
    }
}
var
  primes : array[1..100] of integer;
  i : integer;
  last, j : ^integer;
begin
  last := @primes;
  last^ := 2;
  inc(last);
  writeln(2);
  for i := 3 to 100 do
  begin
    j := @primes;
    repeat
      if j^ * j^ > i then
      begin
        last^ := i;
        inc(last);
        writeln(i);
        break;
      end;
      if i mod j^ = 0 then break;
      inc(j);
    until j >= last;
  end;
end.
Record
• Like Arrays, but
  – Fields are identified by names instead of indices
  – Each name is associated with its own type
• A Pair is a special record with 2 elements, Key and Value
  – Keys are unique (i.e. keys identify records)
  – Keys are comparable (i.e. sortable) [sometimes]
  – Since Value can itself be a record, any record with a unique portion can be represented as a pair
Programming Syntax: Record
struct Point {
    double x, y;
};
struct Rect {
    Point tl, br;
    int color;
};
int main() {
    Rect rect;
    rect.color = 255;
    rect.tl.x = 0.0;
}
type
  Point = record
    x, y : real;
  end;
  Rect = record
    tl, br : Point;
    color : integer;
  end;
var
  rect : Rect;
begin
  rect.color := 255;
  rect.tl.x := 0.0;
  with rect do
  begin
    color := 255;
    tl.x := 0.0;
  end;
end.
Linked List
• Combining Pointer and Record
• linkedlist<string>:
type
pNode = ^Node;
Node = record
value : string;
next : pNode;
end;
var
head: pNode;
Linked List
• Operations
  – void Add(linkedlist<Type> p, Type &v)
    • Add an element to the Linked List
  – Node *Search(linkedlist<Type> p, Type &v)
    • Returns null/nil if not found
  – void InsertAfter(Node *node, Type &v)
    • Insert an element after another
  – void Remove(Node *node)
    • How to implement?
• C++: x->y == (*x).y
Linked List Implementation
struct Node {
    int value;
    Node *next;
};
Node *list = 0;

void Add(int v) {
    Node *old = list;
    list = new Node();
    list->next = old;
    list->value = v;
}

Node *Search(int v) {
    for (Node *p = list; p; p = p->next)
        if (p->value == v) return p;
    return 0;
}

void InsertAfter(Node *n, int v) {
    Node *old = n->next;
    n->next = new Node();
    n->next->next = old;
    n->next->value = v;
}
var
  list : pNode;

procedure Add(v : integer);
var
  old : pNode;
begin
  old := list;
  new(list);
  list^.next := old;
  list^.value := v;
end;

function Search(v : integer) : pNode;
var
  n : pNode;
begin
  n := list;
  while (n <> nil) and (n^.value <> v) do
    n := n^.next;
  Search := n;
end;

{ InsertAfter is similar to Add }
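The slide leaves Remove open. One possible C++ sketch, assuming the int Node above: RemoveAfter unlinks a node in O(1) when its predecessor is known, and Remove(list, v), a hypothetical by-value variant, finds the first node holding v by walking a pointer to the link that points at it, so the head needs no special case.

```cpp
#include <cassert>

struct Node {
    int value;
    Node *next;
};

// Push a new node at the front (same idea as Add above).
void Add(Node *&list, int v) {
    list = new Node{v, list};
}

// Unlinks the node that follows prev in O(1); prev must have a successor.
void RemoveAfter(Node *prev) {
    assert(prev != nullptr && prev->next != nullptr);
    Node *victim = prev->next;
    prev->next = victim->next;
    delete victim;
}

// Removes the first node holding v. *link is either the head pointer or
// some node's next field, so both cases share the same code.
// Returns false if v is not in the list.
bool Remove(Node *&list, int v) {
    for (Node **link = &list; *link; link = &(*link)->next) {
        if ((*link)->value == v) {
            Node *victim = *link;
            *link = victim->next;
            delete victim;
            return true;
        }
    }
    return false;
}
```

Finding the node is still O(n); only the unlinking itself is O(1).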
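Array Implementation
• The list lives in an array of 5 slots: K is the slot index, N the slot of the next element, V the stored value; "/" means nil (or empty). Unused slots are chained together as a free list.

Initially (head = /, free = 1):
K  N  V
1  2  /
2  3  /
3  4  /
4  5  /
5  /  /

Add 2 (head = 1, free = 2):
K  N  V
1  /  2
2  3  /
3  4  /
4  5  /
5  /  /

Add 3 (head = 2, free = 3):
K  N  V
1  /  2
2  1  3
3  4  /
4  5  /
5  /  /

Array Implementation
• Starting from the state after Add 3 (head = 2, free = 3):

Remove 3 (head = 1, free = 2):
K  N  V
1  /  2
2  3  /
3  4  /
4  5  /
5  /  /

Remove 2 (head = 2, free = 1):
K  N  V
1  3  /
2  /  3
3  4  /
4  5  /
5  /  /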
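A C++ sketch of the array implementation in the tables above, assuming 5 slots, slot 0 standing for "/" (nil), and hypothetical names nextOf, valueOf, head and freeHead. Add takes a slot from the free list and links it in front of the list; Remove unlinks a slot and pushes it back onto the free list, which reproduces both removal tables.

```cpp
#include <cassert>

// Slot 0 is unused and plays the role of "/" (nil).
const int CAP = 5;
int nextOf[CAP + 1];   // N column: slot of the next element
int valueOf[CAP + 1];  // V column: stored value
int head = 0;          // first slot of the list (0 = empty list)
int freeHead = 1;      // first unused slot

void Init() {
    head = 0;
    freeHead = 1;
    for (int k = 1; k <= CAP; ++k) {
        nextOf[k] = (k < CAP) ? k + 1 : 0;  // chain every slot into the free list
        valueOf[k] = 0;
    }
}

// Takes a slot from the free list and links it in front of the list: O(1).
void Add(int v) {
    assert(freeHead != 0);  // no free slot left
    int k = freeHead;
    freeHead = nextOf[k];
    valueOf[k] = v;
    nextOf[k] = head;
    head = k;
}

// Unlinks the first slot holding v and returns it to the free list.
bool Remove(int v) {
    int prev = 0;
    for (int k = head; k != 0; prev = k, k = nextOf[k]) {
        if (valueOf[k] == v) {
            if (prev == 0) head = nextOf[k];
            else nextOf[prev] = nextOf[k];
            nextOf[k] = freeHead;  // freed slot goes to the front of the free list
            freeHead = k;
            return true;
        }
    }
    return false;
}
```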
Abstraction
• Both implementations have the same complexity
  – O(1) Addition
  – O(n) Searching
  – O(1) Insertion (after a given node)
  – O(1) Removal (given the node before it)
• Sometimes we don’t care how it is implemented
  – We only want a data structure which provides the operations we want.
• We define an Abstract Data Type (ADT) to mean the collection of all Data Structures providing certain operations, e.g.
  – Plane
  – Polynomial
  – Graph
• We don’t even care how fast the operations in an ADT are, though in practice we do
Dictionary (Map, Associative Array)
• A Dictionary is an unordered container of kv-pairs
• map<Key, Value>
  – void Insert(map<Key, Value> &c, Key &key, Value &value)
  – int Size(map<Key, Value> &c)
  – Value &Search(map<Key, Value> &c, Key &key)
  – void Delete(map<Key, Value> &c, Key &key)
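One way to provide these Dictionary operations in C++ is to wrap std::unordered_map, a standard hash-based container. This is a sketch: returning a pointer (nullptr when the key is absent) from Search instead of a reference, and letting Insert overwrite an existing key, are choices of this sketch, not part of the ADT above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

template <class Key, class Value>
void Insert(std::unordered_map<Key, Value> &c, const Key &key, const Value &value) {
    c[key] = value;  // an existing key gets its value overwritten
}

template <class Key, class Value>
int Size(const std::unordered_map<Key, Value> &c) {
    return static_cast<int>(c.size());
}

// Returns a pointer to the stored value, or nullptr if the key is absent.
template <class Key, class Value>
Value *Search(std::unordered_map<Key, Value> &c, const Key &key) {
    auto it = c.find(key);
    return it == c.end() ? nullptr : &it->second;
}

template <class Key, class Value>
void Delete(std::unordered_map<Key, Value> &c, const Key &key) {
    c.erase(key);
}
```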
List ADT
• A List ADT is an ordered container of kv-pairs
• list<Key, Value>
  – void Insert(list<Key, Value> &c, int pos, Value &value)
  – Value &Find-ith(list<Key, Value> &c, int pos)
  – void Delete-ith(list<Key, Value> &c, int pos)
  – int Size(list<Key, Value> &c)
  – Value &Search(list<Key, Value> &c, Key &key)
  – void Delete(list<Key, Value> &c, Key &key)
  – …
• A List can be implemented by an array (Vector/Table), a linked list (LinkedList), etc.
• A List is also a Dictionary
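A sketch of the positional List operations on an array-backed list, using std::vector<int>. Positions are 0-based here, and FindIth/DeleteIth are hypothetical spellings of Find-ith/Delete-ith; the comments mark which operations have to shift elements.

```cpp
#include <cassert>
#include <vector>

// Inserts value so that it ends up at index pos (0 <= pos <= size).
void Insert(std::vector<int> &c, int pos, int value) {
    assert(0 <= pos && pos <= static_cast<int>(c.size()));
    c.insert(c.begin() + pos, value);  // shifts later elements right: O(n)
}

int &FindIth(std::vector<int> &c, int pos) {
    assert(0 <= pos && pos < static_cast<int>(c.size()));
    return c[pos];                     // direct indexing: O(1)
}

void DeleteIth(std::vector<int> &c, int pos) {
    assert(0 <= pos && pos < static_cast<int>(c.size()));
    c.erase(c.begin() + pos);          // shifts later elements left: O(n)
}
```

A linked-list implementation would flip the costs: FindIth becomes O(n), but inserting at a known node is O(1).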
Time Complexity
Average Case    Add     Remove   Search
Array           O(1)    O(n)     O(n)
Sorted Array    O(n)    O(n)     O(lg n)
Linked List     O(1)    O(n)     O(n)
• We seldom remove anyway
• None of these makes both Add and Search fast
• In general, it is difficult if we do not depend on features of the Key
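The Sorted Array row can be sketched in C++ as follows (Search and Add are hypothetical names): binary search halves the range at each step, giving O(lg n) searching, while Add must shift every larger element one slot right, giving O(n) insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary search on a sorted array: returns the index of key, or -1.
int Search(const std::vector<int> &a, int key) {
    int lo = 0, hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

// Keeps a sorted: grow by one, then shift larger elements right.
void Add(std::vector<int> &a, int key) {
    std::size_t i = a.size();
    a.push_back(key);
    while (i > 0 && a[i - 1] > key) {
        a[i] = a[i - 1];
        --i;
    }
    a[i] = key;
}
```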
Direct Addressing Implementation
Key   Value
0     Ant
5     Boy
99    Car
• Use the Vector ADT
• The key is the location
• Efficient: O(1) for all operations
• Infeasible if the key can range from 1 to 20000000000, or if the key is not numeric ...
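A C++ sketch of direct addressing, assuming integer keys in 0 to size - 1; the class name DirectTable is hypothetical. Each operation is a single array access, which is why it is O(1), and also why the array must be as large as the whole key range.

```cpp
#include <cassert>
#include <string>
#include <vector>

class DirectTable {
public:
    explicit DirectTable(int size) : used_(size, false), value_(size) {}

    void Insert(int key, const std::string &v) {
        check(key);
        used_[key] = true;   // the key itself is the array index
        value_[key] = v;
    }
    bool Contains(int key) const {
        check(key);
        return used_[key];
    }
    const std::string &Search(int key) const {
        check(key);
        assert(used_[key]);  // key must have been inserted
        return value_[key];
    }
    void Delete(int key) {
        check(key);
        used_[key] = false;
    }

private:
    void check(int key) const {
        assert(0 <= key && key < static_cast<int>(used_.size()));
    }
    std::vector<bool> used_;
    std::vector<std::string> value_;
};
```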
Hash Function
• Hash Function: h_m(k)
• Maps all keys “by calculation” into an integer domain, e.g. 0 to m − 1
• E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32)
  – Alan: 1598313570
  – Max: 3452409927
  – Man: 943766770
  – On: 2246271074
Hash Table Implementation
• Use a Table<int, Value> ADT of size m
• Use h_m(Key) as the key
• All operations can then be done as with a Table
• This solves the problem, except for:
  – Collision: what to do if two different k have the same h(k)
  – How to find a suitable hash function
• If good hash functions are used, hash tables provide near O(1) insertion, searching and removal
  – But it is difficult to get it right
  – And it is not easy to code
  – C++: hash_map<Key, Value, hash_func> (a pre-standard extension; C++11 standardizes it as unordered_map<Key, Value, Hash>)
• Read 2003 Advanced Notes on Hash Table if you are motivated enough
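A minimal C++ sketch of a hash table that handles collisions by chaining: each of the m buckets keeps a linked list of the kv-pairs that hash to it. The hash function is a simple base-31 polynomial hash chosen only for illustration (it is not CRC32), and the class name HashTable is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class HashTable {
public:
    explicit HashTable(std::size_t m) : buckets_(m) {}

    void Insert(const std::string &key, int value) {
        auto &chain = buckets_[h(key)];
        for (auto &kv : chain)
            if (kv.first == key) { kv.second = value; return; }  // overwrite
        chain.emplace_back(key, value);  // colliding keys share a chain
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    int *Search(const std::string &key) {
        for (auto &kv : buckets_[h(key)])
            if (kv.first == key) return &kv.second;
        return nullptr;
    }

    void Delete(const std::string &key) {
        buckets_[h(key)].remove_if(
            [&](const std::pair<std::string, int> &kv) { return kv.first == key; });
    }

private:
    // h_m(k): base-31 polynomial over the characters (unsigned arithmetic
    // wraps around), reduced into 0 .. m-1.
    std::size_t h(const std::string &key) const {
        std::size_t x = 0;
        for (unsigned char c : key) x = x * 31 + c;
        return x % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```

With m much smaller than the number of keys, chains grow and operations degrade towards O(n); keeping the load (keys per bucket) bounded is what gives near O(1) behaviour.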
Binary Search Tree Implementation
• A Sorted Array is fast for searching
  – But it is slow when inserting at the front
• Idea
  – Pick a value v and store two separate arrays
  – If value < v, insert into the left array
  – If value >= v, insert into the right array
• Now we have a Data Structure which is
  – Worst Case N / 2 + 1 insertion (previously N)
  – lg(N) + 1 searching
Binary Search Tree Implementation
• Now we have a Data Structure which is
  – N / 2 + 1 insertion (previously N)
  – lg(N) + 1 searching
• If we store “N / 2” elements in this DS
  – N / 4 + 1 insertion
  – lg(N) searching
• If both the left and right arrays use this DS [Recursion]
  – N / 4 + 2 insertion
  – lg(N) + 1 searching
• Continue this process lg(N) times
  – lg(N) + 2 insertion
  – lg(N) + 1 searching
  – What will it look like?
Binary Search Tree Implementation
struct Node {
    Node *left, *right;
    int value;
};
type
  pNode = ^Node;
  Node = record
    left, right : pNode;
    value : integer;
  end;
[Figure: an example binary search tree holding 6, 3, 1, 8, 4, 9, 7 and 7.5]
Introduction to Tree
• Node
• Root
• Leaf / Internal
• Parent / Children
• [Proper] Ancestors / Descendants
• Siblings
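Binary Search Tree Implementation
• Operations
  – Searching
    • If target < current, go to left
    • If target > current, go to right
  – Insertion
    • Search
    • Insert it where the search ends, as a new leaf
  – Removal
    • If it is a leaf, just remove it.
    • If it has one child, replace it by that child.
    • Otherwise, replace it by the smallest element larger than it (the leftmost node of its right subtree). That node has no left child, so it is easy to remove.
• Worst Case
  – If input is sorted, the tree will become …
  – What can we do?
  – C++: map<Key, Value, comparator>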
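The three operations can be sketched in C++ with the Node above, here given a constructor; Search, Insert and Remove are hypothetical names. Equal keys go to the right, matching the "value >= v" rule earlier, and a node with two children takes its successor's value, after which the successor is removed from the right subtree.

```cpp
#include <cassert>

struct Node {
    int value;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Go left if target < current, right otherwise, until found or nullptr.
Node *Search(Node *t, int target) {
    while (t && t->value != target)
        t = (target < t->value) ? t->left : t->right;
    return t;
}

// Walk down as in Search and attach a new leaf where the walk ends.
void Insert(Node *&t, int v) {
    if (!t) t = new Node(v);
    else if (v < t->value) Insert(t->left, v);
    else Insert(t->right, v);  // equal keys go right
}

void Remove(Node *&t, int v) {
    if (!t) return;                                    // not found
    if (v < t->value) { Remove(t->left, v); return; }
    if (v > t->value) { Remove(t->right, v); return; }
    if (t->left && t->right) {                         // two children
        Node *s = t->right;
        while (s->left) s = s->left;                   // successor: no left child
        t->value = s->value;
        Remove(t->right, s->value);
        return;
    }
    Node *old = t;                                     // leaf or one child
    t = t->left ? t->left : t->right;
    delete old;
}
```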
Recess
Have a break!
Stack ADT
• Something your compiler has already implemented for you: the call stack.
int pow(int x, int n) {
if (n == 0) return 1;
int v = pow(x, n / 2);
if (n % 2 == 0) return v * v;
return x * v * v;
}
• pow(3, 5)→pow(3, 2)→pow(3, 1)→pow(3, 0)
Stack ADT
• But
  – It dictates what is put on the stack
  – It couples control flow with data flow
• So we will still implement our own stack
• Last-in-first-out
  – When do we need this behavior?
• Array?
  – Fast, but fixed size
  – C++: stack<Type>
Array Implementation of Stack
int stack[100];
int top = 0;
void push(int v) {
stack[top++] = v;
}
int pop() {
return stack[--top];
}
var
stack : array[1..100] of integer;
top : integer;
procedure push(v : integer);
begin
inc(top);
stack[top] := v;
end;
function pop : integer;
begin
pop := stack[top];
dec(top);
end;
Queue ADT
• First-in-first-out
  – When do we need this behavior?
  – Major use is Breadth First Search in Graph
• Array?
  – Fast, but fixed size
  – Circular?
  – C++: queue<Type>
Array Implementation of Queue
int queue[100];
int head = 0, tail = 0;

void enqueue(int v) {
    queue[tail++] = v;
}

int dequeue() {
    return queue[head++];
}

var
  queue : array[1..100] of integer;
  head, tail : integer;

procedure enqueue(v : integer);
begin
  inc(tail);
  queue[tail] := v;
end;

function dequeue : integer;
begin
  inc(head);
  dequeue := queue[head];
end;
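The queue code above never reuses a slot, so it runs out after 100 enqueues in total, however many dequeues happen in between. A circular version, sketched below in C++ with hypothetical names, wraps head and tail around with % N and keeps a count to tell an empty queue from a full one.

```cpp
#include <cassert>

const int N = 100;
int buf[N];
int qhead = 0, qtail = 0, qcount = 0;  // qcount distinguishes empty from full

void enqueue(int v) {
    assert(qcount < N);        // queue full
    buf[qtail] = v;
    qtail = (qtail + 1) % N;   // wrap around to slot 0 after slot N-1
    ++qcount;
}

int dequeue() {
    assert(qcount > 0);        // queue empty
    int v = buf[qhead];
    qhead = (qhead + 1) % N;
    --qcount;
    return v;
}
```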
Priority Queue ADT
• PriorityQueue<Priority, Value>
  – void Push(Priority &p, Value &v)
    • Add an element
  – Value &Top()
    • Returns the element with maximum priority
  – void Pop()
    • Remove the element with maximum priority
• Again, both Array and Linked List can do it, but suboptimally. A maximum heap can finish Push and Pop in O(lg n) and Top in O(1).
• C++: priority_queue<Type, Container, Compare>
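In C++ this ADT maps onto std::priority_queue, which is a max-heap by default. A small sketch storing (priority, value) pairs; the helper PopMax is hypothetical and simply combines Top() and Pop().

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>

// Pairs compare by priority first (ties broken by value), so top() is the
// element with maximum priority.
using Item = std::pair<int, std::string>;

std::string PopMax(std::priority_queue<Item> &pq) {
    assert(!pq.empty());
    std::string v = pq.top().second;  // Top(): element with maximum priority
    pq.pop();                         // Pop(): remove it
    return v;
}
```

For example, after pq.emplace(3, "c"), pq.emplace(8, "h") and pq.emplace(5, "e"), repeated PopMax calls return "h", "e", "c".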
Heap
• In an array with N elements
  – We can obtain the maximum value in O(1) time if every Add() updates this value.
  – But removing it destroys all knowledge and requires N − 1 operations to recalculate.
• If we have 2 arrays of N / 2 elements
  – We only need N / 2 time, because only the array from which the maximum was extracted is recalculated.
[Figures: the array split into halves, each half remembering its maximum, repeated recursively until the pieces form a tree (the heap)]
Heap
[Figures: tree diagrams of the heap while the maximum (8) and then the next maximum (7) are removed: the last element moves to the root and is swapped downward with its larger child]
Heap
• Left Complete Binary Tree, stored in an array at indices 1 to 14
  – The children of index i are at 2i and 2i + 1
• Removing the maximum twice (values removed so far on the right):
  – [8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4]
  – [4, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1]   8
  – [7, 4, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1]   8
  – [7, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3, 1]   8
  – [1, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3]   7, 8
  – [6, 5, 1, 4, 4, 5, 5, 2, 3, 2, 3, 3]   7, 8
  – [6, 5, 5, 4, 4, 1, 5, 2, 3, 2, 3, 3]   7, 8
  – [6, 5, 5, 4, 4, 3, 5, 2, 3, 2, 3, 1]   7, 8
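The trace above can be reproduced with a hand-written max-heap. This C++ sketch (MaxHeap is a hypothetical name) stores the tree 1-based so that the children of i are 2i and 2i + 1. Pop moves the last element to the root and swaps it down with the larger child, preferring the left child on ties as the trace does; Push appends and swaps upward.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct MaxHeap {
    std::vector<int> h{0};  // h[0] is unused; the root is h[1]

    int Size() const { return static_cast<int>(h.size()) - 1; }
    int Top() const { assert(Size() > 0); return h[1]; }

    // Append at the end, then swap upward while larger than the parent i/2.
    void Push(int v) {
        h.push_back(v);
        for (int i = Size(); i > 1 && h[i / 2] < h[i]; i /= 2)
            std::swap(h[i / 2], h[i]);
    }

    // Move the last element to the root, then sink it: swap with the larger
    // child (left on ties) for as long as that child is larger.
    void Pop() {
        assert(Size() > 0);
        h[1] = h.back();
        h.pop_back();
        int n = Size();
        for (int i = 1; 2 * i <= n;) {
            int c = 2 * i;
            if (c + 1 <= n && h[c + 1] > h[c]) ++c;
            if (h[c] <= h[i]) break;
            std::swap(h[c], h[i]);
            i = c;
        }
    }
};
```

Both Push and Pop walk at most one root-to-leaf path of a left complete tree, hence O(lg n); Top only reads h[1], hence O(1).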