Data structures

Data Structures Placement Lectures 2012 Pranav Gupta

Description: these slides cover some of the important data structures, such as graphs, tries, suffix trees and hash tables.

Transcript of Data structures

Page 1: Data structures

Data Structures: Placement Lectures 2012

Pranav Gupta

Page 2: Data structures

Why we need Data Structures?

• Efficient and intuitive representation of data
  • Tree using arrays vs tree using pointers

• To solve real-life problems efficiently
  • Insertion
  • Deletion
  • Search
  • Sort

• Applications
  • Social networks
  • Employee hierarchy
  • Recommended items

Page 3: Data structures

Basic Operations

1. traverse
2. insert
3. delete
4. find

Page 4: Data structures

Data Structures (Basic)

• Arrays
• Linked Lists
• Stacks
• Queues
• Recursion
• Trees – Basic
• Practice Problems

Page 5: Data structures

Arrays

Page 6: Data structures

• Contiguous and fixed memory allocation (independent of language)

• Random access and modification

• List of (index, value); index is non-negative integer; all values in a given array are of the same data type

• To hold various types of values or have non-numerical indices, use associative arrays/dictionaries – The Dictionary Problem?

Page 7: Data structures

• Arrays may also be:
  • 2-D: array of 1-D arrays (a 1-D array is a data type in itself)
  • 3-D: array of 2-D arrays (a 2-D array is a data type in itself)

• Memory placement of multi-dimensional arrays
  1. row-major
  2. column-major

• Useful Operations
  a. Modify
  b. Access
  c. Swap
  d. In-place reverse
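
The two memory placements and the in-place reverse can be sketched as follows. This is only an illustration: the function names `rowMajorIndex`, `colMajorIndex` and `reverseInPlace` are made up for this example and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Offset of element (i, j) when a matrix with `cols` columns is stored row by row.
std::size_t rowMajorIndex(std::size_t i, std::size_t j, std::size_t cols) {
    assert(j < cols);
    return i * cols + j;
}

// Offset of the same element when a matrix with `rows` rows is stored column by column.
std::size_t colMajorIndex(std::size_t i, std::size_t j, std::size_t rows) {
    assert(i < rows);
    return j * rows + i;
}

// In-place reverse: swap symmetric pairs, O(n) time and O(1) extra space.
template <class T>
void reverseInPlace(std::vector<T>& a) {
    if (a.empty()) return;
    for (std::size_t lo = 0, hi = a.size() - 1; lo < hi; ++lo, --hi) {
        std::swap(a[lo], a[hi]);
    }
}
```

For a 3 x 4 matrix, element (1, 2) sits at offset 6 in row-major order and at offset 7 in column-major order.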

Page 8: Data structures

Structure of an Array

    template <class T>
    class Array {
        int size;     // number of elements
        T *arr;       // pointer to the contiguous storage
        void put();
        void get();
        // …….
    };

Useful Libraries

    #include <vector>

Page 9: Data structures

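Irregular Arrays

• Example: languages known to students at IITG
  1. 2-D array
  2. Irregular array

[Figure: the same student/language data drawn as a 2-D array and as an irregular array; axes labelled Student and Languages.]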
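
One way to hold an irregular array in C++ is a vector of vectors, where each row has its own length. The sketch below assumes one row per student; the alias `LanguageTable` and the helper `totalEntries` are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One row per student; each row lists only the languages that student knows,
// so rows may have different lengths.
using LanguageTable = std::vector<std::vector<std::string>>;

// Total number of stored (student, language) entries.
std::size_t totalEntries(const LanguageTable& table) {
    std::size_t total = 0;
    for (const auto& row : table) total += row.size();
    return total;
}
```

Compared with a full student x language 2-D array, no space is spent on the languages a student does not know.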

Page 10: Data structures

Special (Arrays ??)

• Diagonal matrix, upper/lower triangular matrix, tridiagonal matrix, symmetric/asymmetric matrices

• We generally deal with 2-D square matrices, but 3-D or higher-dimensional cases and rectangular (non-square) matrices are also possible

• More like functions: an element can be computed from its indices instead of being stored

Page 11: Data structures

Special (Arrays ??)

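    // Element (i, j) is computed from its indices; nothing is stored.
    // no_cols (the number of columns) is assumed to be defined elsewhere.
    int spec_matrix(int i, int j) {
        return no_cols * i + j + 1;
    }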

• Performance ??
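
As a rough illustration of the trade-off, here is a sketch of a lower triangular matrix that stores only the entries on or below the diagonal. The class name `LowerTriangular` and its interface are assumptions made for this example. Each access costs a little index arithmetic, but only n(n+1)/2 of the n*n entries are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lower triangular n x n matrix: only entries with j <= i are stored.
class LowerTriangular {
public:
    explicit LowerTriangular(std::size_t n) : n_(n), data_(n * (n + 1) / 2, 0) {}

    // Entries above the diagonal are implicitly zero.
    int get(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        return j <= i ? data_[offset(i, j)] : 0;
    }

    // Writing is only meaningful on or below the diagonal.
    void set(std::size_t i, std::size_t j, int value) {
        assert(i < n_ && j <= i);
        data_[offset(i, j)] = value;
    }

    std::size_t storedCount() const { return data_.size(); }

private:
    // Rows 0..i-1 contribute 1 + 2 + ... + i = i(i+1)/2 entries before row i.
    static std::size_t offset(std::size_t i, std::size_t j) {
        return i * (i + 1) / 2 + j;
    }

    std::size_t n_;
    std::vector<int> data_;
};
```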

Page 12: Data structures

One Dimensional Sparse Array

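Sparse representation of ary, as (index, value) pairs:

    index:  4  7  8
    value: 17 23 14

Dense representation of ary:

    index:  0  1  2  3  4  5  6  7  8  9 10 11
    value:  0  0  0  0 17  0  0 23 14  0  0  0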
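
A minimal sketch of the pair-based representation, assuming the pairs are kept sorted by index so that a lookup can use binary search. The names `SparseArray` and `sparseGet` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// (index, value) pairs for the non-zero entries, kept sorted by index.
using SparseArray = std::vector<std::pair<std::size_t, int>>;

// Binary search for index i; indices that are not stored are implicitly 0.
int sparseGet(const SparseArray& a, std::size_t i) {
    auto it = std::lower_bound(
        a.begin(), a.end(), i,
        [](const std::pair<std::size_t, int>& entry, std::size_t key) {
            return entry.first < key;
        });
    return (it != a.end() && it->first == i) ? it->second : 0;
}
```

With the three pairs (4, 17), (7, 23), (8, 14), `sparseGet(a, 7)` yields 23 and any other index such as 5 yields 0.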

Page 13: Data structures

Two Dimensional Sparse Array

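[Figure: a 6×6 sparse matrix (rows and columns indexed 0 to 5) with non-zero values 8, 12, 33 and 17, stored as an array of per-row lists of (column, value) pairs: (5, 12); (1, 8), (5, 33); (3, 17).]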

Row elements can be accessed efficiently
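
The row-list idea can be sketched with one list of (column, value) pairs per row. This is an assumption-laden illustration: the class name `SparseMatrix` and its methods are invented here, and the slides do not prescribe this interface.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Sparse matrix kept as one list of (column, value) pairs per row.
class SparseMatrix {
public:
    explicit SparseMatrix(std::size_t rowCount) : rows_(rowCount) {}

    // Overwrite an existing entry or append a new one to the row's list.
    void set(std::size_t r, std::size_t c, int value) {
        assert(r < rows_.size());
        for (auto& entry : rows_[r]) {
            if (entry.first == c) { entry.second = value; return; }
        }
        rows_[r].emplace_back(c, value);
    }

    // Scan only row r; columns that are not stored are implicitly 0.
    int get(std::size_t r, std::size_t c) const {
        assert(r < rows_.size());
        for (const auto& entry : rows_[r]) {
            if (entry.first == c) return entry.second;
        }
        return 0;
    }

    // Number of non-zero entries stored in row r.
    std::size_t rowLength(std::size_t r) const {
        assert(r < rows_.size());
        return rows_[r].size();
    }

private:
    std::vector<std::list<std::pair<std::size_t, int>>> rows_;
};
```

Walking one row touches only that row's list, which is why row access is cheap; reading a whole column would still require visiting every row.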

Page 14: Data structures

Two Dimensional Sparse Array

[Figure: the same 6×6 sparse matrix with non-zero values 8, 12, 33 and 17, now with both a rows array and a cols array heading the lists of non-zero elements.]

Efficient row and column elements access

Page 15: Data structures

Efficient Representation

[Figure: a more compact representation of the same sparse matrix (non-zero values 8, 12, 33 and 17) using rows and cols arrays.]

Page 16: Data structures

Linked Lists

Page 17: Data structures

Why?

• To store heterogeneous data
• To store sparse data
• Flexibility of increase/decrease in size; easy insertion and deletion of elements

• Useful Operations
  • insertion
  • deletion

Page 18: Data structures

Logical Arrangement

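[Figure: head node (first element) → second element → third element → … → final (tail) node → Null. Each node stores the address of the next node: the head holds the address of the second node, the second holds the address of the third node, and so on up to the address of the final node.]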

Page 19: Data structures

The Structure

    template <class T>
    class node {
    public:
        T data;
        node<T> *next;  // extra (4?) bytes: the size of a pointer (8 on typical 64-bit platforms)
    };

    template <class T>
    class LinkedList {
        node<T> *head;
        int size;
        // ….. etc etc etc
    };

Useful Libraries

    #include <list>

Page 20: Data structures

View of the Memory

*Struct is stored in contiguous memory

Page 21: Data structures

Insertion/Deletion

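Time Complexity:
• Insertion: O(1) at a known position (e.g. at the head) / O(n) when the position must be found first
• Deletion: O(1) at a known position / O(n) when the node must be found first

Space Complexity:
• Insertion: O(1)
• Deletion: O(1)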
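
The O(1) and O(n) cases can be seen in a small singly linked list sketch. The names `Node`, `pushFront`, `popFront`, `remove` and `contains` are choices made for this example, not part of the slides.

```cpp
#include <cstddef>

template <class T>
struct Node {
    T data;
    Node<T>* next;
};

template <class T>
class LinkedList {
public:
    LinkedList() : head_(nullptr), size_(0) {}
    ~LinkedList() { while (head_) popFront(); }
    LinkedList(const LinkedList&) = delete;
    LinkedList& operator=(const LinkedList&) = delete;

    // O(1): insertion at the head needs no traversal.
    void pushFront(const T& value) {
        head_ = new Node<T>{value, head_};
        ++size_;
    }

    // O(1): unlink and free the head node.
    void popFront() {
        if (!head_) return;
        Node<T>* old = head_;
        head_ = head_->next;
        delete old;
        --size_;
    }

    // O(n): walk to the first node holding `value`, then unlink it in O(1).
    bool remove(const T& value) {
        Node<T>** link = &head_;
        while (*link && (*link)->data != value) link = &(*link)->next;
        if (!*link) return false;
        Node<T>* victim = *link;
        *link = victim->next;
        delete victim;
        --size_;
        return true;
    }

    // O(n): linear search.
    bool contains(const T& value) const {
        for (const Node<T>* p = head_; p; p = p->next) {
            if (p->data == value) return true;
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    Node<T>* head_;
    std::size_t size_;
};
```

Both operations use O(1) extra space: an insertion allocates one node, and a deletion frees one.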

Page 22: Data structures

Tweak some more !

• Doubly Linked Lists
  • Extra (4?) bytes of space vs better accessibility
  • Insertion/deletion ?

• Circular Linked Lists

• How to find the end?
  • Tail pointer
  • Null ‘next’ pointer from last node
  • Last node points to first (circular)

Page 23: Data structures

Practice (Linked List)
• Linked List 1)
• Linked List 2)

Page 24: Data structures

Recursion

Page 25: Data structures

• To solve a task using that task itself
  • The task should have a recursive nature
  • A task can generally be transformed into a recursive one by tweaking some parts of it

• Example: task of piling up n coins vs picking up a suitcase.

• Let the task be a C function. What are the parts of the task?
  1. Input it takes
  2. What it does
  3. Output it gives

Page 26: Data structures

• A task is generally performed recursively when a large input can’t be handled directly.

• So, recursion is all about simplifying the input at every step till it becomes trivial (base case)

Page 27: Data structures

Implementation – run time stack

• Activation Records (AR)
• Store the state of a method:
  1. input parameters
  2. return values
  3. local variables
  4. return addresses

Page 28: Data structures

[Figure: snapshots of the run-time stack while evaluating power(5.6, 2). Each activation record holds the parameters (5.6 and the exponent), a return address such as (136) or (105), and a slot for the return value, shown as "?" until the call returns.]

Stack contents over time (top of stack first):

1. power(5.6, 2)
2. power(5.6, 1), power(5.6, 2)
3. power(5.6, 0), power(5.6, 1), power(5.6, 2); power(5.6, 0) returns 1.0
4. power(5.6, 1), power(5.6, 2); power(5.6, 1) returns 5.6
5. power(5.6, 2); returns 31.36
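
A recursive function that would produce this sequence of calls might look like the sketch below. The slides only show the calls `power(5.6, 2)`, `power(5.6, 1)` and `power(5.6, 0)`, so the exact signature and body here are assumptions.

```cpp
// x raised to a non-negative integer power n.
// Every pending call keeps its own activation record on the run-time stack.
double power(double x, unsigned n) {
    if (n == 0) return 1.0;      // base case: the input has become trivial
    return x * power(x, n - 1);  // simplify the input by one step
}
```

The multiplication happens after the inner call returns, which is why the three activation records pile up before any of them can be popped.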

Page 29: Data structures

• An AR is formed on the run-time stack and is private to a method.
• There is only one run-time stack.

[Figure: the stack pointer moving as activation records are pushed and popped.]

Page 30: Data structures

Advantages/Disadvantages

1. More readable/understandable/consistent with the definition
2. Memory requirements increase due to the run-time stack
3. Difficult to trace and debug

Page 31: Data structures

Types of Recursion

• Tail (vs loop?)

    int factn = 1;
    while (n > 0) factn *= n--;

• Indirect
  • A() -> B() -> C() -> A()

• Nested:
  • h(n) = h(2 + h(n-1))
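
To compare tail recursion with the loop, here is a sketch of both forms of factorial. The names `factTail` and `factLoop` and the accumulator parameter `acc` are assumptions introduced for this example.

```cpp
// Tail-recursive factorial: the recursive call is the last action performed,
// and the running product is carried along in `acc`.
unsigned long long factTail(unsigned n, unsigned long long acc = 1) {
    if (n == 0) return acc;
    return factTail(n - 1, acc * n);
}

// The equivalent loop: `factn` plays the role of the accumulator.
unsigned long long factLoop(unsigned n) {
    unsigned long long factn = 1;
    while (n > 0) factn *= n--;
    return factn;
}
```

Because nothing remains to be done after the recursive call returns, a tail-recursive function can always be rewritten mechanically as a loop like this one.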

Page 32: Data structures

Types of Recursion

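• Excessive: exponential time complexity!

\[
Fib(n) =
\begin{cases}
0 & \text{if } n = 0 \\
1 & \text{if } n = 1 \\
Fib(n-1) + Fib(n-2) & \text{if } n \ge 2
\end{cases}
\]

• Questionable: will it terminate??

\[
f(n) =
\begin{cases}
1 & \text{if } n = 1 \\
f(n/2) & \text{if } n \text{ is even} \\
f(3 \cdot n + 1) & \text{otherwise}
\end{cases}
\]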
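
Both cases can be explored with a short sketch. The call counter `calls`, the helper name `stepsToOne` and its step cap `maxSteps` are assumptions added for this example; the second function follows the same rule as f but iterates instead of recursing and gives up after a fixed number of steps, since termination is exactly what is in question.

```cpp
#include <cstdint>

// Naive recursive Fibonacci; `calls` counts how many invocations were made.
std::uint64_t fib(unsigned n, std::uint64_t& calls) {
    ++calls;
    if (n < 2) return n;
    return fib(n - 1, calls) + fib(n - 2, calls);
}

// Applies the rule of f (halve if even, otherwise 3n + 1) until n reaches 1.
// Returns the number of steps taken, or -1 if `maxSteps` was not enough.
long stepsToOne(std::uint64_t n, long maxSteps) {
    long steps = 0;
    while (n != 1) {
        if (steps == maxSteps) return -1;
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}
```

Computing Fib(10) this way already takes 177 calls, because the same subproblems are recomputed many times.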

Page 33: Data structures

Hashes

Page 34: Data structures

Why?

• Want to store dictionaries?, associative arrays?
  • arrays with non-numerical indices

• String operations made easy
  • Ex: Finding anagrams
  • Ex: Counting frequency of words in a string

Page 35: Data structures

Associative Arrays

• (key, value) pairs where the key is not necessarily a non-negative integer; it can be a string etc.

• Ex: no. of students in each department
  • “cse” => 68
  • “eee” => 120
  • “mech” => 70
  • “biotech” => 30

• Do not allow duplicate keys, so the second assignment below would replace the first:
  • Dict(“cse”) = “data structures”
  • Dict(“cse”) = “algorithms”

• To keep both, store a collection as the value: Dict(“cse”) = {“data structures”, “algorithms”}
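
A small sketch of both ideas using the standard library. Note that `std::map` is a balanced tree rather than a hash table; it is used here only because it offers the same (key, value) interface. The names `wordFrequencies`, `MultiDict` and `addEntry` are hypothetical.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Count how often each whitespace-separated word occurs in `text`.
std::map<std::string, int> wordFrequencies(const std::string& text) {
    std::map<std::string, int> freq;
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++freq[word];  // operator[] creates a zero count on first use
    return freq;
}

// Keys stay unique; each key maps to a set of values instead of a single value.
using MultiDict = std::map<std::string, std::set<std::string>>;

void addEntry(MultiDict& dict, const std::string& key, const std::string& value) {
    dict[key].insert(value);
}
```

Adding “data structures” and then “algorithms” under the key “cse” leaves one key with two values rather than overwriting the first.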

Page 36: Data structures

Hash Functions

1. HashTable: an array of fixed size
  • TableSize - preferably prime and large
2. Hash function (maps a key to an index of the HashTable)

Techniques
  • use all characters
  • use aggregate properties - length, frequencies of characters
  • first 3 characters, odd characters

Evaluation
  • Uniform distribution; load factor λ?
  • Utilize table space
  • Quickly computable

Page 37: Data structures

3. Collision resolution

1. Separate chaining
  • Linked list at each index
  • Insertion (at head?)
  • Desired length of a chain: close to λ
  • Avg. time for a successful search = 1 + (1 + λ/2), i.e. evaluating the hash function plus about 1 + λ/2 links traversed
  • Disadvantages
    • slow?
    • different data structures - array/linked lists?
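
Separate chaining can be sketched as an array of linked lists. The class name `ChainedHashTable`, the default table size 101 and the multiplier 37 in the hash function are all assumptions chosen for this example.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hash table from string keys to int values, one linked list per table index.
// Assumes tableSize > 0.
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t tableSize = 101)
        : buckets_(tableSize), count_(0) {}

    // Overwrite the value of an existing key, otherwise insert at the head of the chain.
    void put(const std::string& key, int value) {
        auto& chain = buckets_[indexOf(key)];
        for (auto& kv : chain) {
            if (kv.first == key) { kv.second = value; return; }
        }
        chain.emplace_front(key, value);
        ++count_;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        const auto& chain = buckets_[indexOf(key)];
        for (const auto& kv : chain) {
            if (kv.first == key) return &kv.second;
        }
        return nullptr;
    }

    // Load factor: stored keys divided by table size (the average chain length).
    double loadFactor() const {
        return static_cast<double>(count_) / buckets_.size();
    }

private:
    // Hash function that uses all characters of the key.
    std::size_t indexOf(const std::string& key) const {
        std::size_t h = 0;
        for (unsigned char c : key) h = h * 37 + c;
        return h % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> buckets_;
    std::size_t count_;
};
```

Inserting at the head keeps insertion O(1) once the chain has been checked for the key; a search costs time proportional to the length of one chain.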

Page 38: Data structures

2. Open addressing
  • Single table
  • Desired λ ~ 0.5
  • Apply h0(x), h1(x), h2(x) … in turn until a free slot is found
  • hi(x) = (h(x) + f(i)) mod TableSize; f(0) = 0

3 ways to do it
  1. Linear probing: f(i) is linear in i
    • f(i) = i (quickly computable vs primary clustering?)
  2. Quadratic probing: f(i) is quadratic in i
  3. Double hashing: f(i) = i · h2(x), where h2 is a second hash function
    • hi(x) = (h(x) + i · h2(x)) mod TableSize

Rehashing
  • What if the table gets full (70%, …. , 100%)
  • Create a new HashTable double? the size and re-insert every element
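
Linear probing with rehashing can be sketched for a set of non-negative integers. Everything named here (`ProbingSet`, `kEmpty`, the growth rule 2m + 1 and the λ ≤ 0.5 threshold) is an assumption made for the example; deletion is left out because open addressing needs extra care for it.

```cpp
#include <cstddef>
#include <vector>

const int kEmpty = -1;  // marker for a free slot; keys are assumed to be non-negative

// Set of non-negative ints using open addressing with linear probing, f(i) = i.
// Assumes tableSize > 0.
class ProbingSet {
public:
    explicit ProbingSet(std::size_t tableSize = 11)
        : slots_(tableSize, kEmpty), count_(0) {}

    void insert(int x) {
        if (contains(x)) return;
        if (2 * (count_ + 1) > slots_.size()) rehash();  // keep the load factor at most 0.5
        place(slots_, x);
        ++count_;
    }

    bool contains(int x) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            int s = slots_[probe(x, i, slots_.size())];
            if (s == kEmpty) return false;  // an empty slot ends the probe sequence
            if (s == x) return true;
        }
        return false;
    }

    std::size_t capacity() const { return slots_.size(); }

private:
    // h_i(x) = (h(x) + i) mod m, with h(x) = x mod m.
    static std::size_t probe(int x, std::size_t i, std::size_t m) {
        return (static_cast<std::size_t>(x) % m + i) % m;
    }

    // Put x into the first free slot of its probe sequence (the table is never full).
    static void place(std::vector<int>& table, int x) {
        for (std::size_t i = 0;; ++i) {
            std::size_t p = probe(x, i, table.size());
            if (table[p] == kEmpty) { table[p] = x; return; }
        }
    }

    // Rehashing: build a table of roughly double the size and re-insert every key.
    void rehash() {
        std::vector<int> bigger(2 * slots_.size() + 1, kEmpty);
        for (int s : slots_) {
            if (s != kEmpty) place(bigger, s);
        }
        slots_.swap(bigger);
    }

    std::vector<int> slots_;
    std::size_t count_;
};
```

Keys that hash to the same index end up in consecutive slots, which is the primary clustering mentioned above.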

Page 39: Data structures

Structure

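    template <class T>
    class Hash {
        int TableSize;  // number of slots in the table
        T *arr;         // the table itself
    };

Useful Libraries

    #include <unordered_map>  // standard hash table since C++11; there is no standard <hash> header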

Page 40: Data structures

Practice (Hashes)
• Trie 7)

Page 41: Data structures

Graphs

Page 42: Data structures

What is it?

In simple words, G = (V, E), where
• V = {v0, v1, v2, v3, .. vn} is the set of nodes
• E = {e0, e1, e2, e3 .. em} is the set of edges

*Any tree is also a graph T = (V, E), so most techniques in graph algorithms apply to trees as well.

[Figure: an example graph on the nodes v0, v1, v2 and v3.]

Page 43: Data structures

Representation

1. Adjacency Matrix (|V| * |V|)
2. Adjacency List
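
Both representations can be built from the same edge list. The sketch below assumes an undirected graph whose nodes are numbered 0 to n-1; the names `AdjMatrix`, `AdjList`, `EdgeList`, `buildMatrix` and `buildList` are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using AdjMatrix = std::vector<std::vector<bool>>;
using AdjList = std::vector<std::vector<std::size_t>>;
using EdgeList = std::vector<std::pair<std::size_t, std::size_t>>;

// |V| * |V| matrix: O(1) edge test, O(|V|^2) space.
AdjMatrix buildMatrix(std::size_t n, const EdgeList& edges) {
    AdjMatrix m(n, std::vector<bool>(n, false));
    for (const auto& e : edges) {
        m[e.first][e.second] = true;
        m[e.second][e.first] = true;  // undirected: store both directions
    }
    return m;
}

// One neighbour list per node: O(|V| + |E|) space.
AdjList buildList(std::size_t n, const EdgeList& edges) {
    AdjList adj(n);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    return adj;
}
```

The matrix suits dense graphs and fast edge queries; the list suits sparse graphs and traversals that visit each node's neighbours.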

Page 44: Data structures

Breadth First Traversal (BFT)

• Traverse the nodes level by level: nodes at depth 0 before nodes at depth 1 before nodes at depth 2 ....

• Done using a queue
• Ex: 1,2,3,4,5,7,8,6
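
A queue-based traversal might be sketched as follows, assuming the graph is given as an adjacency list with nodes numbered from 0. The function name `bfsOrder` is an assumption.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Returns the nodes reachable from `start` in breadth-first order.
std::vector<std::size_t> bfsOrder(const std::vector<std::vector<std::size_t>>& adj,
                                  std::size_t start) {
    std::vector<std::size_t> order;
    if (start >= adj.size()) return order;

    std::vector<bool> visited(adj.size(), false);
    std::queue<std::size_t> pending;
    visited[start] = true;
    pending.push(start);

    while (!pending.empty()) {
        std::size_t u = pending.front();
        pending.pop();
        order.push_back(u);
        for (std::size_t v : adj[u]) {
            if (!visited[v]) {      // mark when enqueued so no node is queued twice
                visited[v] = true;
                pending.push(v);
            }
        }
    }
    return order;
}
```

The queue guarantees that every node at depth d leaves it before any node at depth d + 1.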

Page 45: Data structures

Depth First Traversal (DFT)

• Move to the next child only after all nodes reachable through the current child are marked (visited)

• Done using a stack
• Ex: a, b, c, d, e, h, f, g
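
The same traversal with an explicit stack instead of recursion could look like this sketch. It again assumes an adjacency list with nodes numbered from 0, and `dfsOrder` is a hypothetical name.

```cpp
#include <cstddef>
#include <stack>
#include <vector>

// Returns the nodes reachable from `start` in depth-first (preorder) order.
std::vector<std::size_t> dfsOrder(const std::vector<std::vector<std::size_t>>& adj,
                                  std::size_t start) {
    std::vector<std::size_t> order;
    if (start >= adj.size()) return order;

    std::vector<bool> visited(adj.size(), false);
    std::stack<std::size_t> pending;
    pending.push(start);

    while (!pending.empty()) {
        std::size_t u = pending.top();
        pending.pop();
        if (visited[u]) continue;  // a node may have been pushed more than once
        visited[u] = true;
        order.push_back(u);
        // Push neighbours in reverse so the first neighbour is explored first.
        for (auto it = adj[u].rbegin(); it != adj[u].rend(); ++it) {
            if (!visited[*it]) pending.push(*it);
        }
    }
    return order;
}
```

A recursive version uses the run-time stack for the same purpose.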

Page 46: Data structures

Trees (Advanced)

Page 47: Data structures

Trie (from “retrieval”)

• Stores the prefixes of a set of strings in an efficient manner
• Used to store associative arrays/dictionaries

Page 48: Data structures

How to create a Trie

• Ex: tin, ten, ted, tea, to, i, in, inn
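
A trie for these words can be sketched with one node per prefix. The class layout below (a `std::map` of children per node, an `isWord` flag, and the method names) is an assumption for illustration.

```cpp
#include <map>
#include <memory>
#include <string>

class Trie {
public:
    // Walk down from the root, creating nodes for characters not seen before.
    void insert(const std::string& word) {
        Node* cur = &root_;
        for (char c : word) {
            auto& child = cur->children[c];
            if (!child) child.reset(new Node());
            cur = child.get();
        }
        cur->isWord = true;  // mark the end of a complete word
    }

    // True only if `word` itself was inserted.
    bool contains(const std::string& word) const {
        const Node* n = walk(word);
        return n != nullptr && n->isWord;
    }

    // True if some inserted word starts with `prefix`.
    bool hasPrefix(const std::string& prefix) const {
        return walk(prefix) != nullptr;
    }

private:
    struct Node {
        bool isWord = false;
        std::map<char, std::unique_ptr<Node>> children;
    };

    // Follow `s` from the root; nullptr if the path does not exist.
    const Node* walk(const std::string& s) const {
        const Node* cur = &root_;
        for (char c : s) {
            auto it = cur->children.find(c);
            if (it == cur->children.end()) return nullptr;
            cur = it->second.get();
        }
        return cur;
    }

    Node root_;
};
```

After inserting the eight example words, “ten”, “ted” and “tea” share the nodes for “t” and “te”, and “i”, “in” and “inn” lie on a single path.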

Page 49: Data structures

Pairs of anagrams

• Sort the characters of each string
  • acute -> acetu
  • obtuse -> beostu … etc

• Insert the sorted strings into the trie as keys
  • Keep storing collisions, i.e. multiple values (the original strings) for each key
  • Each set of values gives a group of anagrams
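
The grouping step can be sketched as below. To keep the example short, a `std::map` stands in for the trie as the key-to-values store; the idea of using the sorted string as the key is the same. The function name `groupAnagrams` is an assumption.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Maps each sorted-character key to all the words that share it.
std::map<std::string, std::vector<std::string>> groupAnagrams(
        const std::vector<std::string>& words) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& w : words) {
        std::string key = w;
        std::sort(key.begin(), key.end());  // anagrams produce identical keys
        groups[key].push_back(w);
    }
    return groups;
}
```

Any key that ends up with two or more values identifies a group of anagrams.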

Page 50: Data structures

Suffix Tree/Patricia/Radix Tree

• Stores the suffixes of a string
• O(n) space and time to build
• Does not exist for all strings (e.g. when one suffix is a prefix of another); add a special symbol $ at the end

Page 51: Data structures

Advantages of Suffix Trees

• Store n suffixes in O(n) space.
• Improved string operations, e.g. substring lookup, longest common substring operation (generalized suffix trees?)

Generalized Suffix Trees
• Each string terminated by a different special symbol
• More space efficient
• Have different set of algorithms

Page 52: Data structures

Longest Common Substring

1. Make a “generalized suffix tree” for the (2?) strings
2. Traverse the tree to mark every internal node as 1, 2 or (1,2), depending on whether its subtree contains a leaf terminating with the special symbol of string 1, of string 2, or of both
3. Find the deepest internal node (by string depth) marked (1,2); the path from the root to it spells the longest common substring

Pattern Matching ?
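
A generalized suffix tree is too long to show here, so the sketch below uses a different and much simpler technique: an O(n·m) dynamic programme over the two strings. It is not the suffix tree method described above, but it computes the same answer and can serve as a reference when testing a suffix tree implementation. The function name `longestCommonSubstring` is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Longest common substring of a and b by dynamic programming.
// cur[j] is the length of the longest common suffix of a[0..i) and b[0..j).
std::string longestCommonSubstring(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    std::size_t best = 0;     // length of the best match found so far
    std::size_t bestEnd = 0;  // position in a just past the end of that match

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            if (cur[j] > best) {
                best = cur[j];
                bestEnd = i;
            }
        }
        prev.swap(cur);  // the row just computed becomes the previous row
    }
    return a.substr(bestEnd - best, best);
}
```

When several common substrings share the maximum length, this sketch returns the one that ends earliest in the first string.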