CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail:...

23
CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: [email protected] Course Web site: http://ravi.cs.sonoma.edu/cs315sp08
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    215
  • download

    1

Transcript of CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail:...

Page 1: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

CS 315 Data Structures

B. Ravikumar Office: 116 I Darwin Hall

Phone: 664 3335E-mail: [email protected]

Course Web site: http://ravi.cs.sonoma.edu/cs315sp08

Page 2: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Textbook for the course:

Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss

Lab: T 1 to 2:50 PM, Darwin Hall # 28

Page 3: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Course Goals • Learn to use fundamental data structures:

• arrays, linked lists, stacks and queues• hash table• priority queue• binary trees• others

• Enhance your skill in programming in c++• recursion, classes, algorithm implementation• build projects using different data structures

• Analytical and experimental analysis• quantitative reasoning about the performance of algorithms• comparing different data structures

Page 4: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Goals for today’s lecture

• Course outline

• Discuss Course work• lab assignments• projects• tests, final exam

• If any time left, we will start discussion of recursion. (will continue it in the lab).

Page 5: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Data Structures – key to software design

• Data structures play a key role in every type of software.

•Data structure deals with how to store the data internally while solving a problem in order to

•Optimize the overall running time of a program•Optimize the response time (for queries)•Optimize the memory requirements•Optimize other resources (e.g. network)• Simplify software design• make solution extendible, more robust

Page 6: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Abstract vs. concrete data structures

Abstract data structure (sometimes called ADT) is a collection of data with a set of operations supported to manipulate the structure

Examples:• stack, queue• priority queue• Dictionary

Concrete data structures are the implementations of abstract data structures: • Arrays, linked lists, trees, heaps, hash table

Main emphasis of the course: Find the best mapping between abstract and concrete data structures.

Page 7: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Abstract Data Structure (ADT)container supporting operations

•Dictionary•search •insert primary operations•Delete

•deleteMin•Range search•Successor secondary operations•Merge

•Priority queue•Insert•deleteMin •Merge, split etc. Secondary operations

primary operations

Page 8: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Linear data structures

• key properties of the (1-d) array:

• a sequence of items are stored in consecutive memory locations. • array provides a constant time access to k-th element for any k. (access the element by: Element[k].)• inserting at the end is easy. if the current size is S, then we can add x at the end using the single instruction: Element[S++] = x;• deleting at the end is also easy.• inserting or deleting at any other position is expensive. • Even searching is expensive (unless sorted).

Page 9: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Images are stored in 2-d arrays

We will work on 2-d arrays by manipulating images:• Each pixel is represented by a blue value, a red value and a

green value (any color is a combination of these colors). (255, 255, 255) represents white, (255, 0, 0) represents red etc.• pic(i , j)-> Blue represents the blue component of the i-th

row, j-th column pixel of pic and so on.• Some basic operations on images:

• open, read, write• rotate, mirror image • filter (remove blemishes)• extract features (identify where buildings are in an

aerial photograph)

Page 10: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Linked lists

Linked lists:• Storing a sequence of items in non-

consecutive locations of the memory.• Not easy to search for a key (even if

sorted).• Inserting next to a given item is easy.• In doubly linked list, inserting before or

after a given item is easy.• Don’t need to know the number of items

in advance. (dynamic memory)

order is important

Page 11: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

stacks and queues

• stacks:• insert and delete at the same end.• equivalently, last element inserted will be the first one to be deleted.• very useful to solve many problems

•Processing arithmetic expressions

• queues:• insert at one end, deletion at the other end.• equivalently, first element inserted is the first one to be deleted.

Page 12: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Non-linear data structures

Various versions of trees• Binary search trees• Height-balanced trees etc.

Lptr key Rptr

15

Main purpose of a binary search tree supportsdictionary operations efficiently

Page 13: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Priority queue Max priority key is the one that gets deleted

next.• Equivalently, support for the following

operations:• insert• deleteMin

Useful in solving many problems• fast sorting (heap-sorting)• shortest-path, minimum spanning tree,

scheduling etc.

Page 14: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Hashing • Supports dictionary operations very efficiently (most of the time).

• Main advantages:•Simple to design, implement• on average very fast• not good in the worst-case.

Page 15: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

What data structure to use?

Example 1: There are more than 1 billion web pages. When you type on google something like:

You get instantaneous response. What kind of data structure is used here?

• The details are quite complicated, but the main data structure used is quite simple.

Page 16: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Data structure used - inverted index

Array of lists – each array entry contains a word and a pointer to all the web pages that contain that word:

Data structure38 97 145 297

This list is kept sorted

876

Question: How do we access the array index from key word?Hashing or some other clever data structure is needed.

Page 17: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Example 2: The entire landscape of the world is being digitized (there is a whole new branch that combines information technology and geography called GIS – Geographic Information System). What kind of data structure should be used to store all this information?

SnapshotfromGoogleearth

Page 18: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Some general issues related to GIS

• How much memory do we need? Can this be stored in one computer? (or need a distributed data base?)

Building the database is done in the background (off-line processing)

• How fast can the queries be answered?

Response to query is called the on-line processing

• Suppose each square mile is represented by a 1024 by 1024 pixel image, how much storage do we need to store the terrain of United States?

Page 19: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Calculate the memory needed

Very rough estimate of the memory needed:

•Area of USA is 4 x 106 sq miles (roughly)

•Each square mile needs 106 pixels (roughly)

•Each pixel requires 32 bits usually.

Thus the total memory needed = 4 x 106 x 32 x 106 = 168 x 1012 = 168000 Giga bits

(A standard desk top has ~ 200 Giga bits of memory.)

Need about 800 such computers to store the data

Page 20: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

What data structure to store the images?

• each 1024 x 1024 image can be stored in a two-dimensional array. (standard way to store all kinds of images – bmp, jpg, png etc.) The actual images are stored in a secondary memory (hard disks on several servers either in a central location or distributed).

• The number of images would be roughly 4 x 106. A set of pointers to these images can be stored in a 1 (or 2) dimensional array.

• When you click on a point on the map, its index in the array is calculated.

• From that index, the image is accessed and sent by a network to the requesting client.

Page 21: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

Overview of the projects:

1) Generate all the poker hands More generally, given a set of N items and a

number k<= N, generate all possible combinations or permutations of k items.

(concept: recursion, arrays, lists)

2) Image manipulation: (concept: arrays, library, analysis of algorithm)

1) After filtering

Page 22: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

3) Spelling checker: Given a text file T, identify all the misspelled words in T.

Idea: build a hash table H of all the words in a dictionary, and search for each word of the text T in the table H.

(concept: hashing, string processing)

4) Scheduling problem: A set of tasks should be assigned to several identical machines. What is the maximum number of jobs that can be assigned?

Concept: priority queue (heap)

Page 23: CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: ravi93@gmail.edu Course Web site: .

5) Geometric computation problem – given a set of rectangles, determine the total area covered by them. Also report all the intersections between them.

Concept: binary search tree.