Algorithms Session 7 LBSC 790 / INFM 718B Building the Human-Computer Interface.

Algorithms

Session 7

LBSC 790 / INFM 718B

Building the Human-Computer Interface

Agenda: Weeks 7 and 8

• Questions

• Some useful algorithms

• Project

• Some useful data structures– Including Java implementations

Why Study Algorithms?

• Some generic problems come up repeatedly– Sorting– Searching– Graph traversal

• Need a way to compare alternative solutions• Reusing algorithms is easy and productive

– Focusing on the algorithm reveals the key ideas– Language and interface make reusing code hard

Sorting

• Given an array, put the elements in order– Numerical or lexicographic

• Desirable characteristics– Fast– In place (don’t need a second array)– Able to handle any values for the elements– Easy to understand

Insertion Sort

• Simple, able to handle any data

• Grow a sorted array from the beginning– Create an empty array of the proper size– Pick the elements one at a time in any order– Put them in the new array in sorted order

• If the element is not last, make room for it

– Repeat until done

• Can be done in place if well designed

Insertion Sort

90 11 27 37111631 4

Insertion Sort

90 11 27 37111631 4

90

Insertion Sort

90 11 27 37111631 4

9011

Insertion Sort

90 11 27

90

37111631 4

2711

Insertion Sort

90 11 27

31 90

37111631 4

2711

Insertion Sort

90 11 27

27 31 90

37111631 4

114

Insertion Sort

90 11 27

16 27 31 90

37111631 4

114

Insertion Sort

90 11 27

11 16 27 31 90

37111631 4

114

Insertion Sort

90 11 27

11 16 27 31 37 90

37111631 4

114

Insertion Sort

• Sorting can actually be done in place– Never need the same element in both arrays

• Every insertion can cause lots of copying– If there are N elements, need to do N insertions– Worst case is about N/2 copys per insertion– N elements can take nearly N operations to sort

• But each operation is very fast– So this is fine if N is small (20 or so)

2

Merge Sort

• Fast, able to handle any data– But can’t be done in place

• View the array as a set of small sorted arrays– Initially only the 1-element “arrays” are sorted

• Merge pairs of sorted arrays– Repeatedly choose the smallest element in each

– This produces sorted arrays that are twice as long

• Repeat until only one array remains

Merge Sort

90 11 27 37111631 4

9011

Merge Sort

90 11 27

27 31

37111631 4

9011

Merge Sort

90 11 27

27 31 4 16

37111631 4

9011

Merge Sort

90 11 27

27 31 4 16 11 37

37111631 4

9011

Merge Sort

11 27 31

27 31 4 16 11 37

90

9011

Merge Sort

11 27 31

27 31 4 16 11 37

37161190 4

9011

Merge Sort

11 27 31

11 16 27 31 37 90

37161190 4

114

Merge Sort

• Each array size requires N steps– But 8 elements requires only 3 array sizes

• In general, 2 elements requires k array sizes– So the complexity is N*log(N)

• No faster sort (based on comparisons) exists– Faster sorts require assumptions about the data

– There are other N*log(N) sorts, though• Merge sort is most often used for large disk files

k

Computational Complexity

• Run time typically depends on:– How long things take to set up– How many operations there are in each step– How many steps there are

• Insertion sort can be faster than merge sort– One array, one operation per step– But N*log(N) eventually beats N for large N

• And once it does, the advantage increases rapidly

2

Divide and Conquer

• Split a problem into simpler subproblems– Keep doing that until trivial subproblems result

• Solve the trivial subproblems

• Combine the results to solve a larger problem– Keep doing that until the full problem is solved

• Merge sort illustrates divide and conquer– But it is a general strategy that is often helpful

Recursion

• Divide and conquer problems are recursive– Solve the same problem at increasing granularity

• Construct a Java method to solve the problem– Divide the problem into subproblems– Call the same method to solve each subproblem

• Unless the subproblems are trivial

– Use the parameters to control the granularity

• See this week’s notes page for merge sort example

A Recipe for Recursion

• First, craft a divide and conquer strategy

• Create a non-recursive top-level method– Calls recursive method with initial parameters

• In the recursive method:– First solve the problem if it is trivial and return

• Be sure you eventually get here!

– Otherwise, split the problem and call itself– Combine the results and return them

Search

• Find something by following links– Web pages– Connections in the flight finder– Winning moves in chess

• This may seem like an easy problem– But computational complexity can get really bad– Simple tricks can help in some cases

Web Crawlers

• Goal is to find everything on the web– Build a balanced tree, sorted by search terms

• Start anywhere, follow every link

• If every page has 1 kB of text and 10 links– Then 10 levels would find a terabyte of data!

• Avoid links that are likely to be uninteresting

• Detect duplicates quickly with hashing

Outside-In Search

• Explore multiple paths between two points– Usually trying to find the best by some measure

• Flight finder searches like a web crawler– Every possible continuation of every route

• Also search backward from the destination• Assuming 10 departures per airfield:

– 3 connections takes Flight finder 10,000 steps

– 1 connection twice would take 200 steps

How to Win at Chess

• The paths are the legal moves– And the “places” are possible board positions

• You are seeking to make things better– Your opponent seeks to make things worse

• Such “zero sum games” are common– Although many lack chess’ shared information

• Any problem structure makes search easier– The trick is to exploit constraints effectively

Minimax Searching

• Decide how many half-moves to look ahead

• Develop a scoring strategy for final positions– Based on piece count and positional factors

• Follow a promising path– Helpful to guess the best moves for each side

• With several moves available, pick the best– But stop any search that can’t improve things

Minimax Search ExampleMin

Max

0 5

0

-3

-3

0

3 5

3

3

0

Traveling Salesperson Problem

• Find the cheapest way of visiting 10 cities– Given the airfare between every city pair– Only visit each city once, finish where you start

• There are only 90 city pairs– But there are a LOT of possible tours

• The best known algorithm is VERY slow– Because the problem is “NP complete”

NP Complete Problems

• No “polynomial time” algorithm is known– Not N , N , N …

• Haven’t proved that none exists– But if it does, many hard problems would be easy

• Approximate solutions with heuristic methods– Greedy methods– Genetic algorithms– Simulated annealing

2 3 4

What’s Wrong With Arrays?

• Must specify maximum size when declared– And the maximum possible size is always used

• Can only index with integers– For efficiency they must be densely packed

• Adding new elements is costly– If the elements are stored in order

• Every element must be the same type

What’s Good About Arrays?

• Can get any element quickly– If you know what position it is in

• Natural data structure to use with a loop– Do the same thing to different data

• Efficiently uses memory– If the array is densely packed

• Naturally encodes an order among elements

Linked Lists

• A way of making variable length arrays– In which insertions and deletions are easy

• Very easy to do in Java

• But nothing comes for free– Finding an element can be slow– Extra space is needed for the links– There is more complexity to keep track of

Making a Linked List

• In Java, all objects are accessed by reference– Object variables store the location of the object

• New instances must be explicitly constructed

• Add reference to next element in each object– Handy to also have a reference to the prior one

• Keep a reference to the first object– And then walk down the list using a loop

Linked List Example

Jill Joe Tom

first

Public static main (String[] argv) {Student first;

…}

Public class Student {int String name;public Student next;

}

Linked List Operations

• Add an element– Easy to put it in sorted order

• Examine every element– Just as fast as using an array

• Find just one element– May be as slow as examining every element

• Delete an element after you find it– Fast if you keep both next and prior links

Linked List Insertion

Jill Joe Tom

first

public void insert(String newName) {Student temp = first;boolean done = false;while (!done) {

if ((temp.next==null) || (temp.next.name.follows(newName))){

Student new = new Student(name, temp.next);

temp.next=new;done = true;

}temp = temp.next;

}}

Trees

• Linked list with multiple next elements– Just as easy to create as linked lists– Binary trees are useful for relationships like “<“

• Insertions and deletions are easy

• Useful for fast searching of large collections– But only if the tree is balanced

• Efficiently balancing trees is complex, but possible

Binary Tree Example

Jill

Joe

Tom

root Public class Student {int String name;public Student left;public Student right;

}

Data Structures in Java• Resizable array [O(n) insertion, O(1) access]:

– ArrayList

• Linked list [O(1) insertion, O(n) access, sorted]: – LinkedList

• Hash table [object index, unsorted, O(1)]: – HashSet (key only)– HashMap (key+value)

• Balanced Tree [object index, sorted, O(log n)]:– TreeSet (key only)– TreeMap (key+value)

Hashtables

• Find an element nearly as fast as in an array– With easy insertion and deletion– But without the ability to keep things in order

• Fairly complex to implement– But Java defines a class to make it simple

• Helpful to understand how it works– “One size fits all” approaches are inefficient

How Hashtables Work

• Create an array with enough room– It helps a lot to guess the right size first

• Choose a variable in each object as the key– But it doesn’t need to be an integer

• Choose a spot in the array for each key value– Using a fast mathematical function

– Best if things are scattered around well

• Choose how to handle multiple keys at one spot

Java HashMap Class

• Hashtables are objects like any other– import java.util.*– Must be declared and instantiated

• The constructor optionally takes a size parameter

• put(key, value) adds an element

• containsKey(key) checks for an element

• get(key) returns the “value” object for that key

Stacks

• Maintain an implicit order– Last In-First Out (LIFO)

• Easy additions and deletions– push and pop operations

• Maps naturally to certain problems– Interrupt a task to compute an intermediate value

• Implemented as a Java class– import java.util.*

Choosing a Data Structure

• What operations do you need to perform?– Reading every element is typically easy– Other things depend on the representation

• Hashing finds single elements quickly– But cannot preserve order

• Stacks and linked lists preserve order easily– But they can only read one element at any time

• Balanced trees are best when you need both

• Which operations dominate the processing?

Algorithms Session 7 LBSC 790 / INFM 718B Building the Human-Computer Interface.

Documents

Transcript of Algorithms Session 7 LBSC 790 / INFM 718B Building the Human-Computer Interface.