COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1...

36
COMP3600/6466 – Algorithms Introduction [CLRS ch. 1, sec. 2.1, 2.2] Hanna Kurniawati https://cs.anu.edu.au/courses/comp3600/ [email protected]

Transcript of COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1...

Page 1: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

COMP3600/6466 – Algorithms Introduction

[CLRS ch. 1, sec. 2.1, 2.2]

Hanna Kurniawati

https://cs.anu.edu.au/courses/comp3600/[email protected]

Page 2: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Today•What is Algorithms?• Setting up the stage• The Problems we’ll focus on:• Sorting• Searching• Model of Computation• Math refresher

Page 3: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

What is Algorithms?• An algorithm is a well-defined

computational procedure (i.e., computational steps) that transforms inputto output for solving a well-specifiedcomputational problem•Well-defined: Can specify when & where

the procedure works and when it is not• Correctness• Space & time requirements

Page 4: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

What is this class about?

The basics of Algorithms Design & Analysis

• Know and understand various algorithm design techniques

• Develop the basic skills to design effective & efficient (in time & space requirements) algorithms

• How to predict the time & space an algorithm required given various input size?

• Is an algorithm correct? (in this class, only briefly)

Page 5: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Why bother? • Computer capability improves fast (Moore’s Law:

Doubled every 18 month), why don’t we just wait?

Page 6: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

An Example: Breadth First Search depth # Nodes Time Memory

2 111 .11 msec 107 Kbytes4 11,111 11 msec 10.6 Mbytes6 ~106 1 sec 1 Gbytes8 ~108 ~2 min 103 Gbytes10 ~1010 ~2.8 hours 10 Tbyte12 ~1012 ~11.6 days 1 Pbytes14 ~1014 ~3.2 years 99 Pbytes

Assume: Branching factor = 10, can visit 1 million nodes/sec, use 1 Kbytes/node

Taken from: Russell & Norvig, Artificial Intelligence: A Modern Approach

Page 7: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Why bother? • Computer capability improves fast (Moore’s Law:

Doubled every 18 month), why don’t we just wait?• The waiting time for the hardware may be more than the

waiting time to solve the problem…• Just parallelize, right? • Parallelization provides increased computation linearly, while

many problem complexity increases exponentially• Note: Of course, the above does not mean that hardware

advances do NOT matter. Algorithms + its software implementation and hardware must go hand in hand. But, in this class, we will focus on the algorithms.

Page 8: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Why bother?•We are good at utilizing available

resources and more

Velodyne LiDAR – generates 8GB data/sec

Page 9: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

TodayüWhat is Algorithms?• Setting up the stage• The Problems we’ll focus on:• Sorting• Searching• Model of Computation• Math refresher

Page 10: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

The Problems• Recall: An algorithm is a well-defined

procedure to solve well-specified problems• In this class, we will focus only on 2

classes of problems:• Sorting • SearchingThese problems are building blocks of almost any program you see in practice today

Page 11: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Sorting – Problem Specification• Input: A sequence of n numbers [a1, a2, …, an]• Output: A reordering [ a1’, a2’, …, an’] of the

input sequence such that a1 ≤ a2 ≤… ≤ an (for ascending order)

Page 12: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Where is sorting used?• Often sorting is used as a building block of a bigger

program. For instance:• Searching. Binary search can run faster once the keys are

sorted. Internet search (e.g., Google) also uses sort to present the most relevant results first

• Closest pair. Given n numbers, which two have the smallest different between them? Once the numbers are sorted, the closest pair must be next to each other. Therefore, a linear-time scan will find the solution. Closest pair problem is often encountered in various applications, including in machine learning, computational geometry, etc.

• More in Applications of Sorting in the class website (schedule à W1 readings)

Page 13: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Searching – Problem Specification• Input: A bounded set 𝑋 with relation 𝑅 among its

elements, and a solution criteria 𝐶• Output: An element of 𝑋, 𝑥 ∈ 𝑋, that satisfies the

criteria 𝐶• Note: The above definition is very generic. In many

problems, we’ll be more specific. For instance:• What type of space is 𝑋? (e.g., is it countable or

uncountable?) What type of relation exists among the elements? Different algorithms are more suitable for different types of 𝑋 and 𝑅.

• The solution criteria will also influence the type of algorithms we should use to be efficient.

Page 14: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Searching – A Problem Specification• An example for a specific search problem that will be

used often in this class• Input: A sequence of n numbers A = (a1, a2, …, an)

and a value v• Output: An index i such that v = A[i] or the special

value null if v does not appear in A

Page 15: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Where is searching used?•More visible than sorting. For instance:• Internet search, search in your email, search in a

document, etc.• Optimization. Computation wise, optimization

problem is a search problem (i.e., finding a value that is the largest/smallest). Many are search in continuous space (e.g., in real number space).Since almost (if not all) problems can be framed as an optimization problem, they will eventually become a search problem J

Page 16: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

A bit more about problems• Recall: An algorithm solves well-specified problems• In reality, problems do not come in well-specified form.

Someone (usually the algorithm designer) needs to formulate the problem into a well-specified problem that an algorithm can solve • In fact, good problem formulation is usually half the

solution• In this class, esp. from tutorial questions and

assignments, we’ll learn how to formulate problems too• Yes, you’ll need to do this in your assignments J

Page 17: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

TodayüWhat is Algorithms?• Setting up the stageüThe Problems we’ll focus on:

üSortingüSearching• Model of Computation• Math refresher

Page 18: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Analysing Algorithms • Recall that analysing algorithms mean

computing computational resources resources (typically, time and space) required to run the algorithm on a certain input• To do the above analysis, we need to first

know the machine where the algorithms are executed• Model of computation

Page 19: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Model of Computation• In this class, we’ll assume:• An abstract generic one-processor Random

Access Machine (RAM) • Abstract: We don’t consider memory swapping,

garbage collection, etc.• One-processor RAM: Instructions are executed one

after another (no concurrent operations)

• To store numbers, RAM has integer and float data type. Each data type in RAM has a limit. We will introduce as necessary

Page 20: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Model of Computation• Have the following primitive instructions (each

primitive instruction takes constant time)Arithmetic: Add, subtract, multiply, divide, mod, floor, ceilData movement: Load, store, copyControl: Conditional & unconditional branching, subroutine

call and return

Page 21: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Analysing Algorithms Executed in a RAM: Example on Time Analysis • Sorting problem:• Input: A sequence of n numbers [a1, a2, …, an]• Output: A reordering [ a1’, a2’, …, an’] of the input

sequence such that a1 ≤ a2 ≤ … ≤ an• InsertionSort(A)

1. for j = 2 to A.length2. Key = A[j]3. i = j-14. While i > 0 and A[i] > key5. A[i+1] = A[i]6. i = i-17. A[i+1] = key

Page 22: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Illustration of Insertion Sort

20 5 10 8 7 15 12 3

Key = 5, j = 2

20 20 10 8 7 15 12 3

20 20 10 8 7 15 12 3

Key = 5, j = 2

i = 1

i = 0

5 20 10 8 7 15 12 3A[i+1] = Key

Page 23: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Illustration of Insertion Sort

5 20 10 8 7 15 12 3

Key = 10, j = 3

5 20 20 8 7 15 12 3

5 20 20 8 7 15 12 3

Key = 10, j = 3

i = 2

i = 1

5 10 20 8 7 15 12 3A[i+1] = Key

Correctness? We’ll discuss after Asymptotic Analysis

Page 24: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Some Admin• Tutorial group selection has opened this morning. So

far 237 of 316 have registered. Please don’t forget to register before Friday, 31 July 23:59• Class piazza, so far 204 of 316 have registered.

Please register ASAP• We’ll need 3 class rep. If you’re interested, please

send an email to [email protected] by Friday 31 July 23:59 with:• Subject: class rep • Your CV and transcript in .pdf as attachment

Page 25: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

InsertionSort Cost Times1 for j = 2 to A.length2 Key = A[j]3 i = j-14 While i > 0 and A[i] > key5 A[i+1] = A[i]6 i = i-17 A[i+1] = key

Total time 𝑇 𝑛 : sum of cost X times

Back to time analysis

Page 26: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

InsertionSort Cost Times1 for j = 2 to A.length c1 n2 Key = A[j] c2 n-13 i = j-1 c3 n-14 While i > 0 and A[i] > key c4

!!"#

$

𝑡!

5 A[i+1] = A[i] c5!!"#

$

𝑡! − 1

6 i = i-1 c6!!"#

$

𝑡! − 1

7 A[i+1] = key c7 n-1

𝑡!: #times the test in line-4 is executed for the value 𝑗Total time 𝑇 𝑛 : sum of cost X times

Page 27: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

𝑇 𝑛

= 𝑐"𝑛 + 𝑐# + 𝑐$ 𝑛 − 1 + 𝑐%,!&#

'

𝑡! + 𝑐( + 𝑐) ,!&#

'

𝑡! − 1

+ 𝑐* 𝑛 − 1

In the best case (the array is already sorted), 𝑡! = 1 for all 𝑗In the worst case, 𝑡! = 𝑗Let’s compute for the worst case. Then, the summations become

,!&#

'

𝑗 =𝑛 − 12

2 + 𝑛

#elements in the sequence

1st element in the sequence

last element in the sequence

Page 28: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

𝑇 𝑛

= 𝑐"𝑛 + 𝑐# + 𝑐$ 𝑛 − 1 + 𝑐%,!&#

'

𝑡! + 𝑐( + 𝑐) ,!&#

'

𝑡! − 1

+ 𝑐* 𝑛 − 1

Let’s computer for the worst case, then the summations become: ∑!&#' 𝑡! = ∑!&#' 𝑗 = '+"

#2 + 𝑛 = '!,'

#− 1

And ∑!&#' 𝑡! − 1 = ∑!&#' (𝑗 − 1) = '+"#

1 + 𝑛 − 1 = '+"#𝑛

Now, just need to plug them in to 𝑇 𝑛 :

𝑇 𝑛

= 𝑐"𝑛 + 𝑐# + 𝑐$ 𝑛 − 1 + 𝑐%𝑛# + 𝑛2

− 1 + 𝑐( + 𝑐)𝑛 − 12

𝑛

+𝑐* 𝑛 − 1

Page 29: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Now, we can group the terms as follows:𝑇 𝑛

= 𝑛#𝑐% + 𝑐( + 𝑐)

2+ 𝑛 𝑐" + 𝑐# + 𝑐$ +

𝑐%2−

𝑐( + 𝑐)2

+ 𝑐*− 𝑐# + 𝑐$ + 𝑐% + 𝑐*= 𝑛#𝐶 + 𝑛𝐶- − 𝐶′′

Where

𝐶 =𝑐% + 𝑐( + 𝑐)

2𝐶- = 𝑐" + 𝑐# + 𝑐$ +

."#− .#,.$

#+ 𝑐*

𝐶-- = 𝑐# + 𝑐$ + 𝑐% + 𝑐*

Page 30: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

TodayüWhat is Algorithms?• Setting up the stageüThe Problems we’ll focus on:

üSortingüSearching

üModel of Computation• Math refresher

Page 31: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Source: [Lev] Appendix A

Page 32: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Properties of logarithmsIn this class, unless

otherwise stated, the base of the log is 2, i.e.: log 𝑥 = log# 𝑥

Source: [Lev] Appendix A

Page 33: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Examples• Can we transform 2! "#$ % into a polynomial form?

• Can we transform 𝑛 % into the form of a power of 2?

Page 34: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

Examples• Can we transform 2! "#$ % into a polynomial form?

Yes: 2! "#$ % = 2"#$ %/ = 2"#$ %/0 = 𝑛

/0"#$ &

= 𝑛/0

• Can we transform 𝑛 % into the form of a power of 2?

Yes: 𝑛 % = 2"#$ % % = 2 % "#$ %

Property #3 Property #6 =1

Property #6

Page 35: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

You need to be comfortable with the basic math (basic transformations as in previous slides, basic arithmetic and algebra + basic calculus + basic probability). If not, please catch up NOW!!!Some math background materials from [CLRS] and [Lev] are in the notes of W1

Page 36: COMP3600/6466 –Algorithms - cs.anu.edu.au€¦ · 4 11,111 11 msec 10.6 Mbytes 6 ~106 1 sec 1 Gbytes 8 ~108 ~2 min 103 Gbytes ... •Closest pair. Given n numbers, which two have

TodayüWhat is Algorithms?üSetting up the stageüThe Problems we’ll focus on:

üSortingüSearching

üModel of ComputationüMath refresher

Next: Asymptotic Analysis