Chapter 8. Sorting in Linear Time - Fudan...

27
Chapter 8. Chapter 8. Sorting in Linear Time Sorting in Linear Time

Transcript of Chapter 8. Sorting in Linear Time - Fudan...

Page 1: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Chapter 8.Chapter 8.

Sorting in Linear TimeSorting in Linear Time

Page 2: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Types of Sort AlgorithmsTypes of Sort Algorithms

The only operation that may be used to gain order The only operation that may be used to gain order information about a sequence is comparison of pairs of information about a sequence is comparison of pairs of elements.elements.

Quick Sort Quick Sort ---- comparisoncomparison--basedbased

Insertion Sort Insertion Sort ---- comparisoncomparison--basedbased

Heap SortHeap Sort ---- comparisoncomparison--basedbased

Merge SortMerge Sort ---- comparisoncomparison--basedbased

Page 3: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Lower bounds for sortingLower bounds for sorting

Lower boundsLower bounds–– ΩΩ(n) (n) to examine all the input.to examine all the input.

–– All sorts seen so far are All sorts seen so far are ΩΩ(n (n lglg n).n).

–– We’ll show that We’ll show that ΩΩ(n (n lglg n) n) is a lower bound for comparison sorts.is a lower bound for comparison sorts.

Decision treeDecision treeDecision treeDecision tree–– Abstraction of any comparison sort.Abstraction of any comparison sort.

–– Represents comparisons made byRepresents comparisons made by

a specific sorting algorithma specific sorting algorithm

on inputs of a given size.on inputs of a given size.

–– Abstracts away everything else: control and data movement.Abstracts away everything else: control and data movement.

–– We’re counting We’re counting only only comparisonscomparisons..

Page 4: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 5: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

How many leaves on the decision tree? There are ≥ n! leaves, because every permutation appears

at least once.

For any comparison sort,– 1 tree for each n.

– View the tree as if the algorithm splits in two at each node,

Decision TreeDecision Tree

– View the tree as if the algorithm splits in two at each node, based on the information it has determined up to that point.

– The tree models all possible execution traces.

What is the length of the longest path from root to leaf?– Depends on the algorithm

– Insertion sort: (n²)

– Merge sort: (n lg n)

Worst –case

Running time

Page 6: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

LemmaLemma

Lemma

Any binary tree of height h has ≤ 2h leaves.In other words:

l = # of leaves,

h = height,h = height,

Then l ≤ 2h .

Theorem

Any decision tree that sorts n elements has height Ω(n lg n).

Page 7: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

ProofProof

Page 8: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Sorting in Linear TimeSorting in Linear Time

NonNon--comparison sorts.comparison sorts.

Counting sortCounting sort

Depends on a Depends on a key assumptionkey assumption: numbers to be sorted are integers : numbers to be sorted are integers in 0in 0, , 11, . . . , k, . . . , k..

Input: Input:

A[1 . . n],A[1 . . n], where where A[ j ] A[ j ] ∈∈ 0, 1, . . . , k0, 1, . . . , k for for j j = 1= 1, , 22, ..., n, ..., n. .

Array Array A A and values and values n n and and k k are given as parametersare given as parameters..

Output: Output:

BB[1 [1 . . n. . n], sorted. ], sorted. B B is assumed to be already allocated and is given as a is assumed to be already allocated and is given as a parameterparameter..

Auxiliary storage: Auxiliary storage: C[0 . . k]C[0 . . k]

Page 9: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 10: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 11: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Analysis of Counting sort

ΘΘ(n (n + + k)k),, which is which is ΘΘ(n)(n) if if k k = = O(n)O(n)..

How big a How big a k k is practical?is practical?–– Good for sorting 32Good for sorting 32--bit values? No.bit values? No.

–– 1616--bit? Probably not.bit? Probably not.

–– 88--bit? Maybe, depending on bit? Maybe, depending on nn..–– 88--bit? Maybe, depending on bit? Maybe, depending on nn..

–– 44--bit? Probably (unless bit? Probably (unless n n is really small).is really small).

Counting sort will be used in radix sortCounting sort will be used in radix sort..

Stable algorithmStable algorithm::

Numbers with the same value appear in the output array in the Numbers with the same value appear in the output array in the same order same order as they do in the input array;as they do in the input array;.

Page 12: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Radix Sort(1)

Key idea: Sort least significant digits first.

Why?

Page 13: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Correctness:

Radix Sort(2)

Correctness:

Induction on number of passes ( i in pseudocode).

Assume digits 1, 2, . . . , i-1 are sorted.

Show that a stable sort on digit i leaves digits 1, ..., i sorted:

• If 2 digits in position i are different, ordering by position i is

correct, and positions 1, . . . , i-1 are irrelevant.

• If 2 digits in position i are equal, numbers are already in the

right order (by inductive hypothesis). The stable sort on digit i

leaves them in the right order.

Page 14: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Analysis of Radix Sort(1)

Lemma 8.3:Lemma 8.3:

Given Given n d--digit numbers in which each digit can take on up to digit numbers in which each digit can take on up to k

possible values, RADIXpossible values, RADIX--SORT correctly sorts these numbers in SORT correctly sorts these numbers in

ΘΘ (d( (d( n + + k)) time.k)) time.

Assume that we use counting sort as the intermediate Assume that we use counting sort as the intermediate

sort.sort.

–– ΘΘ ((n + + k) k) per pass (digits in range 0per pass (digits in range 0, ... , k, ... , k))

–– d d passespasses

–– ΘΘ (d((d(n + + k)) k)) totaltotal

–– If If k = O(k = O(n),), time = time = ΘΘ ((dd⋅⋅n))..

Page 15: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Lemma 8.4

Given n b-bit numbers and any positive integer r ≤ b, RADIX-

SORT correctly sorts these numbers in Θ (b/r (n+ 2r )) time.

How to break each key into digits?

Analysis of Radix Sort(2)

– n words.

– b bits/word.

– Break into r-bit digits. Have d = b/r.

– Use counting sort, k = 2r -1.

Example: 32-bit words, 8-bit digits.

b = 32, r = 8, d = 32/8 = 4, k = 28 - 1 = 255.

– Time = Θ (b/r (n + 2r )).

Page 16: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 17: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Bucket SortAssumes the input is generated by a random process

that distributes elements uniformly over [0, 1).

Idea:

Divide [0, 1) into n equal-sized buckets.

Distribute the n input values into the buckets. Distribute the n input values into the buckets.

Sort each bucket.

Then go through buckets in order, listing elements in

each one.

Input: A[1 .. n], where 0 ≤ A[i] < 1 for all i.

Auxiliary array: B[0 .. n-1] of linked lists, each list

initially empty.

Page 18: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Consider Consider AA[[ii ], ], AA[ [ j j ]: ]:

Assume without loss of generality that Assume without loss of generality that AA[[ii ] ≤ ] ≤ AA[ [ j j

TThen hen nn··AA[[ii ]] ≤ ≤ nn··AA[ [ j j ]]. .

CorrectnessCorrectness

TThen hen nn··AA[[ii ]] ≤ ≤ nn··AA[ [ j j ]]. .

So So AA[[ii ] is placed into the same bucket as ] is placed into the same bucket as AA[ [ j j ] or ] or

into a bucket with a lower index.into a bucket with a lower index.

–– If same bucket, insertion sort fixes up.If same bucket, insertion sort fixes up.

–– If earlier bucket, concatenation of lists fixes If earlier bucket, concatenation of lists fixes up.up.

Page 19: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

AnalysisAnalysis

Relies on Relies on nono bucket getting bucket getting too many too many values.values.

All lines of algorithm except insertion sorting All lines of algorithm except insertion sorting

take take ΘΘ(n) (n) altogether.altogether.

Intuitively, if each bucket gets a constant Intuitively, if each bucket gets a constant Intuitively, if each bucket gets a constant Intuitively, if each bucket gets a constant

number of elements, it takes number of elements, it takes O(1)O(1) time to sort time to sort

each bucket each bucket ⇒⇒ O(n) O(n) sort time for all buckets.sort time for all buckets.

We “expect” each bucket to have few elements, We “expect” each bucket to have few elements,

since the average is 1 element per bucket.since the average is 1 element per bucket.

Page 20: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 21: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Detailed AnalysisDetailed Analysis

Page 22: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 23: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 24: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process
Page 25: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

ConclusionConclusion

Page 26: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

HomeworkHomework

8.28.2--3, 8.23, 8.2--4, 8.34, 8.3--2,8.32,8.3--4, 8.44, 8.4--2, 8.42, 8.4--33

Page 27: Chapter 8. Sorting in Linear Time - Fudan Universitydatamining-iip.fudan.edu.cn/ppts/algo/Lecture08.pdf · 2016-04-07 · Bucket Sort Assumes the input is generated by a random process

Important Features of Sort AlgorithmsImportant Features of Sort Algorithms

Sort AlgorithmSort Algorithm WorstWorst--case T(n)case T(n) AverageAverage--case T(n)case T(n)

Insertion sortInsertion sort ΘΘ(n² )(n² ) ΘΘ(n² )(n² )

Quick sortQuick sort ΘΘ (n²)(n²) ΘΘ (n (n lglg n)n)Quick sortQuick sort ΘΘ (n²)(n²) ΘΘ (n (n lglg n)n)

Merge sortMerge sort ΘΘ (n (n lglg n)n) ΘΘ (n (n lglg n)n)

Heap sortHeap sort ΘΘ (n (n lglg n)n) ΘΘ(n (n lglg n)n)

Counting sortCounting sort

Radix sortRadix sort

Bucket sortBucket sort

ΘΘ (n)(n)

ΘΘ (n (n lglg n)n)

ΘΘ(n² )(n² )

ΘΘ (n)(n)

ΘΘ (n)(n)

ΘΘ (n)(n)