Quicksort

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.14

    Quicksort

    • Proposed by C.A.R. Hoare in 1962.
    • Divide-and-conquer algorithm.
    • Sorts “in place” (like insertion sort, but not like merge sort).
    • Very practical (with tuning).
    • Can be viewed as a randomized Las Vegas algorithm.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.15

    Divide and conquer

    Quicksort an n-element array:
    1. Divide: Partition the array into two subarrays around a pivot x
       such that elements in the lower subarray ≤ x ≤ elements in the
       upper subarray.
    2. Conquer: Recursively sort the two subarrays.
    3. Combine: Trivial.

    Key: Linear-time partitioning subroutine.

    [Figure: array split as  ≤ x | x | ≥ x]

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.16

    Pseudocode for quicksort

    QUICKSORT(A, p, r)
      if p < r
        then q ← PARTITION(A, p, r)
             QUICKSORT(A, p, q–1)
             QUICKSORT(A, q+1, r)

    Initial call: QUICKSORT(A, 1, n)

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.17

    Partitioning subroutine

    PARTITION(A, p, r)      ⊳ A[p . . r]
      x ← A[p]              ⊳ pivot = A[p]
      i ← p
      for j ← p + 1 to r
        do if A[j] ≤ x
             then i ← i + 1
                  exchange A[i] ↔ A[j]
      exchange A[p] ↔ A[i]
      return i

    Invariant:  A[p] = x;  A[p+1 . . i] ≤ x;  A[i+1 . . j–1] ≥ x;
    A[j . . r] not yet examined.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.18–L4.29

    Example of partitioning (pivot x = 6; i and j advance left to right)

    6 10 13  5  8  3  2 11    initial array, i = p, j = p+1
    6  5 13 10  8  3  2 11    5 ≤ 6: i advances, exchange 5 ↔ 10
    6  5  3 10  8 13  2 11    3 ≤ 6: i advances, exchange 3 ↔ 13
    6  5  3  2  8 13 10 11    2 ≤ 6: i advances, exchange 2 ↔ 10
    2  5  3  6  8 13 10 11    final exchange A[p] ↔ A[i]; return i = 4
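The pseudocode and the trace above translate directly to runnable code. The following Python sketch (my own, not from the slides) uses zero-based indexing, so the initial call is quicksort(A, 0, len(A) - 1) and the pivot from the example ends at index 3:

```python
def partition(A, p, r):
    """Partition A[p..r] around pivot x = A[p]; return the pivot's final index."""
    x = A[p]            # pivot = A[p]
    i = p               # A[p+1..i] holds elements <= x
    for j in range(p + 1, r + 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[p], A[i] = A[i], A[p]     # put the pivot between the two regions
    return i

def quicksort(A, p, r):
    """Sort A[p..r] in place."""
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [6, 10, 13, 5, 8, 3, 2, 11]      # the array from the example
q = partition(A, 0, len(A) - 1)
print(A, q)                          # [2, 5, 3, 6, 8, 13, 10, 11] 3
```

Running quicksort on the same input then sorts it completely; note that i = 4 in the slides' 1-based indexing corresponds to q = 3 here.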

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.30

    Analysis of quicksort

    • Assume all input elements are distinct.
    • In practice, there are better partitioning algorithms for when
      duplicate input elements may exist.
    • What is the worst-case running time of quicksort?

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.31

    Worst-case of quicksort

    • Input sorted or reverse sorted.• Partition around min or max element.• One side of partition always has no elements.

    )()()1(

    )()1()1()()1()0()(

    2nnnT

    nnTnnTTnT

    Θ=

    Θ+−=Θ+−+Θ=Θ+−+=

    (arithmetic series)
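This quadratic behavior is easy to observe empirically. The sketch below (my own, not from the slides) counts the "A[j] ≤ x" comparisons that the first-element-pivot quicksort makes on an already-sorted input; every partition scans its whole subarray and leaves one side empty, so the total is n(n–1)/2:

```python
def count_comparisons(A, p, r):
    """Comparisons made by the A[p]-pivot quicksort on A[p..r]."""
    if p >= r:
        return 0
    x, i, comps = A[p], p, 0
    for j in range(p + 1, r + 1):
        comps += 1                  # one "A[j] <= x" comparison
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[p], A[i] = A[i], A[p]
    return comps + count_comparisons(A, p, i - 1) + count_comparisons(A, i + 1, r)

n = 200
print(count_comparisons(list(range(n)), 0, n - 1))   # n(n-1)/2 = 19900
```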

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.32–L4.38

    Worst-case recursion tree

    T(n) = T(0) + T(n–1) + cn

    Expanding: the root costs cn with children T(0) and T(n–1); the
    T(n–1) node costs c(n–1) with children T(0) and T(n–2); and so on
    down to Θ(1). The tree is a path of height h = n whose level costs
    are cn, c(n–1), c(n–2), …, plus leaves of cost Θ(1), so

    T(n) = Θ(n) + Θ( Σ_{k=1}^{n} k ) = Θ(n²)

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.39

    Nice-case analysis

    If we’re lucky, PARTITION splits the array evenly:
      T(n) = 2T(n/2) + Θ(n)
           = Θ(n lg n)    (same as merge sort)

    What if the split is always 1/10 : 9/10?
      T(n) = T(n/10) + T(9n/10) + Θ(n)

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.40–L4.44

    Analysis of nice case

    The recursion tree for T(n) = T(n/10) + T(9n/10) + cn has root cost
    cn with children T(n/10) and T(9n/10); the next level costs cn/10
    and 9cn/10, with children T(n/100), T(9n/100), T(9n/100), T(81n/100);
    and so on. Every full level sums to exactly cn. Leaves first appear
    at depth log_10 n (the all-1/10 branch) and the deepest leaf is at
    depth log_{10/9} n (the all-9/10 branch), so

    cn log_10 n  ≤  T(n)  ≤  cn log_{10/9} n + O(n)

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.45

    Randomized quicksort

    • Partition around a random element, i.e., around A[t], where t is
      chosen uniformly at random from {p, …, r}.
    • We will show that the expected time is O(n log n).
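The random-pivot variant can be sketched in Python as follows (my own sketch, not from the slides): swap a uniformly random element to the front, then run the same PARTITION as before:

```python
import random

def partition(A, p, r):
    """PARTITION from the slides: pivot x = A[p], zero-based bounds."""
    x, i = A[p], p
    for j in range(p + 1, r + 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[p], A[i] = A[i], A[p]
    return i

def randomized_quicksort(A, p, r):
    """Partition around A[t] with t uniform in {p, ..., r}."""
    if p < r:
        t = random.randint(p, r)      # random pivot index
        A[p], A[t] = A[t], A[p]       # move it to the front, then
        q = partition(A, p, r)        # partition exactly as before
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```

Unlike the deterministic version, this runs in expected O(n log n) on every input, including sorted ones.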

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.46

    “Paranoid” quicksort

    • Modify the algorithm to make it easier to analyze:
      • Repeat:
        • Choose the pivot to be a random element of the array.
        • Perform PARTITION.
      • Until the resulting split is “lucky”, i.e., not worse than 1/10 : 9/10.
      • Recurse on both subarrays.
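A sketch of the paranoid variant (my own, under the slides' assumption of distinct elements): keep re-partitioning around fresh random pivots until neither side is smaller than n/10. Re-partitioning a permuted subarray is fine, since it holds the same multiset of elements:

```python
import random

def partition(A, p, r):
    """PARTITION from the slides: pivot x = A[p], zero-based bounds."""
    x, i = A[p], p
    for j in range(p + 1, r + 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[p], A[i] = A[i], A[p]
    return i

def paranoid_quicksort(A, p, r):
    """Retry random pivots until the split is no worse than 1/10 : 9/10.
    Assumes distinct elements, as the slides do."""
    if p >= r:
        return
    n = r - p + 1
    while True:
        t = random.randint(p, r)
        A[p], A[t] = A[t], A[p]
        q = partition(A, p, r)
        if min(q - p, r - q) >= n // 10:   # "lucky" split: keep it
            break
    paranoid_quicksort(A, p, q - 1)
    paranoid_quicksort(A, q + 1, r)
```

At least 8 of the 10 possible pivot ranks give a lucky split, so the expected number of retries per level is constant, and the lucky split bounds the recursion depth by log_{10/9} n.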

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.51

    Quicksort in practice

    • Quicksort is a great general-purpose sorting algorithm.

    • Quicksort is typically over twice as fast as merge sort.

    • Quicksort can benefit substantially from code tuning.

    • Quicksort behaves well even with caching and virtual memory.

    • Quicksort is great!

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.52

    More intuition

    Suppose we alternate lucky, unlucky, lucky, unlucky, lucky, ….

    L(n) = 2U(n/2) + Θ(n)    lucky
    U(n) = L(n – 1) + Θ(n)   unlucky

    Solving:
    L(n) = 2(L(n/2 – 1) + Θ(n/2)) + Θ(n)
         = 2L(n/2 – 1) + Θ(n)
         = Θ(n lg n)    Lucky!

    How can we make sure we are usually lucky?

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.53

    Randomized quicksort analysis

    Let T(n) = the random variable for the running time of randomized
    quicksort on an input of size n, assuming random numbers are
    independent.

    For k = 0, 1, …, n–1, define the indicator random variable

      Xk = 1 if PARTITION generates a k : n–k–1 split,
           0 otherwise.

    E[Xk] = Pr{Xk = 1} = 1/n, since all splits are equally likely,
    assuming elements are distinct.

  • Expected value

    Expected value = Σ outcome • probability

    E[Xk] = 1•Pr{Xk = 1} + 0•Pr{Xk = 0}
          = 1•(1/n) + 0•((n–1)/n)
          = 1/n

    Dice example: let X5 be the indicator random variable of a die
    throw, with X5 = 1 if the die’s value is 5, and 0 otherwise.

    E[X5] = 1•Pr{X5 = 1} + 0•Pr{X5 = 0}
          = 1•(1/6) + 0•(5/6) = 1/6
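Both expectation computations above can be checked mechanically with exact rational arithmetic (a small sketch of my own):

```python
from fractions import Fraction

# Fair die: X5 = 1 on face 5, else 0. Each face has probability 1/6.
E_X5 = sum(Fraction(1, 6) * (1 if face == 5 else 0) for face in range(1, 7))
assert E_X5 == Fraction(1, 6)

# PARTITION's split indicator: each of the n splits k = 0..n-1 is
# equally likely, so E[Xk] = 1/n for any fixed k (here k = 3, n = 8).
n, k = 8, 3
E_Xk = sum(Fraction(1, n) * (1 if split == k else 0) for split in range(n))
assert E_Xk == Fraction(1, n)
```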

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.54

    Analysis (continued)

           ⎧ T(0) + T(n–1) + Θ(n)    if 0 : n–1 split,
    T(n) = ⎨ T(1) + T(n–2) + Θ(n)    if 1 : n–2 split,
           ⎪ ⋮
           ⎩ T(n–1) + T(0) + Θ(n)    if n–1 : 0 split,

         = Σ_{k=0}^{n–1} Xk ( T(k) + T(n–k–1) + Θ(n) ).

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.55

    Calculating expectation

    E[T(n)] = E[ Σ_{k=0}^{n–1} Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

    Take expectations of both sides.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.56

    Calculating expectation

    E[T(n)] = E[ Σ_{k=0}^{n–1} Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

            = Σ_{k=0}^{n–1} E[ Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

    Linearity of expectation.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.57

    Calculating expectation

    E[T(n)] = E[ Σ_{k=0}^{n–1} Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

            = Σ_{k=0}^{n–1} E[ Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

            = Σ_{k=0}^{n–1} E[Xk] • E[ T(k) + T(n–k–1) + Θ(n) ]

    Independence of Xk from other random choices.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.58

    Calculating expectation

    E[T(n)] = E[ Σ_{k=0}^{n–1} Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

            = Σ_{k=0}^{n–1} E[ Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

            = Σ_{k=0}^{n–1} E[Xk] • E[ T(k) + T(n–k–1) + Θ(n) ]

            = (1/n) Σ_{k=0}^{n–1} E[T(k)]
              + (1/n) Σ_{k=0}^{n–1} E[T(n–k–1)]
              + (1/n) Σ_{k=0}^{n–1} Θ(n)

    Linearity of expectation; E[Xk] = 1/n.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.59

    Calculating expectation

    E[T(n)] = E[ Σ_{k=0}^{n–1} Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

            = Σ_{k=0}^{n–1} E[ Xk ( T(k) + T(n–k–1) + Θ(n) ) ]

            = Σ_{k=0}^{n–1} E[Xk] • E[ T(k) + T(n–k–1) + Θ(n) ]

            = (1/n) Σ_{k=0}^{n–1} E[T(k)]
              + (1/n) Σ_{k=0}^{n–1} E[T(n–k–1)]
              + (1/n) Σ_{k=0}^{n–1} Θ(n)

            = (2/n) Σ_{k=1}^{n–1} E[T(k)] + Θ(n)

    Summations have identical terms.
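As a numeric sanity check (my own sketch), instantiating the Θ(n) term as the n − 1 comparisons PARTITION makes turns this recurrence into the classic expected-comparison count C(n), which grows as Θ(n log n):

```python
import math

# C(n) = (n - 1) + (2/n) * sum_{k=0}^{n-1} C(k),  C(0) = C(1) = 0.
N = 2000
C = [0.0] * (N + 1)
S = 0.0                          # running sum S = C(0) + ... + C(n-1)
for n in range(1, N + 1):
    C[n] = (n - 1) + 2.0 * S / n
    S += C[n]

ratio = C[N] / (N * math.log(N))
print(ratio)    # bounded; slowly approaches 2, since C(n) ~ 2 n ln n
```

The exact solution of this recurrence is C(n) = 2(n+1)H_n − 4n, where H_n is the n-th harmonic number, which the computed values match.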

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.60

    Hairy recurrence

    E[T(n)] = (2/n) Σ_{k=2}^{n–1} E[T(k)] + Θ(n)

    (The k = 0, 1 terms can be absorbed in the Θ(n).)

    Prove: E[T(n)] ≤ an lg n for constant a > 0.

    Use fact:  Σ_{k=2}^{n–1} k lg k ≤ (1/2) n² lg n – (1/8) n²   (exercise).

    • Choose a large enough so that an lg n dominates E[T(n)] for
      sufficiently small n ≥ 2.
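The “use fact” exercise can be justified with an integral estimate, since x lg x is increasing (a proof sketch of my own, not from the slides):

```latex
\sum_{k=2}^{n-1} k\lg k
  \;\le\; \int_{2}^{n} x\lg x \,dx
  \;=\; \left[\frac{x^2}{2}\lg x - \frac{x^2}{4\ln 2}\right]_{2}^{n}
  \;\le\; \frac{1}{2}n^2\lg n - \frac{n^2}{4\ln 2} + 2
  \;\le\; \frac{1}{2}n^2\lg n - \frac{1}{8}n^2 .
```

The last step uses 1/(4 ln 2) ≈ 0.36 > 1/8 + 2/n² for n ≥ 3; for n = 2 the sum is empty and the bound holds trivially.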

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.61

    Substitution method

    E[T(n)] ≤ (2/n) Σ_{k=2}^{n–1} ak lg k + Θ(n)

    Substitute inductive hypothesis.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.62

    Substitution method

    E[T(n)] ≤ (2/n) Σ_{k=2}^{n–1} ak lg k + Θ(n)

            ≤ (2a/n) ( (1/2) n² lg n – (1/8) n² ) + Θ(n)

    Use fact.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.63

    Substitution method

    E[T(n)] ≤ (2/n) Σ_{k=2}^{n–1} ak lg k + Θ(n)

            ≤ (2a/n) ( (1/2) n² lg n – (1/8) n² ) + Θ(n)

            = an lg n – ( an/4 – Θ(n) )

    Express as desired – residual.

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.64

    Substitution method

    E[T(n)] ≤ (2/n) Σ_{k=2}^{n–1} ak lg k + Θ(n)

            ≤ (2a/n) ( (1/2) n² lg n – (1/8) n² ) + Θ(n)

            = an lg n – ( an/4 – Θ(n) )

            ≤ an lg n ,

    if a is chosen large enough so that an/4 dominates the Θ(n).

  • September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.66

    Randomized Algorithms

    • Algorithms that make decisions based on random coin flips.
    • Can “fool” the adversary.
    • The running time (or even correctness) is a random variable; we
      measure the expected running time.
    • We assume all random choices are independent.
    • This is not the average case!