Stanford Dsa

  • 7/25/2019 Stanford Dsa

    1/52

    Welcome to CS166!

    Course information handout available up front.

    Today:

    Course overview.

    Why study data structures?

    The range minimum query problem.

    Course Staff

    Keith Schwarz (htiek@cs.stanford.edu)

    Kyle Brogle (broglek@stanford.edu)

    Daniel Hollingshead (dhollingshead@stanford.edu)

    Nick Isaacs (nisaacs@stanford.edu)

    Aparna Krishnan (aparnak@stanford.edu)

    Sen Wu (senwu@stanford.edu)

    Course Staff Mailing List: cs166-spr1314-staff@lists.stanford.edu

    The Course Website

    http://cs166.stanford.edu

    Required Reading

    Introduction to Algorithms, Third Edition by Cormen, Leiserson, Rivest, and Stein.

    You'll want the third edition for this course.

    Available in the bookstore; several copies on hold at the Engineering Library.

    Prerequisites

    CS161 (Design and Analysis of Algorithms)

    We'll assume familiarity with asymptotic notation, correctness proofs, algorithmic strategies (e.g. divide-and-conquer), classical algorithms, recurrence relations, etc.

    CS107 (Computer Organization and Systems)

    We'll assume comfort working from the command line, designing and testing nontrivial programs, and manipulating bitwise representations of data. You should have some knowledge of the memory hierarchy.

    Not sure whether you're in the right place? Please feel free to ask!

    Grading Policies

    55% Midterm

    25% Final Project

    "Axess: Enrollment Limited"

    Because this is a new course, we're limiting enrollment in CS166 to 1

    Why Study Data Structures?

    Explore the intersection between theory and practice.

    Learn new approaches to modeling and solving problems.

    Expand your sense of what can be done efficiently.

    Range Minimum Queries

    The RMQ Problem

    The Range Minimum Query (RMQ) problem is the following:

    Given a fixed array A and two indices i ≤ j, what is the smallest element out of A[i], A[i + 1], …, A[j − 1], A[j]?

    31 41 59 26 53 58 97 93

    A Trivial Solution

    There's a simple O(n)-time algorithm for evaluating RMQ_A(i, j): just iterate across the elements between i and j, inclusive, and take the minimum!

    Why is this problem at all algorithmically interesting?

    Suppose that the array A is fixed and we'll make k queries on it.

    Can we do better than the naïve algorithm?
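
The linear scan described above can be sketched in a few lines of Python. The function name `rmq_naive` and the 0-based indexing are assumptions made for this illustration; the example array is the one used on the RMQ problem slide.

```python
def rmq_naive(arr, i, j):
    """Return the minimum of arr[i..j], inclusive, by scanning the range."""
    if not 0 <= i <= j < len(arr):
        raise IndexError("need 0 <= i <= j < len(arr)")
    best = arr[i]
    for idx in range(i + 1, j + 1):
        if arr[idx] < best:
            best = arr[idx]
    return best


A = [31, 41, 59, 26, 53, 58, 97, 93]
print(rmq_naive(A, 0, 2))  # 31
print(rmq_naive(A, 4, 7))  # 53
```

Each call does work proportional to j − i + 1, so k queries on an array of length n can cost O(kn) in total.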

    An Observation

    In an array of length n, there are only Θ(n²) possible queries.

    Why?

    5 subarrays of length 1

    4 subarrays of length 2

    3 subarrays of length 3

    2 subarrays of length 4

    1 subarray of length 5
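
The count for the five-element example generalizes. As a short worked derivation (the symbol ℓ for the subarray length is introduced here for illustration): an array of length n has n − ℓ + 1 subarrays of length ℓ, so the total number of possible queries is

```latex
\sum_{\ell = 1}^{n} (n - \ell + 1) \;=\; \sum_{m = 1}^{n} m \;=\; \frac{n(n + 1)}{2} \;=\; \Theta(n^2)
```

For n = 5 this gives 5 + 4 + 3 + 2 + 1 = 15 queries.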

    A Different Approach

    There are only Θ(n²) possible RMQs in an array of length n.

    If we precompute all of them, we can answer RMQ in time O(1) per query.

    [Figure: the array 16 18 33 98, indexed 0 to 3, beside a 4 × 4 lookup table whose rows and columns are indexed 0 to 3.]

    Building the Table

    One simple approach: for each entry in the table, iterate over the range in question and find the minimum value.

    How efficient is this?

    Number of entries: Θ(n²).

    Time to evaluate each entry: O(n).

    Time required: O(n³).

    The runtime is O(n³) using this approach. Is it also Θ(n³)?

    [Figure: an 8 × 8 table with rows and columns indexed 0 to 7; a block of entries is highlighted in yellow.]

    Each entry in yellow requires at least n / 2 = Θ(n) work to evaluate.

    There are roughly n² / 8 = Θ(n²) entries here.

    Total work required: Θ(n³)

    A Different Approach

    Naïvely precomputing the table is inefficient.

    Can we do better?

    Claim: Can precompute all subarrays in time Θ(n²) using dynamic programming.

    [Figure: the array 16 18 33 98, indexed 0 to 3, with its 4 × 4 table of range minima being filled in.]
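
The slide states the claim without spelling out a recurrence. One recurrence that achieves Θ(n²), assumed here for illustration, is that the minimum of A[i..j] equals the smaller of the minimum of A[i..j − 1] and A[j]. A minimal sketch, with the hypothetical name `build_full_table`:

```python
def build_full_table(arr):
    """Fill table[i][j] = min(arr[i..j]) for all i <= j in Theta(n^2) time.

    Entries with i > j are left as None because they are not valid queries.
    """
    n = len(arr)
    table = [[None] * n for _ in range(n)]
    for i in range(n):
        table[i][i] = arr[i]  # a range of length 1 is its own minimum
        for j in range(i + 1, n):
            # Extend the range [i, j - 1] by one element: O(1) work per entry.
            table[i][j] = min(table[i][j - 1], arr[j])
    return table


A = [16, 18, 33, 98]
table = build_full_table(A)
print(table[0][3])  # 16
print(table[1][3])  # 18
```

After the table is built, a query is a single lookup `table[i][j]`.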

    Some Notation

    We'll say that an RMQ data structure has time complexity ⟨p(n), q(n)⟩ if

    preprocessing takes time at most p(n) and

    queries take time at most q(n).

    We now have two RMQ data structures:

    ⟨O(1), O(n)⟩ with no preprocessing.

    ⟨O(n²), O(1)⟩ with full preprocessing.

    These are two extremes on a curve of tradeoffs: no preprocessing versus full preprocessing.

    Question: Is there a "golden mean" between these extremes?

    Another Approach: Block Decomposition

    A Block-Based Approach

    Split the input into O(n / b) blocks of some "block size" b.

    Here, b = 3.

    Compute the minimum value in each block.

    Block minimums: 31 26 23 62 27

    Array: 31 41 59 | 26 53 58 | 97 93 23 | 84 62 64 | 33 83 27

    Analyzing the Approach

    Let's analyze this approach in terms of n and b.

    Preprocessing time:

    O(b) work on O(n / b) blocks to find minimums.

    Total work: O(n).

    Time to query RMQ_A(i, j):

    O(1) work to find block indices (divide by block size).

    O(b) work to scan inside i and j's blocks.

    O(n / b) work looking at block minimums between i and j.

    Total work: O(b + n / b).

    Block minimums: 31 26 23 62 27

    Array: 31 41 59 | 26 53 58 | 97 93 23 | 84 62 64 | 33 83 27
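
The preprocessing and query steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the course: the class name `BlockRMQ`, its method names, and the 0-based indexing are all hypothetical.

```python
class BlockRMQ:
    """Block decomposition: O(n) preprocessing, O(b + n / b) per query."""

    def __init__(self, arr, block_size):
        if not arr or block_size < 1:
            raise ValueError("need a non-empty array and block_size >= 1")
        self.arr = list(arr)
        self.b = block_size
        # One minimum per block; the last block may be shorter than b.
        self.block_min = [
            min(self.arr[start:start + self.b])
            for start in range(0, len(self.arr), self.b)
        ]

    def query(self, i, j):
        """Return min(arr[i..j]), inclusive."""
        if not 0 <= i <= j < len(self.arr):
            raise IndexError("need 0 <= i <= j < len(arr)")
        bi, bj = i // self.b, j // self.b  # O(1): block indices
        if bi == bj:
            return min(self.arr[i:j + 1])  # both ends in one block
        # O(b): scan the partial blocks containing i and j.
        best = min(self.arr[i:(bi + 1) * self.b])
        best = min(best, min(self.arr[bj * self.b:j + 1]))
        # O(n / b): scan the minimums of the blocks strictly in between.
        if bi + 1 < bj:
            best = min(best, min(self.block_min[bi + 1:bj]))
        return best


A = [31, 41, 59, 26, 53, 58, 97, 93, 23, 84, 62, 64, 33, 83, 27]
rmq = BlockRMQ(A, block_size=3)
print(rmq.block_min)      # [31, 26, 23, 62, 27]
print(rmq.query(1, 13))   # 23
```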

    Intuiting O(b + n / b)

    As b increases:

    The b term rises (more elements to scan within each block).

    The n / b term drops (fewer blocks to look at).

    As b decreases:

    The b term drops (fewer elements to scan within a block).

    The n / b term rises (more blocks to look at).

    Is there an optimal choice of b given these constraints?

    Optimizing b

    What choice of b minimizes b + n / b?

    Start by taking the derivative:

    $$\frac{d}{db}\left(b + \frac{n}{b}\right) = 1 - \frac{n}{b^2}$$

    Setting the derivative to zero:

    $$1 - \frac{n}{b^2} = 0 \quad\Rightarrow\quad b^2 = n \quad\Rightarrow\quad b = \sqrt{n}$$

    Asymptotically optimal runtime is when b = n^{1/2}.

    In that case, the runtime is

    O(b + n / b) = O(n^{1/2} + n / n^{1/2}) = O(n^{1/2} + n^{1/2}) = O(n^{1/2})
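
As a quick numeric sanity check of the calculus, the following sketch searches every integer block size for one sample value of n. The helper name `query_cost` and the choice n = 10,000 are assumptions for illustration only.

```python
import math


def query_cost(n, b):
    """The quantity being minimized: b + n / b."""
    return b + n / b


n = 10_000
# Brute-force search over every integer block size from 1 to n.
best_b = min(range(1, n + 1), key=lambda b: query_cost(n, b))
print(best_b, math.isqrt(n))  # 100 100
```

The brute-force minimizer coincides with the square root of n, matching the derivation.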

    Summary of Approaches

    Three solutions so far:

    No preprocessing: ⟨O(1), O(n)⟩.

    Full preprocessing: ⟨O(n²), O(1)⟩.

    Block partition: ⟨O(n), O(n^{1/2})⟩.

    Modest preprocessing yields modest performance increases.

    Question: Can we do better?

    A Second Approach: Sparse Tables

    An Intuition

    The ⟨O(n²), O(1)⟩ solution gives fast queries because every range we might look up has already been precomputed.

    This solution is slow overall because we have to compute the minimum of every possible range.

    Question: Can we still get O(1) queries without preprocessing all possible ranges?

    Some Observations

    The Approach

    For each index i, compute RMQ for ranges starting at i of size 1, 2, 4, 8, 16, …, 2^k as long as they fit in the array.

    Gives both large and small ranges starting at any point in the array.

    Only O(log n) ranges computed for each array element.

    Total number of ranges: O(n log n).

    Claim: Any range in the array can be formed as the union of two of these ranges.

    Creating Ranges

    Doing a Query

    To answer RMQ_A(i, j):

    Find the largest k such that 2^k ≤ j − i + 1.

    With the right preprocessing, this can be done in time O(1); you'll figure out how in the problem set!

    The range [i, j] can be formed as the overlap of the ranges [i, i + 2^k − 1] and [j − 2^k + 1, j].

    Each range can be looked up in time O(1).

    Total time: O(1).
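
A small worked instance of the query rule may help. The indices i = 2 and j = 8 are chosen here purely for illustration:

```latex
j - i + 1 = 7, \qquad 2^2 = 4 \le 7 < 8 = 2^3 \;\Rightarrow\; k = 2

[i,\; i + 2^k - 1] = [2, 5], \qquad [j - 2^k + 1,\; j] = [5, 8]

\mathrm{RMQ}_A(2, 8) = \min\bigl(\mathrm{RMQ}_A(2, 5),\; \mathrm{RMQ}_A(5, 8)\bigr)
```

The two precomputed ranges share index 5 and together cover exactly [2, 8]. Counting an element twice does not change a minimum, so the result is still correct.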

    Precomputing the Ranges

    There are O(n log n) ranges to precompute.

    Using dynamic programming, we can compute all of them in time O(n log n).

    31 41 59 26 53 58 97 93

    Sparse Tables

    This data structure is called a sparse table.

    Gives an ⟨O(n log n), O(1)⟩ solution to RMQ.

    Asymptotically better than precomputing all possible ranges!
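
A minimal sparse table sketch follows. The class name `SparseTable` and its layout (`table[k][i]` holds the minimum of the 2^k elements starting at index i) are assumptions for illustration. Each row is built from the previous one by combining two half-size ranges, which is one way to realize the dynamic programming step. To find the largest k with 2^k ≤ j − i + 1, the sketch uses Python's built-in `int.bit_length()` as a stand-in; it is not necessarily the preprocessing the problem set has in mind.

```python
class SparseTable:
    """Sparse table for RMQ: O(n log n) preprocessing, two lookups per query."""

    def __init__(self, arr):
        if not arr:
            raise ValueError("array must be non-empty")
        self.n = len(arr)
        # table[k][i] = min(arr[i .. i + 2^k - 1]); row 0 is the array itself.
        self.table = [list(arr)]
        k = 1
        while (1 << k) <= self.n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            # A range of size 2^k is two adjacent ranges of size 2^(k-1).
            row = [min(prev[i], prev[i + half])
                   for i in range(self.n - (1 << k) + 1)]
            self.table.append(row)
            k += 1

    def query(self, i, j):
        """Return min(arr[i..j]), inclusive."""
        if not 0 <= i <= j < self.n:
            raise IndexError("need 0 <= i <= j < len(arr)")
        k = (j - i + 1).bit_length() - 1  # largest k with 2^k <= j - i + 1
        row = self.table[k]
        # Ranges [i, i + 2^k - 1] and [j - 2^k + 1, j] together cover [i, j].
        return min(row[i], row[j - (1 << k) + 1])


A = [31, 41, 59, 26, 53, 58, 97, 93]
st = SparseTable(A)
print(st.query(0, 7))  # 26
print(st.query(4, 7))  # 53
```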

    The Story So Far

    We now have the following solutions for RMQ:

    Precompute all: ⟨O(n²), O(1)⟩.

    Precompute none: ⟨O(1), O(n)⟩.

    Blocking: ⟨O(n), O(n^{1/2})⟩.

    Sparse table: ⟨O(n log n), O(1)⟩.

    Can we do better?

    A Third Approach: Hybrid Strategies

    Blocking Revisited

    Block minimums: 31 26 23 62 27

    Array: 31 41 59 | 26 53 58 | 97 93 23 | 84 62 64 | 33 83 27

    This is just RMQ on the block minimums.

    This is just RMQ inside the blocks.

    The Setup

    Here's a new possible route for solving RMQ:

    Split the input into blocks of some block size b.

    For each of the O(n / b) blocks, compute the minimum.

    Construct an RMQ structure on the block minimums.

    Construct RMQ structures on each block.

    Combine the RMQ answers to solve RMQ overall.

    This approach of segmenting a structure into a high-level structure and many low-level structures is sometimes called a macro/micro decomposition.

    Combinations and Permutations

    The macro/micro decomposition isn't a single data structure; it's a framework for data structures.

    We get to choose

    the block size,

    which RMQ structure to use on top, and

    which RMQ structure to use for the blocks.

    Summary and block RMQ structures don't have to be the same type of RMQ data structure; we can combine different structures together to get different results.

    The Framework

    Suppose we use a ⟨p₁(n), q₁(n)⟩-time RMQ solution for the block minimums and a ⟨p₂(n), q₂(n)⟩-time RMQ solution within each block.

    Let the block size be b.

    In the hybrid structure, the preprocessing time is

    O(n + p₁(n / b) + (n / b) p₂(b))

    The query time is

    O(q₁(n / b) + q₂(b))

    Block minimums: 31 26 23 62 27

    Array: 31 41 59 | 26 53 58 | 97 93 23 | 84 62 64 | 33 83 27

    A Sanity Check

    The ⟨O(n), O(n^{1/2})⟩ block-based structure from earlier uses this framework with the ⟨O(1), O(n)⟩ no-preprocessing RMQ structure and b = n^{1/2}.

    According to our formulas, the preprocessing time should be

    O(n + p₁(n / b) + (n / b) p₂(b)) = O(n + 1 + n / b) = O(n)

    The query time should be

    O(q₁(n / b) + q₂(b)) = O(n / b + b) = O(n^{1/2})

    Looks good so far!

    For Reference

    p₁(n) = 1, q₁(n) = n

    p₂(n) = 1, q₂(n) = n

    b = n^{1/2}

    An Observation

    A sparse table takes time O(n log n) to construct on an array of n elements.

    With block size b, there are O(n / b) total blocks.

    Time to construct a sparse table over the block minimums: O((n / b) log (n / b)).

    Since log (n / b) = O(log n), the time to build the sparse table is at most O((n / b) log n).

    Cute trick: If b = Θ(log n), the time to construct a sparse table over the minimums is

    O((n / b) log n) = O((n / log n) log n) = O(n)

    One Possible Hybrid

    Set the block size to log n.

    Use a sparse table for the top-level structure.

    Use the "no preprocessing" structure for each block.

    Preprocessing time:

    O(n + p₁(n / b) + (n / b) p₂(b)) = O(n + n + n / log n) = O(n)

    Query time:

    O(q₁(n / b) + q₂(b)) = O(1 + log n) = O(log n)

    An ⟨O(n), O(log n)⟩ solution!

    For Reference

    p₁(n) = n log n, q₁(n) = 1

    p₂(n) = 1, q₂(n) = n

    b = log n
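
This hybrid can be sketched by putting a sparse table over the block minimums and scanning linearly inside the two end blocks. All names below (`build_sparse`, `sparse_query`, `HybridRMQ`) are hypothetical, and the block size is taken as floor(log₂ n), clamped to at least 1, which is one assumed reading of "b = log n".

```python
def build_sparse(values):
    """Return rows where rows[k][i] = min(values[i .. i + 2^k - 1])."""
    rows = [list(values)]
    k = 1
    while (1 << k) <= len(values):
        prev, half = rows[-1], 1 << (k - 1)
        rows.append([min(prev[i], prev[i + half])
                     for i in range(len(values) - (1 << k) + 1)])
        k += 1
    return rows


def sparse_query(rows, i, j):
    """Minimum of the underlying values[i..j] using two overlapping ranges."""
    k = (j - i + 1).bit_length() - 1
    return min(rows[k][i], rows[k][j - (1 << k) + 1])


class HybridRMQ:
    """Sparse table on block minimums, linear scans inside the end blocks."""

    def __init__(self, arr):
        if not arr:
            raise ValueError("array must be non-empty")
        self.arr = list(arr)
        n = len(self.arr)
        self.b = max(1, n.bit_length() - 1)  # assumed block size: floor(log2 n)
        mins = [min(self.arr[s:s + self.b]) for s in range(0, n, self.b)]
        self.top = build_sparse(mins)  # top-level structure

    def query(self, i, j):
        """Return min(arr[i..j]), inclusive."""
        if not 0 <= i <= j < len(self.arr):
            raise IndexError("need 0 <= i <= j < len(arr)")
        bi, bj = i // self.b, j // self.b
        if bi == bj:
            return min(self.arr[i:j + 1])  # scan of at most b elements
        # Scans of at most b elements each at the two ends.
        best = min(min(self.arr[i:(bi + 1) * self.b]),
                   min(self.arr[bj * self.b:j + 1]))
        if bi + 1 < bj:
            # Two lookups in the top-level sparse table for the middle blocks.
            best = min(best, sparse_query(self.top, bi + 1, bj - 1))
        return best


A = [31, 41, 59, 26, 53, 58, 97, 93, 23, 84, 62, 64, 33, 83, 27]
h = HybridRMQ(A)
print(h.b)             # 3
print(h.query(1, 13))  # 23
```

Swapping the linear scans for another RMQ structure inside each block gives the other hybrids in the framework.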

    Another Hybrid

    Let's suppose we use the ⟨O(n log n), O(1)⟩ sparse table for both the top and bottom RMQ structures with a block size of log n.

    The preprocessing time is

    O(n + p₁(n / b) + (n / b) p₂(b)) = O(n + n + (n / log n) b log b) = O(n + (n / log n) log n log log n) = O(n log log n)

    The query time is

    O(q₁(n / b) + q₂(b)) = O(1)

    We have an ⟨O(n log log n), O(1)⟩ solution to RMQ!

    For Reference

    p₁(n) = n log n, q₁(n) = 1

    p₂(n) = n log n, q₂(n) = 1

    b = log n

    One Last Hybrid

    Suppose we use a sparse table for the top structure and the ⟨O(n), O(log n)⟩ solution for the bottom structure. Let's choose b = log n.

    The preprocessing time is

    O(n + p₁(n / b) + (n / b) p₂(b)) = O(n + n + (n / log n) b) = O(n + n + (n / log n) log n) = O(n)

    The query time is

    O(q₁(n / b) + q₂(b)) = O(1 + log log n) = O(log log n)

    We have an ⟨O(n), O(log log n)⟩ solution to RMQ!

    For Reference

    p₁(n) = n log n, q₁(n) = 1

    p₂(n) = n, q₂(n) = log n

    b = log n

    Where We Stand

    We've seen a bunch of RMQ structures today:

    No preprocessing: ⟨O(1), O(n)⟩

    Full preprocessing: ⟨O(n²), O(1)⟩

    Block partition: ⟨O(n), O(n^{1/2})⟩

    Sparse table: ⟨O(n log n), O(1)⟩

    Hybrid 1: ⟨O(n), O(log n)⟩

    Hybrid 2: ⟨O(n log log n), O(1)⟩

    Hybrid 3: ⟨O(n), O(log log n)⟩

    Is there an ⟨O(n), O(1)⟩ solution to RMQ?

    Next Time

    Cartesian Trees

    A data structure closely related to RMQ.

    The Method of Four Russians

    A technique for shaving off log factors.

    The Fischer-Heun Structure

    A deceptively simple, asymptotically optimal RMQ structure.