Problem whose solutions can be 'ranked'

27
Randomized Algorithms

Transcript of Problem whose solutions can be 'ranked'

Page 1: Problem whose solutions can be 'ranked'

Randomized Algorithms

Page 2: Problem whose solutions can be 'ranked'

Essence

Probabilistic algorithms or sampling

A degree of randomness is part of the logic (using a random number generator)

Results can be different for different runs

Algorithms may

Produce incorrect results (Monte Carlo)

Fail to produce a result (Las Vegas)

Page 3: Problem whose solutions can be 'ranked'

Example

An array of n elements, half are 'a' and half are 'b'. Find an 'a' in the array.

findA_LV
  Repeat
    Randomly select one out of n elements
  Until 'a' is found
End

findA_MC
  count = 0
  Repeat
    Randomly select one out of n elements
    count++
  Until 'a' is found or count > k
End
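The two variants above can be sketched in Python (a minimal sketch; the function names and the test array are illustrative, not from the slides):

```python
import random

def find_a_lv(arr):
    """Las Vegas: always returns the index of an 'a'; runtime is random."""
    while True:
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i

def find_a_mc(arr, k):
    """Monte Carlo: at most k tries; may fail and return None."""
    for _ in range(k):
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i
    return None

arr = ['a', 'b'] * 8            # n = 16, half 'a' and half 'b'
print(arr[find_a_lv(arr)])      # always 'a'
print(find_a_mc(arr, k=10))     # an index of an 'a' with probability 1 - (1/2)**10, else None
```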

Page 4: Problem whose solutions can be 'ranked'

Example (cont.)

findA_LV
  Always succeeds
  Random run time (average is O(1))

findA_MC
  Succeeds with P = 1 - (1/2)^k
  Max run time is O(k)

Page 5: Problem whose solutions can be 'ranked'

More Examples

Quick sort
  The selection of the pivot is random
  Always produces the correct result, but the runtime is random
  Average runtime is O(n log n)
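A minimal sketch of randomized quicksort (list-comprehension style for clarity, not an in-place implementation):

```python
import random

def quicksort(xs):
    """Randomized quicksort: the pivot is chosen uniformly at random,
    so the result is always sorted but the runtime varies per run."""
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)
    less    = [x for x in xs if x < pivot]
    equal   = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```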

Page 6: Problem whose solutions can be 'ranked'

General Curve Fitting

n input points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) and a model
y = f(x; a_1, a_2, ..., a_n) give n equations:

  y_1 = f(x_1; a_1, a_2, ..., a_n)
  y_2 = f(x_2; a_1, a_2, ..., a_n)
  ...
  y_n = f(x_n; a_1, a_2, ..., a_n)

Example: 3 input points (1,1), (2,2), (3,1) and the model
y = f(x; a, b, c) = a x^2 + b x + c give 3 equations, in matrix form:

  [ 1  1  1 ] [ a ]   [ 1 ]
  [ 4  2  1 ] [ b ] = [ 2 ]
  [ 9  3  1 ] [ c ]   [ 1 ]

Solve for a, b, c: a = -1, b = 4, c = -2.
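The 3-point example can be checked with a small Gaussian-elimination sketch (the `solve` helper is illustrative, not from the slides):

```python
def solve(A, y):
    """Solve the small dense system A x = y by Gauss-Jordan elimination
    with partial pivoting (a sketch; no singularity handling)."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]   # augmented matrix [A | y]
    for col in range(n):
        # swap in the row with the largest entry in this column
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# the 3 input points (1,1), (2,2), (3,1) with y = a x^2 + b x + c
A = [[1, 1, 1], [4, 2, 1], [9, 3, 1]]
a, b, c = solve(A, [1, 2, 1])
print(round(a, 6), round(b, 6), round(c, 6))  # -1.0 4.0 -2.0
```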

Page 7: Problem whose solutions can be 'ranked'

General Least Square Regression

Fit y = f(x; a_0, a_1, ..., a_n) = a_n x^n + ... + a_1 x + a_0
to m input points (x_i, y_i), with m > n + 1:

  min E, where E = Σ_{i=1..m} (y_i - ŷ_i)^2

  min over (a_0, a_1, ..., a_n) of Σ_{i=1..m} (y_i - (a_n x_i^n + ... + a_1 x_i + a_0))^2

Set the partial derivatives to zero, for j = 0, 1, ..., n:

  ∂E/∂a_j = -2 Σ_{i=1..m} (y_i - (a_n x_i^n + ... + a_1 x_i + a_0)) x_i^j = 0

Page 8: Problem whose solutions can be 'ranked'

General Least Square Regression

m

i

i

m

i

i

n

i

m

i

i

n

i

o

n

n

m

i

m

i

n

i

m

i

n

i

m

i

n

i

m

i

n

i

n

i

m

i

n

i

n

i

m

i

n

i

m

i

n

i

n

i

m

i

n

i

n

i

m

i

i

j

i

m

i

j

i

m

i

i

j

in

m

i

n

i

j

in

m

i

n

i

j

i

m

i

i

n

in

n

ini

j

i

y

yx

yx

a

a

a

xx

xxxxx

xxxxx

yxaxaxxaxxaxx

axaxaxayx

1

1

2

1

1

2

1

11

2

1

1

1

2

1

22

1

12

1

1

1

21

1

11

1

0

1

1

1

1

2

1

2

1

1

1

1

0

1

1

2

2

1

1

1

)()(...)()(

0))...((
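The normal equations can be assembled and solved directly; a minimal pure-Python sketch (the helper name `polyfit_ls` and the sample points are illustrative):

```python
def polyfit_ls(xs, ys, n):
    """Degree-n polynomial least squares via the normal equations.
    Returns the coefficients [a_0, a_1, ..., a_n] (a sketch, no pivot
    safeguards beyond partial pivoting)."""
    # normal-equation entries: sum of x_i^(j+k) and sum of x_i^j * y_i
    A = [[sum(x ** (j + k) for x in xs) for k in range(n + 1)]
         for j in range(n + 1)]
    rhs = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(n + 1)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n + 1):
        piv = max(range(col, n + 1), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n + 1):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n + 1] / M[i][i] for i in range(n + 1)]

# points on y = 2x + 1 with no noise: the fit recovers a_0 = 1, a_1 = 2
coeffs = polyfit_ls([0, 1, 2, 3], [1, 3, 5, 7], 1)
print([round(c, 6) for c in coeffs])  # [1.0, 2.0]
```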

Page 9: Problem whose solutions can be 'ranked'

Homework #4

Fit a line y = a x + b to m input points (x_i, y_i):

  min E, where E = Σ_{i=1..m} (y_i - ŷ_i)^2
  min over (a, b) of Σ_{i=1..m} (y_i - (a x_i + b))^2

Set the partial derivatives to zero:

  ∂E/∂a = -2 Σ_{i=1..m} (y_i - (a x_i + b)) x_i  = 0  =>  a Σ x_i^2 + b Σ x_i = Σ x_i y_i
  ∂E/∂b = -2 Σ_{i=1..m} (y_i - (a x_i + b)) (1)  = 0  =>  a Σ x_i + b m = Σ y_i

In matrix form:

  [ Σ x_i^2  Σ x_i ] [ a ]   [ Σ x_i y_i ]
  [ Σ x_i    m     ] [ b ] = [ Σ y_i     ]
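The 2x2 system has a closed-form solution by Cramer's rule; a small sanity-check sketch (not the homework write-up itself; `fit_line` is an illustrative name):

```python
def fit_line(pts):
    """Least-squares line y = a*x + b via the 2x2 normal equations."""
    m = len(pts)
    sx  = sum(x for x, _ in pts)
    sy  = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    det = sxx * m - sx * sx            # determinant of [[sxx, sx], [sx, m]]
    a = (sxy * m - sx * sy) / det      # Cramer's rule
    b = (sxx * sy - sx * sxy) / det
    return a, b

a, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])   # points on y = 2x + 1
print(a, b)  # 2.0 1.0
```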

Page 10: Problem whose solutions can be 'ranked'

Caveats of LS

Democracy: everybody gets an equal say

Performs badly with "outliers"

(Figure: X vs. Y scatter plot illustrating an outlier's effect on the fit.)

Page 11: Problem whose solutions can be 'ranked'

Noisy Data vs. Outliers

(Figure: two plots contrasting "noisy data" with "outliers".)

Page 12: Problem whose solutions can be 'ranked'

Outliers

Outliers (y) Outliers (x, leverage points)

Page 13: Problem whose solutions can be 'ranked'

Computer Vision and Image Analysis

Randomized Algorithm
  Choose p points at random from the set of n data points
  Compute the fit of the model to the p points
  Compute the median of the fitting error for the remaining n-p points
  Repeat the fitting procedure until a fit is found with a sufficiently small median of squared residuals, or up to some predetermined number of fitting steps (Monte Carlo sampling)
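The procedure above can be sketched for a line fit (p = 2); a minimal least-median-of-squares sketch under assumed data, with a fixed trial count instead of an error threshold:

```python
import random

def median(vals):
    s = sorted(vals)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def lmeds_line(points, trials=100, seed=0):
    """Fit y = a*x + b by randomly sampling p = 2 points, keeping the
    fit with the smallest median of squared residuals on the rest."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)          # exact fit to the 2 sampled points
        b = y1 - a * x1
        med = median([(y - (a * x + b)) ** 2
                      for x, y in points if (x, y) not in ((x1, y1), (x2, y2))])
        if best is None or med < best[0]:
            best = (med, a, b)
    return best[1], best[2]

# 8 points on y = 2x + 1 plus two gross outliers
pts = [(x, 2 * x + 1) for x in range(8)] + [(2, 40), (5, -30)]
a, b = lmeds_line(pts)
print(round(a, 3), round(b, 3))  # 2.0 1.0 (the outliers are ignored)
```

Ordinary LS on the same data would be pulled far off the true line; the median criterion simply never rewards fits that pass near the outliers.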

Page 14: Problem whose solutions can be 'ranked'

How Many Trials? Well, theoretically it takes C(n,p) tries to examine all
possible p-tuples

Very expensive: instead, draw m random p-tuples. Let ε be the fraction of bad data:

  ε                 : fraction of bad data
  (1-ε)             : fraction of good data
  (1-ε)^p           : all p samples are good
  1-(1-ε)^p         : at least one sample is bad
  (1-(1-ε)^p)^m     : got bad data in all m tries
  1-(1-(1-ε)^p)^m   : got at least one good set in m tries

Page 15: Problem whose solutions can be 'ranked'

How Many Trials (cont.)

Make sure the probability is high (e.g. > 95%): given p and ε, calculate m.

  p \ ε   5%  10%  20%  25%  30%  40%  50%
  1        1    2    2    3    3    4    5
  2        2    2    3    4    5    7   11
  3        2    3    5    6    8   13   23
  4        2    3    6    8   11   22   47
  5        3    4    8   12   17   38   95
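The table entries follow directly from the success-probability formula; a quick way to reproduce them, assuming the 95% confidence level:

```python
import math

def trials_needed(p, eps, confidence=0.95):
    """Smallest m with 1 - (1 - (1-eps)**p)**m >= confidence."""
    good = (1 - eps) ** p      # probability that all p samples are good
    return math.ceil(math.log(1 - confidence) / math.log(1 - good))

print(trials_needed(2, 0.30))  # 5
print(trials_needed(5, 0.50))  # 95
```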

Page 16: Problem whose solutions can be 'ranked'

Best Practice

Randomized selection can completely remove outliers
  "Plutocratic": results are based on a small set of features

LS is most fair, everyone gets an equal say
  "Democratic": but it can be seriously influenced by bad data

Use a randomized algorithm to remove outliers
Use LS for the final "polishing" of the results (using all "good" data)
Allows up to 50% outliers theoretically

Page 17: Problem whose solutions can be 'ranked'

Navigation

Autonomous Land Vehicle (ALV), Google's Self-Driving Car, etc.

One important requirement: track the position of the vehicle

Kalman Filter, a loop of
  (Re)initialization
  Prediction
  Observation
  Correction

Page 18: Problem whose solutions can be 'ranked'
Page 19: Problem whose solutions can be 'ranked'

Navigation

Hypothesis and verification
  A classic approach like the Kalman Filter maintains a single hypothesis
  A newer approach like the particle filter maintains multiple hypotheses (Monte Carlo sampling of the state space)

Page 20: Problem whose solutions can be 'ranked'

Single Hypothesis

If the "distraction" (noise) is white and Gaussian
  The state-space probability profile remains Gaussian (a single dominant mode)
  Evolve and track only the mean, not the whole distribution

Page 21: Problem whose solutions can be 'ranked'

Multi-Hypotheses

The distribution can have multiple modes
Sample the probability distribution with "importance" ratings
Evolve the whole distribution, instead of just the mean

Page 22: Problem whose solutions can be 'ranked'

Key: Bayes' Rule

In the daytime, when some animal runs in front of you on the bike path, you know exactly what it is (p(o|s_i) is sufficient).

At night, when some animal runs in front of you on the bike path, you can hardly distinguish the shape (p(o|s_i) is low for all cases), but you know it is probably a squirrel, not a lion, because of p(s_i).

  o   : observation
  s_i : states

  P(s_i | o) = p(o | s_i) P(s_i) / p(o)

  where p(o) = Σ_i p(o | s_i) P(s_i)
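The night-time intuition can be made concrete with a discrete Bayes update; a minimal sketch where the prior and likelihood numbers are assumed purely for illustration:

```python
def bayes_update(prior, likelihood):
    """P(s_i|o) = p(o|s_i) P(s_i) / p(o), with p(o) = sum_i p(o|s_i) P(s_i)."""
    unnorm = {s: likelihood[s] * prior[s] for s in prior}
    p_o = sum(unnorm.values())                  # total probability of the observation
    return {s: v / p_o for s, v in unnorm.items()}

# at night the shape is ambiguous: p(o|s_i) is similar for both animals,
# but the prior P(s_i) makes "squirrel" the overwhelming posterior
prior      = {'squirrel': 0.999, 'lion': 0.001}   # assumed numbers
likelihood = {'squirrel': 0.2,   'lion': 0.3}     # p(dark shape | animal)
post = bayes_update(prior, likelihood)
print(round(post['squirrel'], 4))  # 0.9985
```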

Page 23: Problem whose solutions can be 'ranked'

Initialization: before observation and measurement

Observation: after seeing a door

P(s): probability of state

P(o|s): probability of observation given the current state

Page 24: Problem whose solutions can be 'ranked'

Prediction: internal mechanism saying that the robot moves right

Correction: the prediction is weighted by confirmation with the observation
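One predict-correct-resample cycle can be sketched in 1D; the motion noise, sensor model, and all the numbers below are illustrative assumptions, not from the slides:

```python
import math
import random

def particle_filter_step(particles, control, measure_lik, rng):
    """One particle-filter loop: predict, weight (correct), resample."""
    # Prediction: propagate each particle through the motion model plus noise
    predicted = [x + control + rng.gauss(0, 0.1) for x in particles]
    # Correction: weight each particle by the likelihood of the observation
    weights = [measure_lik(x) for x in predicted]
    total = sum(weights)
    # Resampling: draw new particles in proportion to the normalized weights
    return rng.choices(predicted, weights=[w / total for w in weights],
                       k=len(particles))

rng = random.Random(0)
particles = [rng.uniform(0, 10) for _ in range(500)]   # uniform initial belief
# assumed Gaussian sensor model: the observation says the robot is near x = 4
sensor = lambda x: math.exp(-(x - 4.0) ** 2 / (2 * 0.5 ** 2))
particles = particle_filter_step(particles, control=1.0, measure_lik=sensor, rng=rng)
mean = sum(particles) / len(particles)
print(round(mean, 2))  # the belief concentrates near the observed position, x = 4
```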

Page 25: Problem whose solutions can be 'ranked'
Page 26: Problem whose solutions can be 'ranked'

(Figure: particle filter data flow; labels: particles + weights, measurements, controls, new particles + weights, total weights.)

Page 27: Problem whose solutions can be 'ranked'

Why Be Stochastic?

More choices: remove bad data

More alternatives: sample the problem states based on likelihood