Exercises of Random Variables - docencia.ac.upc.edu
1
Exercises of Random Variables
2
Exercise
• Show that the necessary and sufficient condition for a random variable on $\mathbb{N}$ to have a geometric distribution is that it should have the property:
$$P(X > n+m \mid X > m) = P(X > n)$$
– For each pair of natural numbers n and m.
3
geometric distribution
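• Random variable that models the number of trials until the first success (or, symmetrically, the first failure).
• Requirements:
– number of trials is potentially infinite
– two outcomes per trial: success and failure
– outcomes statistically independent
– trials have the same probability of success
$$P(X = i) = p^{\,i-1}(1-p) \quad \text{for } i = 1, 2, 3, \ldots$$
(with this parametrization $P(X > n) = p^{n}$, so $1-p$ is the probability that a given trial is the last one)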
4
Exercise
• Meaning of:
$$P(X > n+m \mid X > m) = P(X > n)$$
• Probability of waiting n minutes more given that you have waited m is independent of m.
– Applications:
• Queue at the bus stop (relate to the Poisson rv)
• Queue at a hub or a relay (is the model correct?)
• Expected survival time
– Illness, or protocol design.
Like its continuous analogue (the exponential distribution), the geometric distribution is memoryless.
5
Exercise
• Property to be shown:
$$P(X > n+m \mid X > m) = P(X > n)$$
• Definition: Geometric Random Variable:
$$P(X = i) = p^{\,i-1}(1-p) \quad \text{for } i = 1, 2, 3, \ldots$$
• The distribution function is (change of variable $k = i-n-1$, then geometric series):
$$P(X > n) = \sum_{i=n+1}^{\infty} p^{\,i-1}(1-p) = \sum_{k=0}^{\infty} p^{\,k+n}(1-p) = p^{n}(1-p)\sum_{k=0}^{\infty} p^{k} = p^{n}(1-p)\,\frac{1}{1-p} = p^{n}$$
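The tail formula can be checked numerically. The following is a minimal sketch, assuming the convention $P(X = i) = p^{i-1}(1-p)$ so that $P(X > n) = p^{n}$; the function name `geometric_tail`, the truncation length and the value of `p` are illustrative choices, not part of the slides.

```python
def geometric_tail(n, p, terms=10_000):
    """Truncated sum of P(X = i) = p**(i-1) * (1-p) for i = n+1, n+2, ..."""
    return sum(p ** (i - 1) * (1 - p) for i in range(n + 1, n + 1 + terms))


p = 0.7  # hypothetical value, chosen only for illustration
for n in range(6):
    # The two columns should agree up to floating-point error.
    print(n, geometric_tail(n, p), p ** n)
```

The truncation is harmless here because the neglected terms are of order `p ** (n + terms)`.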
6
Exercise
• If A then B:
$$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)} = \frac{p^{\,n+m}}{p^{m}} = p^{n} = P(X > n)$$
using $P(X > n) = p^{n}$.
[Figure: two plots with horizontal axis 1 to 61 and vertical axis 0 to 0.1, with the points m and n+m marked. Labels: $P(X > n) = p^{n}$ and $P(X > n+m \mid X > m) = P(X > n)$.]
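The memoryless property can also be observed empirically. This is only a sketch under stated assumptions: each trial continues the sequence with probability `p` (so that $P(X > n) = p^{n}$), and the helper names, the seed, and the values of `p`, `n` and `m` are hypothetical.

```python
import random


def sample_geometric(p, rng):
    """Number of trials until the sequence stops; each trial continues with probability p."""
    i = 1
    while rng.random() < p:
        i += 1
    return i


def conditional_tail(samples, n, m):
    """Empirical estimate of P(X > n+m | X > m)."""
    survived_m = [x for x in samples if x > m]
    return sum(x > n + m for x in survived_m) / len(survived_m)


rng = random.Random(0)  # fixed seed, an arbitrary choice
p = 0.9                 # hypothetical continuation probability
samples = [sample_geometric(p, rng) for _ in range(200_000)]
n, m = 5, 10
# Both numbers should be close to each other, whatever m is.
print(conditional_tail(samples, n, m), p ** n)
```

Changing `m` while keeping `n` fixed should leave the first number essentially unchanged, which is what "independent of m" means.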
7
Exercise
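• On the other hand, if B then A:
Suppose that $a_n = P(X > n)$ has the property
$$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)} = P(X > n),$$
that is, $a_{n+m} = a_n a_m$, and then
$$a_m = a_{m-1}\,a_1 = a_{m-2}\,a_1^{2} = \cdots = a_1^{m},$$
then
$$P(X = m) = P(X > m-1) - P(X > m) = a_1^{m-1} - a_1^{m} = a_1^{m-1}(1 - a_1),$$
which is the geometric distribution with $p = a_1$.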
8
Example of a rv with memory
• A Pareto distribution when used to model a queue has memory:
$$P(X > n+m \mid X > m) > P(X > n)$$
– For each pair of natural numbers n and m.
– Meaning:
• Probability of waiting n minutes more given that you have waited m is greater than at the arrival.
• The rich get richer: "80-20 rule", which says that 20% of the population owns 80% of the wealth.
• The more you wait, the more you are expected to wait.
9
Example of a rv with memory
• Examples of uses of the Pareto Distribution:
– Frequencies of words in longer texts (a few words are used often, lots of words are used infrequently)
– The sizes of human settlements (few cities, many hamlets/villages)
– File size distribution of Internet traffic which uses the TCP protocol (many smaller files, few larger ones)
– Clusters of Bose-Einstein condensate near absolute zero
– The values of oil reserves in oil fields (a few large fields, many small fields)
– The length distribution in jobs assigned to supercomputers (a few large ones, many small ones)
– The standardized price returns on individual stocks
– Sizes of sand particles
– Sizes of meteorites
– Numbers of species per genus (there is subjectivity involved: the tendency to divide a genus into two or more increases with the number of species in it)
– Areas burnt in forest fires
10
Cities and firms
• Zipf distribution of U.S. firm sizes
Axtell, R. L. (2001), "Zipf distribution of U.S. firm sizes", Science
11
Web sites visits
• Distribution of AOL users' visits to various sites on a December day in 1997
Zipf, Power-laws, and Pareto - a ranking tutorial Lada A. Adamic
Comments from B.A. Huberman
12
Word frequencies in a text
13
Speculative Prices
• Mandelbrot’s paper on long tail densities
14
Speculative Prices
• Mandelbrot’s paper on long tail densities
– An interesting result
http://classes.yale.edu/fractals/Panorama/ManuFractals/Internet/Internet4.html
15
Burstiness property
• Burstiness in cities & internet traffic
The image below (composed of several satellite pictures) gives an idea of the degree of economic agglomeration in the world economy.
An introduction to geographical economics
Steven Brakman, Harry Garretsen, and Charles van Marrewijk
16
Analysis of the Pareto distribution
• We will compute the value:
$$P(X > n+m \mid X > m)$$
• Remember the definition:
$$P(X > m) = \left(\frac{m_0}{m}\right)^{\alpha} \quad \text{with } \alpha > 0$$
• The conditioned probability is:
$$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)}$$
17
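Analysis of the Pareto distribution
• We will compute the value:
$$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)}$$
• The conditioned probability is:
$$P(X > n+m \mid X > m) = \frac{\left(\frac{m_0}{n+m}\right)^{\alpha}}{\left(\frac{m_0}{m}\right)^{\alpha}} = \left(\frac{m}{n+m}\right)^{\alpha} > \left(\frac{m_0}{n}\right)^{\alpha} = P(X > n)$$
where the inequality holds whenever $nm > m_0(n+m)$.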
18
Analysis of the Pareto distribution
• Simulation:
– Message: the longer you wait, the more you will wait
[Figure: $P(X > n+10 \mid X > 10)$ and $P(X > n)$ plotted against the value of n (1 to 12); vertical axis: probability, 0 to 1.2.]
$$P(X > n+m \mid X > m) = \left(\frac{m}{n+m}\right)^{\alpha} \qquad P(X > n) = \left(\frac{m_0}{n}\right)^{\alpha}$$
with $m = 10$ and $m_0 = 1$.
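The comparison in the plot can be tabulated with a few lines of code. This is a sketch, assuming the Pareto tail $P(X > x) = (\text{scale}/x)^{\alpha}$ for $x \ge \text{scale}$, a waited time of 10 (taken from the plot legend), a scale of 1, and $\alpha = 1$; the value of $\alpha$ is not recoverable from the slide and is purely an assumption, as are the function names.

```python
def pareto_tail(x, scale, alpha):
    """P(X > x) = (scale / x)**alpha for x >= scale, and 1 below the scale."""
    return 1.0 if x < scale else (scale / x) ** alpha


def pareto_conditional_tail(n, m, scale, alpha):
    """P(X > n+m | X > m) = P(X > n+m) / P(X > m)."""
    return pareto_tail(n + m, scale, alpha) / pareto_tail(m, scale, alpha)


scale, alpha, m = 1, 1.0, 10  # alpha = 1 is an assumed value, not taken from the slides
for n in range(1, 13):
    print(n,
          round(pareto_conditional_tail(n, m, scale, alpha), 3),
          round(pareto_tail(n, scale, alpha), 3))
```

With these assumed values the conditional probability exceeds the unconditional one for every n from 2 onwards; at n = 1 the unconditional probability is still 1, because nothing below the scale can occur.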
19
Negative Binomial distribution
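• Generalization of a Geometric distribution.
• Def. Probability that the r-th success occurs on the N-th of a sequence of Bernoulli trials, with p the probability of success (H) in each trial. Trials independent and identically distributed.
• r = 1 (T T … T H: N−1 failures, then the success on trial N):
$$\binom{N-1}{0}(1-p)^{N-1}\,p = (1-p)^{N-1}\,p$$
• r = 2 (one H among the first N−1 trials, then the second H on trial N):
$$\binom{N-1}{1}\,p\,(1-p)^{N-2}\,p = (N-1)\,p^{2}(1-p)^{N-2}$$
• General case, r (r−1 H among the first N−1 trials, then the r-th H on trial N):
$$\binom{N-1}{r-1}\,p^{r-1}(1-p)^{N-r}\,p = \binom{N-1}{r-1}\,p^{r}(1-p)^{N-r}$$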
20
Negative Binomial distribution
• General expression:
– Probability that the r-th success occurs on the N-th Bernoulli trial. Trials independent and identically distributed.
$$P(X = r) = \binom{N-1}{r-1}\,p^{r}(1-p)^{N-r}$$
• Examples:
– Disk redundancies
– Coding theory. Error correction
– Banach's Matches.
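As a quick sanity check of the general expression, the sketch below evaluates $\binom{N-1}{r-1} p^{r} (1-p)^{N-r}$ and verifies two things: the probabilities over all N add up to 1, and the case r = 1 reduces to $(1-p)^{N-1} p$. The function name and the parameter values are hypothetical.

```python
from math import comb


def negative_binomial_pmf(N, r, p):
    """Probability that the r-th success occurs exactly on trial N."""
    if N < r:
        return 0.0
    return comb(N - 1, r - 1) * p ** r * (1 - p) ** (N - r)


p, r = 0.3, 3  # hypothetical parameters
# Truncated sum over N; the neglected tail is negligible for these values.
total = sum(negative_binomial_pmf(N, r, p) for N in range(r, 2000))
print(total)
# r = 1 should match the geometric case (1-p)**(N-1) * p.
print(negative_binomial_pmf(4, 1, p), (1 - p) ** 3 * p)
```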
21
Banach’s Matches
• Example
A pipe-smoking mathematician carries, at all times, 2 matchboxes, 1 in his left-hand pocket and 1 in his right-hand pocket. Each time he needs a match he is equally likely to take it from either pocket. Consider the moment when the mathematician first discovers that one of his matchboxes is empty. If it is assumed that both matchboxes initially contained N matches, what is the probability that there are exactly k matches in the other box, k = 0, 1, ..., N?
See Feller
22
Banach’s Matches
• Note that it is a negative binomial: one of the boxes must be chosen N+1 times (N times to empty it, and once more to discover that it is empty).
• The success number N+1 occurs at trial (N+1)+(N−k) = 2N−k+1.
$$\text{Prob}(k) = 2\,P(X = N+1) = 2\binom{2N-k}{N}\,p^{N+1}(1-p)^{N-k}, \qquad p = \tfrac{1}{2}$$
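With $p = 1/2$ the expression simplifies to $\binom{2N-k}{N} (1/2)^{2N-k}$, since $2 \cdot (1/2)^{2N-k+1} = (1/2)^{2N-k}$. The following sketch, with a hypothetical function name and an arbitrary N, checks that these probabilities add up to 1 over k = 0, ..., N.

```python
from math import comb


def banach_pmf(k, N):
    """Probability that the other box holds exactly k matches (both boxes start with N)."""
    return comb(2 * N - k, N) * 0.5 ** (2 * N - k)


N = 50  # hypothetical initial number of matches per box
probs = [banach_pmf(k, N) for k in range(N + 1)]
print(sum(probs))                                        # should be very close to 1
print(max(range(N + 1), key=lambda k: probs[k]))         # most likely number of matches left
```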
23
Banach’s Matches
• Applications:
– Allocations of files in a disk system.
– Heap management.
$$\text{Prob}(k) = 2\,P(X = N+1) = 2\binom{2N-k}{N}\,p^{N+1}(1-p)^{N-k}$$
24
Hypergeometric Random
• Models the number of successes k in a sequence of n draws from a finite population without replacement.
– Size of the population: m
– Observed successes: k
– Favorable objects: r
– Number of draws: n
[Figure: urn containing balls labelled Wht (white) and Blck (black).]
25
Hypergeometric Random
• Random Variable Y=k
– Size of the population: m
– Observed successes: k
– Favorable objects: r
– Number of draws: n
[Figure: urn containing balls labelled Wht (white) and Blck (black).]
$$P(Y = k) = \frac{\binom{r}{k}\binom{m-r}{n-k}}{\binom{m}{n}}$$
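The hypergeometric probabilities $P(Y = k) = \binom{r}{k}\binom{m-r}{n-k} / \binom{m}{n}$ can be evaluated directly with the standard library. This is a minimal sketch; the function name and the urn sizes are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_pmf(k, m, r, n):
    """P(Y = k): k favorable objects in n draws without replacement
    from a population of m objects, r of which are favorable."""
    if k < 0 or n - k < 0:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one.
    return comb(r, k) * comb(m - r, n - k) / comb(m, n)


m, r, n = 11, 7, 4  # hypothetical urn: 11 balls, 7 white, 4 draws
pmf = [hypergeom_pmf(k, m, r, n) for k in range(n + 1)]
print(pmf)
print(sum(pmf))                                  # should be 1
print(sum(k * q for k, q in enumerate(pmf)))     # mean, should be n * r / m
```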
26
Application: capture-recapture problem
• Lake containing m fish where m is unknown. We capture r of the fish, tag them, and return them to the lake.
• Next we capture n of the fish and observe Y, the number of tagged fish in the sample.
$$\frac{Y}{n} = \frac{r}{m}$$
– Size of the population: m
– Observed successes: k
– Favorable objects: r
– Number of draws: n
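Solving $Y/n = r/m$ for the unknown population gives the estimate $m \approx r n / Y$. The sketch below applies it to a simulated lake; the lake size, sample sizes, seed and function names are all hypothetical, and the simulation assumes tagged fish mix uniformly with the rest.

```python
import random


def estimate_population(r, n, y):
    """Solve Y/n = r/m for m. Undefined when no tagged fish are observed."""
    if y == 0:
        raise ValueError("no tagged fish in the sample; the estimate is undefined")
    return r * n / y


def simulate_recapture(m, r, n, rng):
    """Fish 0..r-1 are tagged; draw n fish without replacement and count the tagged ones."""
    sample = rng.sample(range(m), n)
    return sum(fish < r for fish in sample)


rng = random.Random(1)        # arbitrary seed
m_true, r, n = 1000, 100, 50  # hypothetical lake and sample sizes
for _ in range(5):
    y = simulate_recapture(m_true, r, n, rng)
    print(y, estimate_population(r, n, y) if y else "undefined")
```

Repeated runs give noticeably different estimates, because Y itself is a hypergeometric random variable rather than a fixed number.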
27
Application: capture-recapture problem
• Caveat:
– Diffusion problem
– $\frac{Y}{n} = \frac{r}{m}$ takes for granted that the observed value is the mean
$$P(Y = k) = \frac{\binom{r}{k}\binom{m-r}{n-k}}{\binom{m}{n}}$$
28
Application: capture-recapture problem
• Caveat:
– Variability around the most probable value
– $\frac{Y}{n} = \frac{r}{m}$ takes for granted that the observed value is the mean
[Figure: an urn with 3 white and 7 black balls, and an observation. Labels: $p^{*} = P(\text{white observation} \mid \text{composition of the urn})$; $P(\text{composition of the urn} \mid \text{white observation})$; density $f_{\hat{p}}(\hat{p})$ with $p^{*}$ marked.]
29
Example
• At a given moment, 3 of the 24 machines in a computer cluster are running high-load processes. What is the probability of getting k loaded machines if 5 are selected at random?
$$P(Y = k) = \frac{\binom{3}{k}\binom{21}{5-k}}{\binom{24}{5}}$$
$$P(Y = 0) = \frac{\binom{3}{0}\binom{21}{5}}{\binom{24}{5}} = \frac{19 \cdot 18 \cdot 17}{24 \cdot 23 \cdot 22} \approx 0.4788$$
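The whole distribution for this example, not only the case k = 0, follows from the same hypergeometric expression with 24 machines, 3 of them loaded, and 5 selected. A short sketch (the function name and default arguments are illustrative):

```python
from math import comb


def loaded_machines_pmf(k, machines=24, loaded=3, selected=5):
    """P(Y = k): k loaded machines among the selected ones."""
    return comb(loaded, k) * comb(machines - loaded, selected - k) / comb(machines, selected)


for k in range(4):  # k can be at most 3, the number of loaded machines
    print(k, round(loaded_machines_pmf(k), 4))
```

The four probabilities add up to 1, and nearly half of the mass sits at k = 0.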
30
Combinatorial Methods.Lotto6/49
• Lotto 6/49: 6 numbers + 1 complementary are selected from 49. A multiple bet means selecting r from the 49 numbers.
– Probability of guessing k from the winning combination.
– Probability of guessing k AND the complementary.
– Probability of guessing k AND NOT the complementary.
$$\{b_1, b_2, b_3, \ldots, b_{48}, b_{49}\} \rightarrow \{i_1, i_2, \ldots, i_7\} \rightarrow \{b_{i_1}, b_{i_2}, \ldots, b_{i_7}\}$$
[Figure: the seven drawn numbers $b_{i_1}, \ldots, b_{i_7}$.]
Example taken from VÉLEZ, HERNÁNDEZ, Cálculo de Probabilidades
31
Combinatorial Methods.Lotto6/49
• Number of ways of guessing k results:
$$\binom{r}{k}\binom{49-r}{6-k}$$
– $\binom{r}{k}$: different sets with the winning numbers (the k winners $b_{i_1}, \ldots, b_{i_k}$ among the r marked numbers).
– $\binom{49-r}{6-k}$: different sets with the non-selected winning numbers.
• Probability of guessing k:
$$\Pr(k) = \frac{\binom{r}{k}\binom{49-r}{6-k}}{\binom{49}{6}}$$
32
Combinatorial Methods.Lotto6/49
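• Probability of guessing k AND the complementary
$$\Pr(k \text{ and complementary}) = \frac{\binom{r}{k}\binom{49-r}{6-k}\binom{r-k}{1}}{\binom{49}{6}\cdot 43} = \frac{\binom{r}{k}\binom{49-r}{6-k}}{\binom{49}{6}}\cdot\frac{r-k}{43}$$
– $\binom{49-r}{6-k}$: different sets with the non-selected winning numbers.
– The complementary can be any of the remaining r−k marked numbers.
[Figure: the seven drawn numbers $b_{i_1}, \ldots, b_{i_7}$.]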
33
Combinatorial Methods.Lotto6/49
• Probability of guessing k AND NOT the complementary
$$\Pr(k \text{ and not complementary}) = \frac{\binom{r}{k}\binom{49-r}{6-k}\binom{49-r-6+k}{1}}{\binom{49}{6}\cdot 43} = \frac{\binom{r}{k}\binom{49-r}{6-k}}{\binom{49}{6}}\left(\frac{43-(r-k)}{43}\right)$$
[Figure: the seven drawn numbers $b_{i_1}, \ldots, b_{i_7}$.]
The complementary cannot be
• in the marked r,
• nor in the (6−k) non-marked but winning numbers.
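The three Lotto 6/49 probabilities can be cross-checked against each other: the cases "k and the complementary" and "k and not the complementary" must add up to the probability of guessing k. The sketch below assumes the formulas $\binom{r}{k}\binom{49-r}{6-k}/\binom{49}{6}$, multiplied by $(r-k)/43$ or by $(43-(r-k))/43$; the function names and the bet size `r = 8` are hypothetical.

```python
from math import comb


def p_match(k, r):
    """Probability that exactly k of the 6 winning numbers are among the r marked ones."""
    return comb(r, k) * comb(49 - r, 6 - k) / comb(49, 6)


def p_match_with_complementary(k, r):
    """Exactly k winners marked AND the complementary is one of the other r-k marked numbers."""
    return p_match(k, r) * (r - k) / 43


def p_match_without_complementary(k, r):
    """Exactly k winners marked AND the complementary is not among the marked numbers."""
    return p_match(k, r) * (43 - (r - k)) / 43


r = 8  # hypothetical multiple bet: 8 marked numbers
for k in range(7):
    print(k, p_match(k, r), p_match_with_complementary(k, r), p_match_without_complementary(k, r))
```

For a simple bet (r = 6) and k = 6, `p_match` reduces to 1 over the number of possible combinations, $1/\binom{49}{6}$.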
34
Binomial Random Variables
• Most important discrete probability distribution.
• Model:
– Two possible outcomes: Success/Failure
– Probabilities: Success=p / Failure=1-p
– We combine n independent Bernoulli trials.
– Define the random variable:
X = Total number of successes in n independent Bernoulli trials
35
Binomial Random Variables
• Distribution.
X = Total number of successes in n independent Bernoulli trials
• Model:
– Two possible outcomes: Success/Failure
– Probabilities: Success=p / Failure=1-p
– We combine n independent Bernoulli trials.
$$P(X = k) = \binom{n}{k}\,p^{k}(1-p)^{n-k}, \qquad k = 0, 1, 2, 3, \ldots, n$$
[Figure: sequences of H (success) and T (failure) outcomes with k successes in n trials.]
36
Binomial Random Variables
Example
• Overbooking:
– An aircraft has a capacity of 150 seats. The airline management sells 160 tickets in order to protect itself against no-show passengers.
– Experience shows that the probability of a passenger being a no-show is 0.1. The booked passengers act independently of each other.
– Given this overbooking strategy, what is the probability that some passengers will be left out?
Taken from H. Tijms, Understanding Probability
37
Binomial Random Variables
Example
• Overbooking:
– The problem can be seen as 160 independent trials of a Bernoulli experiment with a success rate of 9/10, where a passenger who shows up for the flight is counted as a success.
– We define X = number of passengers that show up.
– X is binomially distributed with parameters n=160 and p=9/10.
– The probability is P(X>150):
$$P(X > 150) = \sum_{k=151}^{n} \binom{n}{k}\,p^{k}(1-p)^{n-k} = 0.0359$$
[Figure: a sequence of H and T outcomes with more than 150 successes in 160 trials.]
Taken from H. Tijms, Understanding Probability
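The overbooking probability is a finite binomial sum, so it can be evaluated exactly. A minimal sketch, assuming n = 160 tickets, 150 seats and a show-up probability of 9/10 as stated in the example; the helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X binomial with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p, seats = 160, 0.9, 150
# P(X > 150): more passengers show up than there are seats.
prob_left_out = sum(binomial_pmf(k, n, p) for k in range(seats + 1, n + 1))
print(round(prob_left_out, 4))
```

The printed value should agree, up to rounding, with the 0.0359 quoted in the example.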