ECE-517: Reinforcement Learning in Artificial...
Lecture 3: Review of basic probability theory
Dr. Itamar Arel
College of Engineering
Electrical Engineering and Computer Science Department
The University of Tennessee
Fall 2015
August 27, 2015
ECE-517: Reinforcement Learning in Artificial Intelligence
ECE-517 - Reinforcement Learning in AI 2
Outline
Probability theory fundamentals
Random variables
Basic definitions
The set of all possible distinct outcomes of an experiment is called the sample space of the experiment/trial. Examples:
Flipping a coin: {H, T}
Rolling a die: {1, 2, 3, 4, 5, 6}
Outcomes - the elements of the sample space
Event - a set of possible outcomes of an experiment/trial, i.e. a subset of the sample space
[Figure: an experiment and its sample space]
More definitions
Independence: two experiments are independent if the outcome of either one does not depend on the outcome of the other
Deterministic: the outcome of a trial is predictable (100%)
Randomness: the absence of any pattern
A sample space is called discrete if it is a finite or countably infinite set; otherwise it is called continuous
Probability can be viewed as the likelihood of an event occurring
Fundamentals
Let S denote the sample space and $A_i$ the possible outcomes, with probabilities $P(A_i)$, respectively
$P(A_i) \ge 0$ for all $i$
$\sum_i P(A_i) = 1$
For example, a probabilistic model might describe the length of a packet sent over a network
Two events A and B are called mutually exclusive, or disjoint, if they have no common outcomes
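In general, P(A + B) = P(A) + P(B) - P(AB), where A + B denotes the union of the two events and AB their intersection
For mutually exclusive events P(AB) = 0, so P(A + B) = P(A) + P(B)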
[Figure: Venn diagram of events A and B with overlap AB]
Conditional Probability
Conditional probability of A given B, defined when P(B) > 0:
P(A|B) = P(AB)/P(B)
A and B are defined as independent events if and only if
P(AB) = P(A)P(B)
or equivalently, when P(B) > 0,
P(A|B) = P(A)
Bayes’ Rule
Consider two events, A and B, where P(AB) = P(A|B)P(B) and P(BA) = P(B|A)P(A)
But P(AB) = P(BA), so P(A|B)P(B) = P(B|A)P(A), and for P(B) > 0
$P(A|B) = \frac{P(AB)}{P(B)} = \frac{P(B|A)\,P(A)}{P(B)}$
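As a quick numeric illustration of Bayes' rule, the sketch below uses a made-up packet scenario. The events, the three probabilities, and the function name `bayes_posterior` are all assumptions chosen for illustration; none of them come from the lecture.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) P(A) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical events: A = "packet is erroneous", B = "checksum flags the packet".
p_a = 0.01              # P(A), assumed
p_b_given_a = 0.95      # P(B|A), assumed
p_b_given_not_a = 0.02  # P(B|not A), assumed

# Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(B) = {p_b:.4f}, P(A|B) = {posterior:.4f}")
```

With these assumed numbers the posterior stays well below P(B|A), because erroneous packets are rare to begin with.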
Outline
Probability theory fundamentals
Random variables
Discrete Random Variables
A random variable (r.v.) is a function that assigns a real number to each outcome in the sample space of a random experiment
For a discrete r.v. X, the probability mass function (PMF) gives
the probability that X will take on a particular value in its range.
We denote this by $P_X$, i.e.
$P_X(x) = P(X = x)$
The expected value of a discrete r.v. X is defined by
$E[X] = \sum_x x\,P_X(x)$
The variance of X is defined as
$\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$
Question: in what scenario will the variance be zero ?
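The definitions of expected value and variance can be checked on a small PMF. The sketch below assumes a fair six-sided die as the example and uses exact fractions; the helper names (`expectation`, `variance`, `second_moment_form`) are hypothetical.

```python
from fractions import Fraction


def expectation(pmf):
    """E[X] = sum over x of x * P_X(x), for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - E[X])^2], computed directly from the definition."""
    mean = expectation(pmf)
    return sum((x - mean) ** 2 * p for x, p in pmf.items())


def second_moment_form(pmf):
    """Var(X) = E[X^2] - (E[X])^2, the equivalent shortcut form."""
    mean = expectation(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2


# Assumed example: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(die))         # 7/2
print(variance(die))            # 35/12
print(second_moment_form(die))  # 35/12
```

Both variance forms agree, as the identity on the slide states.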
Bernoulli and Geometric Random Variables with parameter p
X is a Bernoulli r.v. with parameter p if it can take on values 1 (success) and 0 (failure) with
P(X=1)=p
P(X=0)=1-p
Example: Packet arrivals may be modeled as either correct (1) or erroneous (0)
Given a sequence of independent Bernoulli r.v.’s, let T be the number of trials up to and including the first success. Then T has a geometric distribution; its PMF is given by
$P(T = n) = (1-p)^{n-1}\,p$, for n = 1, 2, …
E[T]=1/p
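The geometric PMF and its mean can be checked numerically. The following is a minimal sketch, assuming p = 0.3, a fixed random seed, and hypothetical helper names; it compares a simulated mean and a truncated analytic sum against 1/p.

```python
import random


def geometric_pmf(n: int, p: float) -> float:
    """P(T = n) = (1 - p)**(n - 1) * p for n = 1, 2, ...; zero otherwise."""
    if n < 1:
        return 0.0
    return (1 - p) ** (n - 1) * p


def sample_first_success(p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until the first success; return the trial count."""
    n = 1
    while rng.random() >= p:  # a failure occurs with probability 1 - p
        n += 1
    return n


p = 0.3                  # assumed success probability
rng = random.Random(0)   # fixed seed so the run is repeatable
samples = [sample_first_success(p, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)

# Truncated analytic mean; the neglected tail beyond n = 200 is negligible here.
analytic_mean = sum(n * geometric_pmf(n, p) for n in range(1, 200))

print(empirical_mean, analytic_mean, 1 / p)
```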
Memoryless property: the number of time steps that have elapsed since the last success has no influence on future events
The memoryless property makes the geometric distribution very useful in various analysis tasks
[Figure: Geometric distribution with N = 16, p = 0.3]
Binomial Random Variable with parameters p and n
Let S denote the number of successes out of n independent Bernoulli r.v.’s. The PMF is given by
$P(S = k) = \binom{n}{k}\, p^k (1-p)^{n-k}$
for k = 0,1,…, n.
The expected number of successes is given by
E[S] = np
Example: if packets arrive correctly at a node in a network with probability p (independently), then the number of correct arrivals out of n is a Binomial r.v.
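A short numeric check of the binomial PMF and its mean follows. The values n = 10 and p = 0.9 are assumptions chosen to resemble the packet example, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(S = k) = C(n, k) * p**k * (1 - p)**(n - k) for k = 0, 1, ..., n."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.9  # assumed: 10 packets, each correct with probability 0.9
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                # should be 1 up to rounding
mean = sum(k * pk for k, pk in enumerate(pmf))  # should match n * p

print(total, mean, n * p)
```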
Mean of a Binomial r.v.
Note that:
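One standard route to the mean E[S] = np writes S as a sum of the n Bernoulli r.v.'s and applies linearity of expectation. The indicator notation $X_i$ is introduced here for illustration and is an assumption, not necessarily the notation or the derivation used in the lecture.

```latex
S = \sum_{i=1}^{n} X_i, \qquad X_i \sim \mathrm{Bernoulli}(p), \qquad
E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p

E[S] = E\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = np
```

Linearity of expectation does not require independence, so this step alone would hold even for dependent trials; independence is what makes the PMF of S binomial.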
Examples (from ECE-453)
Consider the following network. Packets transmitted from Router A to Router B have a packet error rate (PER) of p_AB, while packets transmitted from Router B to Router C have a PER of p_BC. The packet error rates are assumed to be independent.
[Figure: Router A -> Router B -> Router C, with PER p_AB on the A-to-B link and p_BC on the B-to-C link]
• If all traffic from Router A to Router C traverses Router B, what is the probability that all N packets transmitted from Router A to Router C are received correctly?
• Given that N packets were transmitted from Router A to Router B, write an expression for the probability that at least m of those N packets are received correctly.
• Assuming Router A has transmitted N packets to Router C (via Router B), what is the probability that exactly m packets (where m<N) are received correctly at Router C?
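One possible way to work through the three questions numerically is sketched below. It assumes that errors are independent across packets as well as across the two links, and that a packet counts as correct at Router C only if it survives both links. The function names and the sample values are hypothetical.

```python
from math import comb


def p_all_correct(n_packets: int, p_ab: float, p_bc: float) -> float:
    """All N packets survive both links: ((1 - p_ab) * (1 - p_bc)) ** N."""
    q = (1 - p_ab) * (1 - p_bc)  # per-packet end-to-end success probability
    return q ** n_packets


def p_at_least_m_correct_ab(n_packets: int, m: int, p_ab: float) -> float:
    """At least m of N packets are correct on the A-to-B link (binomial tail)."""
    s = 1 - p_ab
    return sum(
        comb(n_packets, k) * s ** k * p_ab ** (n_packets - k)
        for k in range(m, n_packets + 1)
    )


def p_exactly_m_correct_ac(n_packets: int, m: int, p_ab: float, p_bc: float) -> float:
    """Exactly m of N packets are correct at Router C (binomial with success q)."""
    q = (1 - p_ab) * (1 - p_bc)
    return comb(n_packets, m) * q ** m * (1 - q) ** (n_packets - m)


# Assumed sample values for illustration only.
N, p_AB, p_BC = 5, 0.1, 0.2
print(p_all_correct(N, p_AB, p_BC))
print(p_at_least_m_correct_ab(N, 4, p_AB))
print(p_exactly_m_correct_ac(N, 3, p_AB, p_BC))
```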
CDF and PDF
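The cumulative distribution function (CDF) of a r.v. X, $F_X(x)$, is defined as the probability of the event $\{X \le x\}$:
$F_X(x) = P[X \le x], \quad -\infty < x < \infty$
Related properties:
$0 \le F_X(x) \le 1, \quad \lim_{x \to \infty} F_X(x) = 1, \quad \lim_{x \to -\infty} F_X(x) = 0$
if $a < b$ then $F_X(a) \le F_X(b)$
The probability density function (PDF) of a r.v. X, $f_X(x)$, is defined as the derivative of the CDF:
$f_X(x) = \frac{dF_X(x)}{dx}$
For a discrete r.v. the corresponding quantity is the PMF, $P_X(k) = \Pr\{X = k\}$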
Exponential Distribution
Continuous random variable
Continuous-time analog of the geometric distribution (the memoryless property holds)
Models lifetime, inter-arrival times,…
Minimum of Independent Exponential rvs
Assume X1, X2, …, Xn are independent exponential r.v.’s
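A standard result for this setting is that the minimum of independent exponentials is itself exponential, with rate equal to the sum of the individual rates, since P(min > t) is the product of the individual tails P(Xi > t). The simulation below is a sketch of that result under assumed values: the rates in `rates`, the rate parametrization itself, the seed, and the test point `t` are all chosen for illustration.

```python
import math
import random

rates = [0.5, 1.0, 2.5]   # hypothetical rates of X1, X2, X3
total_rate = sum(rates)   # rate of the minimum, if the result holds

rng = random.Random(1)    # fixed seed so the run is repeatable
mins = [min(rng.expovariate(lam) for lam in rates) for _ in range(100_000)]

# Mean of an exponential with rate r is 1 / r.
empirical_mean = sum(mins) / len(mins)
analytic_mean = 1 / total_rate

# Tail probability P(min > t) = exp(-total_rate * t).
t = 0.3
empirical_tail = sum(m > t for m in mins) / len(mins)
analytic_tail = math.exp(-total_rate * t)

print(empirical_mean, analytic_mean)
print(empirical_tail, analytic_tail)
```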
Memoryless Property
True for Geometric and Exponential Dist.:
The coin does not remember that it came up tails l times
Root cause of Markov property (discussed later)
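The memoryless property can be stated as P(T > l + k | T > l) = P(T > k) for the geometric case, and P(X > s + t | X > s) = P(X > t) for the exponential case. The sketch below checks both numerically from the tail probabilities; the parameter values and helper names are assumptions chosen for illustration.

```python
import math


def geometric_tail(n: int, p: float) -> float:
    """P(T > n) = (1 - p)**n: the first n trials are all failures."""
    return (1 - p) ** n


def exponential_tail(t: float, rate: float) -> float:
    """P(X > t) = exp(-rate * t) for an exponential r.v. with the given rate."""
    return math.exp(-rate * t)


# Geometric: after l tails in a row, the wait for a head starts afresh.
p, l, k = 0.3, 5, 3
cond_geo = geometric_tail(l + k, p) / geometric_tail(l, p)
print(cond_geo, geometric_tail(k, p))

# Exponential: having already waited s, the remaining wait is unchanged.
rate, s, t = 2.0, 0.7, 0.4
cond_exp = exponential_tail(s + t, rate) / exponential_tail(s, rate)
print(cond_exp, exponential_tail(t, rate))
```

In each pair the conditional tail and the unconditional tail agree up to floating-point rounding.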
Useful Results
The following are some results that are useful for manipulating many of the equations that may arise when dealing with discrete-time probabilistic models
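$\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}$
$\sum_{k=0}^{\infty} x^k = \frac{1}{1 - x}$ when |x|<1,
Differentiating both sides of the previous equation (and multiplying by x) yields another useful expression:
$\sum_{k=0}^{\infty} k\,x^k = \frac{x}{(1 - x)^2}$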
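These series identities are easy to sanity-check with partial sums. The sketch below assumes x = 0.7 and a truncation at 500 terms for the infinite sums; both choices, and the helper names, are illustrative only.

```python
import math


def finite_sum(x: float, n: int) -> float:
    """Left-hand side: sum of x**k for k = 0..n."""
    return sum(x ** k for k in range(n + 1))


def finite_closed_form(x: float, n: int) -> float:
    """Right-hand side: (1 - x**(n + 1)) / (1 - x), valid for x != 1."""
    return (1 - x ** (n + 1)) / (1 - x)


x, terms = 0.7, 500  # assumed values; |x| < 1 so the infinite sums converge

# Truncated versions of the two infinite sums.
geometric_series = sum(x ** k for k in range(terms))
weighted_series = sum(k * x ** k for k in range(terms))

print(finite_sum(x, 10), finite_closed_form(x, 10))
print(geometric_series, 1 / (1 - x))
print(weighted_series, x / (1 - x) ** 2)
```

The last identity is the one behind the geometric mean: with x = 1 - p, the sum of n (1-p)^(n-1) p over n equals (p / x) * x / (1 - x)^2 = 1/p.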