ECE-517: Reinforcement Learning in Artificial Intelligence

Page 1

Lecture 3: Review of basic probability theory

Dr. Itamar Arel

College of Engineering
Electrical Engineering and Computer Science Department

The University of Tennessee
Fall 2015

August 27, 2015

ECE-517: Reinforcement Learning in Artificial Intelligence

Page 2

Outline

Probability theory fundamentals

Random variables

Page 3

Basic definitions

The collection or set of "all possible" distinct outcomes of an experiment is called the sample space of the experiment/trial

Flipping a coin: {H, T}

Rolling a die: {1, 2, 3, 4, 5, 6}

Outcomes – Elements of the sample space

Event – a possible outcome of an experiment/trial

[Diagram: an experiment and its sample space]

Page 4

More definitions

Independence – two experiments are independent if the outcome of either one does not depend on the outcome of the other

Deterministic – the outcome of a trial is predictable (100%)

Randomness – the absence of any pattern

A sample space is called discrete if it is a finite or countably infinite set; otherwise it is called continuous

Probability can be viewed as the likelihood of an event occurring

Page 5

Fundamentals

Let S denote the sample space and {Ai} the set of all possible outcomes, with probabilities P(Ai), respectively

P(Ai) ≥ 0 for all i

Σi P(Ai) = 1

For example, a probabilistic model might represent the length of a packet sent over a network

Two events A and B are called mutually exclusive, or disjoint, if they have no common outcomes

P(A + B) = P(A) + P(B) - P(AB)

In particular, if A and B are mutually exclusive, P(A + B) = P(A) + P(B)

[Venn diagram: events A and B with intersection AB]
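
A quick worked example of the last two statements (an illustration, not from the original slides): for one roll of a fair die, let A = {even outcome} and B = {outcome greater than 3}.

P(A) = 1/2,  P(B) = 1/2,  AB = {4, 6},  P(AB) = 1/3

P(A + B) = P(A) + P(B) - P(AB) = 1/2 + 1/2 - 1/3 = 2/3

which matches counting the outcomes of A + B = {2, 4, 5, 6} directly: 4/6 = 2/3.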

Page 6

Conditional Probability

Conditional probability

P(A|B) = P(AB)/P(B)

A and B are defined as independent events if and only if,

P(AB) = P(A)P(B)

P(A|B) = P(A)

Bayes’ Rule – Consider two events, A and B, where P(AB) = P(A|B)P(B) and

P(BA) = P(B|A)P(A)

But P(AB) = P(BA), so P(A|B)P(B) = P(B|A)P(A), which gives Bayes’ rule:

P(A|B) = P(B|A)P(A) / P(B)
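
A quick worked example (an illustration, not from the original slides), reusing the fair-die events A = {even outcome} and B = {outcome greater than 3} from the previous page:

P(A|B) = P(AB)/P(B) = (1/3)/(1/2) = 2/3

P(B|A) = P(A|B)P(B)/P(A) = (2/3)(1/2)/(1/2) = 2/3

which agrees with counting directly: among the outcomes of A = {2, 4, 6}, those greater than 3 are {4, 6}, so P(B|A) = 2/3.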

Page 7

Outline

Probability theory fundamentals

Random variables

Page 8

Discrete Random Variables

A random variable (r.v.) is a function that assigns a real number to each outcome in the sample space of a random experiment

For a discrete r.v. X, the probability mass function (PMF) gives the probability that X will take on a particular value in its range. We denote this by PX, i.e.

PX(x) = P(X = x)

The expected value of a discrete r.v. X is defined by

E[X] = Σx x PX(x)

The variance of X is defined as

E[(X - E[X])^2] = E[X^2] - (E[X])^2

Question: in what scenario will the variance be zero?
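
A minimal Python sketch of these definitions (an illustration, not from the slides), using one roll of a fair six-sided die as the discrete r.v. X:

# Expectation and variance of a discrete r.v., computed from its PMF
pmf = {x: 1/6 for x in range(1, 7)}                      # PX(x) = 1/6 for x = 1..6

mean = sum(x * p for x, p in pmf.items())                # E[X] = sum_x x PX(x)
second_moment = sum(x**2 * p for x, p in pmf.items())    # E[X^2]
variance = second_moment - mean**2                       # E[X^2] - (E[X])^2

print(mean, variance)                                    # 3.5  ~2.917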

Page 9

Bernoulli and Geometric Random Variables with parameter p

X is a Bernoulli r.v. with parameter p if it can take on values 1 (success) and 0 (failure) with

P(X = 1) = p

P(X = 0) = 1 - p

Example: Packet arrivals may be modeled as either correct (1) or erroneous (0)

Given a sequence of independent Bernoulli r.v.’s, let T be the number of trials observed up to and including the first success. Then T will have a geometric distribution; its PMF is given by

P(T = n) = (1 - p)^{n-1} p

E[T]=1/p
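
A small Python sketch (an illustration, not from the slides) checking E[T] = 1/p by simulation; numpy's geometric sampler counts the number of trials up to and including the first success:

import numpy as np

rng = np.random.default_rng(0)
p = 0.3
T = rng.geometric(p, size=100_000)   # trials until first success

print(T.mean())                      # close to 1/p = 3.33...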

Page 10

Memoryless property – the fact that there were n time steps separating success events has no influence on future events

The memoryless property makes the geometric distribution very useful in various analysis tasks

[Figure: Geometric distribution PMF with N = 16, p = 0.3]
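
A minimal matplotlib sketch (an illustration, not the original figure) of the geometric PMF over the first 16 values with p = 0.3:

import matplotlib.pyplot as plt

p, N = 0.3, 16
n = range(1, N + 1)
pmf = [(1 - p)**(k - 1) * p for k in n]   # P(T = n) = (1 - p)^{n-1} p

plt.bar(list(n), pmf)
plt.xlabel("n (trials until first success)")
plt.ylabel("P(T = n)")
plt.title("Geometric PMF, p = 0.3")
plt.show()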

Page 11

Binomial Random Variable with parameters p and n

Let S denote the number of successes out of n independent Bernoulli r.v.’s. The PMF is given by

P(S = k) = C(n, k) p^k (1 - p)^{n-k},  for k = 0, 1, …, n,  where C(n, k) = n!/(k!(n-k)!)

The expected number of successes is given by

E[S] = np

Example: if packets arrive correctly at a node in a network with probability p (independently), then the number of correct arrivals out of n is a Binomial r.v.
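
A minimal Python check (an illustration, not from the slides) that the PMF sums to one and that the mean equals np, using assumed values n = 10 and p = 0.2:

from math import comb

n, p = 10, 0.2
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(sum(pmf))                                   # ≈ 1.0
print(sum(k * pk for k, pk in enumerate(pmf)))    # ≈ 2.0 = n * p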

Page 12

Mean of a Binomial r.v.

Note that S can be written as a sum of n independent Bernoulli(p) r.v.’s, so its mean follows from linearity of expectation.
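
A short sketch of this standard argument (with Xi the indicator of success on trial i):

S = X1 + X2 + … + Xn,  each Xi an independent Bernoulli(p) r.v.

E[S] = E[X1] + E[X2] + … + E[Xn] = p + p + … + p = np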

Page 13

Examples (from ECE-453)

Consider the following network. Packets transmitted from Router A to Router B have a packet error rate (PER) of pAB, while packets transmitted from Router B to Router C have a PER of pBC. The packet error rates are assumed to be independent.

[Diagram: Router A – (pAB) – Router B – (pBC) – Router C]

• If all traffic from Router A to Router C traverses Router B, what is the probability that all N packets transmitted from Router A to Router C are received correctly?

• Given that N packets were transmitted from Router A to Router B, write an expression for the probability that at least m of those N packets are received correctly.

• Assuming Router A has transmitted N packets to Router C (via Router B), what is the probability that exactly m packets (where m < N) are received correctly at Router C?
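
One way to set these up in Python (a sketch, not the official solutions; the numerical values pAB = 0.05, pBC = 0.02, N = 10, m = 8 are assumptions for illustration, and a packet reaching Router C correctly is assumed to require error-free transmission on both hops):

from math import comb

pAB, pBC = 0.05, 0.02            # assumed example error rates
N, m = 10, 8                     # assumed example packet counts

# 1) All N packets A -> C correct: each packet must survive both hops
q = (1 - pAB) * (1 - pBC)        # per-packet success probability A -> C
p_all_correct = q ** N

# 2) At least m of N packets received correctly on the A -> B hop (Binomial tail)
p_at_least_m = sum(comb(N, k) * (1 - pAB)**k * pAB**(N - k) for k in range(m, N + 1))

# 3) Exactly m of N packets received correctly at C (Binomial with success prob q)
p_exactly_m = comb(N, m) * q**m * (1 - q)**(N - m)

print(p_all_correct, p_at_least_m, p_exactly_m)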

Page 14

CDF and PDF

The Cumulative Distribution Function (CDF) of a r.v. X, FX(x), is defined as the probability of the event {X ≤ x}

FX(x) = P[X ≤ x],  -∞ < x < ∞

Axioms related are:

0 ≤ FX(x) ≤ 1,  lim_{x→∞} FX(x) = 1,  lim_{x→-∞} FX(x) = 0

if a ≤ b then FX(a) ≤ FX(b)

The probability density function (pdf) of a r.v. X, fX(x), is defined as the derivative of the CDF

fX(x) = dFX(x)/dx

For a discrete r.v., the PMF gives PX(k) = Pr{X = k}
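
A minimal numerical sketch (an illustration, not from the slides) of the pdf/CDF relationship, using FX(x) = 1 - exp(-x) as the example CDF so the pdf should be exp(-x):

import numpy as np

x = np.linspace(0.0, 5.0, 1001)
F = 1.0 - np.exp(-x)             # example CDF
f_numeric = np.gradient(F, x)    # finite-difference derivative dF/dx
f_exact = np.exp(-x)             # known pdf for this CDF

print(np.max(np.abs(f_numeric - f_exact)))   # small finite-difference error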

Page 15

Exponential Distribution

Continuous random variable

Continuous-time analog of the geometric distribution (memoryless property holds)

Models lifetime, inter-arrival times,…
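
For reference, the standard forms with rate parameter λ > 0 (so that the mean is 1/λ):

fX(x) = λ e^{-λx},  FX(x) = 1 - e^{-λx},  for x ≥ 0

E[X] = 1/λ,  and P(X > s + t | X > s) = P(X > t)   (memoryless)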

Page 16

Minimum of Independent Exponential rvs

Assume X1, X2, …, Xn are independent exponential r.v.’s
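
A short sketch of the standard result, writing λi for the rate of Xi:

P(min(X1, …, Xn) > x) = P(X1 > x) ⋯ P(Xn > x) = e^{-λ1 x} ⋯ e^{-λn x} = e^{-(λ1 + … + λn) x}

so min(X1, …, Xn) is itself exponential, with rate λ1 + … + λn.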

Page 17

Memoryless Property

True for Geometric and Exponential Dist.:

The coin does not remember that it came up tails l times

Root cause of Markov property (discussed later)
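
A small simulation sketch (an illustration, not from the slides) of the memoryless property for a geometric r.v. T, checking that P(T > s + t | T > s) ≈ P(T > t):

import numpy as np

rng = np.random.default_rng(1)
p, s, t = 0.3, 4, 3
T = rng.geometric(p, size=200_000)   # trials until first success

lhs = np.mean(T[T > s] > s + t)      # estimate of P(T > s + t | T > s)
rhs = np.mean(T > t)                 # estimate of P(T > t)

print(lhs, rhs)                      # both close to (1 - p)**t ≈ 0.343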

Page 18

Useful Results

The following are some results that are useful for manipulating many of the equations that may arise when dealing with discrete-time probabilistic models

when |x| < 1,

Σ_{k=0}^{n} x^k = (1 - x^{n+1}) / (1 - x),    Σ_{k=0}^{∞} x^k = 1 / (1 - x)

Differentiating both sides of the previous equation yields another useful expression:

Σ_{k=0}^{∞} k x^{k-1} = 1 / (1 - x)^2,  or equivalently  Σ_{k=0}^{∞} k x^k = x / (1 - x)^2
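
A quick numerical check of these identities in Python (an illustration, not from the slides), evaluated at x = 0.5 with a large truncation standing in for the infinite sums:

x, n, K = 0.5, 10, 200

finite = sum(x**k for k in range(n + 1))
print(finite, (1 - x**(n + 1)) / (1 - x))    # the two values agree

geometric = sum(x**k for k in range(K))
print(geometric, 1 / (1 - x))                # both ≈ 2.0

weighted = sum(k * x**k for k in range(K))
print(weighted, x / (1 - x)**2)              # both ≈ 2.0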