Uncertainty
Assumptions Inherent in Deductive Logic-based Systems
• All the assertions we wish to make and use are universally true.
• Observations of the world (percepts) are complete and error-free.
• All conclusions consistent with our knowledge are equally viable.
• All the desirable inference rules are truth-preserving.
Completely Accurate Assertions
• Initial intuition: if an assertion is not completely accurate, replace it by several more specific assertions.
• Qualification Problem: would have to add too many preconditions (or might forget to add some).
• Example: Toothache ⇒ Cavity is not always true; to make it true we would need Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ …, and even that list of qualifications is never complete.
Complete and Error-Free Perception
• Errors are common: the biggest problem in using Pathfinder for diagnosing lymph-node diseases is human error in feature detection.
• Some tests are impossible, too costly, or dangerous. “We could determine if your hip pain is really due to a lower back problem if we cut these nerve connections.”
Consistent Conclusions are Equal
• A diagnosis of either early smallpox or cowpox is consistent with our knowledge and observations.
• But cowpox is more likely (e.g., if the sores are on your cow-milking hand).
Truth-Preserving Inference
• Even if our inference rules are truth-preserving, if there’s a slight probability of error in our assertions or observations, during chaining (e.g., resolution) these probabilities can compound quickly, and we are not estimating them.
Solution: Reason Explicitly About Probabilities
• Full joint distributions.
• Certainty factors attached to rules.
• Dempster-Shafer Theory.
• Qualitative probability and non-monotonic reasoning.
• Possibility theory (within fuzzy logic, which itself does not deal with probability).
• Bayesian Networks.
Start with the Terminology of Most Rule-based Systems
• Atomic proposition: assignment of a value to a variable.
• Domain of possible values: variables may be Boolean, Discrete (finite domain), or Continuous.
• Compound assertions can be built with standard logical connectives.
Rule-based Systems (continued)
• State of the world (model, interpretation): a complete setting of all the variables.
• States are mutually exclusive (at most one is actually the case) and collectively exhaustive (at least one must be the case).
• A proposition is equivalent to the set of all states in which it is true; standard compositional semantics of logic applies.
To Add Probability
• Replace variables with random variables.
• State of the world (setting of all the random variables) will be called an atomic event.
• Apply probabilities or degrees of belief to propositions: P(Weather=sunny) = 0.7.
• Alternatively, apply probabilities to atomic events: the full joint distribution.
Prior Probability
• The unconditional or prior probability of a proposition is the degree of belief accorded to it in the absence of any other information.
• Example: P(Cavity = true) or P(cavity)
• Example: P(Weather = sunny)
• Example: P(cavity ∧ (Weather = sunny))
Probability Distribution
• Allows us to talk about the probabilities of all the possible values of a random variable.
• For discrete random variables:
  P(Weather=sunny) = 0.7
  P(Weather=rain) = 0.2
  P(Weather=cloudy) = 0.08
  P(Weather=snow) = 0.02
• For the continuous case, a probability density function (p.d.f.) often can be used.
P Notation
• P(X = x) denotes the probability that the random variable X takes the value x;
• P(X) denotes a probability distribution over X.
For example:
P(Weather) = <0.7, 0.2, 0.08, 0.02>
Joint Probability Distribution
• Sometimes we want to talk about the probabilities of all combinations of the values of a set of random variables.
• P(Weather, Cavity) denotes the probabilities of all combinations of the values of the pair of random variables (a 4×2 table of probabilities).
Full Joint Probability Distribution
• The joint probability distribution over the complete set of random variables used to describe the world is called the full joint probability distribution.
• P(Cavity, Catch, Toothache, Weather) denotes the probabilities of all combinations of the values of these random variables: a 2×2×2×4 table with 32 entries.
Example
Toothache  Cavity  Catch  Probability
T          T       T      0.108
T          T       F      0.012
T          F       T      0.016
T          F       F      0.064
F          T       T      0.072
F          T       F      0.008
F          F       T      0.144
F          F       F      0.576
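A table this small can be stored directly in code. The sketch below (Python; the variable names are my own) keeps the eight atomic events in a dict and checks that, being mutually exclusive and collectively exhaustive, they sum to 1:

```python
# Full joint distribution over (Toothache, Cavity, Catch).
# Keys are atomic events; values are the probabilities from the table above.
joint = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,
    (False, True,  False): 0.008,
    (False, False, True):  0.144,
    (False, False, False): 0.576,
}

# The atomic events are mutually exclusive and collectively exhaustive,
# so their probabilities must sum to 1.
total = sum(joint.values())
assert abs(total - 1.0) < 1e-9
```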
Probability of a Proposition
• Recall that any proposition a is viewed as equivalent to the set of atomic events in which a holds; call this set e(a).
• The probability of a proposition a is equal to the sum of the probabilities of the atomic events in which a holds:
  P(a) = Σ_{eᵢ ∈ e(a)} P(eᵢ)
Example
• P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
• P(cavity ∧ toothache) = 0.108 + 0.012 = 0.12
• P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
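These sums can be computed mechanically: treat a proposition as a predicate over atomic events and add up the probabilities of the events where it holds. A minimal sketch (the table encoding and helper names are mine):

```python
# Atomic events are (toothache, cavity, catch) triples.
joint = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,
    (False, True,  False): 0.008,
    (False, False, True):  0.144,
    (False, False, False): 0.576,
}

def p(prop):
    """P(a): sum the probabilities of the atomic events in e(a)."""
    return sum(pr for event, pr in joint.items() if prop(event))

p_cavity = p(lambda e: e[1])                  # ≈ 0.2
p_cav_and_tooth = p(lambda e: e[1] and e[0])  # ≈ 0.12
p_cav_or_tooth = p(lambda e: e[1] or e[0])    # ≈ 0.28
```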
The Axioms of Probability
• For any proposition a, 0 ≤ P(a) ≤ 1.
• P(true) = 1 and P(false) = 0.
• The probability of a disjunction is given by
  P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
Conditional (Posterior) Probability
• P(a|b): the probability of a given that all we know is b.
• P(cavity|toothache) = 0.8: if a patient has a toothache, and no other information is available, the probability that the patient has a cavity is 0.8.
• To be precise: P(a|b) = P(a ∧ b) / P(b), defined whenever P(b) > 0.
Example
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
                      = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
                      = 0.12 / 0.2 = 0.6
Related Example
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                       = 0.08 / 0.2 = 0.4
Normalization
• In the two preceding examples the denominator (P(toothache)) was the same, and we looked at all possible values for the variable Cavity given toothache.
• The denominator can be viewed as a normalization constant.
• We don’t have to compute the denominator -- just normalize 0.12 and 0.08 to sum to 1.
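In code, normalization is just dividing each unnormalized entry by their sum; P(toothache) never needs to be computed separately. A sketch using the numbers from the two preceding examples (names are mine):

```python
# Unnormalized values: P(Cavity = v ∧ toothache) for each value v.
unnorm = {True:  0.108 + 0.012,   # cavity ∧ toothache
          False: 0.016 + 0.064}   # ¬cavity ∧ toothache

# Normalize so the entries sum to 1; the divisor is implicitly P(toothache).
z = sum(unnorm.values())
posterior = {v: pr / z for v, pr in unnorm.items()}
# posterior[True] ≈ 0.6 and posterior[False] ≈ 0.4
```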
Product Rule
• From conditional probabilities we obtain the product rule:
P(a ∧ b) = P(a | b) P(b)
P(a ∧ b) = P(b | a) P(a)
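The product rule can be checked numerically against the running dental example (numbers from the full joint table; the variable names are mine):

```python
# From the full joint table:
p_toothache = 0.108 + 0.012 + 0.016 + 0.064   # P(toothache) = 0.2
p_cavity_and_toothache = 0.108 + 0.012        # P(cavity ∧ toothache) = 0.12

# Conditional probability from the definition:
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache  # 0.6

# Product rule: P(a ∧ b) = P(a | b) P(b)
recovered = p_cavity_given_toothache * p_toothache
assert abs(recovered - p_cavity_and_toothache) < 1e-12
```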
Bayes’ Rule
Recall the product rule:
  P(a ∧ b) = P(a | b) P(b)
  P(a ∧ b) = P(b | a) P(a)
Equating the right-hand sides and dividing by P(a):
  P(b | a) = P(a | b) P(b) / P(a)
For multi-valued variables X and Y:
  P(Y | X) = P(X | Y) P(Y) / P(X)
Bayes’ Rule with Background Evidence
Often we'll want to use Bayes' Rule conditionalized on some background evidence e:
  P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e)
Example of Bayes’ Rule
• P(stiff neck|meningitis) = 0.5
• P(meningitis) = 1/50,000
• P(stiff neck) = 1/20
• Then
  P(meningitis | stiff neck) = P(stiff neck | meningitis) P(meningitis) / P(stiff neck)
                             = (0.5)(1/50,000) / (1/20)
                             = 0.0002
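The same arithmetic, as a short Python sketch (variable names are mine):

```python
p_stiff_given_men = 0.5    # P(stiff neck | meningitis)
p_men = 1 / 50_000         # P(meningitis)
p_stiff = 1 / 20           # P(stiff neck)

# Bayes' Rule: P(m | s) = P(s | m) P(m) / P(s)
p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
# ≈ 0.0002: even given a stiff neck, meningitis remains very unlikely
```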
Normalization with Bayes’ Rule
• P(stiff neck|meningitis) and P(meningitis) are relatively easy to estimate from medical records.
• Prior probability of stiff neck is harder to estimate accurately.
• Bayes’ rule with normalization:
P(Y | X) = α P(X | Y) P(Y), where α = 1/P(X) is a normalization constant.
Normalization with Bayes’ Rule (continued)
Might be easier to compute
  P(stiff neck | meningitis) P(meningitis) and
  P(stiff neck | ¬meningitis) P(¬meningitis)
than to compute P(stiff neck).
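A sketch of this idea: given P(stiff neck | ¬meningitis) (the value 0.05 below is an assumed, illustrative number, not from the slides), the posterior over Meningitis falls out of normalization, and P(stiff neck) is never needed directly:

```python
p_s_given_m = 0.5        # P(stiff neck | meningitis)
p_m = 1 / 50_000         # P(meningitis)
p_s_given_not_m = 0.05   # ASSUMED illustrative value, not from the slides

# Unnormalized posterior entries.
unnorm_m = p_s_given_m * p_m                 # P(s | m) P(m)
unnorm_not_m = p_s_given_not_m * (1 - p_m)   # P(s | ¬m) P(¬m)

# Their sum is exactly P(stiff neck), computed implicitly by normalization.
alpha = 1 / (unnorm_m + unnorm_not_m)
p_m_given_s = alpha * unnorm_m
```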
Why Use Bayes’ Rule
• Causal knowledge such as P(stiff neck|meningitis) often is more reliable than diagnostic knowledge such as P(meningitis|stiff neck).
• Bayes’ Rule lets us use causal knowledge to make diagnostic inferences (derive diagnostic knowledge).
Boldface Notation
• Sometimes we want to write an equation that holds for a vector (ordered set) of random variables. We denote such a set in boldface: a plain Y denotes a single random variable, while a boldface Y denotes a set of random variables.
• Similarly, Y = y denotes a setting for the single variable Y, while a boldface Y = y denotes a setting for all the variables in Y.
Marginalization & Conditioning
• Marginalization (summing out): for any sets of variables Y and Z:
  P(Y) = Σ_{z ∈ Z} P(Y, z)
• Conditioning (variant of marginalization):
  P(Y) = Σ_{z ∈ Z} P(Y | z) P(z)
• Often we want to do this for P(Y | X) instead of P(Y). Recall
  P(Y | X) = P(X | Y) P(Y) / P(X)
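Marginalization over a small joint table can be sketched as follows. For illustration the joint P(Weather, Cavity) below is built assuming the two variables are independent (an assumption of mine, made purely to have numbers to sum); the marginals match the earlier slides:

```python
weather = {'sunny': 0.7, 'rain': 0.2, 'cloudy': 0.08, 'snow': 0.02}
cavity = {True: 0.2, False: 0.8}

# Toy joint table: ASSUMES independence, purely for illustration.
joint = {(w, c): pw * pc
         for w, pw in weather.items()
         for c, pc in cavity.items()}

# Marginalization: P(Weather = w) = Σ_z P(Weather = w, Cavity = z)
p_sunny = sum(joint[('sunny', c)] for c in cavity)
assert abs(p_sunny - 0.7) < 1e-9
```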
General Inference Procedure
• Let X be the random variable whose probabilities we want to know, given some evidence (values e for a set E of observed variables). Let the remaining (unobserved) variables be Y. The query is P(X | e), and it can be answered by
  P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
  where α = 1/P(e) is a normalization constant.
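The whole procedure, sketched over the 8-entry dental table (the function and variable names are my own):

```python
# Full joint distribution; events are (toothache, cavity, catch) triples.
joint = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,
    (False, True,  False): 0.008,
    (False, False, True):  0.144,
    (False, False, False): 0.576,
}

def query(joint, x_index, evidence):
    """P(X | e) = alpha * sum_y P(X, e, y), by enumerating atomic events.
    x_index: position of the query variable X in the event tuple.
    evidence: {position: observed value} for the evidence variables E."""
    unnorm = {}
    for event, pr in joint.items():
        if all(event[i] == v for i, v in evidence.items()):
            # Summing over all settings of the unobserved variables Y
            # happens implicitly: events differing only in Y collapse
            # onto the same value of X.
            unnorm[event[x_index]] = unnorm.get(event[x_index], 0.0) + pr
    alpha = 1.0 / sum(unnorm.values())   # alpha = 1 / P(e)
    return {x: alpha * pr for x, pr in unnorm.items()}

# Query P(Cavity | toothache): Toothache is position 0, Cavity is position 1.
posterior = query(joint, x_index=1, evidence={0: True})
# posterior[True] ≈ 0.6, posterior[False] ≈ 0.4
```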