Artificial Intelligence - Department of Theoretical...

13
Artificial Intelligence Roman Barták Department of Theoretical Computer Science and Mathematical Logic Rational decisions We are designing rational agents that maximize expected utility. Probability theory is a tool for dealing with degrees of belief (about world states, action effects etc.). Now, we explore utility theory to represent and reason with preferences. Finally, we combine preferences (as expressed by utilities) with probabilities in the general theory of rational decisions – decision theory.

Transcript of Artificial Intelligence - Department of Theoretical...

Page 1: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

ArtificialIntelligence

Roman BartákDepartment of Theoretical Computer Science and Mathematical Logic

Rationaldecisions

Wearedesigningrationalagentsthatmaximizeexpectedutility.Probabilitytheoryisatoolfordealingwithdegreesofbelief(aboutworldstates,actioneffectsetc.).Now,weexploreutilitytheorytorepresentandreasonwithpreferences.

Finally,wecombinepreferences(asexpressedbyutilities)withprobabilitiesinthegeneraltheoryofrationaldecisions– decisiontheory.

Page 2: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Utilitytheory

Theagent‘spreferencesarecapturedbyautilityfunction,U(s),whichassignsasinglenumbertoexpressdesirabilityofastate.Theexpectedutilityofanactiongiventheevidenceisjusttheaveragevalueofoutcomes,weightedbytheirprobabilitiesEU(a|e)=∑s P(Result(a)=s|a,e)U(s)

Arationalagentshouldchoosetheactionthatmaximizes theagent’sexpectedutility(MEU)action=argmaxa EU(a|e)

TheMEUprincipleformalizesthegeneralnotionthattheagentshould“dotherightthing”,butweneedmakeitoperational.

Rationalpreferences

Frequently,itiseasierforanagenttoexpresspreferencesbetweenstates:

– A>B:theagentprefersAoverB– A<B:theagentprefersBoverA– A~B:theagentisindifferentbetweenAandB

WhatsortofthingsareAandB?– Theycouldbestatesofoftheworld,butmoreoftentannotthere

isuncertaintyaboutwhatisreallybeingoffered.– Wecanthinkofthesetofoutcomesforeachactionasalottery

(possibleoutcomesS1,…,Sn thatoccurwithprobabilitiesp1,…,pn)• [p1,S1;…;pn,Sn]

Anexampleoflottery(foodinairplanes)Chickenorpasta?

– [0.8,juicychicken;0.2,overcooked chicken]– [0.7,deliciouspasta;0.3,congealedpasta]

Page 3: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Propertiesofrationalpreferences

Rationalpreferencesshouldleadtomaximizingexpectedutility(iftheagentviolatesthemitwillexhibitpatentlyirrationalbehaviorinsomesituations.Werequireseveralconstraints(theaxiomsofutilitytheory)thatrationalpreferencesshouldobey.

– orderability:exactlyoneof(A>B)or(A<B) or(A~B)holds

– transitivity:(A<B)∧ (B<C)⇒ (A<C)

– continuity:(A>B>C)⇒ ∃p[p,A;1-p,C] ~B

– substitutability:A~B⇒ [p,A;1-p,C] ~ [p,B;1-p,C]

– monotonicity:A>B⇒ (p>q⇔ [p,A;1-p,B] >[q,A;1-q,B]

– decomposability:[p,A;1-p,[q,B;1-q,C]] ~ [p,A;(1-p)q,B;(1-p)(1-q),C]

Preferencesleadtoutility

Teh axiomsofutilitytheoryareaxiomsaboutpreferencesbutwecanderivethefollowingconsequencesformthem..

Existenceofutilityfunctionsuchthat:U(A)<U(B)⇔ A<BU(A)=U(B)⇔ A~B

Expectedutilityofalottery:U([p1,S1;…;pn,Sn] )=∑i pi U(Si)

Autilityfunctionexistsforanyrationalagentbutitisnotunique:

U‘(S)=aU(S)+bExistenceofautilityfunctiondoesnotnecessarilymeanthattheagentisexplicitlymaximizingthatutilityfunction.Byobservingitspreferencesanobservercanconstructtheutilityfunction(eveniftheagentdoesnotknowit).

Page 4: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Utilityfunctions

Utilityisafunctionthatmapsfromlotteriestorealnumbers.Wemustfirstworkoutwhattheagent‘sutilityfunctionis(preferenceelicitation).

– Wewillbelookingforanormalized utilityfunction.– Wefixtheutilityofa“bestpossibleprize”Smax to1,U(Smax)=1.

– Similarly,a“worstpossiblecatastrophe”Smin ismappedto0,U(Smin)=0.

– Now,toassesstheutilityofanyparticularprizeSweasktheagenttochoosebetweenSandastandard lottery[p, Smax;1-p, Smin]

– Theprobabilitypisadjusteduntiltheagentisindifferentbetween Sandthestandardlottery.

– ThentheutilityofSisgivenby,U(S)=p.

Theutilityofmoney

Universalexchangeabilityofmoneyforallkindsofgoodsandservicessuggeststhatmoneyplaysasignificantroleinhumanutilityfunctions.

– Anagentprefersmoremoney toless,allotherthingsbeingequal.

Butthisdoesnotmeanthatmoneybehavesasautilityfunction(becauseitsaysnothingaboutpreferencesbetweenlotteriesinvolvingmoney).Assumethatyouwonacompetitionandthehostoffersyouachoice:eitheryoucantakethe1mil.USDpriceoryoucangambleitontheflipofcoin.Ifthecoincomesupheads, youendupwithnothing, butifitcomesuptails,youget2.5mil.USD.Whatisyourchoice?

– Expectedmonetary valueofthegambleis1.250.000USD.– Mostpeopledeclinethegambleandpocketthemillion.Are

theybeingirrational?

Page 5: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

the utility of money

Theutilityofmoney

Thedecisioninthepreviousgamedoesnotdependontheprizeonlybutalsoonthewealthoftheplayer!LetSn denoteastateofpossessingtotalwealthnUSD,andthecurrentwealthiskUSD.Thetheexpectedutilitiesoftwoactionsare:

– EU(Accept)=½U(Sk)+½U(Sk+2.500.000)– EU(Decline)=U(Sk+1.000.000)

SupposeweassignU(Sk)=5,U(Sk+1.000.000)=8,U(Sk+2.500.000)=9.Thentherationaldecisionwouldbetodecline!

risk-averse area (prefer a sure thing with a payoff less than the expected monetary value of a gamble)

risk-seeking behavior in this area

an agent that has a linear curve is said to be risk-neutral.

Humanjudgment(certaintyeffect)

Theevidencesuggeststhathumansare“predictableirrational“.Allaisparadox

– A:80%chanceof4000USD– B:100%chanceof3000USD

Whatisyourchoice?• MostpeopleconsistentlypreferBoverA(takingthesurething!)

– C:20%chanceof4000USD– D:25%chanceof3000USD

Whatisyourchoice?• MostpeoplepreferCoverD(higherexpectedmonetaryvalue)

Certaintyeffect– peoplearestronglyattractedtogainsthatarecertain

Page 6: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Humanjudgment(ambiguityaversion)

EllsbergparadoxTheurncontains1/3redballs,and2/3eitherblackoryellowballs.– A:100USDforaredball– B:100USDforablackball

Whatisyourchoice?• MostpeoplepreferAoverB(Agivesa1/3chanceofwinning,whileBcouldbeanywherebetween0and2/3)

– C:100USDforaredoryellowball– D:100USDforablackoryellowball

Whatisyourchoice?• MostpeoplepreferDoverC(Dgivesyoua2/3chance,whileCcouldanywherebetween1/3and3/3)

However,ifyouthinktherearemoreredthanblackballsthenyourshouldpreferAoverBandCoverD.

Ambiguityaversion– mostpeopleelecttheknownprobabilityratherthantheunknownunknown.

Humanjudgment

Framingeffect– theexactwordingofadecisionproblemcanhaveabigimpactontheagent‘schoices

– medicalprocedureAhas90%survivalrate– medicalprocedureBhas10%deathrate

Whatisyourchoice?• MostpeoplepreferAoverBthoughbothchoicesareidentical

Anchoringeffect– peoplefeelmorecomfortablemakingrelativeutilityjudgmentsratherthanabsoluteones

– Therestauranttakesadvantageofthisbyofferinga$200bottlethatitknowsnobodywillbuy,butwhichservestoskewupwardthecustomer’sestimateofthevalueofallwinesandmakethe$55bottleseemlikeabargain.

Page 7: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Multi-attributeutilitytheory

Inreal-lifetheoutcomesarecharacterizedbytwoormoreattributessuchascostandsafetyissues–multi-attributeutilitytheory.Wewillassumethathighervaluesofanattributecorrespondtohigherutilities.Thequestionishowtogetpreferencesformoreattributes– withoutcombiningtheattributevaluesintoasingleutilityvalue– dominance

– combiningtheattributevaluesintoasingleutilityvalue

Dominance

Ifanoptionisoflowervalueonallattributesthatsomeotheroption,itneednotbeconsideredfurther– strictdominance.

Strictdominancecanbedefinedforuncertainoutcomestoo.– ifallpossibleoutcomesofBstrictlydominateallpossibleoutcomesofA

Strictdominancewillprobablyoccurlessoftenthaninthedeterministiccase.

Page 8: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Stochasticdominance

Stochasticdominanceoccursmorefrequentlyinrealproblems.Itiseasiertounderstandinthecontextofasinglevariable.Stochasticdominance isbestseenbyexaminingthecumulativedistribution thatmeasurestheprobabilitythatthatthecostislessthanorequalanygivenamount(itintegratestheoriginaldistribution).

probability distribution cumulative distribution

Preferencestructure

Tospecifythecompleteutilityfunctionfornattributeseachhavingdvalues,weneeddnvaluesintheworstcase.– Thiscorrespondstoasituationinwhichagent‘spreferenceshavenoregularityatall.

Preferencesoftypicalagentshavemuchmorestructuresothetheutilityfunctioncanbeexpressedas:

U(x1,…,xn)=F[f1(x1),…,fn(xn)]

Page 9: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Preferencestructure(withoutuncertainty)

Thebasicregularityiscalledpreferenceindependence.TwoattributesX1 andX2 arepreferentiallyindependentofathirdattributeX3 ifthepreferencebetweenoutcomes� x1,x2,x3�and� x’1,x’2,x3�doesnotdependontheparticularvaluex3.Ifeachpairofattributesispreferentiallyindependentofanyotherattribute,wetalkaboutmutualpreferentialindependence(MPI).Ifattributesaremutuallypreferentiallyindependentthentheagent’spreferencebehaviorcanbedescribedasmaximizingthefunction:

U(x1,…,xn)=∑i Ui(xi)Avaluefunctionofthistypeiscalledanadditivevaluefunction.

Preferencestructure(withuncertainty)

Whenuncertaintyispresentweneedtoconsiderthestructureofpreferencesbetweenlotteries.Formutuallyutilityindependent(MUI)attributeswecanusemultiplicativeutilityfunction:U=k1U1 +k2U2 +k3U3+k1k2U1U2 +k2k3U2U3 +k1k3U1U3+k1k2k3U1U2U3

FornattributesexhibitingMUIwecanrepresenttheutilityfunctionusingnconstantsandnsingle-attributeutilities.

Page 10: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Thevalueofinformation

Sofarwehaveassumedthatallrelevantinformationisprovidedtotheagentbeforeitmakesitsdecision.Inpractice,thisishardlyeverthecase.Forexample,adoctorcannotexpecttobeprovidedwiththeresultofallpossiblediagnostictests.Oneofthemostimportantpartsofdecisionmakingisknowingwhatquestionstoask.

Wewillnowlookatinformationvaluetheory,whichenablesanagenttochoosewhichinformationtoacquire.

Thevalueofinformation(example)

Supposeanoilcompanyishopingtobuyoneofthenindistinguishableblocksofocean-drillingrights.Letusassumefurther thatexactlyoneoftheblockscontainoilworthCdollars,whileothersareworthless.TheaskingpriceofeachblockisC/n.

ExpectedmonetaryvalueofbuyingoneblockisC/n – C/n =0.Nowsuppose thataseismologistoffers thecompanytheresultofasurveyofonespecificblock,whichindicatesdefinitelywhethertheblockcontainsoil.

Howmuchshouldthecompanytopayforthatinformation?– Withprobability1/n,thesurveywillindicateoilinagivenblockandthethe

companywillbuyitandmakeaprofitC– C/n.– Withprobability(n-1)/n,thesurveywillshowthattheblockcontainsnooil,

inwhichcasethecompanywillbuyanotherblock.Nowtheprobabilityoffindingoilinthatotherblockis1/(n-1),sotheexpectedprofitisC/(n-1)–C/n.

– Togethertheexpectedprofitgiventhesurveyinformationis:1/n(C– C/n)+(n-1)/n(C/(n-1)– C/n)=C/n

ThereforethecompanyshouldbewillingtopaytheseismologistuptoC/n dollarsfortheinformation:theinformationisworthasmuchastheblockitself.

Page 11: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Thevalueofinformation(ageneralformula)

WeassumethatexactevidencecanbeobtainedaboutthevalueofsomerandomvariableEj – thisiscalledvalueofperfectinformation (VPI).Thevalueofthecurrentbestaction& (withtheinitialevidencee)isdefinedby:

EU(&|e)=maxa ∑s‘ P(Result(a)=s‘|a,e)U(s‘)

Thevalueofthebestaction&jk afterthenewevidenceEj =ejk isobtainedisdefinedby:

EU(&jk|e,Ej=ejk)=maxa∑s‘ P(Result(a)=s‘|a,e, Ej=ejk)U(s‘)

ButthevalueofEj iscurrentlyunknownsowemustaverageoverallpossiblevaluesthatwemightdiscoverforEj:

VPIe(Ej)=(∑k P(Ej = ejk|e)EU(&jk|e,Ej=ejk))- EU(&|e)

Thevalueofinformation(qualitatively)

Whenisitbeneficialtoobtainnewinformation?

Informationhasvaluetotheextendthat– itislikelytocauseachangeofaplanand– thenewplanwillbesignificantlybetterthattheoldplan.

Clear choice, the information is not needed

The choice is unclear and the information is crucial

The choice is unclear, the information is less valuable

Page 12: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

Propertiesofthevalueofinformation

Isitpossibleforinformationtobedeleterious?Theexpectedvalueofinformationisnonnegative.∀e,Ej VPIe(Ej)≥ 0

Thevalueofinformationisnotadditive.VPIe(Ej,Ek)≠ VPIe(Ej)+VPIe(Ek)

Theexpectedvalueofinformationisorderindependent.

VPIe(Ej,Ek)=VPIe(Ej)+VPIe,ej(Ek)=VPIe(Ek)+VPIe,ek(Ej)

Informationgathering

Asensibleagentshould– askquestionsinareasonableorder– avoidaskingquestionsthatareirrelevant– takeintoaccounttheimportanceofeachpieceofinformationinrelation

itscost– stopaskingquestionswhenthatisappropriate

Weassumethatwitheachobservableevidence variableEj,thereisanassociatedcost,Cost(Ej).Information-gatheringagentcanselect(greedily)themostefficientobservationuntilnoobservationworthitscosts(myopicapproach)

Page 13: Artificial Intelligence - Department of Theoretical ...ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf · Artificial Intelligence Roman Barták Department of Theoretical Computer

© 2016 Roman BartákDepartment of Theoretical Computer Science and Mathematical Logic

[email protected]