Artificial Intelligence – Lecture 5: Rational Decisions (ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf)

Transcript
Page 1

Artificial Intelligence

Roman Barták, Department of Theoretical Computer Science and Mathematical Logic

Rational decisions

We are designing rational agents that maximize expected utility. Probability theory is a tool for dealing with degrees of belief (about world states, action effects etc.). Now, we explore utility theory to represent and reason with preferences.

Finally, we combine preferences (as expressed by utilities) with probabilities in the general theory of rational decisions – decision theory.

Page 2

Utility theory

The agent's preferences are captured by a utility function, U(s), which assigns a single number to express the desirability of a state. The expected utility of an action given the evidence is just the average value of the outcomes, weighted by their probabilities:

EU(a|e) = ∑s P(Result(a)=s | a,e) U(s)

A rational agent should choose the action that maximizes the agent's expected utility (MEU):

action = argmaxa EU(a|e)

The MEU principle formalizes the general notion that the agent should “do the right thing”, but we need to make it operational.
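As a minimal sketch of how the MEU principle becomes operational in code (the actions, outcome model, and utilities below are invented for illustration):

```python
# Minimal sketch of the MEU principle: choose argmax_a EU(a|e).
# The outcome model P(Result(a)=s | a,e) and utilities U(s) are made up.

def expected_utility(action, outcome_model, utility):
    """EU(a|e) = sum_s P(Result(a)=s | a,e) * U(s)."""
    return sum(p * utility[s] for s, p in outcome_model[action].items())

def meu_action(outcome_model, utility):
    """action = argmax_a EU(a|e)."""
    return max(outcome_model,
               key=lambda a: expected_utility(a, outcome_model, utility))

outcome_model = {
    "go_left":  {"safe": 0.9, "crash": 0.1},
    "go_right": {"safe": 0.6, "crash": 0.4},
}
utility = {"safe": 10.0, "crash": -100.0}
print(meu_action(outcome_model, utility))  # go_left
```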

Rational preferences

Frequently, it is easier for an agent to express preferences between states:

– A > B: the agent prefers A over B
– A < B: the agent prefers B over A
– A ~ B: the agent is indifferent between A and B

What sort of things are A and B?
– They could be states of the world, but more often than not there is uncertainty about what is really being offered.
– We can think of the set of outcomes for each action as a lottery (possible outcomes S1, …, Sn that occur with probabilities p1, …, pn):
• [p1, S1; …; pn, Sn]

An example of a lottery (food in airplanes) – chicken or pasta? (evaluated in the sketch below)
– [0.8, juicy chicken; 0.2, overcooked chicken]
– [0.7, delicious pasta; 0.3, congealed pasta]
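The chicken/pasta choice can be evaluated once each outcome is assigned a utility; the utility numbers below are illustrative assumptions, not from the lecture:

```python
# Expected utility of a lottery [p1,S1; ...; pn,Sn] = sum_i pi * U(Si).
# The outcome utilities are invented for the sake of the example.

def lottery_utility(lottery, utility):
    return sum(p * utility[s] for p, s in lottery)

utility = {
    "juicy chicken": 0.9, "overcooked chicken": 0.2,
    "delicious pasta": 1.0, "congealed pasta": 0.0,
}
chicken = [(0.8, "juicy chicken"), (0.2, "overcooked chicken")]
pasta   = [(0.7, "delicious pasta"), (0.3, "congealed pasta")]

print(lottery_utility(chicken, utility))  # 0.76
print(lottery_utility(pasta, utility))    # 0.70 -> prefer chicken
```

With these made-up numbers the chicken lottery wins, 0.76 to 0.70.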

Page 3

Properties of rational preferences

Rational preferences should lead to maximizing expected utility (if the agent violates them, it will exhibit patently irrational behavior in some situations). We require several constraints (the axioms of utility theory) that rational preferences should obey:

– orderability: exactly one of (A > B), (A < B), or (A ~ B) holds

– transitivity: (A < B) ∧ (B < C) ⇒ (A < C)

– continuity: (A > B > C) ⇒ ∃p [p, A; 1-p, C] ~ B

– substitutability: A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]

– monotonicity: A > B ⇒ (p > q ⇔ [p, A; 1-p, B] > [q, A; 1-q, B])

– decomposability: [p, A; 1-p, [q, B; 1-q, C]] ~ [p, A; (1-p)q, B; (1-p)(1-q), C]
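Decomposability (“no fun in gambling”) says a two-stage lottery is equivalent to its flattened one-stage version; a small numeric check with invented utilities:

```python
# Decomposability: [p,A; 1-p,[q,B; 1-q,C]] ~ [p,A; (1-p)q,B; (1-p)(1-q),C].
# Compound and flattened lotteries give the same expected utility.

def lottery_utility(lottery, utility):
    # Entries are (prob, outcome); an outcome may itself be a lottery (a list).
    return sum(p * (lottery_utility(s, utility) if isinstance(s, list)
                    else utility[s])
               for p, s in lottery)

U = {"A": 1.0, "B": 0.4, "C": 0.0}   # illustrative utilities
p, q = 0.3, 0.6
compound  = [(p, "A"), (1 - p, [(q, "B"), (1 - q, "C")])]
flattened = [(p, "A"), ((1 - p) * q, "B"), ((1 - p) * (1 - q), "C")]

assert abs(lottery_utility(compound, U) - lottery_utility(flattened, U)) < 1e-12
```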

Preferences lead to utility

The axioms of utility theory are axioms about preferences, but we can derive the following consequences from them.

Existence of a utility function such that:
U(A) < U(B) ⇔ A < B
U(A) = U(B) ⇔ A ~ B

Expected utility of a lottery:
U([p1, S1; …; pn, Sn]) = ∑i pi U(Si)

A utility function exists for any rational agent, but it is not unique:

U′(S) = aU(S) + b (for a > 0)

The existence of a utility function does not necessarily mean that the agent is explicitly maximizing that utility function. By observing its preferences, an observer can construct the utility function (even if the agent does not know it).
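A quick sketch (toy numbers) showing that a positive affine transformation of utilities leaves the MEU choice unchanged:

```python
# A positive affine transformation U'(S) = a*U(S) + b with a > 0 preserves
# the agent's choices: the argmax of expected utility is unchanged.

def expected_utility(lottery, utility):
    return sum(p * utility[s] for p, s in lottery)

U = {"s1": 0.2, "s2": 0.7, "s3": 1.0}       # illustrative utilities
a, b = 3.5, -2.0                            # arbitrary a > 0 and b
U2 = {s: a * u + b for s, u in U.items()}   # transformed scale

lotteries = {
    "act1": [(0.5, "s1"), (0.5, "s3")],
    "act2": [(1.0, "s2")],
}
best  = max(lotteries, key=lambda x: expected_utility(lotteries[x], U))
best2 = max(lotteries, key=lambda x: expected_utility(lotteries[x], U2))
assert best == best2  # same action under both utility scales
```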

Page 4

Utility functions

Utility is a function that maps from lotteries to real numbers. We must first work out what the agent's utility function is (preference elicitation).

– We will be looking for a normalized utility function.
– We fix the utility of a “best possible prize” Smax to 1: U(Smax) = 1.
– Similarly, a “worst possible catastrophe” Smin is mapped to 0: U(Smin) = 0.
– Now, to assess the utility of any particular prize S, we ask the agent to choose between S and a standard lottery [p, Smax; 1-p, Smin].
– The probability p is adjusted until the agent is indifferent between S and the standard lottery.
– Then the utility of S is given by U(S) = p.
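This adjustment of p can be sketched as a bisection; `prefers_s` is a hypothetical oracle standing in for the agent's answers, not part of the lecture:

```python
# Sketch of preference elicitation with a standard lottery [p,Smax; 1-p,Smin].
# prefers_s(p) answers: does the agent prefer S over the standard lottery?
# We bisect on p until (approximate) indifference; then U(S) = p.

def elicit_utility(prefers_s, tol=1e-3):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_s(p):
            lo = p   # S still preferred -> the lottery must promise more
        else:
            hi = p   # lottery preferred -> lower its promise
    return (lo + hi) / 2

# Simulated agent whose true U(S) = 0.62: it prefers S whenever p < 0.62.
print(round(elicit_utility(lambda p: p < 0.62), 2))  # ~0.62
```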

The utility of money

The universal exchangeability of money for all kinds of goods and services suggests that money plays a significant role in human utility functions.

– An agent prefers more money to less, all other things being equal.

But this does not mean that money behaves as a utility function (because it says nothing about preferences between lotteries involving money). Assume that you won a competition and the host offers you a choice: either you can take the 1,000,000 USD prize, or you can gamble it on the flip of a coin. If the coin comes up heads, you end up with nothing, but if it comes up tails, you get 2,500,000 USD. What is your choice?

– The expected monetary value of the gamble is 1,250,000 USD.
– Most people decline the gamble and pocket the million. Are they being irrational?

Page 5

The utility of money

The decision in the previous game does not depend on the prize only but also on the wealth of the player! Let Sn denote the state of possessing total wealth n USD, and suppose the current wealth is k USD. Then the expected utilities of the two actions are:

– EU(Accept) = ½ U(Sk) + ½ U(Sk+2,500,000)
– EU(Decline) = U(Sk+1,000,000)

Suppose we assign U(Sk) = 5, U(Sk+1,000,000) = 8, and U(Sk+2,500,000) = 9. Then the rational decision would be to decline!
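Checking the slide's numbers directly:

```python
# The slide's utilities: accepting the gamble mixes U(S_k) and U(S_k+2.5M),
# declining takes U(S_k+1M) for sure.
U_k, U_k1M, U_k25M = 5, 8, 9

eu_accept  = 0.5 * U_k + 0.5 * U_k25M   # 7.0
eu_decline = U_k1M                      # 8
print("decline" if eu_decline > eu_accept else "accept")  # decline
```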

(Figure: a typical utility curve for money.)
– Risk-averse region: the agent prefers a sure thing with a payoff less than the expected monetary value of a gamble.
– Risk-seeking region: the agent exhibits risk-seeking behavior.
– An agent that has a linear curve is said to be risk-neutral.

Human judgment (certainty effect)

The evidence suggests that humans are “predictably irrational”. Allais paradox:

– A: 80% chance of 4000 USD
– B: 100% chance of 3000 USD

What is your choice?
• Most people consistently prefer B over A (taking the sure thing!)

– C: 20% chance of 4000 USD
– D: 25% chance of 3000 USD

What is your choice?
• Most people prefer C over D (higher expected monetary value)

Certainty effect – people are strongly attracted to gains that are certain.
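The two majority choices are jointly inconsistent with expected-utility maximization: taking U(0) = 0 for concreteness, preferring B over A means U(3000) > 0.8·U(4000), while preferring C over D means 0.2·U(4000) > 0.25·U(3000), i.e. 0.8·U(4000) > U(3000). A brute-force check over a grid of candidate utilities:

```python
# Allais paradox: no utility function (with U(0)=0) is consistent with
# preferring B over A *and* C over D.
# B > A  means U(3000) > 0.8*U(4000)
# C > D  means 0.2*U(4000) > 0.25*U(3000), i.e. 0.8*U(4000) > U(3000)

def consistent(u3000, u4000):
    prefers_B = u3000 > 0.8 * u4000
    prefers_C = 0.2 * u4000 > 0.25 * u3000
    return prefers_B and prefers_C

# No assignment on the grid satisfies both:
print(any(consistent(x / 100, y / 100)
          for x in range(101) for y in range(101)))  # False
```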

Page 6

Human judgment (ambiguity aversion)

Ellsberg paradox: the urn contains 1/3 red balls, and 2/3 either black or yellow balls.
– A: 100 USD for a red ball
– B: 100 USD for a black ball

What is your choice?
• Most people prefer A over B (A gives a 1/3 chance of winning, while B could be anywhere between 0 and 2/3)

– C: 100 USD for a red or yellow ball
– D: 100 USD for a black or yellow ball

What is your choice?
• Most people prefer D over C (D gives you a 2/3 chance, while C could be anywhere between 1/3 and 3/3)

However, if you think there are more red than black balls, then you should prefer A over B and C over D.

Ambiguity aversion – most people elect the known probability rather than the unknown unknown.

Human judgment

Framing effect – the exact wording of a decision problem can have a big impact on the agent's choices:

– medical procedure A has a 90% survival rate
– medical procedure B has a 10% death rate

What is your choice?
• Most people prefer A over B, even though both choices are identical

Anchoring effect – people feel more comfortable making relative utility judgments rather than absolute ones.

– The restaurant takes advantage of this by offering a $200 bottle that it knows nobody will buy, but which serves to skew upward the customer's estimate of the value of all wines and makes the $55 bottle seem like a bargain.

Page 7

Multi-attribute utility theory

In real life, outcomes are characterized by two or more attributes, such as cost and safety issues – multi-attribute utility theory. We will assume that higher values of an attribute correspond to higher utilities. The question is how to handle preferences over several attributes:
– without combining the attribute values into a single utility value – dominance
– by combining the attribute values into a single utility value

Dominance

If an option is of lower value on all attributes than some other option, it need not be considered further – strict dominance.

Strict dominance can be defined for uncertain outcomes too:
– B strictly dominates A if all possible outcomes of B strictly dominate all possible outcomes of A.

Strict dominance will probably occur less often than in the deterministic case.

Page 8

Stochastic dominance

Stochastic dominance occurs more frequently in real problems. It is easier to understand in the context of a single variable. Stochastic dominance is best seen by examining the cumulative distribution, which measures the probability that the cost is less than or equal to any given amount (it integrates the original distribution).

(Figure: a probability distribution and the corresponding cumulative distribution.)
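A sketch of testing first-order stochastic dominance on a cost attribute via the cumulative distributions; the discrete cost distributions below are made up:

```python
# First-order stochastic dominance on a cost attribute (lower is better):
# option A dominates option B if, for every amount c, A is at least as
# likely as B to have cost <= c.

def cdf(dist, support):
    """Cumulative distribution evaluated over a sorted common support."""
    acc, out = 0.0, {}
    for x in support:
        acc += dist.get(x, 0.0)
        out[x] = acc
    return out

def dominates_on_cost(a, b):
    support = sorted(set(a) | set(b))
    ca, cb = cdf(a, support), cdf(b, support)
    return all(ca[x] >= cb[x] for x in support)

# Hypothetical construction costs (in millions) of two sites:
s1 = {3: 0.6, 4: 0.4}
s2 = {4: 0.5, 5: 0.5}
print(dominates_on_cost(s1, s2))  # True: s1 stochastically dominates s2
```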

Preference structure

To specify the complete utility function for n attributes, each having d values, we need dⁿ values in the worst case.
– This corresponds to a situation in which the agent's preferences have no regularity at all.

The preferences of typical agents have much more structure, so the utility function can be expressed as:

U(x1, …, xn) = F[f1(x1), …, fn(xn)]

Page 9

Preference structure (without uncertainty)

The basic regularity is called preference independence. Two attributes X1 and X2 are preferentially independent of a third attribute X3 if the preference between outcomes ⟨x1, x2, x3⟩ and ⟨x′1, x′2, x3⟩ does not depend on the particular value x3. If each pair of attributes is preferentially independent of any other attribute, we talk about mutual preferential independence (MPI). If attributes are mutually preferentially independent, then the agent's preference behavior can be described as maximizing the function:

U(x1, …, xn) = ∑i Ui(xi)

A value function of this type is called an additive value function.
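A minimal sketch of an additive value function; the attributes and per-attribute value functions are invented for illustration:

```python
# Additive value function under MPI: U(x1,...,xn) = sum_i Ui(xi).

def additive_value(x, subvalues):
    return sum(f(xi) for f, xi in zip(subvalues, x))

# Hypothetical siting problem with attributes (cost, safety, noise),
# each sub-value oriented so that higher is better:
subvalues = [
    lambda cost:   -0.01 * cost,    # cheaper is better
    lambda safety:  2.0 * safety,   # safer is better
    lambda noise:  -0.5 * noise,    # quieter is better
]
site_a = (300, 0.9, 2)
site_b = (250, 0.6, 1)
print(additive_value(site_a, subvalues))  # -2.2
print(additive_value(site_b, subvalues))  # -1.8 -> prefer site_b
```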

Preference structure (with uncertainty)

When uncertainty is present, we need to consider the structure of preferences between lotteries. For mutually utility-independent (MUI) attributes we can use a multiplicative utility function; for three attributes:

U = k1U1 + k2U2 + k3U3 + k1k2U1U2 + k2k3U2U3 + k1k3U1U3 + k1k2k3U1U2U3

For n attributes exhibiting MUI, we can represent the utility function using n constants and n single-attribute utilities.
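The three-attribute multiplicative form translates directly to code; the constants ki and single-attribute utilities Ui below are arbitrary illustrative values:

```python
# Multiplicative utility for three mutually utility-independent attributes:
# U = k1U1 + k2U2 + k3U3 + k1k2U1U2 + k2k3U2U3 + k1k3U1U3 + k1k2k3U1U2U3

def multiplicative_utility_3(k, u):
    k1, k2, k3 = k
    u1, u2, u3 = u
    return (k1*u1 + k2*u2 + k3*u3
            + k1*k2*u1*u2 + k2*k3*u2*u3 + k1*k3*u1*u3
            + k1*k2*k3*u1*u2*u3)

print(multiplicative_utility_3((0.5, 0.3, 0.2), (0.8, 0.6, 0.9)))
```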

Page 10

The value of information

So far we have assumed that all relevant information is provided to the agent before it makes its decision. In practice, this is hardly ever the case. For example, a doctor cannot expect to be provided with the results of all possible diagnostic tests. One of the most important parts of decision making is knowing what questions to ask.

We will now look at information value theory, which enables an agent to choose which information to acquire.

The value of information (example)

Suppose an oil company is hoping to buy one of n indistinguishable blocks of ocean-drilling rights. Let us assume further that exactly one of the blocks contains oil worth C dollars, while the others are worthless. The asking price of each block is C/n.

The expected monetary value of buying one block is C/n – C/n = 0. Now suppose that a seismologist offers the company the result of a survey of one specific block, which indicates definitively whether that block contains oil.

How much should the company pay for that information?
– With probability 1/n, the survey will indicate oil in the given block, and the company will buy it and make a profit of C – C/n.
– With probability (n-1)/n, the survey will show that the block contains no oil, in which case the company will buy another block. Now the probability of finding oil in that other block is 1/(n-1), so the expected profit is C/(n-1) – C/n.
– Together, the expected profit given the survey information is:
1/n (C – C/n) + (n-1)/n (C/(n-1) – C/n) = C/n

Therefore the company should be willing to pay the seismologist up to C/n dollars for the information: the information is worth as much as the block itself.
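The example's arithmetic can be verified with exact fractions:

```python
# Numeric check of the oil-block example for arbitrary n and C.
from fractions import Fraction

def survey_value(n, C):
    n, C = Fraction(n), Fraction(C)
    profit_if_oil    = C - C / n              # with probability 1/n
    profit_if_no_oil = C / (n - 1) - C / n    # with probability (n-1)/n
    return (1 / n) * profit_if_oil + ((n - 1) / n) * profit_if_no_oil

n, C = 10, 1_000_000
assert survey_value(n, C) == Fraction(C, n)   # worth exactly C/n
print(survey_value(n, C))                     # 100000
```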

Page 11

The value of information (a general formula)

We assume that exact evidence can be obtained about the value of some random variable Ej – this is called the value of perfect information (VPI). The value of the current best action α (with the initial evidence e) is defined by:

EU(α|e) = maxa ∑s′ P(Result(a)=s′ | a, e) U(s′)

The value of the best action αjk after the new evidence Ej = ejk is obtained is defined by:

EU(αjk | e, Ej=ejk) = maxa ∑s′ P(Result(a)=s′ | a, e, Ej=ejk) U(s′)

But the value of Ej is currently unknown, so we must average over all possible values ejk that we might discover for Ej:

VPIe(Ej) = (∑k P(Ej=ejk | e) EU(αjk | e, Ej=ejk)) − EU(α|e)
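A sketch of computing VPIe(Ej) for a tiny discrete example (two actions, two evidence values; all probabilities and utilities are invented):

```python
# Computing VPI_e(Ej) from a small tabulated model.

P_E = {"ej1": 0.4, "ej2": 0.6}             # P(Ej = ejk | e)

# EU(a | e, Ej=ejk) for each evidence value and action:
EU_given = {
    "ej1": {"a1": 10.0, "a2": 4.0},
    "ej2": {"a1": 3.0,  "a2": 6.0},
}
# EU(a | e) without new evidence, by total expectation over ejk:
EU_prior = {a: sum(P_E[k] * EU_given[k][a] for k in P_E)
            for a in ("a1", "a2")}

eu_alpha = max(EU_prior.values())                              # EU(α | e)
eu_post  = sum(P_E[k] * max(EU_given[k].values()) for k in P_E)
vpi = eu_post - eu_alpha
print(round(vpi, 3))  # 1.8; nonnegative, as the theory predicts
```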

The value of information (qualitatively)

When is it beneficial to obtain new information?

Information has value to the extent that
– it is likely to cause a change of plan, and
– the new plan will be significantly better than the old plan.

(Figure: three qualitative cases, comparing the utility distributions of two actions.)
– Clear choice – the information is not needed.
– The choice is unclear and the information is crucial.
– The choice is unclear, and the information is less valuable.

Page 12

Properties of the value of information

Is it possible for information to be deleterious? No – the expected value of information is nonnegative:

∀e, Ej: VPIe(Ej) ≥ 0

The value of information is not additive:

VPIe(Ej, Ek) ≠ VPIe(Ej) + VPIe(Ek)

The expected value of information is order independent:

VPIe(Ej, Ek) = VPIe(Ej) + VPIe,ej(Ek) = VPIe(Ek) + VPIe,ek(Ej)

Information gathering

A sensible agent should
– ask questions in a reasonable order,
– avoid asking questions that are irrelevant,
– take into account the importance of each piece of information in relation to its cost, and
– stop asking questions when that is appropriate.

We assume that with each observable evidence variable Ej there is an associated cost, Cost(Ej). An information-gathering agent can greedily select the most efficient observation until no observation is worth its cost (the myopic approach), as sketched below.
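A minimal sketch of such a myopic agent; `vpi`, `cost`, `best_action`, and `observe` are hypothetical callbacks supplied by the decision model, not part of the lecture:

```python
# Myopic information gathering: repeatedly pick the observation with the
# best value-minus-cost margin; stop when none is worth its cost.

def gather_then_act(candidates, vpi, cost, best_action, observe, e=()):
    e = list(e)                       # accumulated evidence (var, value) pairs
    candidates = list(candidates)
    while True:
        best = max(candidates,
                   key=lambda Ej: vpi(e, Ej) - cost(Ej), default=None)
        if best is None or vpi(e, best) <= cost(best):
            return best_action(e)     # no observation is worth its cost
        e.append((best, observe(best)))   # request the evidence, then repeat
        candidates.remove(best)
```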

Page 13

© 2016 Roman Barták, Department of Theoretical Computer Science and Mathematical Logic

[email protected]