Axioms for Neural Normalization - Adam...
Axioms for Neural Normalization
Adam Brandenburger
J.P. Valles Professor, NYU Stern School of Business
Distinguished Professor, NYU Tandon School of Engineering
Faculty Director, NYU Shanghai Program on Creativity + Innovation
Global Network Professor, New York University
7/13/17 8:19 PM
“A basic presupposition of this book is that choice behavior is best described as a probabilistic, not an algebraic, phenomenon….
The presently unanswerable question is which approach will, in the long run, give a more parsimonious and complete explanation of the total range of phenomena.”
-- R. Duncan Luce (1959)
Luce, R.D., Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959; http://img2.yardbarker.com/media/8/8/884c6ca6f1b4a25bc0597be7da8ea8d7c4c41d66/xl/Random-Numbers.jpg?stamp=1491141319
A Decision Model Based on Neural Normalization Findings
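$$\rho(x,A) = \Pr\Big(x = \arg\max_{y \in A}\Big[\frac{v(y)}{F(A)} + \varepsilon_y\Big]\Big)$$
$\rho(x,A)$: probability of choosing alternative $x$ from choice set $A$
$v$: utility function
$F(A)$: normalization factor
$\varepsilon_y$: noise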
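The model above can be simulated directly. What follows is a minimal sketch, not the authors' code. It assumes the noise terms $\varepsilon_y$ are i.i.d. standard Gumbel (the case the later slides focus on), for which the choice probabilities are known to take the logit form $e^{v(x)/F(A)} / \sum_{y \in A} e^{v(y)/F(A)}$. The utilities in `v`, the value of `F_A`, and all function names are made up for illustration.

```python
import math
import random

def closed_form_probs(v, F_A):
    """Logit probabilities implied by i.i.d. standard Gumbel noise."""
    weights = {x: math.exp(val / F_A) for x, val in v.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def gumbel(rng):
    """Draw one standard Gumbel variate by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_probs(v, F_A, n_draws=100_000, seed=0):
    """Estimate rho(x, A) by repeatedly taking the argmax of v(y)/F(A) + noise."""
    rng = random.Random(seed)
    counts = {x: 0 for x in v}
    for _ in range(n_draws):
        choice = max(v, key=lambda y: v[y] / F_A + gumbel(rng))
        counts[choice] += 1
    return {x: c / n_draws for x, c in counts.items()}

v = {"x": 1.0, "y": 2.0, "z": 3.0}  # hypothetical utilities on A = {x, y, z}
F_A = 2.0                           # hypothetical normalization factor
exact = closed_form_probs(v, F_A)
approx = simulate_probs(v, F_A)
```

With enough draws the simulated frequencies in `approx` should sit close to `exact`; a larger `F_A` flattens both toward the uniform distribution.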
Levy, D., and P. Glimcher, “The Root of All Value: A Neural Common Currency for Choice,” Current Opinion in Neurobiology, 22, 2012, 1027-1038; Heeger, D., “Normalization of Cell Responses in Cat Striate Cortex,” Visual Neuroscience, 9, 1992, 181-197; Louie, K., L. Grattan, and P. Glimcher, “Reward Value-Based Gain Control: Divisive Normalization in Parietal Cortex,” Journal of Neuroscience, 31, 2011, 10627-10639
What Kind of Optimization Process Is This?
Can we understand the normalization model in (some kind of) a cost-benefit framework?
Can we identify the specific role played by the normalization factor in such a framework?
We start with the (grounded) axiom that behavior is stochastic
Stochasticity can be reduced, but we assume that reduction involves a cost (this can be justified on basic thermodynamic grounds)
Our decision maker therefore faces a trade-off between the (opportunity) cost of stochasticity and the (energetic) cost of determinism
We specify the cost of decreasing stochasticity to be proportional to the associated decrease in Shannon entropy (Shannon, 1948)
Shannon, C., “A Mathematical Theory of Communication,” Bell System Technical Journal, 27, 1948, 379-423
From Bounded Rationality to Grounded Rationality
A theory of decision making should be consistent with
“the access to information and the computational capacities that are actually possessed by the organism” -- Herbert Simon (1955)
Simon, H., “A Behavioral Model of Rational Choice,” Quarterly Journal of Economics, 69, 1955, 99-118; http://www.biografiasyvidas.com/biografia/s/fotos/simon_herbert_a.jpg
Decision-Making Framework
Let $X$ be a finite set of alternatives
Let $\Im$ be the collection of all nonempty subsets (“choice sets”) of $X$
A random choice rule is a function $\rho : X \times \Im \to [0,1]$ such that $\rho(x,A) > 0$ if and only if $x \in A$, and
$$\sum_{x \in A} \rho(x,A) = 1 \quad \text{for all } A \in \Im$$
The interpretation is that $\rho(x,A)$ is the probability that the DM chooses alternative $x$ when faced with choice set $A$
Let $\wp$ denote the set of all random choice rules (for given $X$)
Steverson, K., A. Brandenburger, and P. Glimcher, “Rational Imprecision: Information-Processing, Neural, and Choice-Rule Perspectives,” working paper, February 2017
Information-Processing Model
Let $v(x)$ be the value of alternative $x$
The expected utility of choice rule $\rho$ on choice set $A$ is
$$\sum_{x \in A} \rho(x,A)\,v(x)$$
The associated Shannon entropy is
$$H(\rho,A) = -\sum_{x \in A} \rho(x,A) \ln \rho(x,A)$$
Let $H_{\max}(A)$ denote the maximum possible entropy (achieved when $\rho(\cdot,A)$ is uniform)
We employ a proportionality factor $F(A)$ in our specification of the cost of choice rule $\rho$ on choice set $A$, which is
$$F(A)\big(H_{\max}(A) - H(\rho,A)\big)$$
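The three quantities on this slide can be computed in a few lines. This is an illustrative sketch only: a choice rule restricted to one choice set is represented as a dict from alternatives to probabilities, $H_{\max}(A)$ is taken to be $\ln |A|$ (the entropy of the uniform distribution), and the numbers in `v`, `F_A`, `uniform`, and `sharp` are invented for the example.

```python
import math

def expected_utility(rho_A, v):
    """Sum over x in A of rho(x, A) * v(x)."""
    return sum(p * v[x] for x, p in rho_A.items())

def shannon_entropy(rho_A):
    """H(rho, A) = -sum rho ln rho; assumes every probability is positive."""
    return -sum(p * math.log(p) for p in rho_A.values())

def max_entropy(rho_A):
    """Entropy of the uniform distribution on A, i.e. ln |A|."""
    return math.log(len(rho_A))

def cost(rho_A, F_A):
    """F(A) * (H_max(A) - H(rho, A)): the cost of reducing stochasticity."""
    return F_A * (max_entropy(rho_A) - shannon_entropy(rho_A))

v = {"x": 1.0, "y": 2.0, "z": 3.0}         # hypothetical values
F_A = 2.0                                  # hypothetical proportionality factor
uniform = {x: 1.0 / 3.0 for x in v}        # maximally stochastic rule
sharp = {"x": 0.05, "y": 0.15, "z": 0.80}  # nearly deterministic rule
```

Here `uniform` costs nothing but earns a lower expected utility, while `sharp` earns more and pays a positive entropy-reduction cost: the trade-off the decision maker faces.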
Information-Processing Model contd.
A random choice rule $\rho$ has an information-processing formulation if there exist functions $v : X \to (0,\infty)$ and $F : \Im \to (0,\infty)$ such that for all $A \in \Im$
$$\rho \in \arg\max_{\hat\rho \in \wp} \Big\{ \sum_{x \in A} \hat\rho(x,A)\,v(x) - F(A)\big(H_{\max}(A) - H(\hat\rho,A)\big) \Big\}$$
Note: If we relax our definition of a random choice rule to allow for the possibility of assigning probability zero to an available alternative, this turns out never to be optimal. This is because the derivative of entropy becomes infinite as a probability approaches zero.
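As a numerical sanity check on this definition, the sketch below evaluates the objective for a single choice set and compares a candidate maximizer against many randomly drawn full-support rules. The candidate is the logit (Boltzmann) rule $\hat\rho(x,A) \propto e^{v(x)/F(A)}$, which is the standard maximizer of an entropy-regularized linear objective of this kind. All names and numbers are hypothetical, and the random search is only a check, not a proof.

```python
import math
import random

def objective(rho_A, v, F_A):
    """Expected utility minus F(A) * (H_max(A) - H(rho, A)) on one choice set."""
    eu = sum(p * v[x] for x, p in rho_A.items())
    entropy = -sum(p * math.log(p) for p in rho_A.values())
    return eu - F_A * (math.log(len(rho_A)) - entropy)

def logit_rule(v, F_A):
    """Candidate maximizer: probabilities proportional to exp(v(x) / F(A))."""
    w = {x: math.exp(val / F_A) for x, val in v.items()}
    total = sum(w.values())
    return {x: wx / total for x, wx in w.items()}

def random_full_support_rule(alternatives, rng):
    """A random probability vector with strictly positive entries."""
    raw = [rng.random() + 1e-9 for _ in alternatives]
    total = sum(raw)
    return {x: r / total for x, r in zip(alternatives, raw)}

v = {"x": 1.0, "y": 2.0, "z": 3.0}  # hypothetical values
F_A = 2.0                           # hypothetical cost factor
candidate = logit_rule(v, F_A)
rng = random.Random(1)
best_random = max(
    objective(random_full_support_rule(list(v), rng), v, F_A)
    for _ in range(10_000)
)
```

No random rule should beat `candidate`, and its objective value should equal $F(A)\big(\ln \sum_{x \in A} e^{v(x)/F(A)} - \ln |A|\big)$.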
Information-Processing Foundation
We can prove that a random choice rule $\rho$ has a neural-normalization form (when the noise is Gumbel-distributed) if and only if it has an information-processing formulation
The normalization factor in the first formulation becomes a part of the cost function in the second formulation, which gives a (partial?) normative foundation for this factor
At the technical level, the equivalence relies on:
1. a reinterpretation of standard math from thermodynamics (derivation of the Boltzmann distribution via free energy minimization)
2. an easy extension of standard math from estimation of choice models (McFadden, 1978)
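The first ingredient can be sketched as a standard Lagrangian calculation. This is offered as an illustration of the Boltzmann-style argument, not as the authors' proof. Fix a choice set $A$, maximize the information-processing objective over full-support probabilities, and attach a multiplier $\mu$ (a symbol introduced here) to the adding-up constraint:

```latex
\begin{aligned}
\mathcal{L} &= \sum_{x \in A} \hat\rho(x,A)\,v(x)
  - F(A)\Big(H_{\max}(A) + \sum_{x \in A} \hat\rho(x,A)\ln\hat\rho(x,A)\Big)
  - \mu\Big(\sum_{x \in A} \hat\rho(x,A) - 1\Big), \\
0 &= \frac{\partial \mathcal{L}}{\partial \hat\rho(x,A)}
  = v(x) - F(A)\big(\ln\hat\rho(x,A) + 1\big) - \mu, \\
\hat\rho(x,A) &= \frac{e^{v(x)/F(A)}}{\sum_{y \in A} e^{v(y)/F(A)}}.
\end{aligned}
```

The last line is the Boltzmann form with $F(A)$ in the role of temperature. It coincides with the logit probabilities that the neural-normalization model produces when the noise terms are i.i.d. Gumbel, which is where the second ingredient enters.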
Steverson, K., A. Brandenburger, and P. Glimcher, op. cit.; McFadden, D., “Modelling the Choice of Residential Location,” in Karlqvist, S., L. Lundqvist, F. Snickars, and J. Weibull (eds.), Spatial Interaction Theory and Planning Models, Volume 673, North-Holland, 1978, 75-96
Comparison with Other Models
1. The factor $F(A)$ distinguishes this model from random utility:
$$\rho(x,A) = \Pr\Big(x = \arg\max_{y \in A}\Big[\frac{v(y)}{F(A)} + \varepsilon_y\Big]\Big)$$
2. The Gumbel distribution arises as the asymptotic distribution of the maximum of a sequence of i.i.d. normal r.v.’s, after suitable centering and scaling (can this fact be used to ground the model further?)
3. Neural studies have often taken $F(A)$ to be a sum (where $\sigma$ is a constant):
$$F(A) = \sigma + \sum_{z \in A} v(z)$$
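To make item 3 concrete, the sketch below plugs the sum form of $F(A)$ into the Gumbel-noise (logit) version of the model. The utilities, the value of `sigma`, and the function names are illustrative assumptions.

```python
import math

def sum_normalizer(A, v, sigma):
    """F(A) = sigma + sum of v(z) over z in A."""
    return sigma + sum(v[z] for z in A)

def choice_probs(A, v, sigma):
    """Logit probabilities exp(v(x)/F(A)) / sum_y exp(v(y)/F(A))."""
    F_A = sum_normalizer(A, v, sigma)
    w = {x: math.exp(v[x] / F_A) for x in A}
    total = sum(w.values())
    return {x: wx / total for x, wx in w.items()}

v = {"x": 1.0, "y": 2.0, "z": 3.0}  # hypothetical utilities
sigma = 1.0                         # hypothetical constant
small = choice_probs(["x", "y"], v, sigma)       # F(A) = 4
large = choice_probs(["x", "y", "z"], v, sigma)  # F(B) = 7
ratio_small = small["y"] / small["x"]            # exp(1/4)
ratio_large = large["y"] / large["x"]            # exp(1/7)
```

Because $F$ grows when `z` is added, the odds of `y` over `x` shrink from $e^{1/4}$ to $e^{1/7}$: the choice-set dependence of $F(A)$ is what separates this model from a fixed random utility model.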
What Kind of Behavior Does Normalization Permit?
A random choice rule $\rho$ obeys regularity if $\rho(x,B) \le \rho(x,A)$ when $x \in A \subseteq B$, i.e., adding alternatives cannot increase the probability of an existing alternative
The classic Luce rule (axiomatized by independence from irrelevant alternatives) obeys regularity
Departures from regularity are routinely observed in experiments (as in the “attraction effect”)
[Figure: attraction-effect illustration with choice shares ≈10%, ≈90%, ≈30%, ≈30%, ≈40%]
Huber, J., J. Payne, and C. Puto, “Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis,” Journal of Consumer Research, 9, 1982, 90-98; Simonson, I., “Choice Based on Reasons: The Case of Attraction and Compromise Effects,” Journal of Consumer Research, 16, 1989, 158-174; images from The Attraction Effect Explained, Marketing Explained channel, https://www.youtube.com/watch?v=pybVPm5VU5A
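Regularity is easy to test mechanically on a small example. The sketch below builds the Gumbel-noise (logit) version of the normalization model on every choice set and lists the triples $(x, A, B)$ at which regularity fails. The utilities and the two normalizers are hand-picked assumptions: the first makes choices sharp on small sets and noisy on the full set, and the second is constant, which reduces the model to a Luce rule with weights $e^{v(x)/F}$.

```python
import math
from itertools import combinations

def nonempty_subsets(X):
    """All nonempty subsets of X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def normalization_rule(v, F):
    """rho[A][x] under the logit form, for every choice set A."""
    rho = {}
    for A in nonempty_subsets(v):
        w = {x: math.exp(v[x] / F(A)) for x in A}
        total = sum(w.values())
        rho[A] = {x: wx / total for x, wx in w.items()}
    return rho

def regularity_violations(rho):
    """Triples (x, A, B) with x in A, A a proper subset of B, rho(x,B) > rho(x,A)."""
    return [(x, A, B) for A in rho for B in rho if A < B
            for x in A if rho[B][x] > rho[A][x] + 1e-12]

v = {"x": 1.0, "y": 2.0, "z": 0.5}  # hypothetical utilities

def sharp_then_noisy(A):
    """Hypothetical F: small on sets with fewer than three items, large on the triple."""
    return 0.2 if len(A) < 3 else 10.0

violations = regularity_violations(normalization_rule(v, sharp_then_noisy))
constant_F = regularity_violations(normalization_rule(v, lambda A: 1.0))
```

With `sharp_then_noisy`, adding `z` to `{x, y}` raises the probability of `x` from under 1% to roughly a third, so `violations` is nonempty; with a constant normalizer `constant_F` is empty, consistent with the Luce rule obeying regularity.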
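Independence from Irrelevant Alternatives
The IIA rule characterizes the Luce model of stochastic choice (Luce, 1959)
$$\frac{\rho(x,A)}{\rho(y,A)} = \frac{\rho(x,B)}{\rho(y,B)} \quad \text{for } x, y \in A \cap B$$
[Figure: choice sets $A$ and $B$ with alternatives $x$ and $y$]
Luce, R.D., Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959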
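A short check of IIA for the Luce rule, in which each alternative has a positive weight and is chosen with probability proportional to that weight. The weights and choice sets below are invented for illustration.

```python
def luce_rule(weights, A):
    """Luce rule: rho(x, A) = w(x) / sum of w(y) over y in A."""
    total = sum(weights[y] for y in A)
    return {x: weights[x] / total for x in A}

weights = {"x": 2.0, "y": 1.0, "z": 5.0, "w": 0.5}  # hypothetical weights
A = ["x", "y", "z"]
B = ["x", "y", "w"]
rho_A = luce_rule(weights, A)
rho_B = luce_rule(weights, B)
ratio_A = rho_A["x"] / rho_A["y"]
ratio_B = rho_B["x"] / rho_B["y"]
```

Both ratios equal `weights["x"] / weights["y"]`, so the odds of `x` against `y` do not depend on which other alternatives are available.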
Relaxed IIA
Let’s consider a relaxation of IIA that requires only
$$G_A\left(\frac{\rho(x,A)}{\rho(y,A)}\right) = G_B\left(\frac{\rho(x,B)}{\rho(y,B)}\right) \quad (\ast)$$
where $G_A : (0,\infty) \to (0,\infty)$ is strictly increasing (and likewise for $G_B$)
Note: The “strictly increasing” condition implies that $x$ is more likely to be chosen than $y$ in $A$ if and only if this is true in $B$
Say a collection $P \subseteq \wp$ of random choice rules is free if for every $A \in \Im$ and full-support probability measure $\lambda$ on $A$ there is a $\rho \in P$ with $\rho(\cdot,A) = \lambda$
Note: The family of Luce rules is free
A family of functions $\{G_A\}_{A \in \Im}$ is admissible if the $G_A$ are strictly increasing and the collection of random choice rules satisfying $(\ast)$ for $\{G_A\}_{A \in \Im}$ is free
Behavioral Characterization
A random choice rule $\rho$ is a relaxed IIA rule if there exists an admissible family of functions $\{G_A\}_{A \in \Im}$ with respect to which $\rho$ obeys
$$G_A\left(\frac{\rho(x,A)}{\rho(y,A)}\right) = G_B\left(\frac{\rho(x,B)}{\rho(y,B)}\right) \quad (\ast)$$
We can prove that a random choice rule $\rho$ has a neural-normalization form (when the noise is Gumbel-distributed) if and only if it is a relaxed IIA rule
The equivalence is proved by a new argument (using the Cauchy functional equation)
Relaxed IIA allows departures from regularity (via suitable choice of $F(\cdot)$)
There are closed-form axioms that characterize relaxed IIA, so that behavioral testing of the rule is possible
Steverson, K., A. Brandenburger, and P. Glimcher, op. cit.
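One direction of this equivalence can be illustrated numerically. Under Gumbel noise the model gives $\rho(x,A)/\rho(y,A) = e^{(v(x)-v(y))/F(A)}$, so raising the odds to the power $F(A)$ yields $e^{v(x)-v(y)}$, which no longer depends on $A$. The sketch below checks $(\ast)$ for the power family $G_A(t) = t^{F(A)}$. This family is simply the one the algebra suggests and is not claimed to be the construction used in the paper; admissibility (the freeness requirement) is not checked, and all numbers are hypothetical.

```python
import math

def logit_probs(A, v, F_A):
    """Gumbel-noise form of the model: rho(x, A) proportional to exp(v(x)/F(A))."""
    w = {x: math.exp(v[x] / F_A) for x in A}
    total = sum(w.values())
    return {x: wx / total for x, wx in w.items()}

def G(F_A, t):
    """Candidate G_A(t) = t ** F(A); strictly increasing on (0, inf) since F(A) > 0."""
    return t ** F_A

v = {"x": 1.0, "y": 2.0, "z": 0.5, "w": 3.0}  # hypothetical utilities
A, F_of_A = ["x", "y", "z"], 0.7              # hypothetical F(A)
B, F_of_B = ["x", "y", "w"], 4.0              # hypothetical F(B)
rho_A = logit_probs(A, v, F_of_A)
rho_B = logit_probs(B, v, F_of_B)
odds_A = rho_A["x"] / rho_A["y"]
odds_B = rho_B["x"] / rho_B["y"]
lhs = G(F_of_A, odds_A)
rhs = G(F_of_B, odds_B)
```

The raw odds `odds_A` and `odds_B` differ, so IIA fails, yet `lhs` and `rhs` agree, so the relaxed condition holds for this pair of choice sets.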
From Green Cheese to …
In computer science, it used to be said that the theory was independent of the physical substrate (“computers might as well be made of green cheese”*)
This view turned out to be wrong (thanks to the discovery of quantum speedup)
In decision theory, the traditional position appears to have been similar: that the physical substrate does not matter
This view never made good sense, and now we can use inputs from the cognitive sciences to re-build decision theory … and game theory … and …
* I owe this nice phrasing to my colleague Samson Abramsky
http://www.istockphoto.com/photo/crowd-of-people-above-gm171159648-19164839