Axioms for Neural Normalization - Adam...

AxiomsforNeuralNormalization

AdamBrandenburgerJ.P.VallesProfessor,NYUSternSchoolofBusinessDistinguishedProfessor,NYUTandon SchoolofEngineeringFacultyDirector,NYUShanghaiProgramonCreativity+InnovationGlobalNetworkProfessorNewYorkUniversity

7/13/178:19PM 2

“Abasicpresuppositionofthisbookisthatchoicebehaviorisbestdescribedasaprobabilistic,notanalgebraic,phenomenon….

Thepresentlyunanswerablequestioniswhichapproachwill,inthelongrun,giveamoreparsimoniousandcompleteexplanationofthetotalrangeofphenomena.”

-- R.DuncanLuce (1959)

Luce,R.D.,IndividualChoiceBehavior:ATheoreticalAnalysis,Wiley,1959;http://img2.yardbarker.com/media/8/8/884c6ca6f1b4a25bc0597be7da8ea8d7c4c41d66/xl/Random-Numbers.jpg?stamp=1491141319

7/13/178:19PM 3

ADecisionModelBasedonNeuralNormalizationFindings

ρ(x,A) = Pr (x = argmaxy∈A

v(y)F(A)

+εy )

Probabilityofchoosingalternativefromchoiceset

x

A

Utilityfunction

Normalizationfactor

Noise

Levy,D.,andP.Glimcher (2012),“TheRootofAllValue:ANeuralCommonCurrencyforChoice,”CurrentOpinioninNeurobiology,222012,1027-1038;Heeger,D.,“NormalizationofCellResponsesinCatStriateCortex,”VisualNeuroscience,9,1992,181-197;Louie,K.,L.Grattan,andP.Glimcher,“RewardValue-BasedGainControl:DivisiveNormalizationinParietalCortex,”JournalofNeuroscience,31,2011,10627-10639

7/13/178:19PM 4

WhatKindofOptimizationProcessIsThis?

Canweunderstandthenormalizationmodelin(somekindof)acost-benefitframework?

Canweidentifythespecificroleplayedbythenormalizationfactor insuchaframework?

Westartwiththe(grounded)axiomthatbehaviorisstochastic

Stochasticitycanbereducedbutweassumethatreductioninvolvesacost(thiscanbejustifiedonbasicthermodynamicgrounds)

Ourdecisionmakerthereforefacesatrade-off betweenthe(opportunity)costofstochasticityandthe(energetic)costofdeterminism

WespecifythecostofdecreasingstochasticitytobeproportionaltotheassociateddecreaseinShannonentropy (Shannon,1948)

Shannon,C.,“AMathematicalTheoryofCommunication,”BellSystemTechnicalJournal,27,1948,379-423

7/13/178:19PM 5

FromBoundedRationalitytoGroundedRationality

Simon,H.,“ABehavioralModelofRationalChoice,”QuarterlyJournalofEconomics,69,1955,99-118;http://www.biografiasyvidas.com/biografia/s/fotos/simon_herbert_a.jpg

Atheoryofdecisionmakingshouldbeconsistentwith

“theaccesstoinformationandthecomputationalcapacitiesthatareactuallypossessedbytheorganism”-- HerbertSimon(1955)

7/13/178:19PM 6

Decision-MakingFramework

Letbeafinitesetofalternatives

Letbethecollectionofallnonemptysubsets(“choicesets”)from

Arandomchoicerule isafunctionsuchthat

X

ℑ X

ρ : X ×ℑ→ [0,1]ρ(x,A)> 0 ifandonlyifand

TheinterpretationisthatistheprobabilitythattheDMchoosesalternativewhenfacedwithchoiceset

Letdenotethesetofallrandomchoicerules(forgiven)

x ∈ A

ρ(x,A) =1x∈Α∑ forall A ∈ℑ

ρ(x,A)x A

℘ X

Steverson,K.,A.Brandenburger,andP.Glimcher,“RationalImprecision:Information-Processing,Neural,andChoice-RulePerspectives,”workingpaper,February2017

7/13/178:19PM 7

Information-ProcessingModel

Letbethevalueofalternative

Theexpectedutilityofchoiceruleonchoicesetis

TheassociatedShannonentropyis

Letdenotethemaximumpossibleentropy(achievedwhenisuniform)

Weemployaproportionalityfactorinourspecificationofthecostofchoiceruleonchoiceset,whichis

v(x)

ρ

F(A)

ρ(x,A)v(x)x∈Α∑

ρ

x

A

H (ρ,A) = − ρ(x,A)lnρ(x,A)x∈Α∑

Hmax (A)

ρ AF(A)(Hmax (A)−H (ρ,A))

7/13/178:19PM 8

Information-ProcessingModelcontd.

Arandomchoicerulehasaninformation-processingformulation ifthereexistfunctionsandsuchthatforall

Note:Ifwerelaxourdefinitionofarandomchoiceruletoallowforthepossibilityofassigningprobabilityzerotoanavailablealternative,thisturnsoutnevertobeoptimal.Thisisbecausethederivativeofentropybecomeinfiniteasaprobabilityapproacheszero.

v : X→ (0,∞)

ρ ∈ argmaxρ̂∈℘

ρ̂(x,A)v(x)−F(A)(Hmax (A)−H (ρ̂x∈Α∑ ,A))

⎧⎨⎩

⎫⎬⎭

A ∈ℑ

ρF :ℑ→ (0,∞)

7/13/178:19PM 9

Information-ProcessingFoundation

Wecanprovethatarandomchoicerulehasaneural-normalizationform(whenthenoiseisGumbel-distributed)ifandonlyifithasaninformation-processingformulation

Thenormalizationfactor inthefirstformulationbecomesapartofthecostfunction inthesecondformulation,whichgivesa(partial?)normativefoundationforthisfactor

Atthetechnicallevel,theequivalencerelieson:

1.areinterpretationofstandardmathfromthermodynamics(derivationoftheBoltzmanndistributionviafreeenergyminimization)

2.aneasyextensionofstandardmathfromestimationofchoicemodels(McFadden,1978)

ρ

Steverson,K.,A.Brandenburger,andP.Glimcher,op.cit.;McFadden,D.,“ModellingtheChoiceofResidentialLocation,”inKarlqvist,S.,L.Lundqvist,F.Snickars,andJ.Weibull (eds.),SpatialInteractionTheoryandPlanningModels,Volume673,North-Holland,1978,75-96

ComparisonwithOtherModels

1.Thefactordistinguishesthismodelfromrandomutility

2.TheGumbeldistributionarisesastheasymptoticdistributionofthemaximumofasequenceofi.i.d.normalr.v.’s (canthisfactbeusedtogroundthemodelfurther?)

3.Neuralstudieshaveoftentakentobeasum(whereisaconstant):

7/13/178:19PM 10

ρ(x,A) = Pr (x = argmaxy∈A

v(y)F(A)

+εy )

F(A)

𝐹 𝐴 = 𝜎 +&𝑣(𝑧)�

,∈.

F(A) 𝜎

7/13/178:19PM 11

WhatKindofBehaviorDoesNormalizationPermit?

Arandomchoiceruleobeysregularity ifwhen,i.e.,addingalternativesreducesexistingprobabilities

TheclassicLuce rule (axiomatized byindependencefromirrelevantalternatives)obeysregularity

Departuresfromregularityareroutinelyobservedinexperiments(asinthe“attractioneffect”)

≈10% ≈90% ≈30%≈30% ≈40%

ρ ρ(x,B) ≤ ρ(x,A)x ∈ A ⊆ B

Huber,J.,J.Payne,andC.Puto,“AddingAsymmetricallyDominatedAlternatives:ViolationsofRegularityandtheSimilarityHypothesis,”JournalofConsumerResearch,9,1982,90-98;Simonson,I.,“ChoiceBasedonReasons:TheCaseofAttractionandCompromiseEffects,”JournalofConsumerResearch,16,1989,158-174;imagesfromTheAttractionEffectExplained,MarketingExplained channel,https://www.youtube.com/watch?v=pybVPm5VU5A

?

7/13/178:19PM 12

IndependencefromIrrelevantAlternatives

TheIIArulecharacterizestheLucemodelofstochasticchoice(Luce,1959)

Luce,R.D.,IndividualChoiceBehavior:ATheoreticalAnalysis,Wiley,1959

A Bx�y�

ρ(x,A)ρ(y,A)

=ρ(x,B)ρ(y,B)

RelaxedIIA

Let’sconsiderarelaxationofIIAthatrequiresonly

whereisstrictlyincreasing(andlikewisefor)

Note:The“strictlyincreasing”conditionimpliesthatismorelikelytobechosenthaninifandonlyifthisistruein

Sayacollectionofrandomchoicerulesisfree ifforeveryandfull-supportprobabilitymeasureonthereisawith

Note:ThefamilyofLucerulesisfree

Afamilyoffunctionsisadmissible ifthearestrictlyincreasingandthecollectionofrandomchoicerulessatisfying()forisfree

GAρ(x,A)ρ(y,A)⎛

⎝⎜

⎞

⎠⎟=GB

ρ(x,B)ρ(y,B)⎛

⎝⎜

⎞

⎠⎟ (∗)

GA : (0,∞)→ (0,∞) GB

xy A B

P ∈℘ A ∈ℑλ A ρ ∈ P

ρ(⋅,A) = λ

{GA}A∈ℑGA{GA}A∈ℑ

7/13/178:19PM 13

∗

7/13/178:19PM 14

BehavioralCharacterization

ArandomchoiceruleisarelaxedIIArule ifthereexistsanadmissiblefamilyoffunctionswithrespecttowhichobeys

Wecanprovethatarandomchoicerulehasaneural-normalizationform(whenthenoiseisGumbel-distributed)ifandonlyifitisarelaxedIIArule

Theequivalenceisanewargument(usingtheCauchyfunctionalequation)

RelaxedIIAallowsdeparturesfromregularity (viasuitablechoiceof)

Thereareclosed-formaxiomsthatcharacterizerelaxedIIA,sothatbehavioraltesting oftheruleispossible

{GA}A∈ℑρ

ρ

GAρ(x,A)ρ(y,A)⎛

⎝⎜

⎞

⎠⎟=GB

ρ(x,B)ρ(y,B)⎛

⎝⎜

⎞

⎠⎟ (∗)

ρ

Steverson,K.,A.Brandenburger,andP.Glimcher,op.cit.

F(⋅)

7/13/178:19PM 15*IowethisnicephrasingtomycolleagueSamsonAbramskyhttp://www.istockphoto.com/photo/crowd-of-people-above-gm171159648-19164839

FromGreenCheeseto…

Incomputerscience,itusedtobesaidthatthetheorywasindependentofthephysicalsubstrate(“computersmightaswellbemadeofgreencheese”*)

Thisviewturnedouttobewrong(thankstothediscoveryofquantumspeedup)

Indecisiontheory,thetraditionalpositionappearstohavebeensimilar---thatthephysicalsubstratedoesnotmatter

Thisviewnevermadegoodsense,andnowwecanuseinputsfromthecognitivesciencestore-builddecisiontheory… andgametheory… and…

Axioms for Neural Normalization - Adam...

Documents

Transcript of Axioms for Neural Normalization - Adam...