From bounded rationality to learning
Bernard WALLISER
(Paris School of Economics)
Rationality, Heuristics and Motivation in Decision Making,
Pisa, November 12-14, 2010
Introduction (1)
¤ Simon’s problem
- decision considered as a reasoning process
- limited capacities of information gathering and treatment
→ bounded rationality procedures (satisficing, …)
But how are these procedures precisely related to the assumptions?
¤ Meta-optimization paradox
If the decision-maker optimizes
- the gathering of information (Winter)
- the choice procedure itself (Mongin-Walliser)
he needs
- to have prior information on the available information
- to take into account the computing costs
Hence, meta-optimization is engaged in an infinite regress, which happens to be a vicious one (Mongin-Walliser)
Introduction (2)
¤ Bounded rationality was initially defined in a static way:
- fixed environment
- fixed beliefs and preferences
- fixed choice rule (combining beliefs and preferences)
¤ Learning processes later introduce partial dynamics:
- non-stationary environment (game)
- revised beliefs when new information comes in
- endogenous preferences when more experience is acquired
- adaptive choice rule
→ comparison by distinguishing information gathering, information treatment and choice
→ treatment in various contexts (epistemic logics, decision theory)
Statics: limited information on context
¤ uncertainty on context
- plainly probabilistic
- hierarchical but always probabilistic: ambiguity
- non probabilistic: qualitative probabilities, belief functions
Ex: Choquet utility maximization (Gilboa-Schmeidler)
¤ unawareness (not knowing and not knowing that not knowing)
- treated in epistemic logics
Ex: precautionary principle
¤ limited crossed beliefs
- p-accuracy reasoning
- k-level reasoning
- crossed awareness (Meier et al.)
Ex: cognitive hierarchy model (Camerer)
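A minimal sketch of k-level reasoning, under assumptions that are not in the slides: the game is a "guess p times the average" contest, level 0 guesses a fixed number, and a level-k player believes everybody else reasons at level k-1 (ignoring the player's own weight in the average). The function name and the default values are hypothetical.

```python
def level_k_guesses(max_level, p=2 / 3, level0_guess=50.0):
    """Return the guess made at each reasoning level 0..max_level.

    Level 0 is non-strategic and plays level0_guess (an assumption).
    A level-k player assumes all others are level k-1 and best responds
    by guessing p times their guess.
    """
    guesses = [level0_guess]
    for _ in range(max_level):
        guesses.append(p * guesses[-1])
    return guesses


if __name__ == "__main__":
    for level, guess in enumerate(level_k_guesses(3)):
        print(f"level {level}: guess {guess:.2f}")
```

Limited crossed beliefs show up as a finite `max_level`: only unbounded levels of reasoning would drive the guess to the equilibrium value of zero.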
Statics: limited information on preferences
¤ multidimensional preferences
- preferences are decomposed and simplified
Ex: satisficing (Simon), elimination by aspects (Tversky)
¤ random preferences
- preferences correspond to alternative ‘moods’
Ex: discrete choice model, quantal model
¤ context-dependent preferences
- situated preferences
Ex: context (history)-dependent aspiration levels for satisficing
- reference points (status quo, norm)
Ex: reference for gains vs losses in EU
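The discrete choice (quantal) example can be illustrated by a logit rule, in which the probability of an action increases with its utility. This is only a sketch: the function name and the `precision` parameter are hypothetical, and the logit form is one common specification rather than the one the slides commit to.

```python
import math


def logit_choice_probabilities(utilities, precision=1.0):
    """Probability of each action under a logit (quantal) choice rule.

    precision = 0 gives uniform random choice; a large precision
    approaches plain utility maximization.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(utilities)
    weights = [math.exp(precision * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(logit_choice_probabilities([1.0, 2.0, 3.0], precision=1.0))
```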
Statics: simplified choice rule
¤ limited logical omniscience
- treated in epistemic logics
Ex: satisficing (sequential examination of actions)?
¤ finite number of internal states
- simple expression of ‘computation complexity’
Ex: finite automata
¤ computation costs
- approximate cost of mental calculation
Ex: basic operations
Dynamics: information search on context
¤ exogenous information
- resulting from purchase at specialized institutes and characterized by its value (opposed to its cost)
Ex: signals about actual state (correlated to it)
→ limited relevance
¤ endogenous free information
- resulting from repeated observation
Ex: observation of others’ actions in fictitious play
→ memory constraints
→ scope constraints (information neighbourhood)
¤ endogenously induced information
- resulting from voluntary (suboptimal) action and characterized by its value (opposed to loss of utility)
Ex: search procedures
→ ambiguous interpretation
Dynamics: information search on own preferences
¤ observation of one’s own past utility of actions
- assuming that choice utility = (expected) felt utility
Ex: CPR model
→ partial preferences (incompleteness)
¤ observation of others’ utility of actions
- assuming that others’ utility = own utility (in same situations)
Ex: imitation of successful opponents
→ biased preferences
Dynamics: treatment of information about context
¤ expectation process
- especially of others’ strategies
- stationarity assumption
→ extrapolative expectation
Ex: fictitious play (probability = frequency of past actions)
¤ belief revision procedure
- 3 contexts: updating, revising, focusing
- possibility of contradiction between initial belief and message
→ simplified or distorted Bayes rule (judgment biases)
Ex: weight between initial belief and message
¤ reconstruction of structural information
- 3 types of information: factual (past), structural (constant), strategic (future)
- pattern recognition (trends, cycles)
- revelation of others’ preferences (abductive process)
Ex: reputation effect
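The fictitious play example (probability = frequency of past actions) can be sketched as follows. The function names, the payoff dictionary keyed by (own action, opponent action) and the uniform belief used before any observation are assumptions made for illustration.

```python
from collections import Counter


def empirical_beliefs(observed_actions, action_set):
    """Belief about the opponent's next action = frequency of past actions."""
    n = len(observed_actions)
    if n == 0:
        # No observation yet: fall back on a uniform belief (an assumption).
        return {a: 1.0 / len(action_set) for a in action_set}
    counts = Counter(observed_actions)
    return {a: counts[a] / n for a in action_set}


def best_response(payoff, own_actions, beliefs):
    """Own action maximizing expected payoff against the believed frequencies."""
    def expected(a):
        return sum(prob * payoff[(a, b)] for b, prob in beliefs.items())
    return max(own_actions, key=expected)


if __name__ == "__main__":
    # Matching pennies from the matcher's point of view (illustrative payoffs).
    payoff = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
    beliefs = empirical_beliefs(["H", "H", "T"], ["H", "T"])
    print(beliefs, best_response(payoff, ["H", "T"], beliefs))
```

The stationarity assumption is built in: every past observation carries the same weight, however old it is.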
Dynamics: treatment of information about own preferences
¤ performance indices
- (average or cumulative) index for each action
- stationarity assumption
→ proxy for utility function
Ex: CPR rule
¤ adaptation of aspiration levels
- adaptive level for global index (for instance, best past utility)
→ proxy for utility level
Ex: dynamic satisficing (Simon)
¤ reconstruction of structural information
- design of relative vs absolute preferences
Ex: regret matching (unconditional regret index: difference in the past between the utility a given strategy would have yielded and the utility actually obtained, both against others’ implemented strategies)
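The unconditional regret index can be sketched as below. Averaging over periods (rather than cumulating) and playing each strategy with probability proportional to its positive regret are assumptions: the slide defines only the index itself. All names are hypothetical.

```python
def unconditional_regrets(payoff, own_history, opponent_history, own_actions):
    """Average regret of each strategy over the past.

    For strategy s: utility s would have yielded against the opponent's
    implemented actions, minus the utility actually obtained.
    """
    if len(own_history) != len(opponent_history):
        raise ValueError("histories must have the same length")
    periods = len(own_history)
    if periods == 0:
        return {s: 0.0 for s in own_actions}
    regrets = {}
    for s in own_actions:
        gain = sum(
            payoff[(s, b)] - payoff[(a, b)]
            for a, b in zip(own_history, opponent_history)
        )
        regrets[s] = gain / periods
    return regrets


def regret_matching_probabilities(regrets):
    """Play each strategy with probability proportional to its positive regret."""
    positive = {s: max(r, 0.0) for s, r in regrets.items()}
    total = sum(positive.values())
    if total == 0:
        # Nothing is regretted: uniform play is one possible convention.
        return {s: 1.0 / len(regrets) for s in regrets}
    return {s: r / total for s, r in positive.items()}
```

The index is relative rather than absolute: it compares each strategy with what was actually played, not with a fixed utility scale.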
Dynamics: adaptive choice rule (1)
¤ inertial behaviour
- repeated action if sufficient past payoff
Ex: reaction to aspiration levels (continue if levels are reached)
¤ exploration behaviour
- random exploration, fixed or decreasing
- directed exploration
Ex: randomized fictitious play
¤ exploitation behaviour - quasi optimizing behaviour
Ex: fictitious play
¤ stochastic reinforcement behaviour - noisy best response
- stochastic matching (probabilistic behaviour monotonic with utility)
→ implicit exploration-exploitation dilemma
Ex: CPR (decreasing exploration)
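A sketch of stochastic matching, reading CPR as a cumulative proportional reinforcement rule (the slides do not expand the abbreviation, so this reading is an assumption). Each action is chosen with probability proportional to its cumulative payoff index. Payoffs and initial indices are assumed strictly positive, and all names are hypothetical.

```python
import random


def cpr_probabilities(cumulative_payoffs):
    """Choice probabilities proportional to each action's cumulative payoff."""
    total = sum(cumulative_payoffs.values())
    return {a: v / total for a, v in cumulative_payoffs.items()}


def cpr_step(cumulative_payoffs, payoff_of, rng):
    """Draw one action, then reinforce its index by the payoff obtained."""
    probs = cpr_probabilities(cumulative_payoffs)
    actions = list(probs)
    action = rng.choices(actions, weights=[probs[a] for a in actions])[0]
    cumulative_payoffs[action] += payoff_of(action)
    return action


if __name__ == "__main__":
    rng = random.Random(0)
    indices = {"a": 1.0, "b": 1.0}           # positive initial indices
    payoffs = {"a": 1.0, "b": 2.0}           # illustrative constant payoffs
    for _ in range(1000):
        cpr_step(indices, payoffs.get, rng)
    print(cpr_probabilities(indices))
```

Exploration decreases without any explicit schedule: as the indices grow, one more payoff moves the probabilities less and less.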
Dynamics: adaptive choice rule (2)
¤ imitation
- grounded on complementary preferences (preferential mimetism)
- grounded on information differences (informational mimetism)
- grounded on better experience (experienced mimetism)
Ex: plain diffusion model, imitation of successful opponents
¤ analogy-based reasoning
- previous contexts (case-based reasoning)
- repetitive game structures
Ex: case-based rule (Gilboa-Schmeidler), analogical equilibrium (Jehiel)
Dynamics: adaptive choice rule (3)
¤ restricted choice rules
- specific action set, for instance unidimensional
Ex: stubborn rule (Laslier-Walliser)
- specific beliefs, for instance objective probabilities
Ex: stopping rules in search
- specific preferences, for instance multicriteria choice
Ex: pairwise choice + synthesis
¤ context-adaptive choice rules
- parlour games: chess, cards, Cluedo
Ex: keep pawns tight
- sports
Ex: throwing a ball with constant angle
- labyrinth, puzzles
Ex: keep right rule
Asymptotic results
¤ system’s trajectory
- transitory state
- asymptotic state (speed of convergence)
→ different time scales (role of random shocks)
¤ convergence of expectations
- towards locally rational ones
¤ convergence of actions (or strategies)
- elimination of (strictly) dominated strategies
- convergence notions towards equilibrium states
- convergence in time-average or action by action
Ex: fictitious play
- convergence towards a unique or multiple (point)-equilibrium (selection when exploration vanishes)
- cyclical and chaotic attractors
Conclusion (1)
¤ dispersed models, even if two main classes (grounded on cognitive capacities)
- belief-based learning
Ex: fictitious play
- reinforcement learning
Ex: CPR
¤ combination of models
- models depending on choice context and results
- hybrid models
- models with heterogeneous agents
¤ need to consider the precise reasoning modes followed by agents:
- counterfactual reasoning (simulation of opponents)
- abductive reasoning (detection of structural or behavioural regularities)
- analogical reasoning (situations treated as similar)
- taxonomical reasoning (categorization)
Conclusion (2)
¤ possibility of meta-learning
- belief revision rule
Ex: parameter trading off initial belief and message in extended Bayes rule
- preferences
Ex: degree of altruism in individual preferences
- choice rule
Ex: parameter in logit rule
→ learning levels again give rise to an infinite regress (to stop it, the highest level has to be given)
¤ infinite regress stopped by evolution process, but
- mix of evolutionary process (capacities and constraints imposed by evolution) and cultural process (capacities and constraints conditioned by society)
- very slow time scale (against fluctuating environment)
- concrete mechanism not exhibited