Normative theory of patch foraging decisions · nects normative theory of foraging decisions with...

28
Normative theory of patch foraging decisions Zachary P Kilpatrick * Department of Applied Mathematics, University of Colorado, Boulder, Colorado 80309, USA and Department of Physiology & Biophysics, University of Colorado School of Medicine, Aurora, Colorado, USA Jacob D Davidson Department of Collective Behavior, Max Planck Institute of Animal Behavior, Konstanz, Germany Department of Biology, University of Konstanz, Konstanz, Germany and Centre for the Advanced Study of Collective Behavior, University of Konstanz, Konstanz, Germany Ahmed El Hady Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, USA (Dated: April 22, 2020) Foraging is a fundamental behavior as animals’ search for food is crucial for their survival. Patch leaving is a canonical foraging behavior, but classic theoretical conceptions of patch leaving decisions lack some key naturalistic details. Optimal foraging theory provides general rules for when an animal should leave a patch, but does not provide mechanistic insights about how those rules change with the structure of the environment. Such a mechanistic framework would aid in designing quantitative experiments to unravel behavioral and neural underpinnings of foraging. To address these shortcomings, we develop a normative theory of patch foraging decisions. Using a Bayesian approach, we treat patch leaving behavior as a statistical inference problem. We derive the animals’ optimal decision strategies in both non-depleting and depleting environments. A majority of these cases can be analyzed explicitly using methods from stochastic processes. Our behavioral predictions are expressed in terms of the optimal patch residence time and the decision rule by which an animal departs a patch. We also extend our theory to a hierarchical model in which the forager learns the environmental food resource distribution. The quantitative framework we develop will therefore help experimenters move from analyzing trial based behavior to continuous behavior without the loss of quantitative rigor. Our theoretical framework both extends optimal foraging theory and motivates a variety of behavioral and neuroscientific experiments investigating patch foraging behavior. CONTENTS I. Introduction 1 II. Sequential updating foraging model 3 A. Full model 3 B. Idealizing limits 3 III. Minimizing time to find high patch in non-depleting environments 5 A. Binary environment 5 B. Three patch types 8 C. More patch types 9 D. Continuum limit 10 IV. Depletion- vs. uncertainty-driven decisions in depleting environments 11 A. Homogeneous environments 11 B. Binary environments 12 C. Returning to a previously depleted patch 15 V. Learning the distribution of patch types 17 * [email protected] [email protected] [email protected] A. Homogeneous environment 17 B. Binary environments 19 VI. Discussion 20 A. Summary 20 B. Relation to other work 20 C. Experimental applications: Behavior and neuroscience 23 D. Model extensions 24 Acknowledgments 24 References 24 I. INTRODUCTION Nearly all animals forage, as it is essential to ac- quire energy for survival through efficient search and harvesting of food resources. Foraging behavior is per- formed by small organisms like C. Elegans [1, 2] and Drosophila [3, 4], animals with large spatial ranges like birds [5, 6], and mammals like rodents [7, 8], mon- keys [9, 10], and humans [11, 12]. In addition to its universality across animal species, foraging engages mul- tiple cognitive computations such as learning of food dis- tributions across spatiotemporal scales, statistical infer- ence of food availability, route planning and decision- was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which this version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558 doi: bioRxiv preprint

Transcript of Normative theory of patch foraging decisions · nects normative theory of foraging decisions with...

Page 1: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

Normative theory of patch foraging decisions

Zachary P Kilpatrick∗

Department of Applied Mathematics, University of Colorado, Boulder, Colorado 80309, USA andDepartment of Physiology & Biophysics, University of Colorado School of Medicine, Aurora, Colorado, USA

Jacob D Davidson†

Department of Collective Behavior, Max Planck Institute of Animal Behavior, Konstanz, GermanyDepartment of Biology, University of Konstanz, Konstanz, Germany and

Centre for the Advanced Study of Collective Behavior, University of Konstanz, Konstanz, Germany

Ahmed El Hady‡

Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, USA(Dated: April 22, 2020)

Foraging is a fundamental behavior as animals’ search for food is crucial for their survival. Patchleaving is a canonical foraging behavior, but classic theoretical conceptions of patch leaving decisionslack some key naturalistic details. Optimal foraging theory provides general rules for when ananimal should leave a patch, but does not provide mechanistic insights about how those ruleschange with the structure of the environment. Such a mechanistic framework would aid in designingquantitative experiments to unravel behavioral and neural underpinnings of foraging. To addressthese shortcomings, we develop a normative theory of patch foraging decisions. Using a Bayesianapproach, we treat patch leaving behavior as a statistical inference problem. We derive the animals’optimal decision strategies in both non-depleting and depleting environments. A majority of thesecases can be analyzed explicitly using methods from stochastic processes. Our behavioral predictionsare expressed in terms of the optimal patch residence time and the decision rule by which an animaldeparts a patch. We also extend our theory to a hierarchical model in which the forager learns theenvironmental food resource distribution. The quantitative framework we develop will therefore helpexperimenters move from analyzing trial based behavior to continuous behavior without the loss ofquantitative rigor. Our theoretical framework both extends optimal foraging theory and motivatesa variety of behavioral and neuroscientific experiments investigating patch foraging behavior.

CONTENTS

I. Introduction 1

II. Sequential updating foraging model 3A. Full model 3B. Idealizing limits 3

III. Minimizing time to find high patch innon-depleting environments 5A. Binary environment 5B. Three patch types 8C. More patch types 9D. Continuum limit 10

IV. Depletion- vs. uncertainty-driven decisions indepleting environments 11A. Homogeneous environments 11B. Binary environments 12C. Returning to a previously depleted patch 15

V. Learning the distribution of patch types 17

[email protected][email protected][email protected]

A. Homogeneous environment 17B. Binary environments 19

VI. Discussion 20A. Summary 20B. Relation to other work 20C. Experimental applications: Behavior and

neuroscience 23D. Model extensions 24

Acknowledgments 24

References 24

I. INTRODUCTION

Nearly all animals forage, as it is essential to ac-quire energy for survival through efficient search andharvesting of food resources. Foraging behavior is per-formed by small organisms like C. Elegans [1, 2] andDrosophila [3, 4], animals with large spatial ranges likebirds [5, 6], and mammals like rodents [7, 8], mon-keys [9, 10], and humans [11, 12]. In addition to itsuniversality across animal species, foraging engages mul-tiple cognitive computations such as learning of food dis-tributions across spatiotemporal scales, statistical infer-ence of food availability, route planning and decision-

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 2: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

2

making [11]. As such, foraging utilizes multiple neu-ral systems in the brain [13], offering researchers theopportunity to probe these neural computations simul-taneously. In addition to studying the proximal neu-ral and behavioral mechanisms shaping foraging acrossspecies, one can also ask how these processes and theirinterplay have been shaped by natural selection to opti-mize returns in the face of environmental and physiolog-ical constraints [11, 14–16] opening up the opportunityto perform evolutionary quantitative behavioral studies.For the aforementioned reasons, there has been an in-creased interest in studying foraging in a neurosciencecontext [13, 17–19].

Since the space of foraging behaviors is vast, it is cru-cial to build a conceptual framework starting with atractable behavior that is sufficiently flexible and rich.Adhering to this aim, patch foraging is both a com-mon and understandable behavior, described as fol-lows [20, 21]: an animal enters a patch of food, harvestsresources, and then leaves to search for another patch offood [12]. An animal’s behavior can thus be quantifiedby its patch residence time distribution, travel time dis-tribution, the amount of food resources consumed, andthe movement pattern in-between patches. Moreover, theanimal’s reward rate can be computed by its food intakedivided by time. A basic result in behavioral ecology, themarginal value theorem (MVT) states that an animal canoptimize energy intake by leaving its current patch whenthe current reward rate falls below the global averagereward rate of the environment [22]. This gives a theo-retical foundation for assessing optimal decision-making,which has also been validated in many behavioral stud-ies [20, 21, 23–30].

However, the MVT does not describe mechanisticallyhow an animal uses its foraging experience to learnkey environmental features like the distribution of foodavailability, and assumes the animal already knows theoverall average reward rate of the environment. Al-though mechanistic models of foraging have been pro-posed [20, 21, 26, 29, 31], these were not derived basedon principles of statistical inference, and therefore cannotaddress how optimal decisions depend on the resource en-vironment and uncertainty in the statistics of food avail-ability.

In behavioral ecology, many studies have considered aBayesian approach to patch leaving, asking how an ani-mal uses available information to make optimal foragingdecisions [32–49]. However, because most studies con-sider a specific or narrow range of environmental condi-tions, it is not clear how a Bayesian approach relates toother mechanistic models of foraging decisions, and howsuch an approach may assist in deriving possible neuralimplementations. To date there is not a complete norma-tive theory of how animals make patch leaving decisionsunder a broad class of environmental conditions.

The aim of our study is thus to develop a Bayesianframework of patch leaving behavior by treating decisionsas a statistical inference problem. Our framework con-

nects normative theory of foraging decisions with mech-anistic drift-diffusion models, as proposed in [31].

To realize our theoretical framework, we perform thefollowing model derivations and analysis:

• We consider a series of idealizing limits such asnon-depleting patches, homogeneous environments,and binary environments. Using probabilistic se-quential updating, we derive stochastic differentialequations (SDEs) for the belief about the arrivalrate of food in the current patch. This generatesanalytically tractable models associated with opti-mal patch leaving strategies.

• We consider a series of more naturalistic conditionssuch as depleting patches, environments in whichthe forager returns to depleted patches, and envi-ronments with many or a continuum of patch types.

• We derive optimal decision strategies for patchleaving that either maximize the long term rewardrate or minimize the time to enter and remain ina high yielding patch across all the aforementionedenvironmental structures.

• We study how animals can learn the distributionof patch types in both non-depleting and depletingenvironments. This allows us to study how the rateat which the environmental distribution is learneddepends on environmental parameters.

Following the aforementioned modeling approach, wearrive at a number of general conclusions concerning thebest strategies for foragers to adopt in different environ-ments:

• In idealizing limits, patch leaving decision statis-tics are described by solutions of first passage timeproblems for SDEs with absorbing boundaries.

• In non-depleting environments, in which consump-tion does not diminish food availability, an idealforager should minimize their time to find and re-main in the highest yielding patch in the environ-ment. This time decreases both as high yieldingpatches become more common and as the highestyielding patch becomes more discriminable.

• In depleting environments, an ideal forager mustestimate the current arrival rate of food within apatch and when this becomes too low, depart thepatch.

• Across a wide range of task parameters, the foragershould leave a patch when the current reward rateis matched to the environmental reward rate, as inthe MVT. However, if there is a high level of un-certainty about the food availability within a patch,even an ideal forager will tend to stay too long inlow yielding patches (overharvesting) and not longenough in high yielding patches (underharvesting)

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 3: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

3

• When the environment is fully depleting, such thata forager may return again to patches they mayhave already partially depleted, they can minimizethe time to deplete the environment by aiming tofully deplete patches before leaving them.

Our aforementioned results highlight that we havefound several key deviations from the typical MVT re-sults of classic optimal foraging theory. These deviationsarise due to either limited information available to theforager, slow patch depletion, or full environmental de-pletion.

Another key aspect of our model that extends beyondclassic optimal foraging theory is the ability of our idealforager to learn the underlying food arrival rates of theenvironment. The forager learns patch arrival rates morerapidly when rates are high, since food encounters pro-vide more information than times between food encoun-ters. Also, arrival rates are learned more quickly indepleting environments than in non-depleting environ-ments, since successive food encounters rule out low ini-tial arrival rates. Combined with the broad range of op-timal foraging strategies derived assuming the observerhas full knowledge of its environment, our analysis pro-vides a number of touchstones to be compared with patchleaving statistics from future and past behavioral studies.

We anticipate that our theoretical framework will pro-vide a platform to study behavioral and neural mecha-nisms of naturalistic decision-making in a similar man-ner as trained decision-making behavior is studied withinsystems neuroscience [17]. Our model is therefore usefulin opening up the space for laboratory experiments thatmimic naturalistic patch foraging dynamics and for gen-erating testable hypothesis under different ecologicallyrelevant experimental designs.

II. SEQUENTIAL UPDATING FORAGINGMODEL

We model an animal searching a large arena represent-ing a natural environment with distributed patches offood (Fig. 1A). When the animal enters a food patch,it consumes food within the patch until it decides toleave for another patch. The parameters characterizingthe behavior are: environmental distribution of initialfood density p(λ0), patch size A, food chunk size c, andmean travel time between patches τ . In this work, weassume the animal knows the patch size, chunk size, andthe travel time (which is held fixed). Note, we couldmodel the learning of these parameters as separate se-quential updating processes (See Discussion for more de-tails). Our interest is in how an observer can learn thefood density within and across patches over time and usethis information to guide an efficient foraging strategy.

A. Full model

Under simplifying assumptions, we derive a sequen-tial sampling model for an ideal observer’s posterior oftheir current patch’s food yield rate λ(t). To model therandom timing of food encounters, we assume they arePoisson-distributed with a rate λ(t) = λ0 − ρK(t) thatdecreases with K(t) the number of food chunks foundso far, and ρ is the impact of each food chunk on thearrival rate. Thus, the time tK between the Kth and(K + 1)th encounter is drawn from the exponential dis-tribution (λ0 − ρK)e−(λ0−ρK)t. We assume for now thatthe observer knows and initializes their belief with theprior p(λ0) when arriving in a new patch (Fig. 1B). InSection V, we extend our model to consider the processby which the distribution p(λ0) is learned. Given the

food encounter sequence x(t) = K ′(t) =∑K(t)j=1 δ(t− tj),

an ideal observer updates their belief about the currentarrival rate λ according to

p(λ|x(t)) = p(x(t)|λ)p0(λ+K(t)ρ)

p(x(t))

∝ (λ/ρ+K)!

(λ/ρ)!e−λtp0(λ+Kρ), λ ≥ 0. (1)

Food encounters shape the observer’s belief about thecurrent yield rate λ(t) in two main ways: (1) they giveevidence of higher yield rates, since encounters are moreprobable in high yielding patches; and (2) they depletethe patch, decrementing the yield rate, as described bythe ρ terms in the posterior update. Thus, there is a ten-sion between the depleting effects of food encounters, andthe evidence they provide for higher yield rates (Fig. 1B).

To better understand how an ideal observer’s evidenceaccumulation process evolves, we consider a few differentidealizing limits of Eq. (1). Analyzing these reveals dis-tinct patterns in the belief dynamics as determined bytask parameters, as we will show.

B. Idealizing limits

Non-depletion. In the limit ρ → 0+, food encountersdo not deplete the rate of food arrival, which could bedue to slow search rates or large patch areas [31]. Takingthis limit, Eq. (1) simplifies considerably to

p(λ|x(t)) =p0(λ)

p(t1:K(t))λK(t)e−λt, λ ≥ 0,

so the log-likelihood update for λ ≥ 0 becomes

log p(λ|x(t)) =K(t) log λ− λt− log p(t1:K(t)) + log p0(λ),

(2)

showing that food encounters provide a pulse of evidencethat increases with λ, and a lack of food results in a lineardecrease of the log likelihood of the arrival rate λ.

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 4: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

4

patch environmentp(λ0)

patch food density

food encounters

observer belief

λ0

x(t) = ∑j

δ(t − tj)

p(λ |x(t))

⋯low λ0

high λ0transit

food chunks

x(t) =

move to new patch

forage within patch

feeding history-based belief update

λ

λMLEλtrue

time

p(λ |x(t))

time

t1 t2 t3 t4 t5

λ0, A, c

A B C

D

E

idealizing limitsnon-depletion

time

binary environments

pLλL λH

homogeneous environments

λk0

pH

FIG. 1. Patch foraging task and model. A. Task environment: An animal enters an arena and forages from patches, eachpatch k has some (possibly distinct) initial food yield rate λk0 . B. Ideal observer foraging model: Initial patch yield is drawnfrom the distribution p(λ0), generating random food encounter times t1:K , and updating belief of current food arrival rate λ(t)for patch. The maximum likelihood estimate λMLE gets approaches the true λtrue arrival rate over time. Model idealizations:C. Non-depletion: Assuming patches are depleted slowly enough to be approximated as ‘non-depleting.’ D. Homogeneousenvironments: All patches have the same initial yield rate λ0. E. Binary environment: Two possible initial patch yield ratesλH > λL occur with probability pH and pL.

In binary environments, with two possible arrival ratesλH > λL, we have p0(λ) = pHδ(λ − λH) + pLδ(λ − λL)where pH + pL = 1. Eq. (2) collapses to a scalar updateequation for the log-likelihood ratio (LLR),

y(t) ≡ logp(λH |x(t))

p(λL|x(t))

= K(t) logλHλL︸ ︷︷ ︸

food encounters

− (λH − λL)t︸ ︷︷ ︸lack of food

+ logpHpL︸ ︷︷ ︸

prior

,

which can be written equivalently as a stochastic differ-ential equation (SDE) for the rate-of-change of the LLR:

dy

dt= log

λHλL

∞∑j=1

δ(t− tj)︸ ︷︷ ︸food encounters

− (λH − λL)︸ ︷︷ ︸lack of food

, (3)

with initial condition set by the prior y(0) = logpH

1− pH .

Eq. (3) has a form similar to classic evidence accumu-lation models commonly used to model psychophysicaldata from decision-making tasks [50, 51], as well as toprevious models of foraging decisions [31]. Due to itssimple form, it can be analyzed explicitly to determinehow long term statistics are shaped by task parameters.

Homogeneous environments. Another way of simpli-fying Eq. (1) is to consider homogeneous environmentswhere knowledge of the patch yield rate is perfect. Tosee this, note that if p0(λ) = δ(λ− λ0), then Eq. (1) im-plies p(λ|x(t)) = δ(λ−(λ0−K(t)ρ)). In other words, theobserver has perfect knowledge that λ(t) = λ0 −K(t)ρ.

Binary depleting environments. We also explore be-lief updating and patch departure strategies in depletingbinary environments, revealing two phases of yield rateinference: an initial discrimination phase followed by de-pletion based decrementing of the yield rate estimate. Inthis case, we have p0(λ) = pHδ(λ − λH) + pLδ(λ − λL),and the associated LLR has a rate-of-change given by thenon-autonomous SDE:

dy

dt=

Kmax∑j=1

logλH − (j − 1)ρ

λL − (j − 1)ρ· δ(t− tj)− (λH − λL),

(4)

where y(0) = logpH

1− pH as in the non-depleting case.

These idealizing limits all afford some level of tractabil-ity. However, we first derive optimal patch leaving strate-gies in non-depleting binary environments. In this casewe can explicitly derive first passage time statistics asso-ciated with patch departures, allowing us to examine howefficient patch leaving depends on environmental param-eters like patch discriminability (λH/λL) and high patchprevalence (pH). The trends we find in this idealizedcase help us develop and interpret optimal strategies inmore complex cases like depleting environments and inenvironments with more than two patch types.

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 5: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

5

time

log[λH /λL] −[λH − λ

L ] ⋅ t

θ

λL

λH

A

leav

e pa

tch

B

y(t)

: departure threshold

pH pL

λH λL

FIG. 2. Belief updating in binary non-depleting envi-ronments. A. Distribution of high (λH) and low (λL) patch

types. B. Patch type belief y(t) = log p(λH |x(t))p(λL|x(t))

increases

with food encounters and decreases between until y(t) = θand the observer departs the patch.

III. MINIMIZING TIME TO FIND HIGHPATCH IN NON-DEPLETING ENVIRONMENTS

A forager in an environment with non-depletingpatches is best off locating a patch with the highest yieldand remaining there. In binary environments with twoyield rates λH > λL (Fig. 2A) and belief update givenby Eq. (3), we show an efficient patch departure strategyis to set a single threshold on the LLR (or likelihood)so that when the forager is sufficiently confident theyare in a low yielding patch, they leave. We extend thisapproach to consider environments with multiple patchtypes (e.g., three or a continuum) and show a similarthresholding strategy can be used for the forager to mostquickly find and remain in the highest yielding patches.

A. Binary environment

Considering Eq. (3), we analyze a formula for the longterm reward rate of the forager, and identify optimalparameters for an LLR threshold crossing patch leavingstrategy (Fig. 2B). The reward rate is maximized in thelong time limit as long as the forager eventually ends upin a high yielding patch, which occurs via a LLR thresh-olding strategy. A more thorough analysis of the opti-mal strategy using dynamic programming could be per-formed, but as the system does not fit the conditions ofdecision processes that require time-varying thresholds,we expect this would yield the same result [52, 53].

Energy intake rate. To begin the analysis, define thereward rate function to be maximized over a particularpatch leaving policy π for a given total foraging time T :

Rπ(T ) :=Eπ(T )

T=〈EπH(TH , T )〉+ 〈EπL(TL, T )〉

T,

which is the sum of mean food consumed from high andlow yielding patch visits 〈EπH,L(TL, T )〉 divided by the

total foraging time T . The quantity Eπ(T ) is the aver-age number of food chunks consumed. Since policies cancondition patch leaving on food chunks consumed, the av-erage food chunk count for a finite time t in a high/low

patch will likely not be the mean of the Poisson arrivalprocess λH,Lt. However, consider the limit T → ∞ oflong foraging time:

limT→∞

Rπ(T ) = RπH(∞) +RπL(∞),

in which there are three possible scenarios: (a) the for-ager spends a nonzero fraction of time in both patches soboth RπH,L(∞) 6≡ 0; (b) the forager only spends a nonzero

fraction of time in the high yielding patch so RπL(∞) ≡ 0;or (c) the forager only spends a nonzero fraction of timein the low yielding patch so RπH(∞) ≡ 0.

In policies π(a) leading to case (a), the foragerwill spend some fraction of time transitioning betweenpatches and the rest of the time in patches. Define rHand rL to be the fraction of time spent in the high andlow yielding patches (rH + rL < 1 due to transits), then

limT→∞

Rπ(a)(T ) = rHλH + rLλL

for such policies, as the average yield rates λH,L will berecovered in each patch in the long time limit.

On the other hand, any policy which leads to the for-ager only visiting one patch type for a finite amount oftime (a nonzero time fraction as T → ∞) must involvethe forager remaining in one patch indefinitely. Other-wise, if they continued patch leaving indefinitely, theywould spend a nonzero fraction of time in patches of theother type. As such, policies π(b) and π(c) causing theforager to (b) stay in the high patch and (c) stay in thelow patch imply

limT→∞

Rπ(b)(T ) = λH and limT→∞

Rπ(c)(T ) = λL

As the forager’s fraction of time spent in the high (low)patch type converges to unity, rL → 0 (rH → 0) in caseb (c). Clearly, all policies of type π(b) will maximize longterm energy intake rate.

Thus, we can maximize limT→∞Rπ(T ) via any pol-icy whereby the forager finds and remains in a highyielding patch. To do this, we set a lower thresholdθ < log

pH1− pH on the belief y(t) in Eq. (3), so the for-

ager leaves given sufficient evidence that the patch is lowyielding. We will show subsequently that this leads to anonzero probability of remaining in a high yielding patchindefinitely each time one arrives there. Thus, there iscomplete certainty the forager will eventually find andremain in the high patch for all time.Minimizing time to arrive in high yielding patch.

Given a constant LLR thresholding strategy, we can de-rive an implicit equation for how the mean time to arriveand remain in the high patch Tarrive(θ) depends on θ.This quantity can be computed from the high patch es-cape probability πH(θ), the high patch mean visit timeTH(θ) when departing, the low patch mean visit timeTL(θ), the known patch fraction pH , and mean transittime τ . Patch departure time statistics can be deter-mined explicitly by solving the mean first passage time

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 6: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

6

problem for the SDE in Eq (3) with absorbing boundaryat y = θ. With these quantities in hand, we can compute

Tarrive(θ) =πH(θ)

1− πH(θ)

(TH(θ) + τ +

1− pHpH

(TL(θ) + τ)

)+

1− pHpH

(TL(θ) + τ), (5)

where the first term accounts for the number of highpatch visits required to remain in the high yielding patchand the second term accounts for the time needed to leavethe low yielding patch when starting there. For thresh-olds further from y(0) = log

pH1− pH , the escape probabil-

ity πH(θ) decreases and mean exit times TH,L(θ) increase.

Intuitively, if θ = logpH

1− pH , departure from each patch

is immediate, whereas for θ → −∞ then πH(−∞) → 0but also TH,L(θ) → ∞ since the forager never obtainssufficient evidence to leave a patch. Thus, we expect aninterior solution θ to argminθ∈(−∞,log

pH1−pH

)

[Tarrive(θ)

].

We now compute the components of Eq. (5) using firstpassage time methods for SDEs [54].

Patch departure time statistics. The problem of findingthe time for a forager to leave a patch can be formulatedas a first passage time calculation. To obtain explicit re-sults, we derive the corresponding backward Kolmogorovequation of Eq. (3). To do so, first note that the forwardKolmogorov equation is

pt(y, t) = +apy(y, t) + λΨ(p(y, t), y)− λp(y, t) ≡ Lp(y, t)

where a = λH − λL and λ ∈ {λH , λL} is the arrival rateof the current patch, and

Ψ(p(y, t), y) =

{p(y − b, t), y > θ + b,

0, θ < y < θ + b,

accounts for the fact that jumps in the belief y due tofood encounters can only lead to beliefs that are at leastone jump length b = log λH

λLfrom threshold θ. Patch de-

partures are represented by an absorbing boundary con-dition at the threshold: p(θ, t) = 0. Defining the L2 innerproduct 〈u, v〉 =

∫∞θu(y)v(y)dy over real functions, we

determine the adjoint L∗ as 〈Lu, v〉 = 〈u,L∗v〉, yield-ing the right hand side of the corresponding backwardKolmogorov equation [54]:

qt = −aqy(y′, t|y, 0) + λq(y′, t|y + b, 0)− λq(y′, t|y, 0)

≡ L∗q(y′, t|y, 0), (6)

with associated initial and boundary conditionsp(y′, 0|y, 0) = δ(y′ − y) for y, y′ > θ and p(y′, t|θ, 0) = 0for t > 0.

Now to obtain patch departure statistics, we can inte-grate Eq. (6) over all possible belief values y′ ∈ (θ,∞) todetermine the probability that the patch departure timeT is after the time t. We define the survival functionconditioned on the starting belief y as

P (T > t|y) =

∫ ∞θ

q(y′, t|y, 0)dy′ ≡ Gj(y, t),

where j ∈ {H,L} indexes the true patch type λj . Uponintegrating Eq. (6), we find

Gjt = −aGy(y, t) + λj [G(y + b, t)−G(y, t)] ≡ L∗Gj ,(7)

on y ∈ (θ,∞), and note that the absorbing boundarycondition ensures Gj(θ, t) = 0 for t > 0.

Moreover, by taking the limit t → 0 of Eq. (7), wecan derive an equation for the escape probability πj(y) ≡Gj(x, 0) = P (T ≥ 0|y), which is the probability the beliefever crosses threshold given it begins at y and t = 0:

0 = −aπ′(y) + λj [π(y + b)− π(y)] ≡ L∗πj(y), (8)

with absorbing boundary condition πj(θ) = 1. Eigen-solutions of Eq. (8) take the form πj(y) = Aery withcorresponding transcendental characteristic equation

0 = −ar + λjerb − λj = −(λH − λL)r + λj

(λHλL

)r− λj

= Cj(r). (9)

As λH > λL, C ′′j (r) = λjb2(λH/λL)r−2 > 0, this implies

Cj(r) is concave up and so has at most two roots: oneis r1 = 0, and when j = H (j = L) the other root isr2 = −1 (r2 = 1).

To superpose solutions in each case (j ∈ {H,L}), notewhen j = L, the average drift

〈y〉 = λL logλHλL− (λH − λL) < 0

of the belief in Eq. (3) is down toward threshold andwe expect limy→∞ πL(y) = 1. Applying this and theboundary condition πL(θ) = 1, we see πL(y) ≡ 1 whenλ = λL. However, when λ = λH , the average drift

〈y〉 = λH logλHλL− (λH − λL) > 0

is away from threshold, implying limy→∞ πH(y) = 0 andπH(θ) = 1, yielding πH(y) = eθ−y.

Now to compute the mean first passage time (MFPT)in either case, note that the probability density function(pdf) fj(t) of exit times in case j can be calculated fromthe normalized survival function

lim∆t→0

P (T > t|y)− P (T > t+ ∆t|y)

∆t · P (T ≥ 0|y)= −G

jt (y, t)

πj(y)≡ fj(t).

The MFPT Tj(y) is then obtained by integrating againstthe pdf:

Tj(y) =

∫ ∞0

tfj(t)dt = −∫∞

0tGjt (y, t)dt

πj(y)=

∫∞0Gj(y, t)dt

πj(y).

Integrating Eq. (7) over t ∈ (0,∞) and defining Tj(y) :=∫∞0Gj(y, t)dt = πj(y)Tj(y), we see

−πj(y) = −aT ′j (y) + λj [Tj(y + b)− Tj(y)] ≡ L∗Tj .(10)

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 7: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

7

0 0.5 1

100

102

104

2 4 6 8 10100

102

-4 -3 -2 -1 0

101

102

103

T arri

ve

-4 -2 0 2100

101

102

103

T arri

ve

threshold θ

λH = 1.1

λH = 2λH = 5

pH = 0.1

pH = 0.5

pH = 0.9

2 4 6 8 10-6

-4

-2

0

θopt

0 0.5 1

-6

-4

-2

0

θopt

pHhigh yield fraction

high yield rate λH

pH = 0.1pH = 0.5

pH = 0.9

λH = 1.1

λH = 2λH = 5

Topt

arrive

pHhigh yield fraction

high yield rate λH

λH = 1.1

λH = 2

λH = 5

pH = 0.1

pH = 0.5pH = 0.9

A

B

C

D

E

F

Topt

arrive

FIG. 3. Statistics of high patch identification in binary non-depleting environments. A,B. The mean time to arriveand remain in a high yield patch varies nonmonotonically with departure threshold θ and decreases as the patch discriminabilityλH/λL and high yield fraction pH are increased. Solid lines are the analytical result in Eq. (5) and dots are averages over 104

Monte Carlo simulations. λH = 2 and pH = 0.5 are fixed unless indicated. C,D. Departure threshold θopt minimizing the timeto arrive in the high patch decreases with λH/λL and varies nonmonotonically with pH . Solid lines are obtained numericallyby solving Eq. (11), dotted lines are the Lambert W function approximation (Eq. (12)), and dashed lines are the asymptoticexpansion in logarithms (Eq. (13)). E,F. The minimal mean time T opt

arrive to arrive in a high yield patch decreases with λH/λLand pH . We fix λL = 1 and τ = 5.

For j = L, the boundary conditions TL(θ) = 0, πL(θ) =1, limy→∞ TL(y) =∞, and limy→∞ πL(y) = 1 imply

TL(y) =TL(y)

πL(y)=

y − θ(λH − λL)− λL log λH

λL

.

For j = H, the boundary conditions TH(θ) = 0, πH(θ) =1, limy→∞ TH(y) = ∞, and limy→∞ πH(y) = 0 can beleveraged to find an identical expression for the averagetime in the high patch:

TH(y) =TH(y)

πH(y)=

y − θ(λH − λL)− λL log λH

λL

.

For the Bayesian model, the prior and initial conditionof a patch is y(0) = log pH

1−pH , so the relevant quantities

for our problem are πL ≡ 1; πH = 1−pHpH

eθ; and

TL = TH =log pH

1−pH − θ(λH − λL)− λL log λH

λL

.

Substituting these expressions into Eq. (5), we obtainan explicit formula for Tarrive as it depends on modelparameters (Fig. 3A,B). Minimizing Tarrive to find θopt

then requires solving:

argminθ∈(−∞,log

pH1−pH

) (1− pH)

[1 + eθ

pH − (1− pH)eθ

]

×

[log pH

1−pH − θ(λH − λL)− λL log λH

λL

+ τ

]whose solutions θopt are identified with critical pointsobeying θ < log pH

1−pH and

eθ[log

pH1− pH

− θ +Aτ

]= (1 + eθ)(pH − (1− pH)eθ),

(11)

where A = (λH−λL)−λL log λHλL

, which we can solve for

θopt numerically (solid lines in Fig. 3C,D). An explicitapproximation of θopt is obtained by dropping the e2θ

term (expecting θ negative) and solving to find

θopt ≈W−1

[−(1− pH)

(λHλL

)λLτe−(λH−λL)τe2pH−1

]+Aτ + log

pH1− pH

+ 1− 2pH , (12)

where W−1(z) is the (−1)th branch of the LambertW function (inverse of z = W eW ). This approxima-tion compares well with numerical solutions to Eq. (11)(dotted lines in Fig. 3C,D), and can be further sim-plified using the approximation W−1(z) ≈ log(−z) −log(− log(−z)) yielding

θopt ≈ log(1− pH)− log [− log(1− pH) + (λH − λL)τ

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 8: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

8

−λLτ logλHλL

+ 1− 2pH

], (13)

from which we can observe how the optimal thresholdscales in limits of environmental parameters (dashed linesin Fig. 3C,D).

Examining Eq. (12), we see that when high yieldpatches are rare (pH → 0), the optimal threshold di-verges as θopt → −∞ with the scaling log pH , since theinitial condition becomes more negative. When low yieldpatches are rare and pH → 1, θopt → −∞ since thefirst selected patch will almost always be high yielding.Therefore, if high yielding patches are very rare or com-mon, the threshold should be moved far from zero to re-quire high certainty when abandoning a patch. Between,θopt > −∞ varies nonmonotonically in pH . For largeλH/λL � 1, θopt ∼ − log(λH), since patches are eas-ier to distinguish as λH increases, so again high certaintycan be required to depart a patch. In the limit λH → λL,θopt ≈ log(1−pH)− log [− log(1− pH) + 1− 2pH ]. Note,when both increasing discriminability (λH/λL) and thehigh yield patch fraction pH , the minimal mean timeT opt

arrive needed to arrive and remain in a high yield patchdecreases (Fig. 3E,F).

Thus, the problem of patch leaving can be reducedto a threshold crossing process in the case of a bi-nary non-depleting environment. Indeed, we find themodel is tractable enough to identify explicit scaling re-lations between model parameters and decision strate-gies. Namely, the time to arrive and remain in the highyielding patch increases as high patches become less dis-criminable (λH/λL close to one) and more rare (smallpH). Patch departures require more certainty when highpatches are more discriminable and/or are very rare orcommon. Next, we extend our study of the high patcharrival problem to non-depleting environments with morethan two patch types. Again, we find this can be accom-plished by setting a threshold on the belief about whetheror not the agent is in the highest yielding patch or not.

B. Three patch types

In ternary environments, there are three patch arrivalrates λ1 > λ2 > λ3 ≥ 0 with p0(λ) =

∑3j=1 pjδ(λ − λj)

and∑3j=1 pj = 1 (Fig. 4A). Representing log-likelihoods

using Eq. (2) and defining LLRs y1 = log p(λ1|x(t))p(λ2|x(t)) and

y2 = log p(λ1|x(t))p(λ3|x(t)) for the beliefs within a patch given the

food encounter time series x(t) =∑K(t)j=1 δ(t−tj), we have

the planar system

y′1 = logλ1

λ2

K(t)∑j=1

δ(t− tj)− (λ1 − λ2), (14a)

y′2 = logλ1

λ3

K(t)∑j=1

δ(t− tj)− (λ1 − λ3), (14b)

where y1(0) = log p1p2

and y2(0) = log p1p3

. Note that we

can recover any of the three likelihoods from y1 and y2

by the following mappings

p(λj+1|x(t)) =e−yj

1 + e−y1 + e−y2,

where y0 = 0 for j = 0. An argument similar to the bi-nary case demonstrates that the best patch leaving poli-cies in ternary environments cause the forager to find andremain in the highest yielding patch (λ1). This is accom-plished by thresholding the probability of being in thehigh yielding patch, so when p(λ1|x(t)) = φ ∈ (0, p1),the forager exits the patch. In (y1, y2) space, this thresh-old is a parameterized curve

yφ2 = − log

[1− φφ− e−y

φ1

], yφ1 ∈ (− log

1− φφ

,∞),

(15)

bounded by limyφj→∞yφk = − log 1−φ

φ ≡ θ, suggesting

we approximate the boundary in Eq. (15) with y1, y2 ≥θ. This leads to the forager departing the patch givensufficient evidence they are not in the highest yieldingpatch (Fig. 4B).

Given such a strategy, the mean time Tarrive to arriveand remain in the high yielding patch is computed fromthe escape probability π1(θ) from the high yielding patchand the mean times to visit each patch Tj(θ) (j = 1, 2, 3)when escaping. As before, the forager will always escapethe lower yielding patches (2 and 3). The mean arrivaltime is then given by

Tarrive(θ) =π1(θ)

1− π1(θ)

(T1(θ) + τ

)(16)

+1− p1

p1

p2(T2(θ) + τ) + p3(T3(θ) + τ)

1− π1(θ).

While we could compute the needed MFPT statisticsfor Eq. (16) by solving a corresponding backward Kol-mogorov equation, the resulting two-dimensional delayedPDE proves more computationally expensive to solvethan Monte Carlo sampling Eq. (14). We thus save amore detailed asymptotic analysis of higher order PDEsassociated with multi-patch problems for future work,and report results from sampling here (Fig. 4C-F). Themean high patch arrival time Tarrive depends strongly onthe high patch food arrival rate λ1, decreasing consider-ably as the patch becomes more discriminable (Fig. 4C).On the other hand, Tarrive depends weakly on the worstpatch’s food arrival rate λ3 (Fig. 4D). The discriminabil-ity of λ2 and λ3 does not strongly affect the observer’sability to determine whether a patch is high yielding (λ1)or one of the two lower yielding patches (λ2 or λ3) as longas λ1 and λ2 are sufficiently discriminable. In a relatedway, Tarrive is much more strongly affected by changes inthe fraction of the high yielding patch (p1: Fig. 4E) thanchanges in the balance of the middle (λ2) and low (λ3)patches (Fig. 4F).

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 9: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

9

-4 -3 -2 -1 0100

101

102

-4 -3 -2 -1 0

101

102

103

-4 -3 -2 -1 0

20

25

30

35

y1(t)

y 2(t)

log λ1λ2

log λ1λ3

θ

θ

leave patch

food encounter

dy 2dy 1

=−(λ 1−

λ 3)

−(λ 1−λ 2)

B

p1 p3λ1 λ3

A λ2

p2

no food

C

T arri

ve

threshold θ

λ1 = 2.2

λ1 = 3λ1 = 10

T arri

ve λ3 = 1.9

λ3 = 1

λ3 = 0.1

D

p1 = 0.1

p1 = 0.8p1 = 0.3

E

-4 -3 -2 -1 010

20

30

4050

θ

p2 = 0.6

p2 = 0.3p2 = 0.1

F

threshold

FIG. 4. Ternary (3 patch) non-depleting environments. A. Distribution of three possible patches. B. Patch beliefs

y1(t) = log p(λ1|x(t))p(λ2|x(t))

and y2(t) = log p(λ1|x(t))p(λ3|x(t))

increase with food encounters and decrease between until either reaches the

departure threshold θ. C. Mean time Tarrive to arrive and remain in highest yielding patch λ1 (Eq. (16)) decreases with patchdiscriminability λ1 as does the optimal departure threshold θopt (circles). Other parameters are λ2 = 2, λ3 = 1, τ = 5,and p1 = p2 = p3 = 1/3. D. Tarrive increases with λ3 as does θopt, since the worst patches become less easy to distinguishfrom the best (λ1 = 3). Other parameters as before. E. Tarrive decreases with the prevalence of the best patch p1 (whilep2 = p3 = (1− p1)/2), but θopt varies non-monotonically. F. T opt

arrive increases with p2 (while p1 = 0.333 and p3 = 1− p1 − p2)as does θopt. All curves for Tarrive computed from 106 Monte Carlo simulations.

Importantly, the optimal threshold θopt decreases aspatches become more easily discriminable (Fig. 4C,D),but can vary non-monotonically with the fraction p1 ofhigh patches as in the binary environment (Fig. 3D). In-creases in θopt at lower values of p1 arise from the factthat departure from high valued patches will be inter-spersed with long sequences of visits to lower yieldingpatches. Thus, there is a higher premium on distinguish-ing the high yielding patch when actually in one, similarto how subjects raise their decision thresholds when in-tertrial intervals are lengthened in two alternative forcedchoice paradigms [55]. At higher values of p1, one is morelikely to land in a high yielding patch, so again one canafford to require more certainty to depart.

We now extend our analysis of departure strategies toenvironments with N > 3 patch types. As before, weexpect that patch discrimination will mainly be impactedby the spacing between the highest and second highestfood arrival rates λ1 and λ2, as we now show in detail.

C. More patch types

Consider environments with N patch types having ar-rival rates λ1 > λ2 > · · · > λN ≥ 0 with p0(λ) =∑Nj=1 pjδ(λ−λj) and

∑Nj=1 pj = 1 (Fig. 5A). In this case,

defining the LLRs yj = log p(λ1|x(t))p(λj+1|x(t)) for j = 1, ..., N−1,

yields the N − 1-dimensional system

y′j = logλ1

λj+1

∞∑j=1

δ(t− tj)− (λ1 − λj+1), (17)

where yj(0) = log p1pj+1

, and any likelihood can be recov-

ered as

p(λj+1|x(t)) =e−yj

1 +∑N−1k=1 e−yk

,

where y0 = 0 for j = 0.A full analysis of optimal strategies across a broad

range of parameters is prohibitive, so we focus on theimpact of varying a single departure threshold θ on thetime to arrive and remain in a high yielding patch. Not-ing the weak dependence of arrival times on parametersfor the low patch (p3 and λ3) in the three patch case, weconsider simplified strategies in which the observer onlytracks the first L LLRs y1, y2, ..., yL and compares thesewith the threshold θ to decide when to leave the patch(Fig. 5B). Thus, we compute the mean time to arrive andremain in the high patch:

Tarrive(θ) =π1(θ, L)

1− π1(θ, L)

(T1(θ, L) + τ

)(18)

+1− p1

p1

∑Nj=2 pj(Tj(θ, L) + τ)

1− π1(θ),

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 10: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

10

0 0.2 0.4 0.6101

102

103

-4 -3 -2 -1 0

50

100

150

200

λ4 λ5

λ1

λ3

A λ2 B

D

p 0(λ)

λ

αe−αx

E

C

θ

T arri

ve(θ,L)

L=1, 2,3,4

y1, y3

y 2, y

4

θ

θ

leave patch

λ

p t leav

e(λ)

Pt(λ > λθ) = θ

λ

time

p(λ |x(t))

P t(λ

>λθ)

time

food arrivals

λ1 = 4.5

λ1 = 5

T arri

ve(θ)

θ

F λθ = 2λθ = 1

λθ = 0.5θthreshold

threshold

threshold

FIG. 5. Finding high yielding patches among many patch types. A. Environment with 5 patch types. B. Patch beliefs

yj(t) = log p(λ1|x(t))p(λj+1|x(t))

evolve on an N − 1-dimensional space until one yj (j = 1, ..., L ≤ N − 1) reaches θ. C. The mean time

Tarrive to arrive and remain in highest yielding patch λ1 as computed in Eq. (18) slightly decreases as more LLRs (L increases)are used (pj = 1/N , λj = 6− j, j = 1, 2, 3, 4, N = 5). D. Continuum of patch types given by an exponential prior λ ∼ exp[α].The forager departs a patch when the probability that the arrival rate is above threshold (λ > λθ) falls to a threshold valuePt(λ > λθ) = θ. E. The posterior p(λ|x(t)) of the patch arrival rate is shifted up by food arrives, while it decreases in thetime between. The mass Pt(λ > λθ) follows the same pattern. F. There is an optimal θ that minimizes the time to arriveand remain in a high yielding patch (λ > λθ), and the optimal time T opt

arrive (circles) increases as the acceptable arrival rate isincreased. α = 1 is used here. In C and F, 107 Monte Carlo simulations are used to compute the curves Tarrive.

where patch departure strategy depends on the numberL of LLRs thresholded and the threshold θ used. In afive patch type example, performance depends weaklyon L > 2, and it is sufficient to simply track the LLRsbetween the first three patches (Fig. 5C).

D. Continuum limit

Next, consider the limit of many patches, in whichpatch arrival rate λ is drawn from a continuous distri-bution p0(λ) defined on λ ∈ [0,∞) (Fig. 5D). This gen-

eral case is described in Eq. (2). Rather than resortingto LLRs, we retain the posterior p(λ|x(t)), and computethe mass of a thresholded portion of this pdf. Givena reference arrival rate λθ, we track P (λ > λθ|x(t)) =∫∞λθp(λ|x(t))dλ. This is analogous to a forager who seeks

patches with yield rates λθ or above, but deems loweryield rates to be insufficient. For any continuous pdfp0(λ), the maximum λ will never be sampled, so arriv-ing and remaining in the true maximum yielding patchis infeasible.

For the case of an exponential prior p0(λ) = αe−αλ,

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 11: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

11

given K(t) food arrivals, we can compute

P (λ > λθ|x(t)) =1

p(x(t))

∫ ∞λθ

αe−αλλK(t)e−λtdλ

(α+ t)K(t)+1p(x(t))

∫ ∞(α+t)λθ

νK(t)e−νdν

(α+ t)K(t)+1p(x(t))Γ(K(t), (α+ t)λθ),

where Γ(s, x) is the incomplete gamma function, and ina similar way

P (λ < λθ|x(t)) =α

(α+ t)K(t)+1p(t1:K(t))γ(K(t), (α+ t)λθ),

where γ(s, x) is the lower incomplete gamma function.As such, we can compute the LLR

ρ(t) = logP (λ > λθ|x(t))

P (λ < λθ|x(t))= log

Γ(K(t), (α+ t)λθ)

γ(K(t), (α+ t)λθ),

(19)

and state the forager departs the patch when ρ(t) ≤ θ

or when P (λ > λθ|x(t)) ≤ θ := 1/(1 + e−θ) (Fig. 5D,E).Note, to allow evidence accumulation, we require thatθ < α

∫∞λθ

e−αλdλ = e−αλθ ≡ φ, which represents the

fraction of patches where λ ≥ λθ.As with other cases, we can compute the mean time

to find and remain in a patch of high enough quality. Infact, we can marginalize over all patch types of each class(high and low) to compute the probability of escaping ahigh patch across all high patches,

πH(θ;λθ) =

∫ ∞λθ

p0(λ)π(θ;λ)dλ,

and the mean time per visit to a high and low patchtypes,

TH(θ;λθ) =

∫ ∞λθ

p0(λ)T (θ;λ)dλ,

TL(θ;λθ) =

∫ λθ

0

p0(λ)T (θ;λ)dλ,

when departing across all patches of each type. Wecompute the integrals above via Monte Carlo sampling.Defining pH =

∫∞λθp0(λ)dλ, the mean arrival time is then

given by the two patch formula, Eq. (5), as we have par-titioned the environment into two patch categories.

As the observer places a higher threshold λθ on thequality of an acceptably high yielding patch, the optimaltime to arrive in such a patch increases (Fig. 5F). More-over, the optimal threshold θ decreases, as more timemust be spent in patches to properly discriminate a highyielding patch, as these become rarer in the prior as λθincreases. In connection with the binary environment,increasing λθ corresponds to making high patches morediscriminable but also more rare.

Our analysis of patch departure strategies in the con-text of non-depleting patches reveals a number of consis-tent trends across patch type counts. First, the optimaltime to arrive and remain in the highest yielding patchdecreases as the high patch discrimability increases andas high patches become more common. Second, in en-vironments with more than two patch types, the mostimportant parameters in determining the time to findthe highest yielding patch are those related to the high-est and second highest yielding patch types. To mostefficiently find the high patch, it is sufficient to com-pute LLRs corresponding to the first two or three patchtypes. In this regard, we expect that reasonably effectivestrategies for foraging environments with a continuum ofpatch types could be generated using particle filters thatonly compute likelihoods over a finite sample of possiblepatch types [56]. Strategies for non-depleting environ-ments involve minimizing the time to find a high yieldingpatch. In depleting environments, the forager eventuallymust leave any patch it arrives in, as food sources areexhausted. We study such strategies in the next section.

IV. DEPLETION- VS. UNCERTAINTY-DRIVENDECISIONS IN DEPLETING ENVIRONMENTS

Foraging animals deplete resources as they usethem [57, 58]. This is accounted for in the most gen-eral form of the foraging agent’s belief in Eq. (1), andthus far we have analyzed idealized versions of the modelthat neglect this. To understand the impact of depletionon the agent’s belief and strategy, we will focus here onsimple cases: (a) homogeneous environments in which allpatches are the same and the forager knows the initial ar-rival rate λ0 a priori; (b) binary environments in whichthe forager knows there are two types of patches, andmay or may not have to infer which of the two they arecurrently foraging in while they deplete it; and (c) envi-ronments with a few patches in which there is a memoryof depletion from previous patch visits.

A. Homogeneous environments

We start by considering an agent foraging in an envi-ronment in which all patches begin with the same den-sity of food, so the arrival rate in each is initially λ0. Asthey forage, the arrival rate λ(t) = λ0 −K(t)ρ decreasesin equal increments with each food encounter (Fig. 6A).We assume the agent perfectly knows this arrival rate aswell as the amount of time it has been foraging, and willuse this information to determine a strategy for depart-ing the patch. In keeping with the simplicity of the spaceof possible strategies discussed before, we will assume theforager departs the patch when λ(t) (or analogously thedensity of food in the patch) falls below some thresholdλθ.

This problem can be framed by simply computing the

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 12: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

12

mean food intake rate for a given strategy parameterizedby λθ. Assuming λθ is an integer multiple of ρ, we cancompute directly the number of chunks consumed beforedeparture mθ ≡ K(T (λθ)) := (λ0− λθ)/ρ, and the meantime to depart T (λθ). Linearity of expectations allows usto compute the mean departure time as the sum of meanexponential waiting times between food encounters

T (λθ) =

mθ∑j=1

1

λ0 − (j − 1)ρ=

1

ρ

mθ∑j=1

1

m0 − (j − 1)

=Hm0

−Hm0−mθρ

,

where Hn is the nth harmonic number. Thus, we com-pute the long term reward rate given λθ as

Rλθ =m0ρ− λθ

Hm0−Hλθ/ρ + ρτ

. (20)

There is an interior optimum mθ that maximizes longterm food consumption rate (Fig. 6B,C). For low initialrate λ0 and decrement ρ, the best strategy is to remainin a patch until all food is consumed, but as λ0 and ρ areincreased, the optima occur roughly where Rλθ ≡ λθ asin the marginal value theorem (MVT).

To analyze this observation further, we can approxi-mate optima in the limit of plentiful patches (m0 � 1),such that Hn can be estimated by the large n expansion

Hn = log n+ γ +O(1/n), (21)

where γ = 0.57721566... (the Euler–Mascheroni con-stant). Using this, a first approximation yields

Rλθ ≈ m0ρ− λθlog(ρm0)− log λθ + ρτ

. (22)

In this limit, we can explicitly show the MVT is satisfied,Rλθ ≡ λθ, and the observer departs the patch when thethe arrival rate of food in the patch matches their averageenvironment-wide food arrival rate. To show this, notethat Rλθ in Eq. (22) has derivative

dRλθ

dλθ=−(log(ρm0)− log λθ + τ/ρ) + (m0ρ− λθ)/λθ

(log(ρm0)− log λθ + ρτ)2,

which is increasing at λθ = ρ:

dRλθ

dλθ

∣∣∣∣λθ=ρ

=(m0 − 1)− (log(m0) + τ/ρ)

(logm0 + ρτ)2> 0,

since m0 � 1, and decreasing at λθ = ρm0 = λ0:

dRλθ

dλθ

∣∣∣∣λθ=λ0

= − 1

ρτ< 0.

In between lies a critical point satisfying

m0ρ− λθλθ

= logρm0

λθ+ ρτ, (23)

which is precisely the equation recovered when settingRλθ ≡ λθ in Eq. (22), implying the critical point of Rλθ

occurs where Rλθ ≡ λθ as in the MVT.Alternatively, for small m0 (few chunks of food per

patch), computing the harmonic numbers in Eq. (20) isstraightforward, so we can directly compare the meanrate Rλθ for all possible values of λθ. For instance, whenm0 = 2, we can choose from λθ ∈ {0, ρ} as the departurethreshold, corresponding to maximizing

Rλθ ∈{

3 + 2ρτ,

1 + 2ρτ

}={Rλθ=0, Rλθ=ρ

}.

With this, the low threshold should be chosen whenρ > 1/(2τ), or when energy gained from food chunksis larger than half the transit rate between patches.At ρ = 1/(2τ), the forager should leave after a single

chunk is consumed, yielding Ropt = ρ = λoptθ (as in

the MVT). Similar results can be computed in the casesm0 = 3, 4, 5, ... using explicit computation of the har-monic numbers Hn. We expect these results to generalizeto arbitrary m0 as evidenced by the small m0 case andour large m0 asymptotics.

Overall this suggests that for short transit times τ orsmall food chunk sizes ρ, and ideal observer will departpatches more quickly than when transit times are long τor food chunks have a higher quality ρ. This is consistentwith classic models of continuous consumption as in theMVT, but here we have provided a detailed analysis ofthe case of discretized and random food encounters. Theprimary reason for a mismatch with the MVT arises dueto the discreteness of the food availability and arrivalrate. We show in binary environments, when uncertaintyplays a role in patch leaving decisions, there are nontrivialdeviations from the MVT due to the forager not preciselyknowing the food arrival rate of their current patch.

B. Binary environments

Next, we consider binary depleting environments inwhich the observer must both infer the type of patch(high or low) and track its depletion. We focus on strate-gies in which the observer departs the patch when themean estimate of the arrival rate λ(t) falls below a thresh-old λθ. To understand this most general case, we beginby discussing two simplified cases: (a) the case in whichthe patch type is known on arrival, and (b) the case inwhich the low yielding patch is empty (mL

0 = 0).Depletion-dominated regime. One way of contextual-

izing the case of known patch types is that of depletion-dominated decisions. These types of decisions occur if theobserver stays in a patch till they have determined thepatch type to high certainty. To idealize this situation,assume the observer arrives in a patch and immediatelyknows the patch type λj (j ∈ {H,L}) in which they re-side. Moreover, assume they leverage a strategy in whichthey depart the patch as soon as the arrival rate falls be-low some level λjθ. In this case, there are two thresholds

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 13: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

13

0.2 0.4 0.6 0.8100

101

102

0.5

0.6

0.7

0.8

0.9

1

0 0.5 10

0.5

1

1.5

2

0 1 2 3 40

2

4

6

8

2

2.5

3

pH

100 102

100

101

102

time

λ(t)

A

p(λ(t))single realization

Rλ θ

B C

ρ

m0

(mθ /m0 ) optempty patchλ0 = 1 λ0 = 10

λ0 = 100

Rλθ = λθ

ED FRλH,L

θ

λLθ

λH θ

0 0.5 10

5

10

15

20

ρ = 0.1

ρ = 1ρ = 10

λθ

Tmax

θ Tmaxθ

λmaxθ

rate threshold chunk encounter rate

high yield fraction pHhigh yield fractionFIG. 6. Patch leaving strategies in a depleting environment. A. In homogeneous environments where all patches haveinitial arrival rate λ0, the forager maintains full knowledge of the current arrival rate by counting food encounters. B. Rate offood consumption as a function of strategy Rλθ , where the forager departs once arrival rate falls to or below λθ. Solid linesare exact solution of Eq. (20), and dashed lines are the large m0 approximation Eq. (22). Note optimal λθ ≈ Rλθ . C. Optimal

mθ/m0 as a function of ρ and m0. D. Rate of food consumption RλH,Lθ in binary depleting environments in which the observer

knows patch type (λH or λL) upon arrival. Optimal strategy (purple dot) takes λHθ = λLθ ≈ RλH,Lθ ≈ 3.3. pH = 0.5, mH

0 = 100,mL

0 = 50, and ρ = 0.1. E. Optimal waiting time Tmaxθ in environments where high (low) patch has one (zero) chunk(s) of food.

F. Optimal waiting time Tmaxθ and departure threshold λmax

θ in a binary environment where the low patch has zero chunks offood and the high patch initiall has mH

0 = 50. ρ = 0.1. Transit time τ = 5 throughout.

to tune for each of the patch types. Following our calcu-lations from the homogeneous case, the time to depart apatch of type j is

Tj(λθ) =Hmj0

−Hmj0−mjθ

ρ,

where mj0 is the initial food count in patches of type j

and mjθ is the threshold level of food to consume before

departing. As such, the long term reward rate is

RλH,Lθ =

pH(mH0 ρ− λHθ ) + pL(mL

0 ρ− λLθ )

pH(HmH0−HλHθ /ρ

) + pL(HmL0−HλLθ /ρ

) + ρτ,

(24)

maximized by setting λHθ = λLθ ≈ RλH,Lθ (Fig. 6D). We

show now that this follows the MVT.In large mj

0 limit, the harmonic numbers are approxi-mated by Eq. (21), and Eq. (24) becomes

RλH,Lθ ≈ pH(mH

0 ρ− λHθ ) + pL(mL0 ρ− λLθ )

pH TH(λHθ ) + pLTL(λLθ ) + ρτ, (25)

where Tj(λjθ) = log(ρmj

0) − log λjθ and the critical point

equations for each partial derivative RλH,Lθ

λjθ= 0 imply

pH(mH0 − λHθ ) + pL(mL

0 ρ− λLθ ) = λjθ[pH(log(ρmH

0 )

− log λHθ ) + pL(log(ρmL0 )− log λLθ ) + ρτ

], (26)

which can be rewritten as RλH,Lθ = λjθ (j = H,L), imply-

ing the observer should depart a patch when the arrivalrate equals the mean rate of food arrival for the environ-ment (MVT). As in the homogeneous depleting environ-ment, obtaining the prediction of the MVT relies on theobserver knowing the current food arrival rate. However,as we will show, the optimal strategy deviates from theMVT when there is uncertainty concerning the presentfood arrival rate in the patch.Empty low patch. Before addressing the general case

of a binary environment with initially unknown patchtypes, we study a simple case in which the forager doesnot know the patch type they are in ahead of time butwhere the statistics of patch departures are still explicitlycalculable: when the low yielding patch is empty.

The simplest case is that in which some patches areempty (mL

0 = 0) and others have a single chunk of food(mH

0 = 1), as introduced by [36] but not analyzed in de-tail. Clearly, when a forager encounters a chunk of foodthey should leave the patch, since there is no food leftthereafter. How long should they wait until departing apatch if food has yet to be encountered? This amounts tooptimizing the waiting time Tθ till departure (or equiva-lently setting a threshold mean estimated arrival rate λθat which they depart). Each patch visit thus falls into

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 14: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

14

one of three categories: (a) visits to empty (low) patcheslasting time Tθ; (b) visits to high patches resulting inno food lasting time Tθ; and (c) visits to high patchesresulting in one food chunk lasting time t < Tθ.

To determine an optimal strategy, we need only com-pute statistics for case (c). Since the food arrival rate ina high patch is ρ, the probability of encountering foodwhen waiting a time Tθ is 1− πθ = 1− e−ρTθ , the meantime to wait is

T θH =ρ∫ Tθ

0te−ρtdt

1− e−ρTθ=

1− (ρTθ + 1)e−ρTθ

ρ(1− e−ρTθ ),

and the rate of consumption of a given strategy is

Rθ =pH(1− πθ)

pH(πθTθ + (1− πθ)T θH) + pLTθ + τ,

which we can maximize by finding the critical point of Rθ

in Tθ. As pH is increased and as ρ is decreased, the opti-mal Tθ = Tmax

θ increases (Fig. 6E). When high yieldingpatches are more frequent (higher pH), one should stayin a patch longer, since they are more likely to encounterfood. When the rate of chunk discovery ρ is smaller, oneshould expect to wait longer to encounter food in a highpatch, so the wait time should be increased.

Note, the observer does not leave the initially emptypatches immediately, so their strategy (even if optimallytuned) does not agree with the MVT due to the ob-server’s uncertainty about each patch’s yield rate.

We also analyze the case of an empty low patch and anarbitrary number of food chunks m0 in the high patch.Here, the observer should apply a hybrid strategy: Waita finite time Tθ to depart if no food is encountered, but iffood is encountered before t = Tθ, consume food until thearrival rate drops to λθ = (m0−mθ)ρ. This strategy dis-tinguishes two types of patch leaving decisions arising inuncertain and depleting environments. Decisions may beuncertainty-dominated, where the observer may departa patch early, believing they are in a low-yielding patch.Alternatively, decisions may be depletion-dominated, af-ter the observer is nearly certain of they patch type theyare in and has depleted it.

To optimally parameterize the departure strategy, weagain compute the probability of departing the highpatch early πθ = e−m0ρTθ (at time Tθ). Assuming theobserver remains in the patch, it takes a time

T θH =1− (m0ρTθ + 1)e−m0ρTθ

m0ρ(1− e−m0ρTθ )

to consume the first chunk of food and then depart im-mediately (T (λθ) = 0) if λθ = λ0 − ρ (as in the previouscase). Otherwise, it will take an additional time

T (λθ) =

mθ∑j=2

1

λ0 − (j − 1)ρ=

1

ρ

mθ−1∑j=1

1

m0 − j

=Hm0−1 −Hm0−mθ

ρ.

At each patch then, the observer either obtains no foodand departs after time Tθ or obtains mθ = m0 − λθ/ρchunks and departs on average after time T θH + T (λθ).The probability of the latter is pH(1 − πθ), and assum-ing m0 � 1, we can approximate Hm0−1 − Hm0−mθ ≈log(m0−1)− log(m0−mθ), so the food consumption rateis approximated by the a continuous function

R(Tθ, λθ) ≈pH(1− πθ)(m0ρ− λθ)

pθ(T (λθ) + T θH) + (1− pθ)ρTθ + ρτ,

where pθ = pH(1−πθ) and T (λθ) = log(ρm0−ρ)−log λθ.Reward rate is maximized using a waiting time Tθ thatis fairly insensitive to pH , whereas the threshold λθ issensitive to pH (Fig. 6F). When high yielding patchesbecome more common, the forager need not deplete themas much, since the next patch they visit is likely to behigh yielding.

Many food chunks. Lastly, we examine general binarydepleting environments in which mH

0 > mL0 are arbitrary

integers, so the belief y(t) = log P (λH−K(t)ρ|x(t))P (λL−K(t)ρ|x(t)) evolves

according to Eq. (4). In line with our previous analy-sis, the observer estimates the current arrival rate of thepatch from this belief,

λ(t) =λH + e−yλL

1 + e−y− ρK(t), (27)

and departs the patch when λ(t) ≤ λθ. As before, we areconcerned with tuning the threshold λθ so the long termreward rate

Rλθ =pHmH + pLmL

pH TH + pLTL + τ

is maximized. Unlike the non-depleting case, we cannotframe and explicitly solve a MFPT problem to deter-mine the mean times spent TH and TL and number offood chunks consumed mH and mL in the high and lowpatches. However, we can compute these quantities viaMonte Carlo sampling. There is an internal optimumλoptθ that maximizes the reward rate Rλθ across the en-

vironment, which deviates only slightly when little foodis available (Fig. 7A) and is well matched in the case ofhigh food availability (Fig. 7B) to a forager that knowsthe patch type upon patch arrival (compare black/bluecurves). This suggests the forager typically learns thepatch type before departing in environments with suffi-cient food, so their departure strategy will converge tothat of an observer who already knows the patch type.

We analyzed this trend by comparing departure timesof observers that initially know their current patch typeto those that do not (Fig. 7C). Indeed, foragers in sparserenvironments (lower average initial food amount m0 =(mH

0 + mL0 )/2) stay in low patches too long (blue curve)

and leave high patches too soon (red curve) in compari-son to observers that immediately know their patch type.This is due to their uncertainty about their current patchbefore and at the time of departure (grey curve). On the

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 15: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

15

0 2 4 6 8 100

0.5

1

1.5

2

0 1 2 30

0.1

0.2

0.3

0.4

0.5

0 10 20 30

0.7

0.8

0.9

1

1.1

1.2

1.3

TunknL /Tknown

L

TunknH /Tknown

H

P(λj)

C

λθ

Rλ θA B

stay in low patches too long

leave high patches too soon

cons

ume

rate

Rλ θco

nsum

e ra

te

rate threshold λθrate threshold m0mean chunk #

known patch case

unknown patch case

mH0 = 4mL0 = 2

mH0 = 20mL0 = 10

MVT: �λθ = Rλθ

FIG. 7. Depleting environments with known versus unknown patch reward rates. In the ‘known’ case, the foragerknows the patch type and initial reward rate upon arrival, whereas in the ‘unknown’ case the forager must infer the pathtype. Patch leaving strategies in the unknown case approach those of the known case in the high chunk count limit. A. Totalreward rate Rλθ as a function of the estimated arrival rate λθ at which the observer departs a patch. For low initial foodlevels mH

0 and mL0 , Rλθ near the optimum (dots) deviates between the case of initially known patch type (black) vs. initially

unknown (blue). Both thresholds are less than the MVT prediction due to discreteness of chunk arrivals. B. Reward ratesof the known/unknown cases converge at higher initial food levels. C. Deviations in optimal reward rate between known andunknown cases are due to less certainty P (λj) (grey line) about patch yield rate upon departure, leading the observer to stay inlow patches too long (blue line) and leave high patches too soon (red line). Other parameters are ρ = 1, τ = 5, and pH = 0.5.Mean rates and departure times in the unknown patch case were computed from 106 Monte Carlo simulations.

other hand, in more plentiful environments, foragers ac-cumulate enough evidence about the current patch thattheir departure times resemble those of observers thatknow their patch type, recapitulating the MVT.

Uncertainty thus drives deviations from the MVT. For-agers that exit patches before fully learning their typewill typically understay (overstay) high (low) patches.Key to these uncertainty-driven decisions is an environ-ment in which high/low patches are different enough towarrant different strategies, but similar enough as to notbe immediately distinguishable. We now extend our anal-ysis to the case of environments small enough so that for-agers eventually return to previously depleted patches.

C. Returning to a previously depleted patch

Assuming foragers do not return to previously depletedpatches is a reasonable approximation in large environ-ments, since there is a low probability of the observerreturning to a previously visited patch if drawing ran-domly from their environment. However, in small en-vironments, past patch depletion impacts the value offuture patch visits sooner (Fig. 8A). We briefly analyzethis case now, identifying strategies that minimize thetime to fully deplete the environment before departingfor another environment.

Single chunk-two patch environment. First, considerthe simplest nontrivial case, a two patch environmentin which there is a single chunk of food (mH

0 = 1 soλH = ρ initially and mL

0 = 0). When the forager departsone patch, they journey to the other patch with transittime τ . We seek a waiting time strategy similar to thatdeveloped for depleting patches (Fig. 8B). The observerwaits a time Tθ in their current patch before leaving to

the other patch (if they do not encounter a food chunk).Note, this differs from thresholding the LLR (y = θ)since the first patch visit would be shorter in that case,but this approach allows an explicit approximation ofthe optimal strategy. If the forager is in the high patch,they encounter the food chunk with probability 1−πθ =1 − e−λHTθ . In the low patch, they always depart aftera time Tθ. Thus, the mean time to encounter the chunkin the environment is given by: (a) the time spent in thehigh patch visit on which the chunk is discovered plus (b)the initial time spent in the low patch when that is thefirst patch visited (scaled by that probability 1/2) plus(c) the other time spent visiting patches and not findingthe chunk:

Tfind(Tθ;λH) =1− (λHTθ + 1)e−λHTθ

λH(1− e−λHTθ )+Tθ + τ

2

+ 2(Tθ + τ)(1− πθ)∞∑j=0

jπjθ

=λHTθ + 3λHτ − 2 + (2 + λHTθ + λHτ)eλHTθ

2λH(eλHTθ − 1).

(28)

This is minimized by taking Tθ = T optθ such that

e2λHToptθ − 2(λHT

optθ + 2λHτ)eλHT

optθ − 1 = 0,

For large eλHToptθ , we approximate the critical point equa-

tion as eλHToptθ ≈ 2λHT

optθ + 4λHτ , which has a compact

asymptotic expansion:

T optθ ≈ 1

2λH

[−2W−1

(−e−2λHτ

2

)− 4λHτ

]≈ 1

λHlog(4λHτ + 2 log 2), (29)

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 16: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

16

-102 -101 -100 -10-10

5

10

15

20

-6 -5 -4 -3 -2 -15

6

7

8

0 5 100

2

4

6

8

10

timeA B

λH = ρ λL = 0

ρ

Toptfind

Toptθ

C

D

T clea

r(θ)

θ

mH0 = 4, mL0 = 2

mH0 = 8, mL0 = 4

mH0 = 16, mL0 = 8

initial environment depleting environment

time

E full patch knowledge

single chunk environment

Fρ = 0.1

ρ = 1

ρ = 10

two patch environment

threshold θthreshold

chunk encounter rate

FIG. 8. Returning to patches in fully depleting environments. A. Environmental food depletion in small environmentswhere previously visited patches (blue) remain depleted indefinitely. B. Strategy in a two patch environment with a singlefood chunk: Switch patches after a finite search time Tθ. C. Optimal switch time T opt

θ (approximated by Eq. (29)) and the

total mean time to find a chunk T optfind (Eq. (28)) decrease as the rate of chunk detection ρ increases. Theory compares well

with averages from 106 Monte Carlo simulations (dots). D. In binary environments, the optimal LLR threshold θopt (dots) forminimizing the time to clear the environment decreases as the amount of food in the environment increases. ρ = 1 is used here.E. Optimal threshold θopt decreases as the rate of encountering a chunk increases: mH

0 = 8, mL0 = 4. pH = 0.5 and 106 Monte

Carlo simulations for each curve. F. Optimal strategy given full knowledge of chunk count in each patch is to clear each patchsequentially.

where W−1 is the −1 branch of the Lambert W function.This shows that the optimal waiting time scales inverselywith the initial arrival rate λH of the high patch. Theminimal time to find the chunk is approximated by sub-stituting Eq. (29) into Eq. (28). Both quantities decreasewith ρ (Fig. 8C), owing to the fact that environmentswith higher discovery rates λH = ρ require less time inthe high patch for the observer to find the food chunk.When the detection rate ρ is small, the optimal switchtime T opt

θ (i.e. the decision strategy) is more sensitive tosmall changes in ρ.

General binary environment. A similar approach canbe used in binary environments with an arbitrary ini-tial number mH

0 > mL0 of food chunks in the high and

low patch. To minimize the number of required patchswitches, the forager should first clear mL

0 chunks of foodfrom the starting patch and then wait until the observer’s

LLR y = log P (λH(t)|x(t)))P (λL(t)|x(t)) (computed from Eq. (4)) falls

below the threshold θ < log pH1−pH . After the first couple

of patch switches, the time spent in a patch if food is notdiscovered will be Tθ = −2θ/[ρ(mH

0 −mL0 )]. While an ex-

plicit formula for Tclear(θ) can be obtained by marginal-ization, the resulting expressions are lengthy and do notprovide insight into the trends underlying the optimalstrategy. Thus, we use Monte Carlo sampling to deter-mine how the optimal θ varies with the environmental

parameters.

As the amount of food in the environment increases,the time to clear it increases, but only slightly (Fig. 8D).Since the observer has more evidence about the kind ofpatch they are in after consuming mL

0 chunks as mL0

grows, they can more quickly find the (mL0 + 1)th chunk.

Moreover, the patches become more discriminable as thedifference mH

0 −mL0 increases, such that the forager can

afford more certainty before switching patches, meaningthat the optimal θopt is lower. Increased discriminabilityalso results from increasing the rate of chunk encountersρ (Fig. 8E). With increased discriminability, both theoptimal threshold θopt and mean time to clear the envi-ronment T opt

clear decrease owing to the forager’s ability todiscover chunks more quickly.

More patches. We expect it is possible to extend theabove strategy to the case of N environmental patches.However, the dimensionality of the optimization problemwill grow quickly, so we save a detailed analysis of thiscase for future work.

Nonetheless, we can easily identify a strategy for min-imizing the time to clear the environment in the caseof N patches assuming the observer knows the patchtype upon arrival. If patch j has mj

0 chunks of foodinitially, the forager should still aim to minimize the to-tal time spent in transit between patches. To do this,

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 17: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

17

they should fully clear each patch before moving on tothe next (Fig. 8F). Assuming the forager always choosesan unexplored patch when departing an emptied one, theminimal mean time to clear the environment of food willbe

T optclear = (N − 1)τ +

1

ρ

N∑j=1

Hmj0= N

[τ + H/ρ

]− τ,

where the time to clear the jth patch is Hmj0/ρ as in

Eq. (20) for homogeneous depleting environments andH/ρ is the mean clearance time of patches across theenvironment. Thus, the time to clear the environmentscales linearly with the number of patches N (assumingthe mean patch clearance time is unchanged by N).

Were the forager to use another strategy, they wouldreturn to each patch multiple times to clear them. Thiswould mean the transit contribution would be Ttransit >(N − 1)τ , and the total time to clear patch j would be

no less than Hmj0/ρ, so Tclear > T opt

clear.

There are a number of extensions to this case of a fullydepleting environment. Since the observer must considerboth transit times between patches and patch yields thatdepend on visit history, this is a qualitatively differentproblem from the multi-armed bandit (MAB) problem.Typically, bandit problems do not involve depletion oraction-dependent changes in the yield of arms [59, 60].Our setup thus provides a rich class of optimal switchingstrategy problems which warrant future investigation.

V. LEARNING THE DISTRIBUTION OFPATCH TYPES

Our analysis so far has assumed the forager has com-plete knowledge of the distribution of patch arrival ratesin the environment. However, typically animals arrive ina new environment with uncertainty about the quality ofpatches they will encounter [28, 61–63]. Thus, animalsshould use information from patch encounters to estimatestatistics of patch yields in the environment. In our se-quential updating model, this can be implemented with ahigher level of inference in the model (Fig. 1B) wherebythe observer refines a prior over the initial arrival rate λ0

using the past N patch visits p(λ0|x1:N (t)). We assumeeach new patch a forager enters has not been previouslydepleted.

Here, we measure learning performance using the timeit takes to refine the mean squared error (MSE) to somethreshold. Interestingly, we find that foragers in deplet-ing environments learn the initial rate of arrival fasterthan those in non-depleting environments.

A. Homogeneous environment

A homogeneous environment is described by a singleparameter λ0, the initial arrival rate of food within all

patches. We will start by considering the simple non-depletion case, and then move to consider how learningoccurs in depleting environments. This will illuminatethe differences between learning in these two cases beforemoving to binary environments.Non-depleting environments. In a non-depleting envi-

ronment, an agent intending to maximize the speed theylearn the patch arrival rate of food λ0 should remain ina single patch indefinitely. In this case, the rate learningprocess is equivalent to learning the rate of a sequenceof exponentially distributed waiting times. Given a se-

quence of food encounters x(t) =∑K(t)j=1 δ(t − tj), the

observer uses the time t, count K(t), and initial priorp0(λ0) to estimate the rate λ0 so Bayes rule states

p(λ0|x(t)) =1

p(K(t))p(K(t)|λ0)p0(λ0)

∝ (λ0t)K(t)e−λ0tp0(λ0), (30)

and the maximum likelihood estimate (MLE) of the rateλ∗0 is given by the implicit equation

tλ∗0 = K(t) + λ∗0p′0(λ∗0)/p0(λ∗0).

For a flat, improper prior p0(λ0) ≡ 1, the MLE is λ∗0 =K(t)/t, and for an exponential prior p0(λ0) = ae−aλ0 ,the MLE is λ∗0 = K(t)/(t + a). In either case, the MLEis consistent in the limit as t→∞:

limt→∞

λ∗0 = limt→∞

K(t)

t+ limt→∞

λ∗0p′0(λ∗0)/p0(λ∗0)

t= λtrue

0 .

Each food encounter tends to narrow the distributionp(λ0|x(t)) and the MLE tends to move toward the truearrival rate λtrue

0 (Fig. 9A). One measure of learning isthe rate of decrease of the normalized MSE. Since theestimator is consistent, we expect

MSE(t;λtrue0 ) =

∫ ∞0

(λ0 − λtrue0 )2

(λtrue0 )2

p(λ0|x(t))dλ0

to decrease on average over time. That is,ddt

¯MSE(λtrue0 , t) < 0 where

¯MSE(t;λtrue0 ) =

∫ ∞0

(λ0 − λtrue0 )2

(λtrue0 )2

p(λ0|x(t))dλ0, (31)

and where the average is over realizations of arrival timesp(K(t)|λtrue

0 ), leading to

p(λ0|x(t)) =∞∑K=0

(λtrue0 t)K(λ0t)

Ke−λtrue0 te−λ0tp0(λ0)

(K!)2p(K).

For a flat, improper prior p0(λ0) ≡ 1,

p(λ0|K(t)) =t

K!(λt0)Ke−λ0t,

so

p(λ0|x(t)) = te−(λtrue0 +λ0)t

∞∑K=0

(λtrue0 λ0t

2)K

(K!)2

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 18: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

18

1 2 3

1

2

3

0 100 200 300 400 5000

1

2

3

0 1 2

102

103

0 0.5 1

0 0.5

0 1 2 30

1

2

3

5 10100

102

104

0 20 40 60 80 1000

1

2

0 20 40 60 80 10010-2

10-1

100

Aλ 0

p(λ0 |x(t))ar

r rat

eposterior

0 5

t = 0.1t = 10t = 100

time t

¯ MSE(t

;λtrue

0)

λ true0 = 0.5

λ true0 = 2λ true0 = 1

¯ MLE(t

;λtrue

0)

λ true0 = 0.5

λ true0 = 2λ true0 = 1

0 5 10

102

103

λ true0

B

λ 0ar

r rat

e

p(λ0 |x(t))posterior

time t

chunks mθ

T lear

neθ = 10−1

eθ = 10−3

eθ = 10−2

λtrue0true rate

λ 0ar

r rat

e

λtrue0 = 1

λtrue0 = 2

t = 0.1t = 10

t = 100

C

D

0 1 2 30

1

2

3

0 1 2 30

1

2

3

λ H

λL λL λL

E

¯ MLE(t

;λtrue

0)

time t

p(λH ,λL |x(t))0

1 λtrueH = 2

λtrueL = 1

t = 0 t = 100 t = 500

0 100 200 300 400 5000

1

2

3

time t

G

¯ MLE(t

;λtrue

0)

λ H

λL1 2 3

1

2

3

λL1 2 3

1

2

3

λL

p(λH ,λL |x(t))

F

λtrue0 = 1

homogeneous environmentsdepletingnon-depleting

binary environmentsnon-depleting

depleting

depleting

non-depleting

K1(t)ρ

FIG. 9. Learning reward arrival rates. Patch arrival rates are learned more rapidly in depleting than non-depletingenvironments. A. Posterior pdf p(λ0|x(t)) is refined leading to a maximum likelihood estimate (MLE) which approaches thetrue arrival rate: results for a non-depleting environment. Average mean squared error (MSE), given by Eq. (31), decreasesmore rapidly for higher λtrue

0 . B. Posterior pmf p(λ0|x(t)) peaks more rapidly in depleting environments due to exclusion ofrates λ0 < K(t)ρ. C. Mean time to learn Tlearn to within MSE< eθ is smallest when chunks consumed equals the total numberin the patch mθ = m0 = 10 (left), so the best policy (right) is to exit when MLE of total number of chunks is consumed(mopt

θ = MLE(m0)). Here ρ = 0.1. Note lower Tlearn than in non-depleting case. D. In binary environments, posterior pdfp(λH , λL|x(t)) is first refined over the first patch type visited (λH = 2 here), then over the second patch type visited (λL = 1),and eventually reaches a peak near true (λH , λL). E. Thus, eventually an accurate MLE is obtained for both patch types.Departures every t = 100 time units. F,G. Learning is more rapid in depleting environments, because low arrival rates areruled out. Patches departed after MLE chunks consumed.

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 19: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

19

= tI0(2t√λ0λtrue

0 )e−(λtrue0 +λ0)t (32)

≡ F (t, λ0, λtrue0 ),

where I0(z) is the zeroth order modified Bessel functionof the first kind. Eq. (32) can thus be substituted intoEq. (31) and integrated using quadrature to calculate theensemble mean of the MSE as a function of time:

¯MSE(t;λtrue0 ) =

∫ ∞0

(λ0 − λtrue0 )2

(λtrue0 )2

F (t, λ0, λtrue0 )dλ0

=

∫ ∞0

(x− 1)2F (s, x, 1)dx,

where we have made the changes of variables λtrue0 x = λ0

and s = λtrue0 t. This reveals that the rate of decay of the

normalized mean MSE increases linearly in λtrue0 , such

that the time tθ to obtain a threshold level of mean ac-curacy scales inversely with λtrue

0 (Fig. 9A bottom inset).Thus, the observer learns the arrival rate of the environ-ment more quickly in higher rate environments.

Note, there are a number of ways to modify this basicsetup. First, the forager may have a more informed priorover λ0 based on experience with previous environments.In this case the updating process is the same and only theprior p0(λ0) changes. Had we considered an observer thatmoves between patches, the only difference would be thatthe observer gains no information about the arrival ratewhile in transit but carries their posterior forward uponarrival in the new patch. We now extend this frameworkto consider depleting environments.

Depleting environments. In depleting environments,the forager’s belief update changes in two key ways.First, to continue learning, they must eventually departtheir current patch and continue observing in a new patchwhose arrival rate is reset to the true initial arrival rateλ0. Second, each chunk encounter reduces the arrivalrate within the patch, so the space of possible arrivalrates must be adjusted. In particular, all arrival ratesλ0 < K(t)ρ are eliminated from the posterior, as theyare impossible given K(t) chunks have arrived. Withthese facts in mind, we examine how the time to learnthe arrival rate depends on patch exit strategy and truearrival rate. Following depleting foraging strategies, weassume the departure threshold on the number of chunksconsumed depends on the MLE of the arrival rate.

To determine an efficient strategy for inferring the ar-rival rate λ0, we first determine the optimal departurechunk count given a specific true λtrue

0 . We then examinehow an observer who chooses this chunk count thresholdbased on their MLE limits their time to obtain a givendesired MSE. As before, the posterior over the initial ar-rival rate λ0 can be calculated using Bayes rule, so withina single patch

p(λ0|x(t)) =1

p(K(t))p(K(t)|λ0)p0(λ0) ∝ m0!e−λ0tp0(λ0)

(m0 −K(t))!,

for λ0 ≥ K(t)ρ, where m0 = λ0/ρ is the initial count as-sociated with the initial rate λ0. Each food encounter

generally narrows the distribution p(λ0|x(t)) and de-creases the number of chunks in the patch. Given Npatch visits, the forager can accumulate evidence fromeach visit as a separate term in the posterior product:

p(λ0|x(t)) ∝N∏j=1

m0!e−λ0tj

(m0 −Kj(tj))!· p0(λ0), (33)

where tj is the time spent in the jth patch and Kj(tj) isthe number of chunks encountered (Fig. 9B top).

Averages from simulations show the best strategy forminimizing the time to learn the true rate λtrue

0 is todepart the patch when all the chunks predicted by theobserver’s current MLE are consumed (Fig. 9C left). Wesee evidence of this, for instance, when attempting to in-fer λtrue

0 = 1 (when ρ = 0.1 and m0 = 10). The forager’sbest strategy for minimizing the time to learn this rateis to consume all 10 chunks of food in a patch before de-parting. Thus, an effective strategy for rapidly learningthe initial arrival rate λ0 across environments is to leavea patch once the number of chunks consumed is greaterthan or equal to the MLE for the initial chunk countm0 (mopt

θ = MLE(m0)). Using this strategy, the time tolearn the rate decreases as the true rate increases (Fig. 9Cright) in line with findings in the non-depleting case, butmuch faster due to exclusion effects of depletion.

B. Binary environments

A forager learning two patch arrival rates λH > λLclearly must visit more than one patch, and each succes-sive patch visit increases the likelihood that both patchtypes have been visited. As before, we start by consid-ering the non-depleting environment in which learning isslower, and then move to studying the inference processin a depleting environment.Non-depleting environments. In a single patch (j),

given a flat prior (p(λ0) constant), a sequence of foodencounters xj(t) yields the posterior

p(λ0|xj) ∝ (λ0tj)Kj(tj)e−λ0tj ≡ Q(λ0; tj ,Kj),

over the possible arrival rate λ0 in patch j, where tj isthe time spent and Kj(tj) is the number of chunks en-countered in patch j (as in Eq. (30)).

Given N patch visits and a flat prior (p(λH , λL) con-stant) over the rates, the forager combines informationacross patches by conditioning on the probability theyare in a high/low patch. The posterior for the arrivalrates λk (k = H,L) given patch visits j = 1, ..., N isthen:

p(λH , λL|x(t)) ∝N∏j=1

p(xj(tj)|λH , λL), λH > λL,

∝N∏j=1

[pHQ(λH ; tj ,Kj) + pLQ(λL; tj ,Kj)] ,

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 20: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

20

where pk ≡ p(λj = λk) is assumed known to the observer.The posterior updating is thus refined based on the

arrival rate of the current patch (about λtrueH when in a

high patch and about λtrueL when in a low patch: Fig. 9D).

This process continues as the agent visits each type ofpatch, as seen in time series of the MLE, whereby theestimate of the arrival rate for the patch not being visitedremains fairly constant, while that of the visited patchchanges (Fig. 9E).

Depleting environments. Lastly, we derive the infer-ence strategy for learning in binary environments withdepleting patches. Learning the arrival rate of a singlepatch is given by Eq. (33), and using a flat prior we have

p(λ0|xj(t)) ∝(λ0/ρ)!

(λ0/ρ−Kj)!e−λ0tj ≡ Qjρ(λ0),

where tj is time spent and Kj is the number of chunksencountered in patch j. Conditioning on the patch typebeing visited yields the posterior for the initial arrivalrates λk0 (k = H,L) given patch visits j = 1, 2, ..., N :

p(λH0 , λL0 |x(t)) ∝

N∏j=1

p(xj(tj)|λH0 , λL0 ), λH0 > λL0 ,

∝N∏j=1

[pHQ

jρ(λH) + pLQ

jρ(λL)

],

where pk ≡ p(λj0 = λk0).As in the non-depleting environment, the posterior is

first refined over the first patch type visited (λH = 2in Fig. 9F). However, as opposed to the homogeneousenvironment, low rates λL < Kρ are never entirely ruledout, since there is always a possibility the forager is inthe high patch. As such, the MLE for the high patch isrefined more quickly while the low patch MLE continuesto jump around (Fig. 9G).

We could also consider models that learn the deplet-ing increment ρ or distribution of transit times τ us-ing a similar Bayesian sequential updating framework.Presumably, natural foragers would learn environmen-tal parameters through related evidence accumulationprocesses. Our work here demonstrates that sequentialBayesian inference affords a powerful framework for un-derstanding how a forager could learn hierarchically us-ing a fairly compact model. A key finding of the aboveanalysis is that depletion itself provides additional infor-mation about the arrival rate, allowing the observer torefine their estimate more rapidly than in non-depletingenvironments.

VI. DISCUSSION

A. Summary

Patch leaving decisions are an essential component offoraging. Our model uses principles of probabilistic in-ference to establish a normative theory of patch leaving

decisions, as well as a framework for learning about re-source availability in the environment.

The model yields several key takeaways concerning op-timal decision strategies in a variety of patch foragingcontexts (see Fig. 10 for an overview and Table I fordetails). In the idealized case in which foraging doesnot deplete patches, the optimal strategy is for a for-ager to minimize their time to find and remain in thehighest yielding patch in the environment. This is ac-complished by triggering patch leaving decisions whenthe log-likelihood ratio for the probability of high returnversus other patches falls below a threshold. When aforager depletes patches, we found that across a broadrange of environmental parameters the best strategy formaximizing the environmental reward rate is to departa patch when the in-patch reward rate matches the av-erage return rate of the environment, i.e. correspondingto the marginal value theorm (MVT). However, if thereis high uncertainty about the patch type, the foragerwill stay longer in low yield patches and shorter in highyield patches than predicted by the MVT. When a for-ager fully depletes their environment, an optimal strat-egy minimizes the number of transits between patches inorder to deplete the environment as quickly as possible.Lastly, we found an observer that learns the rate of foodarrival within patches learns more quickly in depletingthan non-depleting environments environments.

Throughout the study, we compute and vary metricsthat have parallels used in systems neuroscience such asreaction times and discriminability. We solved the modelequations via a combination of first passage time meth-ods for stochastic processes along with Monte Carlo sam-pling of jump processes corresponding to foragers’ beliefs.Since the contexts analyzed in our study (Fig. 10; Ta-ble I) can be realized experimentally, our model providesa framework for the formal quantitative analysis of be-havior in patch foraging experiments.

B. Relation to other work

Animals often approximate Bayesian foraging. Wehave treated patch leaving as a statistical inference prob-lem within a Bayesian framework, assuming the animalhas knowledge of the transit time between patches, theimpact of food depletion on the food arrival rate, and(in the case without learning) the distribution of patcharrival rates. Our theoretical treatment of patch leavingdecisions builds on previous Bayesian models of forag-ing [34, 36–42], experimental studies that ask if animalsbehave as Bayesians [43–49], and other experiments thatshow how an animal’s prior knowledge about an environ-ment modulates its foraging behavior [64–66]. For ex-ample, bumblebees [32] and inca doves [67] adjust theirforaging strategies in response to the predictability of theenvironment, as a Bayesian forager would, but other workshows this is not a universal trend [68]. Patch leavingdecisions may deviate from Bayes optimality due to ani-

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 21: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

21

non-depleting patches

time

timethreshold

time

patc

h ty

peremainsearch

depleting patches

time

MLE

arri

val r

ate

timethreshold

time

patc

h ty

pe

deplete & depart

returning to patches

time

time

patc

h

deplete all patches

empty

environment

A B CLL

R(hi

gh/lo

w)

FIG. 10. Summarized taxonomy of foraging strategies. See Table I for details. A. In non-depleting environments, anideal forager searches patches until finding and remaining in a high yielding patch. B. In environments with depleting patches,an ideal forager depletes a patch and departs when reward arrival rate reaches a threshold. C. In fully depleting environments,an ideal forager should fully deplete patches before departing in order to minimize the time to clear the environment.

mals becoming risk-averse in variable environments [69].Apart from the behavioral significance of priors in guid-ing animals’ foraging behavior, such cases can help us un-derstand the types of probabilistic neural computationsperformed in uncertain and dynamic environments.Inferential decisions can validate the marginalvalue theorem. The marginal value theorem(MVT) [22] provides a baseline patch depature rule forcomparing optimal foraging to ecological behavior [12].However, the MVT is limited in its applicability, as itclassically describes the case of continuous rewards wherethe forager has perfect knowledge of resource availabil-ity, and does not provide a mechanistic account of howa forager reaches a decision to leave a patch. To ad-dress these limitations, Davidson & El Hady [31] devel-oped a generalized mechanistic model of patch leavingdecisions with parameters representing the discretenessof reward arrival, belief noise, and environmental aver-aging. In this model, patch leaving decisions are due toa threshold crossing process by an accumulator variable.Parameters can be adjusted to represent a wide range offoraging strategies not limited to the MVT, and specifi-cally addressing environmental uncertainty. This modelinspired our current work, which demonstrates that op-timal patch leaving decisions involve evidence accumula-tion. We have identified uncertainty in the current re-ward rate as a key driver of deviations from the MVT.Previous studies modeling patch departure under discreterewards [25] have formulated mechanistic descriptions ofdeparture decisions [20, 21, 26, 29], but none have ana-lyzed such a broad range of task conditions to provide acomparative study of strategies.Patch foraging as modified multi-armed bandit.Patch leaving behavior involves deciding when to leavethe current patch being harvested, noting that availablerewards in the next patch may be unknown; in the sim-plest case departures lead to a random sample of thesubsequent patch from the environment. In contrast, themulti-armed-bandit (MAB) task [70] assumes an agenthas some control over which patch (arm) is chosen next,

often does not consider patch depletion, and classicallyignores switching costs. However, patch leaving and theMAB task can become similar in certain limits. Forinstance, patch leaving that involves no depletion, zeroswitching costs, and memory of past locations is some-what similar to the classic MAB.

To illustrate the differences, consider an environmentwith two patch types H and L, no depletion of patches,no switching cost, and where the forager knows the patchreward rates values λH and λL (treated in Figs. 2 & 3).Within a patch, the forager uses its experience to decidewhen to leave the current patch, since they may not knowthe patch type on arrival. If the forager does not retainmemory of the locations of specific patches, then theywill randomly come upon the next patch of either typeH or L according to the statistics of the environment,and must repeat the process of inferring the reward ar-rival rate. Now consider a K-arm bandit where someproportion have reward probability λH and some haveλL. At each step, the gambler chooses which of the Karms to exploit, and continually updates its estimate ofwhich patches are H versus L, as the space of arms istypically small enough to learn this well. Even with theassumptions of no depletion and zero switching costs, asformulated these are still different decision problems: toformulate the MAB problem as a patch foraging problem,the gambler would have a choice at each step between thesame arm or a randomly chosen different arm.

Regardless, considerations of switching costs, deple-tion, and uncertain future rewards are central to patchforaging and stay-or-go decisions, and related work withthe MAB has considered some of these cases. Switch-ing costs were considered by [60, 71, 72], and includingthese who generally led to remaining at the same armfor longer, similar to patch foraging. The problem of de-pleting rewards is related to the non-stationary bandit,in the specific case where arm reward rates are choicedependent [73, 74]. In a patch with depleting rewards,the expected return changes in a specific and predictableway, and therefore can be treated mathematically, as we

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 22: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

22

Environment Decision strategy/solution Dependencies/Observations

Non-depleting patchesObjective: Minimize time to arrive and remain in highest yielding patch. Known: Food arrival rates of each patch type.

N -patch types

Depart patch when likelihood of being inhighest yielding patch falls below a threshold.N = 2: See Eqs. (3) & (11) and Fig. 2B;N = 3: See Eqs. (14) & (15) and Fig. 4B;N > 3: See Eqs. (17) and Fig. 5B.

• Optimal strategy and arrival time depend on fraction oflow versus high yield patches and discriminability between1st/2nd highest yield patches (Figs. 3-5).

Continuum of patchtypes

Categorize patches as high or low yieldingand depart patch if likelihood of a high patchis below threshold (See Eq. (19) andFig. 5D).

• Time to arrive and remain in high yielding patch isnon-monotonic in departure threshold, and much longerwhen high patches are rare.

DepletingObjective: Maximize mean food intake rate R over a long time (several patches). Known: Initial yield rates of each patch type.

1-patch typeDepart when arrival rate λ(t) falls to athreshold value λθ.

• Matches MVT (Eqs. (22) & (23) and Fig. 6B) except whenthere are very few chunks per patch (Fig. 6C) in which casethe forager should empty the patch.

2-patch types: patchtype known on arrival

Depart when current patch arrival rate λj(t)reaches a threshold.

• Represents the ‘Depletion-dominated’ regime, whichrecovers MVT (See Eqs. (25) & (26) and Fig. 6D).

2-patch types: emptylow yield patch

Wait a time Tθ, then depart patch if no foodis found. If encounter food at t < Tθ, thenuse threshold on inferred yield rate to makeleaving decision (similar to single patch typecase)

• ‘Uncertainty-dominated’ regime deviates from MVT.• Optimal wait time and departure threshold λθ increasewith prevalence of high yielding patch (Fig. 6E,F)

2-patch types: bothhigh and low patcheshave food

Decision via threshold on current estimated

yield rate λ(t) (See Eq. (27)). Chooseoptimal threshold λθ that maximizes longterm food intake rate.

• Optimal return differs from known case when there are fewfood chunks per patch (Fig 7A), converges to known patchcase when there are many chunks per patch (Fig 7B).• Forager stays in low patches too long, leaves high patchestoo soon when there are few chunks per patch(uncertainty-dominated regime, Fig 7C).

Returning to patchesObjective: Minimize time to harvest all food from the environment. Known: Yield rates of each patch type.

Patch type known onarrival

Empty each patch completely before movingto the next one.

• Time to clear environment increase linearly with number ofpatches in environment (Fig. 8F).

2-patches: 1 chunkLeave current patch if food not found withinwaiting time of Tθ (search problem).

• Optimal waiting time decreases with discovery rate ρ(Fig 8C) in contrast to case without returns (Fig. 6E).

2-patches: general case Consume mL0 chunks from first patch, thenthreshold belief to decide patch departure.

•Time to clear environment depends weakly on chunk count(Fig. 8D) and decreases slightly as chunk encounter rateincreases (Fig. 8E).

TABLE I. Detailed taxonomy of patch foraging strategies. Patch decision strategies depend on the environment, andtrends in observables differ for the foraging environments considered in the model. Environments considered include non-depleting patches, depleting patches, and the case of returning to specific patches to fully deplete an environment (see also Fig10). Columns describe important aspects of the optimal decision strategy for each case, along with key model results.

showed in Section IV. Limited/incomplete memory of thereturn of each arm is related to the non-stationary ban-dit, because non-stationarity of return means estimatesof the return of previously visited arms slowly becomesmore uncertain over time. Other MAB task designs con-sider changing conditions: for example, after an abruptchange in the environment (e.g. [75]), estimates of thereturn from each arm can be ’reset’ [76]. A complete re-set of the choice policy represents no memory of whicharm to choose. This is related to the fact that a for-ager often has knowledge of only the reward rate of thecurrent patch - finding the next patch requires explo-ration and possibly an uncertain amount of search costs.Within these contexts, patch foraging is fairly well de-scribed by a non-stationary bandit with (possibly uncer-tain) switching costs, limited memory or discounting ofthe choice policy of certain arms, and a specific form ofnon-stationarity. As such, we see our work providing use-

ful insights about efficient strategies for non-stationaryMAB tasks.

Bayesian learning could be approximated by rein-forcement learning. At best, animals approximate theoptimal learning processes we have described mathemat-ically [61, 77, 78]. One such approximation is the relativepayoff sum (RPS)-learning rule whereby the probabilityof choosing a patch is proportional to the relative payoffthat the respective patch has yielded so far [62], relatedto a matching law identified in operant conditioning [79].A reinforcement learning (RL) framework can also be ap-plied to patch leaving decisions, as discussed by Kollingand Adam [80]. These authors showed that while purelymodel-free RL fails to describe common aspects of patchleaving decisions, model-based predictions of expectedrewards may be used to decide when to leave a patch,in conjunction with model-free RL that can be used toevaluate the values of different decision strategies across

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 23: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

23

patches.In our theoretical treatment, we assume the forager

uses the correct model of in-patch return rates, consist-ing of stochastic rewards with an underlying constantrate (non-depleting environment) or decreasing rate (de-pleting environment). Even if a Bayesian forager didnot know a priori whether patches were depleting ornot, or what the depletion increment was, a hierarchi-cal Bayesian update could be used to learn this. In ourwork, the forager uses a threshold θ on the LLR to makepatch leaving decisions in non-depleting environments,or a threshold λθ on the inferred arrival rate to triggerpatch depatures in depleting environments. Althoughwe obtained explicit solutions for the optimal thresholdsθopt or λopt

θ by considering values that lead to maximalreturn, we note that within the framework of [80], anagent could use model-free RL to learn the optimal val-ues of these parameters. More work needs to be done inorder to compare Bayesian mechanisms of learning to RLin the context of patch foraging decisions.History and memory effects in patch foraging.Animals performing trained [81–83] and naturalistic be-havior [84–86] make history-dependent decisions basedon the stimuli they have experienced and the responsesthey have made. History-dependence is likely mediatedby both short and long term memory processes. Mem-ory for food availability is crucial for efficient foraging,for example in starlings [57, 87]. Movement and searchpatterns of foraging animals are also governed by envi-ronmental estimates from memory [88]. Our theoreticaltreatment mostly accounts for memory by assuming theforager retains knowledge of their patch’s statistics andthe statistics of the environment, as agents update theirpatch type beliefs by sampling patch food resources.

C. Experimental applications: Behavior andneuroscience

Experimental realizations of foraging behavior.There is currently a surge of interest in systems neu-roscience to study naturalistic behavior as opposed totraditional experimental designs in which animals aretrained over long times to perform particular behavioraltasks (‘trained behavior’). One important distinction be-tween the type of foraging behavior we are studying hereand trained decision making tasks in perceptual deci-sion making is that the former is continuous, withouta specific trial structure, while the latter have a (re-peated) trial structure predetermined by experimenters.The theoretical framework in this work will help exper-imenters move from trial based behavior to continuousbehavior without the loss of quantitative rigor. Studyingbehavior in this continuous paradigm offers importantadvantages over trial based paradigms as it allows multi-sensory integration to unfold, increases the fraction oftime the animal is engaged in a task, engages the animal’ssensory-cognitive-motor loops activated by the environ-

ment’s natural statistics, and allows neural dynamics tounfold over multiple timescales [89].

Although recent work has examined foraging taskswithin a systems neuroscience framework [90–93], thereremain many open questions related to how the brainprocesses key aspects of foraging decisions. For example,experiments with the “Self-control preparation,” wherean animal chooses between two choices, have significantbehavioral differences with those that have a sequentialforaging “patch preparation”, even though from an eco-nomic standpoint, the setups are equivalent [94]. More-over, many tasks do not allow the animal to move ex-tensively in space, which is a crucial aspect of foraging.For instance, when rats are required to physically moveto perform foraging, the observed behavior differs fromtasks that “simulate” foraging by presenting sequentialchoices or that consider visual search [95].

Neural implementation of patch foraging deci-sions. As we are aiming to provide a conceptual frame-work for patch foraging experiments in systems neuro-science, a crucial aim of such experiments is to map braincircuits underlying patch foraging decisions. The behav-ioral conceptual framework we have treated in this studyincludes behavioral mechanisms underlying patch leavingdecisions in different types of environments. By exten-sion, those behavioral mechanisms should map onto neu-ral mechanisms. As in classic perceptual decision-makingtasks, sequential updating models provide key insightsinto the types of neural computations one should expectto see in subjects performing those decisions [50].

Fitting our behavioral model to both the behavioraldata and the neural data simultaneously [96] can unravela series of intricate neural computations underlying par-ticular behavioral strategies. To date, only a few brainareas have been shown to be involved in different as-pects of foraging decisions: ventromedial prefrontal cor-tex encodes values of well-defined options, and anteriorcingulate cortex encodes the average value of the foragingenvironment and cost of foraging [97] or signals changesin environmental context [98, 99]. Although experimen-talists have traditionally targeted selected stereotypicalareas, the increased availability and versatility of largescale recording techniques will enable future experimentsto map the global circuits underlying foraging decisions,thus accessing a brain wide representation of intricatebehavioral strategies [100].

Comparative patch foraging behavior. Compara-tive approaches, identifying similarities and differencesacross species, are a key aspect of the behavioral ecologyof foraging [101–103]. Our model enables quantitative,model-based comparisons of decisions strategies betweenanimals, which will improve our understanding of the as-pects of evolution (e.g., selection pressure) shaping for-aging behavior [11, 14]. Since neural circuits are shapedby these pressures too, this opens novel avenues for com-parative systems neuroscience and an evolutionary un-derstanding of neural computations during foraging.

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 24: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

24

D. Model extensions

Our model forms the foundation for multiple avenues offuture theoretical study. One extension, discussed above,concerns using reinforcement learning to tune the patchleaving decision thresholds in the model. While we as-sumed the agent learns the qualities of different patchtypes, we fixed the transit time between patches; anotherextension could focus on learning and decision making inenvironments with distributed transit times. Such a fea-ture could arise via spatial arrangements of patches, withpotentially correlated resource availabilities across adja-cent patches [40, 104].

Effective search is integral to survival in nature [105],and search behavior can give information on individualdecision strategies. For example, a recent study foundthat rats can solve the stochastic travel-salesman prob-lem using a nearest neighbour algorithm [106]. A similarapproach could ask how an animal’s search and naviga-tion pattern interact with different patch leaving deci-sions to create an effective foraging strategy.

Our framework has also developed strategies assumingthe forager utilizes constant thresholds on either the be-lief or estimated arrival rate. However, it is well knownthat in a variety of decision making contexts, optimalstrategies can involve time-dependent decision thresh-olds [52, 53, 107, 108]. Typically, these results arise inthe context of multi-trial experiments in which the qual-ity of evidence on each trial varies stochastically and isinitially unknown. In the non-depleting environments ofour study, the quality of evidence is fixed across patch vis-its and beliefs are used to trigger departure decisions. Assuch, models in this context fit the assumptions of classic,constant threshold optimal policies. On the other hand,in uncertain binary depleting environments, we have pro-jected a higher-dimensional description of the patch valueto a single scalar estimate of the patch arrival rate. Assuch, we cannot be sure the parameterized model of patchdeparture decisions is optimal across all models. Leverag-ing methods from dynamic programming commonly usedto set optimal decision policies would be a fruitful nextstep in ensuring the optimality of our patch leaving de-cision strategies.

We defined optimal patch foraging as maximizing re-source intake. However, an alternative formulation couldconsider an explicit trade-off between gathering resources(exploitation) and gathering information about the envi-ronment (exploration). Information foraging frames anagent’s movement strategy by prioritizing the value of in-formation over resource gathering [109]. In our problem,

information foraging would target strategies that reducethe time to learn the environmental distribution of patchqualities (as in Section V), rather than maximize the re-ward rate, thus prioritizing the role of exploration andsampling in foraging [48, 110–113].

Although the natural world typically offers a con-tinuous foraging space, considering discrete resourcepatches facilitates understanding by systematically dis-secting fine scale dynamics of local search from globalsearch [114, 115]. Additionally, animals seems to exploitlocal patches of food resources as if they had partitionedthe world into patches. Nonetheless, not all resourcesexist in patches; a further extension of our theoreticalframework could consider a continuous world withoutpatches. Assuming some regularity in the distributionof food, it would then be efficient to consider stochas-tic versions of gradient ascent so the forager could orientitself in the direction of higher food concentrations.

Another important extension will be to consider in-teractions between agents, either through predator-preyinteractions which affect foraging decisions [116], socialforaging of groups collectively foraging [117, 118], or evencompetitive foraging [119]. In our model, the agent onlyreceives direct (non-social) information about resourceavailability; in collective foraging, an individual receivesboth social and non-social information [120, 121]. Thiscan significantly affect foraging decisions, for examplewhen an individual must balance a desire to obtain moreresources with a desire to maintain group cohesion.

To conclude, our model establishes a formal frameworkfor a natural behavior (patch foraging) that can be stud-ied in the same formal rigor as many trained behavioraltasks. Guided by the philosophy that formal tractabil-ity of a behavior entails neural mechanistic tractability,future work will build on this framework to generatetestable hypotheses on the neural mechanistic underpin-nings of foraging behavior.

ACKNOWLEDGMENTS

We thank Jan Drugowitsch for helpful discussionsregarding the optimality of decision threshold policiesacross patch visits. ZPK was supported by a CRCNSgrant (R01MH115557) and NSF (DMS-1853630). AEHacknowledges support by NIH grant 1R21MH121889-01. JDD acknowledges support by the DeutscheForschungsgemeinschaft (DFG, German Research Foun-dation) under Germany’s Excellence Strategy - EXC2117 - 422037984, as well as support from the Heidel-berg Academy of Science.

[1] Adam J Calhoun, Sreekanth H Chalasani, andTatyana O Sharpee. Maximally informative foragingby caenorhabditis elegans. Elife, 3, 2014.

[2] Joshua S. Greene, Maximillian Brown, May Do-

bosiewicz, Itzel G. Ishida, Evan Z. Macosko, XinxingZhang, Rebecca A. Butcher, Devin J. Cline, Patrick T.McGrath, and Cornelia I. Bargmann. Balancing selec-tion shapes density-dependent foraging behaviour. Na-

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 25: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

25

ture, 539(7628):254–258, Nov 2016.[3] Marla B Sokolowski, Clement Kent, and Jonathan

Wong. Drosophila larval foraging behaviour: develop-mental stages. Animal behaviour, 32(3):645–651, 1984.

[4] Dan Landayan, David S Feldman, and Fred W Wolf.Satiation state-dependent dopaminergic control of for-aging in drosophila. Scientific reports, 8(1):1–9, 2018.

[5] Barnaby Marsh, Cynthia Schuck-Paim, and Alex Kacel-nik. Energetic state during learning affects foragingchoices in starlings. Behavioral Ecology, 15(3):396–399,2004.

[6] Cynthia Schuck-Paim and Alex Kacelnik. Rationalityin risk-sensitive foraging choices by starlings. AnimalBehaviour, 64(6):869–879, 2002.

[7] V Arcis and D Desor. Influence of environment structureand food availability on the foraging behaviour of thelaboratory rat. Behavioural processes, 60(3):191–198,2003.

[8] Evan C Carter and A David Redish. Rats value timedifferently on equivalent foraging and delay-discountingtasks. Journal of Experimental Psychology: General,145(9):1093, 2016.

[9] Lisa M Rose. Sex differences in diet and foraging be-havior in white-faced capuchins (cebus capucinus). In-ternational Journal of Primatology, 15(1):95–114, 1994.

[10] PA Garber. Foraging strategies among living primates.Annual review of Anthropology, 16(1):339–364, 1987.

[11] Thomas T Hills. Animal foraging and the evolution ofgoal-directed cognition. Cognitive science, 30(1):3–41,2006.

[12] David W Stephens, Joel S Brown, and Ronald C Yden-berg. Foraging: behavior and ecology. University ofChicago Press, 2008.

[13] Adam J Calhoun and Benjamin Y Hayden. The foragingbrain. Current Opinion in Behavioral Sciences, 5:24–31,2015.

[14] Nigel E Raine, Thomas C Ings, Anna Dornhaus, Ne-hal Saleh, and Lars Chittka. Adaptation, genetic drift,pleiotropy, and history in the evolution of bee foragingbehavior. Advances in the Study of Behavior, 36:305–354, 2006.

[15] Marla B Sokolowski, H Sofia Pereira, and KimberleyHughes. Evolution of foraging behavior in drosophila bydensity-dependent selection. Proceedings of the NationalAcademy of Sciences, 94(14):7373–7377, 1997.

[16] BJ King. Extractive foraging and the evolution of pri-mate intelligence. Human evolution, 1(4):361, 1986.

[17] Dean Mobbs, Pete C Trimmer, Daniel T Blumstein, andPeter Dayan. Foraging for foundations in decision neu-roscience: Insights from ethology. Nature Reviews Neu-roscience, 19(7):419, 2018.

[18] Benjamin Y Hayden and Mark E Walton. Neuroscienceof foraging. Frontiers in neuroscience, 8:81, 2014.

[19] Sam Hall-McMaster and Fabrice Luyckx. Revisiting for-aging approaches in neuroscience. Cognitive, Affective,& Behavioral Neuroscience, 19(2):225–230, 2019.

[20] Jeffrey K Waage. Foraging for patchily-distributed hostsby the parasitoid, nemeritis canescens. The Journal ofAnimal Ecology, pages 353–371, 1979.

[21] John McNamara. Optimal patch use in a stochas-tic environment. Theoretical Population Biology,21(2):269–288, Apr 1982.

[22] Eric L Charnov. Optimal foraging, the marginal valuetheorem. Theoretical population biology, 9(2):129–136,

1976.[23] Roger L Mellgren. Foraging in a simulated natural en-

vironment: There’s a rat loose in the lab. Journal of theExperimental Analysis of Behavior, 38(1):93–100, 1982.

[24] Innes Cuthill and Alejandro Kacelnik. Central placeforaging: a reappraisal of the ‘loading effect’. AnimalBehaviour, 40(6):1087–1101, 1990.

[25] H. Rita and E. Ranta. Stochastic patch exploitationmodel. Proceedings of the Royal Society of London B:Biological Sciences, 265(1393):309–315, Feb 1998.

[26] Gerard Driessen and Carlos Bernstein. Patch departuremechanisms and optimal host exploitation in an insectparasitoid. Journal of Animal Ecology, 68(3):445–459,1999.

[27] Jane F Ward, Roger M Austin, and David W Mac-donald. A simulation model of foraging behaviour andthe effect of predation risk. Journal of Animal Ecology,69(1):16–30, 2000.

[28] Sigrunn Eliassen, Christian Jørgensen, Marc Mangel,and Jarl Giske. Quantifying the adaptive value of learn-ing in foraging behavior. The American Naturalist,174(4):478–489, Oct 2009.

[29] Dale E Taneyhill. Patch departure behavior of bum-ble bees: rules and mechanisms. Psyche: A Journal ofEntomology, 2010, 2010.

[30] Feng Zhang and Cang Hui. Recent experience-drivenbehaviour optimizes foraging. Animal behaviour, 88:13–19, 2014.

[31] Jacob D Davidson and Ahmed El Hady. Foraging asan evidence accumulation process. PLoS computationalbiology, 15(7):e1007060, 2019.

[32] Jay M Biernaskie, Steven C Walker, and Robert JGegear. Bumblebees learn to forage like bayesians. TheAmerican Naturalist, 174(3):413–423, 2009.

[33] Richard F. Green. A simpler, more general method offinding the optimal foraging strategy for bayesian birds.Oikos, 112(2):274–284, 2006.

[34] John M McNamara, Richard F Green, and Ola Ols-son. Bayes’ theorem and its applications in animal be-haviour. Oikos, 112(2):243–251, 2006.

[35] Ola Olsson and Noel MA Holmgren. The survival-rate-maximizing policy for bayesian foragers: wait for goodnews. Behavioral Ecology, 9(4):345–353, 1998.

[36] John McNamara and Alasdair Houston. The applicationof statistical decision theory to animal behaviour. Jour-nal of Theoretical Biology, 85(4):673–690, Aug 1980.

[37] Ola Olsson. Bayesian foraging with only two patchtypes. Oikos, 112(2):285–297, 2006.

[38] Jean-Sebastien Pierre and Richard F Green. A bayesianapproach to optimal foraging in parasitoids. Behavioralecology of insect parasitoids. Blackwell, Oxford, pages357–383, 2008.

[39] Ola Olsson and Noel MA Holmgren. Gaining ecolog-ical information about bayesian foragers through theirbehaviour. i. models with predictions. Oikos, pages 251–263, 1999.

[40] Jan A Van Gils. State-dependent bayesian foragingon spatially autocorrelated food distributions. Oikos,119(2):237–244, 2010.

[41] Miguel A Rodriguez-Girones and Rodrigo A Vasquez.Density-dependent patch exploitation and acquisitionof environmental information. Theoretical PopulationBiology, 52(1):32–42, 1997.

[42] Aaron M Ellison. Bayesian inference in ecology. Ecology

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 26: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

26

letters, 7(6):509–520, 2004.[43] Juan C Alonsa, Javier A Alonsa, Luis M Bautista, and

Rodrigo Munoz-Pulido. Patch use in cranes: a fieldtest of optimal foraging predictions. Animal Behaviour,49(5):1367–1379, 1995.

[44] Richard F Green. Bayesian birds: a simple example ofoaten’s stochastic model of optimal foraging. TheoreticalPopulation Biology, 18(2):244–256, 1980.

[45] OLA Olsson, Ulf Wiktander, Noel MA Holmgren, andSven G Nilsson. Gaining ecological information aboutbayesian foragers through their behaviour. ii. a field testwith woodpeckers. Oikos, pages 264–276, 1999.

[46] Thomas J. Valone. Are animals capable of bayesianupdating? an empirical review. Oikos, 112(2):252–259,2006.

[47] Peter R Killeen, Gina-Marie Palombo, Lawrence R Got-tlob, and Jon Beam. Bayesian analysis of foraging bypigeons (columba livia). Journal of Experimental Psy-chology: Animal Behavior Processes, 22(4):480, 1996.

[48] Peter Nonacs and Joanne L Soriano. Patch samplingbehaviour and future foraging expectations in argentineants, linepithema humile. Animal behaviour, 55(3):519–527, 1998.

[49] Thomas J Valone and Joel S Brown. Measuringpatch assessment abilities of desert granivores. Ecology,70(6):1800–1810, 1989.

[50] Joshua I Gold and Michael N Shadlen. The neural basisof decision making. Annual review of neuroscience, 30,2007.

[51] Bingni W Brunton, Matthew M Botvinick, and Carlos DBrody. Rats and humans can optimally accumulate ev-idence for decision-making. Science, 340(6128):95–98,2013.

[52] Jan Drugowitsch, Ruben Moreno-Bote, Anne K Church-land, Michael N Shadlen, and Alexandre Pouget. Thecost of accumulating evidence in perceptual decisionmaking. Journal of Neuroscience, 32(11):3612–3628,2012.

[53] Jan Drugowitsch, Ruben Moreno-Bote, and AlexandrePouget. Optimal decision-making with time-varying ev-idence reliability. In Advances in neural informationprocessing systems, pages 748–756, 2014.

[54] Crispin W Gardiner. Handbook of stochastic methods,volume 3. Springer, 2004.

[55] Rafal Bogacz, Peter T Hu, Philip J Holmes, andJonathan D Cohen. Do humans produce thespeed–accuracy trade-off that maximizes reward rate?The Quarterly Journal of Experimental Psychology,63(5):863–891, 2010.

[56] Christopher M Glaze, Alexandre LS Filipowicz,Joseph W Kable, Vijay Balasubramanian, and Joshua IGold. A bias–variance trade-off governs individual dif-ferences in on-line learning in an unpredictable environ-ment. Nature Human Behaviour, 2(3):213–224, 2018.

[57] Innes C. Cuthill, Alejandro Kacelnik, John R. Krebs,Patsy Haccou, and Yoh Iwasa. Starlings exploitingpatches: the effect of recent experience on foraging deci-sions. Animal Behaviour, 40(4):625–640, October 1990.

[58] Yannick Outreman, Anne Le Ralec, Eric Wajnberg, andJean-Sebastien Pierre. Effects of within- and among-patch experiences on the patch-leaving decision rules inan insect parasitoid. Behavioral Ecology and Sociobiol-ogy, 58(2):208–217, June 2005.

[59] Steven L Scott. A modern bayesian look at the multi-

armed bandit. Applied Stochastic Models in Businessand Industry, 26(6):639–658, 2010.

[60] Jeffrey S. Banks and Rangarajan K. Sundaram. Switch-ing costs and the Gittins index. Econometrica,62(3):687–694, 1994.

[61] Alan C Kamil and Sonja I Yoerg. Learning and foragingbehavior. In Ontogeny, pages 325–364. Springer, 1982.

[62] Klaus Regelmann. Learning to forage in a variable envi-ronment. Journal of theoretical biology, 120(3):321–329,1986.

[63] Sara M Constantino and Nathaniel D Daw. Learning theopportunity cost of time in a patch-foraging task. Cog-nitive, Affective, & Behavioral Neuroscience, 15(4):837–853, 2015.

[64] Jan A Van Gils, Ingrid W Schenk, Oscar Bos, and The-unis Piersma. Incompletely informed shorebirds thatface a digestive constraint maximize net energy gainwhen exploiting patches. The American Naturalist,161(5):777–793, 2003.

[65] Raymond HG Klaassen, Bart A Nolet, Jan A van Gils,and Silke Bauer. Optimal movement between patchesunder incomplete information about the spatial distri-bution of food items. Theoretical Population Biology,70(4):452–463, 2006.

[66] Jessica J Hughes, David Ward, and Michael R Perrin.Effects of substrate on foraging decisions by a namibdesert gerbil. Journal of Mammalogy, 76(2):638–645,1995.

[67] Thomas J Valone. Bayesian and prescient assessment:foraging with pre-harvest information. Animal Be-haviour, 41(4):569–577, 1991.

[68] Elizabeth A. Marschall, Peter L. Chesson, and Roy A.Stein. Foraging in a patchy environment: prey-encounter rate and residence time distributions. AnimalBehaviour, 37:444–454, March 1989.

[69] Alex Kacelnik and Melissa Bateson. Risky theories—theeffects of variance on foraging decisions. American Zo-ologist, 36(4):402–434, 1996.

[70] Vaibhav Srivastava, Paul Reverdy, and Naomi E.Leonard. On optimal foraging and multi-armed ban-dits. In 2013 51st Annual Allerton Conference on Com-munication, Control, and Computing (Allerton), pages494–499, Monticello, IL, October 2013. IEEE.

[71] Sudipto Guha and Kamesh Munagala. Multi-armedBandits with Metric Switching Costs. In ICALP, 2009.

[72] M. Asawa and D. Teneketzis. Multi-armed bandits withswitching penalties. IEEE Transactions on AutomaticControl, 41(3):328–348, March 1996.

[73] Omar Besbes, Yonatan Gur, and Assaf Zeevi. StochasticMulti-Armed-Bandit Problem with Non-stationary Re-wards. In Z. Ghahramani, M. Welling, C. Cortes, N. D.Lawrence, and K. Q. Weinberger, editors, Advances inNeural Information Processing Systems 27, pages 199–207. Curran Associates, Inc., 2014.

[74] D.E. Koulouriotis and A. Xanthopoulos. Reinforcementlearning and evolutionary algorithms for non-stationarymulti-armed bandit problems. Applied Mathematics andComputation, 196(2):913–922, March 2008.

[75] Lai Wei and Vaibhav Srivatsva. On Abruptly-Changingand Slowly-Varying Multiarmed Bandit Problems. In2018 Annual American Control Conference (ACC),pages 6291–6296, June 2018. ISSN: 2378-5861.

[76] Cedric Hartland, Sylvain Gelly, Nicolas Baskiotis,Olivier Teytaud, and Michele Sebag. Multi-armed Ban-

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 27: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

27

dit, Dynamic Environments and Meta-Bandits. InnIPS-2006 Workshop, Online Trading Between Explo-ration and Exploitation, volume hal-00113668, Whistler,Canada, 2006.

[77] JG Ollason. Learning to forage in a regenerating patchyenvironment: can it fail to be optimal? TheoreticalPopulation Biology, 31(1):13–32, 1987.

[78] Alexandra T Pietrewicz and Jerry B Richards. Learningto forage: an ecological perspective. Lawrence ErlbaumAssociates, 1985.

[79] Michael Davison and Dianne McCarthy. The matchinglaw: A research review. Routledge, 2016.

[80] Nils Kolling and Thomas Akam. (reinforcement?) learn-ing to forage optimally. Current Opinion in Neurobiol-ogy, 46:162–169, Oct 2017.

[81] Athena Akrami, Charles D Kopec, Mathew E Diamond,and Carlos D Brody. Posterior parietal cortex representssensory history and mediates its effects on behaviour.Nature, 554(7692):368–372, 2018.

[82] Ofri Raviv, Merav Ahissar, and Yonatan Loewenstein.How recent history affects perception: the normativeapproach and its heuristic approximation. PLoS com-putational biology, 8(10), 2012.

[83] Paymon Ashourian and Yonatan Loewenstein. Bayesianinference underlies the contraction bias in delayed com-parison tasks. PloS one, 6(5), 2011.

[84] Masayuki Koganezawa, Hiroaki Hara, YoshinoriHayakawa, and Ichiro Shimada. Memory effects onscale-free dynamics in foraging drosophila. Journal oftheoretical biology, 260(3):353–358, 2009.

[85] Claude Belisle and James Cresswell. The effects of a lim-ited memory capacity on foraging behavior. TheoreticalPopulation Biology, 52(1):78–90, 1997.

[86] Alan C. Kamil and Kevin C. Clements. Learning, mem-ory, and foraging behavior. In Contemporary issues incomparative psychology, pages 7–30. Sinauer Associates,Sunderland, MA, US, 1990.

[87] Melissa Bateson and Alex Kacelnik. Accuracy of mem-ory for amount in the foraging starling, sturnus vulgaris.Animal Behaviour, 50(2):431–443, 1995.

[88] Chloe Bracis, Eliezer Gurarie, and R BramVan Moorter. Memory effects on movement behaviorin animal foraging. PloS one, 10(8), 2015.

[89] Alexander Huk, Kathryn Bonnen, and Biyu J He. Be-yond trial-based paradigms: Continuous behavior, on-going neural activity, and natural stimuli. Journal ofNeuroscience, 38(35):7551–7558, 2018.

[90] David L Barack, Steve WC Chang, and Michael L Platt.Posterior cingulate neurons dynamically signal decisionsto disengage during foraging. Neuron, 96(2):339–347,2017.

[91] Benjamin Y Hayden, John M Pearson, and Michael LPlatt. Neuronal basis of sequential foraging decisions ina patchy environment. Nature neuroscience, 14(7):933,2011.

[92] Eran Lottem, Dhruba Banerjee, Pietro Vertechi, DarioSarra, Matthijs oude Lohuis, and Zachary F Mainen.Activation of serotonin neurons promotes active persis-tence in a probabilistic foraging task. Nature communi-cations, 9(1):1–12, 2018.

[93] Pietro Vertechi, Eran Lottem, Dario Sarra, BeatrizGodinho, Isaac Treves, Tiago Quendera, Matthijs Nico-lai Oude Lohuis, and Zachary F Mainen. Inference-based decisions in a hidden state foraging task: Differ-

ential contributions of prefrontal cortical areas. Neuron,2020.

[94] David W Stephens. Decision ecology: foraging and theecology of animal decision making. Cognitive, Affective,& Behavioral Neuroscience, 8(4):475–484, 2008.

[95] Andrew M Wikenheiser, David W Stephens, andA David Redish. Subjective costs drive overly patientforaging strategies in rats on an intertemporal foragingtask. Proceedings of the National Academy of Sciences,page 201220738, 2013.

[96] Eleanor Batty, Matthew Whiteway, Shreya Saxena,Dan Biderman, Taiga Abe, Simon Musall, WinthropGillis, Jeffrey Markowitz, Anne Churchland, John PCunningham, Sandeep R Datta, Scott Linderman,and Liam Paninski. BehaveNet: nonlinear em-bedding and Bayesian neural decoding of behavioralvideos. In H. Wallach, H. Larochelle, A. Beygelzimer,F. d\textquotesingle Alche-Buc, E. Fox, and R. Gar-nett, editors, Advances in Neural Information Process-ing Systems 32, pages 15706–15717, 2019.

[97] Nils Kolling, Timothy EJ Behrens, Rogier B Mars, andMatthew FS Rushworth. Neural mechanisms of forag-ing. Science, 336(6077):95–98, 2012.

[98] Amitai Shenhav, Mark A Straccia, Jonathan D Cohen,and Matthew M Botvinick. Anterior cingulate engage-ment in a foraging context reflects choice difficulty, notforaging value. Nature neuroscience, 17(9):1249, 2014.

[99] Amitai Shenhav, Matthew M Botvinick, andJonathan D Cohen. The expected value of con-trol: an integrative theory of anterior cingulate cortexfunction. Neuron, 79(2):217–240, 2013.

[100] Harris S Kaplan and Manuel Zimmer. Brain-wide rep-resentations of ongoing behavior: a universal principle?Current Opinion in Neurobiology, 64:60–69, 2020.

[101] David W Roubik. Comparative foraging behavior of apismellifera and trigona corvina (hymenoptera: Apidae)on baltimora recta (compositae). Revista de BiologiaTropical, 29(2):177–183, 1981.

[102] Frank J Bonaccorso, John R Winkelmann, Danny Shin,Caroline I Agrawal, Nadia Aslami, Caitlin Bonney, An-drea Hsu, Phoebe E Jekielek, Allison K Knox, Stephen JKopach, et al. Evidence for exploitative competition:comparative foraging behavior and roosting ecology ofshort-tailed fruit bats (phyllostomidae). Biotropica,39(2):249–256, 2007.

[103] Eric Wajnberg, Pierre-Alexis Gonsard, ElisabethTabone, Christine Curty, Nathalie Lezcano, and Ste-fano Colazza. A comparative analysis of patch-leavingdecision rules in a parasitoid family. Journal of AnimalEcology, pages 618–626, 2003.

[104] Vaibhav Srivastava, Paul Reverdy, and Naomi EhrichLeonard. Correlated multiarmed bandit problem:Bayesian algorithms and regret analysis. arXiv preprintarXiv:1507.01160, 2015.

[105] Ran Nathan. An emerging movement ecology paradigm.Proceedings of the National Academy of Sciences,105(49):19050–19051, 2008.

[106] Brian J Jackson, Gusti Lulu Fatima, Sujean Oh, andDavid Henry Gire. Many paths to the same goal: meta-heuristic operation of brains during natural behavior.bioRxiv, page 697607, 2019.

[107] Paul Cisek, Genevieve Aude Puskas, and Stephany El-Murr. Decisions in changing conditions: the urgency-gating model. Journal of Neuroscience, 29(37):11560–

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint

Page 28: Normative theory of patch foraging decisions · nects normative theory of foraging decisions with mech-anistic drift-di usion models, as proposed in [31]. To realize our theoretical

28

11571, 2009.[108] Gaurav Malhotra, David S Leslie, Casimir JH Ludwig,

and Rafal Bogacz. Time-varying decision boundaries:insights from optimality analysis. Psychonomic bulletin& review, 25(3):971–996, 2018.

[109] Massimo Vergassola, Emmanuel Villermaux, and Boris IShraiman. ‘infotaxis’ as a strategy for searching withoutgradients. Nature, 445(7126):406–409, 2007.

[110] Beat Naef-Daenzer. Patch time allocation and patchsampling by foraging great and blue tits. Animal be-haviour, 59(5):989–999, 2000.

[111] David William Stephens. Stochasticity in foraging the-ory: risk and information. PhD thesis, University ofOxford, 1982.

[112] SM Dow and SEG Lea. Sampling of schedule param-eters by pigeons: tests of optimizing theory. Animalbehaviour, 35(1):102–114, 1987.

[113] Donald L Kramer and Daniel M Weary. Explorationversus exploitation: a field study of time allocation toenvironmental tracking by foraging chipmunks. AnimalBehaviour, 41(3):443–449, 1991.

[114] Simon A Levin, Thomas M Powell, and John H Steele.Patch dynamics, volume 96. Springer Science & Busi-

ness Media, 2012.[115] Daniel Grunbaum. The logic of ecological patchiness.

Interface Focus, 2(2):150–155, 2012.[116] Charles H Greene. Patterns of prey selection: implica-

tions of predator foraging tactics. The American Natu-ralist, 128(6):824–839, 1986.

[117] Colin W Clark and Marc Mangel. Foraging and flock-ing strategies: information in an uncertain environment.The American Naturalist, 123(5):626–641, 1984.

[118] Luc-Alain Giraldeau and Thomas Caraco. Social for-aging theory, volume 73. Princeton University Press,2018.

[119] DGC Harper. Competitive foraging in mallards:“idealfree’ducks. Animal Behaviour, 30(2):575–584, 1982.

[120] Jennifer J Templeton and Luc-Alain Giraldeau. Vicar-ious sampling: the use of personal and public informa-tion by starlings foraging in a simple patchy environ-ment. Behavioral Ecology and Sociobiology, 38(2):105–114, 1996.

[121] Alfonso Perez-Escudero and Gonzalo G. de Polavieja.Collective Animal Behavior from Bayesian Estimationand Probability Matching. PLOS Computational Biol-ogy, 7(11):e1002282, November 2011.

was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (whichthis version posted April 24, 2020. . https://doi.org/10.1101/2020.04.22.055558doi: bioRxiv preprint