
BIBA IST–2001-32115

Bayesian Inspired Brain and Artefacts: Using probabilistic logic to understand brain function and implement life-like behavioural co-ordination

The biological and neural handling of probabilities

‘State of the Art’ report: issues with potential relevance to robotics

Deliverable: 6
Workpackage: 1
Month due: November 2002
Contract Start Date: 01.11.2001
Duration: 48 months

Project Co-ordinator: INRIA–UMR-GRAVIR

Partners: CNRS–UMR-GRAVIR; UCL-ARGM; UCAM-DPOL; CNRS-UMR-LPPA; CDF-UMR-LPPA; EPFL; MIT-NSL

Project funded by the European Community under the “Information Society Technologies” Programme (1998-2002)


The biological and neural handling of probabilities

Horace Barlow and Tony Gardner-Medwin

BIBA – WP1 - Oct 2002

‘State of the Art’ report: issues with potential relevance to robotics

PART A: The biological importance of prior knowledge acquired through evolution

1. Three types of learning
2. Evolutionary learning
3. The storage medium
4. Sampling the gene pool
5. Mutations
6. Organisation of the individual genome
7. Sex and the single genome
8. Linkage
9. Crossing over
10. Sexual selection
11. Conclusion

PART B: The neural handling of probability

12. When and why does the nervous system need to handle probabilities?
13. Fraud: a biological invention
14. Behaviour depending directly on probability: attention, orientation and habituation
15. Adaptation and the suppression of responses to uniform stimuli
16. Uncertainty and confidence: the expression of probability
17. Clarification of relevant terms and concepts in neural systems
18. How can probabilities be estimated and stored?
19. Weights of evidence: the handling of probabilistic inference by addition
20. The problem of prior probabilities
21. Changing environments: the handling of fluctuating and conditional probabilities
22. Equivalence in neurons of absent features and absent information: sparse evidence
23. Conclusion

Bibliography


PART A: The biological importance of prior knowledge acquired through evolution

1. Three types of learning
Engineers are often impressed by the way animals can pick out signals from noisy backgrounds and handle problems of probabilistic inference with seemingly high efficiency. This is attributable, partly at least, to the brain's ability to estimate important prior probabilities of its environment, but it must be understood that living organisms acquire this knowledge of the environments they live in by three different types of learning. The first is the slow process of evolutionary adaptation, the second is the familiar process that occurs in the brain of each individual, and there is a third type, restricted to higher organisms and humans in particular, that uses external stored knowledge.

The results of the first of these three types of learning, which we shall call "evolutionary learning", are stored in the gene pool of a species. This store is formed by natural selection during evolution and is composed of the genes in the genomes of all individuals of that species. The genome of each individual is the subset of genes from the species gene pool that it possesses, and these are accessed when the processes of development construct each new organism under their control. These processes are not altogether familiar to non-biologists, and viewing evolution as a learning process may seem strange; some of the aspects relevant to estimating, storing, and accessing probabilities will therefore be outlined briefly in Part A of this document.

The second type, "brain learning", is the familiar type in which each individual experiences non-random occurrences and contingencies in its own environment, and exploits them for its own benefit. In this type of learning probabilities are currently thought to be stored mainly as changes in functional connectivity between brain cells, and it is the type of learning that BIBA participants are likely to think is of greatest importance for the project, and is the subject of Part B of this document. One of the messages of Part A, however, is that one should be wary of this prejudice, for on the rare occasions when the mechanisms behind an example of biological behaviour have been elucidated thoroughly enough, evolutionary learning turns out to have played a very important role.

The third type of learning, which we call "academic learning", is distinguished by the fact that its products are stored neither in genes, nor for the main part in the connectivity of brain cells, but in the world's libraries, journals, and general "know how". For millennia, humans have acquired and recorded much verifiable knowledge of our environment, and for even longer it has been passed on through oral tradition, separate from direct experience. This knowledge is available to supplement, or substitute for, knowledge stored through evolutionary learning and acquired directly from the experience and inferential capabilities of an individual. One of the great debates of education is the relative emphasis that should be placed on "academic learning" (and its much less rich relations: "book learning" and "rote learning") in contrast to the development of individual skills to enhance inference and the process of learning from experience.

It is obvious that the time scales of these three types of learning are very different. Evolutionary or genetic learning generally operates over thousands of generation-times, while brain learning has to occur over a time short compared with the lifetime of a single individual, and academic knowledge accumulates at an intermediate rate.

In biological systems probabilistic inference occurs against this rich background of knowledge of the environment acquired through evolution and incorporated in the gene pool. Although the


division between evolutionary and brain learning is clear-cut with regard to how the results are stored, it has proved extremely difficult for biologists to separate their relative contributions in particular instances. They are very closely interwoven, for the mechanisms that perform brain learning are themselves the product of evolutionary learning. Furthermore, experience can influence the expression of a gene, so although the possession of that gene must result from evolutionary learning, whether it changes the structure and function of the individual organism can depend upon experience. Hence the message from biology on where the boundary lies between what is determined by "nature" and by "nurture" is blurred and indistinct, principally because so much in biology depends upon both.

The hope of the BIBA project is that a small corner of academic learning - that concerned with Bayesian inference - can be used to improve the "brain learning" of robots. The warning issued above suggests that much other academic knowledge about the environment will have to be incorporated to make such a programme work well; in the next sections the mechanisms that underlie evolutionary learning will be sketched out, in the hope that giving some reality to the mechanisms will stimulate thinking about the nature and the importance of the tasks they achieve.

2. Evolutionary learning
It is especially worth taking a glimpse at the mechanisms and consequences of evolutionary learning because these are probably better understood than for any other learning system, except perhaps man-made neural networks and the modifications under experience of simple reflexes that occur in the sea slug Aplysia.

The theory of evolution was proposed and developed in order to explain the great diversity of animal forms, the way they come to be organised into phyla and species, and the way the various forms and their organisation change with time, as shown in the geological record. It will be presented here in less familiar form as a system for learning about the environment in which the probabilities of certain events and contingencies are estimated, stored, accessed, and utilised for the benefit of the animal concerned.

It might seem logical to start by discussing what is learned, but that turns out to be the most difficult bit, so instead the medium in which the results are stored will first be explained; then we shall see what it is that causes change, i.e. what it is that is learned about the environment.

3. The storage medium
The results of evolutionary learning are stored in the "gene pool" of a species. Over-simplifying, one can initially define this as a list of the possible types of gene that occur in that species, together with the specifications of each variant or "allele" of each type, and the relative frequencies of occurrence of each allele. Oversimplifying again, a gene is a long molecule of DNA that codes for a particular protein, and it is these proteins that, as they are produced or expressed by the decoding mechanism, control the development of an individual animal and are thus responsible for its final adult form, or "phenotype" as it is called.

The two main simplifications that have been made above are, first, to overlook the fact that there are constraints on the subsets of alleles that an individual can possess; the gene pool of a species has an organisation that the above definition ignores. A list of alleles and their frequencies does not provide an adequate description of a species gene pool, just as an alphabet and the letter frequencies do not provide an adequate description of a language. And secondly there are many


genes, especially in more highly organised species like humans, that are expressed as RNA rather than proteins and play important roles in controlling the expression of other genes during development. The first of these is certainly important for understanding evolutionary learning, but let us start with the oversimplified system.

The outline of the learning process is as follows. Each individual of a species has a useable subset of genes from the gene pool (the term "useable subset" will be defined later, and also the way it is derived from the gene pool). From these it develops an individual of that species which grows, reproduces, and dies in the environment it encounters. Some phenotypes will flourish and produce many offspring, while others will not. All contribute their subsets of genes to the gene pool of the next generation, but the frequencies of genes in the pool become biased towards those of the phenotypes that have produced many offspring. Thus the constitution of the gene pool changes so that the subset of genes each individual of the next generation draws from it is more likely to form a phenotype that will flourish in the environment the species encounters.

In this way the constitution of the gene pool is influenced by certain properties of the environment, and thus can be said to learn about it. One can see that the properties learnt about are the ones that determine the success or otherwise of the phenotypes resulting from the various subsets of genes that have been given to the ancestors of the current members of the species. Such properties are hard to define in any other way. They are obviously immensely complicated aspects of the environment, different for every gene, every species, and every environment, and this is why they would not have formed a good starting point for this discussion of evolutionary learning. Evolution learns in a thoroughly empirical way: it just finds out what works best for given sets of available genes in a given environment.
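The frequency-biasing process described above can be sketched as a toy simulation. The two alleles, their fitness values, and the population size below are our own illustrative assumptions, not figures from the report; the point is only that weighting the parents of each generation by reproductive success is enough to shift the gene pool towards what works in the environment.

```python
import random

# Toy sketch of selection acting on a gene pool (illustrative assumptions):
# two alleles 'A' and 'a', where phenotypes carrying 'A' leave on average
# slightly more offspring.

def next_generation(pool, fitness, size):
    """Sample the next generation's gene pool, weighting parents by fitness."""
    weights = [fitness[g] for g in pool]
    return random.choices(pool, weights=weights, k=size)

random.seed(1)
pool = ['A'] * 100 + ['a'] * 900          # 'A' starts rare (10%)
fitness = {'A': 1.05, 'a': 1.00}          # assumed 5% reproductive advantage

for generation in range(200):
    pool = next_generation(pool, fitness, len(pool))

freq_A = pool.count('A') / len(pool)
# after many rounds of selection the pool is strongly biased towards 'A'
print(f"frequency of 'A' after 200 generations: {freq_A:.2f}")
```

Nothing here "knows" why 'A' is better; like evolution, the procedure is thoroughly empirical, discovering only that one variant leaves more descendants than the other.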

4. Sampling the gene pool
I think one can see that the evolutionary mechanism sketched above would work when, as suggested, each new individual of a species is given a new useable subset of genes selected at random from the entire current gene pool of the species. In this way all possible combinations of genes would be eligible for entry into the competition for survival and the variability of the entrants would be maximised, but it neglects two crucial facts. First, each member of the new generation can only obtain its genes from its parents, for there is no means for genes from a non-parent to be transmitted to it; and second, it neglects the structure of the genome of a species that was oversimplified away four paragraphs back - the individuals of a species do not contain a random selection of the species gene pool, weighted in accordance with the frequencies of the genes, for genes do not occur independently of each other: the probability of an individual possessing one gene is in general strongly dependent upon the other genes it possesses.

Consider first what happens in reality in a species that reproduces asexually: here each individual in a generation is given an exact (or nearly exact - see below) copy of the genome of its parent, and the selection process can only operate on the particular subsets of genes possessed by the immediately preceding generation, not on new combinations of the entire species gene pool. But here we urgently need to know about the inexact copying mentioned above, for this provides additional material for selection.

5. Mutations
During the copying processes required to transmit a gene from one generation to the next, mistakes occur that alter the DNA. This introduces new variants of genes and hence increases the variation among the entrants to the survival competition. Note that a) it has to be a slow process


in order for DNA to fulfil its prime role - that of preserving the characteristics of a species from generation to generation; b) the new genes are very rarely improvements on the old genes, in the sense that the resulting phenotypes only very rarely produce more offspring than the unaltered gene would have done. These mutations are important for they are the ultimate source of new examples of each gene type - new alleles - and they are also one source of genetic "drift" - the changes in the genome that occur even when there is no selective advantage accompanying the change. In general the variation between individuals of a generation will depend much more on how their genes have been drawn from the gene pool than upon the copying errors, so the effects of selection will depend primarily upon the former.

At this point we need to understand better what constitutes a "useable subset" of genes, how a new individual draws this subset from the gene pool of a species, and how the genome of a sexually reproducing individual is organised.

6. Organisation of the individual genome
Humans are now thought to have not many more than 40,000 different genes in their genome. For comparison E. coli, the commonest bacterium inhabiting our gut, has about 4000, and many of these code for proteins whose function in the living bacterium is now understood; the corresponding proportion of genes in humans whose function is well understood is much smaller, though growing almost daily. In simple organisms the organisation of the genome is very different from that in sexually reproducing animals, so be warned that the following description is biased in the latter's favour; there are interesting surprises lower down in the scale of complexity, but let's stick to humans.

The figure 40,000 given above actually refers to gene-types, not to the number of molecules of double-stranded DNA that constitute single genes in our oversimplified scheme. For each gene-type humans have (in almost all cases) a pair of genes, one derived from each parent. In most cases these are identical, but many gene-types are polymorphic, that is there are two or more alternative genes, or "alleles", of that type. In these cases the pair will often not be identical, though the molecules of DNA will still be very similar and will code for two proteins that differ from each other in only a few of their amino acids.

The 40,000 gene pairs are divided into 23 groups, one group for each of 23 chromosomes located in the nuclei of every cell in the body. The chromosomes are themselves paired bodies, each pair containing one strand derived from an ovum and the other from a spermatozoon of the parents. The term "useable subset of genes from the species gene pool" simply means a pair of genes of each gene-type, one from each parent, for these form a complete set of the chromosomes characteristic of the species.

7. Sex and the single genome
This set of chromosomes not only causes the formation of a new individual of the species when their genes are expressed in an orderly manner (i.e. usually when they cause the synthesis of their corresponding protein at the correct stage of development), but it also provides the unpaired chromosomes that are present in the ovum and sperm. The nuclei of one sperm and the ovum each contribute an unpaired chromosome to form the paired chromosomes of the fertilised ovum, and it is clearly essential that the gene-types shall be segregated onto the 23 chromosomes in the same way in both parents, for otherwise the chromosomes in the fertilised ovum would often be composed of strings of genes of different types in different orders; they would therefore often fail to provide the "useable subset" of genes referred to above. The manner in which the gene


types are segregated is thus one of the key characteristics of a species, for it is not surprising that the jumbled chromosomes resulting from sperm and ovum that do not segregate their genes according to the same plan very often fail to develop properly.

From the learning point of view the arrangement sketched above has at least three clear advantages over asexual formation of each new individual from a single parent. First, each new individual gets its genes from two individuals from the previous generation, not a single one; since it is important for there to be high variability among the entrants for the survival competition, this is a step in the right direction. Second, accessing the genomes of two individuals of the previous generation enables new combinations of genes to occur, whereas with asexual reproduction this cannot happen. Third, a single copy of each gene provides only a binary representation, for it is either present or absent; with two copies there are four possibilities if the genes are different, since each can be present or absent. But the system has yet another advantageous feature that has not yet been described - that of preserving combinations of genes and enabling them to be passed on, and hence selected for, as a unit.

8. Linkage
If the whole of one strand of each chromosome was identical to a strand in the same chromosome of one of its parents, and the other strand identical to a strand in the same chromosome of the other parent, then all the genes on a strand would be inherited as one unit: if an individual had one of this set of genes, then it would have all of them. Thus for each chromosome pair of an individual in the new generation there would be only four possibilities: S(a)+O(a), S(a)+O(b), S(b)+O(a) and S(b)+O(b), where each strand in a chromosome is arbitrarily labelled a or b, and S and O indicate whether a strand is derived from sperm or ovum. Admittedly with four possibilities for each chromosome pair and 23 chromosome pairs the range of combinations is large (4^23), but it is very much smaller than the range of combinations if the 40,000 gene pairs (as opposed to the 23 chromosome pairs) could each be independently selected from the four possibilities (4^40,000 instead of 4^23). Is it, however, true that evolutionary selection will work best if the range of variation in each new generation is greatest?
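The two ranges of combinations can be checked directly. Since 4^40,000 is far too large to print in full, the short sketch below reports its size by its number of decimal digits:

```python
import math

# Whole-strand inheritance: 4 possibilities per chromosome pair, 23 pairs.
whole_chromosomes = 4 ** 23
print(f"4^23 = {whole_chromosomes:.3e}")   # about 7.0e13

# Fully independent genes: 4 possibilities per gene pair, 40,000 pairs.
# 4^40,000 is astronomically large, so report its size via log10 instead.
digits = 40_000 * math.log10(4)
print(f"4^40,000 has about {digits:.0f} decimal digits")
```

So whole-strand inheritance allows roughly 70 trillion combinations, while fully independent assortment of 40,000 gene pairs would allow a number with over 24,000 digits; the gap between the two motivates the question of how much shuffling is actually advantageous.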

It might be advantageous for there to be limitations to the independent inheritance of characteristics, for one gene might well be useless without another also being present. For instance a gene that determines that the coat of an animal is dappled cannot exert its effect if there are not genes allowing two different skin pigmentations to be expressed, and something complex like the eye requires the simultaneous expression of a large number of genes in order to produce a functional organ. It seems likely, therefore, that evolutionary selection will work best if the pack of genes is not completely shuffled in each new generation, but rather if it could be arranged for groups of genes to be inherited together. If one examines a large number of offspring of the same pair of parents it is in fact observed that some pairs of genetic characteristics almost always occur together in a particular offspring, whilst others appear almost independently of each other. This tendency to occur together is called "linkage". How is it brought about?

9. Crossing over
Although all the genes in one strand of a chromosome come from the same parent, they come from the parent's sperm or ovum, and these strands are not the same as in the parent's other cells. Spermatozoa and ova are produced by a modified form of cell division in which the final daughter cell, either ovum or spermatozoon, contains single stranded chromosomes in place of


the usual paired ones. One might expect these single strands to correspond to one or other of the strands in a normal adult chromosome, but this is not the case. Instead the new single strand is formed by a process called "crossing over" in which both strands break and reform (at exactly the same place if all goes well), but the ends are switched over so that the new strands each have the head of one of the original strands and the tail of the other. This implies that the observed linkage between two genetic characteristics should depend upon how far apart the two genes occur on a chromosome, since this will determine how often a breakage point lies between the two.
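The dependence of linkage on distance can be illustrated with a toy model of our own (not the report's): place loci at positions along a chromosome of unit length, make a single crossover at a uniformly random point per meiosis, and count how often two loci end up with head and tail from different parental strands.

```python
import random

# Toy single-crossover model (illustrative assumptions): two loci recombine
# when the random breakage point falls between their positions.

def recombination_rate(pos1, pos2, trials=100_000):
    """Fraction of meioses in which a uniform random cut separates the loci."""
    lo, hi = min(pos1, pos2), max(pos1, pos2)
    hits = sum(lo < random.random() < hi for _ in range(trials))
    return hits / trials

random.seed(0)
near = recombination_rate(0.40, 0.42)   # close together: rarely separated
far = recombination_rate(0.10, 0.90)    # far apart: frequently separated
print(f"nearby loci:  {near:.3f}")
print(f"distant loci: {far:.3f}")
```

Under this single-breakpoint assumption the recombination rate simply equals the distance between the loci, which is the logic that allows observed linkage to be used for mapping gene positions.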

Notice that the result of crossing over is that each strand in the cells of the offspring corresponds to the single stranded chromosome of the parent's sperm or ova, but is different from the single strands of the chromosomes in all the other cells of the parents. The demonstration of these facts was a fantastic triumph, and perhaps marks the point when genetics became a real science rather than just horticulture or stock breeding.

Placing groups of genes on different chromosomes allows groups of them to tend to be inherited together, while crossing over allows this tendency to be counteracted to a variable extent. The additional versatility in the way that groups of genes are inherited together is clearly a fourth enormous advantage of sexual over asexual reproduction, for it provides a means of passing on from generation to generation the associative structure of an individual's genome, not just the relative gene frequencies. What is passed on in this way can also be selected for, which perhaps accounts for the unexpected effectiveness of evolution.

10. Sexual selection
So far the selection applied to each new generation has been considered as a single process: the degree of success of an individual in one generation is measured by the number of genes (and especially gene combinations, as we saw above) he or she contributes to the gene pool of the species in the next generation. But the number of offspring and their differential survival compared with others of that generation are not the only factors involved, for in many species individuals are choosy about whom they mate with. Selection occurs, and since those who do not find mates contribute nothing to the gene pool of the next generation of their species, this selection process is powerful. It is worth noting that it is applied only to a subset of fertilised ova of the preceding generation - those that successfully develop to adulthood, which is itself a selection process. Two totally different selection processes applied in sequence can clearly do things that a single selection process cannot. It is the selection of mates (by humans) that has allowed the domestication of animals and improvement of crops, and in the human species itself this type of selection (self-domestication, if you like) has no doubt been involved in the formation, preservation, and modification of our social organisation. Without any doubt, the media and entertainment worlds are absolutely correct in attaching the importance they do to the process of mate selection!

11. Conclusion
This sketch of the genetic basis of evolution is certainly incomplete, for no mention has been made of many known sources of genetic variability, of gene repair mechanisms, of jumping genes, and many other details. The whole subject has developed only over the last century, but it has occurred at an ever-increasing rate since the chemical basis (DNA) was established at mid-century. Those familiar with the subject are far from confident that our present knowledge is stable and that further revolutionary discoveries will not overturn much of it. But what surely


cannot change is the fact that there are complex mechanisms causing the gene pool of a species to store vast amounts of information about the statistical structure of the environment it inhabits. There can be hardly anything in a biological organism whose structure and effective function do not depend to a large extent upon the probabilities of environmental events and the contingencies among them, so structure and function must be made to reflect environmental statistics. The genetic mechanisms sketched above certainly provide a first step towards understanding how this is brought about.

Of course there is no suggestion that robot builders should try to imitate the mechanisms described. But they will have to find means of incorporating the information biological organisms have acquired through evolution. So the take-home message of this short biology lesson is that robots having the capacity for probabilistic inference will need much prior knowledge of their environments, and the provision of this in an accessible form may prove to be the hardest part of the whole task; don't underestimate how much prior knowledge of their environments you must give your robots if they are to flourish and be happy in their work.


PART B: The neural handling of probability

12. When and why does the nervous system need to handle probabilities?
Animals are game players. They perform actions with uncertain outcomes in environments about which they have limited information. The process of evolution ensures that their behaviour tends to be well adapted for certain outcomes (survival and reproduction being the most obvious) that may sometimes be treated as payoffs (often probabilistic ones) and as parameters to be optimised.

The most fundamental outcomes, considered in the previous section, are supported by many lower level objectives that must be met in parallel to maintain the integrity of the animal and to optimise performance in achieving other objectives. These are often described in terms of "homeostasis" (the keeping of bodily parameters within functional limits) and the satisfying of needs or "drives" such as hunger, thirst, temperature regulation, mating, sleep (though we don't yet really know much about why that is important), exploration, and even danger. The relation between behaviour and outcomes for these objectives is often probabilistic, with varying probabilities of success in different environments and circumstances that may vary beyond the animal's control. For this reason, and also because the behaviours required to achieve different outcomes often cannot be carried out simultaneously, animals often adopt strategies in which behaviour is directed to just one of the objectives at a time. The selection of which objective to pursue, and the switching between different objectives, may be triggered by internal factors (the degree of need) or external factors (related to probability of success) in a process often known as "drive induction".

Some of the actions and behaviours to satisfy homeostatic objectives can be carried out efficiently in parallel. Many of these are in the domain of homeostatic physiology - maintenance of what Claude Bernard called the "milieu intérieur" - for example, regulation of chemical concentrations within the body, blood pressure and flow, core temperature and the like. These homeostatic systems can proceed in parallel, and are mostly based on explicitly relevant (i.e. not probabilistic) proprioceptive input signals. The outcomes of corrective actions are sometimes unpredictable in magnitude because of the complexity of the network of relevant factors, but the general control principle that seems to handle this uncertainty is negative feedback. There may not be any situations in which principles of probabilistic inference have been suggested as relevant to such control systems. A key factor that allows negative feedback to be adequate is the fact that corrective outcomes are generally predictable in sign, regardless of other factors, even if not in magnitude [1]. These homeostatic mechanisms are not of particular interest in our probabilistic context.
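Why sign-predictability is enough can be shown with a minimal sketch of our own (the variable, set point, and gain values are arbitrary assumptions): even when each correction's magnitude fluctuates unpredictably, the regulated variable still converges, because the sign of the correction always opposes the error.

```python
import random

# Toy negative-feedback loop (assumed values, not from the report): the
# corrective effect's magnitude varies unpredictably (0.5x to 1.5x), but
# its sign always opposes the error.

def regulate(value, set_point, steps=50, gain=0.3):
    random.seed(42)
    for _ in range(steps):
        error = set_point - value
        value += gain * error * random.uniform(0.5, 1.5)
    return value

# a regulated variable drifting back to its set point from a disturbance
print(regulate(value=33.0, set_point=37.0))
```

Because each step multiplies the error by a factor strictly between 0 and 1, no probabilistic inference about the disturbance is needed; this is exactly what fails when the sign itself is uncertain, as in the retinal-focus example of the footnote.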

Some homeostatic systems involve behaviour as well as internal physiological functions; for example, a behavioural thirst drive is induced by a renal hormone (angiotensin) in addition to its effects on excretion. There are also many behavioural neural reflexes that are essentially homeostatic in nature, and proceed simultaneously and independently in a more or less automatic fashion. Some of these are dependent on inference from information that is only a probabilistic guide to the outcome of actions. For example righting reflexes help to maintain a stable posture

[1] An interesting possible exception might be the control of focus of the image on the retina, where the direction of change necessary to correct a blurred image is not predictable simply from the nature of the image.

in the face of unpredictable or only probabilistically predictable forces, and eye-movement reflexes help to stabilise visual images in the presence of somatic, vestibular and visual cues that may be conflicting or uncertain. Even though these reflex actions are thought of as automatic, in the sense that they occur simultaneously and independently and without any act of volition, many of them are learned or subject to adaptation to suit new circumstances. For example, the gain of the vestibulo-ocular reflex (stabilising the retinal image when the head rotates) can adjust to the different needs when looking at objects at different distances or through lenses that change the magnification and movements of an image. These are probably examples of reflexes that do not just use feedback to set their parameters but that (through learning) come to adjust parameters on the basis of information that alters the probability distribution for the optimal parameter.
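The gain adjustment just described can be caricatured with a simple error-correcting update: each residual retinal slip after a head movement nudges the gain toward the value that stabilises the image. The function name, the learning rule and all the numbers are invented for the sketch; this is not a claim about the actual cerebellar mechanism:

```python
import random

def adapt_vor_gain(gain, magnification, lr=0.1, trials=300):
    """Toy reflex-gain adaptation: retinal slip left over after each head
    movement drives the gain toward the image-stabilising value."""
    for _ in range(trials):
        head = random.uniform(-1.0, 1.0)           # head rotation velocity
        slip = head * magnification - head * gain  # residual image motion
        gain += lr * slip * head                   # error-correcting update
    return gain

random.seed(1)
# Through 2x magnifying lenses the optimal gain roughly doubles:
adapted = adapt_vor_gain(gain=1.0, magnification=2.0)
```

The point of the sketch is that the learned parameter is set not by instantaneous feedback but by accumulated evidence about which gain is most probably correct in the current circumstances.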

13. Fraud: a biological invention

The view of animals as game players, acting in the light of incomplete or uncertain information so as to maximise the probability of desired outcomes, can take an extra twist. Whether in BIBA we want to pursue this twist will be an interesting issue. The problems of probabilistic inference arise most directly in simple games in which the animal (or robot) plays against the world. These inferences are not necessarily easy: it is hard to keep upright in a wind, find blueberries on Mont Blanc, avoid treacherous swamps, or get home after an expedition. These are all situations in which an animal is dealing with incomplete information, making inferences and taking actions in a probabilistic environment; but it is essentially a passive probabilistic environment. Animals also deal with environments that are in a sense active.

By an active environment is meant one in which signals from the environment and the consequences of actions within it are not all determined by causes and probabilistic interactions that can in principle be learned about and incorporated in a probabilistic model. The critical element that makes this impossible is, it seems, the existence of biological systems. Biological systems have arisen through evolutionary adaptation to environments in which significant selective pressures often result from interpretations and responses that other animals make to the characteristics of an individual - the signals that a plant or animal emits. This can lead to the interesting statistical concept of deception or fraud. Fraud is a better term, because even in a passive environment signals can be deceptive - as when a seemingly good way down a mountain turns out not to be good. Any probabilistic inference will sometimes lead to non-optimal decisions, either through bad luck or a poor model, without there being active deception or fraud. What plants and animals do differently from rocks is to generate signals for which the generative causes may - either in the past or in the present - include the inferences with which animals react.

Mimicry is an obvious example of fraud, where a plant (or animal) has evolved with characteristics that resemble other plants or animals, so as to attract or repel the attentions of certain animals. An orchid is scarcely capable of unethical behaviour, but it certainly commits fraud when it adopts the appearance of an insect's mate in order to increase the probability of itself becoming fertilised. The ultimate sophistication of this ability of biological systems to gain through fraud is seen in the confidence trickster, the poker player and even the footballer. The mathematics of game theory in such contexts involves vicious circles of causality: the models governing (a) the choice of signal and (b) the inferences derived by a different organism from the signal are each functions of the other. This can be rather intractable. Nevertheless, the optimisation of inference and behaviour in such situations is important for success, perhaps particularly of predatory animals, and is much involved in 'play' behaviour in young animals of

many species and in competition for status (for example, for the right to mate in some species). It is a key element in many elaborate human games and in story-telling (notably detective fiction).

14. Behaviour depending directly on probability: attention, orientation and habituation

Animals do not respond in the same way to stimuli or events every time they happen. Repetition often induces habituation (a decline in successive responses, possibly but not necessarily to zero) while novelty or occurrence of a rare stimulus can induce an alerting and orienting response, in which many aspects of behaviour may be altered so as to improve the animal's information gathering related to the stimulus. Orienting responses usually habituate rapidly unless there is some form of association with danger or risk, or the stimulus results in pain. Habituation can be context dependent, so is not simply related to the absolute probability of experiencing a particular type of stimulus. Seeing a leaf on the ground tends to elicit no response (it is not "noticed", and will probably not be remembered), while seeing a leaf on the carpet of a well kept house may induce astonishment, attention, remedial action and memory. A habituated response may be restored through what is called "dishabituation" after experience of another unusual stimulus. Thus repetition of a noise in the vicinity of a prey animal may lead to habituation of the orienting response, but the response may be restored if a different kind of unusual noise is presented. In the nervous system of Aplysia (a sea slug), habituation (of the withdrawal of gill structures in response to a squirt of water) and dishabituation (produced by an electric shock) have each been shown to have a simple basis in separate influences on the biophysics of a particular synapse. In this kind of situation the phenomena can be seen as rather automatic ways in which an animal achieves economy of action. But the process of identifying unusual experiences and the overcoming of habituation can be dependent on high-level cognitive processes monitoring combinations of stimuli - as for example in the alert scientist who notices a small departure from regular occurrences (e.g. the discovery of penicillin).

The orienting response to novel (low probability) stimuli has two different elements to it, often lumped together as "paying attention". The first is general arousal or alerting, and the second is selective attention - the direction of sensory and cognitive systems to the acquisition of particular kinds of information and particular kinds of processing. A "distractor" stimulus may actually reduce an animal's ability to act appropriately in response to incoming information if selective attention is directed as a result away from important but different stimuli - a technique certainly employed by conjurors, and perhaps by animals, though examples don't readily come to mind. The means by which attention is selective is straightforward in some instances, particularly those involving peripheral sensory apparatus, where the eyes or (in some species) ears may literally orient towards a stimulus. In cognitive processing also it can sometimes be easily understood, as when a priming stimulus may bias selection of the interpretation of an ambiguous picture or the meaning assigned to an ambiguous sentence. But in between there are a lot of things about selective attention that are not well understood, for example what is happening when we consciously attend to sensations from a finger, or a peripheral part of our visual field, or to a particular object in the environment.

15. Adaptation and the suppression of responses to uniform stimuli

Adaptation and lateral inhibition are phenomena that bear a superficial similarity to habituation, but are not really describable in the same terms. In adaptation, a stimulus that begins at a specific time and then continues steadily may elicit just a transient response, or a response that declines from its initial level as the stimulus is maintained. Typically there may be an "off response" when a maintained stimulus like this eventually terminates. Lateral inhibition is a similar

phenomenon in the spatial domain, in which a stimulus that is uniform over part of a sensory surface (like the skin or the retina) conveys signals to the nervous system mainly or only at its edges (or not at all if the stimulus covers the whole sensory surface).

Adaptation is familiar in many sensory modalities, particularly smell, skin sensation and vision. We can cease to be aware of stimuli (particularly smells and touch) that are maintained steadily. It isn't so obvious with vision, because we continue to be aware of a bright stimulus: in general it has contrast features that lead our receptors continually to receive varying stimulation as we move our eyes. But stabilisation of a retinal image, or a featureless stimulus ("ganzfeld"), reveals that sensation fades with adaptation, and photographers are well aware how difficult it is to judge ambient light levels without using an exposure meter. In fact it turns out that with adaptation the nervous system ceases to receive information (or receives very little compared with onset) about maintained stimuli. The basis of adaptation is generally in, or closely related to, the sensory receptors. It often plays a clearly beneficial role in the manner of an automatic gain control on a TV camera or sound system: it ensures that signals tend to remain within the dynamic range that can be handled by neurons without causing saturation or bottoming of the responses.

A corollary of adaptation is "accommodation", whereby a stimulus that builds up very gradually may never be detected though it would be well above threshold if it had come on suddenly. This is familiar in olfaction. Unlike adaptation, the phenomenon does not confer any obvious benefit as a sensory mechanism.

Lateral inhibition is less conspicuous than adaptation to casual observation, but is well established by experiment in many sensory systems. It is responsible for some visual illusions in which a combination of luminance or colour boundaries can fool us into making incorrect judgements about the relative brightness or colour of uniform areas. It is evident from these that though we have good information about edges and textures in the visual field, we have rather poor information about uniform zones. But interestingly (and unlike with adaptation), we do not perceive things the way they are conveyed to the nervous system. For example, a uniform circle of brightness on our retina is perceived as such, not as a circular boundary full of information with little difference between its middle and the outside. Of course statements like this about perception are subjective and not verifiable by someone else - but people do tend to agree. What seems to happen is that the boundary provides evidence of a step change in luminance at the edge of the circle. Within the circle there is no evidence of a further change, so the increased luminance adjacent to the boundary at the edge is inferred to apply here as well. This could be seen as a sort of mathematical integration - reversing what is essentially a spatial differentiation of the image due to lateral inhibition. This fits with the poor judgement about actual levels of uniform luminance that visual illusions demonstrate, because it is well known that differentiation followed by integration is prone to DC and low-frequency errors due to accumulation of noise.
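The differentiation-and-integration account can be illustrated with a toy one-dimensional "image" (the array and its values are invented for the illustration, not taken from any physiological data):

```python
# Toy 1-D "retina": lateral inhibition modelled as spatial differencing,
# perception modelled as re-integration of the edge signal.
signal = [1.0] * 10 + [3.0] * 10 + [1.0] * 10   # bright bar on background

# Lateral inhibition transmits only the changes (the edges):
edges = [signal[i] - signal[i - 1] for i in range(1, len(signal))]

# Re-integration recovers the shape of the bar, but only up to an
# unknown constant (the DC level is lost), and any noise added to the
# edge signal would accumulate as a drifting error:
recovered = []
total = 0.0
for e in edges:
    total += e
    recovered.append(total)
```

The recovered profile has the correct 2.0-unit step at each edge, but its absolute level is arbitrary - which matches the observation that we judge contrasts well and absolute uniform luminances poorly.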

Why do adaptation and lateral inhibition occur? One notion (due originally to Barlow2) is that the situations where these phenomena operate are ones in which the raw sensory information is highly redundant, and the transformation reduces the redundancy - in a sense compressing the signal. Thus a steady stimulus is not repeatedly or continuously signalled by neural activity, and

2 For a recent discussion see Barlow HB (2001) Redundancy reduction revisited. Network: Comput. Neural Sys. 12: 241-253.

a spatially uniform signal is not duplicated everywhere it is uniform. It is not wholly clear, however, in what way this concept of redundancy reduction corresponds to a genuine or useful compaction or economy in transmission of signals. The same amount of information (at least approximately) is certainly transmitted with fewer action potentials, but the capacity of the channels remains the same. There are at least two ways in which the use of fewer action potentials might directly confer an advantage. Firstly, action potentials, where and when they occur, often serve to initiate neural processing (as when attention is directed as a result of the few action potentials set off by an insect landing on our skin). This introduces a marked asymmetry between information that is conveyed by the presence or by the absence of action potentials. Secondly, reducing the number of action potentials used to convey information saves metabolic energy, which may be worthwhile. However, the low average rates of action potential transmission resulting from adaptation and lateral inhibition are in a sense a highly redundant way of employing channels with potentially high information transmission rates, with most of them silent most of the time. We shall see later (Section 18) that such redundancy can be desirable for the efficient use of distributed representations in probabilistic contexts, and that - perhaps because of this - the information capacity and redundancy within the visual system actually increase dramatically in successive stages of information processing, where one might have imagined that selective discarding of information would permit information handling with fewer neurons.

A different possible interpretation of these phenomena emerges in the light of probabilistic inference. A uniform patch of light looks uniform and different from its surround, despite the demonstrably different pattern of incoming sensory signals. This is similar to what happens in the phenomenon of 'completion' of the perception of stimuli that overlap the retinal 'blind spot' (the site of the optic nerve head, where there are no receptors). Suppose we look at a scene with one eye. We have no sensory information about parts of an image falling on the blind spot, yet we do not perceive these parts of the image as absent or dark. What we have from this part of the image is an absence of evidence, not evidence of nothing. So, in the manner of Bayesian logic, the image here is interpreted on the basis of evidence from elsewhere about what it is likely to contain. If the blind spot is surrounded by a uniform field, we perceive a uniform field within the blind spot. If it is surrounded by an image of patterned wallpaper, we see patterned wallpaper within the blind spot. If someone's head falls on the blind spot we do not see a headless person, but infer (on the balance of probabilities) that the head we saw shortly before on a different part of the retina is still attached. This all happens so effortlessly that it is easy to overlook what a feat of probabilistic inference it is. Even more remarkable is that it isn't simply the result of long-accustomed adaptation to the anatomical blind spot. In migraine attacks, people often experience blind spots ('scotomata') that cause slowly changing parts of the visual field - often quite large ones - to be completely blind. These are regions where there is no vision with either eye (because the cause is a cortical pathology). Yet the subject is often almost unaware of the scotoma - reporting positive symptoms like a flickering sensation at the edge of the scotoma but not the localised blindness.
It takes a considerable effort of discipline and experimentation to establish that nothing can be seen within the scotoma, and how extensive it is. Again, it seems that areas of the field for which there is no information are subject to the sort of completion that occurs with the blind spot, and the subject may report no more than the fact that their vision is strange, or (when the scotoma is to the right of or includes the fovea) that they are having difficulty reading - a task in which you need to know what the image is and can't manage on inference.
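The logic of this kind of completion - filling a region that carries no evidence with whatever is most probable given its surround - can be caricatured in a few lines. This is a deliberately crude sketch of the inference, in no sense a model of the visual system; the "field" and its labels are invented:

```python
from collections import Counter

# A toy visual field: None marks the blind-spot region, where there is
# an absence of evidence rather than evidence of absence.
field = ["wallpaper", "wallpaper", None, "wallpaper", "wallpaper"]

# Completion: fill the gap with the most probable content given the
# evidence available from the surround.
surround = [v for v in field if v is not None]
most_probable = Counter(surround).most_common(1)[0][0]
completed = [v if v is not None else most_probable for v in field]
```

The essential point is that the gap is not rendered as "dark" or "missing": it inherits, on the balance of probabilities, the interpretation best supported by the neighbouring evidence.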

16. Uncertainty and confidence: the expression of probability

Behaviour consists of a set of actions, each of which can be seen as the output of a decision process, usually based on a hierarchy of facts and inferences about the situation. The outcome of decision processes can often be graded in terms of the uncertainty or definiteness of the conclusion. In general this is a tricky concept. In part it is related to the statistical reliability both of the information relevant to the decision and of the processing of this information. In part it is related to risk and benefit: the relative payoffs and their probability distributions resulting from the alternative decisions that might have been arrived at, given the uncertainty of knowledge about the world that the decision interacts with. Furthermore, the uncertainty associated with different decisions for action is often strongly linked, because actions need to be coordinated in functional patterns and not arrived at through independent decisions. When you decide whether to jump left or right, you must jump with both legs together, not hedge your bets. What's more, uncertainty may often need to be accompanied by vigorous implementation of whatever decision is made. Shifting the example slightly, if you are unsure whether a jump will succeed, it is probably especially important that the jump be strong. At a higher level, involving coordination of activities between animals as well as in each individual's behaviour, it is again important that decisions for actions be coordinated, so it may be advantageous to hide any signs of uncertainty about decisions from others in a group - a familiar concept when leadership is required in the face of risk. All of these complications make the biology of uncertainty and its manipulation and expression highly complex.

In humans it is straightforward to ask for expressions of confidence or uncertainty about decisions. The psychological literature has been beset, however, with concerns about whether the data thus collected are determined more by individual propensity to admit to uncertainty than by the actual existence of uncertainty. This is puzzling, because my own (ARGM's) experience with eliciting confidence judgements about the answers to questions in an educational context3 shows that, with a proper scoring scheme of payoffs to provide maximum benefit to the subject from honest reporting of uncertainty (in terms of their subjective probability that an answer is correct), students very straightforwardly make well calibrated distinctions between when a decision is soundly based and when not. In a paper about to be published4, Smith et al. have reviewed literature on expression of uncertainty in both humans and animals for decisions about sensory discriminations. Humans, monkeys and dolphins generally behave very similarly when given 3-choice response options (X, Y or uncertain), with "uncertain" being most likely when parameters are such that errors are most frequent when definite categorisation responses are chosen. But some individual animals (and humans) do behave differently, being regarded as having different propensities to express uncertainty, though it is not clear that the differences are not alternatively attributable to different values placed on outcomes. The interpretation of such results as revealing inclinations rather than skills seems interestingly associated with a language for describing the judgements about uncertainty as being "meta-

3 See www.ucl.ac.uk/lapt . The proper scoring scheme requires confidence judgements on a 3-point scale (C=1, 2 or 3) with marks of 1, 2 or 3 awarded accordingly if the answer is correct, and marks of 0, -2 or -6 awarded if the answer is wrong. Appropriate use of the 3 levels for maximum expectation of score is for estimated probabilities of being correct below 67% (C=1), between 67% and 80% (C=2), and above 80% (C=3).

4 Smith JD, Shields WE, Washburn DA (2002) The comparative psychology of uncertainty monitoring and metacognition. www.bbsonline.org/Preprints/Smith/Referees/

cognition" - as if the uncertainty judgement arises from a form of inspection of the process of decision making from outside the process itself, rather than arising as part of the process. This seems very odd. It is as if the uncertainty associated with a decision by the European Bank were not somehow evident to the bankers themselves, but had to be ascertained by external observers. We may be exaggerating the implications of the psychological language of "meta-cognition", but it seems to go against a central BIBA concept: that probability, uncertainty, or degree of belief should run right through a satisfactory probabilistic decision-making process and eventually be reflected somehow in the degree of confidence in decisions for action.

It is not at all clear yet how probability and uncertainty may thread effectively through the processes of neural computation. A concept that may be crucial in this is the stability of the result of an inferential process to changes of parameters used in the computation. This is quite analogous to the way uncertainty can be handled in intellectual decision-making processes: "Suppose we gave greater or lesser weight to particular uncertain issues, or reversed some assumptions; would it alter our conclusion?". In a neural context, simply altering neural thresholds (for example, with fluctuating levels of widespread inhibition) may establish whether an output is robust or sensitive to the thresholding of the intermediate inferences that affect the final output. A confident conclusion, or a correctly learned pattern of output, should be relatively unaffected by variations in such parameters.
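The idea that confidence could be read off from the stability of a conclusion under parameter perturbation can be sketched with a toy decision unit. The function names and numbers are invented for the illustration; the jittered threshold stands in, very loosely, for fluctuating widespread inhibition:

```python
import random

def decision(evidence, threshold):
    """Toy inference: conclude 'yes' if summed evidence clears a threshold."""
    return sum(evidence) > threshold

def robustness(evidence, threshold, jitter=0.5, n=200):
    """Estimate confidence as the fraction of threshold perturbations
    under which the original conclusion survives."""
    random.seed(0)
    base = decision(evidence, threshold)
    agree = sum(
        decision(evidence, threshold + random.uniform(-jitter, jitter)) == base
        for _ in range(n))
    return agree / n

strong = robustness([0.9, 0.8, 1.1], threshold=1.0)    # evidence well clear
marginal = robustness([0.4, 0.35, 0.3], threshold=1.0) # evidence borderline
```

A conclusion far from the threshold survives every perturbation (robustness 1.0), while a marginal one flips on roughly half of them - a graded quantity that could serve as a confidence signal without any separate "meta" process.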

The educational work carried out by one of us (ARGM) on the value of confidence judgements was begun in order to encourage students to test their conclusions in a somewhat similar manner: to reflect on what their knowledge is based on, and on how they can check their conclusions and test whether these stand up to different ways of looking at a problem. The aim is to help establish internal networks of knowledge and inference in which items are linked in ways that may not come to mind without the pressure of an additional incentive to test the robustness of a conclusion. It is evident from experience of insisting on confidence judgements together with answers that students do make good discriminations between decisions that they generate with different degrees of reliability. It is also clear that the extra information provided through confidence judgements enhances the statistical reliability of the data from answers to questions under exam conditions very markedly, allowing an economy of 50% in the number of questions required for a given reliability of assessment. An additional tentative conclusion - though no attempt has yet been made to test this - is that an initial confidence judgement arises very directly and immediately from the process by which the students generate an answer: there is a gut feeling about whether the answer is reliable, and the judgement does not seem to require a separate process of meta-analysis. Nevertheless, in line with the primary educational aim, the need to try to turn an unconfident answer into a confident one (to gain more marks), and the importance of being sure that confidence is not risking a penalty through oversight or misinterpretation of the question, both encourage reflection and additional processing of the issues.
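The incentive structure of the mark scheme described in footnote 3 can be checked directly: expected score is maximised by honest reporting, with crossover probabilities at 2/3 and 4/5 (the ~67% and 80% figures). A few lines of arithmetic confirm this:

```python
# Mark scheme from footnote 3: correct answers earn 1/2/3 marks at
# confidence C=1/2/3; wrong answers lose 0/2/6 marks respectively.
def expected_score(p, c):
    """Expected mark when the subjective probability of being correct is p
    and the declared confidence level is c."""
    reward = {1: 1, 2: 2, 3: 3}[c]
    penalty = {1: 0, 2: -2, 3: -6}[c]
    return p * reward + (1 - p) * penalty

def best_confidence(p):
    return max((1, 2, 3), key=lambda c: expected_score(p, c))
```

Trying a few probabilities shows the honest strategy: `best_confidence(0.6)` is 1, `best_confidence(0.75)` is 2, and `best_confidence(0.9)` is 3, with the C=1/C=2 break-even exactly at p = 2/3 and the C=2/C=3 break-even at p = 4/5.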

17. Clarification of relevant terms and concepts in neural systems

The term neural system will be used to refer to a set of neurons under consideration, which may be part of a neural complex comprising all the neural systems within an organism. The environment of a neural system is everything outside it (including any other neural systems in a complex). A scene is the set of inputs from the environment (including other neural systems) impinging on a neural system. The scene consists of information determined by (i) the state of the external world (the sensory scene), (ii) the internal state of the animal itself (the proprioceptive scene), and (iii) the state of other neural systems (the neural scene) in a complex.

Such inputs are essentially facts: they are time-dependent state vectors of the input connections. As such, there is no issue of graded probabilities (P) directly associated with scenes and their history: facts have P=1.

Estimates of probabilities (for example about predictions or inferences, or hypotheses in general) and probabilistic functions (such as weights of evidence or likelihood ratios) may be inferred from the facts in the history of a scene. What a neural system stores, based on its history, may sometimes be close to the facts of a scene (as in episodic memory, e.g. where you were on the day of the World Trade Centre attack), or it may be pure inferences (as in procedural memory - a set of conditioned reflexes or rules for actions that achieve objectives when, for example, riding a bicycle - without necessarily any retention of the facts and circumstances that developed these skills). It may store what are essentially simple probabilities (e.g. gulls are common, eagles are rare) or estimates of degrees of association (e.g. Irish girls are likely to be reliable). Stored probabilities may sometimes be distinguishable as one of two distinct classes. First, there are estimates of the parameters in a stochastic model of the environment (e.g. food sources may be distributed to some extent at random - as indeed may be the traits of Irish girls). Secondly, there are probabilities of a kind more properly described as degrees of belief, for example attached to inferences about the environment (e.g. whether a visible object is a predator) or about the neural system itself (e.g. whether a memory has been correctly recalled). The distinction may sometimes be subtle or uncertain, but it is sometimes clear in principle, especially in analysis of human thought.

The retention of information through physiological after-effects of activity may occur on many different time scales. The mechanisms and sites of change are often not well understood. These after-effects include brief physiological after-effects of stimulation, like excitation, inhibition, changes of membrane potential (Vm), persisting transmitter release, facilitation, depletion, potentiation, fatigue, adaptation, etc. It doesn't seem appropriate to go into details here. Then there are longer-term or permanent changes described as conditioning, habituation, long-term potentiation (LTP), short-term memory (a term sometimes used on time scales from seconds to days), long-term memory, sensitisation, etc.

18. How can probabilities be estimated and stored?

Probability estimates may come either from observation or from inference. The problem of estimating probabilities for inferences and for the outcomes of actions is a central one in neural computation. But there are also clear problems even in the estimation of probabilities from frequencies of occurrence. The counting of occurrences, in the sense of establishing whether and how often they have been experienced, is a crucial prerequisite for all learning. But the form of representation of experiences can limit how accurately this can be achieved. This has been the subject of recent work5, the conclusions of which will be outlined briefly.

If there is a cell within a neural system that fires in one-to-one relation to an event that is to be counted (i.e. if that event is directly represented within the system), then there is no difficulty in seeing how physiological mechanisms within such a cell could generate an accurate measure of the event frequency, averaged over any period of time. However, there is a problem when the event corresponds to a pattern on a set of neurons (i.e. it has a distributed representation). In a

5 Gardner-Medwin AR & Barlow HB (2001). The limits of counting accuracy in distributed neural representations. Neural Computation 13: 477-504.

distributed representation a particular event causes a pattern of activity in several cells, but there is generally no unique element in the system that signals when the particular event occurs and that does not signal at other times. The interference that results from this overlap in distributed representations can be dealt with, for the purpose of counting, in two ways: (1) cells and connections can be devoted to identifying directly, in a one-to-one manner, when the patterns occur, i.e. direct representations can be generated, or (2) the interference can be accepted and the frequency of occurrence of the distributed patterns estimated from the frequency of use of their individual active elements. The second procedure increases the uncertainty and variance of estimated counts, and even if these are statistically unbiased, the speed and reliability with which estimates of probabilities and associations may be inferred is impaired, and learning is rendered less efficient.
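The interference introduced by the second procedure can be made concrete with a deliberately tiny toy (the patterns, event counts, and the min-based estimator are all invented for the illustration; the proper statistical treatment is in the cited paper5):

```python
# Three events A, B, C, each represented by a pattern over 4 cells.
# Every cell of A is shared with some other pattern, so per-cell usage
# counts cannot isolate A's occurrences.
patterns = {"A": {0, 1}, "B": {0, 2}, "C": {1, 3}}
events = ["A"] * 30 + ["B"] * 10 + ["C"] * 5

usage = [0] * 4                      # distributed store: per-cell counts only
direct = {"A": 0, "B": 0, "C": 0}    # direct store: one counter per event
for ev in events:
    for cell in patterns[ev]:
        usage[cell] += 1
    direct[ev] += 1

# A crude distributed estimate: A occurred at most as often as its
# least-used cell fired. Overlap with B and C biases this upward.
estimate_A = min(usage[cell] for cell in patterns["A"])
```

The direct counter records A's true count (30), while the usage-based estimate comes out at 35: the occurrences of B and C leak into the cells that A shares with them, which is exactly the added uncertainty and bias that makes learning from distributed counts less efficient unless the representation is made redundant.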

Since distributed representations for events of importance are in other respects a desirable (and probably necessary) feature of neural functioning, largely because they allow very many more events to be distinguished than there are cells, the constraints on the efficient estimation of frequencies and probabilities with distributed representations are of considerable interest. In our analysis5 we concluded that compact distributed representations (i.e. ones with little redundancy) do enormously reduce the efficiency of counting and would slow down reliable learning, but this is not the case if they are redundant, with many more cells than are required simply for representation. In fact, simple models for neural counting require representations to be sufficiently redundant that the total number of cells required in a neural system to accumulate and store at one time efficient estimates for the frequencies of N different events must be of the order of N or more.

This conclusion might naively be thought to support the view that direct rather than distributed representation is indeed the best strategy for representing events whose probabilities and associations need to be learned about. But looking at the conclusion from a different perspective, it shows that the prime combinatorial merit of distributed representations (that they can represent up to something of the order of 2^Z distinct and unforeseen but possible events on Z cells) can be retained with little reduction of the efficiency of probability estimates, provided only that the number of the distinct potential events actually experienced by the system remains relatively small (of the order of the number that could have been assigned direct representations with the same number, Z, of cells).

The implications of this and associated results generate interesting new perspectives on issues such as the expanding anatomical redundancy of the cortical processing of visual inputs, selective attention (which may enhance the number of cells involved in the representation of selected events that are rare, novel, or likely to be important) and habituation (reducing activity for common and unimportant events). The relative advantages of direct and distributed representations are interesting. Direct representations (or at least representations in separate non-interfering neural systems) are of course necessary if different events within a scene need to be acted on, or represented, simultaneously. Distributed representations that involve distinct patterns of activity on the same population of cells can only exist one at a time, so if the corresponding events are present simultaneously in the environment or scene presented to a neural system, they must be processed serially, one at a time. To take advantage of the flexibility of distributed representations to handle any of a vast number of potential novel and unforeseen events in a single neural system, this serial processing (somewhat akin to the stream of conscious or focussed awareness that we have for one thing at a time) is necessary. Once it has been established that a particular kind of event is worth representing directly, for automatic reaction or to avoid monopolising the system for flexible representation, then this may be set up through the establishing of new appropriately tailored neural mechanisms. This kind of transfer and the establishment of suitable new sites of representation has speculatively been proposed as one of the possible functions carried out during sleep6.

Within the structure of a system for distributed representation, there are of course principles that can enhance its efficiency by reducing the problems that arise from interference between events with overlapping representations. First amongst these is that events with similar representations (strong overlap) should ideally have similar implications for inference. It is a strong characteristic of neural systems and animals that things learned for one pattern of sensory stimulation will tend to be generalised to similar patterns. This can of course be valuable where significant objects or situations in the environment present with similar but not identical characteristics. But it is crucial that the similarity have the right metric. Even an identical visual pattern presented on different parts of the retina may have no overlap in its representation at retinal level, and wholly different patterns may have substantial overlap. Hence there is great value in processing systems that convert representations to a form that, for some purposes and perhaps particularly for learning and probabilistic inference, is relatively independent of such characteristics as position, size, motion, luminance, contrast, orientation, etc. This is not to say that this information should be discarded, but rather that some of these characteristics should be separately represented in a broadly modular system. It will often be the case for example that the direction and speed of movement in a part of the visual field may have significance almost independent of the colour, size or nature of what is moving. It is a characteristic of the mammalian cortex that many such aspects of a sensory scene are somewhat separately analysed in anatomically distinct (modular) areas of the cortex. This may assist the generalisation of learned responses to stimuli that are similar on appropriate metrics. A major research challenge is how it simultaneously allows the properties of individual objects in a scene to be linked appropriately (the so-called binding problem) and not confused.

While it is a considerable computational problem to establish representations in which the overlap of active cells in different events corresponds well to a similarity of the significance of the events for inference, it is also important to deal with a converse problem. This is that similar stimuli (for example plants, animals, sounds or situations that seem alike) may have very different inferential significance, and it may be disadvantageous to confuse them. The overlap renders estimates of probability or associations based on counts of occurrences or associations of active cells subject to potentially strong and damaging interference. It is important to enhance or retain (from short term memory) the differences between representations of similar but importantly distinct events, and it is an interesting computational problem how this can be achieved. Possible strategies include developing new feature detectors for the small differences and changing the basis set for representations of often repeated stimuli (probably the means by which sheep or people of a different race eventually cease to all look alike). Where consolidation of clear short term memories to more robust but confusible long term memories is taking place, differences between distinct but similar patterns may be retained and enhanced through selective consolidation of distinctive features – another process that has been postulated to require sleep conditions to occur efficiently.

6 Marr D (1970). A theory for cerebral neocortex. Proc. R. Soc. Lond. B 176:161-234

19. Weights of evidence: the handling of probabilistic inference by addition

The archetypal way in which information from many different sources is combined on neurons is by summation on dendrites of the synaptic currents due to active synapses. The biophysical characteristics of dendrites may lead to non-linear processing such as thresholding or saturation of the resulting signals, and inhibition can effectively result in either subtraction or division. But addition of excitatory influences would seem to be the principal likely tool available for the combination of large numbers of channels of evidence. This makes the logarithmic formulation of Bayes' rule for combining evidence attractive, in which a posterior belief function (log odds, or ln(p/(1-p))) is calculated by summing the prior belief with weights of evidence7 (equivalent to log likelihood ratios).

Derivation: Applying Bayes' rule to the progression of both P(H) and P(~H) (~H = not-H) we have:

P(H|E) = P(H) P(E|H) / P(E)   and   P(~H|E) = P(~H) P(E|~H) / P(E)

Defining belief in H: B(H) = ln(P(H) / (1 - P(H)))
and weight of evidence for H due to evidence E: W(H|E) = ln(P(E|H) / P(E|~H))

we get: B(H|E) = ln(P(H|E) / P(~H|E)) = B(H) + W(H|E)

Provided different pieces of evidence Ei are conditionally independent given both H and ~H, this can be extended to the summation of many weights:

B(H|{Ei}) = B(H) + Σi W(H|Ei)
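The summation rule can be checked numerically with a short sketch. The prior and the likelihood pairs below are made-up numbers; the cross-check at the end confirms that summing weights of evidence in log-odds space gives exactly the same posterior as multiplying odds by likelihood ratios directly:

```python
import math

def belief(p):
    """B(H) = ln(P(H) / (1 - P(H))), the log odds."""
    return math.log(p / (1.0 - p))

def prob(b):
    """Invert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-b))

def weight(p_e_h, p_e_not_h):
    """W(H|E) = ln(P(E|H) / P(E|~H)), the weight of evidence."""
    return math.log(p_e_h / p_e_not_h)

# Prior P(H) = 0.1, then three conditionally independent pieces of
# evidence, each given as (P(Ei|H), P(Ei|~H)).
b = belief(0.1)
for p1, p0 in [(0.8, 0.3), (0.6, 0.2), (0.9, 0.5)]:
    b += weight(p1, p0)

# Cross-check against multiplying odds directly via Bayes' rule.
odds = (0.1 / 0.9) * (0.8 / 0.3) * (0.6 / 0.2) * (0.9 / 0.5)
print(round(prob(b), 4), round(odds / (1 + odds), 4))
```

Both routes give the same posterior (about 0.615 with these numbers); the attraction of the log-odds form is that the combination step is pure addition, the operation dendritic summation most plausibly provides.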

If we look on the ‘decision’ as to whether a postsynaptic cell C should fire as an instance of probabilistic neural inference at its simplest, based on ‘evidence’ afforded by a set of active afferent inputs {Ai}, then several interesting questions are raised.

1) Is it appropriate that the weight of evidence afforded by activity in Ai for activity in C should be based simply on the conjoint history of firing of Ai and C?

2) Are important features of the principles of this computation retained with a simplified model in which activities in {Ai} and C are treated as binary?

3) To what extent do the {Ai} provide independent evidence for activity in C?

4) Do physiological mechanisms exist that can compute the weight of influence of Ai on C to approximate the weight of evidence for firing of C, based on the history?

5) How many independently modifiable local variables within a synapse are required for this computation?

6) Given that evidence can be positive or negative, but synapses probably cannot convert between excitation and inhibition, how can this be reconciled?

7) What is the nature of performance degradation if computations are performed by summation of evidence as if items were independent, if in fact they are not?

8) What constraints are imposed by the fact that inactivity of a neuron may signify either absence of information or categorical absence of a trigger feature?

7 This simple and intuitive terminology is sometimes attributed to A Turing (see e.g. DA Schum, Evidential Foundations of Probabilistic Reasoning (1994)). A relationship to synaptic summation may have been first suggested by D Michie in a short essay somewhere.

9) What constraints arise if history-dependent modifications only affect the influence of a synapse when it is active and cannot affect the consequences of inactivity?

10) How should (and can) the weight of evidence for a single neuron decision propagate to the next stage, or what kind of losses arise if uncertainty doesn’t propagate?

This is just a sample of research issues that remain to be resolved, arising from this perspective on what is the simplest building block of probabilistic neural inference.

20. The problem of prior probabilities

The difficulties of defining and arriving at a prior probability for a hypothesis in the absence of evidence are the basis for much of the philosophical argument surrounding the application of Bayesian inference. A common way to sidestep the problem is to say that in practice it seldom matters, because accumulation of a decent amount of evidence generally comes to dwarf any uncertainty in the prior. This isn’t really satisfactory in principle, however, especially when one sees how strong can be the influence of priors in relation to issues such as genetically modified foods or religion. It is no solution to dismiss priors that we don’t agree with as not following the principles of evidence-based science, because priors are more or less by definition not evidence-based, though they seem unavoidable in a Bayesian computation of the impact of evidence.
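The claim that evidence comes to dwarf the prior is easy to verify numerically. In this sketch (with invented numbers), H says a coin lands heads with probability 0.9 and ~H says it is fair; after observing 20 heads and 2 tails, three wildly different priors end up in close agreement:

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

def prob(b):
    return 1 / (1 + math.exp(-b))

# H: heads with p = 0.9; ~H: fair coin. Observed: 20 heads, 2 tails.
# Total weight of evidence is the sum of per-toss log likelihood ratios.
heads, tails = 20, 2
w = heads * math.log(0.9 / 0.5) + tails * math.log(0.1 / 0.5)

for prior in (0.01, 0.5, 0.99):
    print(prior, "->", round(prob(log_odds(prior) + w), 4))
```

All three posteriors exceed 0.98, so the two-orders-of-magnitude disagreement in the priors has almost vanished. The philosophical objection in the text stands, of course: with little evidence, or with priors at exactly 0 or 1, no amount of summed weight rescues the inference.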

The maximum entropy principle is sometimes used to justify priors corresponding to supposed ignorance about a situation before evidence is acquired. This is seductive in its logic: choose the priors that maximise your entropy, a measure of uncertainty or lack of information. But this too is deeply unsatisfactory, because the definition of entropy and the parameters chosen according to the maximum entropy principle (or even the principle of indifference – a simple instance of the maximum entropy principle) can be dependent on the structure of a model for the system under consideration, or the particular transform chosen for parameters. These are amongst the very things that one may be ignorant about, without evidence.

These reservations are perhaps more of a concern in thinking about the validity of the concept of evidence-based science than they are in relation to probabilistic neural inference. The fact is that animals must make decisions. If, in a particular context, there is little or no evidence that in a probabilistic sense bears on the decision, then it will be taken somehow nevertheless – even if the decision is to do nothing. The basis for the ‘prior’ that determines this non-evidence-based decision may be genetic, or chance. It may or may not correspond to a true probability of a successful outcome – though evolutionary pressures (for example in non-learned or non-evidence-based instances of arachnophobia – fear of spiders) may have biassed the priors to be appropriate in some context. The problems of learning are to improve on whatever priors the system started with, however they may have been generated.

21. Changing environments: handling fluctuating and conditional probabilities

A probabilistic model of the environment of an animal is usually far from static. Variations occur from place to place (and consequently – through locomotion – potentially very rapidly), and also on slower time scales with diurnal changes, weather, season, climate change, natural disaster, etc. Important variations also occur through interactions between different species and individuals. The slowest changes (climate change, etc.) can alter probabilities sufficiently slowly that evolutionary learning is a significant way of adapting to new conditions. The probability that a particular inherited trait will benefit survival changes slowly, and the pattern of inheritance may actually keep pace with such changes. Change (provided it is not too rapid and too drastic) is a stimulus to evolutionary learning because it increases selective pressures – as in the rapid development of antibiotic resistance when bacterial populations are exposed to doses of antibiotic that are large enough to select for mutations conferring resistance, yet too small to kill everything.

Variations within an animal’s lifetime must be handled through learning. Conditional probabilities come much more into play: inferences and optimal actions depend on probabilities that are conditional on variable environmental characteristics. We become so used to the role of learning that genetic factors (probably, for example, at least a part of what determines reactions to blood, foul smells, snakes, etc., and in many species the reflexes required for locomotion) seem like oddities.

An important and interesting issue arises when learning to make probabilistic inferences in variable environments. Learning often takes place in concentrated epochs and in environments in which there is a great deal of relevant experience. To what extent can the inferences and actions learned in one such circumstance be applied in another? We have argued5 that the problems of estimating frequencies of events with distributed representations, and thereby inferring probabilities and associations, are diminished if learning epochs are concentrated, provided the decay characteristics of the neural system are reasonably matched to the length of such epochs, thereby diminishing interference and errors in estimation. But this is of limited value if the inferences do not extrapolate to other conditions. What is critical here is that the patterns or events that are learned about should bear a close relationship to the causal structure of the world. Often this means learning about the associative properties of objects rather than of sense data, and indeed often quite specific classes of objects. A child who is stung repeatedly by nettles on a country walk will learn something in this concentrated epoch of experience, on the basis of which it may successfully avoid being stung in the future. If the learned association relates to a specific species of plant (hairy serrated leaves, etc.) then it will generalise correctly to other circumstances. This is because the conditional probability of being stung if one touches a nettle is fairly independent of the environment or frequency with which one encounters nettles. If the learned association relates to plants in general, or to contact with anything green (both very likely valid statistical associations in the concentrated epoch of learning) then the inference does not generalise well to other environments. The resultant learning may look more like an acquired phobia than something beneficial, and it may reduce rather than increase the child’s ability to interact efficiently with the world.
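The nettle example can be reduced to a few lines of arithmetic (with numbers invented purely for illustration, and the simplifying assumption that only nettles sting and always do so when touched). The conditional probability tied to the causal class is stable across environments, while the one tied to the broad class tracks an incidental frequency:

```python
# P(sting | touched) under two hypothetical environments that differ
# only in how common nettles are among green plants.
for name, nettle_frac in [("country walk", 0.5), ("city park", 0.02)]:
    p_sting_given_nettle = 1.0               # stable: a causal property
    p_sting_given_green = nettle_frac * 1.0  # tracks nettle frequency
    print(name, p_sting_given_nettle, p_sting_given_green)
```

An association learned as “nettles sting” transfers intact from the country walk to the park; an association learned as “green things sting” (statistically valid during the walk, where half of all touched plants were nettles) is off by a factor of 25 in the new environment.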

How neural systems can come to extract and relate the correct kinds of features for the generalisation of inferences is a very challenging issue. One can look on it as an intellectual or scientific issue – how to develop, for example, the concept of ‘species’ in relation to plants, as a determinant of useful classifications to establish important associations, learning to base this more on characteristics of shape, texture, branching pattern, etc. than on more obvious features such as size, colour, location. But this is almost at the level of ‘academic learning’ described in the preamble. Neural systems must to some extent be capable of selecting appropriate information for correlation, and employing good basis functions for representations, more or less automatically if they are to learn efficiently.

22. Equivalence in neurons of absent features and absent information: sparse evidence

Individual neurons detect and are active when particular features are present in the neural scene that impinges on them. These are their ‘trigger features’, and the presence of such trigger features can provide evidence for or against the existence of other features or inferences. If one has a checklist of features to use for inference one would normally have 3 boxes that one might tick for each: present, absent, or not known. Bayesian application of data expressible as binary features needs to employ at least such 3-state elements, or (more sophisticated and complicated) graded signals for levels of certainty about the presence of a feature. Viewed simply, neurons have just 2 states: active or inactive. But levels of activity may vary and it may be possible to make use of other statistical characteristics such as the variance of activity, or its correlation or synchrony with the activity of other cells or systems. In general it would appear that there is not much scope for distinguishing the absence of a trigger feature from the absence of information about whether the trigger feature is or is not present.

One way round this constraint would be to double up on feature detectors, with a second cell employed to detect the absence of a feature that is detected by the first. But as discussed earlier, this is often not the case: the absence of a boundary in a part of the visual field is often not signalled at all, rather than being signalled by a cell responding to uniform luminance in the region. In some situations the true absence of a feature may be clearly inferred from the presence of other features: there is a clear distinction between a retinal image that fails to signal a person’s head (e.g. because the image is obscured by an object or falls in a blind region of the visual field) and a gruesome image that shows a severed neck. The nervous system may only make use of the potential value for inference of the absence of a feature if this is signalled by the presence of something else. The difficulty of using an absent feature for inference is after all the theme of a celebrated Sherlock Holmes story in which the key to the crime is ‘the dog that did not bark’.

It is possible to argue that this constraint may not really be much of a constraint at all, given the nature of biological environments. Neural systems most of the time may deal with what one might describe as “sparse evidence”. The trigger features of cells may be very diagnostic when present, but with a probability of successful detection that may be quite low, variable and hard to take into account. There are many reasons why the probability of detection may be low: objects may have characteristic visual features, for example, that are only visible from certain angles, easily obscured, or deliberately hidden by animals trying to avoid detection, while the detection even of visible features may be compromised by blind spots, inappropriate neural processing or distracted attention. To take a concrete example, the inferences to be drawn from seeing a glimpse of black and yellow stripes in tiger country may be clear and quantifiable. Equally clear would be the inferences if it were known there was nothing with such stripes in the locality. But simple failure to detect any stripes might be almost equally likely whether there is or is not a tiger within range.
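Putting invented numbers on the tiger example makes the asymmetry stark. Suppose a tiger actually in range is glimpsed only 10% of the time, while stripe-like false alarms occur 0.1% of the time with no tiger present; then the weight of evidence (log likelihood ratio) from a sighting is large, while the weight from a failure to sight is almost nil:

```python
import math

# Assumed detection statistics (illustrative only):
p_stripes_given_tiger = 0.10      # tigers in range are rarely glimpsed
p_stripes_given_no_tiger = 0.001  # stripe-like false alarms are rarer still

w_seen = math.log(p_stripes_given_tiger / p_stripes_given_no_tiger)
w_not_seen = math.log((1 - p_stripes_given_tiger) /
                      (1 - p_stripes_given_no_tiger))

print("stripes seen:   ", round(w_seen, 2))      # strong evidence for a tiger
print("no stripes seen:", round(w_not_seen, 3))  # nearly uninformative
```

The sighting contributes a weight of ln(100) ≈ 4.6, while the non-sighting contributes only about −0.1: sparse evidence indeed, with silence carrying almost no inferential weight.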

If neurons afford only sparse evidence for inferences – in other words, an absent signal provides little or no evidence relevant to the outcome – then this solves one of the problems introduced in Section 19, about viewing neurons as simple inference modules, summing evidence over many inputs. This is the difficulty of providing a plausible neural model for how the postsynaptic influence of a non-firing fibre could come to vary in accordance with its past history. The known activity-dependent modulations of synaptic influence (such as LTP and LTD) seem to make a difference only when the fibre is active. But if the evidence afforded by absence of activity is truly negligible, because silence is quite likely to be due to failure to detect the feature when it is actually present in the environment, then a silent fibre should not influence the activation of the postsynaptic neuron, or the outcome of the corresponding inference. The weight of evidence afforded when the synapse is active (and correspondingly the synaptic strength) needs, on such a model, to be proportional to the log likelihood ratio for the feature, but this is unaffected by a low probability of detection of the feature as long as this probability is the same for conditions in which the inference is true or false.
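The final point can be checked in one line: if a common detection factor d scales the probability of the fibre firing under both hypotheses, it cancels inside the log likelihood ratio, so the appropriate strength for the active synapse does not depend on it (the feature probabilities below are illustrative):

```python
import math

def active_weight(p_feat_h, p_feat_not_h, detect):
    """Weight of evidence from an ACTIVE fibre when its feature is
    detected with the same probability `detect` under H and ~H."""
    return math.log((detect * p_feat_h) / (detect * p_feat_not_h))

# The same weight emerges whether detection is reliable or very sparse.
weights = [active_weight(0.8, 0.1, d) for d in (1.0, 0.3, 0.01)]
print([round(w, 4) for w in weights])
```

All three weights equal ln(8) ≈ 2.08, so an activity-dependent rule that only updates the synapse when the fibre fires can still converge on the correct evidential weight, however unreliable detection is.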

23. Conclusion

This document has been wide-ranging and selective, within a very large field. It has been sometimes deliberately provocative in style, in order to try to stimulate constructive and original thinking about the issues. If people who know something about the field don’t disagree with (or think implausible or irrelevant) some of what has been said and suggested, then it has probably failed in its aim. It is deliberately largely unreferenced, because full referencing would be time-consuming and sometimes contentious, and would require many issues to be dissected in ways that might be distracting and inhibiting of new thinking. There are few inferences about the nature of neural processing that are certain enough that they could not realistically be challenged. A brief bibliography is provided below, with books that should be useful to those wishing to read more and to see more detailed expositions and different perspectives. Many potentially relevant and important issues that are closer to artificial intelligence and robotics than (at present) to biology have been omitted altogether in what we have written, because we do not feel competent to say much about them. That is not to say that they will not, in due course, become central to people’s understanding of neural systems. It is hoped that those who espouse them as crucially important will add to the discussion within and around BIBA by explaining the issues and their importance in simple terms.

Bibliography

Barlow HB, Blakemore C & Weston-Smith M, eds. (1990). Images and Understanding; Cambridge: CUP

Barlow HB & Mollon JD, eds. (1982). The Senses; Cambridge: CUP

Bear MF, Connors BW & Paradiso MA (2001). Neuroscience: Exploring the Brain; Baltimore: Lippincott Williams & Wilkins

Bernardo J & Smith A (2000). Bayesian Theory; Chichester: Wiley

Bishop CM (1995). Neural Networks for Pattern Recognition; Oxford: Clarendon Press

Cooper DN (1999) Human gene evolution; Oxford: Bios.

Dayan P & Abbott LF (2001). Theoretical Neuroscience; Cambridge, Mass: MIT Press

Donald M (1990). Origins of the Modern Mind: Three Stages in the Evolution of Culture and Cognition; Cambridge, Mass: Harvard Univ. Press

Durbin R, Miall C & Mitchison G, eds. (1989) The Computing Neuron; Redwood, CA: Addison-Wesley

Good IJ (1950). Probability and the Weighing of Evidence; London: Griffin

Good IJ (1983). Good Thinking: The Foundations of Probability and its Applications; Minneapolis: Univ. Minnesota Press

Gregory RL (1997) Eye and Brain; Oxford: OUP

Nicholls JG, Martin AR, Wallace BG (1992). From Neuron to Brain; Sunderland, Mass: Sinauer

Rolls E & Treves A (1997). Neural Networks and Brain Function; Oxford: OUP

Rolls E (2001). Computational Neuroscience of Vision; Oxford: OUP

Schum DA (1994) Evidential Foundations of Probabilistic Reasoning; Chichester: Wiley