
Transcript of arXiv:1911.09451v1 [q-bio.NC] 21 Nov 2019


DEEP NEUROETHOLOGY OF A VIRTUAL RODENT

Josh Merel*1, Diego Aldarondo*2,3, Jesse Marshall*3,4, Yuval Tassa1, Greg Wayne1, Bence Ölveczky3,4. 1DeepMind, London, UK. 2Program in Neuroscience, 3Center for Brain Science, 4Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA. [email protected], [email protected], jesse d [email protected]

ABSTRACT

Parallel developments in neuroscience and deep learning have led to mutually productive exchanges, pushing our understanding of real and artificial neural networks in sensory and cognitive systems. However, this interaction between fields is less developed in the study of motor control. In this work, we develop a virtual rodent as a platform for the grounded study of motor activity in artificial models of embodied control. We then use this platform to study motor activity across contexts by training a model to solve four complex tasks. Using methods familiar to neuroscientists, we describe the behavioral representations and algorithms employed by different layers of the network using a neuroethological approach to characterize motor activity relative to the rodent's behavior and goals. We find that the model uses two classes of representations which respectively encode the task-specific behavioral strategies and task-invariant behavioral kinematics. These representations are reflected in the sequential activity and population dynamics of neural subpopulations. Overall, the virtual rodent facilitates grounded collaborations between deep reinforcement learning and motor neuroscience.

1 INTRODUCTION

Animals have nervous systems that allow them to coordinate their movement and perform a diverse set of complex behaviors. Mammals, in particular, are generalists in that they use the same general neural network to solve a wide variety of tasks. This flexibility in adapting behaviors towards many different goals far surpasses that of robots or artificial motor control systems. Hence, studies of the neural underpinnings of flexible behavior in mammals could yield important insights into the classes of algorithms capable of complex control across contexts and inspire algorithms for flexible control in artificial systems.

Recent efforts at the interface of neuroscience and machine learning have sparked renewed interest in constructive approaches in which artificial models that solve tasks similar to those solved by animals serve as normative models of biological intelligence. Researchers have attempted to leverage these models to gain insights into the functional transformations implemented by neurobiological circuits, prominently in vision (Khaligh-Razavi & Kriegeskorte, 2014; Yamins et al., 2014; Kar et al., 2019), but also increasingly in other areas, including audition (Kell et al., 2018) and navigation (Banino et al., 2018; Cueva & Wei, 2018). Efforts to construct models of biological locomotion systems have informed our understanding of the mechanisms and evolutionary history of bodies and behavior (Grillner et al., 2007; Ijspeert et al., 2007; Ramdya et al., 2017; Nyakatura et al., 2019). Neural control approaches have also been applied to the study of reaching movements, though often in constrained behavioral paradigms (Lillicrap & Scott, 2013), where supervised training is possible (Sussillo et al., 2015; Michaels et al., 2019).

While these approaches model parts of the interactions between animals and their environments (Chiel & Beer, 1997), none attempt to capture the full complexity of embodied control, involving how an animal uses its senses, body and behaviors to solve challenges in a physical environment.

*Equal contribution.



The development of models of embodied control is valuable to the field of motor neuroscience, which typically focuses on restricted behaviors in controlled experimental settings. It is also valuable for AI research, where flexible models of embodied control could be applicable to robotics.

Here, we introduce a virtual model of a rodent to facilitate grounded investigation of embodied motor systems. The virtual rodent affords a new opportunity to directly compare principles of artificial control to biological data from real-world rodents, which are more experimentally accessible than humans. We draw inspiration from emerging deep reinforcement learning algorithms which now allow artificial agents to perform complex and adaptive movement in physical environments with sensory information that is increasingly similar to that available to animals (Peng et al., 2016; 2017; Heess et al., 2017; Merel et al., 2019a;b). Similarly, our virtual rodent exists in a physical world, equipped with a set of actuators that must be coordinated for it to behave effectively. It also possesses a sensory system that allows it to use visual input from an egocentric camera located on its head and proprioceptive input to sense the configuration of its body in space.

There are several questions one could answer using the virtual rodent platform. Here we focus on the problem of embodied control across multiple tasks. While some efforts have been made to analyze neural activity in reduced systems trained to solve multiple tasks (Song et al., 2017; Yang et al., 2019), those studies lacked the important element of motor control in a physical environment. Our rodent platform presents the opportunity to study how representations of movements, as well as sequences of movements, change as a function of goals and task contexts.

To address these questions, we trained our virtual rodent to solve four complex tasks within a physical environment, all requiring the coordinated control of its body. We then ask "Can a neuroscientist understand a virtual rodent?" – a more grounded take on the originally satirical "Can a biologist fix a radio?" (Lazebnik, 2002) or the more recent "Could a neuroscientist understand a microprocessor?" (Jonas & Kording, 2017). We take a more sanguine view of the tremendous advances that have been made in computational neuroscience in the past decade, and posit that the supposed 'failure' of these approaches in synthetic systems is partly a misdirection. Analysis approaches in neuroscience were developed with the explicit purpose of understanding sensation and action in real brains, and often implicitly rooted in the types of architectures and processing that are thought relevant in biological control systems. With this philosophy, we use analysis approaches common in neuroscience to explore the types of representations and dynamics that the virtual rodent's neural network employs to coordinate multiple complex movements in the service of solving motor and cognitive tasks.

2 APPROACH

2.1 VIRTUAL RODENT BODY

Figure 1: (A) Anatomical skeleton of a rodent (as reference; not part of physical simulation). (B) A body designed around the skeleton to match the anatomy and model collisions with the environment. (C) Purely cosmetic skin to cover the body. (D) Semi-transparent visualization of (A)-(C) overlain.

We implemented a virtual rodent body (Figure 1) in MuJoCo (Todorov et al., 2012), based on measurements of laboratory rats (see Appendix A.1). The rodent body has 38 controllable degrees of freedom. The tail, spine, and neck consist of multiple segments with joints, but are controlled by tendons that co-activate multiple joints (spatial tendons in MuJoCo).

The virtual rodent has access to proprioceptive information as well as "raw" egocentric RGB-camera (64×64 pixels) input from a head-mounted camera. The proprioceptive inputs include internal joint angles and angular velocities, the positions and velocities of the tendons that provide actuation, egocentric vectors from the root (pelvis) of the body to the positions of the head and paws, a vestibular-like upright orientation vector, touch or contact sensors in the paws, as well as egocentric acceleration, velocity, and 3D angular velocity of the root.
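
As a rough illustration of the observation interface described above, the sketch below shows one way the per-timestep observations could be organized in Python. Only the 64×64 egocentric RGB image size is taken from the paper; the field names and the remaining array shapes are hypothetical placeholders, not the exact specification used by the authors.

import numpy as np

def example_observation():
    """Hypothetical per-timestep observation for the virtual rodent.
    Shapes other than the 64x64 RGB image are illustrative placeholders."""
    return {
        "egocentric_camera": np.zeros((64, 64, 3), dtype=np.uint8),  # head-mounted RGB input
        "joint_angles": np.zeros(30),            # internal joint angles
        "joint_velocities": np.zeros(30),        # joint angular velocities
        "tendon_positions": np.zeros(8),         # lengths of the actuating tendons
        "tendon_velocities": np.zeros(8),
        "end_effectors": np.zeros(5 * 3),        # egocentric pelvis-to-head/paw vectors
        "world_z_axis": np.zeros(3),             # vestibular-like upright orientation vector
        "paw_touch": np.zeros(4),                # touch/contact sensors in the paws
        "root_acceleration": np.zeros(3),        # egocentric acceleration of the root
        "root_velocity": np.zeros(3),
        "root_angular_velocity": np.zeros(3),    # 3D angular velocity of the root
    }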

2.2 VIRTUAL RODENT TASKS

Figure 2: Visualizations of four tasks the virtual rodent was trained to solve: (A) jumping over gaps ("gaps run"), (B) foraging in a maze ("maze forage"), (C) escaping from a hilly region ("bowl escape"), and (D) touching a ball twice with a forepaw with a precise timing interval between touches ("two-tap").

We implemented four tasks adapted from previous work in deep reinforcement learning and motor neuroscience (Merel et al., 2019a; Tassa et al., 2018; Kawai et al., 2015) to encourage diverse motor behaviors in the rodent. The tasks are as follows: (1) Run along a corridor, over "gaps", with a reward for traveling along the corridor at a target velocity (Figure 2A). (2) Collect all the blue orbs in a maze, with a sparse reward for each orb collected (Figure 2B). (3) Escape a bowl-shaped region by traversing hilly terrain, with a reward proportional to distance from the center of the bowl (Figure 2C). (4) Approach orbs in an open field, activate them by touching them with a forepaw, and touch them a second time after a precise interval of 800 ms with a tolerance of ±100 ms; there is a time-out period if the touch is not within the tolerated window, and rewards are provided sparsely on the first and second touch (Figure 2D). We did not provide the agent with a cue or context indicating its task. Rather, the agent had to infer the task from the visual input and behave appropriately.
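
To make the two-tap reward structure concrete, the following is a minimal sketch of the reward and time-out logic it implies. The function name, the unit reward magnitude, and the handling of the time-out flag are our own assumptions; only the 800 ms target and ±100 ms tolerance come from the text.

def two_tap_reward(time_since_first_tap_ms, tapped_now, target_ms=800.0, tolerance_ms=100.0):
    """Sketch of the two-tap reward logic for one control step (illustrative only).
    Returns (reward, timeout_triggered)."""
    if not tapped_now:
        return 0.0, False
    if time_since_first_tap_ms is None:
        # First touch of the orb: sparse reward, timing interval starts now.
        return 1.0, False
    if abs(time_since_first_tap_ms - target_ms) <= tolerance_ms:
        # Second touch within the 800 +/- 100 ms window: sparse reward.
        return 1.0, False
    # Second touch outside the tolerated window triggers a time-out period.
    return 0.0, True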

2.3 TRAINING A MULTI-TASK POLICY

Figure 3: The virtual rodent agent architecture. Egocentric visual image inputs are encoded into features via a small residual network (He et al., 2016) and proprioceptive state observations are encoded via a small multi-layer perceptron. The features are passed into a recurrent LSTM module (Hochreiter & Schmidhuber, 1997). The core module is trained by backpropagation during training of the value function. The outputs of the core are also passed as features to the policy module (with the dashed arrow indicating no backpropagation along this path during training) along with shortcut paths from the proprioceptive observations as well as encoded features. The policy module consists of one or more stacked LSTMs (with or without skip connections) which then produce the actions via a stochastic policy.
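
To make the data flow in Figure 3 explicit, here is a minimal PyTorch-style sketch of the architecture. The 128-unit recurrent layers and the 38-dimensional action space follow the paper; the encoder sizes, the Gaussian action head, the proprioceptive input dimensionality, and all names are illustrative assumptions, and the optional skip connections across policy layers are omitted.

import torch
import torch.nn as nn

class VirtualRodentAgent(nn.Module):
    """Sketch of the Figure 3 agent: visual and proprioceptive encoders, a core
    LSTM trained with the value function, and a stacked-LSTM policy that receives
    the core output without backpropagating through it."""

    def __init__(self, proprio_dim, action_dim=38, hidden=128, n_policy_layers=3):
        super().__init__()
        # Small convolutional encoder standing in for the residual network.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(64), nn.ReLU(),
        )
        # Small multi-layer perceptron for proprioceptive observations.
        self.proprio_encoder = nn.Sequential(
            nn.Linear(proprio_dim, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU(),
        )
        self.core = nn.LSTM(64 + 64, hidden, batch_first=True)   # trained via the value function
        self.value_head = nn.Linear(hidden, 1)
        # Policy: stacked LSTMs fed the core output plus shortcut paths from the
        # raw proprioceptive observations and the encoded features.
        self.policy = nn.LSTM(hidden + proprio_dim + 64 + 64, hidden,
                              num_layers=n_policy_layers, batch_first=True)
        self.action_mean = nn.Linear(hidden, action_dim)
        self.action_log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, image, proprio, core_state=None, policy_state=None):
        # image: [B, T, 3, 64, 64] egocentric frames; proprio: [B, T, proprio_dim].
        b, t = image.shape[:2]
        vis = self.visual_encoder(image.flatten(0, 1)).view(b, t, -1)
        pro = self.proprio_encoder(proprio)
        core_out, core_state = self.core(torch.cat([vis, pro], dim=-1), core_state)
        value = self.value_head(core_out)
        # Dashed arrow in Figure 3: no gradients flow from the policy into the core.
        policy_in = torch.cat([core_out.detach(), proprio, pro, vis], dim=-1)
        pol_out, policy_state = self.policy(policy_in, policy_state)
        dist = torch.distributions.Normal(self.action_mean(pol_out), self.action_log_std.exp())
        return dist, value, core_state, policy_state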

Emboldened by recent results in which end-to-end RL produces a single terrain-adaptive policy (Peng et al., 2016; 2017; Heess et al., 2017), we trained a single architecture on the multiple motor-control-reliant tasks (see Figure 3). To train a single policy to perform all four tasks, we used an IMPALA-style setup for actor-critic deep RL (Espeholt et al., 2018): parallel workers collected rollouts and logged them to a replay buffer, from which a central learner sampled data to perform updates. The value-function critic was trained using off-policy correction via V-trace. To update the actor, we used a variant of MPO (Abdolmaleki et al., 2018) in which the E-step is performed using advantages determined from the empirical returns and the value function, instead of the Q-function (Song et al., 2019). Empirically, we found that the "escape" task was more challenging to learn during interleaved training relative to the other tasks. Consequently, we present results arising from training a single-task expert on the escape task and training the multi-task policies using kickstarting for that task (Schmitt et al., 2018), with a weak coefficient (0.001 or 0.005). Kickstarting on this task made the seeds more reliably solve all four tasks, facilitating comparison of the multi-task policies with different architectures (i.e. the policy having 1, 2, or 3 layers, with or without skip connections across those layers). The procedure yields a single neural network that uses visual inputs to determine how to behave and coordinates its body to move in ways required to solve the tasks. See video examples of a single policy solving episodes of each task: gaps, forage, escape, and two-tap.
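
The kickstarting term mentioned above amounts to an auxiliary distillation loss, applied on escape-task data, that pulls the multi-task policy toward a pre-trained single-task expert (Schmitt et al., 2018). The sketch below is our schematic of that idea in PyTorch; the KL direction and the use of torch.distributions are assumptions, while the weak coefficients (0.001 or 0.005) come from the text.

import torch

def kickstarting_loss(student_dist, teacher_dist, coefficient=0.001):
    """Auxiliary loss encouraging the multi-task policy (student) to match a
    pre-trained escape-task expert (teacher) on escape-task trajectories."""
    kl = torch.distributions.kl_divergence(teacher_dist, student_dist)
    return coefficient * kl.sum(dim=-1).mean()

# Schematically, the actor update on escape-task data becomes something like:
#   total_actor_loss = mpo_actor_loss + kickstarting_loss(student_dist, teacher_dist)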

3 ANALYSIS

Figure 4: Ethology of the virtual rodent. (A) Example jumping sequence in the gaps run task with a representative subset of recorded behavioral features. Dashed lines denote the time of the corresponding frames (top). (B) tSNE embedding of 60 behavioral features describing the pose and kinematics of the virtual rodent allows identification of rodent behaviors. Points are colored by hand-labeling of behavioral clusters identified by watershed clustering. (C) The first two principal components of different behavioral features reveal that behaviors are more shared across tasks at short, 5-25 Hz timescales (fast kinematics), but not at longer, 0.3-5 Hz timescales (slow kinematics).

We analyzed the virtual rodent's neural network activity in conjunction with its behavior to characterize how it solves multiple tasks (Figure 4A). We used analyses and perturbation techniques adapted from neuroscience, where a range of techniques have been developed to highlight the properties of real neural networks. Biological neural networks have been hypothesized to control, select, and modulate movement through a variety of debated mechanisms, ranging from explicit neural representations of muscle forces and behavioral primitives, to more abstract production of neural dynamics that could underlie movement (Graziano, 2006; Kalaska, 2009; Churchland et al., 2012). A challenge with nearly all of these models, however, is that they have largely been inspired by findings from individual behavioral tasks, making it unclear how to generalize them to a broader range of naturalistic behaviors. To provide insight into mechanisms underlying movement in the virtual rodent, and to potentially give insight by proxy into the mechanisms underlying behavior in real rats, we thus systematically tested how the different network layers encoded and generated different aspects of movement.

For all analyses we logged the virtual rodent's kinematics, joint angles, computed forces, sensory inputs, and the cell unit activity of the LSTMs in core and policy layers during 25 trials per task from each network architecture.


3.1 VIRTUAL RODENTS EXHIBIT BEHAVIORAL FLEXIBILITY.

We began our analysis by quantitatively describing the behavioral repertoire of the virtual rodent. A challenge in understanding the neural mechanisms underlying behavior is that it can be described at many timescales. On short timescales, one could describe rodent locomotion using a set of actuators that produce joint-specific patterns of forces and kinematics. However, on longer timescales, these force patterns are organized into coordinated, re-used movements, such as running, jumping, and turning. These movements can be further combined to form behavioral strategies or goal-directed behaviors. Relating neural representations to motor behaviors therefore requires analysis methods that span multiple timescales of behavioral description. To systematically examine the classes of behaviors these networks learn to generate and how they are differentially deployed across tasks, we developed sets of behavioral features that describe the kinematics of the animal on fast (5-25 Hz), intermediate (1-25 Hz) or slow (0.3-5 Hz) timescales (Appendix A.2, A.3). As validation that these features reflected meaningful differences across behaviors, embedding these features using tSNE (Maaten & Hinton, 2008) produced a behavioral map in which virtual rodent behaviors were segregated to different regions of the map (Figure 4B) (see video). This behavioral repertoire of the virtual rodent consisted of many behaviors observed in rodents, such as rearing, jumping, running, climbing and spinning. While the exact kinematics of the virtual rodent's behaviors did not exactly match those observed in real rats, they did reproduce unexpected features. For instance, the stride frequency of the virtual rodent during galloping matches that observed in rats (Appendix A.3).

We next investigated how these behaviors were used by the virtual rodent across tasks. On short timescales, low-level motor features like joint speed and actuator forces occupied similar regions in principal component space (Figure 4C). In contrast, behavioral kinematics, especially on long, 0.3-5 Hz timescales, were more differentiated across tasks. Similar results held when examining overlap in other dimensions using multidimensional scaling. Overall, this indicates that the network learned to adapt similar movements in a selective manner for different tasks, suggesting that the agent exhibited a form of behavioral flexibility.

3.2 NETWORKS PRIMARILY REFLECT BEHAVIORS, NOT FORCES

Figure 5: Representational structure of the rodent's neural network. (A) Example similarity matrices of neural networks and behavioral descriptors. We grouped behavioral descriptors into 50 clusters and computed the average neural population vector during each cluster (Appendix A.4). Similarity was assessed by computing the dot product of either the neural population vector or the behavioral feature vector within each cluster. (B) Centered Kernel Alignment (CKA) index of neural and behavioral feature similarity matrices for 3 and 1 policy layer architectures. (C) CKA index of feature similarity matrices across all pairs of network layers. (D) Average CKA index between core and policy layers and behavioral features, compared across architectures. Points show values from individual network seeds. Policy values are averaged across layers.

We next examined the neural activity patterns underlying the virtual rodent's behavior to test if networks produced behaviors through explicit representations of forces, kinematics or behaviors. As expected, core and policy units operate on distinct timescales (see Appendix A.3, Figure 9). Units in the core typically fluctuated over timescales of 1-10 seconds, likely representing variables associated with context and reward. In contrast, units in policy layers were more active over subsecond timescales, potentially encoding motor and behavioral features.

To quantify which aspects of behavior were encoded in the core and policy layers, and how these patterns varied across layers, we used representational similarity analysis (RSA) (Kriegeskorte et al., 2008; Kriegeskorte & Diedrichsen, 2019). RSA provides a global measure of how well different features are encoded in layers of a neural network by analyzing the geometries of network activity upon exposure to several stimuli, such as objects. To apply RSA, first a representational similarity (or equivalently, dissimilarity) matrix is computed that quantifies the similarity of neural population responses to a set of stimuli. To test if different neural populations show similar stimulus encodings, these similarity matrices can then be directly compared across different network layers. Multiple metrics, such as the matrix correlation or dot product, can be used to compare these neural representational similarity matrices. Here we used the linear centered kernel alignment (CKA) index, which is invariant to orthonormal rotations of population activity (Kornblith et al., 2019).

RSA can also be used to directly test how well a particular stimulus feature is encoded in a population. If each stimulus can be quantitatively described by one or more feature vectors, a similarity matrix can also be computed across the set of stimuli themselves. The strength of encoding of a particular set of features can be measured by comparing the correlation of the stimulus feature similarity matrix and the neuronal similarity matrix. The correlation strength directly reflects the ability of a linear decoder trained on the neuronal population vector to distinguish different stimuli (Kriegeskorte & Diedrichsen, 2019). Unlike previous applications of RSA to the analysis of discrete stimuli such as objects (Khaligh-Razavi & Kriegeskorte, 2014; Yamins et al., 2014), behavior evolves continuously. To adapt RSA to behavioral analysis, we partitioned time by discretizing each behavioral feature set into 50 clusters (Appendix A.4).
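
As an illustration of how RSA is adapted to continuous behavior here, the sketch below discretizes a behavioral feature matrix into 50 clusters with k-means and builds cluster-averaged response matrices whose similarity (Gram) matrices can then be compared, for example with the CKA index defined in Appendix A.4. The function name is ours and scikit-learn is an assumed tool, not necessarily what the authors used.

import numpy as np
from sklearn.cluster import KMeans

def cluster_averaged_responses(behavior_features, neural_activity, k=50, seed=0):
    """Assign each timepoint to one of k behavioral clusters, then average the
    neural population vector within each cluster.
    behavior_features: [T, q] behavioral feature matrix (one timescale).
    neural_activity:   [T, p] unit activations from one network layer.
    Returns cluster labels [T] and a [k, p] matrix of cluster-averaged responses."""
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(behavior_features)
    X = np.stack([neural_activity[labels == c].mean(axis=0) for c in range(k)])
    return labels, X

# The [k, p] response matrices X and Y from two layers (or a layer and a behavioral
# feature set) are then compared via their similarity matrices X @ X.T and Y @ Y.T.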

As expected, RSA revealed that core and policy layers encoded somewhat distinct behavioral features. Policy layers contained greater information about fast timescale kinematics in a manner that was largely conserved across layers, while core layers showed more moderate encoding of kinematics that was stronger for slow behavioral features (Figure 5B,C). This difference in encoding was largely consistent across all architectures tested (Figure 5D).

The feature encoding of policy networks was somewhat consistent with the emergence of a hierarchy of behavioral abstraction. In networks trained with three policy layers, representations were distributed in timescales across layers, with the last layer (policy 2) showing stronger encoding of fast behavioral features, and the first layer (policy 0) instead showing stronger encoding of slow behavioral features. However, policy layer activity, even close to the motor periphery, did not show strong explicit encoding of behavioral kinematics or forces.

3.3 BEHAVIORAL REPRESENTATIONS ARE SHARED ACROSS TASKS

We then investigated the degree to which the rodent's neural networks used the same neural representations to produce behaviors, such as running or spinning, that were shared across tasks. Embedding population activity into two dimensions using multidimensional scaling revealed that core neuron representations were highly distinct across all tasks, while policy layers contained more overlap (Figure 6A), suggesting that some behavioral representations were re-used. Comparison of representational similarity matrices for behaviors that were shared across tasks revealed that policy layers tended to possess a relatively similar encoding of behavioral features, especially fast behavioral features, over tasks (Figure 6C; Appendix A.4). This was validated by inspection of neural activity during individual behaviors shared across tasks (Appendix A.5, Figure 10). Core layer representations across almost all behavioral categories were more variable across tasks, consistent with encoding behavioral sequences or task variables.

Interestingly, when comparing this cross-task encoding similarity across architectures, we found that one-layer networks showed a marked increase in the similarity of behavioral encoding across tasks (Figure 6D). This suggests that in networks with lower computational capacity, the agent must rely on a smaller, shared behavioral representation across tasks.


Figure 6: Policy representations are shared across tasks. (A) Two-dimensional multidimensional scaling embeddings of core and policy activity show that while policy representations overlap across some tasks, core representations are largely distinct. (B) CKA index of the policy 2 and core network representations of behavioral features during behaviors shared across different tasks (Appendix A.4). Policy 2, but not core, networks show similar encoding patterns across the maze forage and two-tap tasks, as well as the gaps run and maze forage tasks, consistent with the shared behaviors used across these tasks. (C) The similarity of behavioral feature encoding (CKA index) across different architectures demonstrates that networks with fewer layers show greater similarity across tasks. Points show values from individual seeds.

3.4 NEURAL POPULATION DYNAMICS ARE SYNCHRONIZED WITH BEHAVIOR

Figure 7: Neurons in core and policy networks show sequential activity during stereotyped behavior. (A) Example video stills showing the virtual rodent engaged in the two-tap task. (B) Average absolute z-scored activity traces of all 128 neurons in each layer during performance of the two-tap sequence. Traces are sorted by the time of peak average firing rate. Dashed lines indicate the times of first and second taps. Sequential neural activity is present during the two-tap sequence.

While RSA described which behavioral features were represented in core and policy activity, we were also interested in describing how neural activity changes over time to produce different behaviors. We began by analyzing neural activity during the production of stereotyped behaviors. Activity patterns in the two-tap task showed peak activity in core and policy units that was sequentially organized (Figure 7), uniformly tiling time between both taps of the two-tap sequence. This sequential activation was observed across tasks and behaviors in the policy network, including during running (see video) where, consistent with policy networks encoding short-timescale kinematic features in a task-invariant manner, neural activity sequences were largely conserved across tasks (see Appendix A.5, Figure 10). These sequences were reliably repeated across instances of the respective behaviors, and in the case of the two-tap sequence, showed reduced neural variability relative to surrounding timepoints (see Appendix A.6, Figure 11).

Figure 8: Latent network dynamics within tasks reflect rodent behavior on different timescales. (A) Vector field representation of the first two principal components of neural activity in the core and final policy layers during the two-tap task. PC spaces show signatures of rotational dynamics. (B) Vector field representation of the first two jPC planes for the core and final policy layers during the two-tap task. Apparent rotations within the different planes are associated with behaviors and behavioral features of different timescales, labeled above. Columns denote layer (as in (A)), while rows denote jPC plane. (C) Characteristic frequency of rotations within each jPC plane. Groups of three points respectively indicate the first, second, and third jPC planes for a given layer. Rotations in the core are slower than those in the policy. (D) Variance explained by each jPC plane.

The finding of sequential activity hints at a putative mechanism for the rodent's behavioral production. We next sought to systematically quantify the types of sequential and dynamical activity present in core and policy networks without presupposing the behaviors of interest. To describe population dynamics in relation to behavior, we first applied principal components analysis (PCA) to the activity during the performance of single tasks, and visualized the gradient of the population vector as a vector field. Figure 8A shows such a vector field representation of the first two principal components of the core and final policy layer during the two-tap task. We generated vector fields by discretizing the PC space into a two-dimensional grid and calculating the average neural activity gradient with respect to time for each bin.
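
The vector-field construction just described can be sketched as follows: project the activity onto its first two principal components, bin that plane into a grid, and average the time derivative of the projection within each bin. This is our own minimal NumPy/scikit-learn rendering of the procedure, not the authors' analysis code.

import numpy as np
from sklearn.decomposition import PCA

def pc_vector_field(activity, n_bins=20):
    """activity: [T, p] matrix of unit activations during one task.
    Returns the grid edges and the average d(PC)/dt vector in each occupied bin."""
    pcs = PCA(n_components=2).fit_transform(activity)   # [T, 2] projection
    grads = np.gradient(pcs, axis=0)                     # finite-difference gradient over time
    x_edges = np.linspace(pcs[:, 0].min(), pcs[:, 0].max(), n_bins + 1)
    y_edges = np.linspace(pcs[:, 1].min(), pcs[:, 1].max(), n_bins + 1)
    ix = np.clip(np.digitize(pcs[:, 0], x_edges) - 1, 0, n_bins - 1)
    iy = np.clip(np.digitize(pcs[:, 1], y_edges) - 1, 0, n_bins - 1)
    field = np.full((n_bins, n_bins, 2), np.nan)
    for i in range(n_bins):
        for j in range(n_bins):
            mask = (ix == i) & (iy == j)
            if mask.any():
                field[i, j] = grads[mask].mean(axis=0)   # average flow in this bin
    return x_edges, y_edges, field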

The vector fields showed strong signatures of rotational dynamics across all layers, likely a signature of previously described sequential activity. To extract rotational patterns, we used jPCA, a dimensionality reduction method that extracts latent rotational dynamics in neural activity (Churchland et al., 2012). The resulting jPCs form an orthonormal basis that spans the same space as the first six traditional PCs, while maximally emphasizing rotational dynamics. Figure 8B shows the vector fields of the first two jPC planes for the core and final policy layers along with their characteristic frequency. Consistent with our previous findings, jPC planes in the core have lower characteristic frequencies than those in policy layers across tasks (Figure 8C). The jPC planes also individually explained a large percentage of total neural variability (Figure 8D).
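
jPCA, as used above, can be approximated by fitting the best skew-symmetric linear dynamics to the PC-projected activity and taking the eigenvector pairs of that matrix as rotational planes (Churchland et al., 2012). The sketch below follows that recipe in NumPy; it is a simplified stand-in for the published method, and the variable names and six-component default are our choices.

import numpy as np
from sklearn.decomposition import PCA

def jpca_planes(activity, dt, n_components=6):
    """Simplified jPCA: fit skew-symmetric dynamics dx/dt ~ M x in the space of the
    top PCs and return the rotational planes with their characteristic frequencies.
    activity: [T, p] unit activations; dt: timestep in seconds."""
    X = PCA(n_components=n_components).fit_transform(activity)   # [T, k]
    dX = np.gradient(X, axis=0) / dt
    k = n_components
    pairs = list(zip(*np.triu_indices(k, 1)))
    # Design matrix: contribution of each free parameter of skew-symmetric M to X @ M.T.
    cols = []
    for a, b in pairs:
        B = np.zeros((k, k))
        B[a, b], B[b, a] = 1.0, -1.0
        cols.append((X @ B.T).ravel())
    coef, *_ = np.linalg.lstsq(np.stack(cols, axis=1), dX.ravel(), rcond=None)
    M = np.zeros((k, k))
    for (a, b), c in zip(pairs, coef):
        M[a, b], M[b, a] = c, -c
    # Purely imaginary eigenvalue pairs of the skew-symmetric M define rotational planes.
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-np.abs(eigvals.imag))
    planes = [np.column_stack([eigvecs[:, i].real, eigvecs[:, i].imag]) for i in order[::2]]
    freqs_hz = np.abs(eigvals.imag[order][::2]) / (2 * np.pi)     # rotation frequency of each plane
    return planes, freqs_hz, M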

These rotational dynamics in the policy and core jPC planes were respectively associated with the production of behaviors and the reward structure of the task. For example, in the two-tap task, rotations in the fastest jPC plane in the core were concurrent with the approach to reward, while rotations in the second fastest jPC plane were concurrent with long-timescale transitions between running to the orb and performing the two-tap sequence. Similarly, the fastest jPC plane in policy layers was correlated with the phase of running, while the second fastest was correlated with the phase of the two-tap sequence (video). This trend of core and policy neural dynamics respectively reflecting task-related and behavioral features was also present in other tasks. For example, in the maze forage task, the first two jPC planes in the core respectively correlated with reaching the target orb and discovering the location of new orbs, while those in the policy were correlated with low-level locomotor features such as running phase (video). Along with RSA, these findings support a model in which the core layer transforms sensory information into a contextual signal in a task-specific manner. This signal then modulates activity in the policy toward different trajectories that generate appropriate behaviors in a more task-independent fashion. For a more complete set of behaviors with neural dynamics visualizations overlaid, see Appendix A.7.

3.5 NEURAL PERTURBATIONS CORROBORATE DISTINCT ROLES ACROSS LAYERS

To causally demonstrate the differing roles of core and policy units in respectively encoding task-relevant features and movement, we performed silencing and activation of different neuronal subsets in the two-tap task. We identified two stereotyped behaviors (rears and spinning jumps) that were reliably used in two different seeds of the agent to reach the orb in the task. We ranked neurons according to the degree of modulation of their z-scored activity during the performance of these behaviors. We then inactivated subsets of neurons by clamping activity to the mean values between the first and second taps and observed the effects of inactivation on trial success and behavior.

In both seeds analyzed, inactivation of policy units had a stronger effect on motor behavior than the inactivation of core units. For instance, in the two-tap task, ablation of 64 neurons in the final policy layer disrupted the performance of the spinning jump (Appendix A.8, Figure 12B; video). In contrast, ablation of behavior-modulated core units did not prevent the production of the behavior, but mildly affected the way in which the behavior was directed toward objects in the environment. For example, ablation of a subset of core units during the performance of a spinning jump had a limited effect, but sometimes resulted in jumps that missed the target orbs (video; see Appendix A.8, Figure 12C).

We also performed a complementary perturbation aimed at eliciting behaviors by overwriting the cell state of neurons in each layer with the average time-varying trajectory of neural activity measured during natural performance of a target behavior. The efficacy of stimulation was found to depend on the gross body posture and behavioral state of the animal, but was nevertheless successful in some cases. For example, during the two-tap sequence, we were able to elicit spinning movements common to searching behaviors in the forage task (video; see Appendix A.8, Figure 12D, E). The efficacy of this activation was more reliable in layers closer to the motor output (Figure 12D). In fact, activation of core units rarely elicited spins, but rather elicited sporadic dashes reminiscent of the searching strategy of many models during the forage task (video).
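
Both perturbations amount to overwriting part of an LSTM layer's state at rollout time: clamping selected units to fixed mean values (inactivation) or to the average activity trajectory recorded during a target behavior (activation). The helper below is a schematic of that manipulation for a PyTorch LSTM cell state; the interface is ours, not the authors' code.

import torch

def perturb_lstm_state(cell_state, unit_idx, replacement):
    """Overwrite the cell state of selected units before the next forward step.
    cell_state:  [num_layers, batch, hidden] LSTM cell state.
    unit_idx:    indices of the most behavior-modulated units to perturb.
    replacement: clamp values (inactivation) or a slice of the average activity
                 trajectory for a target behavior (activation)."""
    perturbed = cell_state.clone()
    perturbed[..., unit_idx] = replacement
    return perturbed

# Applied at every control step of a rollout, e.g.:
#   h, c = policy_state
#   c = perturb_lstm_state(c, most_modulated_units, clamp_values)
#   policy_state = (h, c)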

4 DISCUSSION

For many computational neuroscientists and artificial intelligence researchers, an aim is to reverse-engineer the nervous system at an appropriate level of abstraction. In the motor system, such an effort requires that we build embodied models of animals equipped with artificial nervous systems capable of controlling their synthetic bodies across a range of behavior. Here we introduced a virtual rodent capable of performing a variety of complex locomotor behaviors to solve multiple tasks using a single policy. We then used this virtual nervous system to study principles of the neural control of movement across contexts and described several commonalities between the neural activity of artificial control and previous descriptions of biological control.


A key advantage of this approach relative to experimental approaches in neuroscience is that we can fully observe sensory inputs, neural activity, and behavior, facilitating more comprehensive testing of theories related to how behavior can be generated. Furthermore, we have complete knowledge of the connectivity, sources of variance, and training objectives of each component of the model, providing a rare ground truth to test the validity of our neural analyses. With these advantages in mind, we evaluated our analyses based on their capacity to both describe the algorithms and representations employed by the virtual rodent and recapitulate the known functional objectives underlying its creation without prior knowledge.

To this end, our description of core and policy as respectively representing value and motor production is consistent with the model's actor-critic training objectives. But beyond validation, our analyses provide several insights into how these objectives are reached. RSA revealed that the cell activity of core and policy layers had greater similarity with behavioral and postural features than with short-timescale actuators. This suggests that the representation of behavior is useful in the moment-to-moment production of motor actions in artificial control, a model that has been previously proposed in biological action selection and motor control (Mink, 1996; Graziano, 2006). These behavioral representations were more consistent across tasks in the policy than in the core, suggesting that task context and value activity in the core engaged task-specific behavioral strategies through the reuse of shared motor activity in the policy.

Our analysis of neural dynamics suggests that reused motor activity patterns are often organized as sequences. Specifically, the activity of policy units uniformly tiles time in the production of several stereotyped behaviors like running, jumping, spinning, and the two-tap sequence. This finding is consistent with reports linking sequential neural activity to the production of stereotyped motor and task-oriented behavior in rodents (Berke et al., 2009; Rueda-Orozco & Robbe, 2015; Dhawale et al., 2019), including during task delay periods (Akhlaghpour et al., 2016), as well as in singing birds (Albert & Margoliash, 1996; Hahnloser et al., 2002). Similarly, by relating rotational dynamics to the virtual rodent's behavior, we found that different behaviors were seemingly associated with distinct rotations in neural activity space that evolved at different timescales. These findings are consistent with a hierarchical control scheme in which policy layer dynamics that generate reused behaviors are activated and modulated by sensorimotor signals from the core.

This work represents an early step toward the constructive modeling of embodied control for the purpose of understanding the neural mechanisms behind the generation of behavior. Incrementally and judiciously increasing the realism of the model's embodiment, behavioral repertoire, and neural architecture is a natural path for future research. Our virtual rodent possesses far fewer actuators and touch sensors than a real rodent, uses a vastly different sense of vision, and lacks integration with olfactory, auditory, and whisker-based sensation (see Zhuang et al., 2017). While the virtual rodent is capable of locomotor behaviors, an increased diversity of tasks involving decision making, memory-based navigation, and working memory could give insight into "cognitive" behaviors of which rodents are capable. Furthermore, biologically inspired design of neural architectures and training procedures should facilitate comparisons to real neural recordings and manipulations. We expect that this comparison will help isolate residual elements of animal behavior generation that are poorly captured by current models of motor control, and encourage the development of artificial neural architectures that can produce increasingly realistic behavior.

AUTHOR CONTRIBUTIONS

Josh and Yuval built the rodent MuJoCo model, with measurements collected by Diego and Jesse. Josh trained the virtual rodent model. Jesse performed behavioral and neural representation analyses. Diego performed neural dynamics analyses. Josh, Jesse, and Diego drafted the manuscript. All authors contributed to the conception of the project.

ACKNOWLEDGMENTS

The rodent skeleton reference model was purchased from leo3Dmodels on TurboSquid. Thanks to Max Cant for the rodent skin, and Marcus Wainwright for the skybox and ground textures. D.A. was supported by NSF GRFP DGE1745303. J.D.M. was supported by a fellowship from the Helen Hay Whitney Foundation sponsored by Vertex and a K99/R00 award from the NINDS.


REFERENCES

Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, and Martin Riedmiller. Maximum a posteriori policy optimisation. In International Conference on Learning Representations, 2018.

Hessameddin Akhlaghpour, Joost Wiskerke, Jung Yoon Choi, Joshua P Taliaferro, Jennifer Au, and Ilana B Witten. Dissociated sequential activity and stimulus encoding in the dorsomedial striatum during spatial working memory. Elife, 5:e19507, 2016.

C Yu Albert and Daniel Margoliash. Temporal hierarchical control of singing in birds. Science, 273(5283):1871–1875, 1996.

Andrea Banino, Caswell Barry, Benigno Uria, Charles Blundell, Timothy Lillicrap, Piotr Mirowski, Alexander Pritzel, Martin J Chadwick, Thomas Degris, Joseph Modayil, et al. Vector-based navigation using grid-like representations in artificial agents. Nature, 557(7705):429, 2018.

Joshua D Berke, Jason T Breck, and Howard Eichenbaum. Striatal versus hippocampal representations during win-stay maze performance. Journal of Neurophysiology, 101(3):1575–1587, 2009.

Gordon J Berman, Daniel M Choi, William Bialek, and Joshua W Shaevitz. Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface, 11(99):20140672, 2014.

Hillel J Chiel and Randall D Beer. The brain has a body: adaptive behavior emerges from interactions of nervous system, body and environment. Trends in Neurosciences, 20(12):553–557, 1997.

Mark M Churchland, M Yu Byron, John P Cunningham, Leo P Sugrue, Marlene R Cohen, Greg S Corrado, William T Newsome, Andrew M Clark, Paymon Hosseini, Benjamin B Scott, et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience, 13(3):369, 2010.

Mark M Churchland, John P Cunningham, Matthew T Kaufman, Justin D Foster, Paul Nuyujukian, Stephen I Ryu, and Krishna V Shenoy. Neural population dynamics during reaching. Nature, 487(7405):51, 2012.

Christopher J Cueva and Xue-Xin Wei. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. arXiv preprint arXiv:1803.07770, 2018.

Ashesh K Dhawale, Steffen BE Wolff, Raymond Ko, and Bence P Ölveczky. The basal ganglia can control learned motor sequences independently of motor cortex. bioRxiv, pp. 827261, 2019.

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In International Conference on Machine Learning, pp. 1406–1415, 2018.

Michael Graziano. The organization of behavioral repertoire in motor cortex. Annual Review of Neuroscience, 29:105–134, 2006.

Sten Grillner, Alexander Kozlov, Paolo Dario, Cesare Stefanini, Arianna Menciassi, Anders Lansner, and Jeanette Hellgren Kotaleski. Modeling a vertebrate motor system: pattern generation, steering and control of body orientation. Progress in Brain Research, 165:221–234, 2007.

Richard HR Hahnloser, Alexay A Kozhevnikov, and Michale S Fee. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature, 419(6902):65, 2002.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Nicolas Heess, TB Dhruva, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, Ali Eslami, et al. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.


Norman C Heglund and C Richard Taylor. Speed, stride frequency and energy cost per stride: how do they change with body size and gait? Journal of Experimental Biology, 138(1):301–318, 1988.

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

Auke Jan Ijspeert, Alessandro Crespi, Dimitri Ryczko, and Jean-Marie Cabelguen. From swimming to walking with a salamander robot driven by a spinal cord model. Science, 315(5817):1416–1420, 2007.

Eric Jonas and Konrad Paul Kording. Could a neuroscientist understand a microprocessor? PLoS Computational Biology, 13(1):e1005268, 2017.

John F Kalaska. From intention to action: motor cortex and the control of reaching movements. In Progress in Motor Control, pp. 139–178. Springer, 2009.

Kohitij Kar, Jonas Kubilius, Kailyn Schmidt, Elias B Issa, and James J DiCarlo. Evidence that recurrent circuits are critical to the ventral stream's execution of core object recognition behavior. Nature Neuroscience, 22(6):974, 2019.

Risa Kawai, Timothy Markman, Rajesh Poddar, Raymond Ko, Antoniu L Fantana, Ashesh K Dhawale, Adam R Kampff, and Bence P Ölveczky. Motor cortex is required for learning but not for executing a motor skill. Neuron, 86(3):800–812, 2015.

Alexander JE Kell, Daniel LK Yamins, Erica N Shook, Sam V Norman-Haignere, and Josh H McDermott. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3):630–644, 2018.

Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11):e1003915, 2014.

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey E Hinton. Similarity of neural network representations revisited. CoRR, abs/1905.00414, 2019.

Nikolaus Kriegeskorte and Jörn Diedrichsen. Peeling the onion of brain representations. Annual Review of Neuroscience, 42(1):407–432, 2019.

Nikolaus Kriegeskorte, Marieke Mur, and Peter A Bandettini. Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2:4, 2008.

Yuri Lazebnik. Can a biologist fix a radio? Or, what I learned while studying apoptosis. Cancer Cell, 2(3):179–182, 2002.

Timothy P Lillicrap and Stephen H Scott. Preference distributions of primary motor cortex neurons reflect control solutions optimized for limb biomechanics. Neuron, 77(1):168–179, 2013.

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Siqi Liu, Dhruva Tirumala, Nicolas Heess, and Greg Wayne. Hierarchical visuomotor control of humanoids. In International Conference on Learning Representations, 2019a.

Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, and Nicolas Heess. Neural probabilistic motor primitives for humanoid control. In International Conference on Learning Representations, 2019b.

Jonathan A Michaels, Stefan Schaffelhofer, Andres Agudelo-Toro, and Hansjörg Scherberger. A neural network model of flexible grasp movement generation. bioRxiv, pp. 742189, 2019.

Jonathan W Mink. The basal ganglia: focused selection and inhibition of competing motor programs. Progress in Neurobiology, 50(4):381–425, 1996.


John A Nyakatura, Kamilo Melo, Tomislav Horvat, Kostas Karakasiliotis, Vivian R Allen, Amir Andikfar, Emanuel Andrada, Patrick Arnold, Jonas Laustroer, John R Hutchinson, et al. Reverse-engineering the locomotion of a stem amniote. Nature, 565(7739):351, 2019.

Xue Bin Peng, Glen Berseth, and Michiel Van de Panne. Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Transactions on Graphics (TOG), 35(4):81, 2016.

Xue Bin Peng, Glen Berseth, KangKang Yin, and Michiel Van De Panne. DeepLoco: Dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Transactions on Graphics (TOG), 36(4):41, 2017.

Pavan Ramdya, Robin Thandiackal, Raphael Cherney, Thibault Asselborn, Richard Benton, Auke Jan Ijspeert, and Dario Floreano. Climbing favours the tripod gait over alternative faster insect gaits. Nature Communications, 8:14494, 2017.

Alfonso Renart and Christian K Machens. Variability in neural activity and behavior. Current Opinion in Neurobiology, 25:211–220, 2014.

Pavel E Rueda-Orozco and David Robbe. The striatum multiplexes contextual and kinematic information to constrain motor habits execution. Nature Neuroscience, 18(3):453, 2015.

Simon Schmitt, Jonathan J Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M Czarnecki, Joel Z Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, et al. Kickstarting deep reinforcement learning. arXiv preprint arXiv:1803.03835, 2018.

H Francis Song, Guangyu R Yang, and Xiao-Jing Wang. Reward-based training of recurrent neural networks for cognitive and value-based tasks. Elife, 6:e21492, 2017.

H Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control. arXiv preprint arXiv:1909.12238, 2019.

Greg J Stephens, Bethany Johnson-Kerner, William Bialek, and William S Ryu. Dimensionality and dynamics in the behavior of C. elegans. PLoS Computational Biology, 4(4):e1000028, 2008.

David Sussillo, Mark M Churchland, Matthew T Kaufman, and Krishna V Shenoy. A neural network that finds a naturalistic solution for the production of muscle activity. Nature Neuroscience, 18(7):1025, 2015.

Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Riedmiller. DeepMind control suite. arXiv preprint arXiv:1801.00690, 2018.

Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE, 2012.

Daniel LK Yamins, Ha Hong, Charles F Cadieu, Ethan A Solomon, Darren Seibert, and James J DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014.

Guangyu Robert Yang, Madhura R Joglekar, H Francis Song, William T Newsome, and Xiao-Jing Wang. Task representations in neural networks trained to perform many cognitive tasks. Nature Neuroscience, 22(2):297, 2019.

Chengxu Zhuang, Jonas Kubilius, Mitra JZ Hartmann, and Daniel L Yamins. Toward goal-driven neural network models for the rodent whisker-trigeminal system. In Advances in Neural Information Processing Systems, pp. 2555–2565, 2017.


A APPENDIX

A.1 RAT MEASUREMENTS

To construct the virtual rodent model, we obtained the mass and lengths of the largest body segments that influence the physical properties of the virtual rodent. First, we dissected cadavers of two female Long-Evans rats, and measured the mass of relevant limb segments and organs. Next, we measured the lengths of body segments over the skin of animals anesthetized with 2% v/v isoflurane anesthesia in oxygen. We confirmed that these skin-based measurements approximated bone lengths by measuring bone lengths in a third cadaver. The care and experimental manipulation of all animals were reviewed and approved by the appropriate Institutional Animal Care and Use Committee.

Body part | Mass, animal #63 (g) | Mass, animal #64 (g) | Average mass (g)
Hindlimb L | 21 | 26 | 23.5
Hindlimb R | 21 | 26 | 23.5
Tail | 8 | 10 | 9
Forelimb R | 11 | 14 | 12.5
Forelimb L | 12 | 13 | 12.5
Full torso | 176 | 187 | 181.5
Head | 26 | 26 | 26
Upper torso | 78 | 71 | 74.5
Lower torso | 98 | 114 | 106
Torso without organs | 54 | 58 | 56
Intestines and stomach | 22 | 32 | 27
Liver | 26 | 17 | 21.5
Pelvis and kidneys | 74 | 80 | 77
Jaw | 2.43 | 4.70 | 3.57
Skull | 23 | 21 | 22
Tail (base to mid) | 5.92 | 7.20 | 6.56
Tail (mid to tip) | 1.78 | 2.30 | 2.04
Scapula L | 3.19 | 4.70 | 3.94
Humerus L | 6.25 | 4.70 | 5.48
Radius/ulna L | 2.61 | 2.8 | 2.70
Forepaw L | 0.53 | 0.5 | 0.52
Scapula R | 2.23 | 3.9 | 3.07
Humerus R | 6.08 | 6.7 | 6.39
Radius/ulna R | 2.17 | 3.3 | 2.74
Forepaw R | 0.53 | 0.5 | 0.52
Hindpaw L | 1.66 | 1.7 | 1.68
Tibia L | 9 | 9 | 9
Femur L | 13 | 16 | 14.5
Hindpaw R | 1.81 | 1.6 | 1.71
Tibia R | 5 | 6 | 5.5
Femur R | 13 | 18 | 15.5
Total | 281 | 301 | 291

Table 1: Before weighing, limb segments were divided at their respective joints. Mass of all seg-ments includes all bones, skin, muscle, fascia and adipose layers. L and R refer to the left and rightsides of the animal. Precision of measurements listed without decimal places is ±0.5g


Animal (#) | 48, 62, 55, 56, 64, 63, 62* | Average ± std
Age (days) | 382, 82, 330, 330, 83, 83, 83 |
Mass (g) | 325, 273, 389, 348, 283, 269, 273 | 309 ± 47

Body part | Individual lengths (mm) | Average ± std (mm)
Ankle to claw L | 40.2, 39.5, 39.7, 37.8, 39.9, 41.5, 39.8 | 39.8 ± 1.1
Ankle to toe L | 38.4, 38.12, 37.7, 35.6, 36.6, 39.3, 38 | 37.7 ± 1.2
Ankle to pad L | 23.4, 22.2, 23, 22.12, 22.5, 23.3, 6.4 | 20.4 ± 6.2
Ankle to claw R | 38.2, 40.4, 38.3, 39.3, 39.6, 38.3 | 39.0 ± 0.9
Ankle to toe R | 37, 38.7, 36.3, 37.7, 38.6, 36.2 | 37.4 ± 1.1
Ankle to pad R | 22.4, 23.3, 21.9, 21.8, 23.1, 24.1 | 22.8 ± 0.9
Tibia L | 50, 36.3, 38.5, 49.2, 35.8, 38.7, 34.1 | 40.4 ± 6.5
Femur L | 44.5, 31.6, 32.1, 37.9, 33.4, 35.35, 32.4 | 35.3 ± 4.6
Tibia R | 36.7, 39.1, 37.9, 35.1, 38.4, 36.18 | 37.2 ± 1.5
Femur R | 32.9, 32.1, 38.7, 31.9, 32.1, 32.6 | 33.4 ± 2.6
Pelvis | 25.8, 32, 31.7, 30.2, 26.7, 27.2 | 28.9 ± 2.7
Wrist to claw L | 15, 18.8, 17.6, 18.6, 16, 19.02, 19.2 | 17.7 ± 1.6
Wrist to finger L | 16, 15.8, 17.4, 15.5, 17.07, 17.6 | 16.6 ± 0.9
Wrist to pad L | 6, 6.4, 8.34, 4.9, 6.1, 6.4 | 6.4 ± 1.1
Wrist to olecranon L | 29.1, 34, 32.5, 31.7, 33.9, 32.1, 29.9 | 31.9 ± 1.9
Humerus L | 31.9, 29.52, 31, 28.2, 27, 31.2, 25.4 | 29.2 ± 2.4
Scapula L | 22.7, 24, 26.4, 29.3, 25.9, 29.1, 26.2 | 26.2 ± 2.4
Wrist to claw R | 16.8, 17, 17.8, 15.9, 16.3, 18.1 | 17.0 ± 0.8
Wrist to finger R | 14.1, 13, 15.6, 15.6, 15.3, 16.9 | 15.1 ± 1.4
Wrist to pad R | 5.6, 5.8, 6.55, 5.2, 5, 5.8 | 5.7 ± 0.5
Wrist to olecranon R | 30.6, 33.5, 31.2, 30.4, 31.8, 29.9 | 31.2 ± 1.3
Humerus R | 28.2, 33.5, 28.8, 25, 28.2, 25.2 | 28.1 ± 3.1
Scapula R | 23.8, 29.5, 25.9, 26.2, 28.8, 24.4 | 26.4 ± 2.3
Headcap width | 39 | 39
Headcap length | 30 | 30
Skull width | 38.8, 23.35, 23, 21.8, 22.8, 23.9, 22.2 | 25.1 ± 6.1
Skull length | 57, 51.1, 61, 56.48, 53.16, 58.13, 48 | 55.0 ± 4.5
Skull height | 21.59, 21.5, 21 | 21.4 ± 0.3
Head to thoracic | 48.6, 71.4, 68.68, 65, 60.4, 71.2 | 64.2 ± 8.7
Thoracic to sacral | 73.1, 73.6, 62.9, 65.04, 64.7, 68.8 | 68.0 ± 4.6
Head to sacral | 145, 126, 145.5, 127.05, 127.2, 123.7, 140.9 | 133.6 ± 9.7
Head width | 53.4 | 53.4
Ear | 18, 17.55, 19.3, 17.9, 19.2, 18.8 | 18.5 ± 0.7
Eye | 7.2, 8.25, 8.6, 8.8, 8.2, 8.3 | 8.2 ± 0.6

Table 2: Length measurements of limb segments used to construct the virtual rodent model from 7 female Long-Evans rats. Measurements were performed using calipers either over the skin or over dissected bones (*). Thoracic and sacral refer to vertebral segments. L and R refer to the left and right sides of the animal's body.


A.2 BEHAVIORAL ANALYSIS

We generated features describing the whole-body pose and kinematics of the virtual rodent on fast, intermediate, and slow temporal scales. To describe the whole-body pose, we took the top 15 principal components of the virtual rodent's joint angles and joint positions to yield two 15-dimensional sets of eigenpostures (Stephens et al., 2008). We combined these into a 30-dimensional set of postural features. To describe the animal's whole-body kinematics, we computed the continuous wavelet transform of each eigenposture using a Morlet wavelet spanning 25 scales. For each set of eigenpostures this yielded a 375-dimensional time-frequency representation of the underlying kinematics. We then computed the top 15 principal components of each 375-dimensional time-frequency representation and combined them to yield a 30-dimensional representation of the animal's behavioral kinematics. To facilitate comparison of kinematics to neural representations on different timescales, we used three sets of wavelet frequencies on 1 to 25 Hz (intermediate), 0.3 to 5 Hz (slow) or 5 to 25 Hz (fast) timescales. In separate work, we have found that combining postural and kinematic information improves separation of animal behaviors in behavioral embeddings. Therefore, we combined postural and dynamical features, the latter on intermediate timescales, to yield a 60-dimensional set of 'behavioral features' that we used to map the animal's behavior using tSNE (Figure 4C) (Berman et al., 2014). tSNEs were made using the Barnes-Hut approximation with a perplexity of 30.
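
The feature pipeline above (eigenpostures via PCA, Morlet wavelet spectrograms, a second PCA, then tSNE) can be sketched roughly as follows. The use of PyWavelets and scikit-learn, the handling of the wavelet scales, and all names are our assumptions rather than the authors' exact implementation.

import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def behavioral_features(joint_angles, joint_positions, dt, freqs_hz):
    """Sketch of the Appendix A.2 featurization.
    joint_angles, joint_positions: [T, d] time series; dt: control timestep (s);
    freqs_hz: array of 25 wavelet center frequencies for one timescale band."""
    # 1) Eigenpostures: top 15 PCs of joint angles and of joint positions (30 postural features).
    post = np.hstack([PCA(n_components=15).fit_transform(joint_angles),
                      PCA(n_components=15).fit_transform(joint_positions)])
    # 2) Morlet continuous wavelet transform of each eigenposture over 25 scales.
    scales = pywt.central_frequency('morl') / (np.asarray(freqs_hz) * dt)
    spectra = [np.abs(pywt.cwt(post[:, i], scales, 'morl')[0]).T for i in range(30)]
    # 3) Top 15 PCs of each 15 x 25 = 375-dimensional time-frequency block give
    #    30 kinematic features, combined with the 30 postural features.
    kin = np.hstack([PCA(n_components=15).fit_transform(np.hstack(spectra[:15])),
                     PCA(n_components=15).fit_transform(np.hstack(spectra[15:]))])
    return np.hstack([post, kin])   # 60-dimensional set of 'behavioral features'

# Behavioral map: embed the 60-D features with Barnes-Hut tSNE at perplexity 30, e.g.
# embedding = TSNE(perplexity=30, method='barnes_hut').fit_transform(features)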

A.3 POWER SPECTRAL DENSITY OF BEHAVIOR AND NETWORK ACTIVITY

Figure 9: (A) Power spectral density estimates of four different features describing animal behavior, computed by averaging the spectral density of the top ten principal components of each feature, weighted by the variance they explain. (B) Power spectral density estimates of four different network layers, computed by averaging the spectral density of the top ten principal components of each matrix of activations, weighted by the variance they explain. Notice that policy layers have more power in high-frequency bands than core layers. Arrows mark peaks in the power spectra corresponding to locomotion. Notably, the 4-5 Hz frequency of galloping in the virtual rat matches that measured in laboratory rats (Heglund & Taylor, 1988). Power spectral density was computed using Welch's method with a 10 s window and 5 s overlap.
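A minimal sketch of this variance-weighted Welch estimate, assuming SciPy and scikit-learn; the sampling rate `fs` and the function name are placeholders rather than the exact code used for the figure.

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import PCA

def variance_weighted_psd(X, fs, n_pcs=10):
    """Variance-weighted PSD of a feature or activation matrix X (T x n), sampled at fs Hz."""
    pca = PCA(n_components=n_pcs).fit(X)
    pcs = pca.transform(X)
    weights = pca.explained_variance_ratio_ / pca.explained_variance_ratio_.sum()
    # Welch's method with a 10 s window and 5 s overlap, per principal component
    freqs, psd = welch(pcs, fs=fs, nperseg=int(10 * fs), noverlap=int(5 * fs), axis=0)
    return freqs, psd @ weights   # average across components, weighted by variance explained
```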


A.4 REPRESENTATIONAL SIMILARITY ANALYSIS

We used representational similarity analysis to compare population representations across different network layers and to compute the encoding strength of different features describing animal behavior in the population. Representational similarity analysis has in the past been used to compare neural population responses in tasks where behavioral stimuli are discrete, for instance corpora of objects or faces (Kriegeskorte et al., 2008; Kriegeskorte & Diedrichsen, 2019). A challenge in scaling such approaches to neural analysis in the context of behavior is that behavior unfolds continuously in time. It is thus a priori unclear how to segment behavior into discrete chunks within which to compare representations.

Formally, we defined eight sets of features $B_{i=1\ldots8}$ describing the behavior of the animal on different timescales. These included features such as joint angles, the angular speed of the joint angles, eigenposture coefficients, and actuator forces, which vary on short timescales, as well as behavioral kinematics, which vary on longer timescales, and 'behavioral features', which consisted of both kinematics and eigenpostures. Each feature set is a matrix $B_i \in \mathbb{R}^{M \times q_i}$, where $M$ is the number of timepoints in the experiment and $q_i$ is the number of features in the set. We discretized each set $B_i$ using k-means clustering with $k = 50$ to yield a partition $P_i$ of the timepoints in the experiment.

Using the discretization defined by $P_i$, we can perform representational similarity analysis to compare the structure of population responses across neural network layers $L_m$ and $L_n$, or between a given network layer and features of the behavior $B_i$. Following the notation in (Kornblith et al., 2019), we let $X \in \mathbb{R}^{k \times p}$ be a matrix of population responses across $p$ neurons and the $k$ behavioral categories in $P_i$. We let $Y \in \mathbb{R}^{k \times q}$ be either the matrix of population responses from $q$ neurons in a distinct network layer, or a set of $q$ features describing the behavior of the animal in the feature set $B_i$.

After computing the response matrices in a given behavioral partition, we compared the representational structure of the matrices $XX^T$ and $YY^T$. To do so, we compute the similarity between these matrices using the linear Centered Kernel Alignment (CKA) index, which is invariant under orthonormal rotations of the population activity. Following (Kornblith et al., 2019), the CKA coefficient is:

$$\mathrm{CKA}(XX^T, YY^T) = \frac{\|Y^T X\|_F^2}{\|XX^T\|_F\,\|YY^T\|_F} \qquad (1)$$

where $\|\cdot\|_F$ is the Frobenius norm. For centered $X$ and $Y$, the numerator is equivalent to the dot product between the vectorized response matrices, $\|Y^T X\|_F^2 = \langle \mathrm{vec}(XX^T), \mathrm{vec}(YY^T) \rangle$.

For a given network layer $L_m$ and a behavioral partition $P_i$, we denote $XX^T = D^{L_m}_{P_i} = D^m_i$. Similarly, for a given feature set $B_i$, let $D^{B_i}_{P_i} = D^i_i$. Thus we are interested in characterizing both

$$\mathrm{CKA}(D^m_i, D^n_i) \qquad (2)$$

and

$$\mathrm{CKA}(D^m_i, D^i_i). \qquad (3)$$

The former describes the similarity between two layers of the network, and the latter describes the similarity of the network activity to a set of behavioral descriptors.

An additional challenge arises when restricting this analysis to comparing the neural representations of behaviors across different tasks $T_a$, $T_b$, where not all behaviors are necessarily used in each task. To make such a comparison, we denote $B_i(T_a)$ to be the set of behavioral clusters observed in task $T_a$, and $B^{T_a T_b}_i = B_i(T_a) \cap B_i(T_b)$ to be the set of behaviors used in both tasks. We can then define a restricted partition of timepoints for each task, $P^{T_a,T_b}_i$ or $P^{T_b,T_a}_i$, that includes only these behaviors, and compute the representational similarity of the same layer across tasks:

$$\mathrm{CKA}(D^m_{i,T_a}, D^m_{i,T_b}). \qquad (4)$$

We have presented a means of performing representational similarity analysis across continuous time domains, where the natural units of discretization are unclear and likely manifold. While we focused on analyzing responses at the population level, it is likely that different subspaces of the population encode information about distinct behavioral features at different timescales; analyzing such subspaces remains an emerging direction for representational similarity analysis.


A.5 NEURAL POPULATION ACTIVITY ACROSS TASKS DURING RUNNING

Figure 10: Average activity in the final policy layer (policy 2) during running cycles across different tasks. In each heatmap, rows correspond to the absolute averaged z-scored activity of individual neurons, while columns denote time relative to the mid-stance of the running phase. Across heatmaps, neurons are sorted by the time of peak activity in the tasks denoted on the left, such that each column of heatmaps contains the same average activity information with rearranged rows. Aligned running bouts were acquired by manually segmenting the principal component space of policy 2 activity to find instances of mid-stance running and analyzing the surrounding 200 ms.
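A hedged sketch of how such heatmaps could be assembled, assuming mid-stance event indices have already been identified (here passed in directly rather than obtained by manual segmentation of the policy 2 principal component space):

```python
import numpy as np

def running_heatmap(activity, events, dt, window_ms=200.0):
    """Align per-neuron activity to mid-stance events and sort by peak time.
    activity: (T, n_neurons); events: mid-stance indices; dt: seconds per step.
    Illustrative sketch only."""
    z = (activity - activity.mean(axis=0)) / (activity.std(axis=0) + 1e-8)
    half = int(round(window_ms / 1000.0 / dt / 2))
    bouts = [z[t - half:t + half] for t in events if half <= t < len(z) - half]
    avg = np.abs(np.mean(bouts, axis=0)).T        # (n_neurons, window): |mean z-score|
    order = np.argsort(avg.argmax(axis=1))        # neurons ordered by time of peak activity
    return avg[order]
```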


A.6 STEREOTYPED BEHAVIOR INITIATION AND NEURAL VARIABILITY

During the execution of stereotyped behaviors, neural variability was reduced (Figure 11). Recall that in our setting, neurons have no intrinsic noise, but inherit motor noise through observations of the state (i.e. via sensory reafference). This effect loosely resembles, and perhaps informs one line of interpretation of, the widely reported phenomenon of neural variability reducing with stimulus or task onset (Churchland et al., 2010). Our reproduction of this effect, which simply emerges from training, suggests that variance modulation may partly arise from moments in a task that benefit from increased behavioral precision (Renart & Machens, 2014).

Figure 11: Quantification of neural variability in the inter-tap interval of the two-tap task relative to the second tap. (A) Example normalized activity traces of ten randomly selected neurons in the final policy layer. Lines indicate mean normalized activity while shaded regions range from the 20th to the 80th percentile. Dashed lines indicate the times of the first and second taps. (B) Standard deviation of normalized activity across all neurons in the final policy layer as a function of time relative to the second tap. Lines indicate the mean standard deviation while shaded regions range from the 20th to the 80th percentile. Observe that variability is reduced during the two-tap interval.
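One possible way to compute the variability summary in panel (B), assuming activity is already normalized per neuron and second-tap event indices are given; taking the percentile bands across trials is an assumption about the exact procedure:

```python
import numpy as np

def tap_aligned_variability(norm_activity, second_taps, dt, window_s=1.0):
    """Across-neuron standard deviation of normalized activity around the second tap.
    norm_activity: (T, n_neurons), normalized per neuron; second_taps: event indices;
    dt: seconds per step. Returns mean and [20th, 80th] percentile bands across trials."""
    half = int(round(window_s / dt))
    trials = np.stack([norm_activity[t - half:t + half] for t in second_taps
                       if half <= t < len(norm_activity) - half])   # (trials, time, neurons)
    per_trial_std = trials.std(axis=2)              # variability across neurons at each timepoint
    return per_trial_std.mean(axis=0), np.percentile(per_trial_std, [20, 80], axis=0)
```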

A.7 NEURAL DYNAMICS VISUALIZED DURING TASK BEHAVIOR

For completeness, we provide links to videos of a few variants of neural dynamics for each task.

Network | Visualization | Task (link)
1-layer policy | PCA | gaps, forage, escape, two-tap
3-layer policy | PCA | gaps, forage, escape, two-tap
3-layer policy | jPCA | gaps, forage, escape, two-tap

Table 3: Links to representative visualizations of neural dynamics and behavior


A.8 PERTURBATION RESULTS

Figure 12: Causal manipulations reveal distinct roles for core and policy layers in the production of behavior. (A) Two-tap accuracy during the inactivation of units modulated by idiosyncratic behaviors within the two-tap sequence. Core inactivation has a weaker negative effect on trial success than policy inactivation for several levels of inactivation. (B) Representative example of a failed trial during inactivation of the final policy layer in a model that performs a spinning jump during the two-tap sequence. The model is incapable of producing the spinning jump behavior while inactivated. (C) Representative example of a failed trial during core inactivation in a model that performs a spinning jump during the two-tap sequence. The model is still able to perform the spinning jump behavior, but misses the orb. (D) Proportion of attempts at stimulation that successfully elicited spin behavior during the two-tap task. The efficacy of this activation was more reliable in layers closer to the motor output. (E) Representative example of a single trial in which an extra spin occurs after policy 2 activation.
