The striatum multiplexes contextual and kinematic information to constrain motor habits execution

The striatum is required for the acquisition of procedural memories but its contribution to motor control once learning has occurred is unclear. Here we created a task in which rats learned a difficult motor sequence characterized by fine-tuned changes in running speed adjusted to spatial and temporal constraints. After training and extensive practice, we found that the behavior was habitual yet tetrode recordings in the dorsolateral striatum (DLS) revealed continuous integrative representations of running speed, position and time. These representations were weak in naive rats hand-guided to perform the same sequence and developed slowly after learning. Finally, DLS inactivation in well-trained animals preserved the structure of the sequence while increasing its trial-by-trial variability. We conclude that after learning the DLS continuously integrates task-relevant information to constrain the execution of motor habits. Our work provides a straightforward mechanism by which the basal ganglia may contribute to habit formation and motor control.


Introduction
In many of our everyday actions we perform specific sequences of movements with kinematic parameters (e.g., movement time, speed, trajectory) precisely adjusted to task-specific constraints 1 . Examples can be seen in the actions involved in driving a car, playing sports, using tools and performing arts. Several weeks to several months of training are necessary for the acquisition of these behaviors, which are generally referred to as procedural memories or motor skills 2 . With practice, these skills can be performed automatically and there is converging evidence from lesion and physiological studies using motor sequence tasks in rodents 3-10 , monkeys [11][12][13][14] and humans 15,16 that the sensorimotor region of the striatum (the DLS in rodents) contributes to the execution of well-learned procedural memories, especially motor habits 17 . Nevertheless, the exact nature of this contribution remains unclear 18 and debated 19 , possibly because action execution is the readout of higher order processes such as motor planning or action selection 18 . The clarification of this ambiguity is difficult for two reasons. First, behavioral performance during motor sequence tasks is generally quantified through global metrics (e.g., percentage of correct trials, average reaction time, number of lever presses), which overlook changes in kinematic parameters of movements occurring during learning (e.g., movement speed and trajectory, increased stereotypy of behavior and sensory stimuli). Second, due to the massive sensorimotor input of the striatum 20 and its indirect connection toward motor circuits 21 , striatal spiking activity can be influenced by or influence kinematic parameters of action execution.
Thus, during and after motor learning, it has not been established whether striatal spiking activity reflects primarily the acquisition of high-level motor control function (e.g., action planning, storage of motor programs), changes in low-level parameters (sensory and proprioceptive stimuli, dynamics of movements) that occurred during learning 12 or specifically contributes to precise kinematic parameters associated with accurate execution 12,18,19,22 . Addressing this problem requires a paradigm that dissociates low-level from high-level processes associated with motor learning.

Results
We designed a task for rats that favors the generation of a motor sequence with fine-tuned, easily quantifiable kinematic parameters. Specifically, we customized a motorized treadmill and trained rats to obtain rewards according to a spatiotemporal rule. Once the treadmill was turned on, animals could stop it and receive a drop of sucrose solution by entering a "stop area" located at the front of the treadmill (Fig. 1a). In addition to this spatial rule, a temporal constraint was added: stopping of the treadmill was only effective if animals waited at least 7 s (goal time) before entering the stop area (correct trials; Fig. 1b). If animals entered the stop area before the goal time, an error sound was played and they were forced to run for 20 s (incorrect non-rewarded trials; Fig. 1b). Initially, rats accelerated forward as soon as the treadmill was turned on and entered the stop area before the goal time, resulting in a majority of incorrect trials (Fig. 1c and Supplementary Video 1). After extensive training, rats executed a stereotyped sequence that could be divided into three overlapping phases: 1) passive displacement from the front to the rear portion of the treadmill, 2) stable running and 3) acceleration across the treadmill to enter the stop area (Fig. 1c, Supplementary Fig. 1 and Supplementary Video 2). The high level of stereotypy revealed by the trajectories of the animal was also demonstrated when tracking the position of the left forelimb (Supplementary Fig. 2). Learning occurred in a two-step process. Within a few sessions, animals learned to perform the "front-rear-front" motor sequence but in the majority of the trials they entered the stop area just before the goal time (sessions 15-45; Fig. 1d), which resulted in a low percentage of correct trials (Fig. 1e).
Then, during the longest part of the training, animals progressively adjusted the kinematic parameters of execution to delay their entry into the stop area and increase their proficiency (sessions 30-100; Fig. 1d,e).
strength of habits is contingency degradation 23,24 . We designed two tests of contingency degradation adapted to our task. First, we shortened the treadmill (Fig. 2a) after extensive training (> 80 daily sessions), when the animals had reached stable performance (≥ 3 consecutive sessions with ≥ 72.5 % of correct trials). We observed that during several sessions following treadmill shortening, the animals persisted in performing the task using the previously learned kinematic parameters, yielding a majority of premature entrances into the stop area (Fig. 2a). Importantly, while the motor sequence was qualitatively identical before and after shortening (Fig. 2b), a great number of sessions were necessary for the animals to learn the new kinematic parameters adapted to the shorter treadmill (Fig. 2c). Second, we trained a separate group of animals to enter the stop area after a short goal time (4 or 5 s) and once the animals reached stable performance, the goal time was increased to 7 s. After this change, animals persisted in entering the stop area just after the original goal time for several sessions (sessions 116-118; Fig. 2d) and it took the animals several sessions to return to the learning criterion (37, 16 and 10 sessions for Rats 7, 9 and 10, respectively). In both tests, behavioral persistence occurred despite the fact that rats occasionally performed correct trials under the new conditions (Fig. 2a,d). In some cases, several sessions after the change, the animals "relapsed" and performed the motor sequence using the previously learned kinematic parameters (Fig. 2c,d). These observations support the existence of a motor habit that is difficult to break.
Finally, we examined the impact of reward devaluation on task performance 23,24 . This was achieved by giving the animals free access to the sucrose solution before a probe session. We performed this procedure during training (Fig. 2e) and found that when animals had not yet reached behavioral proficiency, they disengaged from the task during the devaluation test (Fig. 2e). However, when we repeated the same procedure after further training, animals were less and less sensitive to reward devaluation (Fig. 2e). Altogether, the results obtained in this set of behavioral manipulations indicate a habitual performance of the motor sequence after learning and extensive practice.
Next, we performed tetrode recordings in the DLS of rats with stable proficient performance in the task and examined whether neuronal correlates relevant for execution of the running sequence could be extracted from the spiking activity of well-isolated units (n = 391 from 3 rats; Supplementary Fig. 3). In agreement with previous work, a minority of task-modulated units responded to the warning cue preceding the start of the treadmill (Supplementary Fig. 4) 25 , to reward delivery (Supplementary Fig. 5) 26 or fired in bursts synchronized with limb movements (Supplementary Fig. 6) 27,28 . Still, as a population, striatal neurons displayed continuous sequential modulation of their spiking activity during task performance (Fig. 3a,b). To characterize the relationship between spiking activity and execution of the motor sequence we correlated instantaneous firing rates with the main behavioral variables (position on the treadmill, running speed and acceleration) and time. We found that a large fraction of these neurons displayed striking linear correlations between firing rate and either running speed (Fig. 4a-c) or position of the animal on the treadmill (Fig. 4d). The specificity of these correlations was quantified by computing Pearson's partial correlation coefficients (Fig. 4a-d) and was confirmed when plotting error trials (Supplementary Fig. 7). Correlations between firing rate and running speed could not be accounted for by transitions between trotting and galloping (Supplementary Fig. 8). Additionally, some units had their firing rate co-modulated by time, position and speed (Fig. 4e,f). Finally, striatal spiking activity was weakly correlated with acceleration (Fig. 4g).
Next, we used a multiple regression analysis to quantify the extent to which the firing rates of striatal units could be explained by a linear combination of the main behavioral variables (time, position, speed and acceleration; see Online Methods for the validation of these variables as orthogonal predictors). First, the results obtained using partial correlation analysis were confirmed by examining the F values of each coefficient (medians of the F values for speed, position, time and acceleration were 877, 342, 151 and 70 respectively). Second, at the population level, the multiple correlation coefficients were consistently larger than the partial correlation coefficients (Fig. 4g), suggesting that firing rates were generally influenced by more than one variable (Fig. 4b,e). This possibility was confirmed by comparing regression models that took into account a single predictor with models that took into account several (2, 3 or 4) predictors (Supplementary Fig. 9). Finally, the proportion of units modulated by position, speed and acceleration and the predominance of linear correlations were confirmed with a distinct statistical method which did not rely on linear or monotonic relationships between spiking activity and behavioral variables (Supplementary Fig. 10). Altogether these results indicate that during habitual execution of the running sequence, spiking activity in the DLS is principally correlated with running speed and the position of the animal on the treadmill. Additionally, our data also revealed that neurons of the DLS multiplex information from several task-related variables.
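To illustrate why partial correlation coefficients isolate the contribution of one behavioral variable, the following minimal sketch computes a Pearson partial correlation with the standard recursion on lower-order coefficients. It is not the analysis code used in this study: the function names and the toy values (mean-centered, arbitrary units) are hypothetical, and only the Python standard library is assumed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation between x and y once the linear effect of every sequence
    in `controls` is removed (recursion on lower-order partial correlations)."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data: firing rate and speed each follow position plus an
# independent component, so their raw correlation is entirely due to position.
position = [1, 1, -1, -1, 1, 1, -1, -1]
speed = [2, 0, 0, -2, 2, 0, 0, -2]
rate = [2, 2, 0, 0, 0, 0, -2, -2]

raw_r = pearson(rate, speed)                      # apparent rate-speed correlation
partial_r = partial_corr(rate, speed, [position]) # same, controlling for position
```

In this toy case the raw rate-speed correlation is 0.5 but vanishes once position is controlled for, which is the kind of confound the partial coefficients are meant to exclude.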
The robust linear relationship between striatal activity and running speed could be a correlate of running speed control. If this were the case, this relationship should 1) improve when correlating running speed at a given time with preceding spiking activity and 2) decrease when correlating running speed at a given time with subsequent spiking activity. Such a temporal relationship between a "leading" neuronal activity and a "following" behavior has been observed in the parietal cortex 29 and was revealed by systematically shifting the spike times of the recorded units either forward or backward in time and recomputing correlation coefficients for different time shifts. In contrast with what was observed in the parietal cortex 29 , we found that, at the population level, the correlation coefficients between firing rate and running speed were equally affected when spike times were shifted forward or backward relative to behavior (Fig. 5a-c). The analysis revealed that high correlation values were maintained for time shifts ranging from −0.25 to +0.25 s (Fig. 5b,c). This result was confirmed by randomly jittering the spike times of each unit on a trial-by-trial basis over different timescales and recomputing the correlation coefficients (Fig. 5d). We compared the decay of the correlation coefficients for different jitters with the auto-correlation function of the running speed profiles. We found that changes in running speed occurred slightly faster than changes in neural activity (Fig. 5d). This result is more compatible with a continuous modulatory function of the speed-related spiking activity than with a sharp instruction signal generating the running sequence on a moment-to-moment basis.
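The logic of the time-shift analysis can be sketched as follows. This is a simplified illustration, not the procedure used here: it shifts an already binned firing rate by whole bins instead of shifting individual spike times, and the function names and toy series are hypothetical. A positive shift moves neuronal activity later in time, so a unit that leads behavior should show its best correlation at a positive shift.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shifted_correlation(rate, speed, shift_bins):
    """Correlate speed with the firing rate moved by `shift_bins` bins.
    shift_bins > 0: activity is moved forward (later) in time.
    shift_bins < 0: activity is moved backward (earlier) in time.
    Only the overlapping part of the two series is used."""
    n = len(rate)
    if shift_bins >= 0:
        return pearson(rate[:n - shift_bins], speed[shift_bins:])
    return pearson(rate[-shift_bins:], speed[:n + shift_bins])

# Hypothetical toy unit whose activity leads running speed by two bins.
rate = [0, 1, 3, 2, 5, 4, 7, 1, 0, 2, 6, 3]
speed = [0, 0] + rate[:-2]

shifts = range(-3, 4)
best_shift = max(shifts, key=lambda k: shifted_correlation(rate, speed, k))
```

For a purely "leading" unit such as this toy example, the correlation peaks sharply at one positive shift; the symmetric, broad profile reported in Fig. 5a-c is what distinguishes the recorded population from that scenario.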
Neuronal activity in the region of the DLS where we performed electrophysiological recordings is known to be sensitive to sensory stimulation of the limbs, trunk, neck, head and whiskers 30 . To rule out the possibility that position and speed-related modulations of firing rate passively reflect low-level sensorimotor activity that became stereotyped during learning 12 , we performed recordings in the DLS of "naive" rats (n = 3) hand-guided by the experimenter to perform the running sequence. For this purpose, when the treadmill started, the experimenter used a rectangular plate to gently push the animal towards the rear of the treadmill and to maintain its position for a few seconds. Then the plate was removed and the rats accelerated to cross the treadmill (Fig. 6a). To keep learning minimal, hand-guided animals did not receive rewards during these sessions. Running trajectories were highly similar in the well-trained and hand-guided animals (Fig. 6b). To our surprise, the sequential modulation of striatal units observed in well-trained animals (Fig. 3b) was preserved in the hand-guided naive animals (Supplementary Fig. 11). Next, we examined possible differences in neuronal representation of the behavioral variables between well-trained and hand-guided animals. Partial and multiple correlation analyses showed that the correlation coefficients between firing rate and both running speed and position were strongly reduced in the hand-guided animals compared with trained animals (well-trained vs. hand-guided, P < 0.001; Fig. 6c,d). The correlation coefficients between firing rates and time and acceleration were less affected (well-trained vs. hand-guided, acceleration, P = 0.006; time, P = 0.044; Fig. 6c,d), certainly because these coefficients were already low in well-trained animals (Fig. 4g).
The decrease in linear representation of speed and position was strong and could not be explained by differences in running speed between hand-guided and well-trained animals (Supplementary Fig. 12). One possibility is that position- and speed-related activities are dampened because the guiding plate impairs vision. At the population level, the similar timing and amplitude of striatal firing rate modulations in hand-guided and well-trained animals argue against such a possibility (Fig. 3b and Supplementary Fig. 11). To address this issue directly, we took advantage of the fact that hand-guided animals sampled the same range of positions on the treadmill with the plate (when guided from front to rear) and without it (when running freely from rear to front). Partial correlation coefficients between firing rate and position did not differ significantly between the two conditions (median ± std; r = 0.097 ± 0.1, no plate; r = 0.089 ± 0.07, plate; P = 0.31). Altogether the results obtained demonstrate that position and speed representations are not byproducts of the stereotyped sensorimotor activity.
The above conclusion was reinforced by the study of one rat for which, following one week of hand-guided sessions, we continued to record neuronal and behavioral activity during 60 training sessions (i.e., no more guiding plate; sucrose delivery for correct trials and long penalty runs for incorrect trials; Fig. 6e). The animal's behavioral performance was poor during the initial training sessions but improved progressively and the front-rear-front motor sequence was learned (Fig. 6e,f). The strength of task representation in the DLS at different times during training was quantified using the median of the distributions of multiple correlation coefficients between firing rates and time, position, speed and acceleration. The strength of task representation was similarly low during the stereotyped hand-guided and early training sessions (Fig. 6f), confirming that the guiding plate had a minor impact on position and speed representations. Notably, task representation increased ~ 20 sessions after the behavioral performance became stereotyped and dominated by rewarded trials (Fig. 6f). Altogether, these results demonstrate that representations in the DLS are not mere byproducts of the increased behavioral stereotypy associated with motor learning but emerge after extensive repetition of the correct motor sequence.
Our electrophysiological results revealed that after learning and extensive practice of a motor sequence, the DLS displayed position- and speed-related signals. The fact that the speed-related signal evolves more slowly than the running speed itself suggests that the DLS may modulate or guide the execution of the motor sequence, but does not generate the whole sequence on a moment-by-moment basis. To test this hypothesis, we transiently perturbed neuronal activity in the DLS of highly trained animals (n = 7 from 4 rats) through bilateral injections of muscimol at a dosage that elicited minimal effects on locomotion (50 ng μl⁻¹, 1 μl per site 31 ; Supplementary Figs. 13 and 15). The variability of the entrance times in the stop area was strikingly and reversibly increased following muscimol injections (Fig. 7a,b), yet there was no systematic change in the mean trajectory across experiments: the overall structure of the running sequence was always preserved (Fig. 7c and Supplementary Fig. 14). This result, and the fact that rats in the muscimol condition reached fast running speeds (Supplementary Fig. 15c,d), discount the possibility that a decrease in running speed is the cause of the increased variability in entrance times. The similar mean trajectories and number of rewards received (Supplementary Fig. 14a) before and after muscimol injections suggest that animals were similarly engaged in the task in both conditions. Another possibility is that following muscimol injections animals were confused and did not start the task properly. This was not supported by the fact that average behavior was preserved following muscimol injections (Supplementary Fig. 14d-f). Moreover, when we focused on trials during which animals started the sequence correctly, the increased variability was still apparent after muscimol injection (Supplementary Fig. 14b,c).
To determine the source of increased variability in entrance times after muscimol injections, speed time-courses were aligned relative to entrance times and plotted over the last 2.5 s preceding the termination of the sequence (Fig. 7d). In the control condition the speed time-courses were highly stereotyped and consisted of an increase in running speed followed by a sharp deceleration (Fig. 7d). After muscimol injection the speed time-courses became highly variable and this effect was consistently observed across experiments (Fig. 7d,e). Examination of the speed time-courses during intra-DLS muscimol injection (Fig. 7d) suggested that an alteration in the timing at which rats initiated the final acceleration to cross the treadmill cannot, by itself, explain the increased variance in arrival times. To further confirm this, we quantified the time the animals spent in the back of the treadmill and found no systematic change across experiments between control and muscimol conditions (Fig. 7f,g). These behavioral analyses show that both the timing and magnitude of the speed changes are altered after DLS perturbation.
An expected correlate of habitual performance is that, after an occasional error, proficient animals immediately adjust their performance on the next trial. We defined an adjustment index as the difference in entrance time between the error trial and the next trial divided by the size of the error. As expected, in the control condition, the indexes were mainly above 1 but after muscimol injection, the indexes dropped (Fig. 7h). Trials in which animals entered the stop area several seconds after the goal time were not considered errors but were associated with increased effort. Well-trained animals rarely performed two such "long" trials in a row and in the control condition the adjustment indexes after "long" trials (entrance time ≥ goal time + 3 s) were on average slightly below 1 s. This immediate behavioral adjustment was impaired after muscimol injections (Fig. 7i). Altogether, after perturbation of DLS neuronal activity with muscimol, the animals' capacity to perform the running sequence was spared but execution became highly variable. This increased variability was associated with a difficulty in running at the right speed at the right time and with impaired adjustment of performance after incorrect trials.
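One way to read the adjustment index for premature entrances is sketched below. The sign convention (next trial minus error trial) and the definition of the error size as the goal time minus the entrance time of the error trial are assumptions made for this illustration, as are the function and parameter names; the text above does not specify them.

```python
def adjustment_index(error_entrance_s, next_entrance_s, goal_time_s=7.0):
    """Change in entrance time from an error trial to the next trial,
    expressed as a fraction of the error size (assumed: goal - entrance).
    Values >= 1 mean the next entrance reached or exceeded the goal time."""
    error_size = goal_time_s - error_entrance_s
    if error_size <= 0:
        raise ValueError("error trial must be a premature entrance")
    return (next_entrance_s - error_entrance_s) / error_size

# Hypothetical trials: entrance at 5 s (2 s too early), then 7.5 s or 5.5 s.
full_correction = adjustment_index(5.0, 7.5)     # overshoots the goal time
partial_correction = adjustment_index(5.0, 5.5)  # corrects a quarter of the error
```

Under this reading, an index above 1 corresponds to an animal that fully compensates for its error on the very next trial, and a drop in the index corresponds to incomplete compensation.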

Discussion
There are two confounding factors when studying the function of the DLS during motor learning. First, striatal spiking activity can both modulate actions 32,33 and be modulated by actions 27,28 , through indirect projections toward motor circuits 21 and a wide range of excitatory sensorimotor input 20 , respectively. Second, during learning, the dynamics of movements and associated sensory stimuli will change and eventually become stereotyped 2 . Thus, the inference of striatal mechanisms on the basis of changes in spiking dynamics during motor learning is nontrivial and has been subject to debate 12,18,19 . Here we report that during habitual execution of a motor sequence, DLS neurons continuously represent, in a combinatorial and linear manner, contextual (time and position of the animal on the treadmill) and motor (running speed) information relevant for accurate performance. Importantly, the representation of these variables was weak in naive rats performing the same motor sequence under the guidance of the experimenter and developed after learning. Thus, these representations do not merely reflect the stereotyped dynamics of "low-level" task parameters but are a signature of motor habits. Importantly, we found that perturbing these signals spared the average running trajectories but increased trial-by-trial variability and impaired the animals' capacity to adjust their performance after incorrect trials. Altogether the integrative neuronal representations of speed, position and time that we discovered in the DLS, combined with the type of behavioral impairment observed after its neuronal inactivation, suggest that an important function of this region is to continuously constrain the execution of motor habits.
To control the trajectories of the naive animals during hand-guided sessions, we used a guiding plate which could alter the animals' vision and artificially weaken position and speed representations. First, this possibility is unlikely as the visual cortex targets the dorsomedial striatum, not the DLS 34 . Second, in the hand-guided recording sessions, we found no difference in position representation when we analyzed separately the portion of the trials in which the plate was used (animals guided from the front to the rear portion of the treadmill) and the portion of the trials during which the plate was removed (when animals accelerated to cross the treadmill). Third, at the neuronal population level, the magnitude and temporal profiles of the firing rate modulations were similar during hand-guided and well-trained conditions. This suggests that the striatal units recorded received similar sensorimotor stimulation in both conditions. Fourth, in the training sessions that immediately followed the hand-guided sessions, correlations between firing rate and speed and position remained low even though the guiding plate was not used. Finally, it is not obvious how, in well-trained animals, low-level sensory input can account for linear changes in striatal firing rate versus position and speed. For instance, in the three illustrative speed-correlated units shown in Figure 4, spiking activity was not rhythmical and therefore cannot be explained by the locomotion-related dynamics of limbs, head or whiskers 35,36 .
It could also be argued that the position and/or speed correlates are related to reward expectation and that the lack of rewards in the hand-guided sessions is responsible for the weak position and speed representations. Such a possibility is not supported by the fact that, at the population level, firing rates were similarly modulated in well-trained and hand-guided animals throughout the entire execution of the task. Additionally, in the recordings performed during learning, many trials were rewarded but position and speed correlates remained weak for several sessions before appearing. Conversely, the fact that in well-trained animals speed and position representations did not drop during the penalty part of error trials (Supplementary Fig. 7) is incompatible with a major contribution of reward expectation. Finally, in our task, the linear correlations between firing rate and speed or position cannot be mistaken for reward expectation signals. If this were the case, the speed-sensitive cells would not decrease their firing rate at the very end of the task when animals slowed down (Fig. 4a-c and Supplementary Fig. 7). And because rats occupied the front of the treadmill at the beginning and end of the trials, two times at which reward expectation is opposite, position representation can be isolated from reward expectation. In conclusion, our data are not compatible with the possibility that speed- and position-related activities in the DLS of well-trained animals simply reflected the stereotypical structure of movements or reward expectations. They are integrative signals acquired following extensive practice of the running sequence.
What could be the function of these signals? The linear correlations between firing rate and position are likely to provide contextual signals that inform the rat on its position relative to the front and the rear portions of the treadmill, which are important spatial landmarks for successful performance [37][38][39] . The integration of such signals with a movement-related spiking activity could constrain or modulate the performance of the animal: if running speed is too fast or too slow in a certain portion of the treadmill, the integration of the speed- and position-related striatal activity in downstream brain regions would modulate motor commands either at cortical or subcortical levels. This hypothesis is supported by the fact that the speed-related activity in the DLS surrounds changes in running speed. Importantly, DLS perturbation with muscimol increased trial-by-trial variability of running trajectories, altered the animals' capacity to modulate their performance after error trials but spared the average "front-rear-front" structure of the sequence. The constraining/modulatory function of the DLS we propose, along with its underlying neuronal mechanism, extends to motor habits a recent model of the role of the basal ganglia in motor learning 40 . In this model, mainly based on experiments in the songbird, the striatum combines efferent motor copies and context signals to reinforce and modulate motor plans outside the basal ganglia 40 . We hypothesize that during extensive practice of a learned motor sequence, coincident efferent motor copies and contextual signals in the DLS cause high-level integrative representations of the most important aspects of the learned motor sequence, in the case of our task, the animal's running speed and position on the treadmill and to a lesser extent time.
A constraining/modulatory function of these representations, distinct from action generation or action selection, is in line with the observation that Parkinson's disease patients perform reaching movements with preserved spatial accuracy but present a specific deficit in speed selection mechanisms 41 and with studies in rodents and non-human primates suggesting a specific contribution of the basal ganglia to the vigor of movements 12,42 . On the other hand, our results contrast with prominent works in rodents that suggested that the dorsal striatum only encodes the beginning and ending of action sequences through movement-independent signals that control actions like traffic lights control road traffic 4,6,7 . One possible explanation for such a discrepancy is that the tasks used (T-maze and sequence of lever presses) required limited motor control in the middle portion of the sequence. In support of this explanation, when mice were trained to perform faster lever-press sequences, modulation of striatal activity appeared in the middle of the sequence 43 . Our observation that spiking activity in the DLS is continuously modulated by kinematic and contextual parameters during sequence execution might also seem at odds with studies reporting dominant striatal activity that preceded action initiation or cancellation 44,45 . Here we provide evidence that continuous task-related signals modulate motor execution and therefore our work is compatible with a striatal race or accumulative model of action control 46 which, in the case of tasks requiring brief movements 44,45 , would occur before action initiation. We further develop this concept by showing that after extended practice of a motor sequence, the DLS can guide movement execution through continuous integration of multiple task-relevant information.
While the role of the DLS in habit formation has been known for a long time, our work reveals a straightforward neuronal mechanism underlying this function and provides a new framework to understand the role of the basal ganglia in motor control.

Online Methods
All experimental procedures were conducted in accordance with standard ethical guidelines (European Communities Directive 86/60-EEC) and were approved by ethical committees

Behavioral apparatus
The treadmill (Dattica) measured 80 cm long by 14 cm wide and was surrounded by 50 cm high plexiglas walls. There was no resting platform and the only walkable area was the treadmill belt, which was driven by a brushless digital motor (BG 44 SI, Dunkermotoren) controlled by a custom-made program (LabVIEW, National Instruments) and a multifunction computer board (PXI-6254, National Instruments). A 60 W light bulb placed behind the camera illuminated the whole apparatus. One wall of the treadmill was equipped with a liquid well to deliver drops of sucrose solution (black triangle, Fig. 1a). A photodetector positioned 10 cm from the front wall signaled the first entrance of the animal into the so-called stop area in each trial (black circle, Fig. 1a). A warning sound (1.5 kHz, 65 dB) signaled incorrect early entrances into the stop area.

Behavioral training
Animals were handled (2 h d⁻¹ for 5 days), familiarized with running on the treadmill at increasing speeds and trained to perform the task. During training the treadmill speed was fixed for each animal (35-40 cm s⁻¹, 40-160 trials per session, 1-2 sessions per day). Trials started independently of the animal's position but after a few training sessions rats spent most of the inter-trial periods in the stop area. We established a criterion of performance accuracy (≥ 72.5 % of correct trials over the last 40 trials, for ≥ 3 consecutive sessions). During training the experimenter was not physically present in the behavioral room.
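The spatiotemporal rule and the performance criterion described above can be summarized in a short sketch. The constants are taken from the text (7 s goal time, 72.5 % of correct trials over the last 40 trials, 3 consecutive sessions); the function names and the representation of a session as a non-empty list of entrance times in seconds are hypothetical choices made for this illustration, not the acquisition software.

```python
GOAL_TIME_S = 7.0       # minimum waiting time before entering the stop area
CRITERION_PCT = 72.5    # required percentage of correct trials
LAST_TRIALS = 40        # trials considered at the end of each session
MIN_SESSIONS = 3        # consecutive sessions that must meet the criterion

def is_correct(entrance_time_s, goal_time_s=GOAL_TIME_S):
    """A trial is correct when the animal enters the stop area at or after
    the goal time; earlier entrances are errors."""
    return entrance_time_s >= goal_time_s

def session_meets_criterion(entrance_times_s):
    """Percentage of correct trials over the last 40 trials of a session."""
    last = entrance_times_s[-LAST_TRIALS:]
    pct = 100.0 * sum(is_correct(t) for t in last) / len(last)
    return pct >= CRITERION_PCT

def reached_stable_performance(sessions):
    """True once the criterion is met in MIN_SESSIONS consecutive sessions."""
    streak = 0
    for entrance_times_s in sessions:
        streak = streak + 1 if session_meets_criterion(entrance_times_s) else 0
        if streak >= MIN_SESSIONS:
            return True
    return False

# Hypothetical sessions: 29/40 correct (72.5 %) versus 28/40 correct (70 %).
good = [7.5] * 29 + [6.0] * 11
bad = [7.5] * 28 + [6.0] * 12
```

With 40 trials, the criterion corresponds to at least 29 correct trials, and a single sub-criterion session resets the count of consecutive sessions.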

Animals
Long-Evans rats (n = 15, male, 250-400 g) were housed in pairs (individually after surgery) in stable conditions of temperature (

Implantation of tetrode arrays for unit recordings in behaving animals
Under deep isoflurane anesthesia, a 2-by-4 tetrode array (nichrome wires, 12.5 μm diameter, California Fine Wire) loaded on a NLX-9 micro-drive (Neuralynx) was implanted above the dorsal striatum (Supplementary Fig. 3a), through a craniotomy centered at ML = ± 3.6 mm and AP = + 0.6 mm from bregma. Tetrode tips were gold plated to reduce their impedance to 100-200 kΩ at 1 kHz 47 . Two miniature screws implanted above the cerebellum served as ground and reference. After recovery from the surgery (1-2 weeks), the tetrodes were lowered toward the DLS (25 to 100 μm d⁻¹).

Data acquisition and processing
Wide band (0.1-8000 Hz) neurophysiological signals from the tetrode arrays were amplified 1000 times via a Plexon VLSI headstage and a PBX2 amplifier and continuously acquired at 20 kHz on two synchronized National Instruments A/D cards (PCI 6254, 16 bit resolution). Spike sorting was performed semi-automatically 48 using the clustering software KlustaKwik (http://klustakwik.sourceforge.net) and the graphical spike sorting application Klusters (http://klusters.sourceforge.net) 49 .

Animal's position, running speed and acceleration
To determine the position of the animals we used a CCD camera (scA640-70fc, Basler, 60 frames s⁻¹, 9 pixels cm⁻¹) positioned laterally to the treadmill and a fluorescent marker attached to the left forelimb. The marker's positions were extracted with a custom-made program (Vision, National Instruments) and averaged in 400 ms long sliding windows (the average duration of a step cycle). Animals' speed and acceleration were derived from the distance traveled every 250 ms.
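The smoothing and differentiation steps can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the original Vision/Matlab code: the window is taken to be centered, 25 frames stands in for about 400 ms at 60 frames per second, and all function names are hypothetical.

```python
def sliding_mean(values, window=25):
    """Centered moving average; near the edges only available samples are used."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def speed_and_acceleration(position_cm, frame_rate=60.0, step_s=0.25):
    """Speed (cm/s) and acceleration (cm/s^2) from positions taken every `step_s`."""
    step = int(round(step_s * frame_rate))      # 15 frames at 60 frames per second
    sampled = position_cm[::step]               # one position every 250 ms
    speed = [(b - a) / step_s for a, b in zip(sampled, sampled[1:])]
    accel = [(b - a) / step_s for a, b in zip(speed, speed[1:])]
    return speed, accel
```

In use, the smoothed trace would be passed to the second function, for example `speed_and_acceleration(sliding_mean(marker_x_cm))`, where `marker_x_cm` is a hypothetical list of marker positions in cm.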

Firing rate modulation during task execution
Trials were aligned relative to treadmill onset and were divided into non-overlapping windows of 250 ms. For each trial, the firing rate in each window (spike count / 0.25) was smoothed with a Gaussian kernel filter of standard deviation 750 ms and instantaneous firing rates were averaged across trials. In a second approach, we computed perievent histograms with multiple points of reference 31 . We restricted our analysis to trials in which the animals performed the archetypical front-rear-front sequence (typically > 70 % of the trials). We automatically selected archetypical trials by using a template matching method based on the averaged front-rear-front sequence performed during the entire session. For each selected trial, we identified 5 consecutive landmarks: 1) light on, 2) treadmill on, 3) reaching the back of the treadmill, 4) acceleration to cross the treadmill, 5) reaching the stop area. Then, for each animal we calculated the average time interval between landmarks 2-3, 3-4 and 4-5 across all selected trials. The average time intervals between consecutive landmarks were divided into 250 ms long windows, determining for each animal a fixed number of windows between landmarks. Next, all the selected trials were divided according to their respective landmark times and the number of windows between landmarks. Finally, for each unit the instantaneous firing rate was computed, smoothed, transformed into Z-scores and averaged across trials.
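The first approach (binning, Gaussian smoothing, trial averaging, Z-scoring) can be sketched in Python as follows. This is a minimal sketch rather than the authors' Matlab implementation: the kernel is truncated at three standard deviations and renormalized at the edges of the trial, both of which are assumptions, and the function names are hypothetical.

```python
import math


def binned_rate(spike_times, t_start, t_stop, bin_s=0.25):
    """Firing rate (spikes/s) in non-overlapping windows: spike count / bin_s."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0] * n_bins
    for t in spike_times:
        idx = int((t - t_start) // bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return [c / bin_s for c in counts]


def gaussian_smooth(rate, sd_bins=3.0):
    """Gaussian kernel smoothing; sd of 3 bins corresponds to 750 ms at 250 ms bins."""
    radius = int(math.ceil(3 * sd_bins))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in offsets]
    out = []
    for i in range(len(rate)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(rate):      # renormalize at the edges (assumption)
                num += w * rate[j]
                den += w
        out.append(num / den)
    return out


def zscore(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values] if sd > 0 else [0.0] * len(values)


def trial_averaged_rate(trials, t_start, t_stop):
    """Smooth each trial separately, then average across trials bin by bin."""
    smoothed = [gaussian_smooth(binned_rate(tr, t_start, t_stop)) for tr in trials]
    return [sum(col) / len(col) for col in zip(*smoothed)]
```

The landmark-based variant would apply the same steps after assigning each trial a fixed number of windows between landmarks; that alignment step is not shown here.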

Moment-to-moment correlation between firing rate and main task variables
Correlation plots shown in Figure 4 (middle panels) were computed from all the trials, binned in 250 ms long windows, from treadmill onset to treadmill offset. In some cases (Fig. 3, Fig. 6 and Supplementary Fig. 12), we restricted our analysis to trials or portions of trials displaying the archetypical front-rear-front sequence. To avoid biasing our correlation analysis with a minority of data points at the extreme range of the behavioral variables (e.g. lowest and highest speeds), we calculated the distributions of speeds and positions during each session and restricted the analysis to windows for which the value of the variable of interest (speed and position) fell inside the 90th percentile of its distribution. Using the entire speed and position data did not affect the results. We used partial correlation analysis to quantify the specific correlation between firing rate and a single task-relevant variable (position, speed, acceleration or time), once the effects of the other variables had been removed. We found no significant differences between Pearson's and Spearman's correlation coefficients and report only Pearson's correlation coefficients. To quantify the extent to which the firing rate of the units was linearly correlated with a combination of task variables we used multiple regression analysis. For each session, multicollinearity in the predictor variables (position, speed, acceleration or time) was ruled out by calculating the "variance inflation factor" (median 1.1, range 1.0-1.63; accepted values < 5), the "detection-tolerance index" (median 0.89, range 0.61-0.99; accepted values > 0.2) and the "conditional index" of the Belsley collinearity test (median 2.03, range 1-5.67; accepted values < 10; Matlab codes from Brian Lau, http://www.subcortex.net/research/code/collinearitydiagnostics-matlab-code).
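A partial correlation of this kind can be obtained by regressing both the firing rate and the variable of interest on the remaining variables and correlating the residuals. The Python sketch below illustrates that standard construction with the standard library only; it is not the authors' Matlab code, and the function names and the small linear solver are assumptions made for self-containment.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus intercept."""
    rows = [[1.0] + list(r) for r in zip(*covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]


def partial_correlation(rate, variable, others):
    """Correlation between rate and variable once `others` are regressed out."""
    return pearson(residuals(rate, others), residuals(variable, others))
```

For example, `partial_correlation(rate, speed, [position, acceleration, time])` would give the speed-specific coefficient, with each argument a per-window list of equal length.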
To test if striatal firing rates were generally influenced by more than one variable we compared regression models that took into account a single predictor with models that took into account several (2, 3 or 4) predictors. This was done by computing the Akaike information criterion values for each model.
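The model comparison can be illustrated with the usual least-squares form of the Akaike information criterion. This is a sketch under assumptions: the text does not state which form of the criterion was used, so the Gaussian-error expression n·ln(RSS/n) + 2k is assumed here, and the function names are hypothetical.

```python
import math


def aic_least_squares(rss, n_obs, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    rss: residual sum of squares; n_params: number of fitted coefficients.
    """
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def best_model(candidates, n_obs):
    """candidates maps a model name to (rss, n_params); lowest AIC wins."""
    scores = {name: aic_least_squares(rss, n_obs, k)
              for name, (rss, k) in candidates.items()}
    return min(scores, key=scores.get), scores
```

A model with more predictors is preferred only if its reduction in residual error outweighs the penalty of 2 per additional parameter.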
To test if changes in firing rate preceded running speed changes, for each trial, spike trains were systematically shifted by 100, 250, 500, 1000 or 2000 ms, backward or forward relative to the behavioral data. Then partial correlation coefficients between running speed and firing rate were recalculated for each shift value. To quantify the timescale of the relationship between firing rate and running speed, spike times were jittered on a trial-by-trial basis and the jitter values were randomly chosen between ± 100, 250, 500, 1000 or 2000 ms. The partial correlation coefficients between speed and firing rate were recomputed for the different jitter ranges. To compare partial and multiple correlation coefficients between hand-guided and well-trained animals we restricted our correlation analysis to trials in which the animals performed the archetypical front-rear-front sequences.
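The shift and jitter controls can be sketched as below. This Python sketch makes several simplifying assumptions that are not taken from the text: a plain Pearson coefficient stands in for the partial correlation, the jitter is read as one random offset per trial (a per-spike jitter is another possible reading), and all names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bin_counts(spike_times, n_bins, bin_s=0.25):
    """Spike counts in consecutive windows starting at trial time 0."""
    counts = [0] * n_bins
    for t in spike_times:
        idx = math.floor(t / bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def correlation_with_offsets(trials, speeds, offsets, bin_s=0.25):
    """Move each trial's spikes by its offset, re-bin, and correlate with speed."""
    counts, speed = [], []
    for spikes, trial_speed, offset in zip(trials, speeds, offsets):
        moved = [t + offset for t in spikes]
        counts.extend(bin_counts(moved, len(trial_speed), bin_s))
        speed.extend(trial_speed)
    return pearson(counts, speed)


def shifted_correlation(trials, speeds, shift_s):
    """Same shift for every trial; positive values move spikes later in time."""
    return correlation_with_offsets(trials, speeds, [shift_s] * len(trials))


def jittered_correlation(trials, speeds, jitter_s, rng):
    """One uniform random offset in [-jitter_s, +jitter_s] per trial (assumption)."""
    offsets = [rng.uniform(-jitter_s, jitter_s) for _ in trials]
    return correlation_with_offsets(trials, speeds, offsets)
```

With this sign convention, a correlation that peaks for a positive shift indicates that firing rate changes lead speed changes by about that amount.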
All the results presented were reproduced when using different bin sizes (50, 100, 250, 500, 750 and 1000 ms, with a fixed smoothing window of 750 ms) or different smoothing windows (100, 250, 500, 750 and 1000 ms, with a fixed bin size of 100 ms).

Tuning curves of striatal units for position, speed and acceleration
For each recording session the ranges of position, speed and acceleration values were divided in 25 bins. The size of the bins was 3.6 cm for position, 5.8 cm s⁻¹ for speed and ~ 10 cm s⁻² for acceleration. The first and last bins of acceleration were adjusted to exclude the 5 % lowest and highest acceleration values. The total number of spikes fired in each bin was divided by the total time spent by the animal in that bin. The firing rate tuning curves obtained were smoothed with a Gaussian kernel filter whose standard deviation was 4 bins. To test for the significance of these tuning curves we used a shuffling/bootstrap procedure. For each trial, the spike train of the analyzed unit was circularized and the beginning of the trial was randomly chosen. Then, 3 surrogate tuning curves were recomputed (one for speed, one for position and one for acceleration). The operation was repeated 500 times to generate a global band of confidence (the 5 % highest and lowest values of the 500 × 3 surrogate tuning curves). These procedures reduce the multiple comparison issue 50 and the possibility that the firing rate modulation by a given variable (e.g., speed) is secondary to the modulation of firing rate by another variable (e.g., position and acceleration). Next, we designed a method to directly test if the significant relationship between firing rate and a given behavioral variable of interest (e.g. acceleration) can be accounted for by the relationships of another behavioral variable (e.g. speed or position) with both firing rate and the variable of interest. For instance, for the unit shown in Supplementary Figure 10a, the increase in firing rate for maximal deceleration values (right panel) could result from the combined increased firing rate when the animal is in the front portion of the treadmill (Supplementary Fig. 10a) and the fact that maximum decelerations tend to occur at the front of the treadmill when the animal finished the running sequence.
First, we computed the probability distribution of acceleration conditioned on position. Then acceleration values were pseudo-randomly reassigned on the basis of the probability distribution of acceleration conditioned on the position values. By doing so, the relationship between acceleration and position was preserved but the fine relationship between firing rate and acceleration was altered. Once all the acceleration values were generated, a surrogate tuning curve of firing rate relative to acceleration was recomputed and plotted (thin blue dashed line in left panel of Supplementary Fig. 10g). This operation was repeated 100 times.
We considered that if one of the 100 surrogate tuning curves displayed a stronger modulation than the original tuning curve, the modulation was not significant. For the tuning curves for acceleration, the entire procedure was performed twice: once to address the speed confound and once to address the position confound. The same procedure was also performed for tuning curves for speed and position. In the case of the unit in Supplementary Figure 10h, we noticed that the non-linear tuning curve for position (thick blue line, left panel) was largely explained by running speed (thin gray dashed lines, left panel). For all the tuning curves that remained significantly modulated, we systematically subtracted the mean surrogate tuning curve and examined if the correlation coefficient between tuning curve and the variable of interest improved.

Changes in striatal representations during learning
In Rat 17, to increase statistical power, units from 5 consecutive sessions were grouped together (25 ± 5.8 units per group), and we calculated the median value of their multiple correlation coefficients. For each session, behavioral accuracy was estimated by computing Pearson's correlation coefficients between all the possible pairs of trajectories performed in a single session and averaging all the correlation coefficients.
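The session-wise behavioral measure can be sketched as the mean of all pairwise correlations between trajectories. The Python sketch below assumes that trajectories have already been resampled to a common length, which the text does not specify; the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(trajectories):
    """Average Pearson r over all possible pairs of trajectories in a session."""
    coeffs = [pearson(a, b) for a, b in combinations(trajectories, 2)]
    return sum(coeffs) / len(coeffs)
```

Values close to 1 indicate that the animal reproduced nearly the same trajectory on every trial of the session.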

Reversible inactivation of the DLS in behaving animals
We bilaterally implanted guide-cannulae in the DLS (coordinates relative to Bregma: AP = 0.4 mm, ML = ± 4.0 mm, DV = −3.5 mm) of naive (n = 2) and well-trained animals (> 4 months of training, n = 4) under deep isoflurane anesthesia. The injectors protruded from the guide-cannulae by 1 mm. Local injections (1 μl per site at 0.2 μl min⁻¹) were performed 10 min before the behavioral sessions. In naive animals, the GABA-A agonist muscimol (Tocris) was diluted in saline and injected at different concentrations (50 ng μl⁻¹, 500 ng μl⁻¹, 1 μg μl⁻¹). The highest dose induced potent motor impairments (rigidity, catalepsy). The two lowest doses had no apparent effect on basic locomotor activity in the home cage. To further characterize the effects of the lowest doses, animals were video-recorded while running on the treadmill. During these sessions the treadmill was started every minute for 20 s at a fixed speed of 35 cm s⁻¹. Two blocks of 30 trials were separated by 30 min of rest in the home cage. Before the first block of trials, animals received bilateral injections of saline and before the second block animals were injected with muscimol. To compare the motor behavior between saline and muscimol injection (Supplementary Fig. 13) we subtracted the distribution of speeds (or positions) obtained during the saline and muscimol blocks. We generated surrogate muscimol and saline distributions by randomly selecting 15 trials from each condition and subtracted the distributions obtained. We repeated this procedure 300 times and determined a global band of confidence (the 5 % highest and lowest values of the surrogate differences).
To quantify the stereotypy of the speed time-courses during a single session, Pearson's correlation coefficients were computed for all the possible pairs of trials (we restricted our correlation to the last 2.5 s before entrance time). The variance of these correlation coefficients was used as a measurement of running speed stereotypy. We quantified the animals' ability to adjust their behavior after error trials (arrival time < goal time, "early" trials) and after trials with abnormally late entrance times (arrival time > goal time + 3 s, "late" trials). We defined an "adjustment index" as the difference in entrance times between the early (or late) trials and the next trial, divided by the size of the mistake (goal time − early (or late) arrival time). Our analysis was based on 93 "early" and 87 "late" trials from 7 muscimol sessions and 224 "early" and 97 "late" trials from 30 control experiments. Data from control conditions (before muscimol, before and after saline injections) were pooled together due to the low number of "early" and "late" trials during these sessions.
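The adjustment index can be written out explicitly. The sketch below is one possible Python reading of the definition, not the authors' code: the sign of the difference is assumed to be (next entrance time − current entrance time), so that an index of 1 means the error was fully corrected on the next trial, and the function name and arguments are hypothetical.

```python
def adjustment_indices(entrance_times, goal_time, late_margin=3.0):
    """Adjustment index after early and after late trials.

    index = (next entrance time - current entrance time) / (goal time - current entrance time)
    """
    early, late = [], []
    for current, following in zip(entrance_times, entrance_times[1:]):
        if current < goal_time:                    # "early" (error) trial
            early.append((following - current) / (goal_time - current))
        elif current > goal_time + late_margin:    # abnormally "late" trial
            late.append((following - current) / (goal_time - current))
    return early, late
```

Under this convention, values between 0 and 1 indicate a partial correction toward the goal time, and values above 1 indicate an overshoot.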

General statements on data collection and statistics
No statistical methods were used to pre-determine sample sizes but our sample sizes (number of animals and total number of recorded units) are similar to those reported in previous publications 4,7,31,38,39 . Data collection could not be performed blindly; however, in most cases the experimenter was not present in the experimental room during data collection. Moreover, analysis of the spiking activity and behavior was performed offline and Matlab programs were run in batch mode on all the data, independently of experimental conditions. The statistical tests used do not assume normality of data distributions (which were always shown). The behavioral and neuronal data along with the Matlab codes used to generate the figures are available upon request to the corresponding author. A supplementary methods checklist is available.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Middle scatter plots show firing rate vs. best correlated task variable (mean ± s.d.). Right panels show partial and multiple correlation coefficients between firing rate and task variables (*P < 0.001, Pearson's partial correlation).

Adjustment index (median ± first and third quartiles) in control and muscimol conditions, after error trials (h, entrance time < goal time) and after trials in which animals entered the stop area at least 3 s after goal time (i). * P < 0.05, Kruskal-Wallis and Tukey's HSD test (b, c, e, f and g) and Wilcoxon rank-sum test (h, i).

Rueda-Orozco and Robbe Page 27
Nat Neurosci. Author manuscript; available in PMC 2015 September 01.