How do humans control saccades?

I've gathered the standard rationale for a visual system that uses saccades from perception textbooks: the neural cost of processing an entire scene at a high level of detail would be prohibitive, but low-fidelity images aren't good enough to function in the world. Thus you have a retina with a high-fidelity center that can sample the scene by saccading around, presumably guided by a combination of information about the low-fidelity surround and the beliefs and goals of the perceiver.

However, I've yet to find a detailed theory that answers the question of how the next saccade target is selected.

To clarify, by 'detailed theory', I mean a theory that describes the mechanisms used to accomplish the computational feats described in the standard textbook explanation I mentioned above.

Treisman & Gelade's Feature Integration Theory suggests that we can process an entire visual scene in parallel at the level of individual features. For example, in a visual search task, the time required to find a blue circle in a field of red circles is independent of the total number of circles. However, focused attention (typically foveal) is required to integrate separate features into a coherent object. Thus, when searching for a red circle in a field of blue circles and red squares, search time grows linearly with the total number of objects. This is because the target is defined by a conjunction of two features (circular shape and red color) that must be integrated before it can be identified, which requires moving focused attention, and typically the eyes, around the scene.
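To make the contrast concrete, here is a toy Python sketch of the qualitative FIT prediction. It assumes a serial, self-terminating scan for conjunction targets, so on average (N + 1) / 2 items are inspected before the target is found; `expected_search_rt`, `base_ms`, and `slope_ms` are hypothetical names and made-up values, not fitted data.

```python
def expected_search_rt(set_size, conjunction, base_ms=450.0, slope_ms=25.0):
    """Toy mean reaction time (ms) for a target-present visual search trial.

    Feature search: features are registered in parallel, so RT is flat.
    Conjunction search: items are checked one at a time and the scan stops at
    the target, so on average (set_size + 1) / 2 items are inspected.
    base_ms and slope_ms are made-up illustrative values.
    """
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    if not conjunction:
        return base_ms
    return base_ms + slope_ms * (set_size + 1) / 2


for n in (4, 8, 16, 32):
    print(n, expected_search_rt(n, conjunction=False), expected_search_rt(n, conjunction=True))
```

With these made-up parameters, feature search stays at 450 ms for every set size, while conjunction search grows by 12.5 ms per added item (half of `slope_ms`, because the scan stops at the target).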

Several theories of visual search use this distinction to model shifts of visual attention, most notably Jeremy Wolfe's Guided Search and Itti & Koch's visual attention model. The basic premise of both models is similar: low-level feature detectors respond automatically and in parallel across the entire visual field. The result is a set of individual feature maps that represent the bottom-up saliency of locations in the visual scene. This bottom-up saliency can be sufficient to trigger a saccade; for instance, a feature map that responds to local motion helps an organism detect moving predators. Areas with motion are therefore given high value because they have historically provided information that benefits the organism.

During tasks such as visual search, top-down saliency maps may also be created based on knowledge of which things in the environment have value. If I am searching for my umbrella, I know that it is blue, long, and straight, and this information can be encoded in the feature maps that drive saccades.

More generally, saccades are directed at targets that have a high expected value. (Shadmehr et al. have even shown that saccade velocity increases with the expected value of the target.) This value is determined by a weighted evaluation of both top-down and bottom-up feature maps, which are available pre-attentively.
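A minimal Python sketch of this weighted-evaluation idea, assuming NumPy: each feature map is min-max normalized, bottom-up and top-down maps are summed with separate weights, and the peak of the resulting priority map is taken as the next saccade target. The normalization, the weights, and the function names are illustrative assumptions; neither Guided Search nor the Itti & Koch model is implemented exactly this way.

```python
import numpy as np


def normalize(feature_map):
    """Rescale a map to [0, 1]; a flat map becomes all zeros."""
    fmap = np.asarray(feature_map, dtype=float)
    span = fmap.max() - fmap.min()
    if span == 0:
        return np.zeros_like(fmap)
    return (fmap - fmap.min()) / span


def priority_map(bottom_up_maps, top_down_maps, w_bottom_up=1.0, w_top_down=1.0):
    """Weighted sum of normalized bottom-up and top-down feature maps."""
    maps = [w_bottom_up * normalize(m) for m in bottom_up_maps]
    maps += [w_top_down * normalize(m) for m in top_down_maps]
    if not maps:
        raise ValueError("need at least one feature map")
    return np.sum(maps, axis=0)


def next_saccade_target(bottom_up_maps, top_down_maps, w_bottom_up=1.0, w_top_down=1.0):
    """(row, col) of the highest-priority location."""
    priority = priority_map(bottom_up_maps, top_down_maps, w_bottom_up, w_top_down)
    row, col = np.unravel_index(np.argmax(priority), priority.shape)
    return int(row), int(col)


# Hypothetical 5 x 5 scene: something moving at the top left, and something
# blue and elongated (the umbrella) at row 3, column 4.
motion = np.zeros((5, 5))
motion[0, 0] = 1.0
blue_elongated = np.zeros((5, 5))
blue_elongated[3, 4] = 1.0
print(next_saccade_target([motion], [blue_elongated], w_top_down=2.0))  # (3, 4)
```

Raising the top-down weight moves the chosen target from the moving object to the umbrella-like item; with equal weights the two locations tie, and `np.argmax` returns the first one in row-major order.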

The exact landing position of a saccade is determined through a process called spatial pooling, which estimates the 'center of gravity' of a target, again using low-level feature maps. While saccades are remarkably quick and accurate, there is of course some error in the final saccadic position, which often requires smaller corrective saccades to reach the target. It has recently been suggested that this sequence of saccadic movements follows Fitts' law with regard to the speed–accuracy trade-off. A thorough review of the current state of research on saccadic eye movements can be found in Kowler (2011).
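Spatial pooling and the Fitts' law idea can each be written down in a few lines. The sketch below (Python with NumPy) computes an activity-weighted centroid of a feature map as the pooled landing estimate, and evaluates the standard Fitts' law form MT = a + b·log2(2D/W). The map values and the constants `a` and `b` are placeholders, not estimates from the saccade literature.

```python
import math

import numpy as np


def center_of_gravity(activity):
    """Activity-weighted mean (row, col) of a 2-D map; negative values are clipped to 0."""
    act = np.clip(np.asarray(activity, dtype=float), 0.0, None)
    total = act.sum()
    if total == 0:
        raise ValueError("map has no positive activity")
    rows, cols = np.indices(act.shape)
    return float((rows * act).sum() / total), float((cols * act).sum() / total)


def fitts_movement_time(distance, width, a=0.0, b=1.0):
    """Fitts' law, MT = a + b * log2(2D / W); a and b are free, placeholder parameters."""
    return a + b * math.log2(2.0 * distance / width)


two_peaks = np.zeros((5, 5))
two_peaks[1, 2] = 1.0
two_peaks[3, 2] = 1.0
print(center_of_gravity(two_peaks))  # (2.0, 2.0), midway between the peaks
print(fitts_movement_time(distance=8.0, width=1.0))  # 4.0, the index of difficulty log2(16)
```

Because the centroid averages over all activity, two nearby peaks pull the estimate to a point between them, which is one simple way to see why pooled landing positions can fall between neighbouring targets.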

There is obviously much more detail that I haven't covered here. Of the references cited, I would start with section 3 ("Saccades") of the Kowler article, then move on to the Itti & Koch article for more concrete details on their specific model.

Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202-238.

Itti, L., & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2, 1-11.

Shadmehr, R., et al. (2010). Temporal discounting of reward and the cost of time in motor control. Journal of Neuroscience, 30, 10507-10516.

Kowler, E. (2011). Eye movements: The past 25 years. Vision Research, 51, 1457-1483.

Note: I cite Guided Search 2.0 because it gives a good exposition of the whole theory, although the theory itself has since progressed to version 4.0.


This paper continues our effort to understand action and perception in terms of variational free-energy minimization (Friston et al., 2006). The minimization of free energy is based on the assumption that biological systems or agents maximize the Bayesian evidence for their model of the world through an active sampling of sensory information. In this context, negative free energy provides a proxy for model evidence that is much easier to evaluate than evidence per se. Under some simplifying assumptions, free energy reduces to the amount of prediction error. This means that minimizing free energy corresponds to minimizing prediction errors and can be formulated as predictive coding (Rao and Ballard, 1999; Friston, 2005). Expressed like this, minimizing free energy sounds perfectly plausible and fits comfortably with Bayesian treatments of perception (Knill and Pouget, 2004; Yuille and Kersten, 2006). However, log model evidence is the negative of self-information, or surprise, in information theory. This means that maximizing evidence corresponds to minimizing surprise; in other words, agents should sample their world to preclude surprises. Despite the explanatory power of predictive coding as a metaphor for perceptual inference in the brain, this leads to a rather paradoxical conclusion: if we are trying to minimize surprise, we should avoid sensory stimulation and retire to a dark and quiet room.
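A small worked example of the "simplifying assumptions" step, in Python. If a sensation s is assumed to be Gaussian around a predicted value g with fixed variance σ², its surprise is −ln p(s) = ½ln(2πσ²) + (s − g)²/(2σ²), so lowering surprise and lowering the squared prediction error amount to the same thing. This ignores the rest of the variational free-energy bound and is only meant to show that link; `gaussian_surprise` is a hypothetical helper.

```python
import math


def gaussian_surprise(sensation, prediction, variance):
    """Surprise, -ln p(s), in nats for s ~ N(prediction, variance)."""
    prediction_error = sensation - prediction
    return 0.5 * math.log(2.0 * math.pi * variance) + prediction_error ** 2 / (2.0 * variance)


# With the variance fixed, surprise differs across sensations only through the
# squared prediction error, scaled by 1 / (2 * variance).
s_small_error = gaussian_surprise(sensation=2.0, prediction=1.0, variance=1.0)
s_large_error = gaussian_surprise(sensation=3.0, prediction=1.0, variance=1.0)
print(round(s_large_error - s_small_error, 6))  # 1.5
```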

This is the dark room problem, and it is often raised as a natural objection to the principle of free-energy minimization. In Friston et al. (2012), we rehearse the problem and its implications in the form of a three-way conversation between a physicist, a philosopher, and an information theorist. The resolution of the dark room problem is fairly simple: prior beliefs render dark rooms surprising. The existence of these beliefs is assured by natural selection, in the sense that agents that did not find dark rooms surprising would stay there indefinitely, until they died of dehydration or loneliness. However, this answer to the dark room paradox does not tell us very much about the nature of the prior beliefs that are essential for survival, or the principles that determine them. In this paper, we consider prior beliefs more formally using information theory and the free-energy formulation, and specify exactly what these prior beliefs are optimizing. In brief, we will see that agents engage actively with their sensorium and must be equipped with prior beliefs that salient features of the world will disclose themselves, or be discovered by active sampling. This leads to a natural explanation for exploratory behavior and visual search strategies of the sort studied in psychology and psychophysics (Gibson, 1979; Itti and Koch, 2001; Humphreys et al., 2009; Itti and Baldi, 2009; Shires et al., 2010; Shen et al., 2011; Wurtz et al., 2011). Crucially, this behavior is an emergent property of minimizing surprise about sensations and their causes. Specifically, it requires an agent to select or sample sensations that are predicted, and to believe that this sampling will minimize uncertainty about those predictions.

The prior beliefs that emerge from this formulation are sensible from a number of perspectives. We will see that they can be regarded as beliefs that sensory information is acquired to minimize uncertainty about its causes. These sorts of beliefs are commonplace in everyday life and scientific investigation. Perhaps the simplest example is a scientific experiment designed to minimize the uncertainty about some hypothetical mechanism or treatment effect (Daunizeau et al., 2011). In other words, we acquire data we believe will provide evidence for (or against) a hypothesis. In a psychological setting, if we regard perception as hypothesis testing (Gregory, 1980), this translates naturally into an active sampling of sensory data to disclose the hidden objects or causes we believe are generating those data. Neurobiologically, this translates to optimal visual search strategies that optimize the salience of sampling, where salience can be defined operationally in terms of minimizing conditional uncertainty about perceptual representations. We will see that prior beliefs about the active sampling of salient features are exactly consistent with the maximization of Bayesian surprise (Itti and Baldi, 2009), optimizing signal detection (Morgan, 2011), the principle of minimum redundancy (Barlow, 1961), and the principle of maximum information transfer (Linsker, 1990; Bialek et al., 2001).
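For the link to Bayesian surprise, here is a Python sketch under a deliberately simple assumption: the belief about a single hidden cause is Gaussian and is updated by one noisy Gaussian observation. Following Itti and Baldi's definition, surprise is then the KL divergence of the posterior from the prior, KL(posterior ‖ prior). The function names and the one-dimensional setting are illustrative assumptions.

```python
import math


def gaussian_update(prior_mean, prior_var, observation, noise_var):
    """Conjugate update of a Gaussian belief after one noisy Gaussian observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + observation / noise_var)
    return post_mean, post_var


def kl_gaussian(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) in nats for univariate Gaussians P = N(mean_p, var_p), Q = N(mean_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0)


def bayesian_surprise(prior_mean, prior_var, observation, noise_var):
    """Surprise in the sense of Itti and Baldi: KL(posterior || prior)."""
    post_mean, post_var = gaussian_update(prior_mean, prior_var, observation, noise_var)
    return kl_gaussian(post_mean, post_var, prior_mean, prior_var)


near = bayesian_surprise(prior_mean=0.0, prior_var=1.0, observation=0.5, noise_var=1.0)
far = bayesian_surprise(prior_mean=0.0, prior_var=1.0, observation=3.0, noise_var=1.0)
print(near < far)  # True
```

An observation far from the prior mean yields more surprise than one close to it, even though both updates shrink the variance by exactly the same amount.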

From the point of view of the free-energy principle, a more detailed examination of prior beliefs forces us to consider some important distinctions about hidden states of the world and the controlled nature of perceptual inference. In short, free-energy minimization is applied to both action and perception (Friston, 2010), such that behavior, or more simply movement, tries to minimize prediction errors and thereby fulfill predictions based upon conditional beliefs about the state of the world. However, the uncertainty associated with those conditional beliefs depends upon the way data are sampled; for example, where we direct our gaze or how we palpate a surface. The physical deployment of sensory epithelia is itself a hidden state of the world that has to be inferred. These hidden states can, however, be changed by action, which means there is a subset of hidden states over which we have control. These will be referred to as hidden control states, or more simply hidden controls. The prior beliefs considered below pertain to these hidden controls and dictate how we engage actively with the environment to minimize the uncertainty of our perceptual inferences. Crucially, this means that prior beliefs have to be encoded physically (neuronally), leading to the notion of fictive or counterfactual representations; in other words, what we would infer about the world if we sampled it in a particular way. This leads naturally to the internal representation of prior beliefs about fictive sampling and the emergence of things like intention and salience. Furthermore, counterfactual representations take us beyond predictive coding of current sensations and into prospective coding about our sensory behavior in the future. This prospective coding rests on an internal model of control (control states) that may be an important element of generative models that endow agents with a sense of agency. This is because, unlike action, hidden controls are inferred, which requires a probabilistic representation of control. We will try to illustrate these points using visual search and the optimal control of saccadic eye movements (Grossberg et al., 1997; Itti and Baldi, 2009; Srihasam et al., 2009), noting that similar principles should apply to active sampling of any sensory inputs. For example, they should apply to motor control when making inferences about objects causing somatosensory sensations (Gibson, 1979).
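The idea that where we look is a hidden control chosen to reduce uncertainty can be illustrated with a toy Python sketch. It assumes independent Gaussian beliefs about the cause at each candidate location and a single, equally precise foveal sample wherever the eyes land; the chosen location is the one with the largest expected entropy reduction. This is a simplified stand-in for illustration, not the generative model or saliency computation used in the paper, and `choose_fixation` and its inputs are hypothetical.

```python
import math


def information_gain(prior_var, noise_var):
    """Expected entropy reduction (nats) of a Gaussian belief after one noisy sample.

    In a linear-Gaussian model the posterior variance does not depend on the
    value observed, so this can be computed before sampling.
    """
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    return 0.5 * math.log(prior_var / post_var)


def choose_fixation(belief_vars, foveal_noise_var):
    """Return the location whose foveal sample would most reduce uncertainty.

    belief_vars maps a (hypothetical) location label to the current variance
    of the belief about the hidden cause at that location.
    """
    return max(belief_vars, key=lambda loc: information_gain(belief_vars[loc], foveal_noise_var))


beliefs = {"left": 0.2, "centre": 0.05, "right": 1.5}
print(choose_fixation(beliefs, foveal_noise_var=0.1))  # right
```

Because the expected gain in this toy case can be evaluated without knowing what will be seen, it behaves like a counterfactual quantity: what would be learned if the eyes were directed to each location.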

This paper comprises four sections. In the first, we focus on theoretical aspects and describe how prior beliefs about hidden control states follow from the basic imperatives of self-organization (Ashby, 1947). This section uses a general but rather abstract formulation of agents, in terms of the states they can occupy, that enables us to explain action, perception, and control as corollaries of a single principle. The particular focus here will be on prior beliefs about control and how they can be understood in terms of more familiar constructs such as signal detection theory, the principle of maximum mutual information, and specific treatments of visual attention such as Bayesian surprise (Itti and Baldi, 2009). Having established the underlying theory, the second section considers neurobiological implementation in terms of predictive coding and recurrent message passing in the brain. This brief section reprises the implicit neural architecture we have described in many previous publications and extends it to include the encoding of prior beliefs in terms of (place-coded) saliency maps. The third and fourth sections provide an illustration of the basic ideas using neuronally plausible simulations of visual search and the control of saccadic eye movements. This illustration allows us to understand Bayes-optimal searches in terms of saliency maps and the saltatory accumulation of evidence during perceptual categorization. We conclude with a brief discussion of the theoretical implications of these ideas and how they could be tested empirically.

Fast saccadic eye-movements in humans suggest that numerosity perception is automatic and direct

Fast saccades are rapid, automatic oculomotor responses to salient and ecologically important visual stimuli such as animals and faces. Discriminating the number of friends, foes, or prey may also confer an evolutionary advantage. In this study, participants were asked to saccade rapidly towards the more numerous of two arrays. Participants could discriminate numerosities with high accuracy and great speed, with saccades as fast as 190 ms. Intermediate numerosities were more likely to elicit fast saccades than very low or very high numerosities. Reaction-times for vocal responses (collected in a separate experiment) were slower, did not depend on numerical range, and correlated only with the slow saccades, not the fast ones, pointing to different systems. The short saccadic reaction-times we observe are surprising given that discrimination using numerosity estimation is thought to require a relatively complex neural circuit, with several relays of information through the parietal and prefrontal cortex. Our results suggest that fast numerosity-driven saccades may be generated on a single feed-forward pass of information, recruiting a primitive system that cuts through the cortical hierarchy and rapidly transforms numerosity information into a saccade command.

1. Introduction

The ability to rapidly estimate the number of enemies, prey, or food sources can have obvious evolutionary benefits. Many animals, including primates [1], birds [2,3], fish [4], and even insects [5], can discriminate the number of elements in a scene, and many, including honeybees, even have the concept of zero [6,7]. It has been proposed that humans and animals share a ‘number sense’ that enables them to quickly perceive the number of objects in an image [8,9]. This idea opened a vast debate on how numerosity is sensed: directly through dedicated mechanisms [10], or indirectly through the combination of non-numerical properties of the array, such as density and area [11].

Accumulating evidence from both behavioural [12] and neuroimaging studies [13,14] in humans supports the first hypothesis, and further suggests that numerosity is more salient than many non-numerical properties. Six-month-old infants can reliably detect twofold changes in dot number, but need a fourfold change in area for comparable detection [15]. Stroop-like interference paradigms, where adult participants compared ensembles of dots varying along both numerical and non-numerical dimensions, show that numerosity is difficult to ignore during non-numerical judgements, whereas the reverse interference (non-numerical information biasing numerosity) was much weaker [16,17]. Other studies found that primates (both human and non-human) spontaneously orient choices in quantity discrimination tasks based on numerosity. When identifying the odd-one-out of three dot arrays (without instructions on how the stimuli differ), human participants are far more sensitive to numerosity changes than to changes in area or density [18]. Similarly, using a categorization task in which arrays of dots could be labelled as ‘little’ or ‘a lot’, numerate adults and children, innumerate adults, and monkeys all based categorization on the numerical parameters rather than on other non-numerical dimensions [19]. Overall, these studies suggest that not only can numerical information be directly extracted from a visual scene, but it is the dimension to which we are most sensitive, and that most naturally attracts attention.

Many animals, particularly primates, constantly move their eyes to explore the surroundings, to monitor where they are heading and to direct gaze towards objects of interest. Saccadic eye movements can be extremely rapid, especially towards ecologically salient stimuli or possible threats [20,21]. Fischer & Boch [22] first described these fast saccades in monkeys, in response to the sudden appearance of a visual target against a homogeneous background. They found that saccadic onsets were distributed bimodally, with the first peak centred around 75 ms. In humans, similar paradigms triggered fast saccades with latencies of about 100 ms [23]. Interestingly, these fast saccades also occur for more complex stimuli if they are ecologically salient. When simultaneously presented with two images, one containing an animal or a human face, the other landscapes or vehicles, saccades towards the animal occurred within 120–130 ms [21,24], and towards faces within 100–110 ms [20]. It has been proposed that such ultra-rapid saccades might be achieved through hard-wired neural mechanisms developed under evolutionary pressure [20].

In light of the evidence that number is a highly salient visual dimension, we asked whether humans can choose the most numerous array of items with fast saccadic eye movements. We also tested whether the saccade behaviour depends on the numerical range, given the evidence for different mechanisms covering different ranges [12].

2. Material and methods

Fourteen adults (six males, 29 ± 5 years old) with normal or corrected-to-normal vision participated in the saccade experiment; 11 of these also completed the vocal-response experiment (five males, 29 ± 5 years old).

Stimuli comprised arrays of dots of 0.35° diameter, half white and half black, on a mid-grey background, constrained within a 14° diameter circle. The numbers of dots were selected to target the three ranges of numerosity perception: 1–4 dots (the subitizing range); 12, 17, 24, and 35 dots (the estimation range); and 158, 195, 240, and 296 dots (the texture-density range) (figure 1a). In order to match task difficulty across the estimation and density ranges, each numerosity pair in a trial differed by multiples of the just noticeable difference (JND) (either 2, 4, or 6 JNDs), based on the sensitivity estimates in adult subjects reported by Anobile et al. [25]. Average root-mean-squared (RMS) contrast ratios between stimulus pairs were 0.70, 0.75, and 0.84 for the subitizing, estimation, and density ranges, respectively. Participants fixated a red 0.35° diameter fixation point while two circles of 15° diameter, located at 8° horizontal eccentricity, delimited the regions within which dot arrays were displayed (figure 1b). After a pseudo-random interval (800–1200 ms) the fixation dot disappeared, and only the circles remained onscreen for 200 ms before stimuli were presented (facilitating fast saccadic eye movements [21]). Two arrays of dots were displayed for 200 ms, then immediately replaced by two landing points, which remained onscreen for 800 ms. Then a central fixation point appeared, and the program waited for a keypress to start the following trial. In the first experiment, participants were asked to make a saccadic eye movement towards the more numerous array, as quickly and as accurately as possible. The second experiment was identical, except that participants called out the side containing more dots (‘left’ or ‘right’), as quickly and as accurately as possible.
Pairs of stimuli targeting one of the three ranges were randomly chosen from a total of 18 different conditions (six per range, obtained by combining two of the four possible numbers), with the larger numerosity randomly on the left or right. Each combination of side and condition was tested six times in each session.

Figure 1. Stimuli and procedure. (a) Examples of stimuli targeting the subitizing, estimation, and density ranges. (b) Example of the time course of trials for saccadic and vocal reaction times. Participants maintained gaze on a central fixation point that disappeared after a pseudo-random interval (800–1200 ms). After 200 ms, two arrays of dots were briefly displayed. Participants either made a saccade to the landing point on the side of the more numerous array, or called out the side of the screen with the more numerous array.

Participants completed three sessions, for a total of 648 trials (216 trials for each range). For one subject, one saccade session was discarded because of technical problems with the eye-movement recording. Participants performed 10 practice trials before each experiment. Stimuli were generated in Matlab 9.6 using PsychToolbox routines [26] run on a Macintosh laptop (MacBook Pro, Apple) and presented on an external screen placed 57 cm from the observer. Eye movements were recorded with an infrared eye tracker (EyeLink 1000), sampling eye position at 1000 Hz. Saccadic reaction-time was measured from stimulus onset. A standard calibration routine was run at the beginning of the experiment. Vocal responses were recorded by the experimenter, who pressed the space bar as soon as the participant called out the response; reaction-time was measured from stimulus onset to the keypress.
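The trial counts can be checked with a short Python sketch of one session. It assumes the subitizing range used 1, 2, 3, and 4 dots, and reads the six repetitions of each side and condition as applying per session, which is what makes 18 conditions × 2 sides × 6 repetitions = 216 trials per session and 648 trials over three sessions. Names such as `build_session` are hypothetical, and trial-order randomization is omitted.

```python
from itertools import combinations

# Numerosities per range; [1, 2, 3, 4] for subitizing is an assumption from "1–4 dots".
RANGES = {
    "subitizing": [1, 2, 3, 4],
    "estimation": [12, 17, 24, 35],
    "density": [158, 195, 240, 296],
}
SIDES = ("left", "right")  # side on which the larger numerosity appears
REPEATS_PER_SESSION = 6
SESSIONS = 3


def build_session():
    """List every (range, pair, side) trial of one session, before shuffling."""
    return [
        (name, pair, side)
        for name, values in RANGES.items()
        for pair in combinations(values, 2)
        for side in SIDES
        for _ in range(REPEATS_PER_SESSION)
    ]


session = build_session()
n_conditions = len({(name, pair) for name, pair, _ in session})
trials_per_range = sum(1 for name, _, _ in session if name == "estimation") * SESSIONS
print(n_conditions, len(session), len(session) * SESSIONS, trials_per_range)  # 18 216 648 216
```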

Eye-movement traces were preprocessed to exclude trials in which saccades started before stimulus onset, those with saccadic amplitudes smaller than 3 degrees, and those in which participants initiated a saccade towards one side but then reversed direction to land on the other. To this end, we estimated the saccadic direction between 50 and 100 ms after saccadic onset and checked whether it had changed by 200 ms. A total of 7% of trials in both the subitizing and estimation ranges and 10% of trials in the density range were discarded because of unsteady fixation or corrective saccades. Analysis of saccadic amplitudes across numerical ranges is reported in the electronic supplementary material. For the second experiment, we discarded vocal reaction-times more than 3 standard deviations faster or slower than the mean reaction-time, calculated separately for each subject and session. Fewer than 2% of the trials were discarded from each range.
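A minimal Python sketch of the ±3 SD exclusion rule for vocal reaction-times, meant to be applied to one subject's data from one session at a time. The text does not say whether the sample or population standard deviation was used; the sketch assumes the sample SD, and `exclude_outliers` is a hypothetical name.

```python
import statistics


def exclude_outliers(reaction_times, n_sd=3.0):
    """Drop reaction-times more than n_sd standard deviations from the mean.

    Apply to one subject's data from one session at a time. Uses the sample
    standard deviation (an assumption; the source does not specify).
    """
    rts = list(reaction_times)
    if len(rts) < 2:
        return rts  # too few values to estimate a standard deviation
    mean = statistics.mean(rts)
    sd = statistics.stdev(rts)
    return [rt for rt in rts if abs(rt - mean) <= n_sd * sd]


# Made-up session: 19 typical responses and one very slow keypress.
session_rts = [1000] * 19 + [3000]
kept = exclude_outliers(session_rts)
print(len(kept))  # 19
```

In practice the reaction-times would first be grouped by (subject, session), for example in a dictionary keyed by that pair, and the function applied to each group separately.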

Data were first analysed by merging individual data to form an ‘aggregate participant’. The reaction-time distribution of each numerosity range was binned into 10 ms time bins and plotted to show the proportion of correct and incorrect responses in each bin. The multi-modality of the distributions was verified by applying Hartigan's dip test statistic [27,28] to the reaction-time distribution. For each numerosity range, we estimated the minimum saccadic reaction-time by searching for bins containing significantly more correct than incorrect responses using a binomial test with a criterion of p < 0.05, following the method of Crouzet et al. [20] and Kirchner & Thorpe [21]. The minimum reaction-time was defined by identifying the first of five consecutive bins that reached the criterion set by the binomial test. To further test which of the three ranges had the highest proportion of fastest responses we calculated the cumulative sum of the proportion of trials as a function of reaction-time.
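The minimum-reaction-time criterion can be sketched in Python with the standard library only. The sketch assumes a one-sided exact binomial test against chance (p = 0.5) in each 10-ms bin, since the text does not state the sidedness, and returns the start of the first bin in a run of five consecutive significant bins. Function names and the example counts are hypothetical.

```python
from math import comb


def binomial_p_greater(k, n):
    """One-sided exact binomial p-value, P(X >= k) for X ~ Binomial(n, 0.5)."""
    if n == 0:
        return 1.0
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def minimum_reaction_time(bin_starts, n_correct, n_incorrect, alpha=0.05, run_length=5):
    """Start of the first bin in a run of run_length consecutive bins with
    significantly more correct than incorrect responses, or None if there is none."""
    run_start, run = None, 0
    for start, correct, incorrect in zip(bin_starts, n_correct, n_incorrect):
        significant = correct > incorrect and binomial_p_greater(correct, correct + incorrect) < alpha
        if significant:
            if run == 0:
                run_start = start
            run += 1
            if run == run_length:
                return run_start
        else:
            run = 0
    return None


# Made-up pooled counts in 10-ms bins starting at 100 ms.
bins = list(range(100, 210, 10))
correct = [2, 2, 2, 8, 1, 10, 10, 10, 10, 10, 10]
incorrect = [2, 2, 2, 0, 1, 1, 1, 1, 1, 1, 1]
print(minimum_reaction_time(bins, correct, incorrect))  # 150
```

The isolated significant bin at 130 ms does not count, because it is not followed by four more significant bins; the run-length requirement guards against such chance fluctuations.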

To evaluate the impact of the speed–accuracy trade-off, and to take into account possible differences in task difficulty, we calculated the inverse-efficiency score [29] by dividing the reaction-time by response accuracy for each bin. To test which range elicited the fastest saccades, for each subject we fitted the saccadic reaction-time histograms, after merging all ranges, with a kernel smoothing function (using the Matlab function ‘histfit’ with the kernel option). This fitting procedure revealed two clear peaks in most participants, very similar to the aggregate data. We identified the two highest peaks of the distribution and the minimum between them. The saccadic reaction-time corresponding to this minimum was chosen to separate fast from slow saccades. Reaction-times, accuracies, inverse-efficiency scores, and proportions of correct fast responses were entered into a repeated-measures ANOVA (with three levels of numerical range). Bonferroni-corrected post hoc comparisons and the corresponding log10 Bayes factors are reported. By convention, a base-10 logarithm of the Bayes factor (logBF) > 0.5 is considered substantial evidence in favour of the alternative hypothesis, logBF > 1 strong evidence, and logBF > 2 decisive evidence. logBF < −0.5, −1, or −2 is substantial, strong, or decisive evidence, respectively, in favour of the null hypothesis.
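The inverse-efficiency score, the fast/slow split, and the log Bayes factor convention are sketched below in Python with NumPy. The split smooths the pooled reaction-times with a Gaussian kernel of fixed bandwidth, standing in for the MATLAB `histfit` kernel fit used in the study; the bandwidth, the grid step, the synthetic data, and all function names are assumptions. Values with |logBF| ≤ 0.5, which the text does not label, are reported here simply as below the 'substantial' threshold.

```python
from statistics import NormalDist

import numpy as np


def inverse_efficiency(mean_rt_ms, proportion_correct):
    """Inverse-efficiency score: reaction time divided by proportion correct."""
    if not 0 < proportion_correct <= 1:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_rt_ms / proportion_correct


def fast_slow_boundary(rts_ms, bandwidth_ms=20.0, step_ms=1.0):
    """Reaction time at the density minimum between the two highest KDE peaks."""
    rts = np.asarray(rts_ms, dtype=float)
    grid = np.arange(rts.min(), rts.max() + step_ms, step_ms)
    # Unnormalized Gaussian kernel density on a regular grid (scale does not matter here).
    z = (grid[:, None] - rts[None, :]) / bandwidth_ms
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    # Interior local maxima of the smoothed distribution.
    inner = density[1:-1]
    peaks = np.flatnonzero((inner > density[:-2]) & (inner >= density[2:])) + 1
    if len(peaks) < 2:
        raise ValueError("no two peaks found at this bandwidth")
    first, second = sorted(peaks[np.argsort(density[peaks])[-2:]])
    dip = first + int(np.argmin(density[first:second + 1]))
    return float(grid[dip])


def describe_log_bf(log10_bf):
    """Label a base-10 log Bayes factor with the thresholds quoted in the text."""
    size = abs(log10_bf)
    if size > 2:
        strength = "decisive"
    elif size > 1:
        strength = "strong"
    elif size > 0.5:
        strength = "substantial"
    else:
        return "below the substantial threshold"
    side = "alternative" if log10_bf > 0 else "null"
    return f"{strength} evidence for the {side} hypothesis"


# Synthetic bimodal reaction-times (evenly spaced normal quantiles, not study data).
fast = [NormalDist(250, 30).inv_cdf((i + 0.5) / 200) for i in range(200)]
slow = [NormalDist(500, 30).inv_cdf((i + 0.5) / 200) for i in range(200)]
boundary = fast_slow_boundary(fast + slow)
print(round(boundary))
print(describe_log_bf(2.2))
print(round(inverse_efficiency(338, 0.89), 1))
```

A fixed bandwidth keeps the sketch transparent, but the location of the dip, and even whether two peaks appear at all, depends on that choice, so in real data it is worth checking the split across a few bandwidths.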

3. Results

(a) Saccades

Participants saccaded to the more numerous of two briefly presented dot arrays. Figure 2a shows saccadic reaction-time histograms separately for correct and incorrect saccades, for each numerosity range. Following Kirchner & Thorpe [21], we estimated the minimum time required to initiate a correct saccade, adapting their method to the data pooled across participants. Saccades in the estimation range were the fastest, with a minimum saccadic reaction-time of 190 ms. By contrast, the minimum saccadic reaction-times in the subitizing and density ranges were 30 and 40 ms slower (220 and 230 ms, respectively).

Figure 2. Saccadic reaction times. (a) Reaction-time histograms of correct (thick lines) and incorrect (thin lines) saccades for the subitizing (red), estimation (blue), and density (green) ranges. Dashed vertical lines refer to the minimum saccadic reaction-time for each range for reliably correct responses. (b) Cumulative sum of the proportion of saccades as a function of saccadic reaction-time. (c) Proportion of saccades as a function of the inverse-efficiency score (defined as the saccadic reaction-times divided by accuracy). (d) Cumulative sum of the proportion of saccades plotted as a function of the inverse-efficiency score. For visualization purposes 20-ms bins were used for the inverse-efficiency score plots.

The distributions of correct saccades showed two distinct peaks, one fast (190–340 ms) and one slower (360 to approx. 600 ms). Bimodality was confirmed by Hartigan's dip test statistic, which was significant in all three ranges (all p < 0.05). On the basis of this division, we separated saccades into fast and slow subsets (faster or slower than the reaction-time at the dip between the two peaks) and analysed them separately. The histograms of figure 2 clearly show that subjects were more likely to initiate fast saccades for stimuli in the estimation range than in the other two ranges. The blue curve in figure 2a shows that the highest proportion of correct saccades in the earliest time bins occurred in the estimation range. For a clearer visualization of the results, we plotted the cumulative sum of the saccadic reaction-time distributions from the three ranges (figure 2b). The blue curve (estimation range) increased at a faster rate than the other curves, consistent with the higher proportion of fast saccades. To compensate for possible differences in task difficulty, we calculated the inverse-efficiency score by dividing saccadic reaction-times by response accuracy. Even after taking into account the speed–accuracy trade-off, the highest proportion of correct saccades initiated in the earliest time bins occurred when participants were tested with stimuli targeting the estimation range (figure 2c). This was also observed when plotting the cumulative sum of inverse-efficiency scores separately for the three ranges (figure 2d): the curve of the inverse-efficiency scores of the estimation range was much steeper than the other two.

The bimodality of the reaction-time histograms suggests that two different types of saccades occurred. To further study fast saccades, we selected those that were faster than the minima between the two peaks of the saccadic reaction-time distributions of all saccades, separately for each individual participant (see methods). The highest proportion was in the estimation range, reaching 38%, with only 29% and 19% in the subitizing and density ranges, respectively. ANOVA revealed a significant effect of range on the proportion of correct fast saccades (F2,26 = 19.4, p < 0.001). The proportion in the estimation range was significantly higher than both subitizing (t13 = 4.4, p = 0.002, logBF = 1.7) and density (t13 = 5.1, p < 0.001, logBF = 2.2). Subitizing and density ranges also differed (t13 = 3.1, p = 0.02, logBF = 0.8).

We then tested whether saccades were on average faster for the estimation range, regardless of whether fast or slow saccades were selected. We quantified the average saccadic reaction-time for each participant, for each range. As shown in figure 3a, correct saccades were faster for stimuli in the estimation range (abscissa) than in the subitizing and density ranges (ordinate). This was confirmed by the significant effect of range in a repeated-measures ANOVA (F2,26 = 20.3, p < 0.001). On average, saccades in the estimation range were performed in 338 ms, significantly faster than in the subitizing (373 ms; t13 = 6.1, p < 0.001, logBF = 2.1) and density (357 ms; t13 = 4.9, p < 0.001, logBF = 2.8) ranges. Saccadic reaction-times did not statistically differ between the subitizing and density ranges (t13 = 2.4, p = 0.09, logBF = 0.4). Importantly, these results were not explained by a difference in accuracy: although accuracy differed significantly across ranges (F2,26 = 26.7, p < 0.001), it did not mirror the pattern of saccadic reaction-times (figure 3b). Saccadic accuracy in the density range (79%) was significantly lower than in both the subitizing (t13 = 5.6, p < 0.001, logBF = 2.5) and estimation (t13 = 5.2, p < 0.001, logBF = 2.3) ranges. Saccadic accuracy in the estimation range (89%) was lower (although not significantly) than in the subitizing range (92%; t13 = 2.5, p = 0.08, logBF = 0.41), which is inconsistent with the possibility that saccades in the estimation range were faster because the discrimination was easier.
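The degrees of freedom reported here follow directly from the design: with 14 participants and 3 ranges, a one-way repeated-measures ANOVA has 3 − 1 = 2 and (3 − 1)(14 − 1) = 26 degrees of freedom, matching F2,26 (and, for the 11 participants of the vocal experiment, F2,20). A minimal NumPy sketch of the uncorrected F statistic follows; the text does not say whether a sphericity correction was applied, so none is assumed, and the example scores are made up.

```python
import numpy as np


def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA without sphericity correction.

    data has shape (n_subjects, k_conditions): one score per subject per condition.
    Returns the F statistic and its (numerator, denominator) degrees of freedom.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_condition = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_subject = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_condition - ss_subject  # condition-by-subject residual
    df_condition = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_condition / df_condition) / (ss_error / df_error)
    return float(f_value), (df_condition, df_error)


# Made-up scores for three subjects in three conditions.
f_value, dfs = repeated_measures_anova([[1, 2, 3], [2, 3, 4], [3, 4, 6]])
print(dfs)  # (2, 4)
```

Removing the between-subject sum of squares from the error term is what distinguishes this from a between-subjects ANOVA, and is why consistent within-participant differences, such as those in figure 3, can reach significance with only 14 participants.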

Figure 3. Individual results. Saccadic reaction-time for the correct responses (a, ms), accuracy (b, proportion correct), and inverse-efficiency score (c, ms) measured in the subitizing (red) and density (green) ranges (on the ordinate) plotted against those in the estimation range (on the abscissa). Individual participants are shown in circles, squares show the mean ± standard error of the mean (n = 14).

As a more direct test of the impact of task difficulty on saccadic reaction-times, we compared inverse-efficiency scores between ranges (figure 3c). Inverse-efficiency in the estimation range (386 ms) was lower than in the subitizing (410 ms) and density (455 ms) ranges. A repeated-measures ANOVA revealed a significant effect of range (F2,26 = 18.6, p < 0.001), with inverse-efficiency for the estimation range differing significantly from that for both subitizing (t13 = 3.01, p = 0.03, logBF = 2.6) and density (t13 = 5.7, p < 0.001, logBF = 0.7). Inverse-efficiency for density was significantly higher than for subitizing (t13 = 3.3, p = 0.02, logBF = 0.9).

Distance effects, typical of magnitude judgements, occurred in all conditions, both when considering all saccades and only the fastest saccades: accuracy increased and reaction times decreased with larger numerical distances (see electronic supplementary material).

Overall, the results from this experiment showed that the estimation range generally triggered faster saccades, independently of accuracy, and that fast correct saccades were more likely to occur in this range than in the subitizing or density ranges.

(b) Vocal responses

We repeated the experiment, requiring participants to respond rapidly with their voice rather than by moving their eyes (figure 1b). With vocal rather than saccadic responses, reaction-time distributions for all ranges were unimodal (figure 4a; Hartigan's dip test, all p > 0.3). Furthermore, the distributions for the three numerosity ranges overlapped, with no clear advantage for the intermediate range (figure 4c,d). Reaction-time differences between ranges were quantified for individual participants. On average, participants gave the correct response in 1095 ms for the subitizing range and in 1102–1104 ms for the estimation and density ranges, which did not differ significantly (F2,20 = 0.44, p = 0.64).

Figure 4. Vocal reaction-times. (a,b) Reaction-time histograms of correct (thick lines) and incorrect (thin lines) vocal responses in the subitizing (red), estimation (blue), and density (green) ranges. The distributions in the three ranges overlap, even when taking into account task difficulty by plotting the results as a function of the inverse-efficiency scores (c) and its cumulative sum (d). For visualization purposes 20-ms and 50-ms bins have been used for reaction-times and inverse-efficiency scores, respectively.

Response accuracies differed significantly between ranges (F2,20 = 9.86, p = 0.001), with responses in the subitizing range (97%) more accurate than those in the estimation (93%; t10 = 3.8, p = 0.01, logBF = 1.2) and density (91%; t10 = 3.8, p = 0.01, logBF = 1.2) ranges. Importantly, response accuracy did not differ significantly between the estimation and density ranges (t10 = 1.3, p = 0.6, logBF = −0.2), suggesting that task difficulty was successfully matched between these two ranges, at least for vocal responses. Inverse-efficiency scores differed significantly between ranges (F2,20 = 8.37, p = 0.002): once task difficulty was taken into account, responses in the estimation and density ranges (1202 ms and 1236 ms, respectively) were significantly slower than those in the subitizing range (1127 ms; estimation versus subitizing: t10 = 3.4, p = 0.02, logBF = 0.9; density versus subitizing: t10 = 3.7, p = 0.01, logBF = 1.1). Inverse-efficiency scores did not differ significantly between the estimation and density ranges (t10 = 1.2, p = 0.8, logBF = −0.3). Vocal responses also showed distance effects, with accuracy increasing and reaction-times decreasing with larger numerical distances (see electronic supplementary material).

(c) Relationship between saccades and vocal responses

We examined the relationship between vocal responses and saccades by correlating the inverse-efficiency scores of vocal responses with those of the fast and slow saccades (figure 5). Efficiency scores for vocal responses correlated significantly with those for slow saccades, with the Bayes Factor providing substantial evidence for the correlation (r = 0.66, p = 0.03, logBF = 0.6). However, efficiency scores for fast saccades did not correlate with those for vocal responses, with the Bayes Factor providing substantial evidence for a lack of correlation (r = 0.44, p = 0.17, logBF = −0.6). This lack of correlation is further evidence that fast saccades are driven by different circuitry from that underlying vocal responses.
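The reported p-values can be related to r through the usual t-statistic for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. Assuming the correlations were computed across the 11 participants of the vocal experiment (inferred from the F2,20 degrees of freedom, not stated explicitly here), r = 0.66 gives t(9) ≈ 2.64 and r = 0.44 gives t(9) ≈ 1.47, in line with the two-tailed p-values of 0.03 and 0.17 reported above. The Python sketch below computes r and t only; converting t to a p-value needs a Student-t distribution function (for example from SciPy), omitted here to keep the block dependency-free.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length >= 3")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing a correlation r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```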

Figure 5. Correlational analysis. Correlation between the inverse-efficiency score calculated on the fast (grey dots) and slow (black dots) saccades and the inverse-efficiency score calculated on the vocal responses.

4. Discussion

In this study we show that participants can discriminate with saccadic movements the numerosity of briefly flashed dot ensembles as quickly as 190 ms. The bimodality of the saccadic onset distributions strongly resembled that observed in the studies that first described fast saccades in humans and monkeys [22,23]. Saccadic reaction-times were fastest when discriminating intermediate numerosities, with latencies as low as 190 ms, and about 40 ms slower for both very low and very high numerosities. The results could not be explained by a speed–accuracy trade-off, as the fast saccades are as accurate as the slow (which is not always the case [30]). Nor could they be explained by differences in saccadic amplitudes or relative RMS contrast ratios in the different ranges.

Saccadic reaction times vary over a large range, depending on features of the target stimulus (including chromaticity, cone contrast [31,32], size, and shape [33]), task timing (e.g. gap/overlap paradigms), and task (e.g. discrimination versus detection). For example, in a simple detection task, saccadic reaction times can be as slow as 300–320 ms when cone contrast is low [32]. Changing the task from simple detection to discrimination adds about 100 ms per choice alternative [34], and more for difficult than for simple tasks [30]. On the other hand, minimum choice saccadic reaction-times towards faces (100–110 ms) [20] and animals (120–130 ms) [21] are lower than the saccade latencies reported here. However, these studies required participants to detect salient stimuli, whereas here participants discriminated numerosities. Saccades towards a face require detecting face-like characteristics in only one stimulus, while numerosity judgements are by definition relative, requiring processing and comparison of both stimuli. Given the range of saccadic latencies observed for various stimuli and tasks, the 190-ms reaction-times to intermediate numerosities are remarkably fast for a numerical choice between two alternatives.

Most similar to the current experiment, a previous study [35] reported fast saccades towards Arabic digits (1–9) with a minimum reaction-time of 230 ms. Setting aside the obvious major differences between symbolic and non-symbolic numbers, it is surprising that the saccadic reaction-times measured in this experiment were even faster than those directed towards overlearned (though language-mediated) symbolic digits.

Our study shows that numerosity can be accurately processed at very high speeds, suggesting that numerosity discrimination is automatic. Importantly, the fastest reaction times for numerical processing were detected with saccadic eye movements, whereas vocal reaction times showed no tendency towards a bimodal distribution or towards differences across ranges. While both vocal responses and saccades showed typical distance effects, across participants vocal responses correlated only with slow, not fast, saccades, suggesting that two different systems (one fast and one slow) support numerosity discrimination.

Our results provide further evidence for a dissociation between perceptual report and motor action [36]. For example, fast saccades are immune to motion-induced mislocalization of a flash [37] or a bar [38], while slow saccades (latencies greater than 250 ms and greater than 130 ms, respectively) were fooled by the illusory effect. The amplitude of short-latency saccades (less than or equal to 140 ms) was also only slightly affected by size adaptation, compared to slower saccades [39]. These studies therefore suggest that visuo-motor control may access sensory feed-forward signals before conscious perception is reached through feedback connections.

An interesting aspect of fast saccadic eye movements is that they are not under full voluntary control: even when explicitly asked to saccade towards a neutral image (a vehicle), participants cannot avoid saccading towards the more salient image of a face [20]. This suggests that fast saccades towards salient stimuli tend to be ‘mandatory’ and rely only marginally on attention. It would be interesting to test whether a change in task instructions also affects saccadic reaction-times and accuracies in the current paradigm. If humans have a natural preference to automatically shift gaze towards the more numerous ensemble, then asking participants to saccade towards the less numerous array may significantly slow reaction-times or increase errors. Beyond the relative saliency that one numerosity may have over another, the fact that the fastest saccades observed here were more likely to occur in the estimation range supports the claim that this system does not draw heavily on attentional resources. Attention has a different impact on numerosity perception depending on the numerosity range. Although subitizing was initially considered a pre-attentive and parallel process [40], several studies have shown that depleting visual attentional resources through dual tasks [41–43], inattentional blindness [44], and attentional blink paradigms [45–47] impairs enumeration accuracy and raises discrimination thresholds for very small numerosities. Likewise, reaction-times and sensory thresholds for discriminating very high numerosities in the density range are elevated when participants must first respond to a visual distractor task [42]. By contrast, numerical discriminations in the estimation range are less affected by the depletion of visual attentional resources [41–43].
A recent study has described a patient with an attentional deficit (simultanagnosia) who is highly impaired in discriminating very small and very high numerosities, while thresholds for intermediate numerosities are similar to those of healthy controls, consistent with the notion that numerical comparisons in the estimation range can be performed with minimal reliance on attentional resources [48].

The current experiment reinforces evidence for three separate regimes for number perception, and suggests that mechanisms operating in the estimation range are more direct and automatic. Saccades to targets within the estimation range were overall faster, and a higher proportion of these could be considered ‘fast saccades’. However, it is important to note that although there were more fast saccades in the estimation range, all three numerical ranges had bimodal reaction-time distributions, implicating fast and slow systems in all ranges. Whether this results from ‘leakage’ of the estimation system to the other two ranges, or whether all three ranges have fast and slow processes (in different proportion) is difficult to distinguish from the current experiment.

What can this pattern of results reveal about the underlying neural mechanisms driving fast saccades? One intriguing possibility is that the fastest saccades occur for numerosities in the estimation range because information in this range needs to be pooled over fewer and larger receptive fields compared with the density range. This would be consistent with a recent adaptation study suggesting that receptive fields in the estimation range are larger than those in the subitizing or density ranges [49]. That study suggested that numerosities in the estimation range may be coded by parietal neurons with large receptive field sizes (estimated to cover up to 12 degrees), whereas perception of higher numerosities may arise from low-level feature analysis, most likely carried out by neurons in the early visual areas with smaller receptive field sizes. The faster reaction to numerosities in the estimation range supports this possibility.

Another possible explanation for the short saccadic latencies is that numerical comparisons may be based on feed-forward signals in the early visual pathways, which are thought to support ultra-rapid oculomotor responses [50]. Ultra-rapid oculomotor responses are initiated by the superior colliculus [51], a structure that is highly interconnected with the frontal eye field (FEF) and the posterior parietal cortex (PPC), all areas involved in saccadic planning and execution. There is evidence that the systems controlling saccadic eye movements and numerosity perception interact [52–54] and are at least in part controlled by overlapping areas in the parietal cortex [55,56].

Behavioural [29] and electroencephalography (EEG) studies [57,58] have suggested that numerical information may be processed by primitive, relatively direct pathways. For example, there are greater facilitatory effects for monocularly than dichoptically presented stimuli [29], suggesting that numerical processing may start even before the monocular signals are fused, perhaps in the human subcortex. Event-related potential studies also point to the possibility of early and direct encoding of numerical quantities [57,58], with the effect of segregating stimuli into (a number of) perceptual units arising around 150 ms after stimulus onset [59]. This early numerosity signal may originate in V3/V3A—the first of many areas modulated by attention to number [14].

These results suggest that fast saccades towards numerical arrays may be supported by a visual cortical pathway that resolves numerical comparison tasks, either at the level of V3 or through direct connections to parietal and frontal cortices, which then converge on the superior colliculus within the same feed-forward wave. Interestingly, studies in monkeys have identified direct connections between the superior colliculus and V3, as well as between V3 and the caudal part of FEF [60] and the posterior parietal areas [61], providing a potential physiological substrate. Single-cell recording studies in monkeys [62] estimated a conduction time of 30–35 ms between the retina and V1, and another 20–25 ms for the superior colliculus to elicit a saccade; everything in between is visual processing. Given that the corresponding latencies in humans are probably longer, it is likely that three to four synapses, potentially involving V3, PPC, and FEF, are sufficient to support fast saccades toward ensembles. It would be interesting to test these possibilities directly, taking advantage of the fact that our saccade paradigm can be readily adapted to non-human primates and other laboratory animals.
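The reasoning above is a latency budget: subtracting the retina-to-V1 and collicular delays estimated in monkeys [62] from the fastest saccades observed here (190 ms) leaves roughly 130–140 ms for visual processing. The Python sketch below performs only that subtraction; it deliberately ignores the point that human conduction times are probably longer than the monkey estimates, which would shrink the remaining window further.

```python
from typing import Tuple


def processing_window_ms(
    saccade_latency_ms: float,
    retina_to_v1_ms: Tuple[float, float],
    sc_to_saccade_ms: Tuple[float, float],
) -> Tuple[float, float]:
    """Range of time left for visual processing between V1 and the colliculus.

    The shortest window assumes the longest afferent and efferent delays,
    and the longest window assumes the shortest ones.
    """
    shortest = saccade_latency_ms - retina_to_v1_ms[1] - sc_to_saccade_ms[1]
    longest = saccade_latency_ms - retina_to_v1_ms[0] - sc_to_saccade_ms[0]
    return shortest, longest


# Monkey delay estimates applied to the 190-ms saccades.
print(processing_window_ms(190, (30, 35), (20, 25)))  # (130, 140)
```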

In conclusion, we report very fast oculomotor responses towards non-symbolic numerosities in a numerical comparison task, suggesting that numerical information is a highly salient, relevant, and automatically coded visual dimension. The probability of triggering these fast saccades depends on the numerical range: they are more likely to occur when discriminating intermediate numerosities, consistent with observations showing that perceptual responses in that range are more automatic, relying less on attention. By operating on feed-forward signals processed by a very early visual pathway, a phylogenetically ancient system may drive fast saccades for efficient identification of numerosity.


Experimental procedures were approved by the local ethics committee (Comitato Etico Pediatrico Regionale - Azienda Ospedaliero-Universitaria Meyer - Firenze), in accordance with the Declaration of Helsinki. Written informed consent was signed by participants prior to the research.


Motor impairment amongst children is a widespread problem. Estimates suggest that 5% of the population have some form of motor disorder that has long-term implications for physical and mental health [1]. Developmental coordination disorder (DCD) is a broad diagnostic construct encompassing heterogeneous presentations. It is a term used to describe children with a core motor deficit in the absence of overt signs of other conditions that might explain the motor difficulties. More specifically, according to the DSM-5 diagnostic criteria, DCD is diagnosed when: a child shows impairment in the acquisition and learning of motor skills compared with peers (criterion A); these motor deficits significantly and persistently affect activities of daily living and impact academic achievement, leisure and play (criterion B); the onset of motor deficits occurs early in development (criterion C); and the deficits cannot be better explained by intellectual disability, visual impairment or another neurological condition affecting movement, such as cerebral palsy (criterion D) [2]. In addition, many studies report the co-occurrence of social and affective problems in DCD, including a lack of concentration, general behavioural problems, poor social competence and poor participation in physical activities [3].

The aetiology of DCD is not well understood and a number of factors may influence the probability of a child meeting the diagnostic criteria (e.g. genetic deficits, birth trauma, etc.) [1]. Nevertheless, there have been numerous attempts to construct causal process-orientated hypotheses to explain the presence of the motor deficits. For example, Wilson and McKenzie [4] identified increased difficulties with ‘visual-spatial processing’ tasks within the DCD population. This review of 50 studies concluded that “perceptual problems, particularly in the visual modality, are associated with difficulties in motor coordination” [4]. The difficulty with such a conclusion is that it rests on the observations of how children have responded (using the motor system) to perceptual stimuli. There is no study to date that has established a perceptual system deficit per se as being a necessary or sufficient feature of DCD. In the absence of evidence for a perceptual deficit, the observation that children show problems in generating responses to perceptual stimuli could be due to the children having motor difficulties rather than a specific perceptual impairment.

More recently, it has been hypothesised that a fundamental deficit in the ability to utilise internal models may underlie the compromised motor control exhibited by children with DCD [5]. Internal models have been extensively used to explain the control of actions in a number of adaptive behaviours such as reaching, walking, and eye movements [6]. These internal models estimate the sensory consequences of an action before feedback information is available, and when planning a motor response they can thereby compensate for sensory feedback delays [7, 8]. The internally-generated predictions (of sensory consequences) allow more accurate estimates of the requisite motor signals to be formulated. It has been suggested that a deficit in generating these internal predictions is key to the poor motor control observed in DCD [7–9].
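The idea that an internal (forward) model compensates for feedback delays can be illustrated with a toy example. The Python sketch below is a hypothetical illustration, not a model taken from the cited studies: it assumes a one-dimensional plant in which each motor command moves the effector by exactly its own value, and it estimates the current position by adding an efference copy of the commands issued during the feedback delay to the delayed sensory reading.

```python
from collections import deque
from typing import Deque


class ForwardModelEstimator:
    """Toy forward model for a 1-D plant where position += command each step.

    Sensory feedback arrives `delay` steps late; the estimator keeps an
    efference copy of the commands issued since that feedback was sampled.
    """

    def __init__(self, delay: int) -> None:
        # Only the last `delay` commands are not yet visible in the feedback.
        self.pending: Deque[float] = deque(maxlen=delay)

    def send_command(self, command: float) -> None:
        # Store an efference copy of the outgoing motor command.
        self.pending.append(command)

    def estimate(self, delayed_position: float) -> float:
        # Predicted current position = stale feedback + predicted consequences.
        return delayed_position + sum(self.pending)
```

With a two-step delay and commands 1, 2, 3, 4, 5, the true position after the last command is 15, while the delayed feedback still reports 6; adding the efference copies of the last two commands (4 + 5) recovers 15 without waiting for feedback.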

The ‘internal model deficit’ hypothesis seems difficult to falsify, given that most movement control requires the use of internal models [10–12]. Moreover, reports suggest that children with DCD have saccadic eye movement control similar to that of typically developing (TD) children [13–15]. However, the observations that saccadic eye movements are equivalent in DCD and TD children tend to be made for simple responses to visually-guided targets [13–15]. Notably, children with DCD do appear to have difficulties in more complex saccadic tasks, such as generating double-step saccades [15], predicting target location [7], and programming a coordinated (eye and hand) response rather than a simple eye movement alone [13]. The apparent conflict between these findings (i.e. normal vs abnormal saccade control) can be examined by directly comparing tasks that involve visually-guided reactive responses with actions that require higher-order cognitive control, such as planned motor responses. Given the similarities in saccade kinematics between DCD and TD groups in visually-guided tasks, existing deficits in DCD may be associated with the cognitive (attention) control mechanisms of anticipation and inhibition rather than with saccadic control per se. This hypothesis is in line with studies showing deficits in DCD during attentional shifts (measured as saccade latency and inhibition errors) and during volitional control of attention (for review see [5]). Attentional control mechanisms are critical for planning and responding to cued stimuli, and internally generated responses cannot be accurately formulated without this ability. These cognitive control deficits could explain why children with DCD fail at more complex tasks and struggle to acquire new skills.

The present study investigated saccadic eye movements and hand movements in children with DCD and TD controls during cued and non-cued conditions. The cued condition used in this study utilizes both inhibition of a response and anticipation of the next target position (pre-programmed response) and thus may provide a useful indication of the balance achieved between these mechanisms in children with DCD [16]. The experiment was designed in order to: (i) test the hypothesis that children with DCD have difficulties in the cognitive control (inhibition-anticipation) required when planning a motor response (cued conditions), and (ii) determine whether deficits are specific to the coordination of the eye and hand as reported by Wilmut and colleagues [13]. To achieve this, we examined group differences in planning and executing an eye movement alone (EO), hand movement alone (HO) or during the coordination of both actions (EH). Deficits in planned responses due to cognitive (attention) control mechanisms were determined by examining fixation ability, inhibition errors, saccade latency, and the accuracy of the planned response in cued compared to non-cued conditions. In addition, comparing the difference between single versus coordinated responses was undertaken to provide insight into how cognitive control deficits might be manifest within this population.

Eyes as visual sensors

The retina is a sheet of cells at the back of each of our eyes. Some of these cells, called photoreceptors, are sensitive to light. There are two main types: rods are sensitive to light-dark differences and cones are sensitive to color.

These photoreceptors are most densely packed together in a small area at the center of the retina called the fovea. It corresponds to the center of our vision, where resolution is at its highest. Detail progressively decreases for distances further from the center of our visual field – that is, in the periphery (hence “peripheral vision”).


As we look around our environment, we move our eyes. This enables us to orient the fovea toward what we’re most interested in within the vicinity. These voluntary eye movements are called saccades and are made about three times a second.

An Integrative Theory of Prefrontal Cortex Function

Earl K. Miller and Jonathan D. Cohen
Vol. 24, 2001


Abstract: The prefrontal cortex has long been suspected to play an important role in cognitive control, in the ability to orchestrate thought and action in accordance with internal goals. Its neural basis, however, has remained a mystery.

Functional anatomy of saccadic adaptation in humans

Positron emission tomography (PET) was used to investigate the neurophysiological substrate of human saccadic adaptation. Subjects made saccadic eye movements toward a visual target that was displaced during the course of the initial saccade, a time when visual perception is suppressed. In one condition, displacement was random from trial to trial, precluding any systematic modification of the initial saccade amplitude. In the second condition, the direction and magnitude of displacement were consistent, causing adaptive modification of the initial saccade amplitude. PET difference images reflecting metabolic changes attributable to the process of saccadic adaptation showed selective activation of the medioposterior cerebellar cortex. This localization is consistent with neurophysiological findings in monkeys and brain-lesioned humans.
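The contrast between the two conditions can be illustrated with a simple error-driven update of saccadic gain. This is a generic delta-rule sketch in Python, not the model used in the PET study, and the learning rate and displacement sizes are hypothetical. When the intrasaccadic displacement is consistent, the post-saccadic error has the same sign on every trial and the gain drifts towards the displaced target; when displacement is random with zero mean, the errors largely cancel and the gain stays near 1.

```python
import random
from typing import Iterable


def adapt_gain(displacements: Iterable[float], learning_rate: float = 0.05,
               gain: float = 1.0) -> float:
    """Trial-by-trial saccadic gain update (hypothetical delta rule).

    Each displacement d is a fraction of target amplitude: the target steps
    from amplitude 1 to 1 + d during the saccade. The eye lands at `gain`,
    and the post-saccadic error (1 + d - gain) nudges the gain.
    """
    for d in displacements:
        error = (1.0 + d) - gain
        gain += learning_rate * error
    return gain


consistent = adapt_gain([-0.3] * 200)  # backward step on every trial
rng = random.Random(0)
random_steps = adapt_gain(rng.uniform(-0.3, 0.3) for _ in range(200))
print(round(consistent, 3))  # 0.7
```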

Development of Gaze Control in Early Infancy

Gaze control involves eyes, head, and body movements and is guided by mainly three types of information: visual, vestibular, and proprioceptive. Appropriate gaze control is a basis for actions such as reaching, grasping, eating, and manipulation, all of which develop during the first year of life. The development of gaze control is about how young infants gain access to these different kinds of information, how they come to use them, and how they come to coordinate head and eyes to accomplish it. This control develops during the first few weeks of life. A major challenge for the gaze controlling system is how gaze is stabilized on a moving target to keep vision clear, including during self-motion or the compensation of other sudden movements. Furthermore, the tracking has to be timed relative to the object motion. This requires prediction, which is a part of smooth pursuit that emerges at around six weeks and is in full function at three months. The smooth eye and head movements must add up in time and space to the object motion. Then the vestibular and visual neural signals must be properly added. Catch-up saccades compensate when the smooth pursuit is insufficient. In other situations, saccades shift the gaze between objects or situations. Moreover, if a moving object temporarily disappears out of view, one or several saccades predictively recapture the object at the reappearance position (four months). The complex and fast development of gaze has inspired the design of robotic vision (iCub) through processes similar to human development, thus increasing the robot’s flexibility and learning abilities.
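The interplay between smooth pursuit and catch-up saccades can be sketched as a position-error loop. In the hypothetical Python simulation below, pursuit moves the eye at a fixed fraction (gain) of target velocity, the resulting position error accumulates, and a catch-up saccade resets it whenever it exceeds a threshold. The gain, threshold and time step are illustrative values, not measurements from infants.

```python
from typing import List


def simulate_pursuit(target_velocity: float, pursuit_gain: float,
                     threshold: float, dt: float, steps: int) -> List[int]:
    """Return the time steps at which catch-up saccades are triggered.

    Units are arbitrary but must be consistent (e.g. deg, deg/s, s).
    """
    error = 0.0  # target position minus eye position
    saccade_steps = []
    for step in range(1, steps + 1):
        # Target advances by v*dt; smooth pursuit covers only gain*v*dt.
        error += (1.0 - pursuit_gain) * target_velocity * dt
        if abs(error) > threshold:
            saccade_steps.append(step)
            error = 0.0  # catch-up saccade lands on the target
    return saccade_steps


# 10 deg/s target, pursuit gain 0.8, 0.45-deg threshold, 1 s at 10-ms steps.
print(simulate_pursuit(10.0, 0.8, 0.45, 0.01, 100))  # [23, 46, 69, 92]
```

Raising `pursuit_gain` towards 1 makes the error grow more slowly and catch-up saccades rarer, which mirrors the statement that catch-up saccades compensate when smooth pursuit is insufficient.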



Printed from Oxford Research Encyclopedias, Psychology.

Attentional mechanisms of saccadic eye movements in schizophrenia

Saccadic latency of schizophrenics (N = 15) and normal controls (N = 11) was measured to the left and right visual fields with three fixation conditions that differentially affect saccade latency. Fixation was offset either 1) prior to the target (gap condition), 2) simultaneous with the target onset (control condition), or 3) after target onset (overlap condition). Saccade latencies are typically reduced in the gap condition, which is attributed to the fixation offset acting to facilitate attentional disengagement or as a preparatory warning signal. Repeated measures Analysis of Variance (ANOVA) revealed that whereas the saccadic latencies of schizophrenics and normal controls do not differ for right visual field targets, the schizophrenics' latencies were prolonged to left visual field targets. This difference was most pronounced in the overlap condition, where normal controls produced faster saccades to the left visual field targets, whereas schizophrenics showed the opposite asymmetry. Because the overlap condition provides no early warning of the upcoming target, the lateralized finding suggests a deficit in the right hemisphere mechanisms responsible for sustained attention.
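The three fixation conditions differ only in when the fixation point disappears relative to target onset. The Python sketch below encodes that rule together with a simple left-minus-right latency asymmetry of the kind compared between groups; the function names and example latencies are hypothetical, not data from the study.

```python
from statistics import mean
from typing import Sequence


def fixation_condition(fixation_offset_ms: float, target_onset_ms: float) -> str:
    """Classify a trial by fixation offset time relative to target onset."""
    if fixation_offset_ms < target_onset_ms:
        return "gap"       # fixation removed before the target appears
    if fixation_offset_ms == target_onset_ms:
        return "control"   # fixation offset simultaneous with target onset
    return "overlap"       # fixation still visible when the target appears


def latency_asymmetry(left_ms: Sequence[float], right_ms: Sequence[float]) -> float:
    """Mean left-field minus mean right-field latency (positive = slower left)."""
    return mean(left_ms) - mean(right_ms)
```

For example, hypothetical left-field latencies of 250 and 270 ms against right-field latencies of 240 and 240 ms give an asymmetry of +20 ms, i.e. slower saccades to the left visual field.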
