How to measure attention?

There are many ways to measure attention. Some, mainly in psychology, are more qualitative and rely on questionnaires and their interpretation. Others are quantitative but focus on the participants' feedback (button press, click, etc.) when they see, hear or sense a stimulus.

Here we focus on quantitative techniques that provide fine-grained information about attentive responses. The attentive response can be measured either directly in the brain or indirectly through the participants' eye behavior. Only one of the techniques described here is based on the participants' active feedback: mouse tracking. It is included because mouse-tracking data are very close to eye-tracking data, and because it is an emerging approach of interest for the future: it needs less time and less money, and provides more data than classical eye-tracking.

Eye-tracking: an indirect cue about overt attention

The eye-tracker is probably the most widely used tool for measuring attention. The idea is to use a device able to precisely measure the eye gaze, which obviously only provides information about overt attention.

Eye-tracking technology has evolved considerably over time; the different technologies are described in [1]. One of the first techniques is EOG (Electro-OculoGraphy), which measures the electric potential of the skin around the eye. This potential gives the direction of the eye relative to the head, so a complete eye-tracking system requires either immobilizing the head or using a head tracker in addition to the EOG. More precise results can be obtained with special lenses instead of EOG, but this technique is more invasive and, like EOG, only provides the eye direction relative to the head, not the gaze point (for example, its intersection with a screen).
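Within a limited range of angles, the EOG voltage varies roughly linearly with the rotation of the eye, so it can be calibrated by asking the participant to fixate targets at known angles and fitting a line. The sketch below assumes a purely linear relation and uses made-up calibration values (the sensitivity of 16 µV per degree and the 5 µV offset are illustrative assumptions, not measured figures). As stated above, the resulting angle is relative to the head.

```python
import numpy as np

def fit_eog_calibration(voltages_uv, angles_deg):
    """Fit angle = gain * voltage + offset by least squares (linear EOG model, an assumption)."""
    gain, offset = np.polyfit(voltages_uv, angles_deg, deg=1)
    return gain, offset

def eog_to_angle(voltage_uv, gain, offset):
    """Convert an EOG voltage (µV) into a horizontal eye angle relative to the head (degrees)."""
    return gain * np.asarray(voltage_uv) + offset

# Hypothetical calibration: targets from -20 to 20 degrees, 16 µV per degree plus a 5 µV offset
targets_deg = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
measured_uv = 16.0 * targets_deg + 5.0

gain, offset = fit_eog_calibration(measured_uv, targets_deg)
print(eog_to_angle(165.0, gain, offset))  # approximately 10 degrees to the right of the head axis
```

In a real recording the offset drifts over time, which is one reason EOG systems are recalibrated frequently.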

Most current commercial and research solutions rely on video detection of the pupil and of the corneal reflection: an infrared source illuminates the eyes, a camera films them, and the relative position of the pupil centre and of the reflection of the infrared light on the cornea is used to compute the gaze direction.
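The pupil/corneal-reflection vector still has to be mapped to screen coordinates, which is what the calibration phase does: the participant fixates known targets and a mapping is fitted. A minimal sketch, assuming a second-order polynomial mapping fitted by least squares (a common textbook choice; commercial systems use their own, often model-based, methods). The function names and the nine-point synthetic data are hypothetical.

```python
import numpy as np

def design_matrix(v):
    """Second-order polynomial terms of the pupil-minus-reflection vectors v (N x 2)."""
    x, y = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_gaze_mapping(pcr_vectors, screen_points):
    """Least-squares fit of one polynomial per screen axis (returns a 6 x 2 coefficient matrix)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(pcr_vectors), screen_points, rcond=None)
    return coeffs

def estimate_gaze(pcr_vectors, coeffs):
    """Map pupil/corneal-reflection vectors to estimated screen coordinates (pixels)."""
    return design_matrix(pcr_vectors) @ coeffs

# Hypothetical 9-point calibration: a 3 x 3 grid of pupil-CR vectors (camera pixels)
gx, gy = np.meshgrid([-10.0, 0.0, 10.0], [-6.0, 0.0, 6.0])
vectors = np.column_stack([gx.ravel(), gy.ravel()])
# Synthetic "true" mapping, used only to generate the screen targets of the calibration
targets = np.column_stack([960 + 80 * vectors[:, 0] + 0.5 * vectors[:, 0] ** 2,
                           540 + 90 * vectors[:, 1]])

coeffs = fit_gaze_mapping(vectors, targets)
print(estimate_gaze(np.array([[5.0, 3.0]]), coeffs))  # close to [[1372.5, 810.0]]
```

Using the difference vector rather than the raw pupil position makes the estimate less sensitive to small head movements, since both the pupil and the reflection shift together when the head translates slightly.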

While the technique is usually the same, the embodiment of the eye-tracker can differ widely. The main eye-tracking manufacturers offer their systems in different forms [2][3][4].

Some eye-trackers are integrated into the screen used to present the stimuli. This setup has the advantage of a very short calibration, but it can only be used with its own screen.
Separate cameras need some additional calibration time, but the tests can be run on any screen, and even in a real scene by using a scene camera.
Eye-tracking glasses can be used in very ecological setups, even outdoors on real-life scenes. One issue with these systems is that data from several viewers are hard to aggregate, as each viewer sees a different scene; the aggregation requires a non-trivial registration of the scenes.
Cheap devices are beginning to appear, and fairly precise cameras are sold for less than 100 EUR [5], a fraction of the price of a professional eye-tracker. However, these eye-trackers come with minimal software, and it is often difficult to synchronize the stimuli with the recorded data. They are mostly used as real-time human-machine interaction devices. Nevertheless, open-source projects such as Ogama [6] make it possible to record data from low-cost eye-trackers.
Finally, webcam-based software is freely available [7]. Such software can provide good-quality data and can be used remotely with existing webcams [8].
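Whatever the device, the recorded gaze samples must be matched to what was on screen at each moment, which is where the synchronization difficulty of low-cost devices appears. Assuming that gaze samples and stimulus onsets are time-stamped on the same clock (in practice the hard part, and an assumption here), each sample can be assigned to the most recent stimulus onset with a binary search. The function name and the session timings are hypothetical.

```python
import numpy as np

def assign_samples_to_stimuli(sample_times, onset_times, durations):
    """Return, for each gaze sample, the index of the stimulus on screen (-1 if none).

    Assumes all times are in seconds on a common clock and onset_times is sorted.
    """
    sample_times = np.asarray(sample_times, dtype=float)
    onsets = np.asarray(onset_times, dtype=float)
    ends = onsets + np.asarray(durations, dtype=float)
    # Index of the last onset that is <= each sample time (-1 before the first onset)
    idx = np.searchsorted(onsets, sample_times, side="right") - 1
    in_range = idx >= 0
    safe_idx = np.clip(idx, 0, None)
    # The sample must also fall before the end of that stimulus (blank screens give -1)
    on_screen = in_range & (sample_times < ends[safe_idx])
    return np.where(on_screen, idx, -1)

# Hypothetical session: three images shown for 2 s each, separated by 0.5 s blanks
onsets = [0.0, 2.5, 5.0]
durations = [2.0, 2.0, 2.0]
print(assign_samples_to_stimuli([0.1, 2.2, 3.0, 6.9], onsets, durations).tolist())  # [0, -1, 1, 2]
```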

Mouse-tracking: the low-cost eye-tracking

While eye tracking is the most reliable ground truth in the study of overt visual attention, it requires a skilled operator, imposes constraints on the participant (the head may have to be immobilized, the calibration process may be long), and relies on a complex and costly system.

A much simpler way to acquire data about visual attention is mouse tracking. The mouse can be followed precisely in a web browser using a client-side language such as JavaScript, and its exact on-screen position can be captured either with home-made code or with existing libraries such as [9][10]. This technique may seem unreliable; however, everything depends on the context of the experiment.
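Browsers report mouse positions as events, and no event is fired while the pointer is still, so the recordings are irregularly sampled. Before comparing them with eye-tracking data, it can help to resample them at a fixed rate. The sketch assumes the events have already been exported as (timestamp in ms, x, y) triples, for example by one of the tools cited above; the 60 Hz target rate and the function name are arbitrary choices.

```python
import numpy as np

def resample_mouse(timestamps_ms, xs, ys, rate_hz=60.0):
    """Resample irregular mouse events onto a regular time grid using sample-and-hold.

    Between two events the pointer is assumed to stay at the last reported
    position, which is exact when the mouse is idle (no events are fired).
    """
    t = np.asarray(timestamps_ms, dtype=float)
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    step = 1000.0 / rate_hz
    grid = np.arange(t[0], t[-1], step)
    # Index of the last event at or before each grid time
    idx = np.searchsorted(t, grid, side="right") - 1
    return grid, xs[idx], ys[idx]

# Hypothetical events: a quick move, a 400 ms pause without events, then another move
t_ms = [0, 8, 16, 24, 424, 432]
x = [100, 120, 140, 160, 160, 200]
y = [300, 300, 305, 310, 310, 330]
grid, gx, gy = resample_mouse(t_ms, x, y)
print(gx[:3])
```

Sample-and-hold is chosen over linear interpolation because a long gap between events usually means that the pointer did not move, not that it moved smoothly between the two positions.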

In the first case, the experiment is hidden from the participant, who is unaware that the mouse motion is recorded. Here the mouse motion is not accurate enough: the hand does not automatically follow the eye gaze, even though the hand (and consequently the mouse) visibly tends to follow it. For example, the mouse is sometimes only used to scroll a page while the eyes are far from the mouse pointer.
In the second case, the participant is aware of the experiment and has a task to perform. This ranges from a simple “point the mouse where you look” instruction, as in [11] with the first use of mouse tracking for saliency evaluation, to more recent approaches such as SALICON [12], where multi-resolution interactive pointing that mimics the resolution of the fovea pushes people to point the mouse cursor where they look.
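The multi-resolution idea can be illustrated by showing a blurred version of the image everywhere except in a sharp region around the cursor, so that participants must move the pointer to see details. This sketch is not SALICON's implementation: it assumes a grayscale image, a single blur level and a Gaussian blending weight whose width (`fovea_sigma`, in pixels) is a hypothetical parameter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, cursor_xy, fovea_sigma=40.0, blur_sigma=6.0):
    """Blend a sharp and a blurred copy of a grayscale image around the cursor.

    The weight of the sharp copy is a Gaussian centred on the cursor, so
    detail is only visible near the pointer, loosely mimicking a fovea.
    """
    img = np.asarray(image, dtype=float)
    blurred = gaussian_filter(img, sigma=blur_sigma)
    rows, cols = np.indices(img.shape)
    cx, cy = cursor_xy
    dist2 = (cols - cx) ** 2 + (rows - cy) ** 2
    weight = np.exp(-dist2 / (2.0 * fovea_sigma ** 2))
    return weight * img + (1.0 - weight) * blurred

# Hypothetical 200 x 300 random-noise image with the cursor at (x=150, y=100)
rng = np.random.default_rng(0)
image = rng.random((200, 300))
shown = foveate(image, cursor_xy=(150, 100))
```

In an experiment, `foveate` would be re-applied (or approximated on the GPU) each time the pointer moves.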

In this second case, where participants know that their mouse motion is tracked, the results of mouse tracking are very close to those of eye-tracking, as shown by Egner and Scheier on their website [13]. Some unconscious eye movements may be missed, but is this really an issue?

The main advantages of mouse tracking are its low price and its complete transparency for the users, who simply move a mouse pointer.

However, mouse tracking has several drawbacks:

The initial position of the mouse pointer matters, as the observer may look for the pointer. Should it be located outside the image or in its centre? Ideally, the pointer should initially appear at a random position in the image to avoid a bias due to its starting position.
Mouse tracking only highlights areas that are consciously important for the observer. This drawback is mostly theoretical since, in practice, the goal is often to predict these consciously interesting regions.
The pointer hides the image region it overlaps, so the pointer is never exactly on the important areas but very close to them. This drawback can be partially compensated by the low-pass filtering applied after averaging over all observers. It is also possible to use a transparent pointer, as in [12].
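The averaging and low-pass filtering mentioned in the last item can be sketched as follows: each observer's pointer positions are accumulated into a normalised 2-D histogram, the histograms are averaged over observers, and the mean map is smoothed with a Gaussian filter. The kernel width (`sigma_px`) is an assumption; it could, for instance, be matched to about one degree of visual angle, which depends on screen size and viewing distance. Function names and data are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def observer_map(points_xy, shape):
    """Normalised 2-D histogram of one observer's pointer positions (x, y in pixels)."""
    h, w = shape
    counts = np.zeros(shape, dtype=float)
    pts = np.asarray(points_xy, dtype=int)
    inside = (pts[:, 0] >= 0) & (pts[:, 0] < w) & (pts[:, 1] >= 0) & (pts[:, 1] < h)
    np.add.at(counts, (pts[inside, 1], pts[inside, 0]), 1.0)  # rows = y, columns = x
    total = counts.sum()
    return counts / total if total > 0 else counts

def mouse_heatmap(all_observers, shape, sigma_px=25.0):
    """Average the per-observer maps (equal weight per observer), then low-pass filter."""
    mean_map = np.mean([observer_map(p, shape) for p in all_observers], axis=0)
    smoothed = gaussian_filter(mean_map, sigma=sigma_px)
    return smoothed / smoothed.max() if smoothed.max() > 0 else smoothed

# Hypothetical data: two observers hovering near (x=120, y=80) on a 240 x 320 image
obs_a = [(118, 78), (121, 82), (125, 80)]
obs_b = [(115, 85), (119, 79)]
heat = mouse_heatmap([obs_a, obs_b], shape=(240, 320))
print(np.unravel_index(heat.argmax(), heat.shape))  # (row, col) of the peak, near (80, 120)
```

Normalising each observer's map before averaging prevents observers who move the mouse a lot from dominating the result.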

EEG: Get the electric activity from the brain

The EEG (ElectroEncephaloGraphy) technique uses electrodes placed on the participant's scalp. These electrodes pick up the electrical activity of the brain, which is then amplified. One issue with this technique is that the skull and scalp attenuate these electrical signals.

While classical research setups have a large number of electrodes, with manufacturers such as [14][15], some low-cost commercial systems such as Emotiv [16] are more compact and easier to install and calibrate. The latter are easier to use but obviously less precise.

EEG studies have provided interesting results, such as the modulation of gamma-band activity during selective visual attention [17]. Other papers [18] report changes in the alpha band during shifts of visual spatial attention.
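Band-specific effects such as these are often quantified by estimating the power spectral density of the EEG and summing it over a frequency band. A minimal sketch using Welch's method from SciPy on a synthetic signal; the band limits (alpha 8 to 12 Hz, gamma 30 to 80 Hz) and the 256 Hz sampling rate are conventional but assumed values, and a real analysis would add artefact rejection and a comparison between attention conditions.

```python
import numpy as np
from scipy.signal import welch

ALPHA = (8.0, 12.0)   # Hz, assumed band limits
GAMMA = (30.0, 80.0)  # Hz, assumed band limits

def band_power(signal, fs, band):
    """Approximate power in band = (low, high) Hz by summing the Welch PSD over that band."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))  # 2 s windows -> 0.5 Hz bins
    df = freqs[1] - freqs[0]
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].sum() * df

fs = 256.0
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic "EEG": a 10 Hz alpha rhythm (20 µV) plus white noise (5 µV), in volts
eeg = 20e-6 * np.sin(2 * np.pi * 10 * t) + 5e-6 * rng.standard_normal(t.size)
alpha, gamma = band_power(eeg, fs, ALPHA), band_power(eeg, fs, GAMMA)
print(alpha / gamma)  # much larger than 1 for this alpha-dominated signal
```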

One very important cue about attention that can be measured with EEG is given by event-related potentials, such as the P300 and the mismatch negativity discussed below.

The work of Näätänen et al. [19] in 1978 on auditory attention provided evidence that the evoked potential shows an enhanced negative response when the subject is presented with rare stimuli compared to frequent ones. This negative component, called the mismatch negativity (MMN), has been observed in several experiments. The MMN occurs 100 to 200 ms after the stimulus, a latency well within the range of the pre-attentive phase.
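In an oddball paradigm of this kind, the EEG epochs following frequent (standard) and rare (deviant) stimuli are averaged separately, and the standard average is subtracted from the deviant average; the MMN shows up as a negative deflection of this difference wave, which can be summarised by its mean amplitude in the 100 to 200 ms window. The sketch assumes baseline-corrected epochs stored as arrays (trials x samples, starting at stimulus onset); the synthetic data, the 500 Hz sampling rate and the function name are hypothetical.

```python
import numpy as np

def mmn_mean_amplitude(standard_epochs, deviant_epochs, fs, window_s=(0.100, 0.200)):
    """Deviant-minus-standard difference wave and its mean amplitude in a latency window.

    Epochs are (n_trials, n_samples) arrays time-locked to stimulus onset (t = 0).
    A negative mean amplitude indicates a mismatch negativity in that window.
    """
    difference = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    times = np.arange(difference.size) / fs
    in_window = (times >= window_s[0]) & (times <= window_s[1])
    return difference, difference[in_window].mean()

# Synthetic oddball data: 400 standards, 50 deviants, 0.4 s epochs at 500 Hz (values in volts)
fs, n = 500.0, 200
times = np.arange(n) / fs
rng = np.random.default_rng(2)
standard = 2e-6 * rng.standard_normal((400, n))
# Deviants carry an extra negative bump peaking at 150 ms (amplitude -3 µV)
bump = -3e-6 * np.exp(-((times - 0.150) ** 2) / (2 * 0.025 ** 2))
deviant = 2e-6 * rng.standard_normal((50, n)) + bump
diff_wave, mmn = mmn_mean_amplitude(standard, deviant, fs)
print(f"mean amplitude 100-200 ms: {mmn * 1e6:.2f} µV")  # clearly negative
```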

Depending on the experiment, different auditory features were isolated: frequency [20], intensity [19][21][22], spatial origin [23], duration [24] and phonetic changes [25]. None of these features was salient in itself; saliency was induced by the rarity of each of them.

The MMN signal for visual attention was studied several times in conjunction with auditory attention [26][27][28], but only a few experiments used visual stimuli alone. In her thesis [29], Crottaz-Herbette conducted an experiment under the same conditions as for the auditory MMN to find out whether a visual MMN really exists. The result was clearly positive, with a marked increase in the negativity of the evoked potential for rare stimuli compared with frequent ones. The visual MMN occurs 120 to 200 ms after the stimulus. The 200 ms boundary remarkably matches the roughly 200 ms needed to initiate a first eye movement, and thus to trigger the serial “attentive” mechanism. As in the auditory MMN experiments, the subject had no task other than watching the stimuli; this MMN component is therefore pre-attentive, unconscious and automatic.

This study and others [30] also suggest the presence of an MMN response in the somatosensory modality (touch and other bodily sensations).

The MMN seems to be a universal component reflecting an unconscious, pre-attentive reaction of the brain. Any unknown (novel or rare) stimulus is highly salient, as reflected by these evoked potentials, because the brain tries to learn more about it. Rarity is the major engine of the attentional mechanism for visual, auditory and all the other signals acquired from our environment.
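One simple way to quantify rarity, in the spirit of the paragraph above, is the self-information of a feature value, -log2(p), where p is the frequency of that value in the stimulus history: the rarer the value, the higher the score. This is an illustrative model assumed here, not a model of the MMN itself; the function name and the tone sequence are hypothetical.

```python
import math
from collections import Counter

def rarity_scores(feature_values):
    """Self-information -log2(p) of each distinct value, p being its empirical frequency."""
    counts = Counter(feature_values)
    total = len(feature_values)
    return {value: -math.log2(count / total) for value, count in counts.items()}

# Hypothetical auditory oddball sequence: 90 standard tones at 1000 Hz, 10 deviants at 1200 Hz
sequence = [1000] * 90 + [1200] * 10
print(rarity_scores(sequence))  # the 1200 Hz deviant gets about 3.32 bits, the standard about 0.15
```

The same score can be computed for any feature (intensity, duration, colour, orientation), which matches the idea that saliency comes from the rarity of a feature rather than from the feature itself.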

Functional imaging: fMRI

MRI stands for Magnetic Resonance Imaging. The main idea behind this imaging system is that the human body is mostly made of water, and therefore contains many hydrogen atoms, whose nucleus is a single proton. These protons have a magnetic moment (spin) that is randomly oriented most of the time. The MRI scanner produces a very strong static magnetic field that aligns the magnetic moments of the protons in the patient's body. Radio-frequency (RF) pulses applied orthogonally to this main field tip the protons away from this alignment; as they relax back towards the main field, they emit RF waves. These waves are captured and used to build an image in which bright gray levels correspond to regions with more hydrogen protons (for example fat or water-rich tissue) and darker gray levels to regions with fewer protons, such as bones. In practice, the contrast also depends on how fast each tissue relaxes.
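Two textbook relations make this description more concrete. The RF pulses must be emitted at the Larmor frequency f = γ̄·B0, where γ̄ ≈ 42.58 MHz/T for hydrogen protons, and after a 90° pulse the longitudinal magnetisation recovers as Mz(t) = M0·(1 − exp(−t/T1)), where the relaxation time T1 depends on the tissue. The sketch evaluates both; the field strengths and the T1 value in the example are illustrative assumptions.

```python
import math

GAMMA_BAR_H_MHZ_PER_T = 42.577  # gyromagnetic ratio of the hydrogen proton divided by 2*pi

def larmor_frequency_mhz(b0_tesla):
    """RF frequency (MHz) at which hydrogen protons precess in a static field B0."""
    return GAMMA_BAR_H_MHZ_PER_T * b0_tesla

def longitudinal_recovery(t_ms, t1_ms, m0=1.0):
    """Mz(t) = M0 * (1 - exp(-t / T1)) after a 90-degree RF pulse."""
    return m0 * (1.0 - math.exp(-t_ms / t1_ms))

for b0 in (1.5, 3.0):
    print(f"{b0} T -> {larmor_frequency_mhz(b0):.1f} MHz")  # 63.9 MHz, then 127.7 MHz
# Illustrative T1 of 800 ms: about 63 % of the magnetisation has recovered after one T1
print(longitudinal_recovery(800.0, 800.0))
```

Because T1 differs between tissues, choosing when to measure after the pulse is one of the ways the scanner controls the gray-level contrast.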

MRI is primarily an anatomical imaging technique, but it has a functional version, fMRI, which most often relies on BOLD (Blood-Oxygen-Level-Dependent) contrast. No substance needs to be injected: haemoglobin itself acts as the contrast agent. Deoxygenated haemoglobin is paramagnetic and disturbs the local magnetic field, which lowers the MRI signal, whereas oxygenated haemoglobin has almost no such effect. When a brain region becomes active, its blood flow increases more than its oxygen consumption, so the proportion of oxygenated haemoglobin rises and the MRI signal in that region increases. fMRI is thus able to detect the active areas of the brain, which makes it a great tool for neuroscientists who want to visualize which brain areas are activated during an attention-related task.
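One common way to find the activated areas is to compare each voxel's time series with the response predicted for the task: the on/off timing of the task is convolved with a hemodynamic response function (HRF), which models the slow BOLD response peaking a few seconds after neural activity, and the fit is estimated with a general linear model. The sketch uses a double-gamma HRF with the shape parameters (6 and 16) and undershoot ratio (1/6) often used for the canonical HRF; treat these, the block design and the repetition time (TR) as assumptions, and note that a real analysis would also compute statistics and correct for multiple comparisons.

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr_s, duration_s=32.0):
    """Double-gamma HRF sampled every tr_s seconds, normalised to a peak of 1."""
    t = np.arange(0.0, duration_s, tr_s)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # response minus delayed undershoot
    return hrf / hrf.max()

def predicted_bold(task_on, tr_s):
    """Convolve a 0/1 task regressor (one value per scan) with the HRF."""
    hrf = double_gamma_hrf(tr_s)
    return np.convolve(task_on, hrf)[: len(task_on)]

def fit_glm(voxel_ts, regressor):
    """Least-squares fit of voxel = beta * regressor + intercept; returns beta."""
    X = np.column_stack([regressor, np.ones_like(regressor)])
    betas, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
    return betas[0]

# Hypothetical block design: 20 s attention task / 20 s rest, TR = 2 s, 5 cycles
tr = 2.0
task = np.tile(np.r_[np.ones(10), np.zeros(10)], 5)
regressor = predicted_bold(task, tr)
rng = np.random.default_rng(3)
active_voxel = 100 + 2.0 * regressor + rng.normal(0, 0.5, task.size)
silent_voxel = 100 + rng.normal(0, 0.5, task.size)
print(fit_glm(active_voxel, regressor), fit_glm(silent_voxel, regressor))  # about 2 vs about 0
```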

Functional imaging: MEG

MEG stands for MagnetoEncephaloGraphy. The idea is simple: while EEG detects the electrical field, which is heavily distorted when passing through the skull and skin, MEG detects the magnetic field induced by this electrical activity. The magnetic field has the advantage of being almost unaffected by the skin and the skull. While the idea is simple, in practice this magnetic field is extremely weak, which makes it very difficult to measure. This is why MEG imaging is relatively recent: it became practical thanks to technological advances, in particular SQUIDs (Superconducting Quantum Interference Devices). The magnetic field of the brain induces currents in superconducting circuits, which can be measured very precisely. Modern devices reach spatial resolutions of about 2 millimetres and temporal resolutions of a few milliseconds. Moreover, MEG images can be superimposed on anatomical MRI images, which helps to rapidly localize the main active areas. Finally, participants in MEG studies can be seated, a more natural position during exercises than lying down as in fMRI or PET scans.
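An order-of-magnitude calculation shows why the measurement is so hard. Modelling the active cortical patch as a single current dipole Q and ignoring the surrounding volume currents (a strong simplification), the field at distance r is at most B ≈ μ0·Q / (4π·r²). The dipole strength (10 nA·m), the 4 cm distance and the 50 µT geomagnetic field used below are assumed round values.

```python
MU0_OVER_4PI = 1e-7  # T*m/A

def dipole_field_tesla(q_am, r_m):
    """Maximum field of a current dipole Q (A*m) at distance r (m), primary current only."""
    return MU0_OVER_4PI * q_am / r_m**2

b_brain = dipole_field_tesla(10e-9, 0.04)  # assumed 10 nA*m dipole, 4 cm from the sensor
b_earth = 50e-6                            # approximate geomagnetic field (T)
print(f"{b_brain * 1e15:.0f} fT, {b_earth / b_brain:.1e} times weaker than the Earth's field")
# -> 625 fT, 8.0e+07 times weaker than the Earth's field
```

A field of a few hundred femtotesla, almost eight orders of magnitude below the Earth's field, explains why MEG needs both extremely sensitive sensors and magnetically shielded rooms.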

Functional imaging: PET Scan

Like fMRI, the PET scan (Positron Emission Tomography) is a functional imaging technique, and it also yields a higher signal where the brain is active. The main idea of the PET scan is that the substance injected into the patient emits positrons (anti-electrons, i.e. particles with the same mass as an electron but a positive charge). Each positron almost immediately meets an electron, and the two undergo a highly exoenergetic reaction called annihilation: the whole mass of the two particles is converted into energy, released as two gamma photons travelling in opposite directions, which are detected by the scanner sensors. The injected substance accumulates in the most active areas of the brain, so these areas exhibit a high number of annihilations. As with fMRI, the PET scan lets neuroscientists see which brain areas are activated when the patient performs an attention task.
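The energy of each annihilation photon follows directly from E = mc²: the rest energy of the electron (equal to that of the positron) is about 511 keV, so each of the two photons carries about 511 keV. Because both photons are detected in coincidence, the annihilation is known to lie on the line joining the two detectors that fired. The sketch computes the photon energy from CODATA constants and checks whether two hypothetical detections are coincident; the 5 ns coincidence window is an assumed value.

```python
ELECTRON_MASS_KG = 9.1093837015e-31
SPEED_OF_LIGHT_M_S = 299_792_458.0
ELEMENTARY_CHARGE_C = 1.602176634e-19  # also the number of joules in one electronvolt

def annihilation_photon_energy_kev():
    """Energy of each of the two photons: the rest energy m_e * c^2 of one electron (or positron)."""
    joules = ELECTRON_MASS_KG * SPEED_OF_LIGHT_M_S**2
    return joules / ELEMENTARY_CHARGE_C / 1e3

def is_coincidence(t1_ns, t2_ns, window_ns=5.0):
    """Treat two photon detections as one annihilation if they are close enough in time."""
    return abs(t1_ns - t2_ns) <= window_ns

print(f"{annihilation_photon_energy_kev():.1f} keV")  # 511.0 keV
print(is_coincidence(100.0, 102.5), is_coincidence(100.0, 140.0))  # True False
```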

Functional imaging and attention

Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have been extensively used to explore the functional neuroanatomy of cognitive functions, and MEG is also starting to be used in this field, as in [31]. The review in [32] covers 275 PET and fMRI studies of attention, perception, visual attention, memory, language, etc. Depending on the setup and task, a large variety of brain regions seem to be involved in attention and related functions (language, memory). These findings again support the idea that, at the brain level, there are several forms of attention and that their activity is distributed across almost the whole brain. Attention ranges from low-level to high-level processing, from reflexes to memory and emotions, and spans all the human senses.

References:

[1] Duchowski, Andrew. Eye tracking methodology: Theory and practice. Vol. 373. Springer Science & Business Media, 2007.

[2] Tobii eye tracking technology, http://www.tobii.com/

[3] SMI eye tracking technology, http://www.smivision.com

[4] SR-Research eye tracking technology, http://www.sr-research.com/

[5] Eyetribe low cost eye-trackers, https://theeyetribe.com/

[6] Open source recording from several eye trackers, http://www.ogama.net/

[7] Open source eye-tracking for webcams, http://sourceforge.net/projects/haytham/

[8] Xu, Pingmei, et al. “TurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking.” arXiv preprint arXiv:1504.06755 (2015).

[9] Heatmapjs, javascript API, http://www.patrick-wied.at/static/heatmapjs/

[10] Simple Mouse Tracker, http://smt.speedzinemedia.com/

[11] Mancas, Matei. “Relative influence of bottom-up and top-down attention.” Attention in cognitive systems. Springer Berlin Heidelberg, 2009. 212-226.

[12] Jiang, Ming, et al. “SALICON: Saliency in Context.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[13] MediaAnalyzer website, http://www.mediaanalyzer.net

[14] Cadwell EEG, http://www.cadwell.com/

[15] Natus EEG, http://www.natus.com

[16] Emotiv EEG, https://emotiv.com/

[17] Müller, Matthias M., Thomas Gruber, and Andreas Keil. “Modulation of induced gamma band activity in the human EEG by attention and visual information processing.” International Journal of Psychophysiology 38.3 (2000): 283-299.

[18] Sauseng, Paul, et al. “A shift of visual spatial attention is selectively associated with human EEG alpha activity.” European Journal of Neuroscience 22.11 (2005): 2917-2926.

[19] Näätänen, R., Gaillard, A.W.K., and Mäntysalo, S., “Early selective-attention effect on evoked potential reinterpreted”, Acta Psychologica, 42, 313-329, 1978

[20] Sams, M., Paavilainen, P., Alho, K., and Näätänen, R., “Auditory frequency discrimination and event-related potentials”, Electroencephalography and Clinical Neurophysiology, 62, 437-448, 1985

[21] Näätänen, R., and Picton, T., “The N1 wave of the human electric and magnetic response to sound: a review and analysis of the component structure”, Psychophysiology, 24, 375-425, 1987

[22] Paavilainen, P., Alho, K., Reinikainen, K., Sams, M., and Näätänen, R., “Right hemisphere dominance of different mismatch negativities”, Electroencephalography and Clinical Neurophysiology, 78, 466-479, 1991

[23] Paavilainen, P., Karlsson, M.L., Reinikainen, K., and Näätänen, R., “Mismatch Negativity to change in spatial location of an auditory stimulus”, Electroencephalography and Clinical Neurophysiology, 73, 129-141, 1989

[24] Paavilainen, P., Jiang, D., Lavikainen, J., and Näätänen, R., “Stimulus duration and the sensory memory trace: An event-related potential study”, Biological Psychology, 35 (2), 139-152, 1993

[25] Aaltonen, O., Niemi, P., Nyrke, T., and Tuhkahnen, J.M., “Event-related brain potentials and the perception of a phonetic continuum”, Biological psychology, 24, 197-207, 1987

[26] Neville, H.J., and Lawson, D., “Attention to central and peripheral visual space in a movement detection task: an event-related potential and behavioral study. I. Normal hearing adults”, Brain Research, 405, 253-267, 1987

[27] Czigler, I., and Csibra, G., “Event-related potentials in a visual discrimination task: Negative waves related to detection and attention”, Psychophysiology, 27 (6), 669-676, 1990

[28] Alho, K., Woods, D.L., Algazi, A., and Näätänen, R., “Intermodal selective attention. II. Effects of attentional load on processing of auditory and visual stimuli in central space”, Electroencephalography and Clinical Neurophysiology, 82, 356-368, 1992

[29] Crottaz-Herbette, S., “Attention spatiale auditive et visuelle chez des patients héminégligents et des sujets normaux : étude clinique, comportementale et électrophysiologique”, PhD Thesis, University of Geneva, Switzerland, 2001

[30] Desmedt, J.E., and Tomberg, C., “Mapping early somatosensory evoked potentials in selective attention: Critical evaluation of control conditions used for titrating by difference the cognitive P30, P40, P100 and N140”, Electroencephalography and Clinical Neurophysiology, 74, 321-346, 1989

[31] Downing, Paul, Jia Liu, and Nancy Kanwisher. “Testing cognitive models of visual attention with fMRI and MEG.” Neuropsychologia 39.12 (2001): 1329-1342.

[32] Cabeza, Roberto, and Lars Nyberg. “Imaging cognition II: An empirical review of 275 PET and fMRI studies.” Journal of Cognitive Neuroscience 12.1 (2000): 1-47.