How to study Human computer interaction

Dr. Naveen Bansal
Published Date: 25-10-2017
12 Eye Gaze Tracking as Input in Human–Computer Interaction

Imago animi vultus est, indices oculi. (The countenance is the portrait of the soul, and the eyes mark its intentions.)
Marcus Tullius Cicero

12.1 Principle of Operation

In the context of HCI applications, Eye Gaze Tracking (EGT) systems are primarily used to determine, in real time, the point-of-gaze (POG) of the subject’s eye on the computer screen. The POG is commonly derived through geometric considerations from the direction of the optical axis of the eye (OAE), i.e., the 3-dimensional vector indicating the direction in which the user is aiming his/her gaze. The OAE is estimated from real-time analysis of the image of the eye captured by an infrared video camera. The analysis is based on the location of two critical landmarks in the image: the center of the pupil and the center of the reflection of infrared light on the cornea of the eye (also known as the “glint”). The position of these two landmarks relative to each other, within the image of the eye captured by the infrared camera, is a function of the OAE and, combined with geometrical knowledge obtained in a calibration session prior to EGT use, can reveal the POG of the subject. Figure 12.1 illustrates the process for POG determination as a block diagram.

Figure 12.1: Process followed for determination of the point-of-gaze (POG).

The corneal reflection (CR) appears as a compact cluster of particularly high intensity (closer to white) pixels in the gray scale images captured by the infrared video camera. Therefore, it should be possible to isolate this cluster by setting an appropriately high threshold. Once the cluster is isolated, its approximate centroid can be estimated and used as the CR landmark. The determination of the pupil and its centroid may require a more involved approach.
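As a rough illustration of the thresholding step just described for the corneal reflection, the following minimal sketch isolates the brightest pixels of a grayscale frame and returns their centroid. The function name `glint_centroid`, the threshold value of 240, and the synthetic frame are assumptions made for this example; they are not taken from any particular EGT system.

```python
import numpy as np

def glint_centroid(gray, threshold=240):
    """Return the (row, col) centroid of pixels at or above `threshold`.

    `gray` is a 2-D array of 8-bit intensities. The threshold of 240 is an
    illustrative assumption; a real system would tune it to its camera.
    Returns None when no pixel is bright enough.
    """
    mask = gray >= threshold
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())

# Synthetic frame: mid-gray background with a 3x3 saturated "glint".
frame = np.full((120, 160), 60, dtype=np.uint8)
frame[50:53, 80:83] = 255

centroid = glint_centroid(frame)
print(centroid)  # (51.0, 81.0)
```

This sketch treats every above-threshold pixel as part of one cluster. A frame with several bright regions (for example, reflections on eyeglasses) would need a connected-component step before the centroid is taken.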
Ordinarily, the pupil will appear as a dark circle (or ellipse, due to the angle at which its image is captured) because it is, in fact, an opening into the eye, which does not have internal illumination. Under these circumstances, the pupil will appear as a circle or ellipse of particularly low intensity (closer to black) in the image captured by the camera. This so-called “dark pupil” could then be isolated by setting a low threshold. There are instances, however, in which the retina, in the back section of the inside of the eyeball, reflects light (particularly infrared), so that the pupil actually appears as a circle of high intensity, although typically not as high as the glint. This is the so-called “bright pupil,” which would require a high threshold for its isolation. This is, in fact, the reason for the “red eye” phenomenon that may appear in standard flash photography, when the powerful light of the flash reaches the fundus of the eye of the subject and its red components are reflected by the retina in such a way that the reflection travels outwardly through the pupil and reaches the camera. It should be noted, however, that this effect requires the source of light and the camera to be nearly collinear, which is achieved by placing the light source (e.g., the flash) very close to the image capturing device (i.e., camera), and both of them as close as possible to an axis that is perpendicular to the pupil and crosses it through its center, i.e., the optical axis of the eye (which defines the direction of gaze). Therefore, the bright pupil appears when infrared illumination is applied along that same axis (“on-axis illumination”). Figure 12.2 [Joshi and Barreto 07] shows an image where the bright pupil and the corneal reflection are apparent.

Ebisawa and Satoh devised an efficient approach to facilitate the isolation of the pupil in the images of the eye recorded by an infrared camera [Ebisawa and Satoh 93, Ebisawa 98].
The method consists of acquiring pairs of eye images comprising a “bright pupil” image and a “dark pupil” image, and calculating the “difference image” (bright minus dark), in which the pupil area will stand out from the rest of the frame more than in either one of the individual (bright or dark) images. Figure 12.3 shows examples of a “bright pupil” image and a “dark pupil” image [Joshi and Barreto 08]. It also shows the resulting “difference image,” where the pupil stands out clearly from the very dark background determined by all the pixels in the frame whose appearance is unchanged by the slight displacement of the infrared source used to capture the bright pupil (collinear with the camera axis) and the dark pupil (IR source displaced from the camera axis).

Figure 12.2: Example of eye image showing the bright pupil and the corneal reflection or “glint” [Joshi and Barreto 07].

Figure 12.3: Bright pupil and dark pupil images: (a) bright pupil image, (b) dark pupil image, (c) difference image. Also shown is the corresponding difference image [Joshi and Barreto 07].

The requirement of providing infrared illumination along the camera axis (“collinear”) in order to capture the bright pupil image properly is sometimes fulfilled by setting up a distribution of IR LEDs around the camera, for remote (desk) EGT systems. For head-worn systems, however, the dimensions are smaller and it may be necessary to use an infrared beam-splitter that will allow the passage of the IR illumination to the eye and reflect the image of the eye to a camera mounted perpendicularly, in a collinear manner. This arrangement is diagrammed in Figure 12.4 and an example of its practical implementation [Joshi and Barreto 08] is shown in Figure 12.5.

Figure 12.4: Diagram showing the use of a beam-splitter to achieve collinear IR illumination and recording [Joshi and Barreto 08].
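The bright-minus-dark difference image described in this section can be sketched in a few lines. The example below is only an illustration on synthetic frames: the function name `pupil_from_difference`, the threshold of 50, and the intensity values are assumptions, and a real system would also have to handle eye motion between the two exposures.

```python
import numpy as np

def pupil_from_difference(bright, dark, threshold=50):
    """Compute the difference image (bright minus dark) and the pupil centroid.

    Both inputs are 2-D uint8 frames of the same shape. The subtraction is
    done in a signed type so that negative values can be clipped to zero
    instead of wrapping around. `threshold` is an illustrative assumption.
    Returns (difference_image, centroid), with centroid None if nothing
    exceeds the threshold.
    """
    diff = bright.astype(np.int16) - dark.astype(np.int16)
    diff = np.clip(diff, 0, 255).astype(np.uint8)
    mask = diff > threshold
    if not mask.any():
        return diff, None
    rows, cols = np.nonzero(mask)
    return diff, (float(rows.mean()), float(cols.mean()))

# Synthetic pair: identical backgrounds, pupil disk dark in one frame
# and bright in the other.
yy, xx = np.mgrid[0:120, 0:160]
pupil = (yy - 60) ** 2 + (xx - 70) ** 2 <= 10 ** 2

dark = np.full((120, 160), 80, dtype=np.uint8)
dark[pupil] = 20
bright = dark.copy()
bright[pupil] = 200

diff, center = pupil_from_difference(bright, dark)
print(center)  # approximately (60.0, 70.0)
```

Because everything that looks the same in both frames cancels, the background of `diff` is zero and a single fixed threshold is enough to isolate the pupil in this synthetic case.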
In this case, the “ON-AXIS” IR source (for bright pupil) is placed right in front of the user’s eye, while the “OFF-AXIS” IR source (for dark pupil) is displaced about 1.5 cm toward the subject’s nose.

Figure 12.5: Practical implementation of the arrangement diagrammed in Figure 12.4 [Joshi and Barreto 08].

Once the centers of the pupil and the corneal reflection have been located in the IR camera image, the direction of gaze and, therefore, the point of gaze in a plane such as the computer screen in front of the user, can be determined as the intersection of the optical axis of the eye (in the direction of gaze) and the plane in question [Ware and Mikaelian 87, Hutchinson et al. 89, Jacob 91, Jacob 93, Lankford 00, Sibert and Jacob 00, Sibert et al. 01]. Evidently, calculation of this intersection requires knowledge of the location of the plane with respect to the eye, which is normally obtained through a calibration process executed prior to the use of the EGT system. The calibration process relies on instructing the user to direct his/her gaze to pre-established points on the screen, at locations that are known to the calibration program. Therefore, the computer will have those (usually 5 or 9) pairs of calibration target screen coordinates and the associated directions of gaze recorded while the subject was known to be looking at each target.

Guestrin and Eizenman have written a comprehensive paper in which they describe the general theory involved in remote gaze estimation from knowledge of the pupil center and corneal reflection(s) [Guestrin and Eizenman 06]. In their paper they address the determination of the POG for several cases, starting with the simplest scenario involving one camera and a single illuminator (light source), but also covering the cases with multiple light sources and/or multiple cameras.
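The geometric step mentioned earlier, in which the POG is taken as the intersection of the optical axis of the eye with the screen plane, reduces to a ray-plane intersection once the eye position, the gaze direction, and the plane are expressed in one coordinate frame. The sketch below assumes those quantities are already known (in practice they come from calibration); the function name `point_of_gaze`, the coordinate frame, and the numbers are hypothetical.

```python
import numpy as np

def point_of_gaze(eye_center, gaze_dir, plane_point, plane_normal):
    """Intersect the gaze ray with a plane.

    The ray is eye_center + t * gaze_dir for t >= 0. The plane is given by
    any point on it and its normal vector. `gaze_dir` does not need to be
    normalized. Returns the 3-D intersection point, or None if the ray is
    parallel to the plane or the plane lies behind the eye.
    """
    eye_center = np.asarray(eye_center, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    plane_point = np.asarray(plane_point, dtype=float)
    plane_normal = np.asarray(plane_normal, dtype=float)

    denom = np.dot(plane_normal, gaze_dir)
    if abs(denom) < 1e-12:
        return None  # gaze is parallel to the screen plane
    t = np.dot(plane_normal, plane_point - eye_center) / denom
    if t < 0:
        return None  # the plane is behind the eye
    return eye_center + t * gaze_dir

# Assumed frame: screen in the plane z = 0, eye 600 mm in front of it.
pog = point_of_gaze(
    eye_center=[0.0, 0.0, 600.0],
    gaze_dir=[0.1, -0.05, -1.0],
    plane_point=[0.0, 0.0, 0.0],
    plane_normal=[0.0, 0.0, 1.0],
)
print(pog)  # [ 60. -30.   0.]
```

Converting the resulting millimeter coordinates to screen pixels is a separate scaling step that depends on the display and is not shown here.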
They indicate that “Using one camera and one light source, the POG can be estimated only if the head is completely stationary. Using one camera and multiple light sources, the POG can be estimated with free head movements, following the completion of a multiple-point calibration procedure. When multiple cameras and multiple light sources are used, the POG can be estimated following a simple one-point calibration procedure” [Guestrin and Eizenman 06]. This observation highlights the interplay of calibration points and simultaneous images available during the EGT system operation, which determines the degrees of freedom involved in the determination of the POG. This is one of the reasons why contemporary EGT systems have escalated in complexity, commonly involving multiple cameras. In some instances the multiplicity of cameras serves the purpose of providing simultaneous data from both eyes of the subject (binocular EGTs), and in other cases they contribute additional information for purposes such as head tracking, which expands the robustness of the EGT operation with respect to translations and rotations of the head of the subject.

Side Note 12.1: Early EEG-Based “Eye Tracking” Approach

While most contemporary EGT systems used in human–computer interaction applications rely on the non-invasive and unobtrusive use of infrared video cameras and illuminators, early attempts to use the direction of gaze of a user to interact with a computer explored other interesting avenues. In particular, Erich E. Sutter proposed [Sutter 84, p. 92] an ingenious system that flickered different regions of the screen at different frequencies.
Therefore, depending on the specific region of the screen on which the user is fixating his/her gaze, the EEG recorded from the occipital area of the user’s scalp would have a predominant visual evoked potential (VEP) component at a frequency matching that of the flicker in the observed region. This revealing frequency can be determined, in real time, by means of discrete frequency analysis of the EEG signal, yielding an identification of the region where the user is placing his/her gaze, i.e., performing a low-resolution form of “eye gaze tracking” within the computer screen.

12.2 Post-Processing of POG Data: Fixation Identification

During the normal examination of a visual scene, the eye exhibits two types of movements: saccades and fixations. A saccade is a rapid, ballistic motion that moves the eye from one area of focus of the visual scene to another. Each saccade can take 30 to 120 ms, traversing a range of 1° to 40° of visual angle [Sibert and Jacob 00, Sibert et al. 01]. During a saccade, vision is suppressed. Upon completion of a saccade, the direction of gaze of the eye experiences a period of relative stability known as a fixation. A fixation allows the eye to focus light on the fovea, which is a portion at the center of the retina that contains a higher density of photoreceptors and is, therefore, capable of higher-resolution image sensing. The typical duration of a fixation is between 200 and 600 ms [Jacob 91, Sibert and Jacob 00, Sibert et al. 01]. However, even during a fixation, the eyes still perform small, jittery motions, usually less than 1° in size. These movements are necessary to prevent the loss of vision that could result from constant stimulation of the retinal receptors [Martinez-Conde et al. 04]. The short movements that take place during a fixation, called “fixational eye movements,” may be of different types [Martinez-Conde et al. 13].
Carpenter [Carpenter 88] identified three types: “tremors” (low amplitude, e.g., 0.004°, lasting a few milliseconds [Yarbus 67]), “slow drifts” (amplitudes from 0.03° to 0.08°, lasting up to a few seconds [Ratliff and Riggs 50]), and “microsaccades” (spike-like features with amplitudes of up to 0.2° and durations from 10 to 30 ms [Engbert 06]).

One of the immediate uses of eye gaze tracking technology as an input methodology for human–computer interaction is, of course, the control of the screen cursor with the eye gaze. For this particular purpose, it is necessary to separate the gaze fixations from the transient saccades. Since, as pointed out above, the gaze exhibits several sharp and short movements even within a fixation, this fixation identification process requires detailed consideration. Salvucci and Goldberg studied a number of popular approaches to the fixation identification task and proposed a taxonomy for these methods, in terms of how they utilize spatial and temporal information [Salvucci and Goldberg 00]. Their taxonomy classified algorithms by their use of EGT spatial information as velocity-based, dispersion-based, or area-based. Similarly, they distinguished between two types of algorithms according to their use of temporal characteristics of the EGT traces: duration-sensitive and locally adaptive. In their article, the authors review and evaluate 5 representative algorithms: the velocity-threshold identification (I-VT), the hidden Markov model identification (I-HMM), the dispersion-threshold identification (I-DT), the minimum spanning tree identification (I-MST), and the area of interest identification (I-AOI). These 5 representative algorithms were analyzed and compared in terms of their interpretation speed, accuracy, robustness, ease of implementation, and parameter setting. Overall, these authors found that both I-HMM and I-DT provide accurate and robust fixation identification.
I-MST also provided robust results, but was found to be the slowest of all the methods tested. I-VT was the simplest method to implement, but it may produce multiple indications for a single fixation (“blips”), particularly when the gaze velocity hovers near the threshold used in this method. Finally, I-AOI was found to exhibit the poorest performance among the methods evaluated and the authors discouraged its use.

Chin developed a practical, real-time fixation identification algorithm (for the purpose of screen cursor control) that analyzed POG data reported by a desktop, monocular EGT system 60 times per second [Chin 06]. This fixation identification algorithm utilized temporal and spatial criteria to determine whether or not a fixation had occurred. More specifically, the algorithm extracted a 100 ms moving window (temporal threshold) of consecutive POG data points (POGx, POGy), and calculated the standard deviation of the x- and y-coordinates of these points. If both standard deviations were less than the coordinate thresholds associated with 0.5° of visual angle (spatial threshold), then it was determined that the onset of a fixation had occurred, and the point used to represent the fixation, (Fx, Fy), was the centroid of the POG samples received during the 100 ms window analyzed. If it was determined that a fixation had not occurred, then the window was advanced by one data point and fixation identification was performed again. This algorithm is further illustrated by the flowchart shown in Figure 12.6.

Figure 12.6: EGT fixation identification algorithm flowchart [Chin 06].

12.3 Emerging Uses of EGT in HCI: Affective Sensing

Clearly, one of the key benefits derived from the operation of an EGT system is the knowledge, in real time, of the point-of-gaze of the eye(s) of the subject.
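Before moving on, the moving-window fixation test attributed to Chin in Section 12.2 can be sketched as follows. This is a minimal reading of the prose description, not the author's implementation. It assumes a 6-sample window (100 ms at 60 samples per second), the population standard deviation, and pixel thresholds of 15 that stand in for 0.5° of visual angle, since the true pixel value depends on screen geometry and viewing distance. The name `find_fixation_onset` is hypothetical.

```python
import numpy as np

def find_fixation_onset(pog, window=6, max_std=(15.0, 15.0)):
    """Slide a window over POG samples and report the first fixation onset.

    `pog` is a sequence of (x, y) screen coordinates. `window` is the number
    of samples in the temporal window (6 samples is about 100 ms at 60 Hz).
    `max_std` holds the x and y spatial thresholds in pixels; the defaults
    are assumed stand-ins for 0.5 degrees of visual angle.
    Returns (start_index, (Fx, Fy)) or None if no window qualifies.
    """
    pog = np.asarray(pog, dtype=float)
    for start in range(len(pog) - window + 1):
        win = pog[start:start + window]
        std_x, std_y = win.std(axis=0)  # population standard deviation
        if std_x < max_std[0] and std_y < max_std[1]:
            fx, fy = win.mean(axis=0)   # centroid of the window
            return start, (float(fx), float(fy))
        # Otherwise the loop advances the window by one data point.
    return None

# Synthetic trace: a saccade (large jumps) followed by a stable gaze.
samples = [
    (100, 100), (200, 150), (300, 200), (400, 250),
    (500, 300), (501, 299), (499, 301), (500, 300), (500, 301), (501, 300),
]
result = find_fixation_onset(samples)
print(result)  # onset at index 4, centroid near (500.17, 300.17)
```

In a live system the same test would run on a rolling buffer of the most recent samples, so the cursor is only moved once a window passes both thresholds.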
In the context of human–computer interaction, this POG movement knowledge has primarily been exploited to provide an alternative mechanism of control of the screen cursor in a graphical interface. GUIs have become almost universally prevalent as the WIMP (windows, icons, menus, pointer) paradigm has replaced text-based interfaces in almost all computer systems. This has, without a doubt, simplified access to computer applications, to the point at which young children can use many of the applications available on tablets and smartphones. Unfortunately, this has simultaneously presented new obstacles for the computer interaction of individuals with different abilities. In particular, individuals with motor disabilities may not have the capability to move and activate (“click”) the screen cursor in a GUI. Therefore, the pointing capabilities obtained with EGT systems have been used as a helpful substitute for the motor abilities normally exercised in moving the screen cursor.

In using an EGT system for cursor control, the user provides voluntary and overt information to the computer system, just as it is provided with the use of a mouse. However, embedded in the process of estimating the POG of the user’s eye is an additional set of sources of information that the computer could utilize. Studies have verified that information provided by most EGT systems can, in fact, be used advantageously by a computer system seeking to improve the user’s experience by taking into account his/her affective state. The sub-field of affective computing has been pursuing such enhancement in the interaction between computer systems and their users since the mid 1990s. In 1997, in her pioneering book Affective Computing, Rosalind Picard envisioned a new era of computing in which computers would be much more than mere data processors [Picard 97].
In the wake of the “artificial intelligence” revolution, which focused on enabling computers to “learn” (i.e., “machine learning”), the computers that Picard envisioned could not only learn and interact with humans intelligently, but they would also be able to interact with them at an affective level, defining “affective computing” as “computing which relates to, arises from, or deliberately influences emotions.” In analyzing the specific capabilities that a computer would require to fulfill Picard’s description, Hudlicka proposed that the following are the key processes involved [Hudlicka 03]:

1. Affect sensing and recognition (affective information from user to machine).
2. User affect modeling / machine affect modeling.
3. Machine affect expression (affective information from machine to user).

EGT systems provide the computer with data which can be used to tackle the first of these processes: affective sensing, which was identified by Picard as one of the key challenges that must be conquered to bring the full promise of affective computing concepts to fruition [Picard 03].

Most of us can attest to some clear, involuntary, and inconcealable changes in our bodies as reactions to strong emotional stimuli: Our hearts may change their pace during climactic moments in a sports event we witness; our hands may turn cold and sweaty when we are scared; we may feel “a rush of blood to the head” when we get into a strong argument. These are perceptions of the actual reconfiguration of our organism that takes place as a reaction to the psychological stimuli listed. Further, we are also capable of identifying an affective shift in another human being by sensing his/her physiological reconfiguration (e.g., seeing the redness in the face of an angry colleague).
Physiological affective sensing proposes that computers could, potentially, measure these physical quantities from their users and utilize those measurements to assess their affective states.

The reconfiguration experienced by a human subject as a reaction to psychological stimuli is controlled by the autonomic nervous system (ANS), which innervates many organs all over the body. The ANS can promote a state of restoration in the organism, or, if necessary, cause it to leave such a state, invoking physiologic modifications that are useful in responding to external demands. These changes in physiological variables as a response to manipulations of the psychological or behavioral conditions of the individual are the object of study of psychophysiology [Hugdahl 95].

The autonomic nervous system coordinates the cardiovascular, respiratory, digestive, urinary, and reproductive functions according to the interaction between a human being and his/her environment, without instructions or interference from the conscious mind [Martini et al. 01]. According to its structure and functionality, the ANS is studied as composed of two divisions: the sympathetic division and the parasympathetic division. The parasympathetic division stimulates visceral activity and promotes a state of “rest and repose” in the organism, conserving energy and fostering sedentary “housekeeping” activities, such as digestion [Martini et al. 01]. In contrast, the sympathetic division prepares the body for heightened levels of somatic activity that may be necessary to implement a reaction to stimuli that disrupt the “rest and repose” of the organism. When fully activated, this division produces a “flight or fight” response, which readies the body for a crisis that may require sudden, intense physical activity.
An increase in sympathetic activity generally stimulates tissue metabolism, increases alertness, and, overall, transforms the body into a new status, which will be better able to cope with a state of crisis. Parts of that redesign or transformation may become apparent to the subject and may be associated with measurable changes in physiological variables. The alternating increases in sympathetic and parasympathetic activation provide, in principle, a way to assess the affective shifts and states experienced by the subject. So, for example, sympathetic activation (in general terms) promotes the secretion of adrenaline and noradrenaline, inhibits bladder contraction, promotes the conversion of glycogen to glucose, inhibits peristalsis and secretion, dilates the bronchi in the lungs, accelerates the heartbeat, inhibits the flow of saliva, reduces the peripheral resistance of the circulatory system, and, importantly for this discussion, dilates the pupils of the eyes. In contrast, parasympathetic activation (in general terms) stimulates the release of bile, contracts the bladder, stimulates peristalsis and secretion, constricts the bronchi in the lungs, slows the heartbeat, and stimulates the flow of saliva.

Figure 12.7: Sympathetic (“STRESSED”) and parasympathetic (“RELAXED”) activations in the Circumplex Model of Affect. Adapted from [Russell 80].

Therefore, an approach to affective sensing based on physiological measures specifically targets the changes in observable variables introduced by sympathetic activation. Figure 12.7 shows the Circumplex Model of Affect [Russell 80] with the regions associated with sympathetic (“stressed”) and parasympathetic (“relaxed”) predominance overlaid.
This combined diagram provides a rationale for the proposal of detecting sympathetic activation to alert the computer that the user is evolving from the comfortable, “relaxed” states in the lower-right quadrant toward the “stressed” states in the upper-left quadrant.

It has been proposed that sympathetic activation detection can help the implementation of affective computing concepts in some key applications. This is the case, for example, in intelligent tutoring systems, where it is important to distinguish whether the student is relaxed/calm or is becoming stressed/frustrated.

The sympathetic and parasympathetic divisions of the ANS innervate organs all over the body. Therefore, there are, in principle, many physiological measurements that could be monitored for detection of sympathetic activation. However, in the context of practical human–computer interactions, and keeping in mind the goal of making the affective sensing implementation as unobtrusive as possible, the list of physiological variables that are viable is reduced. This has promoted interest in taking advantage of the impressive advances in current eye gaze tracking technology. Modern EGT systems provide robust and unobtrusive tracking for sustained eye monitoring and even re-acquisition of the subject’s eye without requiring the subject’s awareness or cooperation. This suggests that the monitoring of multiple eye parameters (all of them measurable through a high-speed EGT system) may be a viable way to detect sympathetic activation due to stress in the computer user. For a computer user, the sympathetic activation could arise because of (negative) stress or distress, frustration, etc. Therefore, in an affective computing system the detection of sympathetic activation could trigger the implementation of corrective measures to mitigate such distress.
Specifically, there are at least two physiological variables that are observable in the eyes of the computer user by means of a high-speed eye gaze tracking unit:

A. Pupil Diameter Variations: It is well documented that the pair of agonist/antagonist muscles that control the pupil diameter are under complementary control of the sympathetic and parasympathetic divisions of the ANS [Steinhauer et al. 04]. Similarly, variations of pupil diameter under changes of mental workload and in response to auditory affective stimulation have been analyzed. More specifically, the pupil diameter variations in response to affective stimulation of a computer user have been studied [Barreto et al. 07a, Barreto et al. 07b, Gao et al. 09, Gao et al. 10, Ren et al. 13, Ren et al. 14].

B. Characteristics of Eye Saccades and Fixational Eye Movements (transient deflections of the eye gaze): The muscles that control the eye movements (and therefore the direction of gaze) are also innervated by the third cranial nerve, which provides the parasympathetic control to the contractor pupillae muscle, responsible for the constriction of the pupil [Martini et al. 01]. Several characteristics of the eye saccades have been found to correlate well with varying levels of difficulty in mental tasks performed by experimental subjects. Typically the difficulty levels of the tasks were manipulated in terms of the associated mental workload. However, the intended increase of mental workload is likely to have been associated, concurrently, with increased stress.

The monitoring of pupil diameter for stress detection has a clear anatomical and physiological rationale. The diameter of this circular aperture is under the control of the ANS through two sets of muscles. The sympathetic ANS division, mediated by posterior hypothalamic nuclei, produces enlargement of the pupil by direct stimulation of the radial dilator muscles, which causes them to contract [Steinhauer et al. 04].
On the other hand, pupil size decrease is caused by excitation of the circular pupillary constriction muscles innervated by the parasympathetic fibers. The motor nucleus for these muscles is the Edinger–Westphal nucleus located in the midbrain. Sympathetic activation brings about pupillary dilation via two mechanisms: (i) an active component arising from activation of radial pupillary dilator muscles along sympathetic fibers and (ii) a passive component involving inhibition of the Edinger–Westphal nucleus [Bressloff and Wood 98].

The pupil has been observed to enlarge with increased difficulty in mental tasks where the workload is manipulated as the independent variable (frequently “N-back” tasks), but one should keep in mind that such tasks are also likely to produce an affective reaction (e.g., stress) in the subjects. Furthermore, there have been other experiments in which pupil diameter has been found to increase in response to stressor stimuli that do not cause a differential mental workload. Partala and Surakka used sounds from the International Affective Digitized Sounds (IADS) collection [Bradley and Lang 99] to provide auditory affective stimulation to 30 subjects, and found that the pupil size variation responded to affectively charged sounds [Partala and Surakka 03]. Those observations were confirmed in the experiments by Gao and colleagues, which also used affective stimulation that did not explicitly increase the mental workload [Gao et al. 09, Gao et al. 10]. They used a computer-based version of the “Stroop Color-Word Interference Test,” originally devised by J.R. Stroop [Stroop 35]. The efficacy of this stress elicitation method has been previously established by Renaud and Blondin [Renaud and Blondin 97] and its psychological, physiological, and biochemical effects in subjects have been validated by Tulen’s group [Tulen et al. 89].
They found that the Stroop test induced increases in plasma and urinary adrenaline, heart rate, respiration rate, electrodermal activity, electromyography, feelings of anxiety, and decreased finger pulse amplitude. All these are consistent with sympathetic activation associated with mental stress. Further, Insulander and Johlin-Dannfelt verified the electrophysiological effects of mental stress induced by a Stroop test implementation very similar to Gao’s and found that “Mental stress with an emotional component, as elicited by the Stroop conflict word test, had pronounced effects on the electrophysiological properties of the heart, most markedly in the sinus and AV nodes and to a lesser degree in the ventricle” [Insulander and Johlin-Dannfelt 03].

The Stroop Color-Word Interference Test [Stroop 35], in its classical version, requires that the font color of a written word designating a color name be stated verbally. Gao and colleagues created an interactive computer version that requires the subject to click on a screen button with the correct answer [Gao et al. 09, Gao et al. 10]. If the subject cannot make a decision within 3 seconds, the screen automatically changes to the next trial, which intensifies the stress elicitation [Renaud and Blondin 97]. In each Stroop trial, a word presented to the subject designates a color that may (“Congruent”) or may not (“Incongruent”) match the font color. Congruent trials are not expected to elicit an affective response. In contrast, the internal contradiction induced in the subject during incongruent trials produces an affective response (sympathetic activation) associated with a stressor stimulus. Figure 12.8 shows the timeline followed during each complete experimental session, for each participating subject in Gao’s experiments. There are 3 consecutive sections.

Figure 12.8: Timeline of the stress elicitation experimental protocol [Gao et al. 10].
In each section, there are four segments:

1) ‘IS’: an introductory segment to establish an appropriate initial level for the subject’s psychological state, according to the law of initial values (LIV) [Stern et al. 01];
2) ‘C’: a congruent segment, comprising 45 Stroop congruent word presentations (font color matches the meaning of the word), which are not expected to elicit significant stress in the subject;
3) ‘IC’: an incongruent segment of the Stroop test (font color and the meaning of the 30 words presented differ), to induce stress in the subject;
4) ‘RS’: a resting segment to act as a buffer between the sections of the protocol.

Figure 12.9: Pupil diameter variations (above) and mean values (below) originally observed during the protocol [Barreto et al. 07b].

Previous work on pupil monitoring for affective sensing has sought to distinguish between the congruent (relaxation) and incongruent (stress) experimental segments. The initial approach focused on the mean pupil diameter differences between congruent and incongruent experimental segments. An example of these mean values is shown in Figure 12.9. Originally, a single feature from each segment of pupil diameter (PD) data, which was the normalized mean PD value in each segment, was extracted and used in conjunction with 10 other normalized features obtained from galvanic skin response (GSR), blood volume pulse (BVP), and skin temperature (ST) signals recorded concurrently. The classification of experimental segments on the basis of those multiple features revealed:

A. Segment (congruent vs. incongruent) classification by a Support Vector Machine achieved 90.10% average accuracy using all 11 features, but decreased to 61.45% if the information from the pupil diameter (PD mean feature) was removed (not provided to the classifier) [Zhai and Barreto 06].
B. The receiver operating characteristic (ROC) curve for the pupil diameter (PD) mean feature had an area AUROC_PDmean = 0.9647, which was much higher than that of even the second-best feature, obtained from the GSR signal (AUROC_GSRmean = 0.6519) [Barreto et al. 07a].
More recent studies by Ren et al. have confirmed these observations [Ren et al. 13, Ren et al. 14]. However, it has also been realized that the Stroop elicitation paradigm may present some limitations, as the low intensity of the stimulation may not actually trigger a stress response in some subjects. Further, the artificial (uncommon) environment in which the subject is asked to take the Stroop test seems to raise the baseline of sympathetic activation in some of the subjects. This may result in the collection of physiological data in which the differential affective response of some subjects is actually very small.
Early studies in pupil diameter monitoring for stress detection took place in controlled conditions where the amount of illumination impinging on the retina of the experimental subject was kept approximately constant. In uncontrolled (real) scenarios, the pupil diameter may also decrease due to increased illumination, according to the pupillary light reflex (PLR). To address this, Gao et al. [Gao et al. 09] used an adaptive signal processing approach to account for the effect of illumination changes on pupil size variations, so that the processed pupil diameter signal would reflect primarily the changes in pupil diameter due to affective responses (designated the “pupillary affective response,” or PAR). Figure 12.10 shows a block diagram indicating how an adaptive transversal filter (ATF), adapted using the H-infinity algorithm, was set up for this task [Gao et al. 09].
This architecture, which follows the general structure of an adaptive interference canceller (AIC) [Widrow and Stearns 85], requires a “reference noise” signal that originates from the same source as the noise polluting the signal of interest (but is not necessarily identical to it), in order to remove that noise. In this case the signal of interest is the component in the measured pupil diameter variations that is due to the pupillary affective response, while the noise is the component in the measured pupil diameter variations that is due to the pupillary light reflex. Therefore, Gao and colleagues placed a photo sensor on the forehead of the subject and used the corresponding signal as the “noise reference.”
The left panel of Figure 12.11 shows one example set of the signals processed by the adaptive interference canceller. The second plot is the illumination intensity recorded by the photo sensor. The two increases in illumination observed in segments IC2 and C3 were introduced deliberately (by turning on a desk lamp) to test the robustness of the process to illumination changes. The output from the adaptive interference canceller, labeled MPD (bottom-left plot), is further processed by half-wave rectification and a sliding median filter, to yield the bottom trace of the right panel, the processed modified pupil diameter (PMPD), which was used as an indication of sympathetic activation, i.e., stress. It should be noted that the PMPD signal rises above the zero level almost exclusively during the Stroop incongruent (stress) segments (after the initial adaptation of the AIC), as expected. Feeding the mean level of PMPD during each segment, along with the other 9 features obtained from GSR and BVP, into a support vector machine achieved an average accuracy of 76.67% in the classification of segments (“stress” vs. “relax”).
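The processing chain described above (interference cancelling against a light reference, then half-wave rectification and a sliding median) can be illustrated with a minimal sketch. It is not the implementation of Gao and colleagues: the weights are adapted here with the simpler normalized LMS (NLMS) rule in place of the H-infinity algorithm. The signals are synthetic, and the function names, filter length, step size, window length, and light-reflex coefficients are all assumptions chosen only for illustration.

```python
import numpy as np


def nlms_canceller(primary, reference, n_taps=4, mu=0.1, eps=1e-8):
    """Adaptive interference canceller built on a transversal (FIR) filter.

    Uses the NLMS weight update as a simple stand-in for the H-infinity
    adaptation mentioned in the text. Returns the error signal, i.e. the
    primary input minus the filter's estimate of the interference.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(n_taps)
    output = np.zeros_like(primary)
    for n in range(len(primary)):
        # Tap vector: the newest n_taps reference samples, newest first,
        # zero-padded while fewer than n_taps samples are available.
        taps = np.zeros(n_taps)
        k = min(n + 1, n_taps)
        taps[:k] = reference[n::-1][:k]
        error = primary[n] - weights @ taps
        weights += mu * error * taps / (eps + taps @ taps)
        output[n] = error
    return output


def half_wave_rectify(signal):
    """Keep positive excursions (dilation) and set negative values to zero."""
    return np.maximum(np.asarray(signal, dtype=float), 0.0)


def sliding_median(signal, window=21):
    """Centered sliding median; `window` should be odd. Edges are padded."""
    signal = np.asarray(signal, dtype=float)
    half = window // 2
    padded = np.pad(signal, half, mode="edge")
    return np.array([np.median(padded[i:i + window])
                     for i in range(len(signal))])


def synthetic_demo(seed=0, n_samples=1000):
    """Build synthetic signals (all values hypothetical) and process them."""
    rng = np.random.default_rng(seed)
    # Photo sensor reading with its mean removed (illumination fluctuation).
    light = rng.standard_normal(n_samples)
    # Assumed light reflex: more light gives a smaller pupil, with a short lag.
    delayed = np.concatenate(([0.0], light[:-1]))
    plr = -0.6 * light - 0.3 * delayed
    # Assumed affective response: dilation during one "stress" interval.
    stress = np.zeros(n_samples, dtype=bool)
    stress[500:700] = True
    par = np.where(stress, 0.5, 0.0)
    measured = par + plr + 0.01 * rng.standard_normal(n_samples)
    mpd = nlms_canceller(measured, light)             # light effect removed
    pmpd = sliding_median(half_wave_rectify(mpd))     # processed signal
    return pmpd, stress


pmpd, stress = synthetic_demo()
print("mean PMPD, stress interval: ", pmpd[520:680].mean())
print("mean PMPD, relaxed interval:", pmpd[300:480].mean())
```

In this sketch the light reference is zero-mean, so the filter cannot model the sustained dilation and that component remains in the canceller output. With real recordings, the step size and the number of taps would have to be tuned so that the filter follows the light reflex without also tracking the slower affective response.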
In this case, also, the classification accuracy dropped to 54.44% if the PMPDmean feature was not used, while a classifier using this single normalized feature, “PMPDmean,” resulted in an average accuracy of 77.78% [Gao et al. 10]. Furthermore, the ROC curve obtained for the normalized PMPDmean had, again, a much larger area, AUROC_PMPDmean = 0.9331, than the second-best ROC curve, obtained from the mean GSR value in each segment, AUROC_GSRmean = 0.6780 [Gao et al. 09]. These observations confirm the strong potential that the pupil diameter signal captured by an EGT instrument seems to have for affective sensing of a computer user.
Figure 12.10: Adaptive interference canceller (AIC) architecture used to minimize the negative impact of the pupillary light reflex (PLR) on affective assessment by pupil diameter monitoring [Gao et al. 10].
Figure 12.11: Processing of the original pupil diameter signal (top-left) by the adaptive interference canceller and subsequent sliding-window median analysis, resulting in the PMPD signal (bottom-right), which indicates the emergence of sympathetic activation (“stress”) in the incongruent Stroop segments (IC1, IC2 and IC3) [Gao et al. 09].
It was previously pointed out that, even while humans attempt to pay attention to a stationary target (i.e., during a “fixation”), our eyes must continue to perform small movements, shifting the gaze around the target, so that the image projected onto the retina is never constant. These compensatory movements that occur during a “fixation” are called “fixational eye movements” (FEMs). FEMs have traditionally been classified as “tremors” [Yarbus 67], “slow drifts” [Ratliff and Riggs 50], and “microsaccades” [Engbert 06].
Recently, however, Abadi and Gowen differentiated a fourth type of fixational eye movement, which they called “saccadic intrusions” (SIs), with these characteristics: “conjugate, horizontal saccadic movements which tend to be 3-4 times larger than the physiological microsaccades and take the form of an initial fast eye movement away from the desired position, followed, after a variable duration, by either a return saccade or a drift” [Abadi and Gowen 04].
Eye movements are affected by the psycho-physiological state of the subject. For example, Ishii and colleagues have attempted to evaluate mental workload by analyzing pursuit eye movements [Ishii et al. 13]. Most importantly, tremors, slow drifts and microsaccades are involuntary [Carpenter 88, Martinez-Conde et al. 04], and saccadic intrusions are also involuntary [Abadi and Gowen 04]. That is, we do not perform them or regulate their characteristics (timing, speed, amplitude, etc.) under conscious control. Their properties at any given time, therefore, depend on the autonomic nervous system (ANS) of the subject, as the pupil diameter does.
The properties of (regular) saccades are also defined, in part, by involuntary processes. “The decision to initiate a saccade is under voluntary control, however, once the saccade is initiated the speed that the eye moves is completely involuntary, with no central CNS influence” [Connel and Baxeddale 11]. These authors found that alteration of the ANS by an increasing dose of buprenorphine in 16 volunteers induced a (dose-dependent) decrease in saccadic velocity, peaking around two hours post dose.
Therefore, both (regular) saccades and saccadic intrusions (SIs) have variations in their characteristics that depend on the psycho-physiological state of the subject, as pupil diameter variations do.
Tokuda proposed the analysis of SIs as an advantageous alternative to the measurement of pupil diameter variations to evaluate mental workload. In their studies they used an (auditory) N-back task to manipulate mental workload and found that both pupil diameter and the occurrence of SIs correlated well with the degree of difficulty of the N-back task. It should be noted that, while the more difficult (larger N) N-back task was meant to manipulate the mental workload, large-N tasks are also likely to induce a stressful response in the subject, which could also be the cause of the increase in pupil diameter and in the occurrence of SIs.
Similarly, a study used eye tracking (among other measures) to evaluate the psycho-physiological response of anesthesiologists during the simulation of a critical incident endangering the patient (progressive anaphylaxis, reaching severe anaphylaxis) [Schulz et al. 11]. While this study was meant to evaluate workload, its independent variable (inclusion or exclusion of the simulated critical incident) clearly also altered the stress level of the anesthesiologists participating as subjects. These researchers confirmed an increase in pupil diameter in the sessions that included the simulated critical incident, and saw the pupil size vary in correlation with the progressive severity of the critical incident, so that “the simulator conditions explained 92.6% of the variance in pupil diameter.” It was also found that “assessment of duration of fixation (time between regular saccades) as a function of simulator state by mixed models revealed a highly significant association (p < 0.001). The independent variable ‘simulator state’ explained 65% of the variance during incident scenarios” [Schulz et al. 11].
These findings lend additional credibility to the emerging use of EGT systems to obtain implicit information about the affective state of a computer user, defining a new avenue for research into the additional potential of EGT technology as an input mechanism in human–computer interaction.
Further Reading
To appreciate the evolution of EGT systems for HCI, it is interesting to read one of Robert Jacob’s earliest reports on EGT [Jacob 90]. In 2007, Andrew Duchowski published a comprehensive book on EGT methodology [Duchowski 07] that can be consulted for more detailed information on multiple aspects of these instruments. Finally, an interesting comparison of the origins (1990) of these technologies in HCI with the current state of the art can be obtained by reading Heiko Drewes’ contribution [Drewes 15] to the book on Interactive Displays edited by Bhowmik.
Exercises
1. Create a survey report of the EGT systems that are currently commercially available. Create a table to compare the different attributes found on those commercial EGT systems, including prices.
2. According to the study by Guestrin and Eizenman (2006), what is the minimum set of requirements for estimating the point of gaze, if the head of the subject cannot be expected to be perfectly stationary?
   (a) Using only one camera and one light source.
   (b) Using one camera and multiple light sources, after a multi-point calibration.
   (c) Using multiple cameras and multiple light sources, after a single-point calibration.
   (d) Using multiple cameras and multiple light sources, after a multi-point calibration.
3. To obtain a “bright pupil” image it is necessary that:
   (a) The IR illuminator and the IR camera are placed on the same axis (collinear).
   (b) The IR illuminator and the IR camera are placed on perpendicular axes.
   (c) The IR illuminator is displaced about 1.5 cm from the camera axis.
   (d) The “on-axis” illuminator and the “off-axis” illuminator are turned on alternately.
4. The rationale for monitoring the pupil diameter readings provided by an EGT system to detect the emergence of stress in a computer user is that:
   (a) Stress is associated with activation of the parasympathetic division of the ANS, which activates the muscles that make the pupil dilate.
   (b) Stress is associated with activation of the parasympathetic division of the ANS, which activates the muscles that make the pupil reduce its size.
   (c) Stress is associated with activation of the sympathetic division of the ANS, which activates the muscles that make the pupil reduce its size.
   (d) Stress is associated with activation of the sympathetic division of the ANS, which activates the muscles that make the pupil dilate.
