The goal of gaze detection is to locate the position on a monitor where a user is looking. Previous research has used a single wide-view camera, which can capture the user's entire face. With such a camera, however, the image resolution is too low to detect the fine movements of the user's eyes accurately. We therefore propose a new gaze detection system with dual cameras (a wide-view and a narrow-view camera). To locate the user's eye position accurately, the narrow-view camera provides auto focusing, panning, and tilting based on the 3D eye positions detected by the wide-view camera. In addition, we use IR-LED illuminators for both the wide-view and narrow-view cameras, which simplify the detection of facial features and of pupil and iris positions. To overcome the problem of specular reflections on glasses caused by the illuminators, we use dual IR-LED illuminators for each camera and detect the eye position that is not hidden by the reflection. Experimental results show that the gaze detection error between the computed positions and the real ones is about 2.89 cm RMS.
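The abstract does not give the panning/tilting geometry; the following is a minimal sketch, assuming the narrow-view camera is steered by simple yaw (pan) and pitch (tilt) angles toward the 3D eye position reported in the wide-view camera's coordinate system. The function name and axis conventions are hypothetical, not the paper's.

```python
import numpy as np

def pan_tilt_to_eye(eye_xyz, cam_origin=np.zeros(3)):
    """Yaw (pan) and pitch (tilt) angles, in degrees, that aim a
    narrow-view camera at a 3D eye position given in the wide-view
    camera's coordinate system (hypothetical geometry)."""
    dx, dy, dz = np.asarray(eye_xyz, float) - np.asarray(cam_origin, float)
    pan = np.degrees(np.arctan2(dx, dz))                 # rotation about the vertical axis
    tilt = np.degrees(np.arctan2(dy, np.hypot(dx, dz)))  # elevation toward the eye
    return pan, tilt

# Example: eye detected 8 cm right, 5 cm up, 60 cm in front of the camera
print(pan_tilt_to_eye([8.0, 5.0, 60.0]))  # -> (~7.6 deg pan, ~4.7 deg tilt)
```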
Yuusuke NAKANO Koji TSUKADA Saeko TAKAGI Kei IWASAKI Fujiichi YOSHIMOTO
The importance of informal communication on the Internet has been increasing in recent years, and several systems for informal communication have been developed. These systems, however, require a dedicated server and/or specialized 3D content. In this paper, we propose a system named InCom for informal communication in a 3D virtual environment. The browsers that make up InCom generate 3D virtual worlds from existing 2D HTML documents and communicate with each other in a peer-to-peer manner. Avatars make gaze awareness natural, and our results show that users shared interests through gaze awareness.
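The abstract does not detail how a 2D HTML document becomes a 3D world; purely to illustrate the idea, the sketch below lays out the block-level elements of a page as flat panels along a virtual wall. The layout rule and all names here are assumptions, not InCom's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Panel:
    text: str       # content taken from one HTML block
    x: float        # position along the virtual wall (meters)
    y: float        # height above the floor (meters)
    z: float = 0.0  # wall plane

def layout_page(blocks, spacing=1.5, eye_height=1.6):
    """Place each block-level element of a page side by side on a wall."""
    return [Panel(text=b, x=i * spacing, y=eye_height)
            for i, b in enumerate(blocks)]

# Example: three blocks extracted from an HTML page
for p in layout_page(["intro", "news", "links"]):
    print(p)
```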
Cheng-Chin CHIANG Chi-Lun HUANG
This paper presents the design of an automatic surveillance system that monitors dangerous non-frontal gazes of a car driver. To track the driver's eyes, we propose a novel filter that locates the "between-eye," the midpoint between the two eyes, to speed up eye localization. We also propose a specially designed criterion function, named the mean ratio function, to locate the eye positions accurately. To analyze the driver's gaze, a multilayer perceptron neural network is trained to judge whether the driver has lost the proper gaze. By combining the network output with well-designed alarm-issuing rules, the system performs the monitoring task for both a single dedicated driver and multiple different drivers, with satisfactory performance in our experiments.
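The paper's mean ratio function and network architecture are not specified in this abstract; as a minimal sketch of the classification step only, a multilayer perceptron can be trained on eye-region feature vectors to separate frontal from non-frontal gaze, with a simple consecutive-frame alarm rule on top. The feature dimensions, toy labels, and rule below are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: each row is a feature vector extracted
# around the located eyes (e.g. pupil offsets, inter-eye geometry);
# label 1 = frontal gaze, 0 = dangerous non-frontal gaze.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy separable labels

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)

# An alarm rule might require several consecutive non-frontal frames:
frames = clf.predict(rng.normal(size=(10, 8)))
alarm = any(frames[i:i + 3].sum() == 0 for i in range(len(frames) - 2))
print("alarm:", alarm)
```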
Thitiporn LERTRUSDACHAKUL Akinori TAGUCHI Terumasa AOKI Hiroshi YASUDA
This paper addresses issues in the development of teleconferencing that supports collaboration, focusing on realistic sensation. It argues that gaze communication is an important mechanism for enabling a visual channel and social presence in human-human communication. We propose a new approach to establishing multiple eye contact and community awareness in multiparty videoconferencing (VC). Participants become aware of being recognized from any remote site while talking with each other. Community awareness means the ability to perceive group communication in the videoconference: a participant can recognize who is talking with whom and identify communicative groups within a conference. An intelligent image arrangement based on a unique camera position is built and simulated. The systematic placement of images supports gaze communication by exploiting the relationship between gaze direction and image position. The experimental results show that the proposed approach significantly improves interpersonal communication compared with a conventional VC system.
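As a toy illustration of how gaze direction and on-screen image position can together reveal who is talking with whom (the geometry, angle convention, and threshold below are assumptions, not the paper's arrangement):

```python
import numpy as np

def gaze_target(gaze_angle_deg, image_angles_deg, names, tol=10.0):
    """Return the participant whose on-screen image lies closest to a
    viewer's horizontal gaze angle, or None if nothing is within tol."""
    diffs = np.abs(np.asarray(image_angles_deg) - gaze_angle_deg)
    i = int(np.argmin(diffs))
    return names[i] if diffs[i] <= tol else None

# Three remote participants displayed at -30, 0 and +30 degrees
names = ["A", "B", "C"]
print(gaze_target(-27.0, [-30.0, 0.0, 30.0], names))  # -> "A"
print(gaze_target(15.0, [-30.0, 0.0, 30.0], names))   # -> None
```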
Gaze detection is locating the position on a monitor screen where a user is looking. In our work, we implement it as a computer vision system with a single camera mounted above the monitor; the user moves (rotates and/or translates) her face to gaze at different positions on the monitor. In our setting, the user is asked not to move her pupils while gazing at a position on the screen, although we are working to relax this restriction. To detect the gaze position, we automatically extract facial features (both eyes, the nostrils, and the lip corners) in 2D camera images. From the movement of the feature points detected in the starting images, we compute the initial 3D positions of those features with a recursive estimation algorithm. Then, when the user moves her head to gaze at a position on the monitor, the moved 3D positions of the features are computed by 3D motion estimation with an Iterative Extended Kalman Filter (IEKF) and an affine transform. Finally, the gaze position on the monitor is computed from the normal vector of the plane determined by the moved 3D feature positions. In particular, to obtain exact 3D positions of the initial feature points, we unify three coordinate systems (face, monitor, and camera) based on a perspective transformation. Experimentally, the 3D position estimation error of the initial feature points, i.e., the RMS error between the estimated initial 3D feature positions and the real positions (measured by a 3D position tracker sensor), is about 1.28 cm (0.75 cm on the X axis, 0.85 cm on the Y axis, 0.6 cm on the Z axis), and the 3D motion estimation errors of the feature points with the IEKF are about 2.8 degrees in rotation and 1.21 cm in translation. From these, we obtain the gaze position on a 17-inch monitor with an RMS error of about 2.06 inches between the calculated positions and the real ones.
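The final step, intersecting the facial-plane normal with the monitor plane, can be made concrete with a small sketch. The coordinate convention (monitor at z = 0, units in cm) and all values below are assumptions for illustration only.

```python
import numpy as np

def gaze_point_on_monitor(p1, p2, p3, monitor_z=0.0):
    """Intersect the normal of the facial plane through three moved 3D
    feature points with the monitor plane z = monitor_z (assumed
    camera/monitor coordinate convention)."""
    p1, p2, p3 = (np.asarray(p, float) for p in (p1, p2, p3))
    n = np.cross(p2 - p1, p3 - p1)  # facial-plane normal
    c = (p1 + p2 + p3) / 3.0        # ray origin: feature centroid
    if np.isclose(n[2], 0.0):
        return None                 # gaze parallel to the screen
    t = (monitor_z - c[2]) / n[2]   # ray/plane intersection parameter
    return c + t * n                # [x, y, monitor_z]

# Example: three facial features ~50 cm in front of the screen,
# face oriented so the plane normal points back toward the monitor
print(gaze_point_on_monitor([-5, 4, 50], [5, 4, 51], [0, -5, 50.5]))
```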
Kang Ryoung PARK Si Wook NAM Min Suk LEE Jaihie KIM
This paper describes a new method for detecting the gaze position of a user on a monitor from monocular images. To detect the gaze position, we automatically extract facial features (both eyes, the nostrils, and the lip corners) in 2D camera images and estimate the 3D depth information and the initial 3D positions of those features with a recursive estimation algorithm applied to the starting images. Then, when the user moves his or her head to gaze at a position on the monitor, the moved 3D positions of the features are estimated by 3D motion estimation with an Extended Kalman Filter (EKF) and an affine transform. Finally, the gaze position on the monitor is calculated from the normal vector of the plane determined by the moved 3D feature positions. In particular, to obtain exact 3D depths and positions of the initial feature points, we unify three coordinate systems (face, monitor, and camera) based on a perspective transformation. Experimentally, the 3D depth and position estimation error of the initial feature points, i.e., the RMS error between the estimated initial 3D feature positions and the real positions (measured by a 3D position tracker sensor), is about 1.28 cm (0.75 cm on the X axis, 0.85 cm on the Y axis, 0.6 cm on the Z axis), and the 3D motion estimation errors of the feature points with the EKF are about 3.6 degrees in rotation and 1.4 cm in translation. From these, we obtain the gaze position on a 17-inch monitor with an RMS error of about 2.1 inches between the calculated positions and the real ones.
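The EKF estimates the head's rotation and translation between frames; applying that estimated motion to the initial feature positions is a rigid transform, X' = RX + t, sketched below. The axis-angle convention and example values are assumptions, not taken from the paper.

```python
import numpy as np

def rotation_matrix(axis, angle_deg):
    """Rodrigues' formula: rotation about a unit axis by angle_deg."""
    a = np.asarray(axis, float)
    a = a / np.linalg.norm(a)
    th = np.radians(angle_deg)
    K = np.array([[0, -a[2], a[1]],
                  [a[2], 0, -a[0]],
                  [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

def move_features(points, R, t):
    """Apply the estimated rigid head motion X' = R X + t to the
    initial 3D feature positions (points: N x 3)."""
    return np.asarray(points, float) @ R.T + np.asarray(t, float)

# Example: head turns 5 degrees about the vertical axis and shifts 1 cm right
R = rotation_matrix([0, 1, 0], 5.0)
print(move_features([[0, 0, 50], [3, -2, 51]], R, [1.0, 0.0, 0.0]))
```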
Networked reality is defined as virtual reality that is used in networks and that makes use of networks. This paper describes several levels of networked reality and their applications.
Minoru KOBAYASHI Hiroshi ISHII
The goal of visual telecommunication has been to create a sense of "being there" or "telepresence." This paper introduces a novel shared drawing medium called ClearBoard that goes beyond "being there" by providing a virtual shared workspace. It realizes (1) a seamless integration of shared drawing space and the partner's image, and (2) eye contact to support real-time, remote collaboration by two users. To design ClearBoard, we devised the key metaphor of "talking through and drawing on a transparent glass window." A prototype, ClearBoard-1, was implemented based on the "Drafter-Mirror" architecture. This paper first reviews previous work on shared drawing support to clarify our design goals. We then examine three metaphors that fulfill these goals. The design requirements and two possible system architectures of ClearBoard are described. Finally, some findings gained through experimental use of the prototype, including the feature of "gaze awareness," are discussed.
Hidetomo SAKAINO Akira TOMONO Fumio KISHINO
In a display system with a line-of-gaze (LOG) controller, it is difficult to make the direction and motion of a LOG-controlled object on the display coincide closely with the user's intended LOG direction and motion. This is because LOG behavior is not only smooth but also saccadic, owing to involuntary eye movements. This article introduces a flexible on-line LOG-control scheme to realize nearly perfect LOG operation. Using a mesh-wise cursor pattern, the first visual experiment shows subjectively that a Kalman Filter (KF) for smoothing and prediction is effective both in filtering out macro-saccadic changes of the LOG and in predicting sudden saccadic changes while movement is in progress. It is assumed that the LOG trajectory can be described by a linear position-velocity-acceleration approximation, the Sklansky Model (SM). The second experiment uses a four-point pattern and simulations to examine two physical properties of the LOG, velocity and direction change, in order to distinguish "moving" from "gazing" quantitatively and efficiently. To greatly reduce the number of small LOG position changes while gazing, the proposed Gaze-Holding (GH) algorithm with a gaze-potential function is combined with the KF. This algorithm reduces the occurrence frequency of micro-saccades from approximately 25 Hz to 1 or 2 Hz. This large reduction in how often the LOG-controlled object moves is necessary to achieve the user's desired LOG response while gazing. Nearly perfect LOG control is achieved by the on-line SM+KF+GH scheme during both gazing and moving. A menu-selection task was conducted to verify the effectiveness of the proposed on-line LOG-control method.
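As a minimal sketch of the SM+KF idea, a position-velocity-acceleration (constant-acceleration) Kalman filter can smooth a noisy one-dimensional LOG coordinate. The sampling rate and all noise levels below are assumptions; the paper's actual model parameters and the GH algorithm are not reproduced here.

```python
import numpy as np

dt = 1.0 / 60.0                  # assumed sampling interval (s)
# Sklansky-style state [position, velocity, acceleration]
F = np.array([[1, dt, dt * dt / 2],
              [0, 1, dt],
              [0, 0, 1]])
H = np.array([[1.0, 0.0, 0.0]])  # only position is measured
Q = 1e-3 * np.eye(3)             # assumed process noise
R = 5.0                          # assumed measurement noise (pixels^2)

x = np.zeros(3)                  # state estimate
P = np.eye(3)                    # state covariance

def kf_step(z):
    """One predict/update cycle on a noisy LOG position sample z."""
    global x, P
    x = F @ x                    # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R          # innovation covariance
    K = P @ H.T / S              # Kalman gain
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(3) - K @ H) @ P
    return x[0]                  # smoothed position

rng = np.random.default_rng(1)
track = [kf_step(100.0 + rng.normal(0, 3)) for _ in range(120)]
print(round(track[-1], 1))       # settles near the true position, 100
```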