
Keyword Search Result

[Keyword] multimodal(33hit)

21-33hit(33hit)

  • The Fusion of Two User-friendly Biometric Modalities: Iris and Face

    Byungjun SON  Yillbyung LEE  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E89-D No:1
      Page(s):
    372-376

    In this paper, we present a biometric authentication system based on the fusion of two user-friendly biometric modalities: iris and face. Using a single biometric feature can give good results, but offers no reliable way to verify the classification. To achieve robust identification and verification, we combine two different biometric features. Specifically, we apply the 2-D discrete wavelet transform to extract low-dimensional feature sets from the iris and face images, and then use Direct Linear Discriminant Analysis (DLDA) to obtain a Reduced Joint Feature Vector (RJFV) from these feature sets. The system can operate in two modes: identifying a particular person or verifying a person's claimed identity. Our results for both cases show that the proposed method leads to a reliable person authentication system.
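    The feature extraction step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a plain Haar wavelet for the 2-D DWT, stops at the concatenated joint feature vector (the DLDA dimensionality reduction that produces the RJFV is omitted), and all image sizes are assumed.

    ```python
    import numpy as np

    def haar_dwt2(img):
        """One level of a 2-D Haar wavelet transform; returns only the
        low-frequency (LL) subband, which carries most of the energy."""
        # Average adjacent pixel pairs along rows, then along columns.
        rows = (img[0::2, :] + img[1::2, :]) / 2.0
        return (rows[:, 0::2] + rows[:, 1::2]) / 2.0

    def joint_feature(iris_img, face_img, levels=2):
        """Concatenate low-dimensional wavelet features of both modalities.
        (The paper then applies DLDA to this joint vector; we stop here.)"""
        for _ in range(levels):
            iris_img = haar_dwt2(iris_img)
            face_img = haar_dwt2(face_img)
        return np.concatenate([iris_img.ravel(), face_img.ravel()])

    iris = np.random.rand(32, 32)   # hypothetical iris region
    face = np.random.rand(64, 64)   # hypothetical face region
    feat = joint_feature(iris, face)
    print(feat.shape)  # (320,) -- 8*8 iris + 16*16 face coefficients
    ```

    Each wavelet level quarters the pixel count, which is what yields "feature sets of low dimensionality" before the discriminant analysis.
    
    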

  • Wearable Telepresence System Based on Multimodal Communication for Effective Teleoperation with a Humanoid

    Yong-Ho SEO  Hun-Young PARK  Taewoo HAN  Hyun Seung YANG  

     
    PAPER

      Vol:
    E89-D No:1
      Page(s):
    11-19

    This paper presents a new type of wearable teleoperation system that can be applied to the control of a humanoid robot. The proposed system has self-contained computing hardware with a stereo head-mounted display, a microphone, a set of headphones, and a wireless LAN. It also has a mechanism that tracks arm and head motion by using several types of sensors that detect the motion data of an operator, along with a simple force reflection mechanism that uses vibration motors at appropriate joints. For remote tasks, we use intelligent self-sensory feedback and autonomous behavior, such as automatic grasping and obstacle avoidance in a slave robot, and we feed the information back to an operator through a multimodal communication channel. Through this teleoperation system, we successfully demonstrate several teleoperative tasks, including object manipulation and mobile platform control of a humanoid robot.

  • Proposal of a Multimodal Interaction Description Language for Various Interactive Agents

    Masahiro ARAKI  Akiko KOUZAWA  Kenji TACHIBANA  

     
    PAPER

      Vol:
    E88-D No:11
      Page(s):
    2469-2476

    In this paper, we propose a new multimodal interaction description language, MIML (Multimodal Interaction Markup Language), which defines dialogue patterns between humans and various types of interactive agents. The distinguishing feature of this language is its three-layered description of agent-based interactive systems. The high-level description is a task definition from which typical agent-based interactive task control information can easily be constructed. The middle-level description is an interaction description that defines the agent's behavior and the user's input at the granularity of a dialogue segment. The low-level description is a platform-dependent description that can override the pre-defined functions in the interaction description. The task level is connected to the interaction level by generating interaction description templates from the task-level description; the interaction level is connected to the platform level by an XML binding mechanism. Compared with other languages, MIML has advantages in high-level interaction description, modality extensibility, and compatibility with standardized technologies.
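    The task-to-interaction connection described above — generating interaction description templates from a task definition — can be illustrated roughly as below. All element names here (`interaction`, `segment`, `prompt`, `listen`) are hypothetical placeholders; the actual MIML vocabulary is defined in the paper and is not reproduced here.

    ```python
    import xml.etree.ElementTree as ET

    def interaction_template(task_name, slots):
        """Expand a high-level task definition (a task name plus the
        information slots it needs) into a per-slot interaction skeleton,
        one dialogue segment per slot. Tag names are invented for this
        sketch, not taken from MIML."""
        root = ET.Element("interaction", {"task": task_name})
        for slot in slots:
            seg = ET.SubElement(root, "segment", {"slot": slot})
            ET.SubElement(seg, "prompt").text = f"Please give the {slot}."
            ET.SubElement(seg, "listen", {"grammar": slot})
        return ET.tostring(root, encoding="unicode")

    doc = interaction_template("flight-booking", ["origin", "destination"])
    print(doc)
    ```

    A platform-dependent layer could then override any generated `prompt` or `listen` element for a specific agent, which is the role the low-level description plays in the three-layer scheme.
    
    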

  • Interactive Object Recognition System for a Helper Robot Using Photometric Invariance

    Md. Altab HOSSAIN  Rahmadi KURNIA  Akio NAKAMURA  Yoshinori KUNO  

     
    PAPER

      Vol:
    E88-D No:11
      Page(s):
    2500-2508

    We are developing a helper robot that carries out tasks ordered by the user through speech. The robot needs a vision system to recognize the objects mentioned in those orders. It is, however, difficult to build vision systems that work under various conditions. We have therefore proposed using the human user's assistance through speech: when the vision system cannot achieve a task, the robot speaks to the user so that the user's natural response can provide helpful information for its vision system. Our previous system assumed that it could segment images without failure. However, when there are occluded objects and/or objects composed of multicolor parts, segmentation failures cannot be avoided. This paper presents an extended system that tries to recover from segmentation failures using photometric invariance. If the system is unsure about its segmentation results, it queries the user with expressions chosen according to the invariant values. Experimental results show the usefulness of the system.
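    To make the photometric-invariance idea concrete: one common photometric invariant (an assumption here — the abstract does not say which invariant the paper uses) is normalized rgb, which is insensitive to illumination intensity, so two regions of one object that differ only in shading map to the same chromaticity and need not be split into separate segments.

    ```python
    import numpy as np

    def normalized_rgb(img):
        """Normalized rgb chromaticity: divide each channel by the
        per-pixel channel sum. Shading scales all three channels
        equally, so it cancels out of the ratio."""
        s = img.sum(axis=2, keepdims=True) + 1e-12  # avoid divide-by-zero
        return img / s

    # A surface patch and the same patch in shadow (half the intensity):
    patch = np.full((2, 2, 3), [0.6, 0.3, 0.1])
    shadow = 0.5 * patch
    print(np.allclose(normalized_rgb(patch), normalized_rgb(shadow)))  # True
    ```

    A segmenter comparing invariant values rather than raw RGB would thus avoid splitting a shaded object in two, which is the kind of failure recovery the paper targets.
    
    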

  • New Cycling Environments Using Multimodal Knowledge and Ad-hoc Network

    Sachiyo YOSHITAKI  Yutaka SAKANE  Yoichi TAKEBAYASHI  

     
    PAPER

      Vol:
    E87-D No:6
      Page(s):
    1377-1385

    We have been developing new cycling environments based on knowledge sharing and speech communication. We offer multimodal knowledge contents for sharing knowledge on safe and enjoyable cycling, and have accumulated 140 contents covering issues such as riding techniques, troubleshooting, and preparations for cycling. We also offer a new form of speech communication for safe cycling using ad-hoc wireless LAN technology. Group cycling requires frequent communication to lead the group safely; speech communication allows spontaneous exchanges between group members without looking around or speaking loudly. Experimental results from actual cycling have shown the effectiveness of sharing multimodal knowledge contents and speech communication. Our newly developed environment has the further advantage that the multimodal knowledge grows through the accumulation of personal experiences from actual cycling.

  • Conversation Robot Participating in Group Conversation

    Yosuke MATSUSAKA  Tsuyoshi TOJO  Tetsunori KOBAYASHI  

     
    INVITED PAPER

      Vol:
    E86-D No:1
      Page(s):
    26-36

    We developed a conversation system that can participate in a group conversation, a form of conversation in which three or more participants talk to each other about a topic on an equal footing. Conventional conversation systems have been designed under the assumption that the system talks with only one person. Group conversation differs in the following respects: the system must understand the conversational situation, such as who is speaking, to whom they are speaking, and to whom the other participants are paying attention; and the system itself must try to influence that situation appropriately. In this study, we realized the function of recognizing the conversational situation by combining image processing and acoustic processing, and the function of working on the conversational situation using the robot's facial and body actions. A robot that can join in group conversation was thus realized.

  • An Empirical Performance Comparison of Niching Methods for Genetic Algorithms

    Hisashi SHIMODAIRA  

     
    PAPER-Biocybernetics, Neurocomputing

      Vol:
    E85-D No:11
      Page(s):
    1872-1880

    Various niching methods have been developed to maintain population diversity. Their common feature is to prevent the proliferation of similar individuals within a niche (subpopulation), based on a similarity measure. This paper demonstrates that they are effective in avoiding premature convergence when searching for a single global optimum of a multimodal function. The performance of the major niching methods in this setting is investigated and compared in experiments on seven benchmark functions. The methods tested are deterministic crowding, probabilistic crowding, restricted tournament selection, the clearing procedure, and the diversity-control-oriented genetic algorithm (DCGA). In the experiments, each method shows a fairly good global-optimum-searching capability; however, no method completely avoids premature convergence on all functions, and no method outperforms all the others on every function.
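    As a minimal sketch of one of the methods compared above — deterministic crowding — the following is illustrative only (real-coded, uniform crossover, Gaussian mutation; none of these operator choices or parameter values are taken from the paper). The defining step is that each offspring competes only with its most similar parent, which preserves multiple niches.

    ```python
    import random

    def deterministic_crowding(fitness, dim=1, pop_size=40, gens=100,
                               lo=-1.0, hi=1.0, mut=0.1):
        """Deterministic crowding for maximization: pair parents, create
        offspring, then have each child replace the nearer parent only
        if the child is at least as fit."""
        pop = [[random.uniform(lo, hi) for _ in range(dim)]
               for _ in range(pop_size)]
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        for _ in range(gens):
            random.shuffle(pop)
            for i in range(0, pop_size - 1, 2):
                p1, p2 = pop[i], pop[i + 1]
                # Uniform crossover plus Gaussian mutation.
                c1 = [random.choice(g) + random.gauss(0, mut) for g in zip(p1, p2)]
                c2 = [random.choice(g) + random.gauss(0, mut) for g in zip(p1, p2)]
                # Match each child to its most similar parent.
                if dist(c1, p1) + dist(c2, p2) <= dist(c1, p2) + dist(c2, p1):
                    pairs = [(c1, i), (c2, i + 1)]
                else:
                    pairs = [(c1, i + 1), (c2, i)]
                for child, j in pairs:
                    child = [min(hi, max(lo, x)) for x in child]
                    if fitness(child) >= fitness(pop[j]):
                        pop[j] = child
        return pop

    # A bimodal toy function with peaks at x = -0.5 and x = +0.5.
    f = lambda v: -abs(abs(v[0]) - 0.5)
    final = deterministic_crowding(f)
    ```

    Because replacement is restricted to the similar parent, individuals near both peaks can survive side by side instead of the whole population collapsing onto one optimum.
    
    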

  • The Efficiency of Various Multimodal Input Interfaces Evaluated in Two Empirical Studies

    Xiangshi REN  Gao ZHANG  Guozhong DAI  

     
    PAPER-Welfare Engineering

      Vol:
    E84-D No:10
      Page(s):
    1421-1426

    Although research into multimodal interfaces has a long history, we believe some basic issues have not yet been studied; for example, the choice of modalities and their combinations is usually made without any quantitative evaluation. This study seeks to identify the best combinations of modalities through usability testing: how do users choose among interaction modes when working on a particular application? Two experimental evaluations were conducted to compare interaction modes on a CAD system and a map system, respectively. For the CAD system, the results show that, in terms of total manipulation time (drawing and modification time) and subjective preference, the "pen + speech + mouse" combination was the best of the seven interaction modes tested. For the map system, the "pen + speech" combination was the best of the fourteen interaction modes tested. The experiments also provide information on how users adapt to each interaction mode and on how easily they are able to use these modes.

  • Multimodal Pattern Classifiers with Feedback of Class Memberships

    Kohei INOUE  Kiichi URAHAMA  

     
    LETTER-Bio-Cybernetics and Neurocomputing

      Vol:
    E82-D No:3
      Page(s):
    712-716

    Feedback of class memberships is incorporated into multimodal pattern classifiers, and an unsupervised learning algorithm for them is presented. Classification decisions at low levels are revised by the feedback information, which also enables the reconstruction of patterns at low levels. The effects of the feedback are examined on the McGurk effect using a simple model.

  • Monochromatic Visualization of Multimodal Images by Projection Pursuit

    Seiji HOTTA  Kiichi URAHAMA  

     
    LETTER-Image Theory

      Vol:
    E81-A No:12
      Page(s):
    2715-2718

    A method for visualizing multimodal images as a single monochromatic image is presented, based on a projection pursuit approach to the inverse process of anisotropic diffusion, an image restoration method that enhances contrast at edges. Extending the projection from a linear function to nonlinear sigmoidal functions enhances the contrast further, and a deterministic annealing technique is incorporated into the optimization process to improve the contrast-enhancement ability of the projection. An application of this method to a pair of brain MRI images demonstrates its promising performance in visualizing tissues.
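    The linear starting point of this idea can be sketched as follows: find the unit projection direction that maximizes the variance (a simple contrast proxy) of the resulting monochrome image, i.e. the first principal component of the per-pixel multichannel vectors. This is only the baseline — the paper's method goes further, with sigmoidal projections and deterministic annealing, neither of which is shown here.

    ```python
    import numpy as np

    def project_to_mono(channels):
        """Project an H x W x C multimodal image onto the single linear
        direction of maximum pixel-value variance, then rescale to [0, 1]
        for display as a monochrome image."""
        h, w, c = channels.shape
        X = channels.reshape(-1, c)
        X = X - X.mean(axis=0)
        # Leading eigenvector of the channel covariance matrix.
        cov = X.T @ X / X.shape[0]
        _, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
        mono = X @ vecs[:, -1]
        mono = (mono - mono.min()) / (mono.max() - mono.min() + 1e-12)
        return mono.reshape(h, w)

    # Hypothetical stand-in for co-registered MRI modalities.
    img = np.random.rand(8, 8, 3)
    mono = project_to_mono(img)
    print(mono.shape)  # (8, 8)
    ```

    Replacing the linear map `X @ w` with a sigmoid of it, as the paper proposes, sharpens transitions between tissue classes rather than merely spreading the overall dynamic range.
    
    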

  • Use of Multimodal Information in Facial Emotion Recognition

    Liyanage C. DE SILVA  Tsutomu MIYASATO  Ryohei NAKATSU  

     
    PAPER-Artificial Intelligence and Cognitive Science

      Vol:
    E81-D No:1
      Page(s):
    105-114

    Detection of facial emotions is mainly addressed by computer vision researchers, based on facial display; detection of vocal expressions of emotion is found in the work of acoustics researchers. Most of these research paradigms are devoted purely to visual or purely to auditory human emotion detection. We find it very interesting to process both auditory and visual information together, since we expect this kind of multimodal information processing to become standard in the coming multimedia era. Through several intensive subjective evaluation studies, we found that human beings recognize Anger, Happiness, Surprise, and Dislike better from visual appearance than from voice alone. When the audio track of each emotion clip was dubbed with a different type of auditory emotional expression, Anger, Happiness, and Surprise were still video dominant, although Dislike drew mixed responses for different speakers. In both studies, Sadness and Fear were audio dominant. We conclude by proposing a hybrid method of facial emotion detection that uses multimodal information for facial emotion recognition.

  • A Speech Dialogue System with Multimodal Interface for Telephone Directory Assistance

    Osamu YOSHIOKA  Yasuhiro MINAMI  Kiyohiro SHIKANO  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    616-621

    This paper describes a multimodal dialogue system employing speech input. The system uses three input methods (a speech recognizer, a mouse, and a keyboard) and two output methods (a display and sound). The speech recognizer employs an algorithm for large-vocabulary, speaker-independent continuous speech recognition based on the HMM-LR technique. The system is implemented for telephone directory assistance, both to evaluate the speech recognition algorithm and to investigate the variations in the speech structures that users utter to computers. Speech input is used in a multimodal environment, and dialogue data between computers and users are also collected. Twenty telephone-number retrieval tasks are used to evaluate the system. In the experiments, all users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps indicating subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, allowing each user to develop his or her own forms of expression. The task completion rate is 99.0%, and approximately 75% of the users say they prefer this system to using a telephone book. Moreover, there is a significant decrease in non-keyword usage, i.e., the use of words other than names and addresses, among users who receive more utterance practice.

  • Design and Construction of an Advisory Dialogue Database

    Tadahiko KUMAMOTO  Akira ITO  Tsuyoshi EBINA  

     
    PAPER-Databases

      Vol:
    E78-D No:4
      Page(s):
    420-427

    We are aiming to develop a computer-based consultant system that helps novice computer users achieve their task goals on computers through natural language dialogues. Our target is spoken Japanese. To develop effective methods for processing spoken Japanese, it is essential to analyze real dialogues and identify the characteristics of spoken Japanese. In this paper, we discuss the design problems in constructing a spoken dialogue database from the viewpoint of advisory dialogue collection, describe XMH (an X-window-based electronic mail handling program) usage experiments conducted to collect advisory dialogues between novice XMH users and an expert consultant, and present the dialogue database we constructed from these dialogues. The main features of our database are as follows: (1) the target dialogues were advisory ones; (2) the advisory dialogues all concerned the use of XMH, which has a visual interface operated by a keyboard and a mouse; (3) the users' primary objective was not to engage in dialogue but to achieve specific task goals using XMH; and (4) not only what the users said but also the XMH operations they performed are included as dialogue elements. Such a dialogue database is a very effective resource for developing new methods of processing spoken language in multimodal consultant systems, and we have therefore made it available to the public. Based on our analysis of the database, we have already developed several effective methods, such as a method for recognizing a user's communicative intention from a transcript of spoken Japanese and a method for controlling dialogues between a novice XMH user and the computer-based consultant system we are developing. We have also proposed several response generation rules as the response strategy for the consultant system, and have built an experimental consultant system implementing these methods and this strategy.
