
IEICE TRANSACTIONS on Information and Systems

  • Impact Factor

    0.72

  • Eigenfactor

    0.002

  • article influence

    0.1

  • Cite Score

    1.4


Volume E91-D No.6  (Publication Date:2008/06/01)

    Special Section on Human Communication III
  • FOREWORD Open Access

    Shunichi YONEMURA  

     
    FOREWORD

      Page(s):
    1593-1593
  • Dive into the Movie

    Shigeo MORISHIMA  

     
    INVITED PAPER

      Page(s):
    1594-1603

    "Dive into the Movie (DIM)" is the name of a project that aims to realize an innovative entertainment system which gives audiences an immersive experience of the story: by watching a movie in which every audience member can participate in the story as one of the cast, viewers can share the impression with their family or friends. To realize this system, several techniques are required to model and capture personal characteristics of the face, body, gesture, hair and voice instantly, by combining computer graphics, computer vision and speech signal processing techniques. Moreover, all of the modeling, casting, character synthesis, rendering and compositing processes have to be performed in real time without any operator. In this paper, a novel entertainment system, the Future Cast System (FCS), is first introduced; it can create a DIM movie with audience participation by replacing the original roles' faces in a pre-created CG movie with the audience members' own highly realistic 3D CG faces. The effects of the DIM movie on the audience experience are then evaluated subjectively. The results suggest that most participants seek higher realism, impression and satisfaction from replacing not only the face but also the body, hair and voice. The first experimental trial demonstration of FCS was performed at the Mitsui-Toshiba pavilion of the 2005 World Exposition in Aichi, Japan. In total, 1,640,000 people experienced this event during the six months of the exhibition, and FCS became one of the most popular events at Expo 2005.

  • An Effective QoS Control Scheme for 3D Virtual Environments Based on User's Perception

    Takayuki KURODA  Takuo SUGANUMA  Norio SHIRATORI  

     
    PAPER-Media Communication

      Page(s):
    1604-1612

    In this paper, we present a new three-dimensional (3D) virtual environment (3DVE) system named "QuViE/P", which enhances the quality of service (QoS) that users actually perceive as much as possible when computer and network resources are limited. To realize this, we focus on the characteristics of users' perceptual quality evaluation of 3D objects. We propose an effective QoS control scheme for QuViE/P that introduces relationships between the system's internal quality parameters and the users' perceptual quality parameters. This scheme can appropriately maintain the QoS of the 3DVE system and is expected to improve convenience when a 3DVE system is used with insufficient resources. We designed and implemented a prototype of QuViE/P using a multiagent framework. The experimental results show that even when the computer resources are reduced to 20% of the required amount, the proposed scheme can maintain the quality of important objects at a certain level.

  • Study of Spatial Configurations of Equipment for Online Sign Interpretation Service

    Kaoru NAKAZONO  Saori TANAKA  

     
    PAPER-Media Communication

      Page(s):
    1613-1621

    This paper discusses the design of configurations of videophone equipment aimed at online sign interpretation. We classified interpretation services into three types of situations: on-site interpretation, partial online interpretation, and full online interpretation. For each situation, the spatial configurations of the equipment are considered with the issue of nonverbal signals in mind. Simulation experiments of sign interpretation were performed using these spatial configurations, and the qualities of the configurations were assessed. The preferred configurations shared the characteristic that the hearing subject could see the face of his/her principal conversation partner, that is, the deaf subject. The results imply that hearing people who do not understand sign language utilize nonverbal signals to facilitate interpreter-mediated conversation.

  • Online Chat Dependency: The Influence of Social Anxiety

    Chih-Chien WANG  Shu-Chen CHANG  

     
    PAPER-Media Communication

      Page(s):
    1622-1627

    Recent developments in information technology have made it easy for people to "chat" online with others in real time, and many do so regularly. "Virtual" relationships can be attractive, especially for people with social interaction problems in the "real world". This study examines the influence of three dimensions of social anxiety (general social situation fear, negative evaluation fear, and novel social situation fear) on online chat dependency. The participants in this study were 454 college students. The survey results show that negative evaluation fear and general social situation fear are related to online chat dependency, while novel social situation fear does not seem to be a relevant factor.

  • Facial Expression Generation from Speaker's Emotional States in Daily Conversation

    Hiroki MORI  Koh OHSHIMA  

     
    PAPER-Media Communication

      Page(s):
    1628-1633

    A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former are represented by vectors with psychologically defined abstract dimensions, and the latter are coded by the Facial Action Coding System. In order to obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of the proposed method is verified by a subjective evaluation test. The Mean Opinion Score with respect to the suitability of the generated facial expressions was 3.86 for the speaker, which was close to that of hand-made facial expressions.

  • Body Movement Synchrony in Psychotherapeutic Counseling: A Study Using the Video-Based Quantification Method

    Chika NAGAOKA  Masashi KOMORI  

     
    PAPER-Human Information Processing

      Page(s):
    1634-1640

    Body movement synchrony (i.e. rhythmic synchronization between the body movements of interacting partners) has been described through the subjective impressions of skilled counselors and has been considered to reflect the depth of the client-counselor relationship. This study analyzed temporal changes in body movement synchrony through a video analysis of client-counselor dialogues in counseling sessions. Four 50-minute psychotherapeutic counseling sessions were analyzed, including two negatively evaluated sessions (low evaluation group) and two positively evaluated sessions (high evaluation group). In addition, two 50-minute ordinary advice sessions between two high school teachers and the clients in the high rating group were analyzed. All sessions were role-plays. The intensity of the participants' body movement was measured using a video-based system. Temporal change in body movement synchrony was analyzed using moving correlations of the intensity between the two time series. The results revealed (1) a consistent temporal pattern among the four counseling cases, although the moving correlation coefficients were higher for the high evaluation group than for the low evaluation group, and (2) different temporal patterns for the counseling and advice sessions even when the clients were the same. These results are discussed from the perspective of the quality of the client-counselor relationship.
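
A moving correlation of two intensity series can be sketched as follows. This is a minimal illustration, assuming two equal-length lists of per-frame body-movement intensities; the window length, the step, and the helper names `pearson` and `moving_correlation` are hypothetical, and neither the paper's video-based intensity measure nor any time-lag handling is reproduced.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        # Correlation is undefined for a constant window; report 0.0 (assumption).
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def moving_correlation(x: Sequence[float], y: Sequence[float],
                       window: int, step: int = 1) -> List[float]:
    """Correlate two intensity time series inside a sliding window."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(0, len(x) - window + 1, step)]
```

Plotting the returned coefficients against window position gives a temporal synchrony profile; a longer window smooths the profile at the cost of temporal resolution.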

  • Separation between Sound and Light Enhances Audio-Visual Prior Entry Effect

    Yuki HONGOH  Shinichi KITA  Yoshiharu SOETA  

     
    PAPER-Human Information Processing

      Page(s):
    1641-1648

    We examined how spatial disparity between auditory and visual stimuli modulates the audio-visual (A-V) prior entry effect. Spatial and temporal proximity of multisensory stimuli are crucial factors for multisensory perception in most cases (e.g. [1],[2]). However, our previous research [3],[4] suggested that this well-accepted hypothesis is not applicable to the A-V prior entry effect. To examine the effect of spatial disparity on the A-V prior entry effect, six loudspeakers and two light-emitting diodes (LEDs) were used as stimuli. The loudspeakers were located at 10, 25, and 90 degrees from the participants' midline on both the right and left sides. A preceding sound was presented from one of these six loudspeakers. After the preceding sound, two visual targets were presented successively at a short interval, and participants judged which visual target was presented first. Two colour-changeable ('red' or 'green') LEDs were used as the visual targets, and participants judged the order of the visual targets by their colour, not by their side, in order to avoid response bias as much as possible. In Experiment 1, the visual targets were situated at 10 degrees or 25 degrees from the participants' midline on both the right and left sides. The results showed a biased judgment that the visual target on the side where the sound was presented appeared first. The amplitude of the A-V prior entry effect was greater when the preceding sound source was farther from the participants' midline. This effect of spatial separation indicated that the clarity of the side (right or left) of the preceding sound enhanced the amplitude of the A-V prior entry effect (Experiment 2). These results challenge the belief that the spatial proximity of multisensory stimuli is a crucial factor for multisensory perception.

  • Mechanism of Perceptual Categorization in the Pre-Linguistic Period

    Tamami SUDO  Ken MOGI  

     
    PAPER-Human Information Processing

      Page(s):
    1649-1655

    In this study, we conducted a series of experiments using stimuli characterized by various attributes in order to understand the categorization process in an infant's pre-linguistic development. Infants are able to assign the same label to members of the same category by focusing attention on specific features or functions common to the members. The ability to categorize is likely to play an essential role in an infant's overall cognitive development. Specifically, we investigated how infants use different strategies in the process of linguistic categorization. In one strategy, members of a single category are derived from perceptual similarities with the most representative members, i.e., the prototypical members. Alternatively, each membership is established by referring to the linguistic labels for each category provided by the caretaker, in a symbol grounding process. We found that infants are able to employ these strategies flexibly in the course of their development. We discuss the interplay between different cognitive strategies, including the prototype effects, in the infant's cognitive development and the implications for the cortical mechanisms involved.

  • An MEG Study of Temporal Characteristics of Semantic Integration in Japanese Noun Phrases

    Hirohisa KIGUCHI  Nobuhiko ASAKURA  

     
    PAPER-Human Information Processing

      Page(s):
    1656-1663

    Many studies of on-line comprehension of semantic violations have shown that the human sentence processor rapidly constructs a higher-order semantic interpretation of the sentence. What remains unclear, however, is the amount of time required to detect semantic anomalies when two words are concatenated to form a phrase under very rapid stimulus presentation. We aimed to examine the time course of semantic integration in concatenating two words in phrase structure building, using magnetoencephalography (MEG). In the MEG experiment, subjects decided whether two words (a classifier and its corresponding noun), each presented for 66 ms, formed a semantically correct noun phrase. Half of the stimuli were matched pairs of classifiers and nouns; the other half were mismatched pairs. In the analysis of the MEG data, three primary peaks were found at approximately 25 ms (M1), 170 ms (M2) and 250 ms (M3) after the presentation of the target words. Only the M3 latencies were significantly affected by the stimulus conditions. Thus, the present results indicate that semantic integration in concatenating two words starts at approximately 250 ms.

  • A Collaborative Knowledge Management Process for Implementing Healthcare Enterprise Information Systems

    Po-Hsun CHENG  Sao-Jie CHEN  Jin-Shin LAI  Feipei LAI  

     
    PAPER-Interface Design

      Page(s):
    1664-1672

    This paper illustrates a feasible health informatics domain knowledge management process that helps gather useful technology information and reduce knowledge misunderstandings among engineers who participated in the IBM mainframe rightsizing project at National Taiwan University (NTU) Hospital. We design an asynchronous sharing mechanism to facilitate knowledge transfer, and our health informatics domain knowledge management process can be used to publish and retrieve documents dynamically. It effectively creates an acceptable discussion environment and even lessens the traditional meeting burden among development engineers. An overall description of the current software development status is presented. Then, the knowledge management implementation of health information systems is proposed.

  • Interactive Cosmetic Makeup of a 3D Point-Based Face Model

    Jeong-Sik KIM  Soo-Mi CHOI  

     
    PAPER-Interface Design

      Page(s):
    1673-1680

    We present an interactive system for cosmetic makeup of a point-based face model acquired by 3D scanners. We first enhance the texture of a face model in 3D space using low-pass Gaussian filtering, median filtering, and histogram equalization. The user is provided with a stereoscopic display and haptic feedback, and can perform simulated makeup tasks including the application of foundation, color makeup, and lip gloss. Fast rendering is achieved by processing surfels using the GPU, and we use a BSP tree data structure and a dynamic local refinement of the facial surface to provide interactive haptics. We have implemented a prototype system and evaluated its performance.
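
Of the texture-enhancement steps listed, histogram equalization is the easiest to show in isolation. The sketch below assumes 8-bit intensities stored in a flat list; it is the textbook global equalization, not necessarily the variant applied to the surfel texture in the paper, and `equalize` is a hypothetical name.

```python
from typing import List, Sequence


def equalize(pixels: Sequence[int], levels: int = 256) -> List[int]:
    """Global histogram equalization of integer intensities in [0, levels)."""
    if not pixels:
        return []
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1

    # Cumulative distribution function of the intensities.
    cdf = []
    running = 0
    for count in histogram:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest occupied level
    if n == cdf_min:
        # Only one intensity is present: there is nothing to stretch.
        return list(pixels)
    scale = (levels - 1) / (n - cdf_min)
    # Map each level through the normalized CDF onto [0, levels - 1].
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]
```

For example, an image containing only the levels 50 and 100 in equal amounts is stretched to the full range 0 to 255.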

  • Animation of Mapped Photo Collections for Storytelling

    Hideyuki FUJITA  Masatoshi ARIKAWA  

     
    PAPER-Interface Design

      Page(s):
    1681-1692

    Our research goal is to facilitate the sharing of stories with digital photographs. Some map websites now collect stories associated with people's relationships to places. Users map collections of places and include their intangible emotional associations with each location along with photographs, videos, etc. Though this framework of mapping stories is important, it is not sufficiently expressive to communicate stories in a narrative fashion. For example, when the number of mapped collections of places is particularly large, it is neither easy for viewers to interpret the map nor easy for the creator to express a story as a series of events in the real world. This is because each narrative, whether in the form of a sequence of textual narratives, a sequence of photographs, a movie, or audio, is mapped to just one point. As a result, it is up to the viewer to decide which points on the map must be read, and in what order. The conventional framework is fairly suitable for mapping and expressing fragments or snapshots of a whole story, but not for conveying the whole story as a narrative using the entire map as the setting. We therefore propose a new framework, Spatial Slideshow, for mapping personal photo collections and representing them as stories such as route guides, sightseeing guides, historical topics, fieldwork records, personal diaries, and so on. It is a fusion of personal photo mapping and photo storytelling. Each story is conveyed through a sequence of mapped photographs, presented as a synchronized animation of a map and an enhanced photo slideshow. The main technical novelty of this paper is a method for creating three-dimensional animations of photographs that induce the visual effect of motion from photo to photo. We believe that the proposed framework may have considerable significance in facilitating the grassroots development of spatial content driven by visual communication concerning real-world locations or events.

  • Control of Speed and Power in a Humanoid Robot Arm Using Pneumatic Actuators for Human-Robot Coexisting Environment

    Kiyoshi HOSHINO  

     
    PAPER-Interface Design

      Page(s):
    1693-1699

    A new type of humanoid robot arm that can coexist and interact with human beings is sought. To implement smooth and fast human-like movement in a pneumatic robot, the author used a humanoid robot arm with pneumatic agonist-antagonist actuators as endoskeletons, which has a control mechanism for the stiffness of each joint, and its controllability was examined experimentally. Using Kitamori's method to determine the control gains experimentally and using an I-PD controller, three joints of the humanoid robot arm were controlled experimentally. A damping control algorithm was also applied to the wrist joint to modify the speed in accordance with the power. The results showed that, for step inputs, the error in following the target angles was less than one degree and the time constant was less than one second. Simultaneous command input to the three joints brought about an overshoot, increasing the error by about ten percent. The humanoid robot arm can generate calligraphic motions with high accuracy, moving quickly at some times and slowly at others, or particularly softly on some occasions and stiffly on others.
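
An I-PD controller applies integral action to the tracking error but proportional and derivative action to the measured output only, which avoids a control "kick" when the target angle changes stepwise. The sketch below simulates this control law on a hypothetical single joint modelled as inertia plus viscous damping; the plant model, the gains, and the function name `simulate_ipd` are illustrative assumptions, not values obtained with Kitamori's method or measured on the pneumatic arm.

```python
from typing import List


def simulate_ipd(target: float, kp: float = 8.0, ki: float = 4.0, kd: float = 5.0,
                 inertia: float = 1.0, damping: float = 1.0,
                 dt: float = 0.001, duration: float = 20.0) -> List[float]:
    """Simulate an I-PD loop on an assumed joint model; return the angle history.

    Assumed plant:   inertia * theta'' = u - damping * theta'
    I-PD law:        u = ki * integral(target - theta) - kp * theta - kd * theta'
    """
    theta, omega, err_int = 0.0, 0.0, 0.0
    history: List[float] = []
    for _ in range(round(duration / dt)):
        err_int += (target - theta) * dt              # I acts on the error
        u = ki * err_int - kp * theta - kd * omega    # P and D act on the output only
        alpha = (u - damping * omega) / inertia
        omega += alpha * dt                           # semi-implicit Euler step
        theta += omega * dt
        history.append(theta)
    return history
```

With these assumed gains the closed-loop characteristic polynomial is s^3 + 6s^2 + 8s + 4, which is stable, and the integral term drives the steady-state error to zero.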

  • Prototyping Tool for Web-Based Multiuser Online Role-Playing Game

    Shusuke OKAMOTO  Masaru KAMADA  Tatsuhiro YONEKURA  

     
    LETTER-Interface Design

      Page(s):
    1700-1703

    This letter proposes a prototyping tool for Web-based Multiuser Online Role-Playing Games (MORPGs). The design goal is to make this tool simple and powerful. The tool consists of a GUI editor, a translator and a runtime environment. The GUI editor is used to edit state-transition diagrams, each of which defines the behavior of a fictional character. The state-transition diagrams are translated into C code, which plays the role of the game engine in the RPG system. The runtime environment includes PHP, JavaScript with Ajax, and HTML, so the prototype system can be played on ordinary Web browsers such as Firefox, Safari and IE. When a player clicks or presses a key, the Web browser sends the event to the Web server to reflect its consequence on the screens that other players are looking at. Prospective users of this tool include programming novices and schoolchildren. No knowledge of or skill in any specific programming language is required to create state-transition diagrams. Their structure is not only suitable for defining character behavior but also intuitive enough to help novices understand it. Therefore, users can easily create a Web-based MORPG system with the tool.
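
A state-transition diagram of this kind can be represented as a table keyed by (state, event). The Python sketch below is only a hypothetical illustration of that data structure and of stepping through it; the state and event names are invented, and according to the abstract the tool itself generates C code rather than Python.

```python
from typing import Dict, List, Tuple

# Hypothetical behaviour of one non-player character: (state, event) -> next state.
NPC_BEHAVIOUR: Dict[Tuple[str, str], str] = {
    ("idle", "player_near"): "greet",
    ("greet", "player_talks"): "give_quest",
    ("greet", "player_leaves"): "idle",
    ("give_quest", "player_leaves"): "idle",
}


def step(state: str, event: str,
         table: Dict[Tuple[str, str], str] = NPC_BEHAVIOUR) -> str:
    """Follow one arc; events without an arc leave the state unchanged (assumption)."""
    return table.get((state, event), state)


def run(events: List[str], start: str = "idle") -> List[str]:
    """Return the sequence of states visited while processing the events."""
    trace = [start]
    for event in events:
        trace.append(step(trace[-1], event))
    return trace
```

Keeping the behaviour as data rather than code is what lets a diagram editor, rather than a programmer, define the character.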

  • Regular Section
  • Polynomial Time Identification of Strict Deterministic Restricted One-Counter Automata in Some Class from Positive Data

    Mitsuo WAKATSUKI  Etsuji TOMITA  

     
    PAPER-Algorithm Theory

      Page(s):
    1704-1718

    A deterministic pushdown automaton (dpda) having just one stack symbol is called a deterministic restricted one-counter automaton (droca). When it accepts an input by empty stack, it is called strict. This paper is concerned with a subclass of real-time strict droca's, called Szilard strict droca's, and studies the problem of identifying the subclass in the limit from positive data. The class of languages accepted by Szilard strict droca's coincides with the class of Szilard languages (or associated languages) of strict droca's and is incomparable with both the class of regular languages and the class of simple languages. After providing some properties of languages accepted by Szilard strict droca's, we show that the class of Szilard strict droca's is polynomial time identifiable in the limit from positive data in the sense of Yokomori. This identifiability is proved by giving an exact characteristic sample of polynomial size for a language accepted by a Szilard strict droca. The class of very simple languages, which is a proper subclass of simple languages, has also been proved by Yokomori to be polynomial time identifiable in the limit from positive data, but it is not yet known whether there exists a characteristic sample of polynomial size for every very simple language.
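
To make the definitions concrete, a real-time strict droca can be simulated with a single integer counter standing in for the one-symbol stack: each move, selected by the current state and input symbol, replaces the top symbol by zero, one, or two copies of it, and a word is accepted only if the stack becomes empty exactly when the input ends. The automaton below, for the language {a^n b^n | n >= 1}, is a hypothetical example chosen for illustration; it is not claimed to be a Szilard strict droca, and the identification algorithm itself is not shown.

```python
from typing import Dict, Tuple

# (state, input symbol) -> (next state, number of Z's that replace the top Z)
#   0 = pop, 1 = leave unchanged, 2 = push one more Z
Transitions = Dict[Tuple[str, str], Tuple[str, int]]

# Hypothetical real-time strict droca for {a^n b^n | n >= 1}.
EXAMPLE: Transitions = {
    ("p0", "a"): ("p", 1),  # first 'a': keep the initial Z (height stays 1)
    ("p", "a"): ("p", 2),   # each further 'a' pushes one Z
    ("p", "b"): ("q", 0),   # first 'b' pops
    ("q", "b"): ("q", 0),   # remaining 'b's pop
}


def accepts(word: str, delta: Transitions = EXAMPLE, start: str = "p0") -> bool:
    """Run the droca; accept iff the stack empties exactly at the end of the input."""
    state, height = start, 1  # the stack starts with a single Z
    for symbol in word:
        if height == 0:
            return False  # a dpda has no move on an empty stack
        move = delta.get((state, symbol))
        if move is None:
            return False  # deterministic: at most one move, and here there is none
        state, replacement = move
        height += replacement - 1  # the top Z is replaced by `replacement` Z's
    return height == 0
```

Because no move is possible once the stack is empty, a language accepted by empty stack is prefix-free; here "ab" is accepted but "abab" is not.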

  • Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs

    Sung-Hyun SHIN  Yang-Sae MOON  Jinho KIM  Sang-Wook KIM  

     
    PAPER-Database

      Page(s):
    1719-1729

    In recent years, horizontal tables with large numbers of attributes have been widely used in OLAP and e-business applications to analyze multidimensional data efficiently. For efficient storage and querying of horizontal tables, recent works have tried to transform a horizontal table into a traditional vertical table. Existing works, however, have the drawback of not considering the optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper, we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table into an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables into an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove that our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.
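
The storage idea can be illustrated outside a DBMS: a sparse horizontal table is stored as (oid, attribute, value) triples with nulls dropped, and a pivot turns the triples back into one row per oid. The Python sketch below assumes at most one value per (oid, attribute), so no aggregate is needed (SQL PIVOT normally requires one); the function names are hypothetical, and the paper's query-transformation rules are not shown.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Triple = Tuple[Any, str, Any]


def to_vertical(horizontal: Iterable[Row], key: str = "oid") -> List[Triple]:
    """Store a horizontal table as (oid, attribute, value) triples, dropping nulls.

    Note: a row whose attributes are all null leaves no triple behind.
    """
    triples: List[Triple] = []
    for row in horizontal:
        for attr, value in row.items():
            if attr != key and value is not None:
                triples.append((row[key], attr, value))
    return triples


def pivot(vertical: Iterable[Triple], attributes: List[str],
          key: str = "oid") -> List[Row]:
    """Rebuild one horizontal row per oid; absent attributes become None."""
    rows: Dict[Any, Row] = {}
    for oid, attr, value in vertical:
        if attr not in attributes:
            continue  # only the listed attributes become columns
        row = rows.setdefault(oid, {key: oid, **dict.fromkeys(attributes)})
        row[attr] = value
    return [rows[oid] for oid in sorted(rows)]
```

For a two-row table with three attribute columns of which only three cells are non-null, only three triples are stored, and pivoting them over the same attribute list restores the original rows.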

  • Efficient Query-by-Content Audio Retrieval by Locality Sensitive Hashing and Partial Sequence Comparison

    Yi YU  Kazuki JOE  J. Stephen DOWNIE  

     
    PAPER-Contents Technology and Web Information Systems

      Page(s):
    1730-1739

    This paper investigates suitable indexing techniques to enable efficient content-based audio retrieval in large acoustic databases. To make an index-based retrieval mechanism applicable to audio content, we investigate the design of Locality Sensitive Hashing (LSH) and partial sequence comparison. We propose a fast and efficient query-by-content audio retrieval framework and develop an audio retrieval system. Based on this framework, four different audio retrieval schemes, LSH-Dynamic Programming (DP), LSH-Sparse DP (SDP), Exact Euclidean LSH (E2LSH)-DP, and E2LSH-SDP, are introduced and evaluated in order to better understand the performance of audio retrieval algorithms. The experimental results indicate that, compared with traditional DP and the other three competitive schemes, E2LSH-SDP exhibits the best tradeoff in terms of response time, retrieval accuracy and computation cost.
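
The hashing stage of E2LSH-style schemes can be sketched with p-stable (Gaussian) random projections, h(v) = floor((a·v + b) / w): nearby feature vectors are likely to share a bucket, so only the items in the query's bucket need the more expensive sequence comparison (DP or SDP). The sketch below assumes fixed-length feature vectors and a single hash table (practical E2LSH uses several independent tables to raise recall); the class name, the number of hashes, and the bucket width w are hypothetical choices, and the retrieval schemes evaluated in the paper are not reproduced.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


class PStableLSH:
    """One LSH table with keys h_i(v) = floor((a_i . v + b_i) / w), a_i ~ N(0, I)."""

    def __init__(self, dim: int, num_hashes: int = 4, width: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        self.projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                            for _ in range(num_hashes)]
        self.offsets = [rng.uniform(0.0, width) for _ in range(num_hashes)]
        self.buckets: Dict[Tuple[int, ...], List[Hashable]] = defaultdict(list)

    def key(self, v: Sequence[float]) -> Tuple[int, ...]:
        """Concatenate the quantized projections into one bucket key."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, v)) + b) / self.width)
            for proj, b in zip(self.projections, self.offsets)
        )

    def insert(self, item_id: Hashable, v: Sequence[float]) -> None:
        self.buckets[self.key(v)].append(item_id)

    def candidates(self, v: Sequence[float]) -> List[Hashable]:
        """Items sharing the query's bucket; these would then be re-ranked."""
        return list(self.buckets.get(self.key(v), []))
```

A larger w or fewer hashes per key makes buckets coarser, returning more candidates for the sequence-comparison stage; the opposite choice trades recall for speed.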

  • A Real-Time Decision Support System for Voltage Collapse Avoidance in Power Supply Networks

    Chen-Sung CHANG  

     
    PAPER-Artificial Intelligence and Cognitive Science

      Page(s):
    1740-1747

    This paper presents a real-time decision support system (RDSS) based on artificial intelligence (AI) for voltage collapse avoidance (VCA) in power supply networks. The RDSS scheme employs a fuzzy hyperrectangular composite neural network (FHRCNN) to carry out voltage risk identification (VRI). In the event that a threat to the security of the power supply network is detected, an evolutionary programming (EP)-based algorithm is triggered to determine the operational settings required to restore the power supply network to a secure condition. The effectiveness of the RDSS methodology is demonstrated through its application to the American Electric Power Provider System (AEP, 30-bus system) under various heavy load conditions and contingency scenarios. In general, the numerical results confirm the ability of the RDSS scheme to minimize the risk of voltage collapse in power supply networks. In other words, RDSS provides Power Provider Enterprises (PPEs) with a viable tool for performing on-line voltage risk assessment and power system security enhancement functions.

  • Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network

    Xiao-Dong WANG  Keikichi HIROSE  Jin-Song ZHANG  Nobuaki MINEMATSU  

     
    PAPER-Pattern Recognition

      Page(s):
    1748-1755

    A method was developed for the automatic recognition of syllable tone types in continuous Mandarin speech by integrating two techniques: tone nucleus modeling and a neural network classifier. Tone nucleus modeling considers a syllable F0 contour as consisting of three parts: the onset course, the tone nucleus, and the offset course. The two courses are transitions from/to neighboring syllable F0 contours, while the tone nucleus is the intrinsic part of the F0 contour. By viewing only the tone nucleus, acoustic features less affected by neighboring syllables are obtained. When tone nucleus modeling is used, automatic detection of the tone nucleus becomes crucial, and an improvement was made to the original detection method. Distinctive acoustic features for tone types are not limited to F0 contours; other prosodic features, such as waveform power and syllable duration, are also useful for tone recognition. These heterogeneous features are rather difficult to handle simultaneously in hidden Markov models (HMMs), but are easy to handle in neural networks. We adopted a multi-layer perceptron (MLP) as the neural network. Tone recognition experiments were conducted for speaker-dependent and speaker-independent cases. To show the effect of integration, experiments were also conducted for two baselines: an HMM classifier with tone nucleus modeling, and an MLP classifier viewing the entire syllable instead of the tone nucleus. The integrated method achieved tone recognition rates of 87.1% in the speaker-dependent case and 80.9% in the speaker-independent case, about a 10% relative error reduction compared with the baselines.

  • Local Subspace Classifier with Transform-Invariance for Image Classification

    Seiji HOTTA  

     
    PAPER-Pattern Recognition

      Page(s):
    1756-1763

    A family of linear subspace classifiers called the local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers from very high sensitivity to image transformations because it uses projection and the Euclidean distance for classification. In this paper, I present a combination of LSC and the tangent distance (TD) for improving the accuracy of handwritten digit recognition. With this classification rule, transform-invariance can be handled easily because tangent vectors can be used to approximate transformations. However, tangent vectors cannot be used for other types of images, such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified through experiments on handwritten digit and color image classification.
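
The baseline local subspace classifier can be sketched as follows: for each class, take the k training vectors nearest to the query and assign the class whose affine subspace through those neighbours lies closest to the query. The pure-Python sketch below uses Gram-Schmidt orthogonalization and plain Euclidean distance; it illustrates the baseline LSC only, not the tangent-distance or kernel variants proposed in the paper, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _sub(u: Vector, v: Vector) -> List[float]:
    return [a - b for a, b in zip(u, v)]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def distance_to_affine_hull(x: Vector, points: List[Vector]) -> float:
    """Euclidean distance from x to the affine subspace through the given points."""
    origin = points[0]
    basis: List[List[float]] = []
    for p in points[1:]:
        v = _sub(p, origin)
        # Gram-Schmidt: remove components along the basis vectors found so far.
        for b in basis:
            c = _dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(_dot(v, v))
        if norm > 1e-12:  # skip linearly dependent neighbours
            basis.append([vi / norm for vi in v])
    # Residual of (x - origin) after projecting onto the subspace.
    r = _sub(x, origin)
    for b in basis:
        c = _dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(_dot(r, r))


def lsc_classify(x: Vector, training: Dict[str, List[Vector]], k: int) -> str:
    """Assign x to the class whose local affine subspace is nearest."""
    best_label, best_dist = "", math.inf
    for label, samples in training.items():
        neighbours = sorted(samples, key=lambda s: _dot(_sub(s, x), _sub(s, x)))[:k]
        dist = distance_to_affine_hull(x, neighbours)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

The projection step is exactly what makes plain LSC sensitive to image transformations: a shifted or rotated digit moves off the local subspace even though its class is unchanged.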

  • The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006

    Heiga ZEN  Tomoki TODA  Keiichi TOKUDA  

     
    PAPER-Speech and Hearing

      Page(s):
    1764-1773

    We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.

  • The Use of Overlapped Sub-Bands in Multi-Band, Multi-SNR, Multi-Path Recognition of Noisy Word Utterances

    Yutaka TSUBOI  Takehiro IHARA  Kazuyuki TAKAGI  Kazuhiko OZEKI  

     
    PAPER-Speech and Hearing

      Page(s):
    1774-1782

    A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven overlapping sub-bands, and sub-band noisy phoneme HMMs are then trained on speech data mixed with filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noises. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.

  • Minimum Mean Absolute Error Predictors for Lossless Image Coding

    Yoshihiko HASHIDUME  Yoshitaka MORIKAWA  Shuichi MAKI  

     
    PAPER-Image Processing and Video Processing

      Page(s):
    1783-1792

    In this paper, we investigate minimum mean absolute error (mmae) predictors for lossless image coding. In some prediction-based lossless image coding systems, coding performance depends largely on the efficiency of the predictors, and minimum mean square error (mmse) predictors are often used. Generally speaking, these predictors have the problem that outliers lying very far from the regression line are conspicuous enough to obscure the inliers. That is, in image compression, large prediction errors near edges degrade the prediction accuracy in flat areas. On the other hand, mmae predictors are less sensitive to edges and provide more accurate prediction in flat areas than mmse predictors, at the cost of lower prediction accuracy in edge areas. Nevertheless, the entropy of the prediction errors of mmae predictors is lower than that of mmse predictors, because typical images consist mainly of flat areas. In this study, we adopt the Laplacian and Gaussian function models for the prediction errors of mmae and mmse predictors, respectively, and show that mmae predictors outperform conventional mmse-based predictors, including weighted mmse predictors, in terms of coding performance.
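
The contrast between the two criteria can be seen with a single-tap predictor y ≈ a·x, where x is a neighbouring pixel and y the pixel being predicted. The mmse coefficient has the closed form a = Σxy / Σx², while the mmae coefficient is a weighted median of the ratios y/x with weights |x|, because Σ|y - a·x| = Σ|x|·|y/x - a|. The sketch below is a deliberately small illustration under these assumptions, not the multi-tap predictors designed in the paper, and the function names are hypothetical.

```python
from typing import Sequence


def mmse_coefficient(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient a minimizing sum((y - a*x)**2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mmae_coefficient(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient a minimizing sum(|y - a*x|): a weighted median of y/x, weights |x|."""
    # Samples with x == 0 contribute |y| whatever a is, so they can be skipped.
    pairs = sorted((y / x, abs(x)) for x, y in zip(xs, ys) if x != 0)
    if not pairs:
        raise ValueError("need at least one nonzero reference sample")
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= half:  # first ratio reaching half the total weight
            return ratio
    return pairs[-1][0]  # not reached in exact arithmetic; guards against rounding
```

On made-up data with seven "flat" samples where y equals x and one "edge" sample far off that line (x = 50, y = 200), the mmae coefficient stays at exactly 1.0 and predicts the flat samples perfectly, whereas the single edge sample pulls the mmse coefficient to roughly 3.2.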

  • Specific and Class Object Recognition for Service Robots through Autonomous and Interactive Methods

    Al MANSUR  Yoshinori KUNO  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    1793-1803

    Service robots need to be able to recognize and identify objects located within complex backgrounds. Since no single method works in every situation, several methods need to be combined, and the robot has to select the appropriate one automatically. In this paper we propose a scheme that classifies situations according to the characteristics of the object of interest and the user's demand. We classify situations into four groups and employ a different technique for each. We use the Scale-Invariant Feature Transform (SIFT), and Kernel Principal Component Analysis (KPCA) in conjunction with a Support Vector Machine (SVM) using intensity, color, and Gabor features, for five object categories. We show that choosing appropriate features is important when applying KPCA- and SVM-based techniques to different kinds of objects. Through experiments we show that, by using our categorization scheme, a service robot can select an appropriate feature and method and considerably improve its recognition performance. Recognition is still not perfect, however. We therefore propose combining the autonomous method with an interactive method that allows the robot to recognize the user's request for a specific object or object class when it fails to recognize the object on its own. We also propose an interactive way to update the object model with the user's feedback when recognition fails.

  • Jigsaw-Puzzle-Like 3D Glyphs for Visualization of Grammatical Constraints

    Noritaka OSAWA  

     
    PAPER-Computer Graphics

      Page(s):
    1804-1812

    Three-dimensional visualization using jigsaw-puzzle-like glyphs, or shapes, is proposed as a means of representing grammatical constraints in programming. The proposed visualization uses 3D glyphs such as convex, concave, and wireframe shapes. A semantic constraint, such as the type constraint in an assignment, is represented by an inclusive match between 3D glyphs. An application of the proposed visualization method to a subset of the Java programming language is demonstrated. An experimental evaluation showed that the 3D glyphs were easier to learn than 2D glyphs and 1D symbol sequences, and that they enabled users to understand the relationships between glyphs more quickly.

  • Improved Clonal Selection Algorithm Combined with Ant Colony Optimization

    Shangce GAO  Wei WANG  Hongwei DAI  Fangjia LI  Zheng TANG  

     
    PAPER-Biocybernetics, Neurocomputing

      Page(s):
    1813-1823

    Both the clonal selection algorithm (CSA) and ant colony optimization (ACO) are inspired by natural phenomena and are effective tools for solving complex problems. CSA can exploit and explore the solution space effectively and in parallel. However, it cannot make sufficient use of feedback information from the environment and therefore performs a large amount of redundant repetition during the search. ACO, on the other hand, is based on the indirect cooperation of foraging ants through the secretion of pheromones. Its positive feedback mechanism is effective, but it converges slowly because the initial pheromone levels are low. In this paper, we propose a pheromone-linker to combine these two algorithms. The proposed hybrid clonal selection and ant colony optimization algorithm (CSA-ACO) makes good use of the strengths of both algorithms and overcomes their inherent weaknesses. Simulation results on traveling salesman problems demonstrate the advantage of the proposed algorithm over several traditional techniques.
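    The ACO half of the hybrid relies on pheromone evaporation plus positive feedback, and the weakness named above is the weak initial trail. The sketch below is a plain Ant System for the TSP, not the paper's CSA-ACO: the `seed_tours` hook is a hypothetical stand-in for the pheromone-linker, pre-loading pheromone from tours produced elsewhere (for example by a clonal selection stage). All parameter names and default values are illustrative assumptions; cities must be distinct and there should be at least three of them.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    n = len(tour)
    return sum(math.dist(pts[tour[k]], pts[tour[(k + 1) % n]]) for k in range(n))


def deposit(tau: List[List[float]], tour: Sequence[int], amount: float) -> None:
    """Add pheromone on every (undirected) edge of a closed tour."""
    n = len(tour)
    for k in range(n):
        a, b = tour[k], tour[(k + 1) % n]
        tau[a][b] += amount
        tau[b][a] += amount


def ant_system(pts: Sequence[Point], n_ants: int = 10, n_iter: int = 50,
               alpha: float = 1.0, beta: float = 2.0, rho: float = 0.5,
               q: float = 1.0, tau0: float = 1e-3,
               seed_tours: Sequence[Sequence[int]] = (),
               seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(pts)
    dist = [[math.dist(a, b) for b in pts] for a in pts]  # cities must be distinct
    tau = [[tau0] * n for _ in range(n)]
    # Hypothetical "linker": pre-load pheromone along externally found tours
    # so the colony does not start from an almost uniform, weak trail.
    for t in seed_tours:
        deposit(tau, t, q / tour_length(t, pts))
    best: List[int] = []
    best_len = math.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Transition weight ~ tau^alpha * (1/distance)^beta.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append(tour)
        for row in tau:  # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for t in tours:  # positive feedback: shorter tours deposit more
            length = tour_length(t, pts)
            deposit(tau, t, q / length)
            if length < best_len:
                best, best_len = t, length
    return best, best_len


# Six cities on a unit circle (a regular hexagon, optimal tour length 6).
cities = [(math.cos(2 * math.pi * k / 6), math.sin(2 * math.pi * k / 6))
          for k in range(6)]
best_tour, best_length = ant_system(cities)
```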

  • A Simple Algorithm for Transposition-Invariant Amplified (δ, γ)-Matching

    Inbok LEE  

     
    LETTER-Algorithm Theory

      Page(s):
    1824-1826

    Approximate pattern matching plays an important role in various applications. In this paper we focus on (δ, γ)-matching, where each character may differ by at most δ and the sum of these differences is smaller than γ. We show how to find such matches when the pattern is transformed by αx + β, without knowing α and β in advance.
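    To make the matching condition concrete, the brute-force sketch below checks every text position for some transform αx + β. It is not the letter's algorithm: it assumes integer symbols, an integer β, and a caller-supplied finite set of candidate α values (`alphas`, a hypothetical parameter). For a fixed α, any valid β must keep the first character within δ, so only the 2δ + 1 integers around text[i] − α·pattern[0] need to be tried, which makes the search exhaustive under those assumptions.

```python
from typing import List, Sequence, Tuple


def delta_gamma_match(text: Sequence[int], pattern: Sequence[int], i: int,
                      delta: int, gamma: int, alpha: int, beta: int) -> bool:
    """Does alpha*pattern + beta (delta, gamma)-match text at position i?"""
    total = 0
    for j, p in enumerate(pattern):
        diff = abs(text[i + j] - (alpha * p + beta))
        if diff > delta:  # a single character differs by more than delta
            return False
        total += diff
    return total < gamma  # sum of differences smaller than gamma


def find_amplified_matches(text: Sequence[int], pattern: Sequence[int],
                           delta: int, gamma: int,
                           alphas: Sequence[int]) -> List[Tuple[int, int, int]]:
    """All (position, alpha, beta) triples that (delta, gamma)-match."""
    matches = []
    for i in range(len(text) - len(pattern) + 1):
        for alpha in alphas:
            base = text[i] - alpha * pattern[0]
            # |text[i] - (alpha*pattern[0] + beta)| <= delta bounds beta.
            for beta in range(base - delta, base + delta + 1):
                if delta_gamma_match(text, pattern, i, delta, gamma, alpha, beta):
                    matches.append((i, alpha, beta))
    return matches


# The pattern [1, 2, 3] appears at position 1 as 2*x + 8 = [10, 12, 14].
hits = find_amplified_matches([0, 10, 12, 14, 5], [1, 2, 3],
                              delta=0, gamma=1, alphas=[1, 2, 3])
```

    This costs O(n·m·|alphas|·δ) time; the point of a dedicated algorithm is to avoid enumerating α and β this way.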

  • Extending LogicWeb via Hereditary Harrop Formulas

    Keehang KWON  Dae-Seong KANG  

     
    LETTER-Fundamentals of Software and Theory of Programs

      Page(s):
    1827-1829

    We propose HHWeb, an extension of LogicWeb with hereditary Harrop formulas. HHWeb extends the LogicWeb of Loke and Davison by allowing goals of the form $(\exists x_1 \ldots \exists x_n\, D) \supset G$ (or, equivalently, $\forall x_1 \ldots \forall x_n\, (D \supset G)$), where $D$ is a web page and $G$ is a goal. Such a goal is solved by instantiating $x_1, \ldots, x_n$ in $D$ with new names and then solving the resulting goal. The existential quantifiers at the head of web pages are particularly flexible for controlling the visibility of names; for example, they can give scope to functions and constants as well as to predicates. In addition, their semantics is simple enough to allow an efficient implementation. Finally, they provide a client-side interface that is useful for customizing web pages.
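    The scoping effect of instantiating the existentially quantified names with new names can be illustrated with ground facts. This is a deliberately tiny sketch, not HHWeb: pages are modeled as hypothetical sets of atoms (tuples of strings), "solving" a goal is reduced to fact lookup instead of full resolution, and the `#n` suffix used for fresh names is an arbitrary assumption.

```python
import itertools
from typing import Iterable, Set, Tuple

Atom = Tuple[str, ...]
_fresh_ids = itertools.count()


def fresh(name: str) -> str:
    """Return a new name that cannot clash with a user-written symbol."""
    return f"{name}#{next(_fresh_ids)}"


def solve_implication(program: Iterable[Atom], page: Iterable[Atom],
                      exist_names: Iterable[str], goal: Atom) -> bool:
    """Solve (exists x1..xn. D) => G over ground facts.

    The names x1..xn in page D are replaced by fresh names, the renamed
    page is added to the program, and G is checked. Proof search is
    simplified to fact lookup; a real interpreter would use resolution.
    """
    renaming = {x: fresh(x) for x in exist_names}
    local: Set[Atom] = {tuple(renaming.get(s, s) for s in atom) for atom in page}
    return goal in (set(program) | local)


# The predicate "helper" is private to the page; "visible" is exported.
page = [("helper", "a"), ("visible", "a")]
sees_visible = solve_implication([], page, ["helper"], ("visible", "a"))
sees_helper = solve_implication([], page, ["helper"], ("helper", "a"))
```

    Because `helper` is renamed to a fresh symbol, a goal written by the client can never mention it, which is the visibility control the abstract describes.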

  • Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Jong Kyu KIM  Nam Soo KIM  

     
    LETTER-Speech and Hearing

      Page(s):
    1830-1833

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. To reduce computation while maintaining good performance, a decision tree classifier is adopted, with the closed-loop mode selection results used as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not significantly increase the memory requirement. In an evaluation on a database covering both speech and music material, the proposed method achieved considerably higher mode selection accuracy than the open-loop mode selection module in AMR-WB+.
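    The training setup (closed-loop decisions as labels, a small tree as the fast classifier) can be sketched with a depth-1 tree, which is the most heavily pruned tree possible. This is an illustration under assumptions: the two frame features and their values are hypothetical, the labels use the AMR-WB+ mode names ACELP and TCX only as placeholders, and the split criterion is Gini impurity rather than whatever the authors used.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Stump = Tuple[int, float, Hashable, Hashable]


def gini(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[Hashable]) -> Hashable:
    return Counter(labels).most_common(1)[0][0]


def train_stump(features: Sequence[Sequence[float]],
                labels: Sequence[Hashable]) -> Stump:
    """Depth-1 tree: (feature index, threshold, left label, right label)."""
    best = None
    n = len(labels)
    for f in range(len(features[0])):
        values = sorted({row[f] for row in features})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for row, y in zip(features, labels) if row[f] <= thr]
            right = [y for row, y in zip(features, labels) if row[f] > thr]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, f, thr, majority(left), majority(right))
    if best is None:  # all features constant: always predict the majority
        m = majority(labels)
        return 0, float("inf"), m, m
    return best[1], best[2], best[3], best[4]


def predict(stump: Stump, row: Sequence[float]) -> Hashable:
    f, thr, left_label, right_label = stump
    return left_label if row[f] <= thr else right_label


# Hypothetical per-frame features; labels come from closed-loop selection.
X = [[0.1, 5.0], [0.2, 4.0], [0.9, 5.0], [0.8, 3.0]]
y = ["ACELP", "ACELP", "TCX", "TCX"]
stump = train_stump(X, y)
```

    At run time only `predict` is evaluated, so the encoder avoids trial-encoding every mode, which is where closed-loop selection spends its computation.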

  • Quantization Parameter Refinement in H.264 through ρ-Domain Rate Model

    Yutao DONG  Xiangzhong FANG  Jing YANG  

     
    LETTER-Speech and Hearing

      Page(s):
    1834-1837

    This letter proposes a new algorithm for refining the quantization parameter in real-time H.264 encoding. In H.264 encoding, the quantization parameter computed from the quadratic rate model does not meet the target bit rate accurately. To bring the actual encoded bit rate closer to the target bit rate, the ρ-domain rate model is introduced into the proposed quantization parameter refinement algorithm. Simulation results show that, compared with Jiang's algorithm, the proposed algorithm achieves a clear gain in PSNR and a more stable encoded bit rate.
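    The ρ-domain model is usually written as a linear relation R = θ(1 − ρ), where ρ is the fraction of transform coefficients quantized to zero. A refinement step can then pick the finest quantizer whose predicted ρ reaches the value the target bits require. The sketch below is one way to do this, not the letter's algorithm: it works with quantization step sizes directly instead of H.264 QP indices, uses a simple dead-zone rule (|c| < step gives zero), estimates θ from a previous frame, and uses made-up numbers.

```python
from typing import Iterable, Sequence


def rho_of_step(coeffs: Sequence[float], qstep: float) -> float:
    """Fraction of coefficients quantized to zero (dead-zone rule assumed)."""
    return sum(1 for c in coeffs if abs(c) < qstep) / len(coeffs)


def estimate_theta(bits: float, rho: float) -> float:
    """Fit theta in R = theta * (1 - rho) from an already encoded frame."""
    return bits / (1.0 - rho)  # undefined when every coefficient was zero


def refine_qstep(coeffs: Sequence[float], target_bits: float, theta: float,
                 candidate_steps: Iterable[float]) -> float:
    """Smallest step whose predicted bit count does not exceed the target."""
    steps = sorted(candidate_steps)
    rho_target = 1.0 - target_bits / theta
    for q in steps:
        if rho_of_step(coeffs, q) >= rho_target:
            return q
    return steps[-1]  # coarsest available step


# Previous frame: 600 bits with 25% zeros -> theta = 800.
theta = estimate_theta(600, 0.25)
coeffs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
step = refine_qstep(coeffs, target_bits=400, theta=theta,
                    candidate_steps=range(1, 9))
```

    Choosing the smallest qualifying step keeps quality as high as possible while the linear model still predicts a rate at or below the target.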

  • Melody Track Selection Using Discriminative Language Model

    Xiao WU  Ming LI  Hongbin SUO  Yonghong YAN  

     
    LETTER-Music Information Processing

      Page(s):
    1838-1840

    In this letter we focus on the task of selecting the melody track from a polyphonic MIDI file. Based on the intuition that music and language are similar in many respects, we solve the selection problem by introducing an n-gram language model that learns melodic co-occurrence patterns statistically and determines the melodic degree of a given MIDI track. Furthermore, we propose using a background model and posterior probability criteria to make the modeling more discriminative. In the evaluation, the achieved correct rate of 81.6% indicates the feasibility of our approach.
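    A minimal version of the idea scores each track with a bigram model trained on melodies against a background model trained on non-melody material, then keeps the highest-scoring track. This sketch rests on several assumptions not stated in the abstract: tracks are reduced to pitch-interval sequences, the models are add-one smoothed bigrams (n = 2), a per-bigram log-likelihood ratio stands in for the posterior criterion, and the training data are toy values.

```python
import math
from collections import Counter
from typing import List, Sequence


def intervals(pitches: Sequence[int]) -> List[int]:
    """Pitch intervals in semitones (transposition invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


class BigramModel:
    """Add-one smoothed bigram model over interval symbols."""

    def __init__(self, sequences: Sequence[Sequence[int]], vocab_size: int = 255):
        self.bigrams: Counter = Counter()
        self.histories: Counter = Counter()
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.bigrams[(a, b)] += 1
                self.histories[a] += 1
        self.vocab_size = vocab_size  # 255 covers MIDI intervals -127..127

    def logprob(self, seq: Sequence[int]) -> float:
        return sum(math.log((self.bigrams[(a, b)] + 1)
                            / (self.histories[a] + self.vocab_size))
                   for a, b in zip(seq, seq[1:]))


def melodic_score(pitches: Sequence[int], melody: BigramModel,
                  background: BigramModel) -> float:
    """Per-bigram log-likelihood ratio: melody model vs. background model."""
    seq = intervals(pitches)
    n_bigrams = max(len(seq) - 1, 1)
    return (melody.logprob(seq) - background.logprob(seq)) / n_bigrams


def select_melody_track(tracks: Sequence[Sequence[int]], melody: BigramModel,
                        background: BigramModel) -> int:
    scores = [melodic_score(t, melody, background) for t in tracks]
    return max(range(len(tracks)), key=scores.__getitem__)


melody_lm = BigramModel([[2, 2, -1, -2, 2, 1]])          # stepwise motion
background_lm = BigramModel([[0, 0, 12, -12, 0, 0]])     # repeated/octave jumps
lead = [60, 62, 64, 63, 61, 63]
bass = [48, 48, 60, 48, 48, 48]
chosen = select_melody_track([lead, bass], melody_lm, background_lm)
```

    Scoring against a background model, rather than using the melody likelihood alone, keeps long or repetitive tracks from winning simply because they contain common intervals.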