IEICE TRANSACTIONS on Information

Impact Factor: 0.72
Eigenfactor: 0.002
Article Influence: 0.1
CiteScore: 1.4


Advance publication (published online immediately after acceptance)

Vision Transformer with Key-select Routing Attention for Single Image Dehazing
Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN

Publicized:
2024/07/01
- Summary
- Free PDF (2MB)
Towards Superior Pruning Performance in Federated Learning with Discriminative Data
Yinan YANG

Publicized:
2024/06/27
- Summary
- Free PDF (7.9MB)
CLEAR & RETURN: Stopping Run-time Countermeasures in Cryptographic Primitives
Myung-Hyun KIM Seungkwang LEE

Publicized:
2024/06/26
- Summary
- Free PDF (1.7MB)
SH-YOLO: Small Target High Performance YOLO for abnormal behavior detection in escalator scene
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG

Publicized:
2024/06/26
- Summary
- Free PDF (708.4KB)
Design and implementation of opto-electrical hybrid floating-point multipliers
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI

Publicized:
2024/06/26
- Summary
- Free PDF (2.5MB)
Geometric Refactoring of Quantum and Reversible Circuits using Graph Algorithms
Martin LUKAC Saadat NURSULTAN Georgiy KRYLOV Oliver KESZOCZE Abilmansur RAKHMETTULAYEV Michitaka KAMEYAMA

Publicized:
2024/06/24
- Summary
- Free PDF (1010KB)
IAD-Net: Single-Image Dehazing Network Based on Image Attention
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG

Publicized:
2024/06/20
- Summary
- Free PDF (9.8MB)
Improving the Accuracy of Differential-Neural Distinguisher For DES, Chaskey, and PRESENT
Liu ZHANG Zilong WANG Yindong CHEN

Publicized:
2024/06/20
- Summary
- Free PDF (355.3KB)
Multi-Scale Contrastive Learning for Human Pose Estimation
Wenxia Bao An Lin Hua Huang Xianjun Yang Hemu Chen

Publicized:
2024/06/17
- Summary
- Free PDF (1MB)
HDR-VDA: A Full Stage Data Augmentation Method for HDR Video Reconstruction
Fengshan ZHAO Qin LIU Takeshi IKENAGA

Publicized:
2024/06/17
- Summary
- Free PDF (1.2MB)
Evaluating Introduction of Systems by Goal Dependency Modeling
Haruhiko KAIYA Shinpei OGATA Shinpei HAYASHI

Publicized:
2024/06/11
- Summary
- Free PDF (1.3MB)
MISpeller: Multimodal Information Enhancement for Chinese Spelling Correction
Jiakai LI Jianyong DUAN Hao WANG Li HE Qing ZHANG

Publicized:
2024/06/07
- Summary
- Free PDF (3.3MB)
Integrating Event Elements for Chinese-Vietnamese Cross-lingual Event Retrieval
Yuxin HUANG Yuanlin YANG Enchang ZHU Yin LIANG Yantuan XIAN

Publicized:
2024/06/04
- Summary
- Free PDF (3.9MB)
Space-efficient FPT Algorithms for Degeneracy
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI

Publicized:
2024/05/31
- Summary
- Free PDF (101.3KB)
Learning Fast Deployment for UAV-Assisted Disaster System
Na XING Lu LI Ye ZHANG Shiyi YANG

Publicized:
2024/05/30
- Summary
- Free PDF (10.3MB)
TDEM: Table data extraction model based on cell segmentation
Zhe Wang Zhe-Ming Lu Hao Luo Yang-Ming Zheng

Publicized:
2024/05/30
- Summary
- Free PDF (838KB)
Reliable image matching using optimal combination of color and intensity information based on relationship with surrounding objects
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO

Publicized:
2024/05/30
- Summary
- Free PDF (4.2MB)
The Least Core of Routing Game Without Triangle Inequality
Tomohiro KOBAYASHI Tomomi MATSUI

Publicized:
2024/05/30
- Summary
- Free PDF (232.9KB)
Enumerating floorplans with Aligned Columns
Shin-ichi NAKANO

Publicized:
2024/05/30
- Summary
- Free PDF (365.4KB)
A Two-Phase Algorithm for Reliable and Energy-Efficient Heterogeneous Embedded Systems
Hongzhi XU Binlian ZHANG

Publicized:
2024/05/27
- Summary
- Free PDF (902.4KB)
Smart Contract Timestamp Vulnerability Detection Based on Code Homogeneity
Weizhi WANG Lei XIA Zhuo ZHANG Xiankai MENG

Publicized:
2024/05/27
- Summary
- Free PDF (706KB)
Neural End-to-end Speech Translation Leveraged by ASR Posterior Distribution
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA

Publicized:
2024/05/24
- Summary
- Free PDF (1.5MB)
Watermarking Method with Scaling Rate Estimation Using Pilot Signal
Rinka KAWANO Masaki KAWAMURA

Publicized:
2024/05/22
- Summary
- Free PDF (1MB)
Type-enhanced Ensemble Triple Representation via Triple-aware Attention for Cross-lingual Entity Alignment
Zhishuo ZHANG Chengxiang TAN Xueyan ZHAO Min YANG

Publicized:
2024/05/22
- Summary
- Free PDF (5.1MB)
Joint Optimization of Task Offloading and Resource Allocation for UAV-Assisted Edge Computing: A Stackelberg Bilayer Game Approach
Peng WANG Guifen CHEN Zhiyao SUN

Publicized:
2024/05/21
- Summary
- Free PDF (678.2KB)
EfficientNet Empowered by Dendritic Learning for Diabetic Retinopathy
Zeyuan JU Zhipeng LIU Yu GAO Haotian LI Qianhang DU Kota YOSHIKAWA Shangce GAO

Publicized:
2024/05/20
- Summary
- Free PDF (517.4KB)
6T-8T hybrid SRAM for lower-power neural-network processing by lowering operating voltage
Ji WU Ruoxi YU Kazuteru NAMBA

Publicized:
2024/05/20
- Summary
- Free PDF (495.1KB)
Chinese Spelling Correction Based on Knowledge Enhancement and Contrastive Learning
Hao WANG Yao Ma Jianyong Duan Li HE Xin Li

Publicized:
2024/05/17
- Summary
- Free PDF (1.2MB)
TIG: A Multitask Temporal Interval Guided Framework for Key Frame Detection
Shijie WANG Xuejiao HU Sheng LIU Ming LI Yang LI Sidan DU

Publicized:
2024/05/17
- Summary
- Free PDF (10.6MB)
Node-to-node and Node-to-set Disjoint Paths Problems in Bicubes
Arata KANEKO Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO

Publicized:
2024/05/17
- Summary
- Free PDF (1MB)
Remote Sensing Image Dehazing Using Multi-Scale Gated Attention For Flight Simulator
Qi LIU Bo WANG Shihan TAN Shurong ZOU Wenyi GE

Publicized:
2024/05/14
- Summary
- Free PDF (4.2MB)
Large Class Detection using GNNs: A graph based deep learning approach utilizing three typical GNN model architectures
HanYu Zhang Tomoji Kishi

Publicized:
2024/05/14
- Summary
- Free PDF (1.4MB)
Functional Decomposition of Symmetric Multiple-Valued Functions and Their Compact Representation in Decision Diagrams
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER

Publicized:
2024/05/14
- Summary
- Free PDF (303.5KB)
Greedy selection of sensors for linear Bayesian estimation under correlated noise
Yoon Hak KIM

Publicized:
2024/05/14
- Summary
- Free PDF (115KB)
New Bounds for Quick Computation of the Lower Bound on the Gate Count of Toffoli-Based Reversible Logic Circuits
Takashi HIRAYAMA Rin SUZUKI Katsuhisa YAMANAKA Yasuaki NISHITANI

Publicized:
2024/05/10
- Summary
- Free PDF (222.1KB)
Evaluation of Multi-valued Data Transmission in Two-Dimensional Symbol Mapping using Linear Mixture Model
Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA

Publicized:
2024/05/09
- Summary
- Free PDF (8.2MB)
Using Genetic Algorithm and Mathematical Programming Model for Ambulance Location Problem in Emergency Medical Service
Batnasan Luvaanjalba Elaine Yi-Ling Wu

Publicized:
2024/05/08
- Summary
- Free PDF (909.3KB)
Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation
KuanChao CHU Satoshi YAMAZAKI Hideki NAKAYAMA

Publicized:
2024/04/30
- Summary
- Free PDF (9.2MB)
A mmWave sensor and camera fusion system for indoor occupancy detection and tracking
Shenglei LI Haoran LUO Tengfei SHAO Reiko HISHIYAMA

Publicized:
2024/04/26
- Summary
- Free PDF (4.1MB)
Evaluating PAM-4 Data Transmission Quality using Multi-Dimensional Mapping of Received Symbols
Yasushi YUMINAKA Kazuharu NAKAJIMA Yosuke IIJIMA

Publicized:
2024/04/25
- Summary
- Free PDF (7.5MB)
Unsupervised Intrusion Detection Based on Asymmetric Auto-Encoder Feature Extraction
Chunbo Liu Liyin Wang Zhikai Zhang Chunmiao Xiang Zhaojun Gu Zhi Wang Shuang Wang

Publicized:
2024/04/25
- Summary
- Free PDF (1.2MB)
Reinforced Voxel-RCNN: An efficient 3D Object Detection Method Based on Feature Aggregation
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG

Publicized:
2024/04/24
- Summary
- Free PDF (4.6MB)
A Channel Contrastive Attention-based Local-Nonlocal Mutual block on Super-Resolution
Yuhao LIU Zhenzhong CHU Lifei WEI

Publicized:
2024/04/23
- Summary
- Free PDF (1.5MB)
Error-Tolerance-Aware Write-Energy Reduction of MTJ-Based Quantized Neural Network Hardware
Ken ASANO Masanori NATSUI Takahiro HANYU

Publicized:
2024/04/22
- Summary
- Free PDF (2.1MB)
Skin diagnostic method using Fontana-Masson stained images of stratum corneum cells
Shuto HASEGAWA Koichiro ENOMOTO Taeko MIZUTANI Yuri OKANO Takenori TANAKA Osamu SAKAI

Publicized:
2024/04/19
- Summary
- Free PDF (6.6MB)
Confidence-Driven Contrastive Learning for Document Classification without Annotated Data
Zhewei XU Mizuho IWAIHARA

Publicized:
2024/04/19
- Summary
- Free PDF (2.2MB)
Delta-Sigma Domain Signal Processing Revisited with Related Topics in Stochastic Computing
Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI

Publicized:
2024/04/17
- Summary
- Free PDF (1.7MB)
Extending Binary Neural Networks to Bayesian Neural Networks with Probabilistic Interpretation of Binary Weights
Taisei SAITO Kota ANDO Tetsuya ASAI

Publicized:
2024/04/17
- Summary
- Free PDF (1.1MB)
Unveiling Python Version Compatibility Challenges in Code Snippets on Stack Overflow
Shiyu YANG Tetsuya KANDA Daniel M. GERMAN Yoshiki HIGO

Publicized:
2024/04/16
- Summary
- Free PDF (488.3KB)
On Easily Reconstructable Logic Functions
Tsutomu SASAO

Publicized:
2024/04/16
- Summary
- Free PDF (240.7KB)
Tracking WebVR User Activities through Hand Motions: An Attack Perspective
Jiyeon LEE

Publicized:
2024/04/16
- Summary
- Free PDF (3MB)
Permissionless Blockchain-Based Sybil-Resistant Self-Sovereign Identity Utilizing Attested Execution Secure Processors
Koichi MORIYAMA Akira OTSUKA

Publicized:
2024/04/15
- Summary
- Free PDF (4.1MB)
Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
Hongliang FU Qianqian LI Huawei TAO Chunhua ZHU Yue XIE Ruxue GUO

Publicized:
2024/04/12
- Summary
- Free PDF (8.1MB)
Investigating and Enhancing the Neural Distinguisher for Differential Cryptanalysis
Gao WANG Gaoli WANG Siwei SUN

Publicized:
2024/04/12
- Summary
- Free PDF (1.7MB)
Nuclear Norm Minus Frobenius Norm Minimization with Rank Residual Constraint for Image Denoising
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG

Publicized:
2024/04/09
- Summary
- Free PDF (6.9MB)
Improved Just Noticeable Difference Model Based Algorithm for Fast CU Partition in V-PCC
Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG

Publicized:
2024/04/05
- Summary
- Free PDF (1.1MB)
MDX-Mixer: Music Demixing by Leveraging Source Signals Separated by Existing Demixing Models
Tomoyasu NAKANO Masataka GOTO

Publicized:
2024/04/05
- Summary
- Free PDF (2.4MB)
Machine Learning-based System for Heat-Resistant Analysis of Car Lamp Design
Hyebong CHOI Joel SHIN Jeongho KIM Samuel YOON Hyeonmin PARK Hyejin CHO Jiyoung JUNG

Publicized:
2024/04/03
- Summary
- Free PDF (1.4MB)
Agent Allocation-Action Learning with Dynamic Heterogeneous Graph in Multi-task Games
Xianglong LI Yuan LI Jieyuan ZHANG Xinhai XU Donghong LIU

Publicized:
2024/04/03
- Summary
- Free PDF (3.9MB)
FSAMT: Face Shape Adaptive Makeup Transfer
Haoran LUO Tengfei SHAO Shenglei LI Reiko HISHIYAMA

Publicized:
2024/04/02
- Summary
- Free PDF (984.6KB)
Artifact Removal Using Attention Guided Local-Global Dual-Stream Network for Sparse-View CT Reconstruction
Chang SUN Yitong LIU Hongwen YANG

Publicized:
2024/03/29
- Summary
A CNN-based feature pyramid segmentation Strategy for acoustic scene classification
Ji XI Yue XIE Pengxu JIANG Wei JIANG

Publicized:
2024/03/26
- Summary
An IP Core Protection Scheme Based on Hybrid Lightweight Encryption for Neuromorphic Computing System
Ming PAN

The article processing charge of this paper has not been paid.

Publicized:
2022/09/14
- Summary

Whole issue

Volume E91-D No.6 (Publication Date:2008/06/01)

Special Section on Human Communication III

FOREWORD Open Access
Shunichi YONEMURA

FOREWORD

Page(s):
1593-1593
- HTML
- Free PDF (43.9KB)
Dive into the Movie
Shigeo MORISHIMA

INVITED PAPER

Page(s):
1594-1603
"Dive into the Movie (DIM)" is a project that aims to realize an innovative entertainment system providing an immersive experience of a story: audience members can share impressions with family or friends by watching a movie in which all of them participate as cast members. Realizing this system requires several techniques that instantly model and capture personal characteristics of the face, body, gesture, hair, and voice by combining computer graphics, computer vision, and speech signal processing. Moreover, all of the modeling, casting, character synthesis, rendering, and compositing processes have to be performed in real time without any operator. In this paper, we first introduce a novel entertainment system, the Future Cast System (FCS), which creates a DIM movie with audience participation by replacing the faces of the original roles in a pre-created CG movie with the audience members' own highly realistic 3D CG faces. We then subjectively evaluate the effects of the DIM movie on audience experience. The results suggest that most participants seek higher realism, impression, and satisfaction through replacement of not only the face but also the body, hair, and voice. The first experimental trial demonstration of FCS was performed at the Mitsui-Toshiba pavilion of the 2005 World Exposition in Aichi, Japan. During the six months of the exhibition, 1,640,000 people experienced this event, and FCS became one of the most popular events at Expo 2005.
An Effective QoS Control Scheme for 3D Virtual Environments Based on User's Perception
Takayuki KURODA Takuo SUGANUMA Norio SHIRATORI

PAPER-Media Communication

Page(s):
1604-1612
In this paper, we present a new three-dimensional (3D) virtual environment (3DVE) system named "QuViE/P", which keeps the quality of service (QoS) that users actually perceive as high as possible when computer and network resources are limited. To realize this, we focus on the characteristics of users' perceptual quality evaluation of 3D objects. We propose an effective QoS control scheme for QuViE/P by introducing relationships between the system's internal quality parameters and the user's perceptual quality parameters. This scheme can appropriately maintain the QoS of the 3DVE system and is expected to improve convenience when using a 3DVE system where resources are insufficient. We designed and implemented a prototype of QuViE/P using a multiagent framework. The experimental results show that even when the computer resource is reduced to 20% of the required amount, the proposed scheme can maintain the quality of important objects at a certain level.
Study of Spatial Configurations of Equipment for Online Sign Interpretation Service
Kaoru NAKAZONO Saori TANAKA

PAPER-Media Communication

Page(s):
1613-1621
This paper discusses the design of configurations of videophone equipment aimed at online sign interpretation. We classified interpretation services into three types of situations: on-site interpretation, partial online interpretation, and full online interpretation. For each situation, the spatial configurations of the equipment are considered keeping the issue of nonverbal signals in mind. Simulation experiments of sign interpretation were performed using these spatial configurations and the qualities of the configurations were assessed. The preferred configurations had the common characteristics that the hearing subject could see the face of his/her principal conversation partner, that is, the deaf subject. The results imply that hearing people who do not understand sign language utilize nonverbal signals for facilitating interpreter-mediated conversation.
Online Chat Dependency: The Influence of Social Anxiety
Chih-Chien WANG Shu-Chen CHANG

PAPER-Media Communication

Page(s):
1622-1627
Recent developments in information technology have made it easy for people to "chat" online with others in real time, and many do so regularly. "Virtual" relationships can be attractive, especially for people with social interaction problems in the "real world". This study examines the influence on online chat dependency of three dimensions of social anxiety: general social situation fear, negative evaluation fear, and novel social situation fear. Participants in this study were 454 college students. The survey results show that negative evaluation fear and general social situation fear are related to online chat dependency, while novel social situation fear does not seem to be a relevant factor.
Facial Expression Generation from Speaker's Emotional States in Daily Conversation
Hiroki MORI Koh OHSHIMA

PAPER-Media Communication

Page(s):
1628-1633
A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former are represented by vectors with psychologically defined abstract dimensions, and the latter are coded by the Facial Action Coding System. In order to obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of the proposed method is verified by a subjective evaluation test. As a result, the Mean Opinion Score with respect to the suitability of the generated facial expressions was 3.86 for the speaker, which was close to that of hand-made facial expressions.
Body Movement Synchrony in Psychotherapeutic Counseling: A Study Using the Video-Based Quantification Method
Chika NAGAOKA Masashi KOMORI

PAPER-Human Information Processing

Page(s):
1634-1640
Body movement synchrony (i.e., rhythmic synchronization between the body movements of interacting partners) has been described by subjective impressions of skilled counselors and has been considered to reflect the depth of the client-counselor relationship. This study analyzed temporal changes in body movement synchrony through a video analysis of client-counselor dialogues in counseling sessions. Four 50-minute psychotherapeutic counseling sessions were analyzed, including two negatively evaluated sessions (the low evaluation group) and two positively evaluated sessions (the high evaluation group). In addition, two 50-minute ordinary advice sessions between two high school teachers and the clients in the high evaluation group were analyzed. All sessions were role-played. The intensity of the participants' body movement was measured using a video-based system. Temporal change of body movement synchrony was analyzed using moving correlations of the intensity between the two time series. The results revealed (1) a consistent temporal pattern among the four counseling cases, though the moving correlation coefficients were higher for the high evaluation group than for the low evaluation group, and (2) different temporal patterns for the counseling and advice sessions even when the clients were the same. These results are discussed from the perspective of the quality of the client-counselor relationship.
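The moving-correlation analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes two equal-length movement-intensity series and a fixed window length; the function names, the window size, and the choice of Pearson correlation are illustrative assumptions, not details taken from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def moving_correlation(a, b, window):
    """Correlate a and b inside each sliding window of `window` samples."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    if window < 2 or window > len(a):
        raise ValueError("window must be between 2 and the series length")
    return [pearson(a[i:i + window], b[i:i + window])
            for i in range(len(a) - window + 1)]
```

Plotting the returned list against time would show how synchrony rises and falls over a session; in practice the window length would need to be chosen to match the time scale of interest.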
Separation between Sound and Light Enhances Audio-Visual Prior Entry Effect
Yuki HONGOH Shinichi KITA Yoshiharu SOETA

PAPER-Human Information Processing

Page(s):
1641-1648
We examined how spatial disparity between the auditory and visual stimuli modulated the audio-visual (A-V) prior entry effect. Spatial and temporal proximity of multisensory stimuli are crucial factors for multisensory perception in most cases (e.g., [1],[2]). However, our previous research [3],[4] suggested that this well-accepted hypothesis was not applicable to the A-V prior entry effect. In order to examine the effect of the spatial disparity on the A-V prior entry effect, six loudspeakers and two light emitting diodes (LEDs) were used as stimuli. The loudspeakers were located at 10, 25, and 90 degrees from the midline of the participants on both the right and left sides. A preceding sound was presented from one of these six loudspeakers. After the preceding sound, two visual targets were presented successively at a short interval, and participants judged which visual target was presented first. Two colour-changeable ('red' or 'green') LEDs were used for the visual targets, and participants judged the order of the visual targets by their colour, not by their side, in order to avoid response bias as much as possible. The visual targets were situated at 10 degrees or 25 degrees from the participants' midline on both the right and left in Experiment 1. Results showed a biased judgment that the visual target on the side where the sound was presented appeared first. The amplitude of the A-V prior entry effect was greater when the preceding sound source was farther from the midline of the participants. This effect of spatial separation indicated that the clarity of whether the preceding sound came from the right or left side enhanced the amplitude of the A-V prior entry effect (Experiment 2). These results challenge the belief that the spatial proximity of multisensory stimuli is a crucial factor for multisensory perception.
Mechanism of Perceptual Categorization in the Pre-Linguistic Period
Tamami SUDO Ken MOGI

PAPER-Human Information Processing

Page(s):
1649-1655
In this study, we conducted a series of experiments using stimuli characterized by various attributes in order to understand the categorization process in an infant's pre-linguistic development. Infants are able to assign the same label to members within the same category by focusing attention on specific features or functions common to the members. The ability to categorize is likely to play an essential role in an infant's overall cognitive development. Specifically, we investigated how infants use different strategies in the process of linguistic categorization. In one strategy, members of a single category are derived from perceptual similarities within the most representative members, i.e., the prototypical members. Alternatively, each membership is established by referring to the linguistic labels for each category provided by the caretaker, in a symbol grounding process. We found that the infant is able to employ these strategies in a flexible manner in its development. We discuss the interplay between different cognitive strategies, including the prototype effects in the infant's cognitive development, and the implications for the cortical mechanisms involved.
An MEG Study of Temporal Characteristics of Semantic Integration in Japanese Noun Phrases
Hirohisa KIGUCHI Nobuhiko ASAKURA

PAPER-Human Information Processing

Page(s):
1656-1663
Many studies of on-line comprehension of semantic violations have shown that the human sentence processor rapidly constructs a higher-order semantic interpretation of the sentence. What remains unclear, however, is the amount of time required to detect semantic anomalies while concatenating two words to form a phrase under very rapid stimulus presentation. We aimed to examine the time course of semantic integration in concatenating two words in phrase structure building, using magnetoencephalography (MEG). In the MEG experiment, subjects decided whether two words (a classifier and its corresponding noun), each presented for 66 ms, form a semantically correct noun phrase. Half of the stimuli were matched pairs of classifiers and nouns. The other half were mismatched pairs of classifiers and nouns. In the analysis of MEG data, three primary peaks were found at approximately 25 ms (M1), 170 ms (M2) and 250 ms (M3) after the presentation of the target words. Only the M3 latencies were significantly affected by the stimulus conditions. Thus, the present results indicate that the semantic integration in concatenating two words starts from approximately 250 ms.
A Collaborative Knowledge Management Process for Implementing Healthcare Enterprise Information Systems
Po-Hsun CHENG Sao-Jie CHEN Jin-Shin LAI Feipei LAI

PAPER-Interface Design

Page(s):
1664-1672
This paper illustrates a feasible health informatics domain knowledge management process that helps gather useful technology information and reduces knowledge misunderstandings among the engineers who participated in the IBM mainframe rightsizing project at National Taiwan University (NTU) Hospital. We design an asynchronous sharing mechanism to facilitate knowledge transfer, and our health informatics domain knowledge management process can be used to publish and retrieve documents dynamically. It effectively creates an acceptable discussion environment and even lessens the traditional meeting burden among development engineers. An overall description of the current software development status is presented. Then, the knowledge management implementation of health information systems is proposed.
Interactive Cosmetic Makeup of a 3D Point-Based Face Model
Jeong-Sik KIM Soo-Mi CHOI

PAPER-Interface Design

Page(s):
1673-1680
We present an interactive system for cosmetic makeup of a point-based face model acquired by 3D scanners. We first enhance the texture of a face model in 3D space using low-pass Gaussian filtering, median filtering, and histogram equalization. The user is provided with a stereoscopic display and haptic feedback, and can perform simulated makeup tasks including the application of foundation, color makeup, and lip gloss. Fast rendering is achieved by processing surfels using the GPU, and we use a BSP tree data structure and a dynamic local refinement of the facial surface to provide interactive haptics. We have implemented a prototype system and evaluated its performance.
Animation of Mapped Photo Collections for Storytelling
Hideyuki FUJITA Masatoshi ARIKAWA

PAPER-Interface Design

Page(s):
1681-1692
Our research goal is to facilitate the sharing of stories with digital photographs. Some map websites now collect stories associated with people's relationships to places. Users map collections of places and include their intangible emotional associations with each location along with photographs, videos, etc. Though this framework of mapping stories is important, it is not sufficiently expressive to communicate stories in a narrative fashion. For example, when the number of mapped collections of places is particularly large, it is neither easy for viewers to interpret the map nor easy for the creator to express a story as a series of events in the real world. This is because each narrative, in the form of a sequence of textual narratives, a sequence of photographs, a movie, or audio, is mapped to just one point. As a result, it is up to the viewer to decide which points on the map must be read, and in what order. The conventional framework is fairly suitable for mapping and expressing fragments or snapshots of a whole story, but not for conveying the whole story as a narrative using the entire map as the setting. We therefore propose a new framework, Spatial Slideshow, for mapping personal photo collections and representing them as stories such as route guides, sightseeing guides, historical topics, fieldwork records, personal diaries, and so on. It is a fusion of personal photo mapping and photo storytelling. Each story is conveyed through a sequence of mapped photographs, presented as a synchronized animation of a map and an enhanced photo slideshow. The main technical novelty of this paper is a method for creating three-dimensional animations of photographs that induce the visual effect of motion from photo to photo. We believe that the proposed framework may have considerable significance in facilitating the grassroots development of spatial content driven by visual communication concerning real-world locations or events.
Control of Speed and Power in a Humanoid Robot Arm Using Pneumatic Actuators for Human-Robot Coexisting Environment
Kiyoshi HOSHINO

PAPER-Interface Design

Page(s):
1693-1699
A new type of humanoid robot arm that can coexist and interact with human beings is being sought. To implement smooth and fast human-like movement in a pneumatic robot, the author used a humanoid robot arm with pneumatic agonist-antagonist actuators as endoskeletons, which has a mechanism for controlling the stiffness of each joint, and examined its controllability experimentally. Three joints of the humanoid robot arm were controlled with an I-PD controller, using Kitamori's method to decide the control gains experimentally. A damping control algorithm was also applied to the wrist joint to modify the speed in accordance with the power. The results showed that, for step-wise input, the error in following the target angles was less than one degree and the time constant was less than one second. Simultaneous command input to three joints produced an overshoot with about a ten percent increase in error. The humanoid robot arm can generate calligraphic motions with high accuracy, moving quickly at some times but slowly at others, or particularly softly on some occasions but stiffly on others.
Prototyping Tool for Web-Based Multiuser Online Role-Playing Game
Shusuke OKAMOTO Masaru KAMADA Tatsuhiro YONEKURA

LETTER-Interface Design

Page(s):
1700-1703
This letter proposes a prototyping tool for Web-based Multiuser Online Role-Playing Games (MORPGs). The design goal is to make this tool simple and powerful. The tool comprises a GUI editor, a translator, and a runtime environment. The GUI editor is used to edit state-transition diagrams, each of which defines the behavior of a fictional character. The state-transition diagrams are translated into C program code, which plays the role of a game engine in the RPG system. The runtime environment includes PHP, JavaScript with Ajax, and HTML, so the prototype system can be played in a common Web browser such as Firefox, Safari, or IE. When a player clicks or presses a key, the Web browser sends the event to the Web server so that its consequence is reflected on the other players' screens. Prospective users of this tool include programming novices and schoolchildren. No knowledge of any specific programming language is required to create state-transition diagrams. The structure of these diagrams is not only suitable for defining a character's behavior but also intuitive enough for novices to understand. Therefore, users can easily create a Web-based MORPG system with the tool.
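As a rough illustration of how a state-transition diagram can define a character's behavior, the sketch below encodes a diagram as a lookup table. The letter states that the tool translates diagrams into C code; this Python version, along with every state name, event name, and function name in it, is a hypothetical example and not the tool's actual output.

```python
# Hypothetical diagram for a shopkeeper character:
# (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "player_near"): "greet",
    ("greet", "player_talks"): "trade",
    ("greet", "player_leaves"): "idle",
    ("trade", "player_leaves"): "idle",
}

def step(state, event, transitions=TRANSITIONS):
    """Follow one edge of the diagram; stay in place if no edge matches."""
    return transitions.get((state, event), state)

def run(events, start="idle", transitions=TRANSITIONS):
    """Feed a sequence of events through the diagram and return the final state."""
    state = start
    for event in events:
        state = step(state, event, transitions)
    return state
```

Keeping the diagram as plain data is one way a graphical editor could emit it without the author writing any program logic.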

Regular Section

Polynomial Time Identification of Strict Deterministic Restricted One-Counter Automata in Some Class from Positive Data
Mitsuo WAKATSUKI Etsuji TOMITA

PAPER-Algorithm Theory

Page(s):
1704-1718
A deterministic pushdown automaton (dpda) having just one stack symbol is called a deterministic restricted one-counter automaton (droca). When it accepts an input by empty stack, it is called strict. This paper is concerned with a subclass of real-time strict droca's, called Szilard strict droca's, and studies the problem of identifying the subclass in the limit from positive data. The class of languages accepted by Szilard strict droca's coincides with the class of Szilard languages (or, associated languages) of strict droca's and is incomparable to each of the class of regular languages and that of simple languages. After providing some properties of languages accepted by Szilard strict droca's, we show that the class of Szilard strict droca's is polynomial time identifiable in the limit from positive data in the sense of Yokomori. This identifiability is proved by giving an exact characteristic sample of polynomial size for a language accepted by a Szilard strict droca. The class of very simple languages, which is a proper subclass of simple languages, is also proved to be polynomial time identifiable in the limit from positive data by Yokomori, but it is yet unknown whether there exists a characteristic sample of polynomial size for any very simple language.
Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs
Sung-Hyun SHIN Yang-Sae MOON Jinho KIM Sang-Wook KIM

PAPER-Database

Page(s):
1719-1729
In recent years, horizontal tables with a large number of attributes have been widely used in OLAP and e-business applications to analyze multidimensional data efficiently. To store and query horizontal tables efficiently, recent works have tried to transform a horizontal table into a traditional vertical table. Existing works, however, have the drawback of not considering the optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table to an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables to an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.
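The storage and PIVOT steps described in this abstract can be illustrated with a minimal in-memory sketch. It is not the paper's method: the function names `to_vertical` and `pivot`, the key column name `oid`, and the choice to skip null attributes when storing are all assumptions made for illustration, and no RDBMS is involved.

```python
def to_vertical(rows, key="oid"):
    """Store each non-null attribute of a horizontal row as one (key, attribute, value) triple."""
    triples = []
    for row in rows:
        for attr, value in row.items():
            # Assumption: null attributes are simply not stored in the vertical table.
            if attr != key and value is not None:
                triples.append((row[key], attr, value))
    return triples


def pivot(triples, attributes, key="oid"):
    """Rebuild a horizontal view; attributes that were not stored come back as None."""
    view = {}
    for oid, attr, value in triples:
        # Create the horizontal row on first sight of this object id.
        row = view.setdefault(oid, {key: oid, **{a: None for a in attributes}})
        row[attr] = value
    return list(view.values())


horizontal = [
    {"oid": 1, "color": "red", "size": None},
    {"oid": 2, "color": None, "size": "L"},
]
vertical = to_vertical(horizontal)
restored = pivot(vertical, ["color", "size"])
```

In this sketch a row whose attributes are all null leaves no triple behind and so does not reappear after `pivot`; a real scheme would need to handle that case explicitly.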
Efficient Query-by-Content Audio Retrieval by Locality Sensitive Hashing and Partial Sequence Comparison
Yi YU Kazuki JOE J. Stephen DOWNIE

PAPER-Contents Technology and Web Information Systems

Page(s):
1730-1739
This paper investigates suitable indexing techniques to enable efficient content-based audio retrieval in large acoustic databases. To make an index-based retrieval mechanism applicable to audio content, we investigate the design of Locality Sensitive Hashing (LSH) and partial sequence comparison. We propose a fast and efficient audio retrieval framework for query-by-content and develop an audio retrieval system. Based on this framework, four different audio retrieval schemes, LSH-Dynamic Programming (DP), LSH-Sparse DP (SDP), Exact Euclidean LSH (E²LSH)-DP, and E²LSH-SDP, are introduced and evaluated in order to better understand the performance of audio retrieval algorithms. The experimental results indicate that, compared with the traditional DP and the other three competitive schemes, E²LSH-SDP exhibits the best tradeoff in terms of response time, retrieval accuracy, and computation cost.
A Real-Time Decision Support System for Voltage Collapse Avoidance in Power Supply Networks
Chen-Sung CHANG

PAPER-Artificial Intelligence and Cognitive Science

Page(s):
1740-1747
This paper presents a real-time decision support system (RDSS) based on artificial intelligence (AI) for voltage collapse avoidance (VCA) in power supply networks. The RDSS scheme employs a fuzzy hyperrectangular composite neural network (FHRCNN) to carry out voltage risk identification (VRI). In the event that a threat to the security of the power supply network is detected, an evolutionary programming (EP)-based algorithm is triggered to determine the operational settings required to restore the power supply network to a secure condition. The effectiveness of the RDSS methodology is demonstrated through its application to the American Electric Power Provider System (AEP, 30-bus system) under various heavy load conditions and contingency scenarios. In general, the numerical results confirm the ability of the RDSS scheme to minimize the risk of voltage collapse in power supply networks. In other words, RDSS provides Power Provider Enterprises (PPEs) with a viable tool for performing on-line voltage risk assessment and power system security enhancement functions.
Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network
Xiao-Dong WANG Keikichi HIROSE Jin-Song ZHANG Nobuaki MINEMATSU

PAPER-Pattern Recognition

Page(s):
1748-1755
A method was developed for automatic recognition of syllable tone types in continuous Mandarin speech by integrating two techniques: tone nucleus modeling and a neural network classifier. Tone nucleus modeling considers a syllable F0 contour as consisting of three parts: onset course, tone nucleus, and offset course. The two courses are transitions from/to neighboring syllable F0 contours, while the tone nucleus is the intrinsic part of the F0 contour. By viewing only the tone nucleus, acoustic features less affected by neighboring syllables are obtained. When using tone nucleus modeling, automatic detection of the tone nucleus becomes crucial, and an improvement was added to the original detection method. Distinctive acoustic features for tone types are not limited to F0 contours. Other prosodic features, such as waveform power and syllable duration, are also useful for tone recognition. These heterogeneous features are rather difficult to handle simultaneously in hidden Markov models (HMMs), but are easy to handle in neural networks. We adopted a multi-layer perceptron (MLP) as the neural network. Tone recognition experiments were conducted for speaker-dependent and speaker-independent cases. In order to show the effect of integration, experiments were also conducted for two baselines: an HMM classifier with tone nucleus modeling, and an MLP classifier viewing the entire syllable instead of the tone nucleus. The integrated method showed a tone recognition rate of 87.1% in the speaker-dependent case and 80.9% in the speaker-independent case, which was about a 10% relative error reduction compared with the baselines.
Local Subspace Classifier with Transform-Invariance for Image Classification
Seiji HOTTA

PAPER-Pattern Recognition

Page(s):
1756-1763
A family of linear subspace classifiers called the local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers from very high sensitivity to image transformations because it uses projection and Euclidean distances for classification. In this paper, I present a combination of LSC and a tangent distance (TD) for improving the accuracy of handwritten digit recognition. With this classification rule, transform-invariance can be dealt with easily because tangent vectors can be used to approximate transformations. However, tangent vectors cannot be used for other types of images such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified with experiments on handwritten digit and color image classification.
The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
Heiga ZEN Tomoki TODA Keiichi TOKUDA

PAPER-Speech and Hearing

Page(s):
1764-1773
We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.
The Use of Overlapped Sub-Bands in Multi-Band, Multi-SNR, Multi-Path Recognition of Noisy Word Utterances
Yutaka TSUBOI Takehiro IHARA Kazuyuki TAKAGI Kazuhiko OZEKI

PAPER-Speech and Hearing

Page(s):
1774-1782
A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven overlapped sub-bands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noise. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.
Minimum Mean Absolute Error Predictors for Lossless Image Coding
Yoshihiko HASHIDUME Yoshitaka MORIKAWA Shuichi MAKI

PAPER-Image Processing and Video Processing

Page(s):
1783-1792
In this paper, we investigate minimum mean absolute error (mmae) predictors for lossless image coding. In some prediction-based lossless image coding systems, coding performance depends largely on the efficiency of predictors. In this case, minimum mean square error (mmse) predictors are often used. Generally speaking, these predictors have a problem that outliers departing very far from a regression line are conspicuous enough to obscure inliers. That is, in image compression, large prediction errors near edges cause the degradation of the prediction accuracy of flat areas. On the other hand, mmae predictors are less sensitive to edges and provide more accurate prediction for flat areas than mmse predictors. At the same time, the prediction accuracy of edge areas is brought down. However, the entropy of the prediction errors based on mmae predictors is reduced compared with that of mmse predictors because general images mainly consist of flat areas. In this study, we adopt the Laplacian and the Gaussian function models for prediction errors based on mmae and mmse predictors, respectively, and show that mmae predictors outperform conventional mmse-based predictors including weighted mmse predictors in terms of coding performance.
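The claim that mmae predictors are less sensitive to edges can be seen in the simplest possible case, a constant predictor: the sample mean minimizes the mean squared error, while the sample median minimizes the mean absolute error. The sketch below is only an illustration under that simplification (the paper concerns predictors built from neighboring pixels); the function names and pixel values are made up.

```python
from statistics import mean, median


def constant_predictors(samples):
    """Return the best constant predictor under squared error (mean) and absolute error (median)."""
    return mean(samples), median(samples)


def mean_abs_error(samples, prediction):
    """Mean absolute prediction error of a constant prediction over the samples."""
    return sum(abs(s - prediction) for s in samples) / len(samples)


# Hypothetical pixel values: a flat area plus one pixel on an edge (the outlier 90).
flat_area = [10, 10, 11, 10, 10]
with_edge = flat_area + [90]

mmse_pred, mmae_pred = constant_predictors(with_edge)

# Error on the flat pixels only: the mean is pulled toward the edge pixel, the median is not.
flat_error_mmse = mean_abs_error(flat_area, mmse_pred)
flat_error_mmae = mean_abs_error(flat_area, mmae_pred)
```

The single edge pixel drags the squared-error predictor far from the flat-area values, whereas the absolute-error predictor stays on them, at the cost of a larger error on the edge pixel itself.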
Specific and Class Object Recognition for Service Robots through Autonomous and Interactive Methods
Al MANSUR Yoshinori KUNO

PAPER-Image Recognition, Computer Vision

Page(s):
1793-1803
Service robots need to be able to recognize and identify objects located within complex backgrounds. Since no single method may work in every situation, several methods need to be combined and robots have to select the appropriate one automatically. In this paper we propose a scheme to classify situations depending on the characteristics of the object of interest and user demand. We classify situations into four groups and employ different techniques for each. We use Scale-invariant feature transform (SIFT), Kernel Principal Components Analysis (KPCA) in conjunction with Support Vector Machine (SVM) using intensity, color, and Gabor features for five object categories. We show that the use of appropriate features is important for the use of KPCA and SVM based techniques on different kinds of objects. Through experiments we show that by using our categorization scheme a service robot can select an appropriate feature and method, and considerably improve its recognition performance. Yet, recognition is not perfect. Thus, we propose to combine the autonomous method with an interactive method that allows the robot to recognize the user request for a specific object and class when the robot fails to recognize the object. We also propose an interactive way to update the object model that is used to recognize an object upon failure in conjunction with the user's feedback.
Jigsaw-Puzzle-Like 3D Glyphs for Visualization of Grammatical Constraints
Noritaka OSAWA

PAPER-Computer Graphics

Page(s):
1804-1812
Three-dimensional visualization using jigsaw-puzzle-like glyphs, or shapes, is proposed as a means of representing grammatical constraints in programming. The proposed visualization uses 3D glyphs such as convex, concave, and wireframe shapes. A semantic constraint, such as a type constraint in an assignment, is represented by an inclusive match between 3D glyphs. An application of the proposed visualization method to a subset of the Java programming language is demonstrated. An experimental evaluation showed that the 3D glyphs are easier to learn and enable users to more quickly understand their relationships than 2D glyphs and 1D symbol sequences.
Improved Clonal Selection Algorithm Combined with Ant Colony Optimization
Shangce GAO Wei WANG Hongwei DAI Fangjia LI Zheng TANG

PAPER-Biocybernetics, Neurocomputing

Page(s):
1813-1823
Both the clonal selection algorithm (CSA) and ant colony optimization (ACO) are inspired by natural phenomena and are effective tools for solving complex problems. CSA can exploit and explore the solution space in parallel and effectively. However, it cannot make sufficient use of environmental feedback and thus performs a large amount of redundant repetition during the search. On the other hand, ACO is based on an indirect cooperative foraging process carried out by secreting pheromones. Its positive feedback ability is good, but its convergence is slow because little pheromone is available initially. In this paper, we propose a pheromone-linker to combine these two algorithms. The proposed hybrid clonal selection and ant colony optimization (CSA-ACO) reasonably utilizes the strengths of both algorithms and also overcomes their inherent disadvantages. Simulation results on traveling salesman problems demonstrate the merit of the proposed algorithm over some traditional techniques.
A Simple Algorithm for Transposition-Invariant Amplified (δ, γ)-Matching
Inbok LEE

LETTER-Algorithm Theory

Page(s):
1824-1826
Approximate pattern matching plays an important role in various applications. In this paper we focus on (δ, γ)-matching, where each character can differ by at most δ and the sum of these errors is smaller than γ. We show how to find these matches when the pattern is transformed by y=αx + β, without knowing α and β in advance.
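The basic (δ, γ)-matching condition, before any y=αx + β transformation is considered, can be checked with a naive scan. This is a minimal sketch and not the letter's algorithm: the function name is hypothetical, sequences are assumed to be lists of integers, and both bounds are treated as inclusive, which is an assumption.

```python
def delta_gamma_matches(text, pattern, delta, gamma):
    """Return start positions where pattern (delta, gamma)-matches text, using a naive O(n*m) scan."""
    m = len(pattern)
    if m == 0:
        return []
    positions = []
    for i in range(len(text) - m + 1):
        # Per-character absolute differences for the window starting at i.
        errors = [abs(text[i + j] - pattern[j]) for j in range(m)]
        # Assumption: both bounds are inclusive (each error <= delta, total <= gamma).
        if all(e <= delta for e in errors) and sum(errors) <= gamma:
            positions.append(i)
    return positions


text = [1, 2, 3, 5, 4, 3]
pattern = [2, 3, 4]
strict = delta_gamma_matches(text, pattern, delta=1, gamma=2)
loose = delta_gamma_matches(text, pattern, delta=1, gamma=3)
```

With γ = 2 the window at position 0 is rejected because its three errors of 1 sum to 3, even though each is within δ; raising γ to 3 admits it.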
Extending LogicWeb via Hereditary Harrop Formulas
Keehang KWON Dae-Seong KANG

LETTER-Fundamentals of Software and Theory of Programs

Page(s):
1827-1829
We propose HHWeb, an extension to LogicWeb with hereditary Harrop formulas. HHWeb extends the LogicWeb of Loke and Davison by allowing goals of the form (∃x₁...∃x_n D) ⊃ G (or equivalently ∀x₁...∀x_n(D ⊃ G)), where D is a web page and G is a goal. This goal is intended to be solved by instantiating x₁,...,x_n in D by new names and then solving the resulting goal. The existential quantifications at the head of web pages are particularly flexible in controlling the visibility of names. For example, they can provide scope to functions and constants as well as to predicates. In addition, they have such simple semantics that implementation becomes more efficient. Finally, they provide a client-side interface which is useful for customizing web pages.
Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree
Jong Kyu KIM Nam Soo KIM

LETTER-Speech and Hearing

Page(s):
1830-1833
In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open-loop mode selection module in AMR-WB+.
Quantization Parameter Refinement in H.264 through ρ-Domain Rate Model
Yutao DONG Xiangzhong FANG Jing YANG

LETTER-Speech and Hearing

Page(s):
1834-1837
This letter proposes a new algorithm for refining the quantization parameter in H.264 real-time encoding. In H.264 encoding, the quantization parameter computed according to the quadratic rate model is not accurate enough to meet the target bit rate. In order to bring the actual encoded bit rate closer to the target bit rate, a ρ-domain rate model is introduced in the proposed quantization parameter refinement algorithm. Simulation results show that the proposed algorithm achieves a clear gain in PSNR and a more stable encoded bit rate than Jiang's algorithm.
Melody Track Selection Using Discriminative Language Model
Xiao WU Ming LI Hongbin SUO Yonghong YAN

LETTER-Music Information Processing

Page(s):
1838-1840
In this letter we focus on the task of selecting the melody track from a polyphonic MIDI file. Based on the intuition that music and language are similar in many aspects, we solve the selection problem by introducing an n-gram language model to learn the melody co-occurrence patterns in a statistical manner and determine the melodic degree of a given MIDI track. Furthermore, we propose using a background model and posterior probability criteria to make the modeling more discriminative. In the evaluation, the achieved correct rate of 81.6% indicates the feasibility of our approach.

IEICE TRANSACTIONS on Information

Volume E91-D No.6 (Publication Date:2008/06/01)
