IEICE TRANSACTIONS on Information

  • Impact Factor

    0.59

  • Eigenfactor

    0.002

  • article influence

    0.1

  • Cite Score

    1.4

Volume E94-D No.12  (Publication Date:2011/12/01)

    Special Section on Parallel and Distributed Computing and Networking
  • FOREWORD Open Access

    Shuichi ICHIKAWA  

     
    FOREWORD

      Page(s):
    2297-2297
  • NSIM: An Interconnection Network Simulator for Extreme-Scale Parallel Computers

    Hideki MIWA  Ryutaro SUSUKITA  Hidetomo SHIBAMURA  Tomoya HIRAO  Jun MAKI  Makoto YOSHIDA  Takayuki KANDO  Yuichiro AJIMA  Ikuo MIYOSHI  Toshiyuki SHIMIZU  Yuji OINAGA  Hisashige ANDO  Yuichi INADOMI  Koji INOUE  Mutsumi AOYAGI  Kazuaki MURAKAMI  

     
    PAPER

      Page(s):
    2298-2308

    In the near future, interconnection networks of massively parallel computer systems will connect more than a hundred thousand computing nodes. The performance evaluation of the interconnection networks can provide real insights to help the development of efficient communication libraries. Hence, to evaluate the performance of such interconnection networks, simulation tools capable of modeling the networks in sufficient detail, supporting a user-friendly interface to describe communication patterns, providing the users with enough performance information, and completing simulations within a reasonable time are a real necessity. This paper introduces a novel interconnection network simulator, NSIM, for the evaluation of the performance of extreme-scale interconnection networks. The simulator implements a simplified simulation model so as to run faster without any loss of accuracy. Unlike the existing simulators, NSIM is built on the execution-driven simulation approach. The simulator also provides an MPI-compatible programming interface. Thus, the simulator can emulate parallel program execution and correctly simulate point-to-point and collective communications that are dynamically changed by network congestion. The experimental results in this paper showed sufficient accuracy of this simulator by comparing the simulator and the real machine. We also confirmed that the simulator is capable of evaluating ultra large-scale interconnection networks, consumes less memory, and runs faster than the existing simulator. This paper also introduces a simulation service built on a cloud environment. Without installing NSIM, users can simulate interconnection networks with various configurations by using a web browser.

  • Design and Implementation of a Contention-Aware Coscheduling Strategy on Multi-Programmed Heterogeneous Clusters

    Jung-Lok YU  Hee-Jung BYUN  

     
    PAPER

      Page(s):
    2309-2318

    Coscheduling has gained a resurgence of interest as an effective technique to enhance the performance of parallel applications in multi-programmed clusters. However, existing coscheduling schemes do not adequately handle priority boost conflicts, leading to significantly degraded performance. To address this problem, in our previous study, we devised a novel algorithm that reorders the scheduling sequence of conflicting processes based on the rescheduling latency of their correspondents in remote nodes. In this paper, we exhaustively explore the design issues and implementation details of our contention-aware coscheduling scheme over a Myrinet-based cluster system. We also practically analyze the impact of various system parameters and job characteristics on the performance of all considered schemes on a heterogeneous Linux cluster using a generic coscheduling framework. The results show that our approach outperforms existing schemes (by up to 36.6% in average job response time), reducing both the boost conflict ratio and the overall message delay.

  • Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster

    Junichi OHMURA  Takefumi MIYOSHI  Hidetsugu IRIE  Tsutomu YOSHINAGA  

     
    PAPER

      Page(s):
    2319-2327

    In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU–GPU parallel double-precision general matrix–matrix multiplication (dgemm) operation, and achieve a performance improvement of 34% compared with the GPU-only case and 64% compared with the CPU-only case. For an entire 16-node cluster, each node of which is the same as the above and is connected with two gigabit Ethernet links, we use a computation-communication overlap scheme with GPU acceleration for the Linpack benchmark, and achieve a performance improvement of 28% compared with the GPU-accelerated high-performance Linpack benchmark (HPL) without overlapping. Our overlap GPU acceleration solution uses overlaps in which the main inter-node communication and data transfer to the GPU device memory are overlapped with the main computation task on the CPU cores. These overlaps use multi-core processors, which almost all of today's high-performance computers use. In particular, as well as using a CPU core for communication tasks, we also simultaneously use other CPU cores and the GPU for computation tasks. In order to enable overlap between inter-node communication and computation tasks, we eliminate their close dependence by breaking the main computation task into smaller tasks and rescheduling. Based on a scheme in which part of the CPU computation power is simultaneously used for tasks other than computation tasks, we experimentally find the optimal computation ratio for CPUs; this ratio differs from the case of parallel dgemm operation of one node.

  • Evaluation of GPU-Based Empirical Mode Decomposition for Off-Line Analysis

    Pulung WASKITO  Shinobu MIWA  Yasue MITSUKURA  Hironori NAKAJO  

     
    PAPER

      Page(s):
    2328-2337

    In off-line analysis, the demand for high precision signal processing has introduced a new method called Empirical Mode Decomposition (EMD), which is used for analyzing a complex set of data. Unfortunately, EMD is highly compute-intensive. In this paper, we present a parallel implementation of Empirical Mode Decomposition on a GPU. We propose the use of a “partial+total” switching method to increase performance while keeping the precision. We also focused on reducing the computation complexity in the above method from O(N) on a single CPU to O(N/P log (N)) on a GPU. Evaluation results show our single GPU implementation using Tesla C2050 (Fermi architecture) achieves a 29.9x speedup partially, and an 11.8x speedup totally when compared to a single Intel dual core CPU.

  • Acceleration of FDTD Method Using a Novel Algorithm on the Cell B.E.

    Sho ENDO  Jun SONODA  Motoyuki SATO  Takafumi AOKI  

     
    PAPER

      Page(s):
    2338-2344

    The finite difference time domain (FDTD) method has been accelerated on the Cell Broadband Engine (Cell B.E.). However, the speedup is limited by the bandwidth of the main memory in large-scale analysis. In this paper, we propose a novel algorithm and implement FDTD using it. We compared the novel algorithm with results obtained using region segmentation, thereby demonstrating that the proposed algorithm has a shorter calculation time than region segmentation.

  • A Network Clustering Algorithm for Sybil-Attack Resisting

    Ling XU  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER

      Page(s):
    2345-2352

    The social network model has been regarded as a promising mechanism to defend against Sybil attack. This model assumes that honest peers and Sybil peers are connected by only a small number of attack edges. Detection of the attack edges plays a key role in restraining the power of Sybil peers. In this paper, an attack-resisting, distributed algorithm, named Random walk and Social network model-based clustering (RSC), is proposed to detect the attack edges. In RSC, peers disseminate random walk packets to each other. For each edge, the number of times that the packets pass this edge reflects the betweenness of this edge. RSC observes that the betweennesses of attack edges are higher than those of the non-attack edges. In this way, the attack edges can be identified. To show the effectiveness of RSC, RSC is integrated into an existing social network model-based algorithm called SOHL. The results of simulations with real world social network datasets show that RSC remarkably improves the performance of SOHL.

  • Traffic Anomaly Analysis and Characteristics on a Virtualized Network Testbed

    Chunghan LEE  Hirotake ABE  Toshio HIROTSU  Kyoji UMEMURA  

     
    PAPER

      Page(s):
    2353-2361

    Network testbeds have been used for network measurement and experiments. In such testbeds, resources, such as CPU, memory, and I/O interfaces, are shared and virtualized to maximize node utility for many users. A few studies have investigated the impact of virtualization on precise network measurement and understood Internet traffic characteristics on virtualized testbeds. Although scheduling latency and heavy loads are reportedly affected in precise network measurement, no clear conditions or criteria have been established. Moreover, empirical-statistical criteria and methods that pick out anomalous cases for precise network experiments are required on userland because virtualization technology used in the provided testbeds is hardly replaceable. In this paper, we show that ‘oversize packet spacing’, which can be caused by CPU scheduling latency, is a major cause of throughput instability on a virtualized network testbed even when no significant changes occur in well-known network metrics. These are unusual anomalies on virtualized network environment. Empirical-statistical analysis results accord with results at previous work. If network throughput is decreased by the anomalies, we should carefully review measurement results. Our empirical approach enables anomalous cases to be identified. We present CPU availability as an important criterion for estimating the anomalies.

  • Adaptive Prefetching Scheme for Peer-to-Peer Video-on-Demand Systems with a Media Server

    Ryusuke UEDERA  Satoshi FUJITA  

     
    PAPER

      Page(s):
    2362-2369

    In this paper, we consider Peer-to-Peer Video-on-Demand (P2P VoD) systems based on the BitTorrent file sharing protocol. Since the Rarest First policy adopted in the original BitTorrent protocol frequently fails to collect pieces corresponding to a video file by their playback time, we need to develop a new piece selection rule particularly designed for P2P VoDs. In the proposed scheme, we assume the existence of a media server which can upload any piece upon request, and try to bound the load of such media server with two techniques. The first technique is to estimate pieces which are not held by any peer and prefetch them from the media server. The second technique is to switch the mode of each peer according to the estimated size of the P2P network. The performance of the proposed scheme is evaluated by simulation.
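The first technique in the abstract above (fetch from the media server only those pieces that no peer appears to hold) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `plan_requests`, the fixed playback window, and the rarest-first ordering inside that window are all assumptions made for illustration.

```python
from collections import Counter

def plan_requests(needed, neighbor_pieces, playback_pos, window):
    """Split upcoming pieces into server requests and peer requests.

    needed          -- set of piece indices this peer still lacks
    neighbor_pieces -- list of sets, the pieces each known neighbor holds
    playback_pos    -- index of the next piece to be played
    window          -- hypothetical look-ahead size, in pieces
    """
    # How many known neighbors hold each piece.
    counts = Counter(p for held in neighbor_pieces for p in held)
    # Only pieces close to their playback time are considered.
    upcoming = [p for p in sorted(needed)
                if playback_pos <= p < playback_pos + window]
    # Pieces no neighbor holds are prefetched from the media server.
    from_server = [p for p in upcoming if counts[p] == 0]
    # The rest are requested from peers, rarest first, ties by playback order.
    from_peers = sorted((p for p in upcoming if counts[p] > 0),
                        key=lambda p: (counts[p], p))
    return from_server, from_peers

server, peers = plan_requests({3, 4, 5, 6, 9},
                              [{3, 4}, {4, 6}, {4, 9}],
                              playback_pos=3, window=4)
print(server, peers)
```

Restricting rarest-first to a window near the playback position is one simple way to keep the server load bounded while still meeting playback deadlines; the paper's actual estimation and mode-switching rules are not given in the abstract.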

  • Localization Using a Mobile Beacon with Directional Antenna for Wireless Sensor Networks

    Yao-Hung WU  Wei-Mei CHEN  

     
    PAPER

      Page(s):
    2370-2377

    Wireless sensor networks are comprised of several sensor nodes that communicate via wireless technology. Locating the sensor nodes is a fundamental problem in developing applications for wireless sensor networks. In this paper, we introduce a distributed localization scheme, called the Rectangle Overlapping Approach (ROA), using a mobile beacon with GPS and a directional antenna. The node locations are computed by performing simple operations that rely on the rotation angle and position of the mobile beacon. Simulation results show that the proposed scheme is very efficient and that the node positions can be determined accurately when the beacon follows a random waypoint movement model.

  • A Graph Rewriting Approach for Converting Asynchronous ROMs into Synchronous Ones

    Md. Nazrul Islam MONDAL  Koji NAKANO  Yasuaki ITO  

     
    PAPER

      Page(s):
    2378-2388

    Most FPGAs have Configurable Logic Blocks (CLBs) to implement combinational and sequential circuits and block RAMs to implement Random Access Memories (RAMs) and Read Only Memories (ROMs). Circuit design that minimizes the number of clock cycles is easy if we use asynchronous read operations. However, most FPGAs support synchronous read operations, but do not support asynchronous read operations. The main contribution of this paper is to provide a potent approach to resolve this problem. We assume that a circuit using asynchronous ROMs designed by a non-expert or quickly designed by an expert is given. Our goal is to convert this circuit with asynchronous ROMs into an equivalent circuit with synchronous ones. The resulting circuit with synchronous ROMs can be embedded into FPGAs. We also discuss several techniques to decrease the latency and increase the clock frequency of the resulting circuits.

  • Minimum-Energy Semi-Static Scheduling of a Periodic Real-Time Task on DVFS-Enabled Multi-Core Processors

    Wan Yeon LEE  Hyogon KIM  Heejo LEE  

     
    LETTER

      Page(s):
    2389-2392

    The proposed scheduling scheme minimizes the energy consumption of a real-time task on the multi-core processor with the dynamic voltage and frequency scaling capability. The scheme allocates a pertinent number of cores to the task execution, inactivates unused cores, and assigns the lowest frequency meeting the deadline. For a periodic real-time task with consecutive real-time instances, the scheme prepares the minimum-energy solutions for all input cases at off-line time, and applies one of the prepared solutions to each real-time instance at runtime.
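The selection step described above (pick a core count and the lowest adequate frequency so that the deadline is met with minimum energy) can be sketched as an exhaustive search. The Amdahl-style speedup, the cubic dynamic-power term, the static-power constant, and all function names below are assumptions for illustration, not the model used in the letter.

```python
from itertools import product

def exec_time(work, cores, freq, serial_fraction=0.1):
    """Execution time under an assumed Amdahl-style speedup."""
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)
    return work / (freq * speedup)

def energy(work, cores, freq, static_power=0.5):
    """Energy under an assumed per-core power of freq**3 + static_power.

    Cores that are not allocated are treated as inactivated (zero power).
    """
    power = cores * (freq ** 3 + static_power)
    return power * exec_time(work, cores, freq)

def min_energy_config(work, deadline, max_cores, freqs):
    """Return (energy, cores, freq) of the cheapest feasible setting, or None."""
    feasible = [(energy(work, n, f), n, f)
                for n, f in product(range(1, max_cores + 1), freqs)
                if exec_time(work, n, f) <= deadline]
    return min(feasible) if feasible else None

# Off-line phase: prepare one solution per (hypothetical) input case.
FREQS = [0.5, 1.0, 1.5, 2.0]
table = {w: min_energy_config(w, deadline=8.0, max_cores=4, freqs=FREQS)
         for w in (2.0, 5.0, 10.0)}
# Runtime phase: look up the prepared solution for the current instance.
print(table[10.0])
```

Precomputing the table mirrors the semi-static idea in the abstract: the search cost is paid once off-line, and each real-time instance only performs a lookup.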

  • Regular Section
  • An Efficient Method of Computing Impact Degrees for Multiple Reactions in Metabolic Networks with Cycles

    Takeyuki TAMURA  Yang CONG  Tatsuya AKUTSU  Wai-Ki CHING  

     
    PAPER-Fundamentals of Information Systems

      Page(s):
    2393-2399

    The impact degree is a measure of the robustness of a metabolic network against deletion of single or multiple reaction(s). Although such a measure is useful for mining important enzymes/genes, it was defined only for networks without cycles. In this paper, we extend the impact degree for metabolic networks containing cycles and develop a simple algorithm to calculate the impact degree. Furthermore, we improve this algorithm to reduce computation time for the impact degree by deletions of multiple reactions. We applied our method to the metabolic network of E. coli, which includes reference pathways and consists of 3281 reaction nodes and 2444 compound nodes downloaded from the KEGG database, and calculated the distribution of the impact degree. The results of our computational experiments show that the improved algorithm is 18.4 times faster than the simple algorithm for deletion of reaction-pairs and 11.4 times faster for deletion of reaction-triplets. We also enumerate genes with high impact degrees for single and multiple reaction deletions.

  • Development and Outdoor Evaluation of an Experimental Platform in an 80-MHz Bandwidth 2×2 MIMO-OFDM System in 5.2-GHz Band

    Hisayoshi KANO  Shingo YOSHIZAWA  Takashi GUNJI  Shougo OKAMOTO  Morio TAWARAYAMA  Yoshikazu MIYANAGA  

     
    PAPER-Computer System

      Page(s):
    2400-2408

    The IEEE802.11ac task group has announced the use of a wider channel that extends the channel bandwidth to more than 80 MHz. We present an experimental platform consisting of a baseband and an RF unit in a 2×2 MIMO-OFDM system for the wider channel and report its system performance results from a field experiment. The MIMO-OFDM transceiver in the baseband unit has been designed for real-time MIMO detection and provides a maximum data rate of 600 Mbps. OFDM tends to cause a high peak-to-average power ratio (PAPR) for wider channels and distorts the power amplifier performance in the RF unit. We have improved the non-linear distortion by optimizing the OFDM preamble and evaluated its performance by conducting a simulation integrated with baseband processing and an RF unit. In the field experiment, our platform tested the communication performance in a farm and a passage environment.

  • Design of an OpenVG Hardware Rendering Engine

    Yong-Luo SHEN  Seok-Jae KIM  Sang-Woo SEO  Hyun-Goo LEE  Hyeong-Cheol OH  

     
    PAPER-Computer System

      Page(s):
    2409-2417

    This paper introduces a hardware engine for rendering two-dimensional vector graphics based on the OpenVG standard in portable devices. We focus on two design challenges posed by the rendering engines: the number of vertices to represent the images and the amount of memory usage. Redundant vertices are eliminated using adaptive tessellation, in which the redundancy can be judged using a proposed cost-per-quality measure. A simplified edge-flag rendering algorithm and the scanline-based rendering scheme are adopted to reduce external memory access. The designed rendering engine occupies approximately 173 K gates and can satisfy real-time requirements of many applications when it is implemented using a 0.18 µm, 1.8 V CMOS standard cell library. An FPGA prototype using a system-on-a-chip platform has been developed and tested.

  • Open Code Coverage Framework: A Framework for Consistent, Flexible and Complete Measurement of Test Coverage Supporting Multiple Programming Languages

    Kazunori SAKAMOTO  Fuyuki ISHIKAWA  Hironori WASHIZAKI  Yoshiaki FUKAZAWA  

     
    PAPER-Software Engineering

      Page(s):
    2418-2430

    Test coverage is an important indicator of whether software has been sufficiently tested. However, there are several problems with the existing measurement tools for test coverage, such as their cost of development and maintenance, inconsistency, and inflexibility in measurement. We propose a consistent and flexible measurement framework for test coverage that we call the Open Code Coverage Framework (OCCF). It supports multiple programming languages by extracting the commonalities from multiple programming languages using an abstract syntax tree to help in the development of the measurement tools for the test coverage of new programming languages. OCCF allows users to add programming language support independently of the test-coverage-criteria and also to add test-coverage-criteria support independently of programming languages in order to take consistent measurements in each programming language. Moreover, OCCF provides two methods for changing the measurement range and elements using XPath and adding user code in order to make more flexible measurements. We implemented a sample tool for C, Java, and Python using OCCF. OCCF can measure four test-coverage-criteria. We also confirmed that OCCF can support C#, Ruby, JavaScript, and Lua. Moreover, we reduced the lines of code (LOCs) required to implement measurement tools for test coverage by approximately 90% and the time to implement a new test-coverage-criterion by over 80% in an experiment that compared OCCF with the conventional non-framework-based tools.

  • Conflict-Based Checking the Integrity of Linux Package Dependencies

    Yuqing LAN  Mingxia KUANG  Wenbin ZHOU  

     
    PAPER-Software Engineering

      Page(s):
    2431-2439

    A Linux operating system release is composed of a large number of software packages with complex dependencies. The management of dependency relationships is the foundation of building and maintaining a Linux operating system release, and checking the integrity of the dependencies is the key to dependency management. The widespread adoption of Linux operating systems in many areas of the information technology society has drawn attention to the issues of how to check the integrity of the complex dependencies of Linux packages and how to manage a huge number of packages in a consistent and effective way. Linux distributions have already provided the tools for managing the tasks of installing, removing and upgrading the packages they are made of. A number of tools have been provided to handle these tasks on the client side. However, there is a lack of tools that could help the distribution editors to maintain the integrity of Linux package dependencies on the server side. In this paper we present a conflict-based method to check the integrity of Linux package dependencies. From the perspective of conflict, this method checks the integrity of package dependencies on the server side by removing the conflicts associated with the packages. Our contribution provides an effective and automatic way to support distribution editors in handling those issues. Experiments using this method are very successful in checking the integrity of package dependencies in Linux software distributions.

  • Modeling Uncertainty in Moving Objects Databases

    Shayma ALKOBAISI  Wan D. BAE  Sada NARAYANAPPA  

     
    PAPER-Data Engineering, Web Information Systems

      Page(s):
    2440-2459

    The increase in advanced location-based services such as traffic coordination and management necessitates advanced models for tracking the positions of Moving Objects (MOs) like vehicles. Due to computer processing limitations, it is impossible for MOs to continuously update their locations. This results in the uncertain nature of an MO's location between any two reported positions. Efficiently managing and quantifying the uncertainty regions of MOs is needed in order to support different types of queries and to improve query response time. This challenging problem of modeling uncertainty regions associated with MOs was recently addressed by researchers and resulted in models that range from linear, which require few properties of MOs as input to the models, to non-linear, which are able to more accurately represent uncertainty regions by considering higher degree input. This paper summarizes and discusses approaches to modeling uncertainty regions associated with MOs. It further illustrates the need for appropriate approximations, especially in the case of non-linear models, as the uncertainty regions become rather irregularly shaped and difficult to manage. Finally, we demonstrate through several experimental sets the advantage of non-linear models over linear models when the uncertainty regions of MOs are approximated by two different approximations: the Minimum Bounding Box (MBB) and the Tilted Minimum Bounding Box (TMBB).

  • Understanding of Network Operator-Friendly P2P Traffic Control Techniques

    HyunYong LEE  Akihiro NAKAO  

     
    PAPER-Information Network

      Page(s):
    2460-2467

    In the network operator-friendly P2P traffic control technique such as P4P, peers are supposed to select their communication partners by following a guidance issued by the network operator. Thus, the guidance has significant impact on the traffic control. However, detailed performance study of available guidances is missing. Most existing approaches do not show how they affect intra-domain traffic control in detail while mostly focusing on inter-domain traffic control. In this paper, we try to understand how the guidances affect the intra and inter-domain traffic control for better guidance improving the traffic control. Through simulations, we reveal the following. The performance-based guidance reflecting the networking status shows attractive results in distributing the traffic over intra-domain links and in reducing the cross-domain traffic and the charging volume of inter-domain link compared to the distance-based guidance enforcing simple localization. However, the performance-based guidance shows one limitation that can cause unstable traffic control. To overcome the identified limitation, we propose peer-assisted measurement and traffic estimation approach. Then, we verify our approach through simulations.

  • Measuring the Similarity of Protein Structures Using Image Compression Algorithms

    Morihiro HAYASHIDA  Tatsuya AKUTSU  

     
    PAPER-Artificial Intelligence, Data Mining

      Page(s):
    2468-2478

    For measuring the similarity of biological sequences and structures such as DNA sequences, protein sequences, and tertiary structures, several compression-based methods have been developed. However, they are based on compression algorithms only for sequential data. For instance, protein structures can be represented by two-dimensional distance matrices. Therefore, it is expected that image compression is useful for measuring the similarity of protein structures because image compression algorithms compress data horizontally and vertically. This paper proposes a series of methods for measuring the similarity of protein structures. In the methods, an original protein structure is transformed into a distance matrix, which is regarded as a two-dimensional image. Then, the similarity of two protein structures is measured by a kind of compression ratio of the concatenated image. We employed several image compression algorithms: JPEG, GIF, PNG, IFS, and SPC. Since SPC often gave better results than the other image compression methods, and is simple and easy to modify, we modified SPC and obtained MSPC. We applied the proposed methods to clustering of protein structures, and performed Receiver Operating Characteristic (ROC) analysis. The results of computational experiments suggest that MSPC has the best performance among existing compression-based methods. We also present some theoretical results on the time complexity and Kolmogorov complexity of image compression-based protein structure comparison.
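The general idea of compression-based comparison on distance matrices can be sketched as follows. Note the substitutions: `zlib` is a sequential compressor used here only as a stand-in for the image compressors named in the abstract, the normalized compression distance is one common choice of "compression ratio of the concatenation" rather than the paper's measure, and the random chains are synthetic placeholders, not protein data.

```python
import math
import random
import zlib

def distance_matrix_bytes(coords, scale=8.0):
    """Flatten the pairwise distance matrix into one byte per entry (0..255)."""
    n = len(coords)
    return bytes(min(255, int(math.dist(coords[i], coords[j]) * scale))
                 for i in range(n) for j in range(n))

def compressed_size(data):
    return len(zlib.compress(data, 9))

def ncd(x, y):
    """Normalized compression distance: small when x and y share structure."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def random_chain(n, seed):
    """Synthetic 3-D chain (a placeholder for backbone atom coordinates)."""
    rng = random.Random(seed)
    pos = (0.0, 0.0, 0.0)
    chain = [pos]
    for _ in range(n - 1):
        pos = tuple(c + rng.uniform(-1.5, 1.5) for c in pos)
        chain.append(pos)
    return chain

a = distance_matrix_bytes(random_chain(30, seed=1))
b = distance_matrix_bytes(random_chain(30, seed=2))
print(ncd(a, a), ncd(a, b))
```

Because distance matrices are invariant under rotation and translation, no structural superposition is needed before compression; a two-dimensional compressor would additionally exploit vertical correlations that `zlib` cannot see.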

  • Movement-Imagery Brain-Computer Interface: EEG Classification of Beta Rhythm Synchronization Based on Cumulative Distribution Function

    Teruyoshi SASAYAMA  Tetsuo KOBAYASHI  

     
    PAPER-Human-computer Interaction

      Page(s):
    2479-2486

    We developed a novel movement-imagery-based brain-computer interface (BCI) for untrained subjects without employing machine learning techniques. The development of BCI consisted of several steps. First, spline Laplacian analysis was performed. Next, time-frequency analysis was applied to determine the optimal frequency range and latencies of the electroencephalograms (EEGs). Finally, trials were classified as right or left based on β-band event-related synchronization using the cumulative distribution function of pretrigger EEG noise. To test the performance of the BCI, EEGs during the execution and imagination of right/left wrist-bending movements were measured from 63 locations over the entire scalp using eight healthy subjects. The highest classification accuracies were 84.4% and 77.8% for real movements and their imageries, respectively. The accuracy is significantly higher than that of previously reported machine-learning-based BCIs in the movement imagery task (paired t-test, p < 0.05). It has also been demonstrated that the highest accuracy was achieved even though subjects had never participated in movement imageries.

  • Matching Handwritten Line Drawings with Von Mises Distributions

    Katsutoshi UEAOKI  Kazunori IWATA  Nobuo SUEMATSU  Akira HAYASHI  

     
    PAPER-Pattern Recognition

      Page(s):
    2487-2494

    A two-dimensional shape is generally represented with line drawings or object contours in a digital image. Shapes can be divided into two types, namely ordered and unordered shapes. An ordered shape is an ordered set of points, while an unordered shape is an unordered set. As a result, each type typically uses different attributes to define the local descriptors involved in representing the local distributions of points sampled from the shape. Throughout this paper, we focus on unordered shapes. Since most local descriptors of unordered shapes are not scale-invariant, we usually make the shapes in an image data set the same size through scale normalization, before applying shape matching procedures. Shapes obtained through scale normalization are suitable for such descriptors if the original whole shapes are similar. However, they are not suitable if parts of each original shape are drawn using different scales. Thus, in this paper, we present a scale-invariant descriptor constructed by von Mises distributions to deal with such shapes. Since this descriptor has the merits of being both scale-invariant and a probability distribution, it does not require scale normalization and can employ an arbitrary measure of probability distributions in matching shape points. In experiments on shape matching and retrieval, we show the effectiveness of our descriptor, compared to several conventional descriptors.

  • A Study on Pitch Patterns in Japanese Speakers of English with Verification by Speech Re-Synthesis

    Tomoko NARIAI  Kazuyo TANAKA  

     
    PAPER-Speech and Hearing

      Page(s):
    2495-2502

    Certain irregularities in the utterances of words or phrases often occur in English spoken by Japanese native subject, referred to in this article as Japanese English. Japanese English is linguistically presumed to reflect the phonetic characteristics of Japanese. We consider the prosodic feature patterns as one of the most common causes of irregularities in Japanese English, and that Japanese English would have better prosodic patterns if its particular characteristics were modified. This study investigates prosodic differences between Japanese English and English speakers' English, and shows the quantitative results of a statistical analysis of pitch. The analysis leads to rules that show how to modify Japanese English to have pitch patterns closer to those of English speakers. On the basis of these rules, the pitch patterns of test speech samples of Japanese English are modified, and then re-synthesized. The modified speech is evaluated in a listening experiment by native English subjects. The result of the experiment shows that on average, over three-fold of the English subjects support the proposed modification against original speech. Therefore, the results of the experiments indicate practical verification of validity of the rules. Additionally, the results suggest that irregularities of prominence lie in Japanese English sentences. This can be explained by the prosodic transfer of first language prosodic characteristics on second language prosodic patterns.

  • Error Corrective Fusion of Classifier Scores for Spoken Language Recognition

    Omid DEHZANGI  Bin MA  Eng Siong CHNG  Haizhou LI  

     
    PAPER-Speech and Hearing

      Page(s):
    2503-2512

    This paper investigates a new method for fusion of scores generated by multiple classification sub-systems that help to further reduce the classification error rate in Spoken Language Recognition (SLR). In recent studies, a variety of effective classification algorithms have been developed for SLR. Hence, it has been a common practice in the National Institute of Standards and Technology (NIST) Language Recognition Evaluations (LREs) to fuse the results from several classification sub-systems to boost the performance of the SLR systems. In this work, we introduce a discriminative performance measure to optimize the performance of the fusion of 7 language classifiers developed as IIR's submission to the 2009 NIST LRE. We present an Error Corrective Fusion (ECF) method in which we iteratively learn the fusion weights to minimize error rate of the fusion system. Experiments conducted on the 2009 NIST LRE corpus demonstrate a significant improvement compared to individual sub-systems. Comparison study is also conducted to show the effectiveness of the ECF method.

  • Design of Real-Time Self-Frame-Rate-Control Foreground Detection for Multiple Camera Surveillance System

    Tsung-Han TSAI  Chung-Yuan LIN  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    2513-2522

    Emerging video surveillance technologies rely on foreground detection to detect events automatically. Integrating foreground detection with a modern multi-camera surveillance system can significantly increase surveillance efficiency. However, foreground detection often imposes a high computational load and increases the cost of a surveillance system when a mass deployment of end cameras is needed. This paper proposes a DSP-based foreground detection algorithm. Our algorithm incorporates a temporal data correlation predictor (TDCP), which exposes the correlation of the data and reduces computation based on this correlation. Together with the DSP-oriented foreground detection, an adaptive frame rate control is developed as a low-cost solution for multi-camera surveillance systems. The adaptive frame rate control automatically detects the computational load of foreground detection on multiple video sources and adaptively tunes the TDCP to meet the real-time specification. Therefore, no additional hardware cost is required when the number of deployed cameras increases. Our method has been validated on a demonstration platform, where it achieves real-time CIF frame processing for a 16-camera surveillance system on a single DSP chip. Quantitative evaluation demonstrates that our solution provides a satisfactory detection rate while significantly reducing the hardware cost.

  • A Novel Sequential Tree Algorithm Based on Scoreboard for MPI Broadcast Communication

    Won-young CHUNG  Jae-won PARK  Seung-Woo LEE  Won Woo RO  Yong-surk LEE  

     
    LETTER-Computer System

      Page(s):
    2523-2527

    The message passing interface (MPI) broadcast communication commonly causes a severe performance bottleneck in multicore systems that use distributed memory. In this paper, we therefore propose a novel algorithm and hardware structure for MPI broadcast communication to reduce this bottleneck. The transmission order is set based on the state of each processing node in the multicore system, so the novel algorithm minimizes the performance degradation caused by conflicts. The proposed scoreboard MPI unit is evaluated by modeling it with SystemC and is implemented in Verilog HDL. The unit occupies less than 1.03% of the whole chip and improves performance by up to 75.48% with 16 processing nodes. Hence, with respect to low-cost design and scalability, this scoreboard MPI unit is particularly useful for increasing the overall performance of embedded MPSoCs.
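Setting the transmission order from node state can be sketched in a few lines. The abstract does not describe the scoreboard's contents or the hardware protocol, so this sketch assumes the scoreboard holds one busy flag per node and that the root simply sends to free nodes first; `broadcast_order` and its arguments are hypothetical names.

```python
def broadcast_order(scoreboard, root):
    """Return the order in which the root sends to the other nodes.

    scoreboard maps node id -> True if the node is currently busy.
    Free nodes are served first so that the root does not stall on a
    busy receiver; busy nodes are deferred to the end of the sequence.
    """
    targets = [node for node in sorted(scoreboard) if node != root]
    free = [node for node in targets if not scoreboard[node]]
    busy = [node for node in targets if scoreboard[node]]
    return free + busy
```

For example, with nodes 1 and 3 busy and node 0 as root, `broadcast_order({0: False, 1: True, 2: False, 3: True}, root=0)` returns `[2, 1, 3]`, whereas a fixed sequential order would contact busy node 1 first.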

  • On Improving the Reliability and Performance of the YAFFS Flash File System

    Seungjae BAEK  Heekwon PARK  Jongmoo CHOI  

     
    LETTER-Software System

      Page(s):
    2528-2532

    In this paper, we propose three techniques to improve the performance of YAFFS (Yet Another Flash File System) while enhancing the reliability of the system. First, we propose to manage metadata and user data separately on segregated blocks. This modification reduces both the mount time and the garbage collection time. Second, we tailor wear-leveling to the segregated metadata and user data blocks. That is, worn-out blocks are swapped between the segregated blocks, which leads to more evenly worn blocks and thus a longer system lifetime. Finally, we devise an analytic model to predict the expected garbage collection time. By accurately predicting the garbage collection time, the system can perform garbage collection at more opportune times, when the user's perceived performance is less likely to be affected. Performance evaluation results based on real implementations show that our modifications enhance performance and reliability without incurring additional overheads. Specifically, YAFFS with our proposed techniques outperforms the original YAFFS by six times in terms of mount speed and five times in terms of benchmark performance, while reducing the average erase count of blocks by 14%.

  • Simulation-Based Tactics Generation for Warship Combat Using the Genetic Algorithm

    Yong-Jun YOU  Sung-Do CHI  Jae-Ick KIM  

     
    LETTER-Artificial Intelligence, Data Mining

      Page(s):
    2533-2536

    In most existing warship combat simulation systems, the tactics of a warship are controlled by human operators. The simulation results are therefore limited by the capabilities of those operators. To address this, we employ the genetic algorithm to support an evolutionary simulation environment. The tactical decisions of human operators are replaced by a human model with a rule-based chromosome that represents tactics. A population of simulations is created, and hundreds of simulation runs proceed on the basis of the genetic algorithm, without any human intervention, until emergent tactics that show the best performance throughout the simulation are found. Several simulation tests demonstrate the technique.
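The evolutionary loop described above can be sketched as a generic genetic algorithm. The letter's chromosome encoding, operators, and simulator are not given in the abstract, so everything here is an assumption: each gene is taken to select one action for one rule, the caller-supplied `fitness` function stands in for a full combat simulation run, and `evolve` with its parameters is a hypothetical name.

```python
import random


def evolve(fitness, n_genes, n_actions, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Evolve rule-based chromosomes; returns (best chromosome, best-fitness history).

    Assumes n_genes >= 2 and pop_size >= 4, and a deterministic fitness function.
    """
    rng = random.Random(seed)
    population = [[rng.randrange(n_actions) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(n_actions) if rng.random() < mutation_rate else g
                     for g in child]           # per-gene mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best chromosome is always carried into the next generation, the recorded best fitness never decreases, which mirrors running simulations without human intervention until the strongest tactics emerge.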

  • Speech Enhancement Based on Data-Driven Residual Gain Estimation

    Yu Gwang JIN  Nam Soo KIM  Joon-Hyuk CHANG  

     
    LETTER-Speech and Hearing

      Page(s):
    2537-2540

    In this letter, we propose a novel speech enhancement algorithm based on data-driven residual gain estimation. The entire system consists of two stages. In the first stage, a conventional speech enhancement algorithm enhances the input signal while estimating several signal-to-noise ratio (SNR)-related parameters. In the second stage, the residual gain, which is estimated by a data-driven method, is applied to further enhance the signal. Experimental results show that the proposed speech enhancement algorithm outperforms both the conventional speech enhancement technique based on soft decision and the data-driven approach using an SNR grid look-up table.

  • High-Accuracy Sub-Pixel Registration for Noisy Images Based on Phase Correlation

    Bei HE  Guijin WANG  Xinggang LIN  Chenbo SHI  Chunxiao LIU  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    2541-2544

    This paper proposes a high-accuracy sub-pixel registration framework based on phase correlation for noisy images. First, we introduce a denoising module that adopts an edge-preserving filter. This strategy not only filters out the noise but also preserves most of the original image signal. A confidence-weighted optimization module is then proposed to fit the linear phase plane discriminately and to obtain sub-pixel shifts. Experiments demonstrate the effectiveness of the combination of our modules and show improvements in accuracy and robustness against noise compared to other sub-pixel phase correlation methods in the Fourier domain.
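For readers unfamiliar with phase correlation, the basic principle can be shown with a minimal one-dimensional, integer-shift sketch. It is not the paper's method: it omits the denoising module and the confidence-weighted fit of the phase plane, and it recovers only whole-sample circular shifts. The helper names `dft` and `phase_correlation_shift` are hypothetical, and a direct O(n^2) transform is used to keep the example self-contained.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def phase_correlation_shift(a, b):
    """Estimate the circular shift d such that b[t] == a[(t - d) % n]."""
    fa, fb = dft(a), dft(b)
    cross = []
    for u, v in zip(fa, fb):
        p = v * u.conjugate()
        # Keep only the phase of the cross-power spectrum; skip empty bins.
        cross.append(p / abs(p) if abs(p) > 1e-12 else 0)
    corr = dft(cross, inverse=True)
    # A pure shift gives a linear phase, whose inverse transform peaks at d.
    return max(range(len(corr)), key=lambda t: corr[t].real)
```

A shift of d samples multiplies each frequency bin by a phase term that is linear in frequency, so the normalized cross-power spectrum transforms back to an impulse at position d. Sub-pixel methods such as the one summarized above estimate the slope of that phase directly instead of locating an integer peak.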

  • A Novel Bayes' Theorem-Based Saliency Detection Model

    Xin HE  Huiyun JING  Qi HAN  Xiamu NIU  

     
    LETTER-Image Recognition, Computer Vision

      Page(s):
    2545-2548

    We propose a novel saliency detection model based on Bayes' theorem. The model integrates the two parts of Bayes' equation to measure saliency, whereas previous models considered each part separately. The proposed model measures saliency by computing a local kernel density estimate of features in the center-surround region and a global kernel density estimate of features at each pixel across the whole image. Under the proposed model, a saliency detection method is presented that extracts the DCT (Discrete Cosine Transform) magnitude of the local region around each pixel as the feature. Experiments show that the proposed model not only performs competitively on psychological patterns and better than current state-of-the-art models on human visual fixation data, but is also robust against signal uncertainty.

  • Implementation of Scale and Rotation Invariant On-Line Object Tracking Based on CUDA

    Quan MIAO  Guijin WANG  Xinggang LIN  

     
    LETTER-Image Recognition, Computer Vision

      Page(s):
    2549-2552

    Object tracking is a major technique in image processing and computer vision, and tracking speed directly determines the quality of applications. This paper presents a parallel implementation of a recently proposed scale- and rotation-invariant on-line object tracking system. The implementation targets NVIDIA Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA), following the single-instruction, multiple-thread model. Specifically, we analyze the original algorithm and propose a GPU-based parallel design. Emphasis is placed on exploiting data parallelism and on memory usage. In addition, we apply optimization techniques to maximize the utilization of the GPU and reduce the data transfer time. Experimental results show that our GPGPU-based method running on a GTX480 graphics card achieves up to a 12X speed-up, including I/O time, compared with an equivalent implementation on an Intel E8400 3.0 GHz CPU.

  • Hybrid Parallel Extraction of Isosurface Components from 3D Rectilinear Volume Data

    Bong-Soo SOHN  

     
    LETTER-Computer Graphics

      Page(s):
    2553-2556

    We describe an efficient algorithm that extracts a connected component of an isosurface, or a contour, from 3D rectilinear volume data. The efficiency of the algorithm comes from three factors: (i) working directly with rectilinear grids, (ii) parallel utilization of a multi-core CPU for extracting active cells, that is, the cells containing the contour, and (iii) parallel utilization of a many-core GPU, using CUDA, for computing the geometry of the contour surface in each active cell. Experimental results show that our hybrid parallel implementation achieves up to a 20x speedup over existing methods on an ordinary PC. Coupled with the Contour Tree framework, our work is useful for quickly segmenting, displaying, and analyzing a feature of interest in 3D rectilinear volume data without being distracted by other features.

  • A Storage-Efficient Suffix Tree Construction Algorithm for Human Genome Sequences

    Woong-Kee LOH  Heejune AHN  

     
    LETTER-Biological Engineering

      Page(s):
    2557-2560

    The suffix tree is one of the most widely adopted indexes for genome sequence alignment. Although it supports very fast alignment, it has a couple of shortcomings: a very long construction time and a very large size. Loh et al. [7] proposed a suffix tree construction algorithm with dramatically improved performance; however, the size remains a challenging problem. We propose an algorithm that extends the one by Loh et al. to reduce the suffix tree size. In our experiments, our algorithm constructed a suffix tree of approximately 60% of the size in almost the same time.