
IEICE TRANSACTIONS on Information and Systems

  • Impact Factor: 0.72
  • Eigenfactor: 0.002
  • Article Influence: 0.1
  • CiteScore: 1.4


Volume E95-D No.12 (Publication Date: 2012/12/01)

    Special Section on Parallel and Distributed Computing and Networking
  • FOREWORD Open Access

    Hideharu AMANO  

     
    FOREWORD

      Page(s):
    2749-2749
  • Parallel Dynamic Cloud Rendering Method Based on Physical Cellular Automata Model

    Liqiang ZHANG  Chao LI  Haoliang SUN  Changwen ZHENG  Pin LV  

     
    PAPER-Parallel and Distributed Computing

      Page(s):
    2750-2758

    Due to the complex composition of clouds and their disordered transformation, current methods do not render clouds to fully match their actual appearance. Based on the physical characteristics of clouds, a physical cellular automata model of dynamic clouds is designed according to their intrinsic factors, describing the rules of hydro-movement, deposition, accumulation, and diffusion. A parallel computing architecture is then designed to compute the large-scale data set required for rendering dynamic clouds, and a GPU-based ray-casting algorithm is implemented to render the cloud volume data. Experiments show that the cloud rendering method based on the physical cellular automata model is very efficient and adequately exhibits the detail of clouds.
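
    A minimal sketch, assuming a toy 3D density grid, of what one cellular-automaton update of this kind can look like; the rule constants and the diffusion/deposition terms are purely illustrative, not the authors' model:

    ```python
    import numpy as np

    def ca_step(density, diffusion=0.1, deposition=0.02):
        """One toy update of a 3D cloud cellular automaton: each cell
        exchanges water density with its six face neighbors (diffusion)
        and gains a small amount from condensation (deposition).
        """
        neighbors = sum(np.roll(density, shift, axis)   # six face neighbors
                        for axis in range(3) for shift in (-1, 1))
        laplacian = neighbors - 6.0 * density
        return np.clip(density + diffusion * laplacian + deposition, 0.0, 1.0)

    grid = np.random.rand(32, 32, 32) * 0.1             # initial water density
    for _ in range(10):
        grid = ca_step(grid)
    ```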

  • Blocked United Algorithm for the All-Pairs Shortest Paths Problem on Hybrid CPU-GPU Systems

    Kazuya MATSUMOTO  Naohito NAKASATO  Stanislav G. SEDUKHIN  

     
    PAPER-Parallel and Distributed Computing

      Page(s):
    2759-2768

    This paper presents a blocked united algorithm for the all-pairs shortest paths (APSP) problem. This algorithm simultaneously computes both the shortest-path distance matrix and the shortest-path construction matrix for a graph. It is designed for a high-speed APSP solution on hybrid CPU-GPU systems. In our implementation, the two most compute-intensive parts of the algorithm are performed on the GPU. The first part solves the APSP sub-problem for a block of sub-matrices, and the other is a matrix-matrix “multiplication” for the APSP problem. Moreover, the amount of data communication between CPU (host) memory and GPU memory is reduced by reusing blocks once they are sent to the GPU. When the problem size (the number of vertices in a graph) is large enough compared to the block size, our implementation of the blocked algorithm requires exchanging only three blocks between CPU and GPU during each block computation on the GPU. We measured the performance of the implementation on two different CPU-GPU systems. A system containing an Intel Sandy Bridge CPU (Core i7 2600K) and an AMD Cayman GPU (Radeon HD 6970) achieves up to 1.1 TFlop/s in single precision.
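
    For context, the inner kernel of such blocked algorithms is a min-plus matrix “multiplication”. A minimal NumPy sketch of the blocked Floyd-Warshall structure, with an illustrative block size and without the paper's GPU kernels or construction-matrix updates:

    ```python
    import numpy as np

    def min_plus(A, B):
        """Min-plus product: C[i, j] = min_k (A[i, k] + B[k, j])."""
        return (A[:, :, None] + B[None, :, :]).min(axis=1)

    def blocked_apsp(D, b=64):
        """Blocked Floyd-Warshall on a distance matrix D (np.inf = no edge)."""
        n = D.shape[0]
        for k in range(0, n, b):
            K = slice(k, min(k + b, n))
            # 1) close the diagonal block (the APSP sub-problem);
            #    repeated min-plus squaring reaches closure in log2(b) steps
            for _ in range(int(np.ceil(np.log2(b))) + 1):
                D[K, K] = np.minimum(D[K, K], min_plus(D[K, K], D[K, K]))
            # 2) update the row and column panels through the diagonal block
            D[K, :] = np.minimum(D[K, :], min_plus(D[K, K], D[K, :]))
            D[:, K] = np.minimum(D[:, K], min_plus(D[:, K], D[K, K]))
            # 3) rank-b min-plus update of all remaining blocks
            D[:] = np.minimum(D, min_plus(D[:, K], D[K, :]))
        return D
    ```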

  • Asymptotically Optimal Merging on ManyCore GPUs

    Arne KUTZNER  Pok-Son KIM  Won-Kwang PARK  

     
    PAPER-Parallel and Distributed Computing

      Page(s):
    2769-2777

    We propose a family of algorithms for efficient merging on contemporary GPUs, where each algorithm requires O(m log(n/m + 1)) element comparisons, with m and n the sizes of the input sequences and m ≤ n. According to the lower bounds for merging, all proposed algorithms are therefore asymptotically optimal with respect to the number of necessary comparisons. First we introduce a parallel algorithm that splits a merging problem of size 2^l into 2^i subproblems of size 2^(l-i), for an arbitrary i with 0 ≤ i ≤ l. This algorithm is itself a merger for i = l, but it is rather inefficient in that case. Efficiency is boosted by moving to a two-stage approach in which the splitting process stops at some predetermined level and transfers control to several block-mergers operating in parallel. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers runtimes up to 4 times faster than the thrust::merge function that is part of the Thrust library. To assess the value of our merging technique in the context of sorting, we construct and evaluate a MergeSort on top of it. In our benchmarks, the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU-optimized variant of QuickSort.
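
    A minimal sequential sketch of the splitting idea: each output cut position is located by a binary search (a co-rank computation), after which the resulting subproblems can be merged independently, e.g. one per thread block. The partitioning below is a standard merge-path-style split, not the paper's exact scheme:

    ```python
    def split_merge(a, b, parts):
        """Split the merge of sorted lists a and b into independent pieces.

        For each global cut k, binary-search the co-rank i such that a[:i]
        and b[:k-i] together hold exactly the k smallest elements.
        """
        n = len(a) + len(b)
        cuts = [p * n // parts for p in range(parts + 1)]
        ranks = []
        for k in cuts:
            lo, hi = max(0, k - len(b)), min(k, len(a))
            while lo < hi:
                i = (lo + hi) // 2
                if a[i] < b[k - i - 1]:   # a[i] belongs to the first k elements
                    lo = i + 1
                else:
                    hi = i
            ranks.append(lo)
        return [(a[ranks[j]:ranks[j + 1]],
                 b[cuts[j] - ranks[j]:cuts[j + 1] - ranks[j + 1]])
                for j in range(parts)]

    # Each pair can now be merged in parallel and concatenated in order.
    print(split_merge([1, 3, 5, 7], [2, 4, 6, 8], parts=2))
    ```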

  • Lossless Compression of Double-Precision Floating-Point Data for Numerical Simulations: Highly Parallelizable Algorithms for GPU Computing

    Mamoru OHARA  Takashi YAMAGUCHI  

     
    PAPER-Parallel and Distributed Computing

      Page(s):
    2778-2786

    In numerical simulations using massively parallel computers such as GPGPU (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since the computational results are sometimes unacceptably large to hold, it is desirable to compress the data before storing it. In addition, considering the overhead of transferring data between device and host memories, it is preferable to compress the data as part of the parallel computation performed on the devices. Traditional compression methods for floating-point numbers do not always show good parallelism. In this paper, we propose a new compression method for massively parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of the compression ratio and throughput obtained from experimental implementations of the proposed method running on CPUs and GPUs.
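
    A minimal CPU-side sketch of the interleaving idea, assuming byte-level interleaving and an off-the-shelf entropy coder; the grouping and the zlib back end are illustrative, not the paper's coder:

    ```python
    import numpy as np
    import zlib

    def interleave_compress(values):
        """Toy byte-plane interleaving: put bytes of equal significance of
        successive doubles next to each other before entropy coding, which
        tends to compress smooth simulation data better.
        """
        raw = np.asarray(values, dtype="<f8").view(np.uint8).reshape(-1, 8)
        return zlib.compress(np.ascontiguousarray(raw.T).tobytes())

    def plain_compress(values):
        return zlib.compress(np.asarray(values, dtype="<f8").tobytes())

    data = np.cumsum(np.random.rand(1 << 16) * 1e-3)    # smooth series
    print(len(interleave_compress(data)), len(plain_compress(data)))
    ```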

  • Implementation of a GPU-Oriented Absorbing Boundary Condition for 3D-FDTD Electromagnetic Simulation

    Keisuke DOHI  Yuichiro SHIBATA  Kiyoshi OGURI  Takafumi FUJIMOTO  

     
    PAPER-Parallel and Distributed Computing

      Page(s):
    2787-2795

    In this paper, we propose and discuss efficient GPU implementation techniques of absorbing boundary conditions (ABCs) for a 3D finite-difference time-domain (FDTD) electromagnetic field simulation for antenna design. In view of the architectural nature of GPUs, the idea of a periodic boundary condition is introduced to the implementation of perfectly matched layers (PMLs), together with a transformation technique of PML equations for partial boundaries. We also present an efficient implementation method for a non-uniform grid. The evaluation results with a typical simulation model reveal that our proposed techniques almost double the simulation performance, achieving 55.8% of the peak memory bandwidth of the target GPU.

  • Applying Model-Driven Approach to Building Rapid Distributed Data Services

    Chih-Min LO  Sun-Jen HUANG  

     
    PAPER-Computer System and Services

      Page(s):
    2796-2809

    The globalization of commerce has increased the importance of retrieving and updating complex and distributed information efficiently. Web services currently show the most promise for building distributed application systems, and model-driven architecture is a new approach to developing such applications. The expanding scale and complexity of enterprise information systems (EISs) in distributed computing environments has made sharing and exchanging data particularly challenging. Data services are applications tailored specifically to information-oriented tasks that address business service requirements, and they depend heavily on the distributed architecture of consumer data processing. Implementing a data service can eliminate inconsistency among various application systems in the exchange of data. This paper proposes a data-oriented model-driven development framework to deal with these issues, in which a platform-independent model (PIM) is divided into a service model, a logic data model, and a service composition model. We also divide a platform-specific model (PSM) into a physical data model and a data service model. In this development method, we define five meta-models and outline a set of rules governing the transformation from PIMs into PSMs. A code generator is also included to transform each PSM into application code. We include a case study to demonstrate the feasibility and merits of the proposed development framework.

  • Comparing Operating Systems Scalability on Multicore Processors by Microbenchmarking

    Yan CUI  Yu CHEN  Yuanchun SHI  

     
    PAPER-Computer System and Services

      Page(s):
    2810-2820

    Multicore processor architectures have become ubiquitous in today's computing platforms, especially in parallel computing installations, thanks to their power and cost advantages. While the technology trend continues towards hundreds of cores on a chip in the foreseeable future, an urgent question posed to system designers as well as application users is whether applications can receive sufficient support on today's operating systems to scale to many cores. To this end, one needs to understand the strengths and weaknesses of their support for scalability and to identify the major bottlenecks limiting scalability, if any. As open-source operating systems are of particular interest in the research and industry communities, in this paper we choose three operating systems (Linux, Solaris, and FreeBSD) and systematically evaluate and compare their scalability using a set of highly focused microbenchmarks on an AMD 32-core system. We use system profiling tools and analyze kernel source code to find the root cause of each observed scalability bottleneck. Our results reveal that no single operating system among the three stands out on all system aspects, though some prevail on particular aspects. For example, Linux outperforms Solaris and FreeBSD significantly for file-descriptor- and process-intensive operations. For applications with intensive socket creation and deletion operations, Solaris leads FreeBSD, which scales better than Linux. With the help of performance tools and source code instrumentation and analysis, we find that synchronization primitives protecting shared data structures in the kernels are the major bottleneck limiting system scalability.

  • Robust Lightweight Embedded Virtualization Layer Design with Simple Hardware Assistance

    Tsung-Han LIN  Yuki KINEBUCHI  Tatsuo NAKAJIMA  

     
    PAPER-Computer System and Services

      Page(s):
    2821-2832

    In this paper, we propose a virtualization architecture for a multi-core embedded system that provides more system reliability and security while maintaining performance, without introducing additional special hardware support or implementing a complex protection mechanism in the virtualization layer. Embedded systems, especially consumer electronics, have often used virtualization. Virtualization itself is not a new technique and has various uses for both GPOSes (General Purpose Operating Systems) and RTOSes (Real Time Operating Systems). The surge of multi-core platforms in embedded systems also helps consolidate virtualization systems for better performance and lower power consumption. Embedded virtualization design usually follows one of two approaches. The first uses a traditional VMM, but this is too complicated for the embedded environment without additional special hardware support. The other approach uses a microkernel, which imposes a modular design; guest systems, however, suffer considerable modifications in this approach, as the microkernel runs guest systems in user space. For some RTOSes and their applications originally running in kernel space, this second approach is harder to apply because such code uses many privileged instructions. To achieve better reliability while keeping the virtualization layer lightweight, this work uses a common hardware component of multi-core embedded processors: on most embedded platforms, vendors provide additional on-chip local memory for each physical core, private to that core. By taking advantage of this memory architecture, we can mitigate the above-mentioned problems at once. We re-map the program of the virtualization layer, called SPUMONE, onto the local memory; SPUMONE runs all guest systems in kernel space. Doing so provides additional reliability and security for the entire system, because on a multi-core platform each SPUMONE instance is installed on a separate processor core and, unlike in traditional virtualization layer designs, the contents of each SPUMONE instance are inaccessible to the others. We achieve this goal without adding overhead to the overall performance.

  • SLA-Driven Adaptive Resource Allocation for Virtualized Servers

    Wei ZHANG  Li RUAN  Mingfa ZHU  Limin XIAO  Jiajun LIU  Xiaolan TANG  Yiduo MEI  Ying SONG  Yuzhong SUN  

     
    PAPER-Computer System and Services

      Page(s):
    2833-2843

    In order to reduce cost and improve efficiency, many data centers adopt virtualization solutions. The advent of virtualization allows multiple virtual machines to be hosted on a single physical server, but this poses new challenges for resource management. Web workloads, which are dominant in data centers, are known to vary dynamically with time. In order to meet applications' service level agreements (SLAs), how to allocate resources to virtual machines has become an important challenge in virtualized server environments, especially when dealing with fluctuating workloads and complex server applications. User experience is an important manifestation of the SLA and attracts increasing attention. In this paper, the SLA is defined by server-side response time. Traditional resource allocation based on resource utilization has some drawbacks; we argue that dynamic resource allocation based directly on real-time user experience is more reasonable and also of practical significance. To address the problem, we propose a system architecture that combines response time measurements and analysis of user experience for resource allocation. An optimization model is introduced to dynamically allocate resources among virtual machines. When resources are insufficient, we provide service differentiation and first guarantee the resource requirements of applications with higher priorities. We evaluate our proposal using TPC-W and Webbench. The experimental results show that our system judiciously allocates system resources, helps stabilize applications' user experience, and reduces the mean deviation of user experience from the desired targets.

  • Survivability Analysis for a Wireless Ad Hoc Network Based on Semi-Markov Model

    Zhipeng YI  Tadashi DOHI  

     
    PAPER-Network and Communication

      Page(s):
    2844-2851

    Network survivability is defined as the ability of a network to remain connected under failures and/or attacks. In this paper, we propose two stochastic models, a binomial model and a negative binomial model, to quantify network survivability, and compare them with the existing Poisson model. We give mathematical formulae for the approximate network survivability of the respective models and use them to carry out a sensitivity analysis of the model parameters. Numerical examples show that network survivability can change drastically when the number of network nodes is relatively small under a severe attack mode called the black hole attack.
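
    For reference, the two counting models named in the abstract have the standard forms below (the paper's survivability formulae built on top of them are not reproduced here); with K the number of failed nodes,

    $$\Pr[K=k]=\binom{n}{k}p^{k}(1-p)^{n-k} \quad\text{(binomial)},\qquad \Pr[K=k]=\binom{k+r-1}{k}(1-q)^{r}q^{k} \quad\text{(negative binomial)}.$$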

  • A Swarm Inspired Method for Efficient Data Transfer

    Yutaka KAWAI  Adil HASAN  Go IWAI  Takashi SASAKI  Yoshiyuki WATASE  

     
    PAPER-Network and Communication

      Page(s):
    2852-2859

    In this paper we report on an approach inspired by Ant Colony Optimization (ACO) that provides a fault-tolerant and efficient means of transferring data in dynamic environments. We investigate the problem of distributing data between a client and server by using pheromone equations. Ants choose the best source of food by selecting the strongest pheromone trail leaving the nest. The pheromone decays over time and needs to be continually reinforced to define the optimum route in a dynamic environment. This resembles the dynamic environment of data distribution between clients and servers. Our approach uses readily available network and server information to construct a pheromone that determines the best server from which to download data. We demonstrate that the approach is self-optimizing and capable of adapting to dynamic changes in the environment.
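
    A minimal sketch of ACO-style server selection of the kind the abstract describes, assuming measured throughput is used as the reward; the update rule and constants are illustrative, not the authors' pheromone equations:

    ```python
    import random

    def update_pheromone(tau, server, reward, rho=0.1):
        """ACO-style update: every trail evaporates, and the trail of the
        server just used is reinforced by the observed quality (e.g. the
        measured throughput).
        """
        for s in tau:
            tau[s] *= 1.0 - rho        # evaporation
        tau[server] += reward          # reinforcement

    def pick_server(tau):
        """Choose a server with probability proportional to its pheromone."""
        r = random.uniform(0, sum(tau.values()))
        acc = 0.0
        for s, t in tau.items():
            acc += t
            if r <= acc:
                return s

    tau = {"serverA": 1.0, "serverB": 1.0}
    s = pick_server(tau)
    update_pheromone(tau, s, reward=0.5)
    ```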

  • Traffic Engineering of Peer-Assisted Content Delivery Network with Content-Oriented Incentive Mechanism

    Naoya MAKI  Takayuki NISHIO  Ryoichi SHINKUMA  Tatsuya MORI  Noriaki KAMIYAMA  Ryoichi KAWAHARA  Tatsuro TAKAHASHI  

     
    PAPER-Network and Communication

      Page(s):
    2860-2869

    In content services where people purchase and download large-volume contents, minimizing network traffic is crucial for the service provider and the network operator, since they want to lower the cost charged for bandwidth and the cost of network infrastructure, respectively. Traffic localization is an effective way of reducing network traffic: traffic is localized when a client can obtain the requested content files from a nearby altruistic client instead of from the source servers. The concept of the peer-assisted content delivery network (CDN) can reduce overall traffic with this mechanism and enables service providers to minimize traffic without deploying or borrowing distributed storage. To localize traffic effectively, content files that are likely to be requested by many clients should be cached locally. This paper presents a novel traffic engineering scheme for peer-assisted CDN models. Its key idea is to control the behavior of clients through a content-oriented incentive mechanism. This approach enables us to optimize traffic flows by letting altruistic clients download the content files that are most likely to contribute to localizing traffic among clients. In order to let altruistic clients request the desired files, we combine content files while keeping the price equal to that of a single content file. This paper presents a solution for optimizing the selection of content files to be combined so that cross traffic in the network is minimized. We also give a model for analyzing the upper-bound performance and present numerical results.

  • Analytical Modeling of Network Throughput Prediction on the Internet

    Chunghan LEE  Hirotake ABE  Toshio HIROTSU  Kyoji UMEMURA  

     
    PAPER-Network and Communication

      Page(s):
    2870-2878

    Predicting network throughput is important for network-aware applications. Network throughput depends on a number of factors, and many throughput prediction methods have been proposed. However, many of these methods suffer from the fact that the distribution of traffic fluctuations is unclear and that the scale and bandwidth of networks are rapidly increasing. Furthermore, virtual machines are used as platforms in many network research and service fields, and they can affect network measurement. A prediction method that uses pairs of differently sized connections has been proposed. This method, which we call the connection pair, features a small probe transfer over TCP that can be used to predict the throughput of a large data transfer. We focus on measurements, analyses, and modeling for precise prediction. We first clarify that the actual throughput for the connection pair changes non-linearly and monotonically, with noise. Second, we built a previously proposed predictor using the same training data sets as our proposed method, and it proved unsuitable for capturing the above characteristics. We therefore propose a throughput prediction method based on the connection pair that uses ν-support vector regression with a polynomial kernel, so that prediction models can be represented as non-linear, continuous monotonic functions. The predictions of our method are more accurate than those of the previous predictor, and, under unstable network conditions, its drop in accuracy is also smaller.
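
    The regression step named in the abstract (ν-SVR with a polynomial kernel) can be sketched with scikit-learn; the data below is synthetic, and the features and hyperparameters are illustrative rather than the paper's:

    ```python
    import numpy as np
    from sklearn.svm import NuSVR

    # Synthetic stand-in data: probe throughput of the small connection
    # (Mbit/s) vs. measured bulk throughput of the large connection.
    rng = np.random.default_rng(1)
    probe = rng.uniform(1, 100, size=(200, 1))
    bulk = 5 * probe[:, 0] ** 0.8 + rng.normal(0, 2, size=200)

    # nu-SVR with a polynomial kernel fits a continuous, monotone-like map.
    model = NuSVR(nu=0.5, C=10.0, kernel="poly", degree=3)
    model.fit(probe, bulk)
    print(model.predict([[42.0]]))   # predicted bulk throughput for one probe
    ```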

  • A Distributed TDMA Scheduling Algorithm with Distance-Measurement-Based Power Control for Sensor Networks

    Koji SATO  Shiro SAKATA  

     
    PAPER-Network and Communication

      Page(s):
    2879-2887

    This paper proposes a distributed TDMA slot scheduling algorithm with power control, in which the slot allocation priority is controlled by distance measurement information. In the proposed scheme, Lamport's bakery algorithm for mutual exclusion is applied to prioritized slot allocation based on the measured distances between nodes, combined with a packet-based transmission power control scheme. This aims at a media access control method that can practically construct a local network by limiting its scope. The proposed scheme can serve as a distance-measurement-oriented replacement for the DRAND algorithm in the Z-MAC scheme and contributes to efficient TDMA slot allocation.
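
    A minimal sketch of bakery-style prioritization, under the assumption (ours, for illustration) that a node's measured distance serves as its ticket number and the node id breaks ties, as in the bakery algorithm:

    ```python
    def bakery_order(requests):
        """Order slot requests bakery-style: the ticket is the measured
        distance to the slot owner (closer nodes get priority), with the
        node id breaking ties. requests: list of (node_id, distance_m).
        """
        return sorted(requests, key=lambda r: (r[1], r[0]))

    slots = bakery_order([(3, 12.5), (1, 7.2), (2, 7.2)])
    print(slots)   # node 1 first, node 2 next (tie on distance), then node 3
    ```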

  • Towards Cost-Effective P2P Traffic Classification in Cloud Environment

    Tao BAN  Shanqing GUO  Masashi ETO  Daisuke INOUE  Koji NAKAO  

     
    PAPER-Network and Communication

      Page(s):
    2888-2897

    Characterization of peer-to-peer (P2P) traffic is an essential step in developing workload models for capacity planning and cyber-threat countermeasures over P2P networks. In this paper, we present a classification scheme for characterizing P2P file-sharing hosts based on transport-layer statistical features. The proposed scheme is assessed in a virtualized environment that simulates a P2P-friendly cloud system. The system shows high accuracy in differentiating P2P file-sharing hosts from ordinary hosts. Its tunability regarding monitoring cost, system response time, and prediction accuracy is demonstrated by a series of experiments. A further study on feature selection identifies the discriminators that contribute most to the classification. Experimental results show that an equally accurate system can be obtained using only 3 of the 18 defined discriminators, which further reduces the monitoring cost and enhances the adaptability of the system.

  • Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture

    Dajiang LIU  Shouyi YIN  Chongyong YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-Computer Architecture

      Page(s):
    2898-2907

    Reconfigurable computing systems are a class of parallel architecture with the ability to compute in hardware to increase performance, while retaining much of the flexibility of a software solution. This architecture is particularly suitable for running regular, compute-intensive tasks; moreover, most compute-intensive tasks spend most of their running time in nested loops. The polyhedron model is a powerful tool for deriving reasonable transformations of such nested loops. In this paper, a number of issues are addressed towards the goal of optimizing affine loop nests for a reconfigurable cell array (RCA): an approach to making the most use of processing elements (PEs) while minimizing communication volume via loop transformation in the polyhedron model, determination of the tiling form by intra-statement dependence analysis, and determination of the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the developed mapping optimization approaches. Compared with a DFG-based optimization approach, the execution performance of 1-D Jacobi and matrix multiplication is improved by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.

  • A Hybrid Photonic Burst-Switched Interconnection Network for Large-Scale Manycore System

    Quanyou FENG  Huanzhong LI  Wenhua DOU  

     
    PAPER-Computer Architecture

      Page(s):
    2908-2918

    With the trend towards an increasing number of cores, for example 1000, the interconnection network in manycore chips has become the critical bottleneck for providing communication infrastructure among on-chip cores as well as to off-chip memory. Conventional on-chip mesh topologies do not scale well, because remote cores are separated by too many hops due to the small-radix routers within these networks. Moreover, the projected scaling of electrical processor-memory networks appears unlikely to meet the enormous demand for memory bandwidth while satisfying stringent power budgets. Fortunately, recent advances in 3D integration technology and silicon photonics provide potential solutions to these challenges. In this paper, we propose a hybrid photonic burst-switched interconnection network for large-scale manycore processors. We embed an electric low-diameter flattened butterfly into 3D stacking layers using integer linear programming, which results in a scalable low-latency network for inter-core packet exchange. Furthermore, we use photonic burst switching (PBS) for the processor-memory network. PBS is an adaptation of optical burst switching to chip-scale communication, which can significantly improve power efficiency by leveraging sub-wavelength, bandwidth-efficient optical switching. Using our physically accurate network-level simulation environment, we examined the system's feasibility and performance. Simulation results show that our hybrid network reduces network latency by up to 25% and energy by up to a factor of 6, compared to a conventional on-chip mesh network and an optical circuit-switched memory access scheme.

  • Design and Implementation of a Handshake Join Architecture on FPGA

    Yasin OGE  Takefumi MIYOSHI  Hideyuki KAWASHIMA  Tsutomu YOSHINAGA  

     
    PAPER-Computer Architecture

      Page(s):
    2919-2927

    A novel design is proposed to implement highly parallel stream join operators on a field-programmable gate array (FPGA) by adapting the handshake join algorithm for hardware implementation. The proposed design is evaluated in terms of hardware resource usage, maximum clock frequency, and performance. Experimental results indicate that the proposed implementation can handle considerably high input rates, especially at low match rates. Simulations conducted to optimize the sizes of the buffers in the join and merge units give new intuition regarding static and adaptive buffer tuning in handshake join.

  • Using Cacheline Reuse Characteristics for Prefetcher Throttling

    Hidetsugu IRIE  Takefumi MIYOSHI  Goki HONJO  Kei HIRAKI  Tsutomu YOSHINAGA  

     
    PAPER-Computer Architecture

      Page(s):
    2928-2938

    One of the significant issues in processor architecture is overcoming memory latency. Prefetching can greatly improve cache performance, but it has the drawback of cache pollution unless its aggressiveness is properly set. Several techniques proposed for prefetcher throttling use accuracy as a metric, but their robustness is insufficient because of variations in programs' working-set sizes and cache capacities. In this study, we revisit prefetcher throttling from the viewpoint of data lifetime. Exploiting the characteristics of cache line reuse, we propose Cache-Convection-Control-based Prefetch Optimization Plus (CCCPO+), which enhances the feedback algorithm of our previous CCCPO. Evaluation results show that this novel approach achieves a 30% improvement over no prefetching in the geometric mean of the SPEC CPU 2006 benchmark suite with a 256 KB LLC, 1.8% over the latest prefetcher throttling, and 0.5% over our previous CCCPO. Moreover, it shows superior stability compared to related work, while lowering the hardware cost.

  • A Fully Programmable Reed-Solomon Decoder on a Multi-Core Processor Platform

    Bei HUANG  Kaidi YOU  Yun CHEN  Zhiyi YU  Xiaoyang ZENG  

     
    PAPER-Computer Architecture

      Page(s):
    2939-2947

    Reed-Solomon (RS) codes are widely used in digital communication and storage systems. Unlike the usual VLSI approaches, this paper presents a high-throughput, fully programmable Reed-Solomon decoder on a multi-core processor. The multi-core platform is a 2-dimensional mesh array of Single Instruction Multiple Data (SIMD) cores and is well suited to digital communication applications. By fully extracting the parallelizable operations of the RS decoding process, we propose multiple optimization techniques to improve system throughput, including task-level parallelism across cores, data-level parallelism within each SIMD core, minimized memory access, and route-length-minimized task mapping. For RS(255, 239, 8), experimental results show that our 12-core implementation achieves a throughput of 4.35 Gbps, which is much better than several other published implementations. The results also indicate that, with our approach, throughput scales linearly with the number of cores.

  • An Optimal Resource Sharing in Hierarchical Virtual Organizations in the Grid

    Kyong Hoon KIM  Guy Martin TCHAMGOUE  Yong-Kee JUN  Wan Yeon LEE  

     
    LETTER

      Page(s):
    2948-2951

    In large-scale collaborative computing, users and resource providers organize various Virtual Organizations (VOs) to share resources and services. A VO organizes other sub-VOs to achieve the VO goal, which forms a hierarchical VO environment. VO participants agree upon certain policies, such as resource sharing amounts or user access. In this letter, we provide an optimal resource sharing mechanism for hierarchical VO environments under resource sharing agreements. The proposed algorithm enhances resource utilization and reduces the mean response time of each user.

  • Hybrid Parallel Implementation of Inverse Matrix Computation by SMW Formula for Interactive Simulation

    Shotaro IWANAGA  Shinji FUKUMA  Shin-ichiro MORI  

     
    LETTER

      Page(s):
    2952-2953

    In this paper, a hybrid parallel implementation of inverse matrix computation using the SMW formula is proposed. By aggregating memory bandwidth in the hybrid parallel implementation, the bottleneck due to the memory bandwidth limitation in the authors' previous multicore implementation is removed. A speedup of more than 8 times is achieved with a dual-core, 8-node implementation, which yields more than 20 simulation steps per second, i.e., near real-time performance.
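
    For reference, the Sherman-Morrison-Woodbury (SMW) formula named in the title updates a known inverse after a low-rank change,

    $$(A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1},$$

    so each interactive step only has to solve a small system whose size is the rank of the update.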

  • Secure Ranking over Encrypted Documents

    Jiuling ZHANG  Beixing DENG  Xing LI  Xiao-lei ZHANG  

     
    LETTER

      Page(s):
    2954-2955

    Ranking encrypted documents stored on secure cloud computing servers is becoming prominent with the expansion of encrypted data collections. In our work, order-preserving encryption is employed to pre-rank the encrypted documents. Paillier's additively homomorphic encryption is then used to re-rank the top pre-ranked documents at a suitable scale.
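
    A toy sketch of the additive homomorphism that makes encrypted score aggregation (and hence re-ranking) possible; the parameters are tiny and insecure, and the authors' protocol is not reproduced:

    ```python
    import math, random

    # Toy Paillier keypair (tiny primes, for illustration only).
    p, q = 293, 433
    n, lam = p * q, math.lcm(p - 1, q - 1)
    g, n2 = n + 1, (p * q) ** 2

    def enc(m):
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def dec(c):
        L = (pow(c, lam, n2) - 1) // n
        mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
        return (L * mu) % n

    # Additive homomorphism: the server can sum scores under encryption
    # without ever seeing them.
    a, b = enc(17), enc(25)
    assert dec((a * b) % n2) == 42
    ```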

  • Scalable Cache-Optimized Concurrent FIFO Queue for Multicore Architectures

    Changwoo MIN  Hyung Kook JUN  Won Tae KIM  Young Ik EOM  

     
    LETTER

      Page(s):
    2956-2957

    A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architectures. We achieve better scalability by reducing contention among concurrent threads and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
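
    As a reference point for what such algorithms improve upon, here is a minimal sketch of the classic two-lock concurrent FIFO (Michael and Scott), in which separate head and tail locks already let an enqueuer and a dequeuer proceed in parallel; the letter's cache-line-optimized algorithm itself is not reproduced:

    ```python
    import threading

    class _Node:
        __slots__ = ("value", "next")
        def __init__(self, value=None):
            self.value, self.next = value, None

    class TwoLockQueue:
        """Two-lock FIFO: a dummy node decouples the ends, so enqueue
        and dequeue contend on different locks.
        """
        def __init__(self):
            dummy = _Node()
            self._head, self._tail = dummy, dummy
            self._head_lock = threading.Lock()   # protects dequeues
            self._tail_lock = threading.Lock()   # protects enqueues

        def enqueue(self, value):
            node = _Node(value)
            with self._tail_lock:
                self._tail.next = node
                self._tail = node

        def dequeue(self):
            with self._head_lock:
                nxt = self._head.next
                if nxt is None:
                    return None                  # queue empty
                self._head = nxt
                return nxt.value
    ```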

  • A Push-Pull Chunk Delivery for Mesh-Based P2P Live Streaming

    Chee Yik KEONG  Poo Kuan HOONG  Choo-Yee TING  

     
    LETTER

      Page(s):
    2958-2959

    In this paper, we propose adaptive chunk scheduling for a mesh-based peer-to-peer live streaming system, a hybrid of push and pull chunk delivery. The proposed rule-based push-pull scheduler simultaneously pulls video chunks from lower-latency peers to fill missing chunks and pushes video chunks adaptively for rapid delivery. We performed a comparative simulation study against rarest-first push-pull and status-wise push-pull to demonstrate the efficiency of the proposed algorithm. Mesh push is made possible by effectively exploiting the information exchanged through buffer maps. The performance evaluation shows better video continuity and lower source-to-end delay.

    Regular Section
  • Integer Programming-Based Approach to Attractor Detection and Control of Boolean Networks

    Tatsuya AKUTSU  Yang ZHAO  Morihiro HAYASHIDA  Takeyuki TAMURA  

     
    PAPER-Fundamentals of Information Systems

      Page(s):
    2960-2970

    The Boolean network (BN) can be used to create discrete mathematical models of gene regulatory networks. In this paper, we consider three problems on BNs that are known to be NP-hard: detection of a singleton attractor, finding a control strategy that shifts a BN from a given initial state to a desired state, and control of attractors. We propose integer programming-based methods that solve these problems in a unified manner. We then present results of computational experiments suggesting that the proposed methods are useful for solving moderate-size instances of these problems. We also show that control of attractors is Σ2p-hard, which suggests that control of attractors is harder than the other two problems.
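
    A minimal PuLP sketch of the kind of 0-1 integer programming encoding used for singleton attractor detection; the toy network and the standard AND/OR linearizations below are illustrative, not the paper's exact formulation:

    ```python
    import pulp

    # Toy BN: x1' = x2 OR x3, x2' = x1, x3' = x1 AND x2. A singleton
    # attractor is a fixed point x = f(x).
    prob = pulp.LpProblem("singleton_attractor", pulp.LpMinimize)
    x1, x2, x3 = (pulp.LpVariable(n, cat="Binary") for n in ("x1", "x2", "x3"))
    prob += x1 + x2 + x3            # dummy objective: any feasible point will do

    prob += x1 >= x2                # x1 = x2 OR x3
    prob += x1 >= x3
    prob += x1 <= x2 + x3
    prob += x2 == x1                # x2 = x1
    prob += x3 <= x1                # x3 = x1 AND x2
    prob += x3 <= x2
    prob += x3 >= x1 + x2 - 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print({v.name: int(v.value()) for v in (x1, x2, x3)})  # a fixed point
    ```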

  • Two-Level Service-Oriented Architecture Based on Product-Line

    Joonseok PARK  Mikyeong MOON  Keunhyuk YEOM  

     
    PAPER-Software Engineering

      Page(s):
    2971-2981

    Software product-line engineering is a successful technology for systematic reuse in component-based software development. Its main concept is to develop reusable core assets by analyzing commonality and variability, and then to develop new software by reusing these core assets. Recently, the emergence of service-oriented environments (SOA) has provided flexible reuse environments in which pre-developed components are reused as service units; these are platform-independent and can be integrated into heterogeneous environments. The core asset of an SOA is the service; therefore, we can increase the reusability of an SOA by combining it with the product-line concept. Existing research combines SOA and product lines with reusability in mind, but it does not consider the interaction between provider and consumer in SOA environments, and it tends to focus on more fragmentary aspects of product-line engineering, such as modeling and proposing variability in services. In this paper, we propose a mechanism named 2-Level SOA, including a supporting environment, that deploys and manages reusable services; by reusing and customizing these reusable services, new services can be developed and generated. Our approach provides a structure that maximizes the flexibility of SOA, develops services with systematic reuse in mind, and constructs service-oriented applications by reusing pre-developed reusable services, thereby increasing both efficiency and productivity in developing service-oriented applications.

  • Pro-Detection of Atrial Fibrillation Using Mixture of Experts

    Mohamed Ezzeldin A. BASHIR  Kwang Sun RYU  Unil YUN  Keun Ho RYU  

     
    PAPER-Data Engineering, Web Information Systems

      Page(s):
    2982-2990

    Reliable detection of atrial fibrillation (AF) in electrocardiogram (ECG) monitoring systems is significant for early treatment and health risk reduction. Various ECG mining and analysis studies have addressed a wide variety of clinical and technical issues, but there is still room for improvement, mostly in two areas. First, the morphological descriptors change over time, not only between different patients or patient clusters but also within the same patient; as a result, a model constructed from old training data must be adjusted in order to identify new concepts. Second, the number and types of ECG parameters necessary for detecting AF arrhythmia with high quality pose substantial challenges in terms of computational effort and time consumption. We propose a mixture technique that addresses these limitations. It combines an active learning method with an ECG parameter customization technique to achieve better AF arrhythmia detection in real-time applications. The proposed technique achieved a sensitivity of 95.2%, a specificity of 99.6%, and an overall accuracy of 99.2%.

  • A Trust Distributed DRM System Using Smart Cards

    Ming-Kung SUN  Michael CHANG  Hsiao-Ching LIN  Chi-Sung LAIH  Hui-Tang LIN  

     
    PAPER-Data Engineering, Web Information Systems

      Page(s):
    2991-3000

    Digital Rights Management (DRM) ensures that the usage of digital media adheres to the intentions of the copyright holder and prevents the unauthorized modification or distribution of media. Due to the widespread adoption of digital content, DRM has received a fair amount of attention and has been implemented in many commercial models. Although many DRM schemes have been introduced in the literature, they still suffer from some security issues and may not guarantee performance. In this paper, we propose a trust-distributed DRM model that improves realistic DRM environments by bringing more functionality to users. We use the features of smart cards to offer consumers an option of anonymity while continuing to protect the rights of the copyright holder and the financial interests of the media industry. We also classify the security criteria of DRM systems and show that our proposed smart-card-based DRM scheme satisfies all of these criteria.

  • Test Pattern Ordering and Selection for High Quality Test Set under Constraints

    Michiko INOUE  Akira TAKETANI  Tomokazu YONEDA  Hideo FUJIWARA  

     
    PAPER-Dependable Computing

      Page(s):
    3001-3009

    Nano-scale VLSI design faces the problem of increased test data volume. Small delay defects are becoming possible sources of test escapes, and achieving high delay test quality therefore requires a greater volume of test data. The increased test data volume requires more tester memory and test application time, both of which inflate test cost. Test pattern ordering offers a practical solution: test patterns are ordered so that more defects are detected as early as possible. In this paper, we propose a test pattern ordering method based on SDQL (Statistical Delay Quality Level), a measure of delay test quality that considers small delay defects. Our method orders test patterns so that the SDQL shrinks quickly, meaning that more delay defects can be detected as early as possible, and it does so with minimal use of time-consuming timing-aware fault simulation. Experimental results demonstrate that our method can order test patterns within a reasonable time; they also suggest how to prepare test sets suitable as inputs for test pattern ordering.

  • Incremental Non-Gaussian Analysis on Multivariate EEG Signal Data

    Kam Swee NG  Hyung-Jeong YANG  Soo-Hyung KIM  Sun-Hee KIM  

     
    PAPER-Artificial Intelligence, Data Mining

      Page(s):
    3010-3016

    In this paper, we propose a novel incremental method for discovering latent variables from multivariate data with high efficiency. It integrates non-Gaussianity and an adaptive incremental model in an unsupervised way to extract informative features. Our proposed method discovers a small number of compact features from a very large number of features and still achieves good predictive performance on EEG signals. The promising EEG signal classification results of our experiments demonstrate that this approach can successfully extract important features. Our proposed method also has low memory requirements and computational costs.

  • Privacy Preserving Using Dummy Data for Set Operations in Itemset Mining Implemented with ZDDs

    Keisuke OTAKI  Mahito SUGIYAMA  Akihiro YAMAMOTO  

     
    PAPER-Artificial Intelligence, Data Mining

      Page(s):
    3017-3025

    We present a privacy-preserving method based on inserting dummy data into the original data, built on the data structure called Zero-suppressed BDDs (ZDDs). Our task is distributed itemset mining, i.e., frequent itemset mining from horizontally partitioned databases stored in distributed places called sites. We focus on the fundamental case of two sites, each with a database managed by its owner. By dividing the process of distributed itemset mining into set union and set intersection, we show how to make these operations secure in the sense of undistinguishability of data, our criterion for privacy preservation based on the previously proposed criterion of p-indistinguishability. Our method conceals the original data in each operation by inserting dummy data, where ZDDs, BDD-based directed acyclic graphs, are adopted to represent sets of itemsets compactly and to implement the set operations used in constructing the distributed itemset mining process. As far as we know, this is the first technique that gives a concrete representation of sets of itemsets and an implementation of set operations for privacy preservation in distributed itemset mining. Our experiments show that the proposed method provides undistinguishability of dummy data. Furthermore, we compare our method with Secure Multiparty Computation (SMC), one of the well-known techniques for secure computation.

  • Implicit Influencing Group Discovery from Mobile Applications Usage

    Masaji KATAGIRI  Minoru ETOH  

     
    PAPER-Office Information Systems, e-Business Modeling

      Page(s):
    3026-3036

    This paper presents an algorithmic approach to acquiring influencing relationships among users by discovering an implicit influencing group structure from smartphone usage. The method assumes that a time series of users' application downloads and activations can be represented by individual inter-personal influence factors. To achieve better predictive performance and to avoid over-fitting, a latent feature model is employed. The method extracts latent structures by monitoring the cross-validated predictive performance of approximated influence matrices of reduced rank, which are generated from an initial influence matrix obtained from a training set. It adopts Nonnegative Matrix Factorization (NMF) to reduce the influence matrix dimension and thereby extract the latent features. To validate and demonstrate the method, about 160 university students voluntarily participated in a mobile application usage monitoring experiment. An empirical study on the collected data reveals an influencing structure consisting of six influencing groups with two types of mutual influence, i.e., intra-group and inter-group influence. The results also highlight the importance of sparseness control in NMF for discovering latent influencing groups. The obtained influencing structure provides better predictive performance than state-of-the-art collaborative filtering methods as well as conventional methods such as user-based collaborative filtering and simple popularity.
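
    The factorization step named in the abstract can be sketched with scikit-learn's NMF; the influence matrix below is synthetic, and the rank and sparsity settings are illustrative rather than the paper's:

    ```python
    import numpy as np
    from sklearn.decomposition import NMF

    # Hypothetical influence matrix: entry (i, j) estimates how strongly
    # user j's app adoptions precede user i's (synthetic values here).
    rng = np.random.default_rng(0)
    influence = rng.random((160, 160))

    # A rank-6 factorization mirrors the six groups reported in the paper;
    # l1 regularization controls sparseness (scikit-learn >= 1.0 API).
    model = NMF(n_components=6, init="nndsvd", alpha_W=0.1, l1_ratio=1.0,
                max_iter=500)
    W = model.fit_transform(influence)   # users x groups: group membership
    H = model.components_                # groups x users: group influence
    approx = W @ H                       # reduced-rank influence estimate
    ```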

  • Classification of Prostate Histopathology Images Based on Multifractal Analysis

    Chamidu ATUPELAGE  Hiroshi NAGAHASHI  Masahiro YAMAGUCHI  Tokiya ABE  Akinori HASHIGUCHI  Michiie SAKAMOTO  

     
    PAPER-Pattern Recognition

      Page(s):
    3037-3045

    Histopathology is the microscopic anatomical study of body tissues and is widely used as a cancer diagnosis method. Generally, pathologists examine the structural deviation of cellular and sub-cellular components to diagnose the malignancy of body tissues. These judgments can be subjective, depending on the pathologist's skills and personal experience. Computational diagnosis tools may circumvent these limitations and improve the reliability of diagnosis decisions. This paper proposes a prostate image classification method that extracts textural behavior using multifractal analysis. Fractal geometry describes the complexity of self-similar structures as a non-integer exponent called the fractal dimension. Natural complex structures (or images) are not strictly self-similar, so a single exponent (the fractal dimension) may not be adequate to describe their complexity; the multifractal analysis technique instead describes the complexity as a spectrum of fractal dimensions. Based on the multifractal computation of digital images, we obtain two textural feature descriptors: i) local irregularity, α, and ii) global regularity, f(α). We exploit these multifractal feature descriptors with a texton-dictionary-based classification model to discriminate cancerous from non-cancerous tissue in histopathology images of H&E-stained prostate biopsy specimens. Moreover, we examine three other feature descriptors (Gabor filter bank, LM filter bank, and Haralick features) to benchmark the performance of the proposed method. Experimental results indicate that the proposed multifractal feature descriptor outperforms the other descriptors, achieving over 94% correct classification accuracy.
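
    For reference, the two descriptors have standard definitions in multifractal analysis: the local Hölder exponent α and the multifractal spectrum f(α),

    $$\alpha(x)=\lim_{r\to 0}\frac{\log\mu(B(x,r))}{\log r},\qquad f(\alpha)=\dim_{H}\{x:\alpha(x)=\alpha\},$$

    where μ is an intensity measure on the ball B(x, r) and dim_H is the Hausdorff dimension; in practice f(α) is usually estimated through the Legendre transform f(α) = min_q (qα − τ(q)).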

  • A Forced Alignment Based Approach for English Passage Reading Assessment

    Junbo ZHANG  Fuping PAN  Bin DONG  Qingwei ZHAO  Yonghong YAN  

     
    PAPER-Speech and Hearing

      Page(s):
    3046-3052

    This paper presents our investigation into improving the performance of our previous automatic reading quality assessment system. The baseline system computes the average Phone Log-Posterior Probability (PLPP) over all phones in the speech to be assessed and uses this average as the assessment feature. In this paper, we present three improvements. First, we cluster the triphones and then calculate the average normalized PLPP for each cluster separately, using these averages as a multi-dimensional assessment feature in place of the original one-dimensional feature. This method is simple but effective, reducing the difference between machine and manual scores by 30.2% relative. Second, in order to assess reading rhythm, we train Gaussian Mixture Models (GMMs) that capture each triphone's relative duration under standard pronunciation. Using the GMMs, we calculate the probability that the relative duration of each phone conforms to the standard pronunciation, and the average of these probabilities is added to the assessment feature vector as another feature dimension; this reduced the score difference by a further 9.7% relative. Third, we detect filled pauses (FPs) by analyzing the formant curve and add the relative duration of FPs to the assessment feature vector, further reducing the score difference by 10.2% relative. Finally, when the features from all three methods are used together, the difference between machine and manual scores is reduced by 43.9% relative compared to the baseline system.

  • Mastering Signal Processing in MPEG SAOC

    Kwangki KIM  Minsoo HAHN  Jinsul KIM  

     
    PAPER-Speech and Hearing

      Page(s):
    3053-3059

    MPEG Spatial Audio Object Coding (SAOC) is a new audio coding standard that efficiently represents various audio objects as a down-mix signal plus spatial parameters, and it is backward compatible with existing playback systems for the down-mix signal. If a mastering signal is used instead of the down-mix signal to provide CD-like sound quality, the output signal decoded with the mastering signal can easily degrade because of the difference between the down-mix and mastering signals. To use the mastering signal successfully in MPEG SAOC, this difference should be eliminated. As a simple approach, we propose mastering signal processing using the mastering down-mix gain (MDG), which is similar to the arbitrary down-mix gain of MPEG Surround. We also propose enhanced mastering signal processing using an MDG bias in order to reduce quantization errors of the MDG. Experimental results show that the proposed schemes improve the sound quality of the output signal decoded with the mastering signal; in particular, the enhanced method outperforms the simple method in terms of quantization error and sound quality.

  • Incorporating Contextual Information into Bag-of-Visual-Words Framework for Effective Object Categorization

    Shuang BAI  Tetsuya MATSUMOTO  Yoshinori TAKEUCHI  Hiroaki KUDO  Noboru OHNISHI  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    3060-3068

    Bag of visual words is a promising approach to object categorization. However, in this framework, ambiguity exists in patch encoding by visual words due to the information loss caused by vector quantization. In this paper, we propose to incorporate patch-level contextual information into the bag of visual words to reduce this ambiguity. To this end, we construct a hierarchical codebook in which visual words in the upper hierarchy contain contextual information about visual words in the lower hierarchy. In the proposed method, from each sample point we extract patches of different scales, all described by the SIFT descriptor. We then build the hierarchical codebook, placing visual words created from coarse-scale patches in the upper hierarchy and visual words created from fine-scale patches in the lower hierarchy, and, by exploiting the correspondence among the extracted patches, associate visual words across hierarchies. We then design a method to assign patch pairs, whose patches are extracted from the same sample point, to the constructed codebook. Furthermore, to utilize image information effectively, we implement the proposed method with two sets of features extracted through different sampling strategies and fuse them using a probabilistic approach. Finally, we evaluate the proposed method on the Caltech 101 and Caltech 256 datasets. Experimental results demonstrate its effectiveness.

  • 3D Reconstruction with Globally-Optimized Point Selection

    Norimichi UKITA  Kazuki MATSUDA  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    3069-3077

    This paper proposes a method for reconstructing accurate 3D surface points by integrating robust, dense reconstruction with Shape-from-Silhouette (SfS) and accurate multiview stereo. Unlike the gradual shape shrinking and/or brute-force large-space search of existing space-carving approaches, our method obtains 3D points by SfS and stereo independently and then selects the correct ones from among them. The point selection follows the spatial consistency and smoothness of 3D point coordinates and normals, and the globally optimized points are selected by graph cuts. Experimental results on several subjects with complex shapes demonstrate that our method outperforms existing approaches and our previous method.

  • Face Representation and Recognition with Local Curvelet Patterns

    Wei ZHOU  Alireza AHRARY  Sei-ichiro KAMATA  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    3078-3087

    In this paper, we propose Local Curvelet Binary Patterns (LCBP) and Learned Local Curvelet Patterns (LLCP) for representing the local features of facial images. The proposed methods are based on the Curvelet transform, which overcomes the weakness of traditional Gabor wavelets in higher dimensions and better captures the curve and hyperplane singularities of facial images. LCBP can be regarded as a combination of Curvelet features and the LBP operator, while LLCP designs several learned codebooks from patch sets constructed by sampling patches from Curvelet-filtered facial images. Each facial image is encoded into multiple pattern maps, and block-based histograms of these patterns are concatenated into a histogram sequence used as the face descriptor. During the face representation phase, an input patch is encoded as a single pattern in LCBP but as multiple patterns in LLCP. Finally, we propose an effective classifier called the Weighted Histogram Spatially constrained Earth Mover's Distance (WHSEMD), which utilizes the discriminative power of different facial parts, the different patterns, and the spatial information of the face. Performance assessments in face recognition and gender estimation under different challenges show that the proposed approaches are superior to traditional ones.

  • Keys Distributing Optimization of CP-ABE Based Access Control in Cryptographic Cloud Storage

    Yong CHENG  Jiangchun REN  Zhiying WANG  Songzhu MEI  Jie ZHOU  

     
    LETTER-Data Engineering, Web Information Systems

      Page(s):
    3088-3091

    In this letter, we introduce a novel key distribution optimization scheme for CP-ABE-based access control. The scheme integrates roles, role hierarchies, and object grouping to accelerate key distribution, while the CP-ABE encryption overhead is reduced by adopting a deterministic cryptographic function. Experiments show that our scheme obtains a noticeable improvement over the original one, especially when the number of objects is much greater than the number of users.

  • Geographic Routing Algorithm with Location Errors

    Yuanwei JING  Yan WANG  

     
    LETTER-Information Network

      Page(s):
    3092-3096

    Geographic routing uses the geographical location information provided by nodes to make routing decisions. However, nodes cannot obtain accurate location information because of measurement error. A new routing strategy using a maximum expected distance and angle (MEDA) algorithm is proposed to improve performance and raise the rate of successful transmissions. We first introduce the expected distance and angle, and then employ principal component analysis to construct the objective function for selecting the next-hop node. We compare the proposed algorithm with the maximum expectation within transmission range (MER) and greedy routing scheme (GRS) algorithms. Simulation results show that the proposed MEDA algorithm outperforms the MER and GRS algorithms, achieving a higher rate of successful transmissions.

  • A Perceptually Adaptive QIM Scheme for Efficient Watermark Synchronization

    Hwai-Tsu HU  Chu YU  

     
    LETTER-Information Network

      Page(s):
    3097-3100

    This study presents an adaptive quantization index modulation scheme applicable to small audio segments, which in turn allows the watermarking technique to withstand time-shifting and cropping attacks. The exploitation of auditory masking further ensures the robustness and imperceptibility of the embedded watermark. Experimental results confirm the efficacy of this scheme against common signal processing attacks.
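
    A minimal sketch of basic (non-adaptive) dithered QIM embedding and detection, on which such schemes build; the step size here is illustrative, and the letter's perceptual adaptation is not reproduced:

    ```python
    import numpy as np

    def qim_embed(x, bit, delta=0.1):
        """Quantize sample(s) x onto the lattice shifted by bit * delta/2;
        delta trades robustness against embedding distortion.
        """
        d = bit * delta / 2.0
        return delta * np.round((x - d) / delta) + d

    def qim_detect(y, delta=0.1):
        """Decode by choosing the closer of the two shifted lattices."""
        err0 = np.abs(y - qim_embed(y, 0, delta))
        err1 = np.abs(y - qim_embed(y, 1, delta))
        return (err1 < err0).astype(int)

    x = np.random.randn(8)
    marked = qim_embed(x, 1)
    print(qim_detect(marked))     # recovers the embedded bit: all ones
    ```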

  • Software FMEA for Safety-Critical System Based on Co-analysis of System Model and Software Model

    Guoqi LI  

     
    LETTER-Dependable Computing

      Page(s):
    3101-3105

    Software FMEA is valuable and practically used for the embedded software of safety-critical systems. In this paper, a novel method for software FMEA is presented based on co-analysis of a system model and a software model. The method is intended to detect the quantitative and dynamic effects of a targeted software failure. A typical application of the method is provided to illustrate the procedure and the applicable scenarios, and a pattern is distilled from the application for further reuse.

  • On d-Asymptotics for High-Dimensional Discriminant Analysis with Different Variance-Covariance Matrices

    Takanori AYANO  Joe SUZUKI  

     
    LETTER-Artificial Intelligence, Data Mining

      Page(s):
    3106-3108

    In this paper we consider the two-class classification problem with high-dimensional data. It is important to find classes of distributions for which no classifier can be expected to perform well. When the two population variance-covariance matrices differ, we give a reasonable sufficient condition on the distributions under which the misclassification rate converges to the worst value as the dimension of the data tends to infinity, for any classifier. Our results can give guidelines for deciding whether an experiment is worth performing in many fields such as bioinformatics.

  • Approximate Nearest Neighbor Based Feature Quantization Algorithm for Robust Hashing

    Yue nan LI  Hao LUO  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    3109-3112

    In this letter, the problem of feature quantization in robust hashing is studied from the perspective of approximate nearest neighbor (ANN). We model the features of perceptually identical media as ANNs in the feature set and show that ANN indexing can well meet the robustness and discrimination requirements of feature quantization. A feature quantization algorithm is then developed by exploiting the random-projection based ANN indexing. For performance study, the distortion tolerance and randomness of the quantizer are analytically derived. Experimental results demonstrate that the proposed work is superior to state-of-the-art quantizers, and its random nature can provide robust hashing with security against hash forgery.

  • A Novel Expression Deformation Model for 3D Face Recognition

    Chuanjun WANG  Li LI  Xuefeng BAI  Xiamu NIU  

     
    LETTER-Image Recognition, Computer Vision

      Page(s):
    3113-3116

    The accuracy of non-rigid 3D face recognition is highly influenced by the capability to model expression deformations. Given a training set of non-neutral and neutral 3D face scan pairs from the same subjects, a set of Fourier series coefficients is reconstructed for each face scan. The residues at each frequency of the Fourier series between finely aligned pairs contain the expression deformation patterns, and PCA is applied to learn these patterns. The proposed expression deformation model is then built from the eigenvectors with the largest eigenvalues. Recognition experiments are conducted on a 3D face database featuring a rich set of facial expression deformations, and the results demonstrate the feasibility and merits of the proposed model.

  • A Design of Genetically Optimized Linguistic Models

    Keun-Chang KWAK  

     
    LETTER-Biocybernetics, Neurocomputing

      Page(s):
    3117-3120

    In this paper, we propose a method for designing genetically optimized Linguistic Models (LMs) with the aid of fuzzy granulation. We follow the fundamental idea of the LM introduced by Pedrycz and enhance its design framework with a Genetic Algorithm (GA). An LM is designed using information granulation realized via Context-based Fuzzy C-Means (CFCM) clustering, a technique that builds information granules represented as fuzzy sets. However, it is difficult to optimize the number of linguistic contexts, the number of clusters generated by each context, and the weighting exponent. We therefore perform GA-based simultaneous optimization of the design parameters linking information granules in the input and output spaces. Experiments on the coagulant dosing process in a water purification plant reveal that the proposed method performs better than previous works and the LM itself.