
Keyword Search Result

[Keyword] OMP (3945 hits)

Results 1441-1460 of 3945

  • Compact Architecture for ASIC and FPGA Implementation of the KASUMI Block Cipher

    Dai YAMAMOTO  Kouichi ITOH  Jun YAJIMA  

     
    PAPER-High-Level Synthesis and System-Level Design

    Vol: E94-A No:12  Page(s): 2628-2638

    Compact design is very important for embedded systems such as wireless sensor nodes, RFID tags and mobile devices because of their limited hardware (H/W) resources. This paper proposes a compact H/W implementation for the KASUMI block cipher, which is the 3GPP standard encryption algorithm. In [8] and [9], Yamamoto et al. proposed a method of reducing the register size for the MISTY1 FO function (YYI-08), and implemented very compact MISTY1 H/W. In this paper we aim to implement the smallest KASUMI H/W to date by applying a YYI-08 configuration to KASUMI, whose FO function has a similar structure to that of MISTY1. However, we discovered that straightforward application of YYI-08 raises problems. We therefore propose a new YYI-08 configuration improved for KASUMI and the compact H/W architecture. The new YYI-08 configuration consists of new FL function calculation schemes and a suitable calculation order. According to our logic synthesis on a 0.11-µm ASIC process, the gate size is 2.99 K gates, which, to our knowledge, is the smallest to date.
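
    For reference, the FL function whose calculation scheme is reworked here is a simple 32-bit keyed permutation from the KASUMI specification; the sketch below shows the standard FL operation only (the proposed YYI-08 reordering and the compact hardware datapath are not reproduced), as we understand the specification:

        def rol16(x, r=1):
            # 1-bit left rotation of a 16-bit word
            return ((x << r) | (x >> (16 - r))) & 0xFFFF

        def kasumi_fl(i32, kl1, kl2):
            # FL function of KASUMI: i32 is the 32-bit input, kl1/kl2 are the
            # two 16-bit halves of the round subkey KL_i
            l = (i32 >> 16) & 0xFFFF
            r = i32 & 0xFFFF
            r ^= rol16(l & kl1)    # R' = R xor ROL1(L AND KL_i,1)
            l ^= rol16(r | kl2)    # L' = L xor ROL1(R' OR KL_i,2)
            return (l << 16) | r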

  • Radio Interface Technologies for Cooperative Transmission in 3GPP LTE-Advanced (Open Access)

    Tetsushi ABE  Yoshihisa KISHIYAMA  Yoshikazu KAKURA  Daichi IMAMURA  

     
    INVITED PAPER

    Vol: E94-B No:12  Page(s): 3202-3210

    This paper presents an overview of radio interface technologies for cooperative transmission in 3GPP LTE-Advanced, i.e., coordinated multi-point (CoMP) transmission, enhanced inter-cell interference coordination (eICIC) for heterogeneous deployments, and relay transmission techniques. This paper covers not only the technical components in the 3GPP specifications that have already been released, but also those that were discussed in the Study Item phase of LTE-Advanced, and those that are currently being discussed in 3GPP for potential specification in future LTE releases.

  • Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster

    Junichi OHMURA  Takefumi MIYOSHI  Hidetsugu IRIE  Tsutomu YOSHINAGA  

     
    PAPER

    Vol: E94-D No:12  Page(s): 2319-2327

    In this paper, we propose an approach to enhancing the performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node links. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU–GPU parallel double-precision general matrix–matrix multiplication (dgemm) operation and achieve a performance improvement of 34% compared with the GPU-only case and 64% compared with the CPU-only case. For an entire 16-node cluster, in which each node is configured as above and connected with two gigabit Ethernet links, we use a computation-communication overlap scheme with GPU acceleration for the Linpack benchmark and achieve a performance improvement of 28% compared with the GPU-accelerated high-performance Linpack benchmark (HPL) without overlapping. In our solution, the main inter-node communication and the data transfer to GPU device memory are overlapped with the main computation task on the CPU cores. These overlaps exploit the multi-core processors found in almost all of today's high-performance computers: one CPU core handles communication tasks while the other CPU cores and the GPU simultaneously handle computation tasks. To enable overlap between inter-node communication and computation, we eliminate their close dependence by breaking the main computation task into smaller tasks and rescheduling them. Because part of the CPU computation power is used for tasks other than computation, we experimentally determine the optimal computation ratio for the CPUs; this ratio differs from that of the single-node parallel dgemm operation.
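
    The overlap principle, dedicating one CPU core to communication while the remaining cores and the GPU work through independently scheduled computation sub-tasks, can be illustrated with a small Python sketch; the thread roles and the stand-in work functions below are illustrative assumptions, not the authors' HPL code:

        import threading, queue, time

        def communicate(msgs):
            # stand-in for inter-node traffic and host-to-GPU transfers
            for _ in msgs:
                time.sleep(0.01)           # pretend to send one message

        def compute(tasks, results):
            # stand-in for dgemm sub-tasks run on the remaining CPU cores / GPU
            while True:
                t = tasks.get()
                if t is None:
                    break
                results.append(t * t)      # pretend arithmetic

        tasks, results = queue.Queue(), []
        for i in range(100):
            tasks.put(i)
        tasks.put(None)

        comm = threading.Thread(target=communicate, args=(range(50),))
        work = threading.Thread(target=compute, args=(tasks, results))
        comm.start(); work.start()         # communication and computation proceed concurrently
        comm.join(); work.join()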

  • Analyzing Emergence in Complex Adaptive System: A Sign-Based Model of Stigmergy

    Chuanjun REN  Xiaomin JIA  Hongbing HUANG  Shiyao JIN  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E94-D No:11  Page(s): 2212-2218

    The description and analysis of emergence in complex adaptive systems (CAS) has recently become a topic of great interest in the systems field, and many ideas and methods have been proposed. A Sign-based model of Stigmergy is proposed in this paper. Stigmergy is widely used in complex systems, and we adopt "Sign" as the key notion for understanding it. A definition of "Sign" is given, which reveals the Sign's nature and exploits the significations and relationships carried by the "Sign". A Sign-based model of Stigmergy is then developed, which captures the essential characteristics of Stigmergy. The basic architecture of Stigmergy as well as its constituents are presented and discussed, and the syntax and operational semantics of Stigmergy configurations are given. We illustrate the methodology of analyzing emergence in CAS by using our model.

  • Strength-Strength and Strength-Degree Correlation Measures for Directed Weighted Complex Network Analysis

    Shi-Ze GUO  Zhe-Ming LU  Zhe CHEN  Hao LUO  

     
    LETTER-Artificial Intelligence, Data Mining

    Vol: E94-D No:11  Page(s): 2284-2287

    This Letter defines thirteen useful correlation measures for directed weighted complex network analysis. First, in-strength and out-strength are defined for each node in the directed weighted network. Then, one node-based strength-strength correlation measure and four arc-based strength-strength correlation measures are defined. In addition, considering that each node is associated with in-degree, out-degree, in-strength and out-strength, four node-based strength-degree correlation measures and four arc-based strength-degree correlation measures are defined. Finally, we use these measures to analyze the world trade network and the food web. The results demonstrate the effectiveness of the proposed measures for directed weighted networks.
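
    As an illustration of the basic quantities involved, the sketch below computes in-strength and out-strength from a weighted adjacency matrix and one node-based strength-strength correlation (the Pearson correlation between in-strength and out-strength over all nodes); the remaining measures defined in the Letter are not reproduced:

        import numpy as np

        # W[i, j] = weight of the arc from node i to node j (0 if absent)
        W = np.array([[0.0, 2.0, 0.5],
                      [1.0, 0.0, 0.0],
                      [0.0, 3.0, 0.0]])

        out_strength = W.sum(axis=1)   # total weight leaving each node
        in_strength  = W.sum(axis=0)   # total weight entering each node

        # one node-based strength-strength correlation: Pearson correlation
        # between in-strength and out-strength over all nodes
        r = np.corrcoef(in_strength, out_strength)[0, 1]
        print(in_strength, out_strength, r)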

  • Rethinking Business Model in Cloud Computing: Concept and Example

    Ping DU  Akihiro NAKAO  

     
    PAPER

    Vol: E94-D No:11  Page(s): 2119-2128

    In cloud computing, a cloud user pays in proportion to the amount of resources consumed (bandwidth, memory, CPU cycles, etc.). We posit that such a cloud computing system is vulnerable to DDoS (Distributed Denial-of-Service) attacks against quota. Attackers can force a cloud user to pay more and more money by exhausting its quota without crippling its execution system or congesting links. In this paper, we address this issue and argue that the cloud should enable users to pay only for their admitted traffic. We design and prototype such a charging model on a CoreLab testbed infrastructure and show an example application.

  • Compression of Dynamic 3D Meshes and Progressive Displaying

    Bin-Shyan JONG  Chi-Kang KAO  Juin-Ling TSENG  Tsong-Wuu LIN  

     
    PAPER-Computer Graphics

    Vol: E94-D No:11  Page(s): 2271-2279

    This paper introduces a new dynamic 3D mesh representation that provides 3D animation support of progressive display and drastically reduces the amount of storage space required for 3D animation. The primary purpose of progressive display is to allow viewers to see the animation as quickly as possible, rather than having to wait until all data has been downloaded. In other words, this method allows for the simultaneous transmission and playing of 3D animation. Experiments show that a coarse 3D animation could be reconstructed with as little as 150 KB of data transferred. With the sustained transmission of refinement operators, the perceived resolution approaches that of the original animation. The methods used in this study are based on a compression technique commonly used in 3D animation, clustered principal component analysis, which exploits the linear independence of principal components so that the animation can be stored using a smaller amount of data. This method can be coupled with streaming technology to reconstruct animation through iterative updating. Each principal component is a portion of the streaming data to be stored and transmitted after compression, as well as a refinement operator during the animation update process. This paper considers errors and rate-distortion optimization, and introduces weighted progressive transmitting (WPT), using refinement sequences from optimized principal components, so that each refinement yields an increase in quality. In other words, with identical data size, this method allows each principal component to reduce the allowable error and provide the highest quality 3D animation.
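
    A minimal sketch of the underlying idea, representing the animated vertex trajectories by a few principal components and refining the reconstruction as more components arrive, is given below; the cluster partitioning, rate-distortion weighting, and WPT ordering of the paper are omitted, and random data stands in for a real mesh animation:

        import numpy as np

        # A: frames x (3 * num_vertices) matrix of vertex coordinates over time
        rng = np.random.default_rng(0)
        A = rng.standard_normal((60, 300))

        mean = A.mean(axis=0)
        U, S, Vt = np.linalg.svd(A - mean, full_matrices=False)   # PCA via SVD

        def reconstruct(k):
            # progressive reconstruction using the first k principal components
            return mean + U[:, :k] * S[:k] @ Vt[:k]

        for k in (2, 5, 10):
            err = np.linalg.norm(A - reconstruct(k)) / np.linalg.norm(A)
            print(f"{k} components -> relative error {err:.3f}")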

  • Decision Tree-Based Acoustic Models for Speech Recognition with Improved Smoothness

    Masami AKAMINE  Jitendra AJMERA  

     
    PAPER-Speech and Hearing

    Vol: E94-D No:11  Page(s): 2250-2258

    This paper proposes likelihood smoothing techniques to improve decision tree-based acoustic models, where decision trees are used as replacements for Gaussian mixture models to compute the observation likelihoods for a given HMM state in a speech recognition system. Decision trees have a number of advantageous properties, such as not imposing restrictions on the number or types of features, and automatically performing feature selection. This paper describes basic configurations of decision tree-based acoustic models and proposes two methods to improve the robustness of the basic model: DT mixture models and soft decisions for continuous features. Experimental results on the Aurora 2 speech database show that a system using decision trees offers state-of-the-art performance even without taking advantage of its full potential, and that soft decisions improve the performance of DT-based acoustic models, giving a 16.8% relative error rate reduction over hard decisions.
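
    One way to realize a soft decision for a continuous feature is to replace the hard threshold test at each tree node with a sigmoid weighting that blends the likelihoods of both children; the sketch below illustrates this idea, with the toy tree and the sharpness parameter beta being illustrative assumptions rather than the paper's exact formulation:

        import math

        # internal node: ('node', feature_index, threshold, left, right); leaf: ('leaf', likelihood)
        tree = ('node', 0, 0.5,
                ('leaf', 0.10),
                ('node', 1, -0.2, ('leaf', 0.30), ('leaf', 0.05)))

        def soft_likelihood(node, x, beta=10.0):
            if node[0] == 'leaf':
                return node[1]
            _, f, thr, left, right = node
            # sigmoid weight: ~1 when x[f] is well below the threshold (go left), ~0 above it
            w = 1.0 / (1.0 + math.exp(beta * (x[f] - thr)))
            return w * soft_likelihood(left, x, beta) + (1 - w) * soft_likelihood(right, x, beta)

        print(soft_likelihood(tree, [0.48, 0.1]))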

  • Complexity Reduced Transmit Diversity Scheme for Time Domain Synchronous OFDM Systems

    Zhaocheng WANG  Jintao WANG  Linglong DAI  

     
    PAPER-Terrestrial Wireless Communication/Broadcasting Technologies

    Vol: E94-B No:11  Page(s): 3116-3124

    This paper proposes a novel scheme to reduce the complexity of existing transmit diversity solutions for time domain synchronous OFDM (TDS-OFDM). A space-shifted constant amplitude zero autocorrelation (CAZAC) sequence based preamble is proposed for channel estimation. Two flexible frame structures are proposed for adaptive system design as well as for cyclicity reconstruction of the received inverse discrete Fourier transform (IDFT) block. With regard to channel estimation and cyclicity reconstruction, the complexity of the proposed scheme is only around 7.20% of that of the conventional solutions. Simulation results demonstrate that better bit error rate (BER) performance can be achieved over doubly selective channels.
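
    For reference, a CAZAC (Zadoff-Chu) sequence and a simple correlation-based channel estimate can be sketched as follows; the sequence length, root index, and toy channel are illustrative assumptions, and the space shifting of the preamble across transmit antennas is not modeled:

        import numpy as np

        def zadoff_chu(u, N):
            # constant-amplitude zero-autocorrelation sequence (N odd, gcd(u, N) = 1)
            n = np.arange(N)
            return np.exp(-1j * np.pi * u * n * (n + 1) / N)

        N, u = 255, 7
        p = zadoff_chu(u, N)                       # preamble
        h = np.array([1.0, 0.5 - 0.2j, 0.0, 0.3])  # toy channel impulse response
        rx = np.fft.ifft(np.fft.fft(p) * np.fft.fft(h, N))   # circular convolution with the channel

        # correlation-based estimate: circular cross-correlation with the known preamble
        h_est = np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(p))) / N
        print(np.round(h_est[:6], 3))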

  • A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC Decoders

    Jui-Hui HUNG  Sau-Gee CHEN  

     
    PAPER-Communication Theory and Signals

    Vol: E94-A No:11  Page(s): 2246-2253

    This work first investigates two existing check node unit (CNU) architectures for LDPC decoding, the self-message-excluded CNU (SME-CNU) and the two-minimum CNU (TM-CNU) architectures, and analyzes their area and timing complexities for various realization approaches. Compared to the TM-CNU architecture, the SME-CNU architecture is faster but has a much higher complexity for comparison operations. To overcome this problem, this work proposes a novel systematic optimization algorithm for the comparison operations required by SME-CNU architectures. The algorithm automatically synthesizes an optimized fast comparison operation that guarantees the shortest comparison delay and minimizes the total number of 2-input comparators. High speed is achieved by adopting parallel divide-and-conquer comparison operations, while the number of required comparators is minimized by a novel set construction algorithm that maximizes shareable comparison operations. As a result, the proposed design significantly reduces the required number of comparison operations compared to conventional SME-CNU architectures, under the condition that both designs have the same speed performance. In addition, our preliminary hardware simulations show that the proposed design has hardware complexity comparable to that of low-complexity TM-CNU architectures.
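
    The two CNU styles can be contrasted in software: a TM-style unit finds the two smallest input magnitudes once and reuses them, whereas an SME-style unit needs, for each edge, the minimum over all the other inputs; the sketch below shows both on min-sum message magnitudes (the shared divide-and-conquer comparator network optimized in the paper is not modeled):

        def tm_cnu(mags):
            # two-minimum style: find the smallest and second-smallest magnitudes once
            i1 = min(range(len(mags)), key=mags.__getitem__)
            m1 = mags[i1]
            m2 = min(m for j, m in enumerate(mags) if j != i1)
            # the output for edge j is m2 if j carries the overall minimum, else m1
            return [m2 if j == i1 else m1 for j in range(len(mags))]

        def sme_cnu(mags):
            # self-message-excluded style: per edge, minimum over all the other inputs
            return [min(m for j, m in enumerate(mags) if j != k) for k in range(len(mags))]

        mags = [0.7, 0.2, 0.9, 0.4]
        print(tm_cnu(mags), sme_cnu(mags))   # both give the same outputs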

  • A Low Complexity 1D-Based Successive GSC Structure for 2D Adaptive Beamformer Implementation

    Yung-Yi WANG  

     
    LETTER-Digital Signal Processing

    Vol: E94-A No:11  Page(s): 2448-2452

    In this study, we propose a one-dimensional (1D) based successive generalized sidelobe canceller (GSC) structure for the implementation of 2D adaptive beamformers using a uniform rectangular antenna array (URA). The proposed approach takes advantage of the URA property that the 2D spatial signature of the received signal can be decomposed into an outer product of two 1D spatial signatures, which lie in the column and row spaces of the received signal matrix, respectively. It follows that the interferers can be successively eliminated by two rounds of the 1D-based GSC structure. Compared to the conventional 2D-GSC structure, computer simulations show that, in addition to having significantly lower computational complexity, the proposed adaptive approach possesses a higher convergence rate.
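
    The URA property exploited here, that the 2D array response factors into an outer product of two 1D steering vectors, can be checked numerically as in the sketch below; the array size and spatial frequencies are illustrative assumptions:

        import numpy as np

        def steering(num, phase_per_element):
            # 1D uniform-array steering vector with a given inter-element phase shift
            return np.exp(1j * phase_per_element * np.arange(num))

        M, N = 4, 6                    # rows and columns of the URA
        mu, nu = 0.8, -1.3             # spatial frequencies along the two array axes (assumed)
        a_m = steering(M, mu)          # 1D spatial signature along the M-element axis
        a_n = steering(N, nu)          # 1D spatial signature along the N-element axis

        A_2d = np.outer(a_m, a_n)      # 2D spatial signature of the URA
        # element (m, n) equals exp(j*(m*mu + n*nu)), i.e. the 2D response is separable
        assert np.allclose(A_2d, np.exp(1j * (mu * np.arange(M)[:, None] + nu * np.arange(N)[None, :])))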

  • Parallel Implementation Strategy for CoHOG-Based Pedestrian Detection Using a Multi-Core Processor

    Ryusuke MIYAMOTO  Hiroki SUGANO  

     
    PAPER-Image Processing

    Vol: E94-A No:11  Page(s): 2315-2322

    Pedestrian detection from visual images, which is used for driver assistance and video surveillance, is a challenging recent problem. Co-occurrence histograms of oriented gradients (CoHOG) is a powerful feature descriptor for pedestrian detection and achieves the highest detection accuracy. However, its calculation cost is too large for real-time computation on state-of-the-art processors. In this paper, to obtain an optimal parallel implementation for an NVIDIA GPU, several kinds of parallelism in CoHOG-based detection are identified and evaluated for implementation suitability. The experimental results show that the detection process can be performed at 16.5 fps on QVGA images on an NVIDIA Tesla C1060 with the optimized parallel implementation. Our evaluation shows that the optimal parallel implementation strategy for an NVIDIA GPU differs from that for an FPGA. We discuss the reasons for this and show the advantages of each device. To show the scalability and portability of the GPU implementation, the same object code is executed on other NVIDIA GPUs. The experimental results show that a GTX570 can perform CoHOG-based pedestrian detection at 21.3 fps on QVGA images.
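
    The feature whose computation is parallelized can be sketched (serially, and without the usual block and cell tiling) as follows: gradient orientations are quantized into a few bins and, for each spatial offset in a small set, a 2D co-occurrence histogram of orientation pairs is accumulated; the bin count, offsets, and input size are illustrative assumptions:

        import numpy as np

        def cohog(img, n_bins=8, offsets=((0, 1), (1, 0), (1, 1))):
            # quantize the gradient orientation at every pixel into n_bins bins
            gy, gx = np.gradient(img.astype(float))
            ori = (np.arctan2(gy, gx) + np.pi) / (2 * np.pi)          # mapped to [0, 1]
            bins = np.minimum((ori * n_bins).astype(int), n_bins - 1)

            feats = []
            h, w = bins.shape
            for dy, dx in offsets:
                # co-occurrence histogram of (orientation here, orientation at the offset pixel)
                a = bins[:h - dy, :w - dx].ravel()
                b = bins[dy:, dx:].ravel()
                hist = np.zeros((n_bins, n_bins), dtype=np.int64)
                np.add.at(hist, (a, b), 1)
                feats.append(hist.ravel())
            return np.concatenate(feats)

        print(cohog(np.random.rand(32, 24)).shape)   # 3 offsets * 8 * 8 = 192 features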

  • Color Saturation Compensation in iCAM06 for High-Chroma HDR Imaging

    Hwi-Gang KIM  Sung-Hak LEE  Tae-Wuk BAE  Kyu-Ik SOHNG  

     
    LETTER-Image Processing

    Vol: E94-A No:11  Page(s): 2353-2357

    An image appearance model called iCAM06 was designed for high dynamic range (HDR) image rendering. The dynamic range of an HDR image needs to be mapped onto output devices, which is called tone compression or tone mapping. iCAM06, a representative HDR rendering algorithm, uses tone compression for image reproduction on the low dynamic range of output devices. However, color saturation reduction occurs during its tone compression process. We propose a saturation correction method using inverse compensation in order to recover the saturation reduction in iCAM06. Experimental results show that the proposed method outperforms iCAM06 from the viewpoint of saturation accuracy and rendering preference.

  • Design and Performance of Rate-Compatible Non-binary LDPC Convolutional Codes

    Hironori UCHIKAWA  Kenta KASAI  Kohichi SAKANIWA  

     
    PAPER-Coding Theory

    Vol: E94-A No:11  Page(s): 2135-2143

    In this paper, we present a construction method for non-binary low-density parity-check (LDPC) convolutional codes. Our construction method extends the Felstrom and Zigangirov construction [1] to non-binary LDPC convolutional codes. The rate-compatibility of the non-binary convolutional code is also discussed. The proposed rate-compatible code is designed from a single mother (2,4)-regular non-binary LDPC convolutional code of rate 1/2. Higher-rate codes are produced by puncturing the mother code, and lower-rate codes are produced by multiplicatively repeating the mother code. Simulation results show that non-binary LDPC convolutional codes of rate 1/2 outperform state-of-the-art binary LDPC convolutional codes with comparable constraint bit length. The derived low-rate and high-rate non-binary LDPC convolutional codes also exhibit good decoding performance, without a large gap to the Shannon limits.

  • 2-Adic Complexity of Self-Shrinking Sequence

    Huijuan WANG  Qiaoyan WEN  Jie ZHANG  

     
    LETTER-Cryptography and Information Security

    Vol: E94-A No:11  Page(s): 2462-2465

    This paper studies the 2-adic complexity of the self-shrinking sequence through the relationship between 2-adic integers and binary sequences. Based on the linear complexity and the number of sequences that share the same connection integer, we conclude that the 2-adic complexity of the self-shrinking sequence constructed from a binary m-sequence of order n has a lower bound of 2^(n-2) - 1. Furthermore, it is shown that its 2-adic complexity has a larger lower bound under some circumstances.
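
    For reference, the self-shrinking construction itself reads the m-sequence in pairs (a_2i, a_2i+1) and outputs a_2i+1 whenever a_2i = 1; a small sketch is given below, where the LFSR taps are an illustrative primitive polynomial and no 2-adic complexity computation is attempted:

        def lfsr(taps, state, length):
            # Fibonacci LFSR; with taps from a primitive polynomial it outputs an m-sequence
            out = []
            for _ in range(length):
                out.append(state[-1])
                fb = 0
                for t in taps:
                    fb ^= state[t - 1]
                state = [fb] + state[:-1]
            return out

        def self_shrink(bits):
            # keep a_{2i+1} whenever a_{2i} == 1, discard the pair otherwise
            return [bits[i + 1] for i in range(0, len(bits) - 1, 2) if bits[i] == 1]

        # x^4 + x + 1 is primitive, so this order n = 4 LFSR yields an m-sequence of period 15
        m_seq = lfsr(taps=[4, 1], state=[1, 0, 0, 1], length=30)
        print(self_shrink(m_seq))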

  • PCA-Based Detection Algorithm of Moving Target Buried in Clutter in Doppler Frequency Domain

    Muhammad WAQAS  Shouhei KIDERA  Tetsuo KIRIMOTO  

     
    LETTER-Sensing

    Vol: E94-B No:11  Page(s): 3190-3194

    This letter proposes a novel technique for detecting a target signal buried in clutter using principal component analysis (PCA) for pulse-Doppler radar systems. Conventional detection algorithms are based on fast Fourier transform-constant false alarm rate (FFT-CFAR) approaches. However, the detection task becomes extremely difficult when the Doppler spectrum of the target is completely buried in the spectrum of the clutter. To enhance the detection probability in such situations, the proposed method employs the PCA algorithm, which decomposes the target and clutter signals into uncorrelated components. The performances of the proposed method and the conventional FFT-CFAR based detection method are evaluated in terms of the receiver operating characteristics (ROC) for various signal-to-clutter ratio (SCR) cases. The results of numerical simulations show that the proposed method significantly enhances the detection probability compared with the conventional FFT-CFAR method, especially in low-SCR situations.
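
    A minimal sketch of the general idea, removing the dominant principal components that capture the strong correlated clutter before forming the Doppler spectrum, is given below; the toy data model, the number of removed components, and the absence of a CFAR threshold are illustrative simplifications rather than the letter's exact processing chain:

        import numpy as np

        rng = np.random.default_rng(1)
        pulses, gates = 64, 32
        n = np.arange(pulses)

        # toy slow-time data: strong low-Doppler clutter + a weak target at one range gate + noise
        X = 10 * np.outer(np.exp(2j * np.pi * 0.02 * n), rng.standard_normal(gates))
        X[:, 12] += 0.5 * np.exp(2j * np.pi * 0.21 * n)           # weak moving target
        X += 0.1 * (rng.standard_normal((pulses, gates)) + 1j * rng.standard_normal((pulses, gates)))

        # PCA across range gates: the leading components model the correlated clutter
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        k = 1                                                      # clutter components to remove
        X_res = X - U[:, :k] * S[:k] @ Vt[:k]

        doppler = np.abs(np.fft.fft(X_res, axis=0))                # Doppler spectrum per range gate
        print(np.unravel_index(doppler.argmax(), doppler.shape))   # peak should appear near (bin 13, gate 12)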

  • On the Autocorrelation and Linear Complexity of Some 2p Periodic Quaternary Cyclotomic Sequences over F4

    Pinhui KE  Zheng YANG  Jie ZHANG  

     
    LETTER-Information Theory

    Vol: E94-A No:11  Page(s): 2472-2477

    We determine the autocorrelations of the quaternary sequence over F4 and its modified version introduced by Du et al. [X.N. Du et al., Linear complexity of quaternary sequences generated using generalized cyclotomic classes modulo 2p, IEICE Trans. Fundamentals, vol.E94-A, no.5, pp.1214–1217, 2011]. Furthermore, we reveal a drawback in the aforementioned paper and remark that the proof in the paper by Kim et al. can be simplified.

  • A Ternary Zero-Correlation Zone Sequence Set Having Wide Inter-Subset Zero-Correlation Zone

    Takafumi HAYASHI  Takao MAEDA  Shinya MATSUFUJI  Satoshi OKAWA  

     
    LETTER-Sequence

    Vol: E94-A No:11  Page(s): 2230-2235

    The present paper introduces a novel construction of ternary sequences having a zero-correlation zone. The cross-correlation function and the side-lobes of the auto-correlation function of the proposed sequence set are zero for phase shifts within the zero-correlation zone. The proposed sequence set consists of more than one subset, each having the same member size. The correlation function of sequences from a pair of different subsets, referred to as the inter-subset correlation function, has a wider zero-correlation zone than the correlation function of sequences from the same subset (the intra-subset correlation function). The wide inter-subset zero-correlation zone enables improved performance in applications of the proposed sequence set. The proposed sequence set has a zero-correlation zone for periodic, aperiodic, and odd correlation functions.
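
    As a small illustration of the zero-correlation-zone property itself (not of the proposed construction), the sketch below computes periodic correlations of toy ternary sequences and measures the width of the zone of zero values around the zero shift:

        import numpy as np

        def periodic_corr(a, b):
            # periodic (cyclic) correlation: c[s] = sum_n a[n] * conj(b[(n + s) mod N])
            return np.array([np.sum(a * np.conj(np.roll(b, -s))) for s in range(len(a))])

        def zcz_width(a, b):
            # largest z such that the correlation is zero for all shifts 1..z and -1..-z
            c = periodic_corr(a, b)
            z = 0
            while z + 1 < len(a) and abs(c[z + 1]) < 1e-9 and abs(c[-(z + 1)]) < 1e-9:
                z += 1
            return z

        # toy ternary sequences (entries in {-1, 0, +1}) with an obvious zero-correlation zone
        a = np.array([1, 0, 0, 0, -1, 0, 0, 0])
        b = np.array([1, 0, 0, 0,  1, 0, 0, 0])
        print(zcz_width(a, a), zcz_width(a, b))   # autocorrelation ZCZ width 3; cross-correlation zero at every shift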

  • Low-Complexity Constant Multiplication Based on Trigonometric Identities with Applications to FFTs

    Fahad QURESHI  Oscar GUSTAFSSON  

     
    PAPER-Digital Signal Processing

    Vol: E94-A No:11  Page(s): 2361-2368

    In this work we consider optimized twiddle factor multipliers based on shift-and-add multiplication. We propose a low-complexity structure for twiddle factors with a resolution of 32 points. Furthermore, we propose a slightly modified version of a previously reported multiplier for a resolution of 16 points with lower round-off noise. For completeness, we also include results on optimal coefficients for an eight-point resolution. We perform finite word length analysis for both coefficients and round-off errors, and derive optimized coefficients with minimum complexity for varying requirements.
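
    The basic shift-and-add idea is that a fixed-point constant multiplication is replaced by a few additions and subtractions of shifted copies of the input; the example below multiplies by 181 = round(cos(pi/4) * 2^8), a typical fixed-point twiddle-factor value, and is an illustration only (the coefficients and adder structures optimized in the paper are not reproduced):

        import math

        def mult_by_181(x):
            # y = 181 * x using shifts and adds only:
            # 181 = 128 + 64 - 16 + 4 + 1 (a signed-digit decomposition)
            return (x << 7) + (x << 6) - (x << 4) + (x << 2) + x

        # 181 = round(cos(pi/4) * 2**8), i.e. an 8-bit fixed-point twiddle-factor coefficient
        print(round(math.cos(math.pi / 4) * 2**8), mult_by_181(3) == 181 * 3)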

  • A User Scheduling with Minimum-Rate Requirement for Maximum Sum-Rate in MIMO-BC

    Seungkyu CHOI  Chungyong LEE  

     
    LETTER-Wireless Communication Technologies

    Vol: E94-B No:11  Page(s): 3179-3182

    This letter considers a sum-rate maximization problem with user scheduling wherein each user has a minimum-rate requirement in the multiple-input multiple-output broadcast channel. The multiuser strategy used in the user scheduling is a joint transceiver scheme with block diagonal geometric mean decomposition. Since the optimum solution to the user scheduling problem generally requires an exhaustive search, we propose a suboptimum user scheduling algorithm with each user's minimum-rate requirement as the main constraint. In order to satisfy the maximum sum-rate and minimum-rate constraints simultaneously, we additionally consider power allocation for the scheduled users. Simulation results show that the proposed user scheduling algorithm, together with the user power allocation, achieves a sum-rate close to that of the exhaustive search while also guaranteeing each user's minimum-rate requirement.
