
Keyword Search Result

[Keyword] OMP (3945 hits)

Results 1441-1460 of 3945

  • Compact Architecture for ASIC and FPGA Implementation of the KASUMI Block Cipher

    Dai YAMAMOTO  Kouichi ITOH  Jun YAJIMA  

     
    PAPER-High-Level Synthesis and System-Level Design

    Vol: E94-A No:12  Page(s): 2628-2638

    Compact design is very important for embedded systems such as wireless sensor nodes, RFID tags and mobile devices because of their limited hardware (H/W) resources. This paper proposes a compact H/W implementation for the KASUMI block cipher, which is the 3GPP standard encryption algorithm. In [8] and [9], Yamamoto et al. proposed a method of reducing the register size for the MISTY1 FO function (YYI-08), and implemented very compact MISTY1 H/W. In this paper we aim to implement the smallest KASUMI H/W to date by applying a YYI-08 configuration to KASUMI, whose FO function has a similar structure to that of MISTY1. However, we discovered that straightforward application of YYI-08 raises problems. We therefore propose a new YYI-08 configuration improved for KASUMI and the compact H/W architecture. The new YYI-08 configuration consists of new FL function calculation schemes and a suitable calculation order. According to our logic synthesis on a 0.11-µm ASIC process, the gate size is 2.99 K gates, which, to our knowledge, is the smallest to date.
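
    For reference, the FL function whose calculation scheme is reworked here is a simple 32-bit keyed permutation from the KASUMI specification; the sketch below shows the standard FL operation only (the proposed YYI-08 reordering and the compact hardware datapath are not reproduced), as we understand the specification:

        def rol16(x, r=1):
            # 1-bit left rotation of a 16-bit word
            return ((x << r) | (x >> (16 - r))) & 0xFFFF

        def kasumi_fl(i32, kl1, kl2):
            # FL function of KASUMI: i32 is the 32-bit input, kl1/kl2 are the
            # two 16-bit halves of the round subkey KL_i
            l = (i32 >> 16) & 0xFFFF
            r = i32 & 0xFFFF
            r ^= rol16(l & kl1)    # R' = R xor ROL1(L AND KL_i,1)
            l ^= rol16(r | kl2)    # L' = L xor ROL1(R' OR KL_i,2)
            return (l << 16) | r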

  • Radio Interface Technologies for Cooperative Transmission in 3GPP LTE-Advanced (Open Access)

    Tetsushi ABE  Yoshihisa KISHIYAMA  Yoshikazu KAKURA  Daichi IMAMURA  

     
    INVITED PAPER

    Vol: E94-B No:12  Page(s): 3202-3210

    This paper presents an overview of radio interface technologies for cooperative transmission in 3GPP LTE-Advanced, i.e., coordinated multi-point (CoMP) transmission, enhanced inter-cell interference coordination (eICIC) for heterogeneous deployments, and relay transmission techniques. This paper covers not only the technical components in the 3GPP specifications that have already been released, but also those that were discussed in the Study Item phase of LTE-Advanced, and those that are currently being discussed in 3GPP for potential specification in future LTE releases.

  • Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster

    Junichi OHMURA  Takefumi MIYOSHI  Hidetsugu IRIE  Tsutomu YOSHINAGA  

     
    PAPER

    Vol: E94-D No:12  Page(s): 2319-2327

    In this paper, we propose an approach to enhancing the performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node links. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU–GPU parallel double-precision general matrix–matrix multiplication (dgemm) operation and achieve a performance improvement of 34% compared with the GPU-only case and 64% compared with the CPU-only case. For an entire 16-node cluster, in which each node is configured as above and connected with two gigabit Ethernet links, we use a computation-communication overlap scheme with GPU acceleration for the Linpack benchmark and achieve a performance improvement of 28% compared with the GPU-accelerated high-performance Linpack benchmark (HPL) without overlapping. In our solution, the main inter-node communication and the data transfer to GPU device memory are overlapped with the main computation task on the CPU cores. These overlaps exploit the multi-core processors found in almost all of today's high-performance computers: one CPU core handles communication tasks while the other CPU cores and the GPU simultaneously handle computation tasks. To enable overlap between inter-node communication and computation, we eliminate their close dependence by breaking the main computation task into smaller tasks and rescheduling them. Because part of the CPU computation power is used for tasks other than computation, we experimentally determine the optimal computation ratio for the CPUs; this ratio differs from that of the single-node parallel dgemm operation.
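
    The overlap principle, dedicating one CPU core to communication while the remaining cores and the GPU work through independently scheduled computation sub-tasks, can be illustrated with a small Python sketch; the thread roles and the stand-in work functions below are illustrative assumptions, not the authors' HPL code:

        import threading, queue, time

        def communicate(msgs):
            # stand-in for inter-node traffic and host-to-GPU transfers
            for _ in msgs:
                time.sleep(0.01)           # pretend to send one message

        def compute(tasks, results):
            # stand-in for dgemm sub-tasks run on the remaining CPU cores / GPU
            while True:
                t = tasks.get()
                if t is None:
                    break
                results.append(t * t)      # pretend arithmetic

        tasks, results = queue.Queue(), []
        for i in range(100):
            tasks.put(i)
        tasks.put(None)

        comm = threading.Thread(target=communicate, args=(range(50),))
        work = threading.Thread(target=compute, args=(tasks, results))
        comm.start(); work.start()         # communication and computation proceed concurrently
        comm.join(); work.join()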

  • Analyzing Emergence in Complex Adaptive System: A Sign-Based Model of Stigmergy

    Chuanjun REN  Xiaomin JIA  Hongbing HUANG  Shiyao JIN  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E94-D No:11  Page(s): 2212-2218

    The description and analysis of emergence in complex adaptive systems (CAS) has recently become a topic of great interest in the systems field, and many ideas and methods have been proposed. A Sign-based model of Stigmergy is proposed in this paper. Stigmergy is widely used in complex systems, and we adopt "Sign" as the key notion for understanding it. A definition of "Sign" is given, which reveals the Sign's nature and exploits the significations and relationships carried by the "Sign". A Sign-based model of Stigmergy is then developed, which captures the essential characteristics of Stigmergy. The basic architecture of Stigmergy as well as its constituents are presented and discussed, and the syntax and operational semantics of Stigmergy configurations are given. We illustrate the methodology of analyzing emergence in CAS by using our model.

  • Strength-Strength and Strength-Degree Correlation Measures for Directed Weighted Complex Network Analysis

    Shi-Ze GUO  Zhe-Ming LU  Zhe CHEN  Hao LUO  

     
    LETTER-Artificial Intelligence, Data Mining

    Vol: E94-D No:11  Page(s): 2284-2287

    This Letter defines thirteen useful correlation measures for directed weighted complex network analysis. First, in-strength and out-strength are defined for each node in the directed weighted network. Then, one node-based strength-strength correlation measure and four arc-based strength-strength correlation measures are defined. In addition, considering that each node is associated with in-degree, out-degree, in-strength and out-strength, four node-based strength-degree correlation measures and four arc-based strength-degree correlation measures are defined. Finally, we use these measures to analyze the world trade network and the food web. The results demonstrate the effectiveness of the proposed measures for directed weighted networks.
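
    As an illustration of the basic quantities involved, the sketch below computes in-strength and out-strength from a weighted adjacency matrix and one node-based strength-strength correlation (the Pearson correlation between in-strength and out-strength over all nodes); the remaining measures defined in the Letter are not reproduced:

        import numpy as np

        # W[i, j] = weight of the arc from node i to node j (0 if absent)
        W = np.array([[0.0, 2.0, 0.5],
                      [1.0, 0.0, 0.0],
                      [0.0, 3.0, 0.0]])

        out_strength = W.sum(axis=1)   # total weight leaving each node
        in_strength  = W.sum(axis=0)   # total weight entering each node

        # one node-based strength-strength correlation: Pearson correlation
        # between in-strength and out-strength over all nodes
        r = np.corrcoef(in_strength, out_strength)[0, 1]
        print(in_strength, out_strength, r)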

  • Rethinking Business Model in Cloud Computing: Concept and Example

    Ping DU  Akihiro NAKAO  

     
    PAPER

    Vol: E94-D No:11  Page(s): 2119-2128

    In cloud computing, a cloud user pays in proportion to the amount of resources consumed (bandwidth, memory, CPU cycles, etc.). We posit that such a cloud computing system is vulnerable to DDoS (Distributed Denial-of-Service) attacks against quota. Attackers can force a cloud user to pay more and more money by exhausting its quota without crippling its execution system or congesting links. In this paper, we address this issue and argue that the cloud should enable users to pay only for their admitted traffic. We design and prototype such a charging model on a CoreLab testbed infrastructure and show an example application.

  • Compression of Dynamic 3D Meshes and Progressive Displaying

    Bin-Shyan JONG  Chi-Kang KAO  Juin-Ling TSENG  Tsong-Wuu LIN  

     
    PAPER-Computer Graphics

    Vol: E94-D No:11  Page(s): 2271-2279

    This paper introduces a new dynamic 3D mesh representation that provides 3D animation support of progressive display and drastically reduces the amount of storage space required for 3D animation. The primary purpose of progressive display is to allow viewers to see the animation as quickly as possible, rather than having to wait until all data has been downloaded. In other words, this method allows for the simultaneous transmission and playing of 3D animation. Experiments show that a coarse 3D animation could be reconstructed with as little as 150 KB of data transferred. With the sustained transmission of refinement operators, the perceived resolution approaches that of the original animation. The methods used in this study are based on a compression technique commonly used in 3D animation, clustered principal component analysis, which exploits the linear independence of principal components so that the animation can be stored using a smaller amount of data. This method can be coupled with streaming technology to reconstruct animation through iterative updating. Each principal component is a portion of the streaming data to be stored and transmitted after compression, as well as a refinement operator during the animation update process. This paper considers errors and rate-distortion optimization, and introduces weighted progressive transmitting (WPT), using refinement sequences from optimized principal components, so that each refinement yields an increase in quality. In other words, with identical data size, this method allows each principal component to reduce the allowable error and provide the highest quality 3D animation.
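
    A minimal sketch of the underlying idea, representing the animated vertex trajectories by a few principal components and refining the reconstruction as more components arrive, is given below; the cluster partitioning, rate-distortion weighting, and WPT ordering of the paper are omitted, and random data stands in for a real mesh animation:

        import numpy as np

        # A: frames x (3 * num_vertices) matrix of vertex coordinates over time
        rng = np.random.default_rng(0)
        A = rng.standard_normal((60, 300))

        mean = A.mean(axis=0)
        U, S, Vt = np.linalg.svd(A - mean, full_matrices=False)   # PCA via SVD

        def reconstruct(k):
            # progressive reconstruction using the first k principal components
            return mean + U[:, :k] * S[:k] @ Vt[:k]

        for k in (2, 5, 10):
            err = np.linalg.norm(A - reconstruct(k)) / np.linalg.norm(A)
            print(f"{k} components -> relative error {err:.3f}")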

  • Decision Tree-Based Acoustic Models for Speech Recognition with Improved Smoothness

    Masami AKAMINE  Jitendra AJMERA  

     
    PAPER-Speech and Hearing

    Vol: E94-D No:11  Page(s): 2250-2258

    This paper proposes likelihood smoothing techniques to improve decision tree-based acoustic models, where decision trees are used as replacements for Gaussian mixture models to compute the observation likelihoods for a given HMM state in a speech recognition system. Decision trees have a number of advantageous properties, such as not imposing restrictions on the number or types of features, and automatically performing feature selection. This paper describes basic configurations of decision tree-based acoustic models and proposes two methods to improve the robustness of the basic model: DT mixture models and soft decisions for continuous features. Experimental results on the Aurora 2 speech database show that a system using decision trees offers state-of-the-art performance even without taking advantage of its full potential, and that soft decisions improve the performance of DT-based acoustic models, giving a 16.8% relative error rate reduction over hard decisions.
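
    One way to realize a soft decision for a continuous feature is to replace the hard threshold test at each tree node with a sigmoid weighting that blends the likelihoods of both children; the sketch below illustrates this idea, with the toy tree and the sharpness parameter beta being illustrative assumptions rather than the paper's exact formulation:

        import math

        # internal node: ('node', feature_index, threshold, left, right); leaf: ('leaf', likelihood)
        tree = ('node', 0, 0.5,
                ('leaf', 0.10),
                ('node', 1, -0.2, ('leaf', 0.30), ('leaf', 0.05)))

        def soft_likelihood(node, x, beta=10.0):
            if node[0] == 'leaf':
                return node[1]
            _, f, thr, left, right = node
            # sigmoid weight: ~1 when x[f] is well below the threshold (go left), ~0 above it
            w = 1.0 / (1.0 + math.exp(beta * (x[f] - thr)))
            return w * soft_likelihood(left, x, beta) + (1 - w) * soft_likelihood(right, x, beta)

        print(soft_likelihood(tree, [0.48, 0.1]))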

  • Complexity Reduced Transmit Diversity Scheme for Time Domain Synchronous OFDM Systems

    Zhaocheng WANG  Jintao WANG  Linglong DAI  

     
    PAPER-Terrestrial Wireless Communication/Broadcasting Technologies

    Vol: E94-B No:11  Page(s): 3116-3124

    This paper proposes a novel scheme to reduce the complexity of existing transmit diversity solutions for time domain synchronous OFDM (TDS-OFDM). A space-shifted constant amplitude zero autocorrelation (CAZAC) sequence based preamble is proposed for channel estimation. Two flexible frame structures are proposed for adaptive system design as well as for cyclicity reconstruction of the received inverse discrete Fourier transform (IDFT) block. With regard to channel estimation and cyclicity reconstruction, the complexity of the proposed scheme is only around 7.20% of that of the conventional solutions. Simulation results demonstrate that better bit error rate (BER) performance can be achieved over doubly selective channels.
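
    For reference, a CAZAC (Zadoff-Chu) sequence and a simple correlation-based channel estimate can be sketched as follows; the sequence length, root index, and toy channel are illustrative assumptions, and the space shifting of the preamble across transmit antennas is not modeled:

        import numpy as np

        def zadoff_chu(u, N):
            # constant-amplitude zero-autocorrelation sequence (N odd, gcd(u, N) = 1)
            n = np.arange(N)
            return np.exp(-1j * np.pi * u * n * (n + 1) / N)

        N, u = 255, 7
        p = zadoff_chu(u, N)                       # preamble
        h = np.array([1.0, 0.5 - 0.2j, 0.0, 0.3])  # toy channel impulse response
        rx = np.fft.ifft(np.fft.fft(p) * np.fft.fft(h, N))   # circular convolution with the channel

        # correlation-based estimate: circular cross-correlation with the known preamble
        h_est = np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(p))) / N
        print(np.round(h_est[:6], 3))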

  • A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC Decoders

    Jui-Hui HUNG  Sau-Gee CHEN  

     
    PAPER-Communication Theory and Signals

    Vol: E94-A No:11  Page(s): 2246-2253

    This work first investigates two existing check node unit (CNU) architectures for LDPC decoding, the self-message-excluded CNU (SME-CNU) and the two-minimum CNU (TM-CNU) architectures, and analyzes their area and timing complexities for various realization approaches. Compared to the TM-CNU architecture, the SME-CNU architecture is faster but has a much higher complexity for comparison operations. To overcome this problem, this work proposes a novel systematic optimization algorithm for the comparison operations required by SME-CNU architectures. The algorithm automatically synthesizes an optimized fast comparison operation that guarantees the shortest comparison delay and minimizes the total number of 2-input comparators. High speed is achieved by adopting parallel divide-and-conquer comparison operations, while the number of required comparators is minimized by a novel set construction algorithm that maximizes shareable comparison operations. As a result, the proposed design significantly reduces the required number of comparison operations compared to conventional SME-CNU architectures, under the condition that both designs have the same speed performance. In addition, our preliminary hardware simulations show that the proposed design has hardware complexity comparable to that of low-complexity TM-CNU architectures.
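
    The two CNU styles can be contrasted in software: a TM-style unit finds the two smallest input magnitudes once and reuses them, whereas an SME-style unit needs, for each edge, the minimum over all the other inputs; the sketch below shows both on min-sum message magnitudes (the shared divide-and-conquer comparator network optimized in the paper is not modeled):

        def tm_cnu(mags):
            # two-minimum style: find the smallest and second-smallest magnitudes once
            i1 = min(range(len(mags)), key=mags.__getitem__)
            m1 = mags[i1]
            m2 = min(m for j, m in enumerate(mags) if j != i1)
            # the output for edge j is m2 if j carries the overall minimum, else m1
            return [m2 if j == i1 else m1 for j in range(len(mags))]

        def sme_cnu(mags):
            # self-message-excluded style: per edge, minimum over all the other inputs
            return [min(m for j, m in enumerate(mags) if j != k) for k in range(len(mags))]

        mags = [0.7, 0.2, 0.9, 0.4]
        print(tm_cnu(mags), sme_cnu(mags))   # both give the same outputs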

  • A Low Complexity 1D-Based Successive GSC Structure for 2D Adaptive Beamformer Implementation

    Yung-Yi WANG  

     
    LETTER-Digital Signal Processing

    Vol: E94-A No:11  Page(s): 2448-2452

    In this study, we propose a one-dimensional (1D) based successive generalized sidelobe canceller (GSC) structure for the implementation of 2D adaptive beamformers using a uniform rectangular antenna array (URA). The proposed approach takes advantage of the URA property that the 2D spatial signature of the received signal can be decomposed into an outer product of two 1D spatial signatures, which lie in the column and row spaces of the received signal matrix, respectively. It follows that the interferers can be successively eliminated by two rounds of the 1D-based GSC structure. Compared to the conventional 2D-GSC structure, computer simulations show that, in addition to having significantly lower computational complexity, the proposed adaptive approach possesses a higher convergence rate.
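
    The URA property exploited here, that the 2D array response factors into an outer product of two 1D steering vectors, can be checked numerically as in the sketch below; the array size and spatial frequencies are illustrative assumptions:

        import numpy as np

        def steering(num, phase_per_element):
            # 1D uniform-array steering vector with a given inter-element phase shift
            return np.exp(1j * phase_per_element * np.arange(num))

        M, N = 4, 6                    # rows and columns of the URA
        mu, nu = 0.8, -1.3             # spatial frequencies along the two array axes (assumed)
        a_m = steering(M, mu)          # 1D spatial signature along the M-element axis
        a_n = steering(N, nu)          # 1D spatial signature along the N-element axis

        A_2d = np.outer(a_m, a_n)      # 2D spatial signature of the URA
        # element (m, n) equals exp(j*(m*mu + n*nu)), i.e. the 2D response is separable
        assert np.allclose(A_2d, np.exp(1j * (mu * np.arange(M)[:, None] + nu * np.arange(N)[None, :])))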

  • Parallel Implementation Strategy for CoHOG-Based Pedestrian Detection Using a Multi-Core Processor

    Ryusuke MIYAMOTO  Hiroki SUGANO  

     
    PAPER-Image Processing

    Vol: E94-A No:11  Page(s): 2315-2322

    Pedestrian detection from visual images, which is used for driver assistance and video surveillance, is a challenging recent problem. Co-occurrence histograms of oriented gradients (CoHOG) is a powerful feature descriptor for pedestrian detection and achieves the highest detection accuracy. However, its calculation cost is too large for real-time computation on state-of-the-art processors. In this paper, to obtain an optimal parallel implementation for an NVIDIA GPU, several kinds of parallelism in CoHOG-based detection are identified and evaluated for implementation suitability. The experimental results show that the detection process can be performed at 16.5 fps on QVGA images on an NVIDIA Tesla C1060 with the optimized parallel implementation. Our evaluation shows that the optimal parallel implementation strategy for an NVIDIA GPU differs from that for an FPGA. We discuss the reasons for this and show the advantages of each device. To show the scalability and portability of the GPU implementation, the same object code is executed on other NVIDIA GPUs. The experimental results show that a GTX570 can perform CoHOG-based pedestrian detection at 21.3 fps on QVGA images.
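
    The feature whose computation is parallelized can be sketched (serially, and without the usual block and cell tiling) as follows: gradient orientations are quantized into a few bins and, for each spatial offset in a small set, a 2D co-occurrence histogram of orientation pairs is accumulated; the bin count, offsets, and input size are illustrative assumptions:

        import numpy as np

        def cohog(img, n_bins=8, offsets=((0, 1), (1, 0), (1, 1))):
            # quantize the gradient orientation at every pixel into n_bins bins
            gy, gx = np.gradient(img.astype(float))
            ori = (np.arctan2(gy, gx) + np.pi) / (2 * np.pi)          # mapped to [0, 1]
            bins = np.minimum((ori * n_bins).astype(int), n_bins - 1)

            feats = []
            h, w = bins.shape
            for dy, dx in offsets:
                # co-occurrence histogram of (orientation here, orientation at the offset pixel)
                a = bins[:h - dy, :w - dx].ravel()
                b = bins[dy:, dx:].ravel()
                hist = np.zeros((n_bins, n_bins), dtype=np.int64)
                np.add.at(hist, (a, b), 1)
                feats.append(hist.ravel())
            return np.concatenate(feats)

        print(cohog(np.random.rand(32, 24)).shape)   # 3 offsets * 8 * 8 = 192 features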

  • Color Saturation Compensation in iCAM06 for High-Chroma HDR Imaging

    Hwi-Gang KIM  Sung-Hak LEE  Tae-Wuk BAE  Kyu-Ik SOHNG  

     
    LETTER-Image Processing

    Vol: E94-A No:11  Page(s): 2353-2357

    An image appearance model called iCAM06 was designed for high dynamic range (HDR) image rendering. The dynamic range of an HDR image needs to be mapped onto output devices, which is called tone compression or tone mapping. iCAM06, a representative HDR rendering algorithm, uses tone compression for image reproduction on the low dynamic range of output devices. However, color saturation reduction occurs during its tone compression process. We propose a saturation correction method using inverse compensation in order to recover the saturation reduction in iCAM06. Experimental results show that the proposed method outperforms iCAM06 from the viewpoint of saturation accuracy and rendering preference.

  • Design and Performance of Rate-Compatible Non-binary LDPC Convolutional Codes

    Hironori UCHIKAWA  Kenta KASAI  Kohichi SAKANIWA  

     
    PAPER-Coding Theory

    Vol: E94-A No:11  Page(s): 2135-2143

    In this paper, we present a construction method for non-binary low-density parity-check (LDPC) convolutional codes. Our construction method extends the Felstrom and Zigangirov construction [1] to non-binary LDPC convolutional codes. The rate-compatibility of the non-binary convolutional code is also discussed. The proposed rate-compatible code is designed from a single mother (2,4)-regular non-binary LDPC convolutional code of rate 1/2. Higher-rate codes are produced by puncturing the mother code, and lower-rate codes are produced by multiplicatively repeating the mother code. Simulation results show that non-binary LDPC convolutional codes of rate 1/2 outperform state-of-the-art binary LDPC convolutional codes with comparable constraint bit length. The derived low-rate and high-rate non-binary LDPC convolutional codes also exhibit good decoding performance, without a large gap to the Shannon limits.

  • 2-Adic Complexity of Self-Shrinking Sequence

    Huijuan WANG  Qiaoyan WEN  Jie ZHANG  

     
    LETTER-Cryptography and Information Security

    Vol: E94-A No:11  Page(s): 2462-2465

    This paper studies the 2-adic complexity of the self-shrinking sequence through the relationship between 2-adic integers and binary sequences. Based on the linear complexity and the number of sequences that share the same connection integer, we conclude that the 2-adic complexity of the self-shrinking sequence constructed from a binary m-sequence of order n has a lower bound of 2^(n-2) - 1. Furthermore, it is shown that its 2-adic complexity has a larger lower bound under some circumstances.
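
    For reference, the self-shrinking construction itself reads the m-sequence in pairs (a_2i, a_2i+1) and outputs a_2i+1 whenever a_2i = 1; a small sketch is given below, where the LFSR taps are an illustrative primitive polynomial and no 2-adic complexity computation is attempted:

        def lfsr(taps, state, length):
            # Fibonacci LFSR; with taps from a primitive polynomial it outputs an m-sequence
            out = []
            for _ in range(length):
                out.append(state[-1])
                fb = 0
                for t in taps:
                    fb ^= state[t - 1]
                state = [fb] + state[:-1]
            return out

        def self_shrink(bits):
            # keep a_{2i+1} whenever a_{2i} == 1, discard the pair otherwise
            return [bits[i + 1] for i in range(0, len(bits) - 1, 2) if bits[i] == 1]

        # x^4 + x + 1 is primitive, so this order n = 4 LFSR yields an m-sequence of period 15
        m_seq = lfsr(taps=[4, 1], state=[1, 0, 0, 1], length=30)
        print(self_shrink(m_seq))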

  • PCA-Based Detection Algorithm of Moving Target Buried in Clutter in Doppler Frequency Domain

    Muhammad WAQAS  Shouhei KIDERA  Tetsuo KIRIMOTO  

     
    LETTER-Sensing

    Vol: E94-B No:11  Page(s): 3190-3194

    This letter proposes a novel technique for detecting a target signal buried in clutter using principal component analysis (PCA) for pulse-Doppler radar systems. Conventional detection algorithms are based on fast Fourier transform-constant false alarm rate (FFT-CFAR) approaches. However, the detection task becomes extremely difficult when the Doppler spectrum of the target is completely buried in the spectrum of the clutter. To enhance the detection probability in such situations, the proposed method employs the PCA algorithm, which decomposes the target and clutter signals into uncorrelated components. The performances of the proposed method and the conventional FFT-CFAR based detection method are evaluated in terms of the receiver operating characteristics (ROC) for various signal-to-clutter ratio (SCR) cases. The results of numerical simulations show that the proposed method significantly enhances the detection probability compared with the conventional FFT-CFAR method, especially in low-SCR situations.
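
    A minimal sketch of the general idea, removing the dominant principal components that capture the strong correlated clutter before forming the Doppler spectrum, is given below; the toy data model, the number of removed components, and the absence of a CFAR threshold are illustrative simplifications rather than the letter's exact processing chain:

        import numpy as np

        rng = np.random.default_rng(1)
        pulses, gates = 64, 32
        n = np.arange(pulses)

        # toy slow-time data: strong low-Doppler clutter + a weak target at one range gate + noise
        X = 10 * np.outer(np.exp(2j * np.pi * 0.02 * n), rng.standard_normal(gates))
        X[:, 12] += 0.5 * np.exp(2j * np.pi * 0.21 * n)           # weak moving target
        X += 0.1 * (rng.standard_normal((pulses, gates)) + 1j * rng.standard_normal((pulses, gates)))

        # PCA across range gates: the leading components model the correlated clutter
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        k = 1                                                      # clutter components to remove
        X_res = X - U[:, :k] * S[:k] @ Vt[:k]

        doppler = np.abs(np.fft.fft(X_res, axis=0))                # Doppler spectrum per range gate
        print(np.unravel_index(doppler.argmax(), doppler.shape))   # peak should appear near (bin 13, gate 12)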

  • On the Autocorrelation and Linear Complexity of Some 2p Periodic Quaternary Cyclotomic Sequences over F4

    Pinhui KE  Zheng YANG  Jie ZHANG  

     
    LETTER-Information Theory

    Vol: E94-A No:11  Page(s): 2472-2477

    We determine the autocorrelations of the quaternary sequence over F4 and its modified version introduced by Du et al. [X.N. Du et al., Linear complexity of quaternary sequences generated using generalized cyclotomic classes modulo 2p, IEICE Trans. Fundamentals, vol.E94-A, no.5, pp.1214–1217, 2011]. Furthermore, we reveal a drawback in the aforementioned paper and remark that the proof in the paper by Kim et al. can be simplified.

  • A Ternary Zero-Correlation Zone Sequence Set Having Wide Inter-Subset Zero-Correlation Zone

    Takafumi HAYASHI  Takao MAEDA  Shinya MATSUFUJI  Satoshi OKAWA  

     
    LETTER-Sequence

    Vol: E94-A No:11  Page(s): 2230-2235

    The present paper introduces a novel construction of ternary sequences having a zero-correlation zone. The cross-correlation function and the side-lobes of the auto-correlation function of the proposed sequence set are zero for phase shifts within the zero-correlation zone. The proposed sequence set consists of more than one subset, each having the same member size. The correlation function of sequences from a pair of different subsets, referred to as the inter-subset correlation function, has a wider zero-correlation zone than the correlation function of sequences from the same subset (the intra-subset correlation function). The wide inter-subset zero-correlation zone enables improved performance in applications of the proposed sequence set. The proposed sequence set has a zero-correlation zone for periodic, aperiodic, and odd correlation functions.
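
    As a small illustration of the zero-correlation-zone property itself (not of the proposed construction), the sketch below computes periodic correlations of toy ternary sequences and measures the width of the zone of zero values around the zero shift:

        import numpy as np

        def periodic_corr(a, b):
            # periodic (cyclic) correlation: c[s] = sum_n a[n] * conj(b[(n + s) mod N])
            return np.array([np.sum(a * np.conj(np.roll(b, -s))) for s in range(len(a))])

        def zcz_width(a, b):
            # largest z such that the correlation is zero for all shifts 1..z and -1..-z
            c = periodic_corr(a, b)
            z = 0
            while z + 1 < len(a) and abs(c[z + 1]) < 1e-9 and abs(c[-(z + 1)]) < 1e-9:
                z += 1
            return z

        # toy ternary sequences (entries in {-1, 0, +1}) with an obvious zero-correlation zone
        a = np.array([1, 0, 0, 0, -1, 0, 0, 0])
        b = np.array([1, 0, 0, 0,  1, 0, 0, 0])
        print(zcz_width(a, a), zcz_width(a, b))   # autocorrelation ZCZ width 3; cross-correlation zero at every shift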

  • Low-Complexity Constant Multiplication Based on Trigonometric Identities with Applications to FFTs

    Fahad QURESHI  Oscar GUSTAFSSON  

     
    PAPER-Digital Signal Processing

    Vol: E94-A No:11  Page(s): 2361-2368

    In this work we consider optimized twiddle factor multipliers based on shift-and-add multiplication. We propose a low-complexity structure for twiddle factors with a resolution of 32 points. Furthermore, we propose a slightly modified version of a previously reported multiplier for a resolution of 16 points with lower round-off noise. For completeness, we also include results on optimal coefficients for an eight-point resolution. We perform finite word length analysis for both coefficients and round-off errors, and derive optimized coefficients with minimum complexity for varying requirements.
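
    The basic shift-and-add idea is that a fixed-point constant multiplication is replaced by a few additions and subtractions of shifted copies of the input; the example below multiplies by 181 = round(cos(pi/4) * 2^8), a typical fixed-point twiddle-factor value, and is an illustration only (the coefficients and adder structures optimized in the paper are not reproduced):

        import math

        def mult_by_181(x):
            # y = 181 * x using shifts and adds only:
            # 181 = 128 + 64 - 16 + 4 + 1 (a signed-digit decomposition)
            return (x << 7) + (x << 6) - (x << 4) + (x << 2) + x

        # 181 = round(cos(pi/4) * 2**8), i.e. an 8-bit fixed-point twiddle-factor coefficient
        print(round(math.cos(math.pi / 4) * 2**8), mult_by_181(3) == 181 * 3)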

  • A User Scheduling with Minimum-Rate Requirement for Maximum Sum-Rate in MIMO-BC

    Seungkyu CHOI  Chungyong LEE  

     
    LETTER-Wireless Communication Technologies

    Vol: E94-B No:11  Page(s): 3179-3182

    This letter considers a sum-rate maximization problem with user scheduling wherein each user has a minimum-rate requirement in the multiple-input multiple-output broadcast channel. The multiuser strategy used in the user scheduling is a joint transceiver scheme with block diagonal geometric mean decomposition. Since the optimum solution to the user scheduling problem generally requires an exhaustive search, we propose a suboptimum user scheduling algorithm with each user's minimum-rate requirement as the main constraint. In order to satisfy the maximum sum-rate and minimum-rate constraints simultaneously, we additionally consider power allocation for the scheduled users. Simulation results show that the proposed user scheduling algorithm, together with the user power allocation, achieves a sum-rate close to that of the exhaustive search while also guaranteeing each user's minimum-rate requirement.
