IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SI(16314hit)

2841-2860hit(16314hit)

Insufficient Vectorization: A New Method to Exploit Superword Level Parallelism
Wei GAO Lin HAN Rongcai ZHAO Yingying LI Jian LIU

PAPER-Software System

Publicized:
2016/09/29
Vol:
E100-D No:1
Page(s):
91-106
Single-instruction multiple-data (SIMD) extensions provide an energy-efficient platform for scaling the performance of media and scientific applications while retaining post-programmability. The major challenge, however, is translating the parallel resources of SIMD hardware into real application performance. Currently, when compilers exploit the SIMD parallelism of programs, they use all the slots of the vector register; this can be called sufficient vectorization, meaning that all the data in the vector register is valid. Because sufficient vectorization requires every slot of the vector register to be used, it abandons opportunities to vectorize programs with low SIMD parallelism. In addition, the speedup obtained by fully using the vector register is sometimes smaller than that obtained by using it partially. In particular, as the vector registers provided by SIMD extensions become longer, sufficient vectorization cannot fully exploit the SIMD parallelism of programs. Therefore, an insufficient vectorization method, which refers to the partial use of the vector register, is proposed. First, the scenarios to which insufficient vectorization applies are analyzed. Second, methods for computing the inter-iteration and intra-iteration SIMD parallelism of loops are put forward. Furthermore, a method for choosing the vectorization method according to the relationship between the parallelism and the vector factor is established, so that programs are vectorized as fully as possible. Finally, a code generation strategy for insufficient vectorization is presented. Benchmark test results show that the insufficient vectorization method vectorizes 107.5% more programs than the sufficient vectorization method, and that the performance it achieves is 12.1% higher.
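The choice between full and partial use of the vector register can be illustrated with a small Python sketch. It assumes, as a simplification, that the inter-iteration SIMD parallelism of a loop equals its smallest loop-carried dependence distance, and the three-way rule in the hypothetical `choose_vectorization` is a stand-in for the paper's selection method, not a reproduction of it.

```python
import math

def inter_iteration_parallelism(dependence_distances):
    """Simplified bound (assumption): at most d consecutive iterations can run
    in parallel when the smallest loop-carried dependence distance is d;
    unbounded if the loop has no loop-carried dependences."""
    return min(dependence_distances, default=math.inf)

def choose_vectorization(parallelism, vector_factor):
    """Hypothetical decision rule: fill the whole register when the loop has
    enough parallelism (sufficient), use only `parallelism` lanes when it has
    fewer (insufficient), and stay scalar when fewer than two lanes help."""
    if parallelism >= vector_factor:
        return ("sufficient", vector_factor)
    if parallelism >= 2:
        return ("insufficient", int(parallelism))
    return ("scalar", 1)

# a[i] = a[i - 2] * k carries a dependence of distance 2; a 128-bit register
# holds four 32-bit lanes, but only two of them can be filled safely.
p = inter_iteration_parallelism([2])
print(choose_vectorization(p, vector_factor=4))  # ('insufficient', 2)
```

A practical selector would also consult a cost model, since the abstract notes that partial use of the register can sometimes outperform full use.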
Related-Key Attacks on Reduced-Round Hierocrypt-L1
Bungo TAGA Shiho MORIAI Kazumaro AOKI

PAPER

Vol:
E100-A No:1
Page(s):
126-137
In this paper, we present several cryptanalyses of the Hierocrypt-L1 block cipher, which was selected as one of the CRYPTREC recommended ciphers in Japan in 2003. We present a differential attack and an impossible differential attack on 8 S-function layers in a related-key setting. We first show that there exist key scheduling differential characteristics that always hold; we then use these key differentials to search for differential paths in the data randomizing part with the minimum number of active S-boxes. We also show that our impossible differential attack is of a new type.
Power Analysis on Unrolled Architecture with Points-of-Interest Search and Its Application to PRINCE Block Cipher
Ville YLI-MÄYRY Naofumi HOMMA Takafumi AOKI

PAPER

Vol:
E100-A No:1
Page(s):
149-157
This paper explores the feasibility of power analysis attacks against low-latency block ciphers implemented with unrolled architectures capable of encryption/decryption in a single clock cycle. Unrolled architectures have been expected to be somewhat more resistant to side-channel attacks than typical loop architectures, because they have no memory (i.e., register) elements that store intermediate results synchronously. In this paper, we present a systematic method for selecting Points-of-Interest for power analysis on unrolled architectures, as well as for calculating the dynamic power consumption at a target function. We then apply the proposed method to PRINCE, which is known as one of the most efficient low-latency ciphers, and evaluate its validity experimentally using a set of unrolled PRINCE processors implemented on an FPGA. Finally, a countermeasure against such analysis is discussed.
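A common, generic way to rank candidate Points-of-Interest is to correlate each time sample with a hypothetical power model (for example, the Hamming distance of an intermediate value) across many traces and keep the most strongly correlated samples. The Python sketch below shows only that generic ranking step; the function names and toy data are assumptions for illustration, not the systematic selection method the paper proposes for unrolled architectures.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def select_poi(traces, model, k=1):
    """Hypothetical helper: rank time samples by |correlation| between the
    sample values across traces and the per-trace power model; keep the top k."""
    n_samples = len(traces[0])
    scores = [abs(pearson([trace[t] for trace in traces], model))
              for t in range(n_samples)]
    return sorted(range(n_samples), key=lambda t: scores[t], reverse=True)[:k]

# Toy data: sample 1 follows the model exactly, sample 2 is constant.
model = [1, 3, 2, 4]
traces = [[0.5, 1.0, 9.0], [0.4, 3.0, 9.0], [0.6, 2.0, 9.0], [0.5, 4.0, 9.0]]
print(select_poi(traces, model, k=2))  # [1, 0]
```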
Perfect Gaussian Integer Sequences of Degree-4 Using Difference Sets
Xiuping PENG Jiadong REN Chengqian XU Kai LIU

LETTER-Spread Spectrum Technologies and Applications

Vol:
E99-A No:12
Page(s):
2604-2608
In this letter, based on cyclic difference sets with parameters $(N,\frac{N-1}{2},\frac{N-3}{4})$ and complex transformations, a method is presented for constructing degree-4 perfect Gaussian integer sequences (PGISs) of length $N'\equiv 2 \pmod{4}$ with a good balance property. Furthermore, the distribution of the elements of the proposed Gaussian integer sequences (GISs) is derived.
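The construction rests on two checkable properties: the cyclic difference set condition and perfect (zero off-peak) periodic autocorrelation. The Python sketch below verifies both; it does not reproduce the letter's construction. The example sets are the quadratic residues modulo 7 and 11, which are well-known $(N,\frac{N-1}{2},\frac{N-3}{4})$ difference sets for $N=7$ and $N=11$, and Gaussian integers are represented as Python complex numbers with integer parts.

```python
def is_cyclic_difference_set(D, N):
    """Return lambda if every nonzero residue mod N occurs exactly lambda
    times as a difference a - b of distinct elements of D, else None."""
    counts = [0] * N
    for a in D:
        for b in D:
            if a != b:
                counts[(a - b) % N] += 1
    nonzero = counts[1:]
    return nonzero[0] if len(set(nonzero)) == 1 else None

def periodic_autocorrelation(s, tau):
    """R(tau) = sum_n s[n] * conj(s[(n + tau) mod N])."""
    N = len(s)
    return sum(s[n] * s[(n + tau) % N].conjugate() for n in range(N))

def is_perfect(s):
    """A sequence is perfect if R(tau) = 0 for every nonzero shift."""
    return all(periodic_autocorrelation(s, tau) == 0 for tau in range(1, len(s)))

print(is_cyclic_difference_set({1, 2, 4}, 7))  # 1, i.e. a (7, 3, 1) difference set
print(is_perfect([1, 1j]))                     # True
```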
Multi-Track Joint Decoding Schemes Using Two-Dimensional Run-Length Limited Codes for Bit-Patterned Media Magnetic Recording
Hidetoshi SAITO

PAPER-Signal Processing for Storage

Vol:
E99-A No:12
Page(s):
2248-2255
This paper proposes an effective signal processing scheme using a modulation code with two-dimensional (2D) run-length limited (RLL) constraints for bit-patterned media magnetic recording (BPMR). This 2D signal processing scheme is applied as one of the two-dimensional magnetic recording (TDMR) schemes for shingled magnetic recording on bit-patterned media (BPM). TDMR has been identified as a key technology for increasing areal density toward 10 Tb/in². From the viewpoint of 2D signal processing for TDMR, a multi-track joint decoding scheme is desirable for increasing the effective transfer rate, because it obtains readback signals from several adjacent parallel tracks and detects the data recorded on these tracks simultaneously. Specifically, the proposed signal processing scheme for BPMR obtains mixed readback signal sequences from the parallel tracks using a single reading head, and these sequences are equalized to the frequency response of a desired 2D generalized partial response system. In the decoding process, the effective transfer rate is increased by using a single maximum likelihood (ML) sequence detector, because the data recorded on the parallel tracks are decoded in each time slot. Furthermore, a new joint pattern-dependent noise-predictive (PDNP) sequence detection scheme is investigated for multi-track recording with media noise. This joint PDNP detection is embedded in an ML detector and can be useful for eliminating media noise. Computer simulations show that the joint PDNP detection scheme can compensate for the correlated, data-dependent media noise in the equalizer output.
Analytical Stability Modeling for CMOS Latches in Low Voltage Operation
Tatsuya KAMAKARI Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA

PAPER

Vol:
E99-A No:12
Page(s):
2463-2472
In synchronous LSI circuits, memory subsystems such as flip-flops and SRAMs are essential components, and latches are the basic elements of common memory logic. In this paper, a stability analysis method for latches operating in the low-voltage region is proposed. The butterfly curve of a latch is key to analyzing its retention failure. This paper discusses a modeling method for retention stability and derives an analytical stability model for latches. Using the proposed model, the minimum supply voltage at which latches operate with a given yield can be derived accurately by a simple calculation. Monte Carlo simulations with 65nm and 28nm process technology models demonstrate the accuracy and validity of the proposed method, as do measurement results from a test chip fabricated in a 65nm process technology. Based on the model, this paper presents some strategies for variation-tolerant latch design.
Reliability-Security Tradeoff for Secure Transmission with Untrusted Relays
Dechuan CHEN Weiwei YANG Jianwei HU Yueming CAI Xin LIU

LETTER-Communication Theory and Signals

Vol:
E99-A No:12
Page(s):
2597-2599
In this paper, we identify the tradeoff between security and reliability in an amplify-and-forward (AF) distributed beamforming (DBF) cooperative network with K untrusted relays. In particular, we derive closed-form expressions for the connection outage probability (COP), the secrecy outage probability (SOP), the tradeoff relationship, and the secrecy throughput. Analytical and simulation results demonstrate that increasing K improves reliability but degrades security. This tradeoff also implies that there exists an optimal K that maximizes the secrecy throughput.
Synthesis and Automatic Layout of Resistive Digital-to-Analog Converter Based on Mixed-Signal Slice Cell
Mitsutoshi SUGAWARA Kenji MORI Zule XU Masaya MIYAHARA Kenichi OKADA Akira MATSUZAWA

PAPER

Vol:
E99-A No:12
Page(s):
2435-2443
We propose a synthesis and automatic layout method for highly regular mixed-signal circuits. As the first step of this research, a resistive digital-to-analog converter (RDAC) is presented. A size calculation routine minimizes the area of the RDAC while satisfying the required matching precision, without any optimization loops. We propose partitioning the design into slices comprising both analog and digital cells. Based on the above calculation, these cells are programmatically synthesized in a manner similar to custom P-Cells and automatically laid out to form one slice cell. To synthesize digital circuits without a digital standard cell library, we propose a versatile unit digital block consisting of 8 transistors; within one or several such blocks, the transistor interconnections are programmed to realize various logic gates. Using this block aligns the slice shapes so that the layout space between slices is minimized. The proposed mixed-signal slice-based partitioning facilitates the place-and-route of the whole RDAC. Post-layout simulation shows that the generated 9-bit RDAC achieves a 1GHz sampling frequency, DNL and INL of -0.11/0.09 and -0.30/0.75, respectively, 3.57mW power consumption, and a 0.0038mm² active area.
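DNL and INL figures like those quoted above are computed from the measured output level of each input code. The Python sketch below uses the endpoint definition (the LSB is the full-scale span divided by the number of code steps), which is an assumption: the abstract does not state which definition was used, and a best-fit line is another common choice.

```python
def dnl_inl(levels):
    """Endpoint-fit DNL and INL, in LSB, from DAC output levels ordered by
    input code. Assumes at least two levels and a non-zero full-scale span."""
    n = len(levels)
    lsb = (levels[-1] - levels[0]) / (n - 1)               # ideal step size
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1 for k in range(n - 1)]
    inl = [(levels[k] - levels[0]) / lsb - k for k in range(n)]
    return dnl, inl

# Code 2 overshoots by one step and code 3 repeats its level.
print(dnl_inl([0, 1, 3, 3]))  # ([0.0, 1.0, -1.0], [0.0, 0.0, 1.0, 0.0])
```

Pairs such as -0.11/0.09 in the abstract are presumably the minimum and maximum over all codes of lists like these.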
Up-Stream Dispatching of Power by Density of Power Packet
Shinya NAWATA Ryo TAKAHASHI Takashi HIKIHARA

LETTER-Systems and Control

Vol:
E99-A No:12
Page(s):
2581-2584
A power packet is a unit of electric power transferred by a pulse with an information tag. This letter discusses up-stream dispatching of the power required at loads to sources through density modulation of power packets. Power is adjusted at a proposed router, which dispatches power packets according to their tags. The scheme is analyzed by the averaging method and verified numerically.
A New Algorithm for Reducing Components of a Gaussian Mixture Model
Naoya YOKOYAMA Daiki AZUMA Shuji TSUKIYAMA Masahiro FUKUI

PAPER

Vol:
E99-A No:12
Page(s):
2425-2434
In statistical methods such as statistical static timing analysis, the Gaussian mixture model (GMM) is a useful tool for representing non-Gaussian distributions and handling correlation easily. To repeat various statistical operations, such as summation and maximum, on GMMs efficiently, the number of components should be restricted to about two. In this paper, we propose a method for reducing the number of components of a given GMM to two (2-GMM). Moreover, since the distribution of each component is often represented by a linear combination of explanatory variables, we propose a method for computing the covariance between each explanatory variable and the obtained 2-GMM, that is, the sensitivity of the 2-GMM to each explanatory variable. Experimental results are presented to evaluate the performance of the proposed methods. The proposed methods minimize the normalized integral square error of the probability density function of the 2-GMM at the expense of the accuracy of its sensitivities.
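For intuition about the error measure named above, the Python sketch below computes, in closed form, the integral square error between two one-dimensional GMMs, normalized by the sum of their squared L2 norms (a common definition of normalized ISE, assumed here), and merges components by matching the mixture's weight, mean, and variance. Moment-preserving merging is a standard baseline for mixture reduction; it is not the authors' ISE-minimizing algorithm, and the sensitivity computation is not covered.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def cross_term(f, g):
    """Integral of f(x) g(x) dx for 1-D GMMs given as (weight, mean, variance)
    lists: the product of two Gaussians integrates to N(m1; m2, v1 + v2)."""
    return sum(wf * wg * normal_pdf(mf, mg, vf + vg)
               for wf, mf, vf in f for wg, mg, vg in g)

def nise(f, g):
    """Normalized ISE (assumed definition): int (f - g)^2 / (int f^2 + int g^2),
    which lies in [0, 1]."""
    ff, gg, fg = cross_term(f, f), cross_term(g, g), cross_term(f, g)
    return (ff + gg - 2 * fg) / (ff + gg)

def merge(components):
    """Moment-preserving merge of several components into one Gaussian."""
    w = sum(c[0] for c in components)
    mean = sum(wi * mi for wi, mi, _ in components) / w
    var = sum(wi * (vi + (mi - mean) ** 2) for wi, mi, vi in components) / w
    return (w, mean, var)

gmm = [(0.5, 0.0, 1.0), (0.3, 0.2, 1.2), (0.2, 4.0, 0.5)]
two = [merge(gmm[:2]), gmm[2]]   # merge the two close components, keep the far one
print(nise(gmm, two))
```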
Asymptotic Behavior of Error Probability in Continuous-Time Gaussian Channels with Feedback
Shunsuke IHARA

PAPER-Shannon Theory

Vol:
E99-A No:12
Page(s):
2107-2115
We investigate coding schemes and the error probability of information transmission over continuous-time additive Gaussian noise channels with feedback. It is known that feedback can substantially reduce the error probability: under an average power constraint, the error probability may decrease more rapidly than an exponential of any order. Recently, Gallager and Nakiboğlu proposed, for discrete-time additive white Gaussian noise channels, a feedback coding scheme whose error probability Pe(N) at time N decreases with an exponential order αN that increases linearly with N. So far, the multiple-exponential decay of the error probability has been studied mostly for white Gaussian channels. In this paper, we treat continuous-time Gaussian channels whose Gaussian noise processes are not necessarily white or stationary. The aim is to prove a stronger result on the multiple-exponential decay of the error probability. More precisely, for any positive constant α, there exists a feedback coding scheme whose error probability Pe(T) at time T decreases more rapidly than the exponential of order αT as T→∞.
Performance Improvement of Error-Resilient 3D DWT Video Transmission Using Invertible Codes
Kotoku OMURA Shoichiro YAMASAKI Tomoko K. MATSUSHIMA Hirokazu TANAKA Miki HASEYAMA

PAPER-Video Coding

Vol:
E99-A No:12
Page(s):
2256-2265
Many studies have applied the three-dimensional discrete wavelet transform (3D DWT) to video coding. It is known that corruption of the lowest-frequency sub-band (LL) coefficients of the 3D DWT severely degrades the visual quality of video. We recently proposed an error-resilient 3D DWT video coding method (the conventional method) that employs dispersive grouping and an error concealment (EC) scheme. The EC scheme of the conventional method replaces lost LL coefficients. In this paper, we propose a new 3D DWT video transmission method that enhances error resilience. The proposed method adopts an error correction scheme that uses invertible codes to protect the LL coefficients; half-rate Reed-Solomon (RS) codes are used as the invertible codes. Additionally, to improve performance by exploiting interleaving, we adopt a new configuration scheme at the RS encoding stage. Computer simulations comparing the proposed method with other EC methods indicate its advantage.
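The benefit of interleaving can be illustrated with a plain block interleaver: symbols are written row by row, with each row standing for one codeword (an assumption for illustration), and read out column by column, so a burst of channel errors no longer than the number of rows touches each codeword at most once. The Python sketch below is a generic interleaver, not the new RS configuration scheme proposed in the paper.

```python
def interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols array (each row standing
    for one codeword, by assumption) and read them out column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("need exactly rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): stream position i = c * rows + r."""
    if len(symbols) != rows * cols:
        raise ValueError("need exactly rows * cols symbols")
    out = [None] * (rows * cols)
    for i, s in enumerate(symbols):
        c, r = divmod(i, rows)
        out[r * cols + c] = s
    return out

print(interleave(list(range(12)), rows=3, cols=4))  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

A burst hitting the first three transmitted symbols (0, 4, 8) corrupts one symbol in each of the three rows instead of three symbols in one row.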
Average Coding Rate of a Multi-Shot Tunstall Code with an Arbitrary Parsing Tree Sequence
Mitsuharu ARIMURA

LETTER-Source Coding and Data Compression

Vol:
E99-A No:12
Page(s):
2281-2285
The average coding rate of a multi-shot Tunstall code, a variant of variable-to-fixed length (VF) lossless source codes, is investigated for stationary memoryless sources. A multi-shot VF code parses a given source sequence into variable-length blocks and encodes them into fixed-length codewords. If the parsing count is fixed, the overall multi-shot VF code can be treated as a one-shot VF code. For this setting of the Tunstall code, the compression performance is evaluated using two criteria. The first is the average coding rate, defined as the codeword length divided by the average block length. The second is the expectation of the pointwise coding rate. It is proved that both of these rates converge to the entropy of a stationary memoryless source under the assumption that the geometric mean of the leaf counts of the multi-shot Tunstall parsing trees goes to infinity.
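Both criteria can be computed directly for a single one-shot Tunstall tree. The Python sketch below builds a Tunstall tree for a memoryless source by repeatedly splitting the most probable leaf while the leaf count still fits within 2^n codewords of n bits, then evaluates the average coding rate (codeword length over average block length) and the expected pointwise coding rate. The single-tree setting and the function names are simplifying assumptions; the paper's arbitrary parsing tree sequences are not modeled.

```python
import heapq
import math

def tunstall_leaves(probs, n_bits):
    """Tunstall parsing tree for a memoryless source: repeatedly split the
    most probable leaf while the leaf count still fits in 2**n_bits codewords.
    Returns (probability, depth) for every leaf. Assumes len(probs) <= 2**n_bits."""
    k = len(probs)
    max_leaves = 2 ** n_bits
    heap = [(-p, 1) for p in probs]          # max-heap on probability via negation
    heapq.heapify(heap)
    while len(heap) + k - 1 <= max_leaves:    # one split adds k - 1 leaves
        neg_p, depth = heapq.heappop(heap)
        for q in probs:
            heapq.heappush(heap, (neg_p * q, depth + 1))
    return [(-neg_p, depth) for neg_p, depth in heap]

def coding_rates(probs, n_bits):
    """(average coding rate, expected pointwise coding rate) in bits/symbol."""
    leaves = tunstall_leaves(probs, n_bits)
    avg_block_len = sum(p * d for p, d in leaves)
    average_rate = n_bits / avg_block_len
    pointwise_rate = sum(p * n_bits / d for p, d in leaves)
    return average_rate, pointwise_rate

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

for n in (4, 8, 12):
    print(n, coding_rates([0.9, 0.1], n), entropy([0.9, 0.1]))
```

For a skewed source such as (0.9, 0.1), both rates stay at or above the entropy; the pointwise rate is never below the average rate, by Jensen's inequality.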
Range Limiter Using Connection Bounding Box for SA-Based Placement of Mixed-Grained Reconfigurable Architecture
Takashi KISHIMOTO Wataru TAKAHASHI Kazutoshi WAKABAYASHI Hiroyuki OCHI

PAPER

Vol:
E99-A No:12
Page(s):
2328-2334
In this paper, we propose a novel placement algorithm for mixed-grained reconfigurable architectures (MGRAs). An MGRA consists of coarse-grained and fine-grained clusters in order to implement combined digital systems of high-speed data paths with multi-bit operands and random logic circuits for state machines and bit-wise operations. To accelerate simulated annealing (SA) based FPGA placement, the range limiter was proposed to control the distance between two blocks to be interchanged. However, it is not applicable to MGRAs because of their heterogeneous structure. The proposed range limiter, which uses a connection bounding box, effectively maintains the size of the range limit so as to encourage moves across fine-grained blocks in non-adjacent clusters. In experiments, the proposed method reduced cost by 47.8% in the best case compared with conventional methods.
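For background, the conventional range limiter in SA-based FPGA placement is usually updated with a VPR-style rule that scales the window by (1 - 0.44 + acceptance rate), keeping roughly 44% of moves accepted. The connection-bounding-box window in the Python sketch below is only one hypothetical reading of the abstract (the window spans the blocks connected to the moving block, widened by the range limit); the paper's exact definition may differ.

```python
def update_range_limit(r_limit, accept_rate, r_max, target=0.44):
    """VPR-style range-limit update: widen the move window when many moves
    are accepted, narrow it when few are, clamped to [1, r_max]."""
    new_r = r_limit * (1.0 - target + accept_rate)
    return min(max(new_r, 1.0), r_max)

def connection_bbox(positions):
    """Hypothetical helper: bounding box (x_min, y_min, x_max, y_max) of the
    (x, y) slots of the blocks connected to the moving block."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return min(xs), min(ys), max(xs), max(ys)

def in_move_window(candidate, bbox, r_limit):
    """Hypothetical window test: accept swap targets inside the connection
    bounding box widened by the current range limit."""
    x, y = candidate
    x0, y0, x1, y1 = bbox
    return (x0 - r_limit <= x <= x1 + r_limit) and (y0 - r_limit <= y <= y1 + r_limit)

box = connection_bbox([(2, 3), (5, 1), (4, 4)])
print(box)                               # (2, 1, 5, 4)
print(in_move_window((6, 2), box, 1.0))  # True
```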
A Highly-Adaptable and Small-Sized In-Field Power Analyzer for Low-Power IoT Devices
Ryosuke KITAYAMA Takashi TAKENAKA Masao YANAGISAWA Nozomu TOGAWA

PAPER

Vol:
E99-A No:12
Page(s):
2348-2362
Power analysis of IoT devices is strongly required to protect them against malicious attackers. It is also very important to reduce the power consumption of IoT devices themselves. In this paper, we propose a highly adaptable and small-sized in-field power analyzer for low-power IoT devices. The proposed power analyzer has the following advantages: (A) it realizes signal-averaging noise reduction with synchronization signal lines and can thus reduce noise over a wide frequency range; (B) it partitions a long-term power analysis process into several analysis segments and measures the voltages and currents of each segment using a small amount of data memory, and combining these segments yields long-term analysis results; (C) it has two amplifiers that amplify current signals adaptively depending on their magnitude, so the maximum readable current can be increased while keeping the minimum readable current small enough. Since none of (A), (B), and (C) requires complicated mechanisms or circuits, the proposed power analyzer is implemented on a board of just 2.5cm×3.3cm, the smallest among existing power analyzers for IoT devices. We measured the power and energy consumption of the AES encryption process on an IoT device and demonstrated that the proposed power analyzer has measurement errors of at most 1.17% compared with a high-precision oscilloscope.
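Items (A) and (C) can be sketched in a few lines of Python: averaging traces that all start at the same synchronization edge, and preferring the high-gain current channel unless it saturates. The gains, ADC full-scale code, and function names are hypothetical; the analyzer's actual firmware is not described in the abstract.

```python
def synchronized_average(traces):
    """Average equal-length traces captured from the same synchronization edge.
    Uncorrelated noise amplitude falls roughly as 1/sqrt(len(traces))."""
    n = len(traces)
    return [sum(samples) / n for samples in zip(*traces)]

def pick_current(high_code, low_code, high_gain, low_gain, full_scale):
    """Hypothetical dual-amplifier readout: use the high-gain channel for
    resolution unless its ADC code has saturated, else the low-gain one.
    Gains are ADC codes per unit current (assumed units)."""
    if high_code < full_scale:
        return high_code / high_gain
    return low_code / low_gain

print(synchronized_average([[1, 2, 3], [3, 2, 1]]))  # [2.0, 2.0, 2.0]
print(pick_current(4095, 60, 100.0, 1.0, 4095))      # 60.0 (high-gain channel saturated)
```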
Low Complexity Reed-Solomon Decoder Design with Pipelined Recursive Euclidean Algorithm
Kazuhito ITO

PAPER

Vol:
E99-A No:12
Page(s):
2453-2462
A Reed-Solomon (RS) decoder is designed based on the pipelined recursive Euclidean algorithm for solving the key equation. Although the Euclidean algorithm uses fewer Galois multipliers than the modified Euclidean (ME) and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, it requires division between two elements of the Galois field. By implementing the division with a multi-cycle Galois inverter and a serial Galois multiplier, the proposed key equation solver architecture achieves lower complexity than the conventional ME- and RiBM-based architectures. The proposed RS(255,239) decoder reduces hardware complexity by 25.9% with a 6.5% increase in decoding latency.
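The division step can be pictured as an inversion followed by a multiplication. In the Python sketch below, `gf_mul` is a bit-serial shift-and-add multiplier over GF(2^8), and `gf_inv` computes a^254 = a^(-1) by repeated squaring, which is one way a multi-cycle inverter can reuse a single multiplier. The field polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption; the abstract does not name one, and the paper's inverter design may differ.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Bit-serial (shift-and-add) multiplication of two GF(2^8) elements."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # add (XOR) the current shifted multiplicand
        b >>= 1
        a <<= 1
        if a & 0x100:            # reduce modulo the field polynomial
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    """Square-and-multiply exponentiation built only on gf_mul."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def gf_inv(a):
    """a^254 is the inverse of a, since every nonzero a satisfies a^255 = 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)

def gf_div(a, b):
    return gf_mul(a, gf_inv(b))

print(gf_div(gf_mul(7, 9), 9))  # 7
```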
Hardware-Efficient Local Extrema Detection for Scale-Space Extrema Detection in SIFT Algorithm
Kazuhito ITO Hiroki HAYASHI

LETTER

Vol:
E99-A No:12
Page(s):
2507-2510
In this paper, a hardware-efficient local extrema detection (LED) method for scale-space extrema detection in the SIFT algorithm is proposed. By reformulating the reuse of intermediate results when taking the local maximum and minimum, the operations required for LED are reduced without degrading detection accuracy. When implemented in an FPGA, the proposed method requires 25% to 35% fewer logic resources than the conventional method, with a slight increase in latency.
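One way to see how intermediate results can be reused: the 3x3x3 maximum (or minimum) is separable, so each DoG plane can be filtered along rows and then along columns, and a plane's 3x3 result can be shared by the neighboring scales that also use that plane. The Python sketch below uses this generic separable reuse, which may differ from the paper's reformulation, and it uses a non-strict comparison (a pixel equal to its neighborhood maximum is flagged), whereas SIFT normally requires a strict extremum.

```python
def _filter3(img, op):
    """Apply a 3-wide `op` (max or min) along rows, then along columns.
    Border pixels use the truncated window."""
    h, w = len(img), len(img[0])
    rows = [[op(img[y][max(x - 1, 0):x + 2]) for x in range(w)] for y in range(h)]
    return [[op(rows[yy][x] for yy in range(max(y - 1, 0), min(y + 2, h)))
             for x in range(w)] for y in range(h)]

def local_extrema(below, mid, above):
    """Flag interior pixels of `mid` that are >= (or <=) all 26 neighbors in
    the 3x3x3 DoG neighborhood (non-strict: plateaus are also flagged)."""
    h, w = len(mid), len(mid[0])
    mx = [_filter3(p, max) for p in (below, mid, above)]  # reusable per plane
    mn = [_filter3(p, min) for p in (below, mid, above)]
    found = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = mid[y][x]
            if v == max(m[y][x] for m in mx):
                found.append((y, x, "max"))
            elif v == min(m[y][x] for m in mn):
                found.append((y, x, "min"))
    return found

zeros = [[0] * 3 for _ in range(3)]
peak = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
print(local_extrema(zeros, peak, zeros))  # [(1, 1, 'max')]
```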
A Deep Neural Network Based Quasi-Linear Kernel for Support Vector Machines
Weite LI Bo ZHOU Benhui CHEN Jinglu HU

PAPER-Neural Networks and Bioengineering

Vol:
E99-A No:12
Page(s):
2558-2565
This paper proposes a deep quasi-linear kernel for support vector machines (SVMs). The deep quasi-linear kernel can be constructed by using a pre-trained deep neural network. To realize this goal, a multilayer gated bilinear classifier is first designed to mimic the functionality of the pre-trained deep neural network, by generating the gate control signals using the deep neural network. Then, a deep quasi-linear kernel is derived by applying an SVM formulation to the multilayer gated bilinear classifier. In this way, we are able to further implicitly optimize the parameters of the multilayer gated bilinear classifier, which are a set of duplicate but independent parameters of the pre-trained deep neural network, by using an SVM optimization. Experimental results on different data sets show that SVMs with the proposed deep quasi-linear kernel have an ability to take advantage of the pre-trained deep neural networks and outperform SVMs with RBF kernels.
Signal Power Estimation Based on Orthogonal Projection and Oblique Projection
Norisato SUGA Toshihiro FURUKAWA

LETTER-Digital Signal Processing

Vol:
E99-A No:12
Page(s):
2571-2575
In this letter, we present a new signal power estimation method based on subspace projection. This work mainly contributes to the SINR estimation problem, in which signal power estimation is performed implicitly or explicitly. Our method differs from conventional methods on this topic in that it exploits the subspace character of the signals that make up the observed signal. As tools for performing subspace operations, we apply orthogonal projection and oblique projection, which can extract the desired parameters. In the proposed scheme, the statistics of the observed signal projected by these projections are used to estimate the parameters.
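For reference, the standard projectors involved can be written as follows, assuming an observation model $y = H\theta + S\phi + n$ with desired-signal subspace spanned by the columns of $H$, interference subspace spanned by the columns of $S$, and $H^{\mathsf{H}}P_S^{\perp}H$ invertible (notation assumed, not taken from the letter). The oblique projector $E_{HS}$ passes the columns of $H$ and annihilates those of $S$, so the projected observation contains only the desired signal plus projected noise.

```latex
\begin{aligned}
P_{H} &= H\,(H^{\mathsf{H}}H)^{-1}H^{\mathsf{H}}, \qquad
P_{S}^{\perp} = I - S\,(S^{\mathsf{H}}S)^{-1}S^{\mathsf{H}},\\
E_{HS} &= H\,(H^{\mathsf{H}}P_{S}^{\perp}H)^{-1}H^{\mathsf{H}}P_{S}^{\perp},
\qquad E_{HS}H = H, \quad E_{HS}S = 0,\\
E_{HS}\,y &= E_{HS}(H\theta + S\phi + n) = H\theta + E_{HS}\,n .
\end{aligned}
```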
GPU-Accelerated Bulk Execution of Multiple-Length Multiplication with Warp-Synchronous Programming Technique
Takumi HONDA Yasuaki ITO Koji NAKANO

PAPER-GPU computing

Publicized:
2016/08/24
Vol:
E99-D No:12
Page(s):
3004-3012
In this paper, we present a GPU implementation of bulk multiple-length multiplications. The idea of our GPU implementation is to adopt a warp-synchronous programming technique: we assign each multiple-length multiplication to one warp, which consists of 32 threads. In parallel processing with multiple threads, synchronizing the execution of threads and communicating between them are usually costly. With the warp-synchronous programming technique, however, the execution of threads in a warp can be synchronized instruction by instruction without any barrier synchronization operations. In addition, inter-thread communication can be performed with warp shuffle functions, without accessing shared memory. Experimental results show that our GPU implementation on an NVIDIA GeForce GTX 980 attains a speed-up factor of 52 for 1024-bit multiple-length multiplication over the sequential CPU implementation. Moreover, we use this 1024-bit multiple-length multiplication as a subroutine for larger bit lengths; the GPU implementation attains a speed-up factor of 21 for 65536-bit multiple-length multiplication.
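The per-thread work split can be emulated sequentially in Python. In the sketch below, a 1024-bit operand is held as 32 limbs of 32 bits, one per thread of a warp, and each hypothetical lane i accumulates product columns i and i + 32, which gives every lane exactly 32 limb products. This assignment is an illustrative assumption rather than the authors' kernel, and the sequential carry loop stands in for the inter-thread communication that warp shuffle functions would provide on a GPU.

```python
import random

LIMB_BITS = 32
WARP_SIZE = 32                      # one lane per 32-bit limb: 32 x 32 = 1024 bits
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x, n=WARP_SIZE):
    """Split a non-negative integer into n little-endian 32-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def warp_style_multiply(a, b):
    """Sequential emulation of a 1024-bit x 1024-bit product in which the
    hypothetical lane i accumulates product columns i and i + 32.
    Lane i performs (i + 1) + (31 - i) = 32 limb products, so work is balanced."""
    A, B = to_limbs(a), to_limbs(b)
    n = WARP_SIZE
    columns = [0] * (2 * n)
    for lane in range(n):
        columns[lane] = sum(A[j] * B[lane - j] for j in range(lane + 1))
        columns[lane + n] = sum(A[j] * B[lane + n - j] for j in range(lane + 1, n))
    # Carry propagation: sequential here, inter-lane communication on a GPU.
    limbs, carry = [], 0
    for column in columns:
        total = column + carry
        limbs.append(total & MASK)
        carry = total >> LIMB_BITS
    return from_limbs(limbs) + (carry << (LIMB_BITS * 2 * n))

rng = random.Random(2016)
a, b = rng.getrandbits(1024), rng.getrandbits(1024)
print(warp_style_multiply(a, b) == a * b)  # True
```

Python integers hide one hardware detail: each column sum can exceed 64 bits (up to 32 products of two 32-bit limbs), so a GPU version needs a wider accumulator per column.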
