IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SI(16314hit)

4361-4380hit(16314hit)

Simulating Cardiac Electrophysiology in the Era of GPU-Cluster Computing
Jun CHAI Mei WEN Nan WU Dafei HUANG Jing YANG Xing CAI Chunyuan ZHANG Qianming YANG

PAPER

Vol: E96-D No:12 Page(s): 2587-2595
This paper presents a study of the applicability of clusters of GPUs to high-resolution 3D simulations of cardiac electrophysiology. By experimenting with representative cardiac cell models and ODE solvers, in association with solving the monodomain equation, we quantitatively analyze the obtainable computational capacity of GPU clusters. It is found that for a 501×501×101 3D mesh, which entails a 0.1mm spatial resolution, a 128-GPU cluster only needs a few minutes to carry out a 100,000-time-step cardiac excitation simulation that involves a four-variable cell model. Even higher spatial and temporal resolutions are achievable for such simplified mathematical models. On the other hand, our experiments also show that a dramatically larger cluster of GPUs is needed to handle a very detailed cardiac cell model.
Dynamic Spectrum Control Aided Spectrum Sharing with Nonuniform Sampling-Based Channel Sounding
Quang Thang DUONG Shinsuke IBI Seiichi SAMPEI

PAPER-Wireless Communication Technologies

Vol: E96-B No:12 Page(s): 3172-3180
This paper studies channel sounding for selfish dynamic spectrum control (S-DSC) in which each link dynamically maps its spectral components onto a necessary amount of discrete frequencies having the highest channel gain of the common system band. In S-DSC, it is compulsory to conduct channel sounding for the entire system band by using a reference signal whose spectral components are sparsely allocated by S-DSC. Using nonuniform sampling theory, this paper exploits the finite impulse response characteristic of frequency selective fading channels to carry out the channel sounding. However, when the number of spectral components is relatively small compared to the number of discrete frequencies of the system band, reliability of the channel sounding deteriorates severely due to the ill-conditioned problem and degradation in channel capacity of the next frame occurs as a result. Aiming at balancing frequency selection diversity effect and reliability of channel sounding, this paper proposes an S-DSC which allocates an appropriate number of spectral components onto discrete frequencies with low predicted channel gain besides mapping the rest onto those with high predicted channel gain. A numerical analysis confirms that the proposed S-DSC gives significant enhancement in channel capacity performance.
Fourier Analysis of Sequences over a Composition Algebra of the Real Number Field
Takao MAEDA Takafumi HAYASHI

LETTER-Sequence

Vol: E96-A No:12 Page(s): 2452-2456
To analyze the structure of a set of perfect sequences over a composition algebra of the real number field, transforms of a set of sequences similar to the discrete Fourier transform (DFT) are introduced. The discrete cosine transform, discrete sine transform, and generalized discrete Fourier transform (GDFT) of the sequences are defined and the fundamental properties of these transforms are proved. We show that GDFT is bijective and that there exists a relationship between these transforms and a convolution of sequences. Applying these properties to the set of perfect sequences, a parameterization theorem of such sequences is obtained.
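The link between perfect sequences and the DFT can be illustrated in the simplest setting, a real-valued sequence. The sketch below is not taken from the paper: it assumes the standard definition of a perfect sequence (zero periodic autocorrelation at every nonzero shift) and checks the equivalent condition that the DFT power spectrum is flat. The example sequence and the helper names `autocorr` and `dft` are illustrative choices only.

```python
import cmath

def autocorr(seq, shift):
    """Periodic autocorrelation of a real sequence at the given shift."""
    n = len(seq)
    return sum(seq[t] * seq[(t + shift) % n] for t in range(n))

def dft(seq):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(seq)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(seq))
            for k in range(n)]

# Illustrative real sequence of length 4 (an assumption, not from the paper).
seq = [1, 1, 1, -1]

# Perfect: autocorrelation vanishes at every nonzero shift.
out_of_phase = [autocorr(seq, s) for s in range(1, len(seq))]

# Equivalent view: every DFT coefficient has the same power.
power = [abs(X) ** 2 for X in dft(seq)]
```

The paper works over composition algebras of the reals, which is a more general setting than this real-valued toy case.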
A New Delay Distribution Model with a Half Triangular Distribution for Statistical Static Timing Analysis
Shuji TSUKIYAMA Masahiro FUKUI

PAPER-Device and Circuit Modeling and Analysis

Vol: E96-A No:12 Page(s): 2542-2552
Long-term degradation due to aging, such as NBTI (Negative Bias Temperature Instability), is a pressing issue in current circuit design using nanometer process technologies, since it causes delay faults in the field. To resolve the problem, we must estimate the delay variation caused by long-term degradation at the design stage, but overestimation must be avoided to keep timing design tractable. If we can treat such a variation statistically, together with delay variations due to process variability, then we can reduce excess margin in timing design. Moreover, a statistical static timing analyzer that treats process variability and long-term degradation together will help us select an appropriate set of paths on which field testing is conducted to detect delay faults. In this paper, we propose a new delay model with a half triangular distribution, which is introduced for handling a random factor with an unknown distribution, such as long-term degradation. We then show an algorithm for finding the statistical maximum, which is one of the key operations in statistical static timing analysis. We also show a few experimental results demonstrating the effect of the proposed model and algorithm.
Periodic Pattern Coding for Last Level Cache Data Compression
Haruhiko KANEKO

PAPER-Data Compression

Vol: E96-A No:12 Page(s): 2351-2359
In spite of continuous improvement in the computational power of multi/many-core processors, their memory access performance has not improved sufficiently, and thus the overall performance of recent processors is often restricted by the delay of off-chip memory accesses. Low-delay data compression for the last level cache (LLC) would be effective in improving processor performance because the compression increases the effective size of the LLC and thus reduces the number of off-chip memory accesses. This paper proposes a novel data compression method suitable for high-speed parallel decoding in the LLC. Since cache line data often have periodicity of certain lengths, such as 32- or 64-bit instructions, 32-bit integers, and 64-bit floating point numbers, an information word is encoded as a base pattern and a differential pattern between the original word and the base pattern. Evaluation using a GPU simulator shows that the compression ratio of the proposed coding is comparable to LZSS coding and X-Match Pro and superior to other conventional compression algorithms for cache memories. This paper also presents an experimental decoder designed for ASIC, and the synthesized result shows that the decoder can decompress cache line data of length 32 bytes in four clock cycles. Evaluation of the IPC on the GPU simulator shows that, for several benchmark programs, the IPC achieved by the proposed coding is higher than that of the conventional BΔI coding, with a maximum IPC improvement of 20%.
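The idea of splitting a cache line into a base pattern and differential patterns can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's coding: the period is fixed at 4 bytes, the base pattern is simply the first period, the differential is a bytewise XOR, and the names `encode` and `decode` are hypothetical. It only shows why periodic data leaves mostly zero bytes in the differential part, which is what makes the line compressible.

```python
import struct

def encode(line: bytes, period: int = 4):
    """Split a line into periods; return the base and XOR differentials (toy scheme)."""
    chunks = [line[i:i + period] for i in range(0, len(line), period)]
    base = chunks[0]  # assumption: first period serves as the base pattern
    diffs = [bytes(a ^ b for a, b in zip(c, base)) for c in chunks]
    return base, diffs

def decode(base: bytes, diffs):
    """Invert encode(): XOR each differential with the base again."""
    return b"".join(bytes(a ^ b for a, b in zip(d, base)) for d in diffs)

# A 32-byte line holding eight nearby 32-bit integers (little-endian).
line = struct.pack("<8I", *range(1000, 1008))
base, diffs = encode(line)

# Count differential bytes that actually carry information.
nonzero = sum(1 for d in diffs for byte in d if byte)
```

In this example only the low byte of seven of the eight words differs from the base, so most of the differential payload is zero.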
GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems
Fumihiko INO Shinta NAKAGAWA Kenichi HAGIHARA

PAPER

Vol: E96-D No:12 Page(s): 2604-2616
This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.
Effect of Magnetostatic Interactions between the Spin-Torque Oscillator and the SPT Writer on the Oscillation Characteristics of the Spin-Torque Oscillator
Sota ASAKA Takuya HASHIMOTO Kazuetsu YOSHIDA Yasushi KANAI

PAPER

Vol: E96-C No:12 Page(s): 1484-1489
Microwave-assisted magnetic recording (MAMR) has been proposed as a candidate technology to realize areal recording densities of over 2 Tbit/inch². MAMR requires a spin-torque oscillator (STO) to generate a strong high-frequency magnetic field that will induce magnetic resonance in the recording medium. The oscillation characteristics of STOs were previously investigated using a micromagnetic model that neglected the magnetic interaction among the STO, the single-pole-type (SPT) writer, and the recording head. The STO is typically placed in the gap between the main pole and the trailing shield of the SPT writer, so that the STO is inevitably subjected to strong magnetic interaction with the main pole and the trailing shield. We have developed a new simulator, referred to as an integrated MAMR simulator, that takes this interaction into account. The integrated simulator has revealed that the magnetic interaction has a strong influence on the oscillation characteristics.
Performance Evaluation of Non-binary LDPC Coding and Iterative Decoding System for BPM R/W Channel with Write-Errors
Yasuaki NAKAMURA Yoshihiro OKAMOTO Hisashi OSAWA Hajime AOI Hiroaki MURAOKA

PAPER

Vol: E96-C No:12 Page(s): 1497-1503
Bit-patterned medium (BPM) is one of the promising approaches for ultra-high density magnetic recording systems. However, BPM requires precise write synchronization, and exhibits write-errors due to insufficient write field gradient, medium switching field distribution (SFD), demagnetization field from adjacent islands, and island position variation. In this paper, an iterative decoding system using a non-binary low-density parity-check (LDPC) code is considered for a BPM R/W channel with write-errors at an areal recording density of 2 Tbit/inch² including the coding rate loss. The performance of the iterative decoding system using the non-binary LDPC code over the Galois field GF(2^8) is evaluated by computer simulation, and it is compared with the conventional iterative decoding system using a binary LDPC code. The results show that the non-binary LDPC system has a larger write margin than the binary LDPC system.
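A non-binary LDPC code over GF(2^8) operates on 8-bit symbols, so its parity checks rely on finite-field multiplication rather than plain XOR. The sketch below shows shift-and-reduce multiplication in GF(2^8). The abstract does not state which irreducible polynomial the authors use; the default `0x11B` (x^8 + x^4 + x^3 + x + 1, the polynomial used in AES) is an assumption chosen only because its products are easy to check, and `gf256_mul` is a hypothetical helper name.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two GF(2^8) elements; poly is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= poly        # reduce modulo the field polynomial
        b >>= 1
    return result

# Known products for the 0x11B polynomial.
product_1 = gf256_mul(0x57, 0x83)
product_2 = gf256_mul(0x57, 0x13)
```

With a different field polynomial the products change, but the structure of the routine stays the same.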
A Loss-Recovery Scheme for Mixed Unicast and Multicast Traffic Using Network Coding
Zhiheng ZHOU Liang ZHOU Shengqiang LI

PAPER-Wireless Communication Technologies

Vol: E96-B No:12 Page(s): 3116-3123
In wireless networks, providing reliable data transfer is an important and challenging issue because of channel fading and interference. Several approaches, e.g., Automatic Repeat reQuest (ARQ), Hybrid ARQ (HARQ), and Network Coding (NC), are used to enhance the reliability of transmission in wireless networks. However, we note that these schemes implement the data recovery process for mixed unicast and multicast (MUM) communications by simply separating the process into two phases, a unicast phase and a multicast phase. This is inefficient and expensive. In this paper, we propose an efficient retransmission scheme with network coding for MUM transmission, referred to as UMNC, aiming at improving bandwidth utilization. UMNC searches for coding opportunities from both unicast and multicast flows, which offer the potential benefit of improved recovery in the event of packet loss. We theoretically prove that UMNC can effectively reduce the total number of retransmissions and thus improve bandwidth efficiency, compared with existing schemes.
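Why network coding can reduce retransmissions is easiest to see in the textbook two-receiver case. The sketch below is that generic XOR example, not the UMNC scheme itself: receiver A has lost packet `p1`, receiver B has lost packet `p2`, and a single coded retransmission repairs both losses. The packet contents and the helper name `xor_bytes` are illustrative assumptions.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two equal-length packets sent earlier (illustrative payloads).
p1, p2 = b"PKT-ONE!", b"PKT-TWO!"

# Receiver A lost p1 but holds p2; receiver B lost p2 but holds p1.
# Without coding the sender retransmits twice; with coding, once.
coded = xor_bytes(p1, p2)

recovered_at_a = xor_bytes(coded, p2)  # A removes the packet it already has
recovered_at_b = xor_bytes(coded, p1)  # B does the same with its own packet
```

Each receiver cancels the packet it already holds, so one transmission serves two different losses.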
On the Sparse Signal Recovery with Parallel Orthogonal Matching Pursuit
Shin-Woong PARK Jeonghong PARK Bang Chul JUNG

LETTER-Digital Signal Processing

Vol: E96-A No:12 Page(s): 2728-2730
In this letter, parallel orthogonal matching pursuit (POMP) is proposed to supplement orthogonal matching pursuit (OMP) which has been widely used as a greedy algorithm for sparse signal recovery. Empirical simulations show that POMP outperforms the existing sparse signal recovery algorithms including OMP, compressive sampling matching pursuit (CoSaMP), and linear programming (LP) in terms of the exact recovery ratio (ERR) for the sparse pattern and the mean-squared error (MSE) between the estimated signal and the original signal.
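For reference, the baseline OMP that POMP supplements can be sketched in a few lines. This is the standard greedy algorithm, not the proposed parallel variant, whose details the abstract does not give. The sketch uses pure Python, solves the least-squares step through the normal equations, and assumes the selected columns are linearly independent; the function names and the tiny dictionary are illustrative.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def omp(columns, y, sparsity):
    """Standard OMP. columns: dictionary columns as lists; y: measurement vector."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    residual, support, coeffs = y[:], [], []
    for _ in range(sparsity):
        # Greedy step: pick the unused column most correlated with the residual.
        k = max((j for j in range(len(columns)) if j not in support),
                key=lambda j: abs(dot(columns[j], residual)))
        support.append(k)
        # Orthogonal step: least squares over all selected columns.
        G = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], y) for i in support]
        coeffs = solve(G, rhs)
        approx = [sum(c * columns[j][t] for c, j in zip(coeffs, support))
                  for t in range(len(y))]
        residual = [a - b for a, b in zip(y, approx)]
    x = [0.0] * len(columns)
    for c, j in zip(coeffs, support):
        x[j] = c
    return x

# Tiny dictionary with unit-norm columns; y = 2*column0 + 3*column2.
columns = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.6, 0.8, 0]]
x_hat = omp(columns, [2, 0, 3], sparsity=2)
```

Here the recovered vector has nonzero entries only at positions 0 and 2, matching the sparse pattern used to build `y`.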
Analog Circuit Synthesis with Constraint Generation of Layout-Dependent Effects by Geometric Programming
Yu ZHANG Gong CHEN Bo YANG Jing LI Qing DONG Ming-Yu LI Shigetoshi NAKATAKE

PAPER-Physical Level Design

Vol: E96-A No:12 Page(s): 2487-2498
As CMOS devices scale down in today's integrated circuits, the impact of layout-dependent effects (LDEs) on circuit performance becomes significant. This paper focuses on LDE-aware analog circuit synthesis. Our circuit synthesis follows an optimization framework of transistor sizing based on geometric programming (GP), in which analog circuit performances are formulated in terms of monomials and posynomials. By providing GP models for LDEs such as the shallow trench isolation (STI) stress and the well proximity effect (WPE), we can generate layout constraints related to LDEs during circuit synthesis. Applying our circuit synthesis to a typical two-stage op-amp, we show that the resultant circuit, which is generated by GP with circuit performance and layout constraints, satisfies all the specifications, as verified by HSPICE simulation based on the BSIM model with LDE options.
Recursive Construction of (k+1)-Ary Error-Correcting Signature Code for Multiple-Access Adder Channel
Shan LU Jun CHENG Yoichiro WATANABE

PAPER-Coding Theory

Vol: E96-A No:12 Page(s): 2368-2373
A recursive construction of a (k+1)-ary error-correcting signature code is proposed to identify users for the multiple-access adder channel (MAAC), even in the presence of channel noise. The recursion starts from a trivial signature code. In the (j-1)-th recursion, from a signature code with a minimum distance of 2^(j-2), a longer and larger signature code with a minimum distance of 2^(j-1) is obtained. The decoding procedure of the signature code is given, which consists of error correction and user identification.
Network Designs for Cycle-Attack-Free Logical-Tree Topologies in Optical CDM Networks
Tatsuya FUKUDA Ken-ichi BABA

PAPER-Fiber-Optic Transmission for Communications

Vol: E96-B No:12 Page(s): 3070-3079
Optical Code Division Multiplexing (OCDM) is a multiplexing technology for constructing future all-optical networks. Compared with other multiplexing technologies, it can be easily controlled and can establish lightpaths of smaller granularity. However, previous research has revealed that OCDM networks are vulnerable to cycle attacks. Cycle attacks are caused by multi-access interference (MAI), which is crosstalk noise on the same wavelength in OCDM networks. If cycle attacks occur, they disrupt all network services immediately. Previous research has proposed a logical topology design that is free of cycle attacks. However, this design assumes that path assignment is centrally controlled. It also does not consider the delay between each node and the centralized controller. In this paper, we propose novel logical topology designs that are free of cycle attacks and methods of establishing paths. The basic concepts underlying our methods are to autonomously construct a cycle-attack-free logical topology and to establish lightpaths by using a distributed controller. Our methods can construct a logical network and establish lightpaths more easily than the previous method can. In addition, they have network scalability because of their distributed control. Simulation results show that our methods have lower loss probabilities than the previous method and better mean hop counts than the centralized control approach.
Effective Implementation and Embedding Algorithms of CEPTA Method for Finding DC Operating Points
Zhou JIN Xiao WU Dan NIU Yasuaki INOUE

PAPER-Device and Circuit Modeling and Analysis

Vol: E96-A No:12 Page(s): 2524-2532
Recently, the compound element pseudo transient analysis (CEPTA) method has come to be regarded as an efficient, practical method for finding DC operating points of nonlinear circuits when the Newton-Raphson method fails. For the previous CEPTA method, an effective SPICE3 implementation algorithm was proposed that does not expand the Jacobian matrix. However, the limitation on step size was not well considered. As a result, non-convergence problems occur, and simulation efficiency remains a major challenge for current LSI nonlinear circuits, especially for practical large-scale circuits. Therefore, in this paper, we propose a new SPICE3 implementation algorithm and an embedding algorithm, which determines where to insert the pseudo capacitors, for the CEPTA method. The proposed implementation algorithm places no limitation on step size and can significantly improve simulation efficiency. Considering the existence of various types of circuits, we extend the set of possible embedding positions. Numerical examples demonstrate the improvement in simulation efficiency and convergence performance.
Clique-Based Architectural Synthesis of Flow-Based Microfluidic Biochips
Trung Anh DINH Shigeru YAMASHITA Tsung-Yi HO Yuko HARA-AZUMI

PAPER-High-Level Synthesis and System-Level Design

Vol: E96-A No:12 Page(s): 2668-2679
Microfluidic biochips, also referred to as “lab-on-a-chip,” have recently been proposed to integrate all the functions necessary for biochemical analyses. This technology opens a new era of biological science, in which electronics and biology are combined for the first time. There are several types of microfluidic biochips; among them, flow-based microfluidic biochips, in which the flow of liquid is manipulated using integrated microvalves, have attracted great interest. By combining several microvalves, more complex resource units such as micropumps, switches, and mixers can be built. For efficient execution, the liquid flow routes in microfluidic biochips need to be scheduled under resource and routing constraints. The execution time of a biochemical application depends strongly on the binding and scheduling result. Most previously developed binding and scheduling algorithms are based on heuristics, and there has been no method to obtain optimal results. We therefore propose an optimal method that casts the problem as a clique problem. This paper also presents some heuristic techniques for reducing computation time. Experiments demonstrate that the proposed method reduces the execution time of biochemical applications by more than 15% compared with the previous approach. Moreover, the proposed heuristic method produces results at little or no cost in optimality, in significantly shorter time than the optimal method.
A 5.83pJ/bit/iteration High-Parallel Performance-Aware LDPC Decoder IP Core Design for WiMAX in 65nm CMOS
Xiongxin ZHAO Zhixiang CHEN Xiao PENG Dajiang ZHOU Satoshi GOTO

PAPER-High-Level Synthesis and System-Level Design

Vol: E96-A No:12 Page(s): 2623-2632
In this paper, we propose a synthesizable LDPC decoder IP core for the WiMAX system with high parallelism and enhanced error-correcting performance. By taking advantage of both layered scheduling and a fully-parallel architecture, the decoder can fully support the multi-mode decoding specified in WiMAX with parallelism much higher than that of the commonly used partial-parallel layered LDPC decoder architecture. 6-bit quantized messages are split in bit-serial style, and 2-bit-wide serial processing lines work concurrently, so that only 3 cycles are required to decode one layer. As a result, 12∼24 cycles are enough to process one iteration for all the code rates specified in WiMAX. Compared to our previous bit-serial decoder, it doubles the parallelism and solves the message saturation problem of bit-serial arithmetic, with a minor increase in gate count. The power synthesis result shows that the proposed decoder achieves an energy efficiency of 5.83pJ/bit/iteration, a 46.8% improvement over state-of-the-art work. Furthermore, an advanced dynamic quantization (ADQ) technique is proposed to enhance the error-correcting performance in the layered decoder architecture. With about 2% area overhead, 6-bit ADQ can achieve error-correcting performance close to that of 7-bit fixed quantization, with improved error floor performance.
Evaluation of an FPGA-Based Heterogeneous Multicore Platform with SIMD/MIMD Custom Accelerators
Yasuhiro TAKEI Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-High-Level Synthesis and System-Level Design

Vol: E96-A No:12 Page(s): 2576-2586
Heterogeneous multi-core architectures with CPUs and accelerators attract much attention since they can achieve power-efficient computing in various areas, from low-power embedded processing to high-performance computing. Since the optimal architecture differs from application to application, finding the most suitable accelerator is very important. In this paper, we propose an FPGA-based heterogeneous multi-core platform with custom accelerators for power-efficient computing. Using the proposed platform, we evaluate several applications and accelerators to identify key requirements of the applications and properties of the accelerators. Such an evaluation is very important for selecting and optimizing the most suitable accelerator according to the requirements of an application to achieve the best performance.
On the Dependence of Error Performance of Spatially Coupled LDPC Codes on Their Design Parameters
Hiroyuki IHARA Tomoharu SHIBUYA

LETTER-Coding Theory

Vol: E96-A No:12 Page(s): 2447-2451
Spatially coupled (SC) low-density parity-check (LDPC) codes are defined by bipartite graphs that are obtained by assembling prototype graphs. The combination and connection of prototype graphs are designated by specifying some parameters, and Kudekar et al. showed that the BP threshold of the ensemble of SC LDPC codes agrees with the MAP threshold of the ensemble of regular LDPC codes when those parameters are increased so that the code length tends to infinity. When we design SC LDPC codes with practical code lengths, however, it is not clear how to set those parameters to enhance the performance of SC LDPC codes. In this paper, we provide the results of numerical experiments that suggest how the error performance of SC LDPC codes over the BEC depends on their design parameters.
A Robust Speech Communication into Smart Info-Media System
Yoshikazu MIYANAGA Wataru TAKAHASHI Shingo YOSHIZAWA

INVITED PAPER

Vol: E96-A No:11 Page(s): 2074-2080
This paper introduces the noise-robust speech communication techniques we have developed and describes their implementation in a smart info-media system, i.e., a small robot. Our speech communication system consists of automatic speech detection, recognition, and rejection. By using automatic speech detection and recognition, an observed speech waveform can be recognized without a manual trigger. In addition, by using speech rejection, the system accepts only registered speech phrases and rejects any other words. In other words, although an arbitrary input speech waveform can be fed into this system and recognized, the system responds only to the registered speech phrases. The developed noise-robust speech processing can reduce various noises in many environments. In addition to the design of noise-robust speech recognition, the LSI design of this system is introduced. By designing a speech recognition application-specific IC (ASIC), we can simultaneously realize low power consumption and real-time processing. This paper describes the LSI architecture of this system and its performance in some field experiments. The system currently achieves a speech recognition accuracy of 85-99% under 0-20dB SNR and echo environments.
A Single Tooth Segmentation Using PCA-Stacked Gabor Filter and Active Contour
Pramual CHOORAT Werapon CHIRACHARIT Kosin CHAMNONGTHAI Takao ONOYE

PAPER-Image Processing

Vol: E96-A No:11 Page(s): 2169-2178
In tooth contour extraction, there is insufficient intensity difference in x-ray images between the tooth and dental bone. This difference must be enhanced in order to improve the accuracy of tooth segmentation. This paper proposes a method to enhance the intensity difference between the tooth and dental bone. The method consists of an estimation of tooth orientation (intensity projection, smoothing filter, and peak detection) and PCA-Stacked Gabor with ellipse Gabor banks. Tooth orientation estimation is performed to determine the angle of a single oriented tooth. PCA-Stacked Gabor with ellipse Gabor banks is then used, in particular to enhance the border between the tooth and dental bone. Finally, active contour extraction is performed in order to determine the tooth contour. In the experiment, in comparison with the conventional active contour without edge (ACWE) method, the average mean square error (MSE) values of extracted tooth contour points are reduced from 26.93% and 16.02% to 19.07% and 13.42% for tooth x-ray type I and type H images, respectively.

