IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SI(16314hit)

9441-9460hit(16314hit)

Trace-Driven Performance Simulation Modeling for Fast Evaluation of Multimedia Processor by Simulation Reuse
Ho Young KIM Tag Gon KIM

PAPER-Simulation and Verification

Vol: E88-A No:12 Page(s): 3306-3314
A fast yet accurate method for evaluating the performance of a processor architecture is highly desirable in modern processor design. This paper proposes one such method, which can measure cycle counts and power consumption of pipelined processors. The method first develops a trace-driven performance simulation model and then employs simulation reuse in simulating the model. Trace-driven performance modeling provides accuracy because performance simulation uses the same execution traces that are constructed during simulation for functional verification. Performance simulation is fast because the performance of each instruction in the traces is evaluated without evaluating the instruction itself. Simulation reuse speeds up simulation by eliminating an evaluation at the current state when it is identical to one at a previous state. The reuse approach is based on the property that application programs, especially multimedia applications, generally contain many iterative loops. A performance simulator for pipeline architectures based on the proposed method has been developed, and it achieves greater speedup than other approaches to performance evaluation.
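
The reuse idea in this abstract can be illustrated with a small memoization sketch. This is not the authors' simulator: the function names, the toy `toy_step` timing model, and the choice of "previous opcode" as the pipeline state are all assumptions made only for illustration. The point is that a trace with many loop iterations revisits the same (state, instruction) pairs, so most evaluations can be served from a cache.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
StepResult = Tuple[State, int]  # (next state, cycles consumed)


def simulate_trace(trace: Iterable[str],
                   step: Callable[[State, str], StepResult],
                   initial_state: State) -> Tuple[int, int]:
    """Return (total cycles, number of real evaluations) for a trace.

    A (state, instruction) pair that was evaluated before is not
    evaluated again; its cached result is reused instead.
    """
    cache: Dict[Tuple[State, str], StepResult] = {}
    state = initial_state
    total_cycles = 0
    evaluations = 0
    for instr in trace:
        key = (state, instr)
        if key not in cache:
            cache[key] = step(state, instr)  # the expensive evaluation
            evaluations += 1
        state, cycles = cache[key]
        total_cycles += cycles
    return total_cycles, evaluations


def toy_step(state: State, instr: str) -> StepResult:
    """Hypothetical timing model: an 'add' right after a 'load' stalls one cycle."""
    stall = 1 if (state == "load" and instr == "add") else 0
    return instr, 1 + stall


if __name__ == "__main__":
    loop_trace = ["load", "add", "mul"] * 100  # 100 iterations of a loop body
    print(simulate_trace(loop_trace, toy_step, None))
```

In this toy run the 300-instruction trace needs only four real evaluations, because every loop iteration after the first revisits states that are already cached.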
Multiplier Energy Reduction by Dynamic Voltage Variation
Vasily G. MOSHNYAGA Tomoyuki YAMANAKA

PAPER-VLSI Circuit

Vol: E88-A No:12 Page(s): 3548-3553
Design of portable battery-operated multimedia devices requires energy-efficient multiplication circuits. This paper proposes a novel architectural technique to reduce the power consumption of digital multipliers. Unlike related approaches, which focus on reducing multiplier transition activity, we concentrate on dynamic reduction of the supply voltage. Two implementation schemes capable of dynamically adjusting a dual voltage supply to input data variation are presented. Simulations show that these schemes reduce the energy consumption of a 16×16-bit multiplier by 34% and 29% at peak and by 10% and 7% on average, with area overheads of 15% and 4%, respectively, while maintaining the performance of a traditional multiplier.
Large-Size Local-Domain Basis Functions with Phase Detour and Fresnel Zone Threshold for Sparse Reaction Matrix in the Method of Moments
Tetsu SHIJO Takuichi HIRANO Makoto ANDO

PAPER-EM Analysis

Vol: E88-C No:12 Page(s): 2208-2215
Locality in high-frequency diffraction is embodied in the Method of Moments (MoM) in view of the method of stationary phase. Local-domain basis functions accompanied by the phase detour, which are not entire-domain but are much larger than the segment length in the usual MoM, are newly introduced to enhance the cancellation of mutual coupling over the local domain; the off-diagonal elements in the resultant reaction matrix decay rapidly. The Fresnel zone threshold is proposed for simple and effective truncation of the matrix into a sparse band matrix. Numerical examples for the 2-D strip and the 2-D corner reflector demonstrate the feasibility as well as the difficulties of the concept; a way of mitigating the computational load of the MoM in high-frequency problems is suggested.
Successive Pad Assignment for Minimizing Supply Voltage Drop
Takashi SATO Masanori HASHIMOTO Hidetoshi ONODERA

PAPER-Power/Ground Network

Vol: E88-A No:12 Page(s): 3429-3436
An efficient pad assignment methodology to minimize voltage drop on a power distribution network is proposed. A combination of successive pad assignment (SPA) with incremental matrix inversion (IMI) determines both location and number of power supply pads to satisfy drop voltage constraint. The SPA creates an equivalent resistance matrix which preserves both pad candidates and power consumption points as external ports so that topological modification due to connection or disconnection between voltage sources and candidate pads is consistently represented. By reusing sub-matrices of the equivalent matrix, the SPA greedily searches the next pad location that minimizes the worst drop voltage. Each time a candidate pad is added, the IMI reduces computational complexity significantly. Experimental results including a 400 pad problem show that the proposed procedures efficiently enumerate pad order in a practical time.
Contour-Based Window Extraction Algorithm for Bare Printed Circuit Board Inspection
Shih-Yuan HUANG Chi-Wu MAO Kuo-Sheng CHENG

PAPER-Pattern Recognition

Vol: E88-D No:12 Page(s): 2802-2810
Pattern extraction is an indispensable step in bare printed circuit board (PCB) inspection and plays an important role in automatic inspection system design. A good approach to pattern definition and extraction makes the subsequent PCB diagnosis easy and efficient. The window-based technique has great potential in PCB pattern extraction due to its simplicity. The conventional window-based pattern extraction methods, such as the Small Seeds Window Extraction method (SSWE) and the Large Seeds Window Extraction method (LSWE), suffer from losing some useful copper traces and splitting slanted lines into too many small similar windows. These problems make automatic inspection difficult and computationally intensive. In this paper, a novel method called the Contour Based Window Extraction (CBWE) algorithm is proposed to address them. Compared with both SSWE and LSWE, the CBWE algorithm has several advantages. First, all traces can be segmented and enclosed by a valid window. Second, the type of an entire horizontal or vertical line of copper trace is preserved. Third, the number of valid windows is smaller than that extracted by SSWE and LSWE. Experimental results demonstrate that the proposed CBWE algorithm is very effective for basic pattern extraction in bare PCB image analysis.
A Simplified Illustration of Arbitrary DAC Waveform Effects in Continuous Time Delta-Sigma Modulators
Hossein SHAMSI Omid SHOAEI

LETTER

Vol: E88-A No:12 Page(s): 3577-3579
In this paper, a straightforward approach is presented for extracting the equivalent z-domain loop gain of a continuous-time Delta-Sigma modulator with an arbitrary DAC waveform. In this approach, the arbitrary DAC waveform is approximated by an infinite number of rectangular pulses. Then, by using the transformations available in the literature for a rectangular DAC pulse shape and applying superposition over the rectangular pulses, the loop gain of the system is derived in the z-domain.
Node Placement Algorithms in the Case that Routes are Design Variables in Shuffle-Like Multihop Lightwave Networks
Tokumi YOKOHIRA Kiyohiko OKAYAMA

PAPER-Network

Vol: E88-B No:12 Page(s): 4578-4587
The shuffle-like network (SL-Net) is known as a logical topology for WDM-based multihop packet-switched networks. Even if we fix the logical topology to an SL-Net, we can still reposition nodes in the SL-Net by re-tuning the wavelengths of transmitters and/or receivers. Conventional node placement algorithms assume that routes between nodes are given. In this paper, we propose two heuristic node placement algorithms for the SL-Net that decrease the average end-to-end packet transmission delay under a given traffic matrix when routes are design variables. The principal idea is to prevent too many traffic flows from overlapping on any link. To realize this idea, in one of the algorithms, nodes are selected one by one in decreasing order of the sum of their sending and receiving traffic requirements, and the placement of each node and the routes between it and all the nodes already placed are decided simultaneously so that the maximum amount of traffic on any link at that moment is minimized. In the other algorithm, nodes are selected in the same way; each node is first placed so that the average distance between it and all the nodes already placed is as large as possible, and then the routes between it and all the nodes already placed are decided so that the maximum amount of traffic on any link at that moment is minimized. Numerical results for four typical traffic matrices show that either of the proposed algorithms performs better than conventional algorithms for each matrix, and that the proposed algorithms, which are based on joint optimization of node placement and routing, are superior to algorithms that execute node placement and routing as two isolated phases.
Quality and Power Efficient Architecture for the Discrete Cosine Transform
Chi-Chia SUNG Shanq-Jang RUAN Bo-Yao LIN Mon-Chau SHIE

PAPER-VLSI Architecture

Vol: E88-A No:12 Page(s): 3500-3507
In recent years, the demand for multimedia mobile battery-operated devices has created a need for low-power implementation of video compression. Many compression standards require the discrete cosine transform (DCT) function to perform image/video compression. For this reason, low-power DCT design has become more and more important in today's image/video processing. This paper presents a new power-efficient Hybrid DCT architecture which combines the Loeffler DCT and binDCT in terms of a special property of the luminance and chrominance difference. We use Synopsys PrimePower to estimate the power consumption in a TSMC 0.25-µm technology. We also adopt a novel quality assessment method based on structural distortion measurement to measure quality, instead of peak signal-to-noise ratio (PSNR) and mean squared error (MSE). It is concluded that our Hybrid DCT offers quality similar to that of the Loeffler DCT and yields savings of 25% in power consumption and 27% in chip area.
An Asymptotic Relative Performance Measure for Signal Detectors Based on the Correlation Information of Statistics
Jinsoo BAE Iickho SONG Hyun JOO

LETTER-Fundamental Theories for Communications

Vol: E88-B No:12 Page(s): 4643-4646
Signal detectors generally utilize nonlinear statistics of an original observation rather than the original observation itself. The sign statistic, a typical example of a nonlinear statistic, is the sign information of an observation, and the sign detector relies only on the sign statistic. Since any one detector might perform better depending on the situation, it is important to determine which detector performs best, based on the given situational information about noise and signal strength. In this letter, a qualitative analysis is presented showing that the correlation coefficients between the statistics and the original observation can be used to predict the asymptotic performance of a detector utilizing one of the statistics, relative to the other detectors.
A VLSI Array Processing Oriented Fast Fourier Transform Algorithm and Hardware Implementation
Zhenyu LIU Yang SONG Takeshi IKENAGA Satoshi GOTO

PAPER-VLSI Architecture

Vol: E88-A No:12 Page(s): 3523-3530
Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes a large amount of memory and processing time. An FFT array-processing mapping algorithm is proposed in this paper to overcome this drawback. In this algorithm, an arbitrary number $2^k$ of butterfly units (BUs) can be scheduled to work in parallel on $n = 2^s$ data ($k = 0, 1, \ldots, s-1$). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, throughput increases linearly and system latency decreases linearly. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is $(s \cdot 2^{s-k})\, t_{clk}$ and the throughput is $n / (s \cdot 2^{s-k}\, t_{clk})$, where $t_{clk}$ is the system clock period. Based on this mapping algorithm, several 18-bit word-length 1024-point FFT processors implemented in TSMC 0.18 µm CMOS technology are given to demonstrate its scalability and high performance. The core area of the 4-BU design is 2.991 × 1.121 mm² and the clock frequency is 326 MHz under typical conditions (1.8 V, 25 °C). This processor completes a 1024-point FFT calculation in 7.839 µs.
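
As a quick arithmetic check of the latency and throughput expressions in this abstract, the sketch below reads the latency as s · 2^(s−k) clock periods for an n = 2^s point FFT on 2^k butterfly units. That reading, and the function names, are assumptions made for illustration; the abstract gives only the closed-form expressions.

```python
def fft_array_latency_cycles(s: int, k: int) -> int:
    """Clock cycles for an n = 2**s point FFT on 2**k butterfly units,
    using the expression s * 2**(s - k) (valid for 0 <= k <= s - 1)."""
    if not 0 <= k <= s - 1:
        raise ValueError("k must satisfy 0 <= k <= s - 1")
    return s * 2 ** (s - k)


def fft_array_latency_seconds(s: int, k: int, f_clk_hz: float) -> float:
    """Latency in seconds: cycle count times the clock period 1 / f_clk."""
    return fft_array_latency_cycles(s, k) / f_clk_hz


def fft_array_throughput(s: int, k: int, f_clk_hz: float) -> float:
    """Throughput in samples per second: n divided by the latency."""
    return 2 ** s / fft_array_latency_seconds(s, k, f_clk_hz)


if __name__ == "__main__":
    # 1024-point FFT (s = 10) on 4 butterfly units (k = 2) at 326 MHz.
    cycles = fft_array_latency_cycles(10, 2)
    latency_us = fft_array_latency_seconds(10, 2, 326e6) * 1e6
    print(cycles, "cycles,", round(latency_us, 3), "microseconds")
```

With s = 10 and k = 2 the expression gives 2560 cycles, or about 7.85 µs at exactly 326 MHz, which is within roughly 0.2% of the 7.839 µs quoted in the abstract; the small gap would be consistent with a rounded clock frequency.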
A Waveguide Broad-Wall Transverse Slot Linear Array with Reflection-Canceling Inductive Posts and Grating-Lobe Suppressing Parasitic Dipoles
M.G. SORWAR HOSSAIN Jiro HIROKAWA Makoto ANDO

PAPER-Antenna Design

Vol: E88-C No:12 Page(s): 2266-2273
A design of a linearly-polarized non-resonant waveguide broad-wall transverse slot linear array with suppressed grating lobes is presented. Each unit element in the array consists of a transverse slot, an inductive post and a parasitic dipole-pair at a height of half of the free space wavelength. It is designed as an isolated unit without considering mutual coupling by using the Method of Moments (MoM) for radiation suppression in grating beam direction and reflection cancellation at the input. The elements thus designed are used in a travelling wave array environment. It is predicted that the reflection is less than -20 dB at 11.95 GHz while the grating lobes are suppressed by more than 15 dB. The design and the characteristics of the array are confirmed by measurements.
Computational Methods for Surface Relief Gratings Using Electric and Magnetic Flux Expansions
Minoru KOMATSU Hideaki WAKABAYASHI Jiro YAMAKITA

PAPER-EM Analysis

Vol: E88-C No:12 Page(s): 2192-2198
The relative permittivity and permeability are discontinuous at the grating profile, whereas the electric and magnetic flux densities are continuous. As a method for analyzing wave scattering by surface relief gratings placed in conical mounting, the spatial harmonic expansion of the flux densities is formulated in detail, and the validity of the approach is shown numerically. The present method is effective for uniform regions such as air and substrate in addition to the grating layer. The matrix formulations are introduced by using numerical calculations of the matrix eigenvalue problem in the grating region, and analytical solutions separated into TE and TM waves in the uniform region are described. Some numerical examples for linearly and circularly polarized incidence show the usefulness of the flux density expansion approach.
Estimation of Surface Impedance for Inhomogeneous Half-Space Using Far Fields
Michinari SHIMODA Masazumi MIYOSHI

PAPER-EM Analysis

Vol: E88-C No:12 Page(s): 2199-2207
An inverse scattering problem of estimating the surface impedance for an inhomogeneous half-space is investigated. Because the far-field representation contains the spectral function of the scattered field, complex values of the function are estimated from a set of absolute values of the far field. An approximate function for the spectral function is reconstructed from the estimated complex values in the least-squares sense. The surface impedance is estimated by calculating the field on the surface of the half-space, expressed by the inverse Fourier transform. Numerical examples are given and the accuracy of the estimation is discussed.
Control of Total Transmission on Ferrite Edge-Mode Isolator
Toshiro KODERA

PAPER-Microwaves, Millimeter-Waves

Vol: E88-C No:12 Page(s): 2366-2371
This paper introduces a new approach to realize a multi-state operation on the microwave isolator using ferrite edge-mode. The voltage control of total transmission on the isolator is realized. The operation is based on the unique property of ferrite edge-mode and the variable resistance of PIN diodes. On the isolator, the frequency response is investigated both experimentally and numerically. The numerical analysis is performed by the FDTD method. Both numerical and experimental results have shown that the transmission between two ports can be totally controlled by the applied voltage for the diodes. The experimental results indicate that the transmission direction can be controlled at 11 GHz, and the isolation ratio can be controlled for more than 30 dB.
A Hardware Algorithm for Modular Multiplication/Division Based on the Extended Euclidean Algorithm
Marcelo E. KAIHARA Naofumi TAKAGI

PAPER-VLSI Design Technology and CAD

Vol: E88-A No:12 Page(s): 3610-3617
A hardware algorithm for modular multiplication/division which performs modular division, Montgomery multiplication, and ordinary modular multiplication is proposed. The modular division in our algorithm is based on the extended Euclidean algorithm. We employ our newly proposed computation method that consists of processing the multiplier from the most significant digit first to calculate Montgomery multiplication. Finally, the ordinary modular multiplication is based on shift-and-add multiplication. Each of these three operations is carried out through the iteration of simple operations such as shifts and additions/subtractions. To avoid carry propagation in all additions and subtractions, the radix-2 signed-digit representation is employed. A modular multiplier/divider based on the algorithm has a linear array structure with a bit-slice feature and carries out n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. This multiplier/divider can be implemented using a hardware amount only slightly larger than that of the modular divider.
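
For readers unfamiliar with modular division by the extended Euclidean algorithm, the following is a plain software sketch of the underlying arithmetic. It is not the hardware algorithm proposed in the paper (which uses shifts, additions/subtractions, and a radix-2 signed-digit representation); the function names `extended_gcd` and `mod_divide` are illustrative assumptions.

```python
from typing import Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_divide(y: int, x: int, m: int) -> int:
    """Return z in [0, m) with z * x congruent to y modulo m.

    Requires gcd(x, m) == 1 so that x has an inverse modulo m.
    """
    g, inv, _ = extended_gcd(x % m, m)
    if g != 1:
        raise ValueError("x is not invertible modulo m")
    return (y * inv) % m


if __name__ == "__main__":
    z = mod_divide(3, 4, 7)
    print(z, (z * 4) % 7)  # the second value should equal 3
```

The hardware version described in the abstract reaches the same result through an iteration of simple shift and add/subtract steps rather than the integer division used in this sketch.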
Absolutely Convergent Expansion of Hankel Functions for Sommerfeld Type Integral
Bin-Hao JIANG

LETTER-Electromagnetic Theory

Vol: E88-C No:12 Page(s): 2377-2378
Generalized impedance boundary conditions are employed to simulate the effects of parallel-stratified media on electromagnetic fields. The Sommerfeld-type integral contained in the Hertz potential is expressed as the sum of two parts: a zeroth-order Hankel function and an absolutely convergent series expansion of spherical Hankel functions.
Perturbation Approach for Order Selections of Two-Sided Oblique Projection-Based Interconnect Reductions
Chia-Chi CHU Ming-Hong LAI Wu-Shiung FENG

LETTER

Vol: E88-A No:12 Page(s): 3573-3576
An order selection scheme for two-sided oblique projection-based interconnect reduction will be investigated. It will provide a guideline for terminating the conventional nonsymmetric Pade via Lanczos (PVL) iteration process. By exploring the relationship of the system Grammians of the original network and those of the reduced network, it can be shown that the system matrix of the reduced-order system generated by the two-sided oblique projection can also be expressed as those of the original interconnect model with some additive perturbations. The perturbation matrix only involves bi-orthogonal vectors at the previous step of the nonsymmetric Lanczos algorithm. This perturbation matrix will provide the stopping criteria in the order selection scheme and achieve the desired accuracy of the approximate transfer function.
The Performance Analysis of NAT-PT and DSTM for IPv6 Dominant Network Deployment
Myung-Ki SHIN

LETTER-Internet

Vol: E88-B No:12 Page(s): 4664-4666
NAT-PT and DSTM are becoming more widespread as de-facto standards for IPv6 dominant network deployment. But few researchers have empirically evaluated their performance aspects. In this paper, we compared the performance of NAT-PT and DSTM with IPv4-only and IPv6-only networks on user applications using metrics such as throughput, CPU utilization, round-trip time, and connect/request/response transaction rate.
High Quality and Low Complexity Speech Analysis/Synthesis Based on Sinusoidal Representation
Jianguo TAN Wenjun ZHANG Peilin LIU

LETTER-Speech and Hearing

Vol: E88-D No:12 Page(s): 2893-2896
Sinusoidal representation has been widely applied to speech modification and to low-bit-rate speech and audio coding. Usually, a speech signal is analyzed and synthesized using the overlap-add algorithm or the peak-picking algorithm. However, the overlap-add algorithm is well known for its high computational complexity, and the peak-picking algorithm cannot track transient and syllabic variation well. In this letter, both algorithms are applied to speech analysis/synthesis. Peaks are picked in the power spectral density curve of the speech signal, and the frequencies corresponding to these peaks are arranged in descending order of their power spectral densities. These frequencies are regarded as the candidate frequencies for determining the corresponding amplitudes and initial phases according to the least mean square error criterion. The sum of the extracted sinusoidal components is used to successively approximate the original speech signal. The results show that the proposed algorithm can track transient and syllabic variation and can produce a good synthesized speech signal with low computational complexity.
FPGA Implementation of a Stereo Matching Processor Based on Window-Parallel-and-Pixel-Parallel Architecture
Masanori HARIYAMA Yasuhiro KOBAYASHI Haruka SASAKI Michitaka KAMEYAMA

PAPER-VLSI Architecture

Vol: E88-A No:12 Page(s): 3516-3522
This paper presents a processor architecture for high-speed and reliable stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce computational complexity, SADs are computed using images divided into non-overlapping regions, and the matching result is iteratively refined by reducing the window size. A window-parallel-and-pixel-parallel architecture is also proposed to fully exploit the potential parallelism of the algorithm. The architecture also reduces the complexity of the interconnection network between memory and functional units by exploiting the regularity of reference pixels. The stereo matching processor is implemented on an FPGA. Its performance is 80 times higher than that of a microprocessor (Pentium4@2 GHz), and is enough to generate a 3-D depth image at the video rate of 33 MHz.
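
The SAD measure named in this abstract can be sketched in a few lines. The code below is only a basic fixed-window SAD search over candidate disparities on images stored as lists of rows; it does not implement the paper's adaptive window-size control, region division, or parallel architecture, and the function names and the convention that the right-image match lies `disparity` pixels to the left are assumptions for illustration.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def sad(left: Image, right: Image, row: int, col: int,
        disparity: int, half: int) -> int:
    """Sum of absolute differences between a (2*half+1)-square window
    centred at (row, col) in the left image and the window shifted
    `disparity` pixels to the left in the right image."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col + dc]
                         - right[row + dr][col + dc - disparity])
    return total


def best_disparity(left: Image, right: Image, row: int, col: int,
                   max_disparity: int, half: int) -> int:
    """Return the candidate disparity with the smallest SAD.

    Candidates whose shifted window would leave the image are skipped.
    """
    candidates: List[int] = [d for d in range(max_disparity + 1)
                             if col - half - d >= 0]
    return min(candidates,
               key=lambda d: sad(left, right, row, col, d, half))


if __name__ == "__main__":
    # Synthetic pair: the left image is the right image shifted by 2 pixels.
    right_img = [[c * c + r for c in range(10)] for r in range(5)]
    left_img = [[right_img[r][c - 2] if c >= 2 else 0 for c in range(10)]
                for r in range(5)]
    print(best_disparity(left_img, right_img, row=2, col=5,
                         max_disparity=3, half=1))
```

Each window position and each candidate disparity is independent of the others, which is the kind of regular parallelism a window-parallel and pixel-parallel hardware design can exploit.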
