Toshiki KANAMOTO Shigekiyo AKUTSU Tamiyo NAKABAYASHI Takahiro ICHINOMIYA Koutaro HACHIYA Atsushi KUROKAWA Hiroshi ISHIKAWA Sakae MUROMOTO Hiroyuki KOBAYASHI Masanori HASHIMOTO
In this letter, we discuss the impact of intrinsic error in parasitic capacitance extraction programs, which are commonly used in today's SoC design flows. Most extraction programs use pattern-matching methods, which introduce an improvable error factor due to pattern interpolation and an intrinsically inescapable error factor from the difference in boundary conditions in the electromagnetic field solver. Here, we study the impact of the intrinsic error on timing and crosstalk noise estimation. We experimentally show that the resulting delay and noise estimation errors show a scatter which is normally distributed. Values of the standard deviations will help designers consider the intrinsic error compared with other variation factors.
Akira TSUCHIYA Masanori HASHIMOTO Hidetoshi ONODERA
This paper proposes a method to determine a single frequency for interconnect RL extraction. Resistance and inductance of interconnects depend on frequency, and hence the extraction frequency strongly affects the modeling accuracy of interconnects. The proposed method determines an extraction frequency based on the transfer characteristic of interconnects. By choosing the frequency where the transfer characteristic becomes maximum, the extracted RL values achieve the accurate modeling of the waveform. Experimental results show that the proposed method provides accurate transition waveforms over various interconnect topologies.
Kazunori SHIMIZU Tatsuyuki ISHIKAWA Nozomu TOGAWA Takeshi IKENAGA Satoshi GOTO
In this paper, we propose a power-efficient LDPC decoder architecture based on an accelerated message-passing schedule. The proposed decoder architecture is characterized as follows: (i) Partitioning a pipelined operation so that intermediate messages are not read and written simultaneously enables the accelerated message-passing schedule to be implemented with single-port SRAMs. (ii) FIFO-based buffering reduces the number of SRAM banks and words of the LDPC decoder based on the accelerated message-passing schedule. The proposed LDPC decoder keeps a single message for each non-zero bit in a parity check matrix, as a classical schedule does, while achieving the accelerated message-passing schedule. Implementation results in 0.18 µm CMOS technology show that the proposed decoder architecture reduces the area of the LDPC decoder by 43% and the power dissipation by 29% compared to the conventional architecture based on the accelerated message-passing schedule.
Takashi ISHIDA Masayuki GOTO Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
Recently, a word-valued source has been proposed as a new class of information source models. A word-valued source is regarded as a source with a probability distribution over a word set. Although a word-valued source is a nonstationary source in general, it has been proved that an entropy rate of the source exists and the Asymptotic Equipartition Property (AEP) holds when the word set of the source is prefix-free. However, when the word set is not prefix-free (non-prefix-free), only an upper bound on the entropy density rate for an i.i.d. word-valued source has been derived so far. In this paper, we newly derive a lower bound on the entropy density rate for an i.i.d. word-valued source with a finite non-prefix-free word set. Then some numerical examples are given in order to investigate the behavior of the bounds.
Win-Bin HUANG Alvin W. Y. SU Yau-Hwang KUO
Set Partitioning in Hierarchical Trees (SPIHT) is a highly efficient technique for compressing Discrete Wavelet Transform (DWT) decomposed images. Though its compression efficiency is a little less famous than that of Embedded Block Coding with Optimized Truncation (EBCOT) adopted by JPEG2000, SPIHT has a straightforward coding procedure and requires no tables. These properties make SPIHT a more appropriate algorithm for lower-cost hardware implementation. In this paper, a modified SPIHT algorithm is presented. The modifications include a simplification of the coefficient scanning process, a 1-D addressing method instead of the original 2-D arrangement of wavelet coefficients, and a fixed memory allocation for the data lists instead of the dynamic allocation approach required in the original SPIHT. Although the distortion is slightly increased, the modified algorithm facilitates an extremely fast throughput and easier hardware implementation. The VLSI implementation demonstrates that the proposed design can encode a CIF (352×288) 4:2:0 image sequence with at least 30 frames per second at 100-MHz working frequency.
Toshiyuki MIYAMOTO Yasuhiro MORITA Sadatoshi KUMAGAI
Secret sharing is a method for distributing a secret among a party of participants. Each of them is allocated a share of the secret, and the secret can only be reconstructed when the shares are combined together. We have been proposing a secret sharing distributed database system (SSDDB) that uses a secret sharing scheme to improve confidentiality and robustness of distributed database systems. This paper proposes a vertical partitioning algorithm for the SSDDB, and evaluates the algorithm by computational experiments.
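The basic idea of secret sharing described above can be illustrated with a minimal sketch. It uses a simple XOR-based scheme in which all n shares are required for reconstruction; this is an assumption chosen for brevity, not the scheme used by the SSDDB, which the abstract does not specify. The function names are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret, n):
    """Split `secret` (bytes) into n shares; all n are needed to rebuild it.

    Illustrative n-of-n XOR scheme (an assumption, not the SSDDB scheme).
    """
    if n < 2:
        raise ValueError("need at least two participants")
    # The first n-1 shares are uniformly random byte strings.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The last share is chosen so that XOR-ing all n shares yields the secret.
    last = reduce(xor_bytes, shares, secret)
    return shares + [last]


def combine_shares(shares):
    """Reconstruct the secret by XOR-ing every share together."""
    return reduce(xor_bytes, shares)
```

For example, `combine_shares(split_secret(b"record", 3))` returns the original bytes, while any proper subset of the shares is statistically independent of the secret.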
Junichi AKITA Hiroaki TAKAGI Takeshi NAGASAKI Masashi TODA Toshio KAWASHIMA Akio KITAGAWA
Rapid eye motion, the so-called saccade, is a very quick eye motion that occurs regardless of our intention. Although line of sight (LOS) detection with saccade tracking is expected to enable a new type of computer-human interface, saccades cannot be tracked with a conventional video camera because their speed is often up to 600 degrees per second. A vision chip is an intelligent image sensor that has the photoreceptors and the image processing circuitry on a single chip and can process the acquired image information while keeping its spatial parallelism. It also makes it possible to implement a very compact integrated vision system. In this paper, we describe a vision chip architecture capable of detecting the line of sight from an infrared eye image at a processing speed that supports saccade tracking. The vision chip described here has a pixel-parallel processing architecture, with a node automaton at each pixel for image processing. First, comparators digitize the acquired image into two flags indicating the Purkinje image and the pupil. The digitized images are then shrunk, followed by several steps of expanding, by the node automata located at each pixel. The shrinking process is repeated until all the pixels disappear, and the pixel that disappears last indicates the center of the Purkinje image and the pupil. This disappearing step is detected by the projection circuitry in the pixel circuit for fast operation, and the coordinates of the center of the Purkinje image and the pupil are generated by simple encoders. We describe the whole architecture of this vision chip, as well as the pixel architecture. We also describe the evaluation of the proposed algorithm by numerical simulation, the processing speed measured using an FPGA, and the improvement in resolution using a column-parallel architecture.
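The center-finding step described above (shrink the binarized region until it vanishes, and take the last surviving pixel as the center) can be sketched in software as follows. This is only a sequential illustration under assumptions: a 4-neighbour shrinking rule is assumed, the intermediate expanding steps are omitted, and the function names are hypothetical. The chip itself performs the operation in parallel at every pixel.

```python
def shrink_once(pixels):
    """One shrinking step on a set of (row, col) pixels.

    A pixel survives only if all four of its direct neighbours are set
    (4-neighbour rule, assumed here for illustration).
    """
    return {
        (r, c)
        for (r, c) in pixels
        if {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)} <= pixels
    }


def center_by_shrinking(pixels):
    """Shrink repeatedly; return the mean position of the last survivors."""
    current = set(pixels)
    if not current:
        raise ValueError("empty region")
    while True:
        shrunk = shrink_once(current)
        if not shrunk:
            break  # `current` holds the pixels that disappear last
        current = shrunk
    rows = [r for r, _ in current]
    cols = [c for _, c in current]
    return (sum(rows) / len(rows), sum(cols) / len(cols))
```

Each step removes at least the outermost pixels of a finite region, so the loop always terminates. When several pixels disappear in the same final step, the sketch averages their coordinates.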
Homayoon ORAIZI Mahdi MORADIAN Kazuhiro HIRASAWA
In this paper, a new method for the design and optimization of microstrip parallel coupled-line bandpass filters is presented, which allows for the specification of frequency bandwidths and arbitrary source and load impedance transformation. The even- and odd-mode theory and the relationships between impedance, transmission and scattering matrices and their properties are used to construct a positive definite error function using the insertion losses at discrete frequencies in the pass, transition and stop bands. The dispersion relations for the coupled line are also taken into account. The minimization of the error function determines the widths, gap spacings and lengths of the coupled-line filter, for the optimum design and realization of filter specifications. The proposed filter design and optimization method is implemented in computer programs, and the results of simulation, fabrication and testing of sample filters, together with comparisons with available full-wave analysis software, indicate the efficacy of the proposed method. Filter design with up to 50% bandwidth and the design of shorter lengths of coupled line sections are achievable by the proposed method, in part due to the incorporation of impedance matching.
Atsuo OZAKI Masashi SHIRAISHI Shusuke WATANABE Minoru MIYAZAWA Masakazu FURUICHI Hiroyuki SATO
In computer simulation of a large number of moving objects (MOs), how to enlarge Δt (the interval between the simulation time steps) without introducing causality errors is one of the primary keys to enhancing performance. Causality errors can be avoided by using the same Δt among related MOs when they are in the scene of detection (SoD). But in a large-scale MO simulation, MOs interact with one another in a complicated manner, requiring a large calculation cost to predict the beginning time of SoD. In this paper we propose an event-aware dynamic time step synchronization method (DTSS) for distributed MO simulation, which increases Δt without introducing causality errors and speeds up the simulation. DTSS can be implemented with little calculation cost because: (1) DTSS does not calculate the beginning time of SoD exactly, but calculates the time for possible entry into SoD with a simple mechanism, and (2) MO simulation consists of a "movement"-phase and a "detection"-phase in which the distance-calculation between MOs requires a heavy load, and DTSS utilizes the distance values to calculate Δt. In this paper, we also discuss a suitable HLA-based time management mechanism to implement DTSS on a distributed computing environment. In the performance evaluation of DTSS, the calculation cost of DTSS is implemented by using the HLA-suitable time management mechanism. The results show that DTSS can be executed within the ideal time plus its 1% over-cost when a basic scenario of war-game simulation is employed. Therefore, if the ratio of SoD to the total simulation is small, the execution time is expected to decrease to nearly this ratio. We also introduce the criterion for determining when DTSS is superior to the conventional method by using the performance evaluation results. The results presented in this paper can be effectively utilized when DTSS is applied to practical applications.
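The abstract does not give the formula DTSS uses, so the following is only a sketch of one conservative way to turn an already computed distance into a time step. It assumes each MO has a known maximum speed and that detection occurs within a fixed range; the function name, parameters, and the bound itself are hypothetical and are not taken from the paper.

```python
import math


def safe_time_step(pos_a, pos_b, max_speed_a, max_speed_b,
                   detection_range, dt_min, dt_max):
    """Largest step that cannot skip over an entry into the detection range.

    Assumed bound (not the paper's formula): two objects cannot close the
    gap faster than the sum of their maximum speeds.
    """
    distance = math.dist(pos_a, pos_b)  # value reused from the detection phase
    gap = distance - detection_range
    if gap <= 0:
        return dt_min  # already within detection range: use the fine step
    closing_speed = max_speed_a + max_speed_b
    if closing_speed == 0:
        return dt_max  # neither object can move: no entry is possible
    # Earliest possible entry time, clamped to the allowed step range.
    return max(dt_min, min(dt_max, gap / closing_speed))
```

With objects 100 units apart, maximum speeds of 10 each, and a detection range of 20, the earliest possible entry is (100 - 20) / 20 = 4 time units away, so a step of up to 4 cannot miss the event.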
Toru SHIMIZU Masami NAKAJIMA Masahiro KAINAGA
This paper describes the design and evaluation of a massively parallel processor based on the Matrix architecture, which is suitable for portable multimedia applications. The proposed architecture achieves 40 GOPS of 16-bit fixed-point additions at 200 MHz clock frequency and 250 mW power dissipation. In addition, 1 M-bit SRAM for data registers and 2,048 2-bit processing elements connected by a flexible switching network are integrated in 3.1 mm² in 90 nm low-power CMOS technology. The energy-efficient Matrix architecture supports 2,048-way parallel operations and the programmable functions required for multimedia SoCs.
Shingo YAMAGUCHI Tomohiro TAKAI Tatsuya WATANABE Qi-Wei GE Minoru TANAKA
This paper deals with the computation of parallel degree, PARAdeg, for (dataflow) program nets with SWITCH-nodes. Ge et al. have given the definition of PARAdeg and an algorithm for computing PARAdeg for program nets with no SWITCH-nodes. However, for program nets with SWITCH-nodes, no algorithm for computing PARAdeg has been proposed. We first show that it is intractable to compute PARAdeg for program nets with SWITCH-nodes. To do this, we define a subclass of program nets with SWITCH-nodes, named structured program nets, and then show that the decision problem related to computing PARAdeg for acyclic structured program nets is NP-complete. Next, we give a heuristic algorithm to compute PARAdeg for acyclic structured program nets. Finally, we conduct experiments to evaluate our heuristic algorithm on 200 acyclic structured program nets. We can say that the heuristic algorithm is reasonable, because its accuracy is more than 96% and the computation time can be greatly reduced.
Junichi MIYAKOSHI Yuichiro MURACHI Tomokazu ISHIHARA Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
For super-parallel video processing, we propose a power- and area-efficient SRAM core architecture with segmentation-free access, which means accessibility to arbitrary consecutive pixels, and horizontal/vertical access. To achieve these flexible accesses, a spirally-connected local-wordline select signal and a multi-selection scheme in wordlines are proposed, so that the extra X-decoders in the conventional multi-division SRAM can be eliminated. Consequently, the proposed SRAM reduces power and area by 57-60% and 60%, respectively, when it is applied to a 128-parallel architecture. The proposed 160-kbit SRAM with 16 read ports (2-read-port SRAM with eight-parallel architecture) is implemented as a search window buffer for an H.264 motion estimation processor core, which dissipates 800 µW for QCIF 15-fps in a 130-nm technology.
In this paper, we present a new low-cost concurrent error detection (CED) S-Box architecture for the Advanced Encryption Standard (AES). Because of the complexity and nonlinearity of the S-Box, it is difficult to develop error detection algorithms for it. Conventionally, a parity-checked S-Box is implemented with ROM (read only memory). In some applications, for example, smart cards, both small chip size and fault detection are strongly demanded. ROM-based parity checking cannot meet these demands. We propose our CED S-Box (CEDSB) architecture for two reasons. The first is to design an S-Box without ROM. The second is to obtain a compact S-Box with real-time error detection. Based on the composite field, we develop the CEDSB architecture to implement fault detection for the S-Box. The overheads of the CED for the S-Boxes in GF((2^4)^2) and in GF(((2^2)^2)^2) are 152 and 132 NAND gates, respectively. The number of extra gates used for the CEDSB is nearly equal to that of the ROM-based CED S-Box (131 NAND gates). The chip areas of the ROM-based CED S-Box and of the CEDSBs in GF((2^4)^2) and in GF(((2^2)^2)^2) are 2996, 558, and 492 NAND gates, respectively. The CEDSB is thus more compact than a ROM-based CED S-Box.
Yuuki FUNAHASHI Shogo USAMI Ichi TAKUMI Masayasu HATA
We have researched high dimensional parity-check (HDPC) codes that give good performance over a channel that has a very high error rate. HDPC code has a little coding overhead because of its simple structure. It has hard-in, maximum detected bit flipping (MDBF) decoding that has reasonable decoding performance and computational cost. In this paper, we propose a modified algorithm for MDBF decoding and compare the proposed MDBF decoding with conventional hard-in decoding.
Gou HOSOYA Hideki YAGI Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
We study a modification method for constructing low-density parity-check (LDPC) codes for solid burst erasures. Our proposed modification method is based on a column permutation technique for a parity-check matrix of the original LDPC codes. It can change the burst erasure correction capabilities without degradation in the performance over random erasure channels. We show by simulation results that the performance of codes permuted by our method is better than that of the original codes, especially with two or more solid burst erasures.
Mohammad DANESH Farid SHEIKHOLESLAM Mehdi KESHMIRI
Consideration of manipulator dynamics and external disturbances in robot control system design can enhance the stability and performance properties of the whole system. In this paper, we present an approach to solve the control problem when the inertia parameters of robot are unknown, and at the same time robot is subjected to external force disturbances. This approach is based on simultaneous estimation of force signal and inertia parameters and utilizing them in the control law. The update laws and the control law are derived based on a single time-varying Lyapunov function, so that the global convergence of the tracking error is ensured. A theorem with a detailed proof is presented to guarantee the global uniform asymptotic stability of the whole system. Some simulations are made for a number of external forces to illustrate the effectiveness of the proposed approach.
In this paper, we propose two techniques to solve the nonlinear constrained optimization problem in large-scale mesh-interconnected systems. The first one is a diagram-method-based decomposition technique which decomposes the large-scale system into several small subsystems. The second technique is a projected-Jacobi-based parallel dual-type method which can solve the optimization problems in the decomposed subsystems efficiently. We have used the proposed algorithm to solve numerous examples of large-scale constrained optimization problems in power systems. The test results show that the proposed algorithm is computationally efficient compared with the conventional approach of the centralized Newton method and the state-of-the-art Block-Parallel Newton method.
Chen ZHENG Noriaki MIYAZAKI Toshinori SUZUKI
Effective and simply realizable rate-compatible low-density parity-check (LDPC) codes are proposed. A parity check matrix is constructed with the progressively increased column weights (PICW) order and adopted to achieve a punctured LDPC coding scheme for a wide range of code rates in rate-compatible systems. Using the proposed rate-compatible punctured LDPC codes, low-complexity adaptive communication systems, such as wireless communication systems, can be achieved with reliable transmission.
Kenta KASAI Yuji SHIMOYAMA Tomoharu SHIBUYA Kohichi SAKANIWA
Multi-Edge type Low-Density Parity-Check codes (MET-LDPC codes) introduced by Richardson and Urbanke are generalized LDPC codes which can be seen as LDPC codes obtained by concatenating several standard (ir)regular LDPC codes. We prove in this paper that MET-LDPC code ensembles possess a certain symmetry with respect to their Average Coset Weight Distributions (ACWD). Using this symmetry, we derive the ACWD of MET-LDPC code ensembles from the ACWD of their constituent ensembles.
EunJung CHANG HoYeol KWON John M. CIOFFI
Tone Injection (TI) can reduce, without bandwidth loss, the high peak-to-average ratio (PAR) that can substantially limit the performance of multicarrier systems. However, TI results in peak regrowth, since it does not consider second peaks which can be higher than the peak after performing TI, and the average transmit power is also increased because of the huge constellation. In this paper, a no-rate-loss PAR reduction technique, Injected Tone Constellation (ITC), is proposed along with an iterative algorithm to achieve the performance increase and to minimize the average transmit power without high complexity.
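To make the quantity being reduced concrete, the sketch below computes the PAR of one multicarrier symbol as the peak instantaneous power divided by the average power of its time-domain samples. It does not implement TI or ITC. It assumes Nyquist-rate sampling with a plain inverse DFT and no oversampling, and the function names are hypothetical.

```python
import cmath


def idft(symbols):
    """Inverse DFT: per-subcarrier symbols -> time-domain samples."""
    n = len(symbols)
    return [
        sum(x * cmath.exp(2j * cmath.pi * k * t / n)
            for k, x in enumerate(symbols)) / n
        for t in range(n)
    ]


def peak_to_average_ratio(symbols):
    """PAR of one multicarrier symbol (linear scale, not dB)."""
    powers = [abs(sample) ** 2 for sample in idft(symbols)]
    return max(powers) / (sum(powers) / len(powers))
```

When all N subcarriers carry the same symbol, the energy concentrates in a single time sample and the PAR reaches N, whereas a single active subcarrier gives a constant envelope and a PAR of 1.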