Ankur SRIVASTAVA Chunhong CHEN Majid SARRAFZADEH
We propose a timing driven gate duplication algorithm for the technology independent phase. Our algorithm is a generalization of the gate duplication strategy suggested in [3]. Our technique gets a more global view by duplicating multiple gates at a time. We compare the minimum circuit delay obtained by SIS with the delay obtained by using our gate duplication. Results show that up to 11% improvement in delay can be obtained. Our algorithm does not have an adverse effect on the overall synthesis time, indicating that gate duplication is an efficient strategy for timing optimization.
Jung-Sik JEONG Kei SAKAGUCHI Jun-ichi TAKADA Kiyomichi ARAKI
This paper presents the performance of the Directionally Constrained Minimization of Power (DCMP) and the Zero-Forcing (ZF) in the Angular Spread (AS) environment. To obtain the optimal weights for both methods, the Extended Array Mode Vector (EAMV) is employed. It is known that the EAMV represents the instantaneous AS as well as the instantaneous DOA in the slow fading channel. As a result, it is shown that the DCMP and the ZF using the EAMV estimates can improve the Signal-to-Interference-plus-Noise Ratio (SINR) considerably, as compared with those using the Direction of Arrival (DOA) information only. At the same time, the intrinsic problems causing the performance loss in the DCMP and the ZF are revisited. From this, the reasons for the performance deterioration are analyzed, in relation with the AS, the number of samples, the number of antenna elements, and the spatial correlation coefficient of the signals. It follows that the optimal signal combining techniques using the EAMV estimates can diminish such effects.
A system level approach for a memory power reduction is proposed in this paper. The basic idea is allocating frequently executed object codes into a small subprogram memory and optimizing supply voltage and threshold voltage of the subprogram memory. Since large scale memory contains a lot of direct paths from power supply to ground, power dissipation caused by subthreshold leakage current is more serious than dynamic power dissipation. Our approach optimizes the size of subprogram memory, supply voltage, and threshold voltage so as to minimize memory power dissipation including static power dissipation caused by leakage current. A heuristic algorithm which determines code allocation, supply voltage, and threshold voltage simultaneously so as to minimize power dissipation of memories is proposed as well. Our experiments with some benchmark programs demonstrate significant energy reductions up to 80% over a program memory which does not employ our approach.
Katsushi TAKANO Satoshi TAOKA Masahiro YAMAUCHI Toshimasa WATANABE
We consider only P-invariants that are nonnegative integer vectors in this paper. An P-invariant of a Petri net N=(P,T,E,α,β) is a |P|-dimensional vector Y with Yt A = for the place-transition incidence matrix A of N. The support of an invariant is the set of elements having nonzero values in the vector. Since any invariant is expressed as a linear combination of minimal-support invariants (ms-invariants for short) with nonnegative rational coefficients, it is usual to try to obtain either several invariants or the set of all ms-invariants. The Fourier-Motzkin method (FM) is well-known for computing a set of invariants including all ms-invariants. It has, however, critical deficiencies such that, even if invariants exist, none of them may be computed because of memory overflow caused by storing candidate vectors for invariants and such that, even when a set of invariants are produced, many non-ms invariants may be included. We are going to propose the following two methods: (1) FM1_M2 that finds a smallest possible set of invariants including all ms-invariants; (2) STFM that necessarily produces one or more invariants if they exist. Experimental results are given to show their superiority over existing ones.
An automated analog circuit synthesis based on reuse of topological features of 'prototype circuits' is proposed. The prototype circuits are designed by humans and suggested to the synthesis system as hints of configurations of new analog circuits to be synthesized by the system. The connections of elements in analog circuits are not generally systematic, but they would have some similarities to a circuit which has similar behaviors or functionalities. In the proposed process, the information on circuit connections is stored as sub-circuits extracted from the prototype circuits. And then, genetic algorithm is used to search for an optimum combination of the sub-circuits that achieves the desired electronic specifications. The combinations of sub-circuits are performed with a novel technique where the terminals of the sub-circuits are shared. The capabilities of the proposed method are demonstrated through an example of the synthesis.
Kunihiko HIRAISHI Hirohide TANAKA
The legal firing sequence problem of Petri nets (LFS) is one of fundamental problems in the analysis of Petri nets, because it appears as a subproblem of various basic problems. Since LFS is shown to be NP-hard, various heuristics has been proposed to solve the problem of practical size in a reasonable time. In this paper, we propose a new algorithm for this problem. It is based on the partial order verification technique, and reduces redundant branches in the search tree. Moreover, the proposed algorithm can be combined with various types of heuristics.
Shigeru YAMASHITA Hiroshi SAWADA Akira NAGOYA
This paper presents a new framework for synthesizing look-up table (LUT) networks. Some of the existing LUT network synthesis methods are based on one or two functional (Boolean) decompositions. Our method also uses functional decompositions, but we try to use various decomposition methods, which include algebraic decompositions. Therefore, this method can be thought of as a general framework for synthesizing LUT networks by integrating various decomposition methods. We use a cost database file which is a unique characteristic in our method. We also present comparisons between our method and some well-known LUT network synthesis methods, and evaluate the final results after placement and routing. Although our method is rather heuristic in nature, the experimental results are encouraging.
Masanori HASHIMOTO Hidetoshi ONODERA
We propose a transistor sizing method that downsizes MOSFETs inside a cell to eliminate redundancy of cell-based circuits as much as possible. Our method reduces power dissipation of detail-routed circuits while preserving interconnects. The effectiveness of our method is experimentally evaluated using 3 circuits. The power dissipation is reduced by 75% maximum and 60% on average without delay increase. Compared with discrete cell sizing, the proposed method reduces power dissipation furthermore by 30% on average.
Tetsuya SHIMAMURA Colin F. N. COWAN
This paper proposes a non-linear adaptive algorithm, the amplitude banded RLS (ABRLS) algorithm, as an adaptation procedure for time variant channel equalizers. In the ABRLS algorithm, a coefficient matrix is updated based on the amplitude level of the received sequence. To enhance the capability of tracking for the ABRLS algorithm, a parallel adaptation scheme is utilized which involves the structures of decision feedback equalizer (DFE). Computer simulations demonstrate that the novel ABRLS based equalizer provides a significant improvement relative to the conventional RLS DFE on a rapidly time variant communication channel.
Muling GUO Madoka HASEGAWA Shigeo KATO Juichi MIYAMICHI
Reversible variable length codes (RVLCs), which make instantaneous decoding possible in both forward and backward directions, are exploited to code data stream in noisy enviroments. Because there is no redundancy in code words of RVLCs, RVLCs are suitable for very low bit-rate video coding. Golomb-Rice code, one of variable length code for infinite number of symbols, is widely used to encode exponentially distributed non-negative integers. We propose a reversible variable length code by modifying Golomb-Rice code, which is called parity check reversible Golomb-Rice code and abbreviated to P-RGR code. P-RGR code has the same code length distribution as GR code but can detect one-bit error in any arbitrary position of the code stream. The sets of P-RGR code words in both directions are identical so that they can be constructed by nearly the same algorithm. Furthermore, this paper also gives a general construction method for all instantaneously decodable RGR codes.
Osamu MIZUNO Daisuke SHIMODA Tohru KIKUNO Yasunari TAKAGI
This paper presents an enhancement of a software project simulator to perform risk prediction with cost estimation capability. So far, we have developed a software project simulator to simulate software development projects. In this simulator, a development process was described using Petri net model, and it was applied to some actual project data in a certain company successfully. On the other hand, we have also presented a risk predicting system to find "risky" projects by statistical analysis on risk questionnaire for project managers. In this approach, only the probability to be risky was calculated for a project. Thus, the managers in the company wanted to know a concrete proof why a software project becomes risky. In this paper, to present the proof that a software project becomes risky, we try to enhance the previous project simulator so that the simulator can deal with risk factors. To consider the risk factors, we modify the previous simulator so that both the fluctuation of skill level and the deadline pressure can be represented by the parameters in the simulator. By using a case study, we confirm that the enhanced simulator can estimate the development cost under some typical risks. As a result, we can expect that the simulator shows how much the development cost of a risky project exceeds an estimate.
Shuji TSUKIYAMA Masakazu TANAKA Masahiro FUKUI
In this paper, we present a new algorithm for statistical static timing analysis of a CMOS combinatorial circuit, which takes correlations into account to improve accuracy of the distribution of the maximum delay of the circuit. The correlations treated in the algorithm are not only the one between distributions of arrival times of input signals to a logic gate but also correlation between switching delays of a logic gate and correlation between interconnect delays of a net. We model each delay by a normal distribution, and use a normal distribution of two stochastic variables with a coefficient of correlation for computing the maximum of two delays. Since the algorithm takes the correlation into account, the time complexity is O(m2) in the worst-case, where m is the number of edges of the graph representing a given circuit. But, for real combinatorial circuits, the complexity is expected to be less than this.
Shingo YAMAGUCHI Yuko SHIODE Qi-Wei GE Minoru TANAKA
A workflow is a flow of work carried out by workers, and workflow management is to automate the flow of work. In workflow management, an actual work is carried out based on the workflow, which is called case. In order to effectively meet various requirements, it is necessary to change current workflow dynamically, which is called dynamic workflow change. When the dynamic change is required, there exist cases in the workflow. In order to handle these cases and further to keep the queuing order, the dynamic change takes period of time (called transient time) until the changed workflow becomes steady state again. During the transient time, workers are forced to do irregular work, and therefore it is important to clarify if a change type takes shorter transient time. In this paper, we do the performance evaluation on transient time of dynamic workflow changes. To do so, we first give a definition of transient time, and then propose methods of computing transient time of three change types proposed by Ellis et al. Finally, we do the performance evaluation for 90 dynamic changes by computing the transient times.
Nozomu TOGAWA Takashi SAKURAI Masao YANAGISAWA Tatsuo OHTSUKI
This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint of execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of register files. Moreover the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing performance penalty. A generated processor core will have small area compared with processor cores which have a single register file or those which consider only one type of functional units for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.
Jun ZHAO Fred J. MEYER Nohpill PARK Fabrizio LOMBARDI
We examine diagnosis of processor array systems formed as two-dimensional grids, with boundaries, and either four or eight neighbors for each interior processor. We employ a parallel test schedule. Neighboring processors test each other and report the results. Our diagnostic objective is to find a fault-free processor or set of processors. The system may then be sequentially diagnosed by repairing those processors tested faulty according to the identified fault-free set. We establish an upper bound on the maximum number of faults that can be sustained without invalidating the test results under worst case conditions. We give test schedules and diagnostic algorithms that meet the upper bound as far as the highest order term. We compare these near optimal diagnostic algorithms to alternative algorithms--both new and already in the literature.
Khaled MAHMUD Kaiji MUKUMOTO Akira FUKUDA
This paper presents a variable rate transmission scheme suitable for bandlimited meteor burst channel. Meteor Burst Communication (MBC) is a unique type of radio communication, which is primarily used for non-realtime remote data collection. In the paper, along with conventional BPSK and QPSK modulations, QAM and M-ary Bi-orthogonal modulations are analyzed for software modem implementation in an MBC system. Performance of the modulation methods is presented for both static AWGN channel and meteor burst channel. The proposed scheme for variable rate transmission dynamically estimates the MBC channel and varies the modulation type of a software modem, to control the transmission rate between bursts. The scheme dynamically selects a modulation type and packet length that will maximize the average throughput of the system. Performance of the scheme is analyzed and compared with conventional fixed rate modems. A practical implementation for software modem is suggested that uses a common core modulator/demodulator structure.
Makoto ISHII Tomokazu SHIGA Kiyoshi IGARASHI Shigeo MIKOSHIBA
A priming effect is studied for a three-electrode, surface-discharge AC-PDP, which has stripe barrier ribs of 0.22 mm pitch. It was found that by keeping the interval between the reset and address pulses within 24 µs, the data pulse voltage can be reduced while the data pulse width can be narrowed due to the priming effect. By adopting the primed addressing technique to the PDP, the data pulse voltage was reduced to 20 V when the data and scan pulse widths were 1 µs. Alternatively, the data pulse width could be narrowed to 0.33 µs when the data pulse voltage was 56 V. 69% of the TV field time could be assigned for the display periods with 12 sub-fields, assuring high luminance display.
Boon-Keat TAN Ryuji YOSHIMURA Toshimasa MATSUOKA Kenji TANIGUCHI
This paper describes a new architecture-based microprocessor, a dynamically programmable parallel processor (DPPP), that consists of large numbers of simplified ALUs (sALU) as processing blocks. All sALUs are interconnected via a code division multiple-access bus interface that provides complete routing flexibility by establishing connections virtually through code-matching instead of physical wires. This feature is utilized further to achieve high parallelism and fault tolerance. High fault tolerance is realized without the limitations of conventional fabrication-based techniques nor providing spare elements. Another feature of the DPPP is its simple programmability, as it can be configured by compiling numerical formula input using the provided user auto-program interface. A prototype chip based on the proposed architecture has been implemented on a 4.5 mm 4.5 mm chip using 0.6 µm CMOS process.
Tianxu ZHAO Yue HAO Yong-Chang JIAO
An optimal allocation model for the sub-processing-element (sub-PE) level redundancy is developed, which is solved by the genetic algorithms. In the allocation model, the average defect density D and the parameter δ are also considered in order to accurately analyze the element yield, where δ is defined as the ratio of the support circuit area to the total area of a PE. When the PE's area is imposed on the constraint, the optimal solutions of the model with different D and δ are calculated. The simulation results indicate that, for any fixed average defect density D, both the number of the optimal redundant sub-circuit added into a PE and the PE's yield decrease as δ increases. Moreover, for any fixed parameter δ, the number of the optimal redundant sub-circuit increases, while the optimal yield of the PE decreases, as D increases.
Kiyotaka YAMAMURA Takayoshi KUMAKURA Yasuaki INOUE
Recently, an efficient algorithm has been proposed for finding all solutions of systems of nonlinear equations using inverses of approximate Jacobian matrices. In this letter, an effective technique is proposed for improving the computational efficiency of the algorithm with a little bit of computational effort.