Moritoshi YASUNAGA Ikuo YOSHIHARA Jung Hwan KIM
In this paper, we propose the evolutionary algorithm-based reasoning system and its design methodology. In the proposed design methodology, reasoning rules behind the past cases in each task (in each case database) are extracted through genetic algorithms and are expressed as truth tables (we call them 'evolved truth tables'). Circuits for the reasoning systems are synthesized from the evolved truth tables. Parallelism in each task can be embedded directly in the circuits by the hardware implementation of the evolved truth tables, so that the high speed reasoning system with small or acceptable hardware size is achieved. We developed a prototype system using Xilinx Virtex FPGA chips and applied it to the gene boundary reasoning (GBR) and English pronunciation reasoning (EPR), which are very important practical tasks in the genome science and language processing field, respectively. The GBR and the EPR prototype systems are evaluated in terms of the reasoning accuracy, circuit size, and processing speed, and compared with the conventional approaches in the parallel AI and the artificial neural networks. Fault injection experiments are also carried out using the prototype system, and its high fault-tolerance, or graceful degradation against defective circuits that suits to the hardware implementation using wafer scale LSIs is demonstrated.
Maria-Cristina MARINESCU Martin RINARD
This paper describes a novel approach to high-level synthesis of complex pipelined circuits, including pipelined circuits with feedback. This approach combines a high-level, modular specification language with an efficient implementation. In our system, the designer specifies the circuit as a set of independent modules connected by conceptually unbounded queues. Our synthesis algorithm automatically transforms this modular, asynchronous specification into a tightly coupled, fully synchronous implementation in synthesizable Verilog.
A dual-offset microstrip-fed slot antenna having large bandwidth studied in this paper. The proposed antenna is analyzed by the Finite Difference Time Domain (FDTD) method. In this case, two offsets and other design parameters of the antenna lead to the good impedance matching over a wide frequency band. The experimental bandwidth is approximately 1.587 octave (-10 dB S11). And the experimented data for the impedance loci, the radiation patterns, and gain of the antenna are also described. The measured results are relatively in good agreement with the FDTD results.
Distributed domino effect-free checkpointing techniques can be divided into two categories: coordinated and communication-induced checkpointing. The former is inappropriate for mobile computing systems because it either forces every mobile host to take a new checkpoint or blocks the underlying computation during the checkpointing process. The latter makes every mobile host take the checkpoint independently. However, each mobile host may need to store multiple local checkpoints in stable storage. This investigation presents a novel three level synchronous checkpointing algorithm that combines the advantages of above two methods for mobile computing systems. The algorithm utilizes pre-synchronization, quasi-synchronization, and post-synchronization techniques and has the following merits: (1) Consistent global checkpoints can be ensured. (2) No mobile host is blocked during checkpointing. (3) Only twice the checkpoint size is required. (4) Power consumption is low. (5) The disconnection problem of mobile hosts can be resolved. (6) Very few mobile hosts in doze mode are disturbed. (7) It is simple and easy to implement. The proposed algorithm's numerical results are also provided in this work for comparison. The comparison reveals that our algorithm outperforms other algorithms in terms of checkpoint overhead, maintained checkpoints, power consumption, and disturbed mobile hosts.
Jun ZHAO Fred J. MEYER Nohpill PARK Fabrizio LOMBARDI
We examine diagnosis of processor array systems formed as two-dimensional grids, with boundaries, and either four or eight neighbors for each interior processor. We employ a parallel test schedule. Neighboring processors test each other and report the results. Our diagnostic objective is to find a fault-free processor or set of processors. The system may then be sequentially diagnosed by repairing those processors tested faulty according to the identified fault-free set. We establish an upper bound on the maximum number of faults that can be sustained without invalidating the test results under worst case conditions. We give test schedules and diagnostic algorithms that meet the upper bound as far as the highest order term. We compare these near optimal diagnostic algorithms to alternative algorithms--both new and already in the literature.
Vasily G. MOSHNYAGA Hiroshi TSUJI
Due to a large capacitance and enormous access rate, caches dissipate about a third of the total energy consumed by today's processors. In this paper we present a new architectural technique to reduce energy consumption in caches. Unlike previous approaches, which have focused on lowering cache capacitance and the number of accesses, our method exploits a new freedom in cache design, namely the voltage per access. Since in modern block-buffered caches, the loading capacitance operated on block-hit is much less than the capacitance operated on miss, the given clock cycle time is inefficiently utilized during the hit. We propose to trade-off this unused time with the supply voltage, lowering the voltage level on the hit and increasing it on the miss. Experiments show that the approach can half the cache energy dissipation without large performance and area overhead.
A plasma display panel (PDP) represents gray levels by the pulse number modulation technique that results in undesirable dynamic false contours on moving images. Among the various techniques proposed for the reduction of dynamic false contours, the optimization of the subfield pattern can be most easily implemented without the need for any additional dedicated hardware or software. In this paper, a systematic method for selecting the optimum subfield pattern is presented. In the proposed method, a subfield pattern that minimizes the quantitative measure of the dynamic false contour on the predefined test image is selected as the optimum pattern. The selection is made by repetitive calculations based on a genetic algorithm. Quantitative measure of the dynamic false contour calculated by simulation on the test image serves as a criterion for minimization by the genetic algorithm. In order to utilize the genetic algorithm, a structure of a string is proposed to satisfy the requirements for the subfield pattern. Also, three genetic operators for optimization, reproduction, crossover, and mutation, are specially designed for the selection of the optimum subfield pattern.
Muling GUO Madoka HASEGAWA Shigeo KATO Juichi MIYAMICHI
Reversible variable length codes (RVLCs), which make instantaneous decoding possible in both forward and backward directions, are exploited to code data stream in noisy enviroments. Because there is no redundancy in code words of RVLCs, RVLCs are suitable for very low bit-rate video coding. Golomb-Rice code, one of variable length code for infinite number of symbols, is widely used to encode exponentially distributed non-negative integers. We propose a reversible variable length code by modifying Golomb-Rice code, which is called parity check reversible Golomb-Rice code and abbreviated to P-RGR code. P-RGR code has the same code length distribution as GR code but can detect one-bit error in any arbitrary position of the code stream. The sets of P-RGR code words in both directions are identical so that they can be constructed by nearly the same algorithm. Furthermore, this paper also gives a general construction method for all instantaneously decodable RGR codes.
Naotake KAMIURA Takashi KODERA Nobuyuki MATSUI
In this paper we propose a MIN (Multistage Interconnection Network) whose performance in the faulty case degrades as gracefully as possible. We focus on a two-dilated baseline network as a sort of MIN. The link connection pattern in our MIN is determined so that all the available paths established between an input terminal and an output terminal via an identical input of a SE (Switching Element) in some stage will never pass through an identical SE in the next stage. Extra links are useful in improving the performance of the MIN and do not complicate the routing scheme. There is no difference between our MIN and others constructed from a baseline network with regard to numbers of links and cross points in all SEs. The theoretical computation and simulation-based study show that our MIN is superior to others in performance, especially in robustness against concentrated SE faults in an identical stage.
Boon-Keat TAN Ryuji YOSHIMURA Toshimasa MATSUOKA Kenji TANIGUCHI
This paper describes a new architecture-based microprocessor, a dynamically programmable parallel processor (DPPP), that consists of large numbers of simplified ALUs (sALU) as processing blocks. All sALUs are interconnected via a code division multiple-access bus interface that provides complete routing flexibility by establishing connections virtually through code-matching instead of physical wires. This feature is utilized further to achieve high parallelism and fault tolerance. High fault tolerance is realized without the limitations of conventional fabrication-based techniques nor providing spare elements. Another feature of the DPPP is its simple programmability, as it can be configured by compiling numerical formula input using the provided user auto-program interface. A prototype chip based on the proposed architecture has been implemented on a 4.5 mm 4.5 mm chip using 0.6 µm CMOS process.
Chawalit HONSAWEK Kazuhito ITO Tomohiko OHTSUKA Trio ADIONO Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA
In this paper, a LSI design for video encoder and decoder for H.263+ video compression is presented. LSI operates under clock frequency of 27 MHz to compress QCIF (176144 pixels) at the frame rate of 30 frame per second. The core size is 4.6 4.6 mm2 in a 0.35 µm process. The architecture is based on bus connected heterogeneous dedicated modules, named as System-MSPA architecture. It employs the fast and small-chip-area dedicated modules in lower level and controls them by employing the slow and flexible programmable device and an external DRAM. Design results in success to achieve real time encoder in quite compact size without losing flexibility and expand ability. Real time emulation and easy test capability with external PC is also implemented.
Kiyotaka YAMAMURA Takayoshi KUMAKURA Yasuaki INOUE
Recently, an efficient algorithm has been proposed for finding all solutions of systems of nonlinear equations using inverses of approximate Jacobian matrices. In this letter, an effective technique is proposed for improving the computational efficiency of the algorithm with a little bit of computational effort.
First, we give a graph-theoretic formalization for the spare assignment problems for two cases of reconfiguring NN mesh-connected processor arrays with spares on a diagonal line in the array or two orthogonal lines at the edges of the array. Second, we discuss the problems for minimizing the numbers of "dangerous processors" for the cases. Here, a dangerous processor is a nonfaulty one for which there remains no spare processor to be assigned if it becomes faulty, without modifying the spare assignments to other faulty processors. The problem for the latter case, originally presented by Melhem, has already been discussed and solved by the O(N2) algorithm in [3], but it's procedure is very complicated. Using the above graph-theoretic formalization, we give efficient plain algorithms for minimizing the numbers of dangerous processors by which the problems for both the cases can be solved in O(N) time.
Today software piracy is a major concern to electronic commerce since a digitized product such as software is vulnerable to redistribution and unauthorized use. This paper presents an enhanced electronic software distribution and software protection model. Authentication scheme of the proposed model is based on zero-knowledge (ZK) proof which requires limited computation. The proposed model considers post installation security using authentication agent. It prevents software piracy and illegal copy. It also provides secure and efficient software live-update mechanism based on traitor tracing scheme. Even if software or personal key is copied illegally, a merchant can trace back to its original owner from the electronic license and personal key. The proposed model provides security and reasonable performance and safety.
Hiroaki SUZUKI Hiroyuki KAWAI Hiroshi MAKINO Yoshio MATSUDA
A VLIW (Very Long Instruction Word) architecture with a new code compaction method has been proposed. For a 3D-geometry processor, we consider two types of 2-issue VLIW architectures, the floating-point execution accelerating VLIW (FP-VLIW) and the data-move enhancing VLIW (MV-VLIW) architectures, as expansions of a Single-Streaming Single Instruction, Multiple Data (SS-SIMD) architecture. To solve the code bloat problem which is common to VLIW architectures, the proposed method makes it possible to compact original codes into the VLIW codes by software tools and decompact the VLIW codes by a simple hardware decompactor composed of an instruction swap circuit on a chip. Speeds and code densities of the two VLIWs with the code compaction are compared to the SS-SIMD with the same instruction set and the same building blocks. The FP-VLIW shows the fastest speed performance in the evaluation results of the viewperf CDRS-03 benchmark programs. It is 36% faster than the SS-SIMD used as reference. The proposed compaction method keeps the 95% code density of the SS-SIMD. One test program shows that the code density of the MV-VLIW is higher than that of the SS-SIMD. This result demonstrates that the merit of compacting nops can be greater than the VLIW penalty. The FP-VLIW architecture with the code compaction achieves 1.36 times the speed performance without significant code-density deterioration.
Yuchun MA Xianlong HONG Sheqin DONG Yici CAI Chung-Kuan CHENG Jun GU
Boundary Constraints of VLSI floorplanning require a set of blocks to be placed along the boundaries of the chip. Thus, this set of blocks can be adjacent to I/O pads for external communication. Furthermore, these blocks are kept away from the central area so that they do not form blockage for internal routing. In the paper, we devise an algorithm of VLSI floorplanning with boundary constraints using a Corner Block List (CBL) representation. We identify the necessary and sufficient conditions of the CBL representation for the boundary constraints. We design a linear time approach to scan the conditions and formulate a penalty function to punish the constraint violation. A simulated annealing process is adopted to optimize the floorplan. Experiments on MCNC benchmarks show promising results.
An advanced spare-connection scheme for K-out-of-N redundancy is proposed for constructing fault-tolerant ring- or toroidal mesh-connected processing-node arrays able to enhance emulation of binary hypercubes by using bypass networks. With this scheme, a component redundancy configuration for a base array with a fixed number of primary nodes, such as that for 8-node ring or 32-node toroidal mesh, can be constructed by using bypass links with a segmented bus structure to selectively connect the primary nodes to a spare node in parallel. These bypass links are allocated to the primary nodes by graph-node coloring with a minimum inter-node distance of three in order to use the bypass links as the hypercube connections as well as to attain strong fault tolerance for reconfiguring the base array with the primary network topology. An extended redundancy configuration for a large fault-tolerant array can be constructed by connecting the component configurations by using external switches of a hub type provided at the bus nodes of the bypass links. This configuration has a network topology of the parallel star-connections of sub-hypercubes whose diameter is smaller than that of the regular hypercube.
Petri net is a mathematical model for concurrent systems. Liveness is one of important properties of Petri net. Liveness problem of general Petri net is of exponential space complexity and subclasses are suggested with less computational complexity. It is well known that liveness problem of bounded (extended) free choice net is solved in deterministic polynomial time. This paper treats liveness problem of AC/DC nets. AC/DC net is a subclass of Petri net that exhibits no confusion (mixture of concurrency and conflict). This class properly includes the class of free choice nets. It is shown that every minimal siphon of an AC/DC net is trap if and only if every strongly connected siphon is a trap. This result shows that monotone liveness of bounded AC/DC net is solved in deterministic polynomial time. It is shown that this result is true of bounded time AC/DC net with static fair condition.
Jianqing WANG Hideaki SEKO Osamu FUJIWARA Toshio NOJIMA
A multi-grid finite-difference time-domain (FDTD) method was applied for numerical dosimetry analysis in the human head for 5 GHz band portable terminals. By applying fine FDTD grids to the volumes in the human head where the highest electromagnetic (EM) absorption occurs and coarse grids to the remaining volumes of the head, the spatial peak specific absorption rate (SAR) assessment was achieved with a less computation memory and time. The accuracy of applying the multi-grid FDTD method to the spatial peak SAR assessment was checked in comparison with the results obtained from the usual uniform-grid method, and then the spatial peak SARs for three typical situations of a person using a 5.2 GHz band portable terminal were calculated in conjunction with an anatomically based human head model.
Hiroshi MATSUURA Makoto TAKANO
A distributed network management system (NMS) is urgently needed to manage large number of network managed objects (MOs) such as public network MOs. The ATM Forum has proposed the M4-interface to achieve just such a distributed NMS. However, the basis of the M4-interface is not adequate in terms of flexible distribution, because its main unit of distribution is location. To improve the granularity of the distribution, we have defined the route as a further unit of distribution. We also describe new MOs that provide for a distribution based on routes. A more detailed distribution is then possible than with a distribution purely based on location. In addition, we propose a new CMIP Action to move MOs from one sub-NMS (SubNMS) to another while the system is running. Using this Action, we can achieve a more flexible distribution in terms of network problems and load.