In this paper, a high-performance pipelined architecture for the 2-D inverse discrete wavelet transform (IDWT) is proposed. We use a tree-block pipeline-scheduling scheme to increase computational performance and reduce temporary buffers. The scheme divides the input subbands into several wavelet blocks and processes these blocks one by one, so the size of the buffers for storing temporary subbands is greatly reduced. After scheduling the data flow, we fold the computations of all wavelet blocks into the same low-pass and high-pass filters to achieve higher hardware utilization and minimize hardware cost, and we pipeline these two filters efficiently to reach a higher throughput rate. For the computation of an N×N-sample 2-D IDWT with filter length K, our architecture takes at most (2/3)N² cycles and requires 2N(K-2) registers. In addition, each filter is designed regularly and modularly, so it is easily scalable to different filter lengths and different numbers of decomposition levels. Because of its small storage, regularity, and high performance, the architecture can be applied to time-critical image decompression.
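The architecture is hardware, but the computation it pipelines can be sketched in software. The following minimal Python sketch performs one level of a separable 2-D IDWT, assuming the orthonormal Haar filter pair (K = 2) rather than the longer filters the architecture targets; the function names are hypothetical. As in the abstract, every reconstruction step reuses the same low-pass/high-pass synthesis pair, first along columns and then along rows.

```python
import math

S = 1 / math.sqrt(2)  # orthonormal Haar scaling factor (assumed filter pair)

def _analysis_1d(x):
    """Split an even-length sequence into low-pass and high-pass halves."""
    lo = [(x[2 * i] + x[2 * i + 1]) * S for i in range(len(x) // 2)]
    hi = [(x[2 * i] - x[2 * i + 1]) * S for i in range(len(x) // 2)]
    return lo, hi

def _synthesis_1d(lo, hi):
    """Inverse of _analysis_1d: the shared low/high synthesis filter pair."""
    out = []
    for l, h in zip(lo, hi):
        out.extend([(l + h) * S, (l - h) * S])
    return out

def _transpose(m):
    return [list(col) for col in zip(*m)]

def haar_dwt2(img):
    """One-level 2-D Haar DWT (rows, then columns).

    Subband names: first letter = row filter, second = column filter.
    """
    row_lo, row_hi = zip(*(_analysis_1d(r) for r in img))

    def cols(half):
        lo, hi = zip(*(_analysis_1d(c) for c in _transpose(list(half))))
        return _transpose(list(lo)), _transpose(list(hi))

    ll, lh = cols(row_lo)
    hl, hh = cols(row_hi)
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """One-level 2-D Haar IDWT: column synthesis first, then row synthesis."""
    def uncols(lo, hi):
        merged = [_synthesis_1d(l, h) for l, h in zip(_transpose(lo), _transpose(hi))]
        return _transpose(merged)

    row_lo = uncols(ll, lh)
    row_hi = uncols(hl, hh)
    return [_synthesis_1d(l, h) for l, h in zip(row_lo, row_hi)]
```

A hardware pipeline would stream these row and column passes block by block instead of materializing whole subbands in memory, which is where the buffer savings described above come from.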
Rachaporn KEINPRASIT Prabhas CHONGSTITVATANA
In this paper, an algorithm based on Ant Colony Optimization techniques, called Ants on a Tree (AOT), is introduced. This algorithm can integrate many algorithms to solve a single problem. The strength of AOT is demonstrated by solving a High-Level Synthesis problem, which consists of many design steps, each with many algorithms to solve it. AOT can easily integrate these algorithms, using them to limit the search space and as heuristic weights to guide the search. During the search, AOT generates a dynamic decision tree. A boosting technique similar to branch-and-bound algorithms is applied to guide the search in the decision tree. The storage explosion problem is eliminated by the evaporation of the pheromone trails laid by the ants, an inherent property of our search algorithm.
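To make the interplay of heuristic weights, pheromone, and evaporation concrete, here is a toy ant-colony search over a binary decision tree. It is a minimal sketch, not AOT itself: the problem (choose 0/1 per level to maximize a weighted sum), the heuristic, and all parameter values are hypothetical, and the branch-and-bound-like boosting step is omitted. Pheromone is stored sparsely per tree edge and evaporated entries are deleted, which illustrates how evaporation keeps the stored part of the tree from exploding.

```python
import random

def ants_on_tree(weights, n_ants=20, n_iter=30, rho=0.3, deposit=1.0,
                 prune_below=0.01, seed=0):
    """Toy ACO over a binary decision tree with one level per weight.

    A leaf is a 0/1 choice vector scored by sum(w_i * c_i). Pheromone is kept
    sparsely per edge (prefix, choice) as an increment above a floor of 1.0.
    Returns (best_path, best_score, number_of_stored_edges).
    """
    rng = random.Random(seed)
    depth = len(weights)
    extra = {}  # (prefix tuple, choice) -> pheromone above the floor

    def heuristic(level, choice):
        # Hypothetical heuristic weight: favour the choice with positive gain.
        gain = weights[level] if choice == 1 else -weights[level]
        return 1.0 + max(gain, 0.0)

    def score(path):
        return sum(w * c for w, c in zip(weights, path))

    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        iter_best, iter_score = None, float("-inf")
        for _ in range(n_ants):
            path = []
            for level in range(depth):
                prefix = tuple(path)
                # Selection weight = pheromone * heuristic for each branch.
                p = [(1.0 + extra.get((prefix, c), 0.0)) * heuristic(level, c)
                     for c in (0, 1)]
                path.append(0 if rng.random() * (p[0] + p[1]) < p[0] else 1)
            s = score(path)
            if s > iter_score:
                iter_best, iter_score = path, s
        if iter_score > best_score:
            best, best_score = iter_best, iter_score
        # Evaporation: decay every stored edge and drop negligible ones.
        for key in list(extra):
            extra[key] *= 1.0 - rho
            if extra[key] < prune_below:
                del extra[key]
        # Reinforce the edges of this iteration's best path.
        for level in range(depth):
            key = (tuple(iter_best[:level]), iter_best[level])
            extra[key] = extra.get(key, 0.0) + deposit
    return best, best_score, len(extra)
```

In a synthesis setting, each tree level would correspond to a design decision and the heuristic would come from an existing algorithm for that step; here both are stand-ins.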
Tadashi DOHI Kazuki IWAMOTO Hiroyuki OKAMURA Naoto KAIO
Software rejuvenation is a proactive fault management technique that has been studied extensively in the recent literature. In this paper, we focus on an example of a telecommunication billing application considered in Huang et al. (1995) and develop discrete-time stochastic models to estimate the optimal software rejuvenation schedule. More precisely, two software availability models with rejuvenation are formulated via discrete semi-Markov processes, and the optimal software rejuvenation schedules that maximize the steady-state availabilities are derived analytically. Further, we develop non-parametric statistical algorithms to estimate the optimal software rejuvenation schedules, provided that complete sample data of failure times are given. A new statistical device, called the discrete total time on test statistics, is also introduced. Finally, we examine the asymptotic properties of the proposed statistical estimation algorithms through a simulation experiment.
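For readers unfamiliar with total time on test (TTT) statistics, the sketch below computes the conventional scaled TTT statistics of an ordered failure-time sample. This is the standard definition for complete samples, shown as background; the discrete version introduced in the paper may be defined differently, and the function name is hypothetical.

```python
def scaled_ttt(samples):
    """Scaled total-time-on-test statistics phi_i = T_i / T_n, i = 1..n.

    For the ordered sample x_(1) <= ... <= x_(n):
        T_i = sum_{j<=i} x_(j) + (n - i) * x_(i)
    i.e. the total time all n units have been on test up to the i-th failure.
    """
    x = sorted(samples)
    n = len(x)
    totals, running = [], 0.0
    for i, xi in enumerate(x, start=1):
        running += xi
        totals.append(running + (n - i) * xi)
    return [t / totals[-1] for t in totals]
```

For example, failure times `[3, 1, 4, 2]` give T = 4, 7, 9, 10 and hence φ = 0.4, 0.7, 0.9, 1.0. Plotting φ_i against i/n gives the usual TTT plot.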
It is still an open question whether software agents should be personified in the interface. To study the effects of faces and facial expressions in the interface, a series of experiments was conducted to compare subjects' responses to, and evaluations of, different faces and facial expressions. The experimental results demonstrate that: 1) personified interfaces help users engage in a task and are well suited to the entertainment domain; 2) people's impressions of a face in a task differ from their impressions of the same face in isolation, and the perceived intelligence of a face is determined not by the agent's appearance but by its competence; 3) there is a dichotomy between user groups holding opposite opinions about personification. Thus, agent-based interfaces should be flexible enough to support the diversity of users' preferences and the nature of tasks.
Hiroki FURUYA Hajime NAKAMURA Shinichi NOMOTO Tetsuya TAKINE
This paper studies the local Poisson property of aggregated IP traffic. First, it describes a scenario in which IP traffic exhibits a Poisson-like characteristic within a limited range of time scales when packets from independent traffic streams are aggregated. Each independent traffic stream corresponds to a series of correlated IP packets, such as those of a transport connection. Since the Poisson-like characteristic is observed only within a limited range of time scales, we call this characteristic the local Poisson property. This range of time scales can be estimated from the network configuration and the characteristics of transport connections. Second, based on these observations, we examine whether an ordinary Poisson process can be used to evaluate the packet loss probability in IP networks. An analytical investigation, in which IP traffic is modeled as a superposition of independent branching Poisson processes exhibiting the local Poisson property, suggests that the packet loss probability can be estimated with a finite-buffer queue fed by a Poisson process when the buffer size is within a certain range. The analysis is verified by simulations. These findings expand the applicability of conventional Poisson-based approaches to IP network design.
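As an illustration of what "a finite-buffer queue with a Poisson process" can mean in practice, the following sketch computes the loss probability of an M/M/1/K queue. The exponential service time is an assumption made for this example; the abstract does not state which service distribution the paper uses, and the function name is hypothetical.

```python
def mm1k_loss(rho, k):
    """Blocking (packet loss) probability of an M/M/1/K queue.

    rho: offered load lambda / mu.
    k:   system capacity (waiting positions plus the one in service), k >= 1.
    By PASTA, an arriving packet is lost with the stationary probability
    that k packets are already in the system.
    """
    if k < 1:
        raise ValueError("capacity k must be at least 1")
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (k + 1)  # uniform stationary distribution when rho = 1
    return (1 - rho) * rho**k / (1 - rho**(k + 1))
```

For example, `mm1k_loss(0.5, 1)` is 1/3, and at a load of 0.8 the loss probability falls as the capacity grows from 5 to 10. Under the local Poisson property, such a formula would only be trusted for buffer sizes whose drain times lie in the range of time scales where the aggregate looks Poisson.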
Atsushi KUSUNOKI Mitsuru TANAKA
This paper presents design considerations for a polarization-transformation transmission filter composed of a multilayered chiral slab. The optimal material parameters and thickness of each layer of the slab can be determined by using a genetic algorithm (GA). By substituting the constitutive relations for a chiral medium into Maxwell's equations, we obtain the electromagnetic field in the medium. A chain-matrix formulation is used to derive the relationship between the components of the incident, reflected, and transmitted electric fields. The cross- and co-polarized powers carried by the transmitted and reflected waves are expressed in terms of their electric field components. The proposed design procedure for a polarization-transformation filter is divided into two stages: in the first stage, an ordinary filter without polarization transformation is designed with a multilayered non-chiral slab, and in the second stage, a polarization-transformation filter for the transmitted wave is designed with a multilayered chiral slab. According to the filter specifications, two functionals are defined in terms of the transmitted and reflected powers. The optimal design of a polarization-transformation filter with a multilayered chiral slab is thus reduced to an optimization problem in which the material parameters and thickness of each chiral layer are found by maximizing the functionals. Applying the GA to this maximization yields the optimal material parameters and thicknesses of the multilayered chiral slab. Numerical results are presented to confirm the effectiveness of the two-stage design procedure: for three types of multilayered chiral slabs, optimal values of the refractive indices, thicknesses, and chiral admittances are obtained. The numerical results show that the proposed procedure is very effective for the optimal design of polarization-transformation filters for the transmitted wave.
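The chain-matrix idea is easiest to see for the first (non-chiral) stage. The sketch below multiplies standard 2×2 characteristic matrices of lossless isotropic layers at normal incidence to get the power reflectance and transmittance. It is an illustrative assumption, not the paper's formulation: chiral layers couple the two polarizations and need a larger (e.g. 4×4) chain matrix, which is not shown. A GA would search over the `(index, thickness)` pairs to maximize a functional built from R and T.

```python
import math

def multilayer_rt(layers, n_in=1.0, n_out=1.0, wavelength=1.0):
    """Power reflectance R and transmittance T of a lossless, non-chiral
    layer stack at normal incidence, using 2x2 characteristic (chain) matrices.

    layers: list of (refractive_index, thickness) from the incidence side;
            thickness is in the same unit as wavelength.
    """
    k0 = 2 * math.pi / wavelength
    m11, m12, m21, m22 = 1, 0, 0, 1  # identity
    for n, d in layers:
        delta = k0 * n * d  # phase thickness of the layer
        c, s = math.cos(delta), math.sin(delta)
        a11, a12, a21, a22 = c, 1j * s / n, 1j * n * s, c
        # Chain this layer's matrix onto the running product.
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)
    b = m11 + m12 * n_out
    c_ = m21 + m22 * n_out
    r = (n_in * b - c_) / (n_in * b + c_)
    t = 2 * n_in / (n_in * b + c_)
    return abs(r) ** 2, (n_out / n_in) * abs(t) ** 2
```

For instance, a single quarter-wave layer of index 2 in air (`[(2.0, 0.125)]` with unit wavelength) reflects 36% and transmits 64% of the power.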
Chan-Ho PARK Byung-Soo CHOI Suk-Jin KIM Eun-Gu JUNG Dong-Ik LEE
This paper presents a new asynchronous multiplier. The original array structure is divided into two asymmetric arrays, called the upper array and the lower array. For the lower array, a left-to-right scheme is applied to achieve fast computation as well as low power consumption. Simulation results show that the proposed multiplier achieves a 40% performance improvement with relatively lower power consumption. The multiplier has been implemented in a 0.35 µm CMOS technology and shown to be functionally correct.
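The phrase "left to right" refers to the order in which partial products are accumulated. The sketch below shows that order for an unsigned shift-and-add multiply that scans the multiplier from its most significant bit down (Horner's rule). It is only a software illustration under that assumption; it does not model the asynchronous circuitry or the split into upper and lower arrays, and the function name is hypothetical.

```python
def multiply_left_to_right(a, b, width):
    """Unsigned shift-and-add multiply scanning b from its MSB down.

    At each step the accumulated result is doubled and the partial product
    a * b_i is added, so the most significant partial products are summed first.
    """
    if not (0 <= a < 2**width and 0 <= b < 2**width):
        raise ValueError("operands must fit in `width` unsigned bits")
    acc = 0
    for i in reversed(range(width)):
        bit = (b >> i) & 1
        acc = (acc << 1) + (a if bit else 0)
    return acc
```

For example, `multiply_left_to_right(13, 11, 4)` returns 143.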
The design of the analog part of a mixed analog-digital IC for a commercial wireless burglar alarm system is presented as an example of very low-power VLSI design for battery-operated systems. The main constraint is battery life, which must be at least five years with a standard camera battery. An operational amplifier, a power supply monitor, and an oscillator form the core of the design. The operational amplifier draws 1.5 µA, while the entire analog part draws 4 µA. Measurements on each individual block show compliance with the specifications, and tests in the working environment show full functionality. Although the example is application specific, the design solutions and each individual element can also be used in many other battery-operated low-frequency devices (e.g., environmental parameter monitoring).
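A first-order battery budget shows why microampere-level currents matter here. The sketch below divides capacity by average current, ignoring self-discharge, temperature, and the current drawn by the digital and RF parts of the system; the battery capacity used in the example is a hypothetical value, not a figure from the design.

```python
HOURS_PER_YEAR = 24 * 365

def battery_life_years(capacity_mah, avg_current_ua):
    """Ideal lifetime: capacity / average drain (no self-discharge or aging)."""
    hours = capacity_mah / (avg_current_ua / 1000.0)  # uA -> mA
    return hours / HOURS_PER_YEAR
```

With a hypothetical 1000 mAh cell, the 4 µA analog part alone would account for a lifetime of about 28.5 years, so the analog section consumes only a small share of a five-year budget; the remaining margin must cover the other blocks.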
Hiroshi HASEGAWA Masashi NAKAGAWA Isao YAMADA Kohichi SAKANIWA
In this paper, we propose a simple method to find the optimal rational function with a fixed denominator that minimizes an integral of the polynomially weighted squared error with respect to a given analytic function. First, we present a generalization of Walsh's theorem. Using knowledge of the zeros of the fixed denominator, this theorem characterizes the optimal rational function through a system of linear equations in the coefficients of its numerator polynomial. Moreover, when the analytic function is itself a polynomial, we show that the optimal numerator can be derived without any numerical integration or root-finding technique. Numerical examples demonstrate the practical applicability of the proposed method.
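Because the denominator q is fixed, the error is linear in the numerator coefficients, so the optimum solves a linear (normal-equation) system. The sketch below is a generic least-squares baseline that forms that system with Gauss-Legendre quadrature on the interval [-1, 1]; the interval, function names, and quadrature are assumptions for illustration. Note that the paper's point is precisely to avoid numerical integration, so this is a reference computation, not the proposed method.

```python
import numpy as np

def optimal_numerator(f, q, w, m, n_nodes=64):
    """Coefficients c_0..c_m of p(x) = sum_i c_i x**i minimizing
    integral_{-1}^{1} w(x) * (f(x) - p(x) / q(x))**2 dx for a fixed q.
    """
    x, gw = np.polynomial.legendre.leggauss(n_nodes)  # nodes and weights
    basis = np.vander(x, m + 1, increasing=True) / q(x)[:, None]  # x**i / q(x)
    wts = gw * w(x)
    gram = basis.T @ (wts[:, None] * basis)  # G_ij = <x^i/q, x^j/q>_w
    rhs = basis.T @ (wts * f(x))             # b_i  = <f, x^i/q>_w
    return np.linalg.solve(gram, rhs)
```

If f is already of the form p0/q with deg p0 ≤ m, the solve recovers p0; for example, f(x) = (1 + 2x)/(x + 3) with q(x) = x + 3 and m = 1 yields coefficients close to [1, 2].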
Arata KOIKE Satoko TAKIGAWA Kiyoka TAKEDA Akihisa KOBAYASHI Masashi MORIMOTO Konosuke KAWASHIMA
In this paper, we first investigate the characteristics of movie contents on the Internet. As in previous studies, we found that the lognormal distribution fits the file size distribution of the whole set of general movie contents well. When we focus specifically on the subset consisting of movie trailers, however, the distribution differs from the lognormal distribution; our analysis shows that it is similar to an exponential distribution. We assume here that movie trailers are among the relevant contents for Content Delivery Networks (CDN) and Peer-to-Peer (P2P) file exchange communities. We further study the relationship between playing duration and file size for movie trailers, and we find no linear correlation between them. We next consider the bandwidth required to retrieve movie trailer contents, with the objective of allowing users to view the contents in real time. Many previous studies investigate bandwidth requirements based only on the file size distribution. In this paper, we analyze the traffic design criteria for CDN or P2P by taking into account both the file size distribution and the relationship between playing duration and file size for movie trailers. Simulation studies reveal the impact on the bandwidth requirement.
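A simple way to see why duration matters as well as size: real-time viewing roughly requires the average download rate to be at least file size divided by playing duration. The sketch below applies that criterion, which is an assumption that ignores start-up buffering and rate variation. The synthetic workload draws exponential sizes (as reported for trailers) and durations independently of size (reflecting the lack of linear correlation); its mean size and duration range are hypothetical parameters.

```python
import random

def required_rate_bps(size_bytes, duration_s):
    """Average bit rate needed to download a file while it plays."""
    return 8 * size_bytes / duration_s

def fraction_real_time(trailers, link_bps):
    """Share of (size_bytes, duration_s) items viewable in real time on a link."""
    ok = sum(1 for size, dur in trailers if required_rate_bps(size, dur) <= link_bps)
    return ok / len(trailers)

def synthetic_trailers(n, mean_size_bytes, dur_range_s, seed=0):
    """Hypothetical workload: exponential sizes, durations drawn independently."""
    rng = random.Random(seed)
    return [(rng.expovariate(1 / mean_size_bytes), rng.uniform(*dur_range_s))
            for _ in range(n)]
```

Sweeping `link_bps` over `fraction_real_time(synthetic_trailers(...), link_bps)` gives a rough curve of how much bandwidth is needed to serve a target share of trailers in real time.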
Md. ALTAF-UL-AMIN Satoshi OHTAKE Hideo FUJIWARA
This paper introduces a design-for-testability (DFT) scheme for delay faults in a controller-data path circuit. The scheme uses both scan and non-scan techniques. First, the data path is transformed into a hierarchically two-pattern testable (HTPT) data path based on a non-scan approach. Then an enhanced scan (ES) chain is inserted on the control lines and the status lines. The ES chain is extended via the state register of the controller. If necessary, the data path is further modified. Then a test controller is designed and integrated into the circuit. Our approach is mostly based on the path delay fault model; however, the multiplexer (MUX) select lines and register load lines are tested as register transfer level (RTL) segments. For a given circuit, the area overhead incurred by our scheme decreases substantially as the bit-width of the data path increases. The proposed scheme supports hierarchical test generation and can achieve fault coverage similar to that of the ES approach.
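Delay faults need two-pattern tests: V1 initializes a line, V2 launches a transition, and the output is captured before a slow line can settle. The sketch below checks this for a slow-to-rise fault on one internal line of a hypothetical gate-level block. It uses the simpler transition (gross-delay) fault model rather than the path delay fault model the scheme is based on, and it ignores how V1 and V2 are delivered; delivering arbitrary pairs is what enhanced scan provides.

```python
def circuit(v, n1_override=None):
    """Hypothetical block y = (a AND b) OR c with internal line n1.

    n1_override forces n1 to a stale value to emulate a delay fault.
    Returns (y, n1).
    """
    n1 = (v["a"] & v["b"]) if n1_override is None else n1_override
    return n1 | v["c"], n1

def detects_slow_to_rise_n1(v1, v2):
    """True if (v1, v2) launches 0->1 on n1 and the late value reaches y."""
    _, n1_init = circuit(v1)        # V1 initializes the line
    good_y, n1_final = circuit(v2)  # V2 launches the transition
    if (n1_init, n1_final) != (0, 1):
        return False                # no rising transition launched on n1
    slow_y, _ = circuit(v2, n1_override=n1_init)  # n1 still at its old value
    return good_y != slow_y
```

With V1 = (a, b, c) = (0, 1, 0) and V2 = (1, 1, 0) the fault is detected; if V2 sets c = 1 instead, the OR gate masks the late transition and the pair is not a test.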
Tadahiro OCHIAI Hiroshi HATANO
Using a macromodel that calculates the floating-gate potential by combining resistances with dependent voltage and current sources, we simulate the DC transfer characteristics of multi-input neuron MOS inverters, and of those in a neuron MOS full adder circuit, both at room temperature and at 77 K. Based on the simulation results, circuit failures at low temperature are discussed. Furthermore, the optimization of circuit design parameters for both low-temperature and room-temperature operation is described.
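The quantity the macromodel computes, the floating-gate potential, is to first order a capacitively weighted average of the input voltages. The sketch below uses that ideal capacitive-divider relation together with an idealized threshold inverter; it is not the paper's resistance and dependent-source macromodel, and the parameter names and values are hypothetical. Temperature effects would enter mainly through the threshold voltage, which is simply passed in here.

```python
def floating_gate_potential(caps, volts, c0=0.0, q_f=0.0):
    """Ideal capacitive-divider estimate of a neuron MOS floating-gate voltage.

    caps:  input coupling capacitances C_i
    volts: input voltages V_i
    c0:    capacitance from the floating gate to the substrate (at 0 V)
    q_f:   net charge stored on the floating gate
    """
    c_total = c0 + sum(caps)
    return (sum(c * v for c, v in zip(caps, volts)) + q_f) / c_total

def neuron_inverter_output(caps, volts, v_th, vdd, c0=0.0):
    """Idealized inverter: output is low once the floating gate exceeds v_th."""
    return 0.0 if floating_gate_potential(caps, volts, c0) > v_th else vdd
```

For example, with coupling capacitances 1, 1, and 2 (arbitrary units) and inputs 3.3 V, 3.3 V, and 0 V, the floating gate sits at 1.65 V, so an inverter with a 1.5 V threshold switches low.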
Design patterns can be regarded as a way to encapsulate and reuse good design practices. However, most design patterns are specified using informal text and examples. To obtain the full benefits of patterns, formal specification and tool support are indispensable. This paper proposes a Design Pattern Specification Language (DPSL) that is both manageable and effective. The DPSL enables software developers to treat design patterns as concrete design units without lowering the level of abstraction. To demonstrate the usability of our DPSL and its application in design modeling, we have developed a prototype tool that supports the DPSL in UML diagrams. This prototype allows us to demonstrate the possibilities for tool support and the usability of patterns in software development.
Hideyuki ITO Ryusuke KONISHI Hiroshi NAKADA Kiyoshi OGURI Minoru INAMORI Akira NAGOYA
This paper describes the realization of a dynamically reconfigurable logic LSI based on a novel parallel computer architecture. The key point of the architecture is its dual-structured cell array, which enables dynamic and autonomous reconfiguration of the logic circuits. The LSI was completed by successfully introducing two specific features: fully asynchronous logic circuits and a homogeneous structure in which only LUTs are used.
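In a homogeneous, LUT-only fabric every logic cell is the same programmable truth table, and reconfiguring a cell means rewriting its table. The sketch below shows a generic k-input LUT in software as background; it does not model the dual-structured cell array or the asynchronous circuitry of the LSI, and the function names are hypothetical.

```python
def make_lut(k, func):
    """Build a k-input LUT: a 2**k-entry truth table of func's outputs."""
    table = []
    for addr in range(2 ** k):
        bits = [(addr >> i) & 1 for i in range(k)]  # bit i drives input i
        table.append(func(*bits) & 1)
    return table

def lut_eval(table, *inputs):
    """Evaluate a LUT by using the input bits as a table address."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]
```

For example, `make_lut(3, lambda a, b, c: a ^ b ^ c)` yields a 3-input parity cell; replacing the table with that of `a & b & c` reconfigures the same cell as an AND gate.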
Kazuya TANIGAWA Tetsuo HIRONAKA Akira KOJIMA Noriyoshi YOSHIDA
Reconfigurable architectures have attracted attention for their potential to achieve high performance by reconfiguring special-purpose circuits for a target application, and for the flexibility that their reconfigurability provides. We aim to use a reconfigurable architecture as a general-purpose computer by extending these advantageous properties. Achieving this goal requires a generalized execution model for reconfigurable architectures, so we have proposed the Ideal PARallel Structure (I-PARS) execution model. Programs based on the I-PARS execution model are not restricted by the hardware structure of any specific reconfigurable processor, which makes software easier to develop. Further, we have proposed the PARS architecture, which executes programs based on the I-PARS execution model effectively. The PARS architecture has a large reconfigurable part for highly parallel execution, which exploits the parallelism described in the I-PARS execution model. To use the reconfigurable part effectively, the PARS architecture can reconfigure and execute operations simultaneously in one cycle. The PARS architecture also supports branch operations, which introduce control flow into execution and make it possible to skip computations that do not produce a valid result. In this paper, we present the detailed structure of a prototype processor implemented based on the PARS architecture. The implementation uses 420,377 CMOS transistors, only 3.8% of the number of transistors used in the logic circuits of the UltraSPARC-III. We also evaluated the performance of the prototype processor using several benchmark programs. The evaluation results show that the prototype processor achieves nearly the same performance as a 750 MHz UltraSPARC-III while being implemented with far fewer transistors.
Hisako SATO Mariko OHTSUKA Kazuya MAKABE Yuichi KONDO Kazumasa YANAGISAWA Peter M. LEE
This paper presents an efficient application of hot-carrier reliability simulation to the delay libraries of 0.18 µm and 0.14 µm gate-length logic products. Using an analysis of simple primitive inverter cells, a design rule restricting signal rise time was developed, and the delay libraries of actual products were screened to check whether the rise time restrictions were met. At 200 MHz, the maximum rise time (0-100%), t_rise,MAX, was 0.8 ns (17% of duty) under Δtd/td = 5%. For an 800,000-net product, only 25 simulations (each taking less than one minute of CPU time) were needed for the internal devices, with screening performed for this logic process. Thirty nets were caught, but they were judged reliable because of their reduced duty.
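The screening step amounts to comparing each net's rise time against the design-rule limit and passing only the violators on for closer review. The minimal sketch below uses the 0.8 ns limit quoted above for 200 MHz as its default; the net names and rise times in the example are hypothetical, and real flows would read them from timing reports.

```python
def screen_nets(nets, t_rise_max_ns=0.8):
    """Return names of nets whose 0-100% rise time exceeds the limit.

    nets: iterable of (name, rise_time_ns) pairs.
    """
    return [name for name, t_rise in nets if t_rise > t_rise_max_ns]
```

For example, `screen_nets([("n1", 0.5), ("n2", 0.95), ("n3", 0.8)])` flags only `"n2"`; flagged nets would then be judged individually, for instance by their actual switching duty, as described above.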
Takuya OKAMOTO Takafumi YUASA Tomonori IZUMI Takao ONOYE Yukihiro NAKAMURA
A configurable device, "PCA-Chip2", implements the concept of Plastic Cell Architecture, an extension of programmable logic devices. This paper presents basic design tools for PCA-Chip2 as the first step toward a complete design environment. Given a C description of a target function, the tools automatically generate configuration data for PCA-Chip2. Trial designs produced with the tools are also presented to demonstrate the practicality of the proposed approach.
An accurate and fast delay calculation method suitable for high-performance, low-power LSI design is proposed. The delay calculation consists of two steps: (1) the gate delay is calculated using an effective capacitance obtained from a simple model we propose; and (2) the interconnect delay is calculated from the same effective capacitance and then corrected using the gate-output transition time. The proposed delay calculation halves the error of a conventional rough calculation, achieving an error within 10% per gate stage. The mathematical models are simple enough to make the method suitable for quick delay calculation and logic circuit optimization in the early stages of LSI design. A delay optimization tool using this method reduced the worst path delay of a multiply-add module by 11.2% and decreased the sizes of 58.1% of the gates.
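The abstract does not give the proposed effective-capacitance model, so as background the sketch below computes the classical Elmore delay of an RC ladder, a standard first-order interconnect estimate, swapped in here for illustration. It also shows why effective-capacitance methods exist: the driver of an RC line "sees" the total downstream capacitance `sum(C)` only if resistive shielding is ignored, whereas an effective capacitance is typically smaller. Function and parameter names are hypothetical.

```python
def elmore_delays(resistances, capacitances):
    """Elmore delay at each node of an RC ladder driven at its input.

    Segment i is a series resistance R_i followed by a grounded capacitance C_i.
    Delay to node k = sum_{i<=k} R_i * (total capacitance downstream of R_i).
    """
    delays, acc = [], 0.0
    for i, r in enumerate(resistances):
        downstream_c = sum(capacitances[i:])
        acc += r * downstream_c
        delays.append(acc)
    return delays
```

For two identical segments with R = C = 1 (arbitrary units), the node delays are 2 and 3; a driver's output resistance can be modeled by prepending it as the first segment with zero capacitance.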
Akira YAMADA Yasuhiro NUNOMURA Hiroaki SUZUKI Hisakazu SATO Niichi ITOH Tetsuya KAGEMOTO Hironobu ITO Takashi KURAFUJI Nobuharu YOSHIOKA Jingo NAKANISHI Hiromi NOTANI Rei AKIYAMA Atsushi IWABU Tadao YAMANAKA Hidehiro TAKATA Takeshi SHIBAGAKI Takahiko ARAKAWA Hiroshi MAKINO Osamu TOMISAWA Shuhei IWADE
A high-speed 32-bit RISC microcontroller has been developed. To realize high-speed operation with minimal hardware resources, we have developed new design and analysis methods for clock distribution, bus-line layout, and IR-drop analysis. As a result, 400 MHz operation has been achieved with a power dissipation of 0.96 W at 1.8 V.
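The reported figures imply a few derived quantities that help put the IR-drop work in context. The short calculation below uses only the numbers in the abstract: average supply current I = P/V and energy per clock cycle E = P/f. Peak currents, which dominate worst-case IR drop, are higher than this average and are not given.

```python
power_w = 0.96   # reported power dissipation
vdd_v = 1.8      # supply voltage
f_hz = 400e6     # clock frequency

avg_current_a = power_w / vdd_v          # I = P / V, about 0.53 A
energy_per_cycle_j = power_w / f_hz      # E = P / f, 2.4 nJ per cycle
ir_drop_per_mohm_v = avg_current_a * 1e-3  # V = I * R for 1 milliohm of grid
```

In other words, each milliohm of effective power-grid resistance costs roughly 0.53 mV of supply at the average current, which is why IR-drop analysis matters at a 1.8 V supply.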
Hidefumi KUROKAWA Hiroyuki IKEGAMI Motohide OTSUBO Kiyoshi ASAO Kazuhisa KIRIGAYA Katsuya MISU Satoshi TAKAHASHI Tetsuji KAWATSU Kouji NITTA Hiroshi RYU Kazutoshi WAKABAYASHI Minoru TOMOBE Wataru TAKAHASHI Akira MUKOUYAMA Takashi TAKENAKA
This paper describes the effects of C-language-based behavioral synthesis on system LSI design, based on several trials aimed at reducing the design period and improving quality for a variety of circuit types. The results of these trials are analyzed from the viewpoints of description productivity, verification productivity, reusability, and design flexibility, as well as hardware/software co-verification. First, the C-based design flow proposed by the authors is described, and its design productivity and verification productivity are compared with those of RTL design. The reusability of the behavioral IP core and its efficiency with HW/SW co-verification are also shown using design examples. Next, using the example of an MPEG-4 video decoder design, a typical design process in C-based design is shown, with considerations regarding verification efficiency, reusability of the IP core, and HW/SW co-verification. Finally, the authors' perspectives on future directions of system LSI design are discussed.