Shinji KIMURA Tsutomu IGAKI Hiromasa HANEDA
The paper describes a parallel algorithm for the manipulation of binary decision diagrams on a shared memory multi-processor system. Binary decision diagrams are very efficient representations of logic functions, and are widely used in computer aided design of logic circuits. Logic operations on logic functions such as AND and OR are reduced to operations on binary decision diagrams representing these functions. Operations on binary decision diagrams are time-consuming in some cases, and a fast manipulation method is needed. As with the manipulation, we focus on the construction of a binary decision diagram from a logic formula, and devised a parallel algorithm for the construction. In the construction, there are many logic operations to be processed, and some of them can be processed in parallel. At first, we introduce an extraction method and a parallel-execution method for such parallelizable operations. This is the parallel execution method for an operation sequence (or a set of operations). To extract more parallelism, we introduce a dynamic expansion method of a logic operation. The dynamic expansion is a method to obtain sub-operations from a logic operation using the modified Shannon's expansion. These sub-operations are executed in parallel and the results of these sub-operations are merged to obtain the result of the original operation. Our parallel algorithm, which is based on the construction of shared binary decision diagrams with the negative edge and the operation cache, is implemented in C on a shared memory multi-processor system Sequent S-81 (CPU 80386 (16 MHz)28, 86.75MB), and applied to multiplier examples and ISCAS benchmarks. The speed-up ratio becomes 14 for multipliers, and becomes 11 for c1908 in ISCAS benchmarks.
Katsuhiko SHIMABUKURO Michitaka KAMEYAMA Tatsuo HIGUCHI
It is well known that the multiple-valued signed-digit (SD) arithmetic circuits have the attractive features of compactness and high-speed operation. However, both of these features have yet to be utilized fully. In this paper, we consider the application of a parallel-structure-based VLSI processor. A high-performance parallel-structure-based multiple-valued VLSI processor using the radix-2 SD number system is proposed. Its compactness makes the parallelism high under chip size limitations in comparison with the ordinary binary arithmetic circuits. Moreover, the speed of the single arithmetic module is very high in the SD arithmetic circuits, so that we can take advantage of the high-speed operation in the parallel-structure-based VLSI processor chip. The multiple-valued bidirectional current-mode technology is used not only in high-speed small sized arithmetic circuits, but also in reducing the number of connections in the parallel-structure-based VLSI processor. The proposed processor is specially developed for real-time digital control, where the performance is evaluated by delay time. Performance estimation using SPICE simulators shows that the delay time of proposed processor for matrix operations such as matrix multiplication is greatly reduced in comparison with a conventional binary processor.
This paper proposes the GDV system, which provides a format-independent interface with which to access documents in various formats. It also proposes a new approach for document representation to be used in the GDV system. In this approach, a document is represented by a chain of objects, each of which belongs to a certain class and transforms access operations according to the class-specific transformation rule. A user's request is interpreted as a request to the uppermost object of the chain, transformed by objects in the chain successively, and executed by the lowermost object in the chain. The initial state of a document is an object chain containing an unidentified object. As the unidentified object identifies and divides itself, classification (and chain generation) proceeds step by step.
This paper describes the integration of a qualitative method and a quantitative method by Bottleneck Diagnosis/Improvement Expert Systems for Synchronized queueing network (BDES-S and BIES-S). On the basis of qualitative reasoning, BDES-S can carry out parameter tuning in order to diagnose and improve bottlenecks of synchronized queueing networks. BDES-S can produce several alternative qualitative improvement plans for one bottleneck server. BIES-S can produce quantitative improvement equations for each qualitative improvement plan. Our method using BDES-S and BIES-S can integrate both quantitative and qualitative methods for parameter tuning on complicated queueing synchronized networks.
Hidetaka HIGASHINO Kentaro SETSUNE Kiyotaka WASA
Experimental results on the superconducting three-terminal devices Using Bi-system High-Tc Superconductors were reported. The VCJJ (Variabel critical-current-type Josephson junction devices) using the thermal effect (VCJJ) and a dual gate Josephson device of a new current-injection type are described. The basic technology and problems for high-Tc three-terminal devices are briefly discussed.
Koichi HAYASHI Mitsuru KOMATSU Masakatsu NISHIGAKI Hideki ASAI
This letter describes the waveform relaxation algorithm with the dynamic circuit partitioning technique based on the operation point of bipolar devices. Finally, we verify its availability for the simulation of the digital bipolar transistor circuit.
This paper proposes a model for learning non-parametric densities using finite-dimensional parametric densities by applying Yamanishi's stochastic analogue of Valiant's probably approximately correct learning model to density estimation. The goal of our learning model is to find, with high probability, a good parametric approximation of the non-parametric target density with sample size and computation time polynomial in parameters of interest. We use a learning algorithm based on the minimum description length (MDL) principle and derive a new general upper bound on the rate of convergence of the MDL estimator to a true non-parametric density. On the basis of this result, we demonstrate polynomial-sample-size learnability of classes of non-parametric densities (defined under some smoothness conditions) in terms of exponential families with polynomial bases, and we prove that under some appropriate conditions, the sample complexity of learning them is bounded as O((1/ε)(2r1)/2r1n(2r1)/2r(1/ε)(1/ε)1n(1/δ) for a smoothness parameter r (a positive integer), where ε and δ are respectively accuracy and confidence parameters. Futher, we demonstrate polynomial-time learnability of classes of non-parametric densities (defined under some smoothness conditions) in terms of histogram densities with equal-length cells, and we prove that under some appropriate condition, the sample complexity of learning them is bounded as O((1/ε)3/21n3/2(1/ε)(1/ε)1n(1/δ)).
Takeo YAMASHITA Satoshi HASAKA Iwao NATORI Tadahiro OHMI
The two most important parameters in reactive ion etching process, ion bombardment energy and flux, were extracted through a simple RF waveform measurement at the excitation electrode in a conventional cathode-coupled plasma RIE system. By using the extracted plasma parameters, damage and contamination in Si substrates induced by reactive ion etching in a SiCl4 plasma were investigated. A very convenient map representation of ion energy and ion flux was introduced in understanding the etching process occurring in the RIE system.
Kiyoshi NISHIKAWA Russell M. MERSEREAU
We present a successful method for designing 2-D circularly symmetric R lowpass filters with constant group delay. The procedure is based on a transformation of a 1-D prototype R filter with constant group delay, whose magnitude response is the 2-D cross-sectional response. The 2-D filter transfer function has a separable denominator and a numerator which is obtained from the prototype numerator by means of a series of McClellan transformations whose free parameters can be optimized by successive procedure. The method is illustrated by an example.
Kazuhiro MOTEGI Shigeyoshi WATANABE
For the development of a practical device simulation, it is necessary to solve the large sparse linear equations with a high speed computation of direct solution method. The use of parallel computation methods to solve the linear equations can reduce the CPU time greatly. The Multi Step Diakoptics (MSD) algorithm, is proposed as one of these parallel computation methods with direct solution, which is based on Diakoptics, that is, a tearing-based parallel computation method for sparse linear equations. We have applied the MSD algorithm to device simulation. This letter describes the partition and connection schedules in the MSD algorithm. The evaluation of this algorithm is done using a massively parallel computer with distributed memory (AP1000).
The motion of particles in low-pressure chemical vapor deposition (LPCVD) (0.4 Torr) equipment has been investigated by a numerical simulation. The effects of wafer orientation, electrostatic forces, and thermophoresis were evaluated. Horizontal surface-down processing and vertical processing can reduce particulate contamination remarkably compared with horizontal surface-up processing. Static electricity control is essential. Weakly charged wafers (several V to several 10 V) can significantly increase submicron particle deposition. In the absence of electrical forces, thermophoresis prevents deposition of particles in the size range 0.03 µmDp0.6 µm, when the temperature difference between the wafer surface and the gas inlet temperature exceeds 100. Deposition of particles smaller than 0.03 µm still occurs by diffusion.
As part of Hitachi's development of clean semiconductor processing equipment, the Fluids Modeling Group of the Mechanical Engineering Research Laboratory is developing a computer model, CONTAMINATE, for simulating contamination of wafers in chemical vapor deposition (CVD) systems. CONTAMINATE is based on a 2D implementation of the SIMPLER algorithm for simulating convection/diffusion transport processes. The new model includes modules for simulating fluid flow, heat transfer, chemical reactions, and gas-phase formation and deposition of clusters and particles. CONTAMINATE outputs property fields and estimates of various film quality indices. Using CONTAMINATE we simulated a SiH4: O2: N2 gas mixture at 300 K flowing over a wafer heated to 700 K. System pressures were varied from 1-100 torr and SiH4 pressures from 0.1 to 10 torr. Deposition characteristics are in qualitative agreement with actual systems and are summarized as follows: (1) No particles larger than 0.1µm deposited for any of the conditions tested. (2) Film damage occurred above 10 torr, but no damage occurred below 10 torr. (3) Increasing SiH4 pressure at constant system pressure eliminated particle deposition because particles grew large enought that thermophoresis blocked particle diffusion. (4) Conformal deposition of featured surfaces was achieved only at 1 torr. (5) Film thickness variation over the diameter of the wafer was 15% at 100 torr, 3% at 10 torr, and 1% at 1 torr.
Masayuki KAWAMATA Takehiko KAGOSHIMA Tatsuo HIGUCHI
This paper proposes an efficient design method of three-dimensional (3-D) recursive digital filters for video signal processing via decomposition of magnitude specifications. A given magnitude specification of a 3-D digital filter is decomposed into specifications of 1-D digital filters with three different (horizontal, vertical, and temporal) directions. This decomposition can reduce design problems of 3-D digital filters to design problems of 1-D digital filters, which can be designed with ease by conventional methods. Consequently, design of 3-D digital filters can be efficiently performed without complicated tests for stability and large amount of computations. In order to process video signal in real time, the 1-D digital filters with temporal direction must be causal, which is not the case in horizontal and vertical directions. Since the proposed method can approximate negative magnitude specifications obtained by the decomposition with causal 1-D R filters, the 1-D digital filters with temporal direction can be causal. Therefore the 3-D digital filters designed by the proposed method is suitable for real time video signal processing. The designed 3-D digital filters have a parallel separable structure having high parallelism, regularity and modularity, and thus is suitable for high-speed VLSI implementation.
Spoken language systems such as speech-to-speech dialog translation systems have been gaining more attention in recent years. These systems require full integration of speech recognition and natural language understanding. This paper presents an efficient parsing algorithm that integrates the search problems of speech processing and language processing. The parsing algorithm we propose here is regarded as an extension of the finite-state-network directed, one-pass search algorithm to one directed by a context-free grammar with retention of the time-synchronous procedure. The extended search algorithm is used to find approximately globally optimal sentence hypotheses; it does not have overhead which exists in, for example, hierarchical systems based on the lattice parsing approach. The computational complexity of this search algorithm is proportional to the length of the input speech. As the search process in the speech recognition can directly take account of the predictive information in the sentence parsing, this framework can be extended to sopken language systems which deal with dynamically varying constraints in dialogue situations.
We investigate the relationship between two different notions of reducibility among prediction (learning) problems within the distribution-free learning model of Valiant (PAC learning model). The notions of reducibility we consider are the analogues for prediction problems of the many-one reducibility and of the Turing reducibility. The former is the notion of prediction preserving reducibility developed by Pitt and Warmuth, and its generalization. Concerning these two notions of reducibility, we show that there exist a pair of prediction problems A and B, whose membership problems are polynomial time solvable, such that A is reducible to B with respect to the Turing reducibility, but not with respect to the prediction preserving reducibility. We show this result by making use of the notion of a class of polynomially sparse variants of a concept representation class. We first show that any class A of polynomially sparse variants of another class B is reducible to B with respect to the Turing reducibility'. We then prove the existence of a prediction problem R and a class R of polynomially sparse variants of R, such that R does not reduce to R with respect to the prediction preserving reducibility.
Yasuhisa HAYASHI Satoshi KONDO Nobuyuki TAKASU Akio OGIHARA Shojiro YONEDA
This study proposes a new training method for hidden Markov model with separate vector quantization (SVQ-HMM) in speech recognition. The proposed method uses the correlation of two different kinds of features: cepstrum and delta-cepstrum. The correlation is used to decrease the number of reestimation for two features thus the total computation time for training models decreases. The proposed method is applied to Japanese language isolated dgit recognition.
Hideki SAKAUCHI Yasuyo OKANOUE Satoshi HASEGAWA
This paper proposes design schemes which obtain an efficient spare-channel assignment against single and double link failures for a self-healing network. Spare-channel design problems can be formulated as a linear-programming (LP) problem when variables are assumed to be continuous. For the problem, the proposed algorithm effectively solves a sub-set of whole constraints by making use of a maximum-flow algorithm in an iterative manner. It is shown that the maximum number of iteration times is limited by the number of links in the network. Moreover, the relation between the design function and the self-healing function is discussed. It is also shown that the cooperation of the two functions can realize more effective control in large scale networks.
Tomoko SAWABE Tetsurou FUJII Hiroshi NAKADA Naohisa OHTA Sadayasu ONO
This paper describes a super high definition (SHD) image processing system we have developed. The computing engine of this system is a parallel processing system with 128 processing elements called NOVI- HiPIPE. A new pipelined vector processor is introduced as a backend processor of each processing element in order to meet the great computing power required by SHD image processing. This pipelined vector processor can achieve 120 MFLOPS. The 128 pipelined vector processors installed in NOVI- HiPIPE yield a total system peak performance of 15 GFLOPS. The SHD image processing system consists of an SHD image scanner, and SHD image storage node, a full color printer, a film recorder, NOVI- HiPIPE, and a Super Frame Memory. The Super Frame Memory can display a ful color moving image sequence at a rate of 60 fps on a CRT monitor at a resolution of 2048 by 2048 pixels. Workstations, interconnected through an Ethernet, are used to control these units, and SHD image data can be easily transfered among the units. NOVI- HiPIPE has a frame memory which can display SHD still images on a color monitor, therefore, one processed frame can be directly displayed. We are developing SHD image processing algorithms and parallel processing methodologies using this system.
Yuichi KAJI Ryuichi NAKANISHI Hiroyuki SEKI Tadao KASAMI
Parallel multiple context-free grammars (pmcfg's) and multiple context-free grammars (mcfg's) were introduced as extensions of context-free grammars to describe the syntax of natural languages. Pmcfg's and mcfg's deal with tuples of strings, and it has been shown that the universal recognition problem for mcfg's is EXP-POLY time-complete where the universal recognition problem is the problem to decide whether G generates w for a given grammar G and string w. In this paper, the universal recognition problems for the class of pmcfg's and for the subclass of pmcfg's with the information-lossless condition are shown to be EXP-POLY time-complete and PSPACE-complete, respectively. It is also shown that the problems for pmcfg's and for mcfg's with a bounded dimension are both -complete and those for pmcfg's and for mcfg's with a bounded degree are both -complete. As a corollary, the problem for modified head grammars introduced by Vijay-Shanker, et al. to define the syntax of natural languages is shown to be in deterministic polynomial time.
The effects of changing system parameters on job scheduling policies are studied for load balancing of multi-class jobs in a distributed computer system that consists of heterogeneous host computers connected by a single-channel communications network. A job scheduling policy decides which host should process the arriving jobs. We consider two job scheduling policies. The one is the overall optimal policy whereby jobs are scheduled so as to minimize the overall mean job response time. Tantawi and Towsley obtained the algorithm that gives the solution of the policy in the single class job environment and Kim and Kameda extended it to the multiple job class environment. The other is the individually optimal policy whereby jobs are scheduled so that every job may feel that its own expected response time is minimized. We can consider three important system parameters in a distributed computer system: the communication time of the network, the processing capacity of each node, and the job arrival rate of each node. We examine the effects of these three parameters on the two load balancing policies by numerical experiment.