Seiji FUJINO Ryutaro HIMENO Akira KOJIMA Kazuo TERADA
We describe the implementation of an iterative method with the goal of gaining a long vector length. The strategy for vectorization by means of multipoint stencils used for discretization of the partial differential equations is discussed. Numerical experiments show that the strategy that requires certain restrictions on the number of grid points in the x and y directions improves the performance on the vector supercomputer.
Kiyohiro FURUTANI Tsukasa OOISHI Mikio ASAKURA Hideto HIDAKA Hideyuki OZAKI Michihiro YAMADA
This paper proposes a new test mode circuit that enables massively parallel testing of DRAMs with a standard LSI tester at little chip area penalty. It enhances test throughput beyond what the conventional multi-bit test mode can achieve. A new redundancy circuit that detects and repairs short-circuit failures in the memory cell array is also proposed. It greatly improves the yield of super low power 256 Mbit DRAMs.
InHwan KIM Takayuki NAKACHI Nozomu HAMADA
In the adaptive lattice estimation process, it is well known that the convergence speed of each successive stage is affected by the estimation errors of the reflection coefficients in its preceding stages. In this paper, we propose block estimation methods for the two-dimensional (2-D) adaptive lattice filter. The convergence speed of the proposed algorithm is significantly enhanced by improving the adaptive performance of the preceding stages. Furthermore, this process can be simply realized. The modeling of a 2-D AR field and of a texture image is demonstrated through computer simulations.
Shietung PENG Igor SEDUKHIN Stanislav SEDUKHIN
In this paper, the design of systolic array processors for computing the 2-dimensional Discrete Fourier Transform (2-D DFT) is considered. We investigate three different computational schemes for designing systolic array processors using a systematic approach. The systematic approach is guaranteed to find optimal systolic array processors from a large solution space in terms of the number of processing elements and I/O channels, the processing time, topology, pipeline period, etc. The optimal systolic array processors are scalable, modular and suitable for VLSI implementation. An application of the designed systolic array processors to the prime-factor DFT is also presented.
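For reference, the 2-D DFT itself can be written with the standard row-column decomposition: 1-D DFTs along every row, then along every column. The sketch below is only a plain software statement of the transform, not any of the three computational schemes or the systolic mapping studied in the paper; the names `dft_1d` and `dft_2d` are hypothetical.

```python
import cmath


def dft_1d(x):
    """Direct O(n^2) 1-D DFT of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


def dft_2d(a):
    """2-D DFT by the row-column method: transform rows, then columns."""
    # Step 1: 1-D DFT of every row.
    rows = [dft_1d(row) for row in a]
    # Step 2: 1-D DFT of every column of the intermediate result.
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed as result[row][column].
    return [list(r) for r in zip(*cols)]
```

Calling `dft_2d([[1, 2], [3, 4]])` returns a 2-by-2 list of complex values whose `[0][0]` entry is the sum of all inputs.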
Kawori TAKAKUBO Hajime TAKAKUBO Shigetaka TAKAGI Nobuo FUJII
The analog inverter is one of the most useful building blocks in analog circuits. This paper proposes an analog inverter consisting of a p-channel MOS (PMOS) inverter and an n-channel MOS (NMOS) inverter, and presents an application to all-pass filter realizations. The proposed circuit achieves a wide dynamic range by combining the PMOS and NMOS inverters. When the proposed analog inverter is applied to an all-pass filter, the circuit configuration becomes simpler, occupies less chip area, and consumes less power.
Takeshi FUKUDA Yasuhiko MORIMOTO Shinichi MORISHITA Takeshi TOKUYAMA
In this paper, we investigate inverse problems of the interval query problem in application to data mining. Let $\mathcal{I}$ be the set of all intervals on $U = \{1, 2, \ldots, n\}$. Consider an objective function $f(I)$ and conditional functions $u_i(I)$ on $\mathcal{I}$, and define an optimization problem of finding the interval $I$ maximizing $f(I)$ subject to $u_i(I) > K_i$ for given real numbers $K_i$ ($i = 1, 2, \ldots, h$). We propose efficient algorithms to solve the above optimization problem if the objective function is either additive or quotient, and the conditional functions are additive, where a function $f$ is additive if $f(I) = \sum_{i \in I} \hat{f}(i)$, extending a function $\hat{f}$ on $U$, and quotient if it is represented as a quotient of two additive functions. We use computational-geometric methods such as convex hull, range searching, and multidimensional divide-and-conquer.
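As a minimal illustration of the simplest case only (an additive objective with no conditional functions), the maximizing interval can be found with the classical linear-time Kadane scan. This is a textbook technique swapped in for illustration, not the computational-geometric algorithms of the paper; the function name `best_interval` and the sample weights are hypothetical.

```python
def best_interval(weights):
    """Return (best_sum, start, end) for the non-empty interval [start, end]
    (1-based, inclusive) that maximizes the sum of weights over the interval.

    weights[i - 1] plays the role of the per-point value of an additive
    objective on U = {1, ..., n}.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    best_sum = cur_sum = weights[0]
    best_start = best_end = cur_start = 1
    for i in range(2, len(weights) + 1):
        w = weights[i - 1]
        if cur_sum < 0:
            # A negative running sum can never help: restart at point i.
            cur_sum = w
            cur_start = i
        else:
            cur_sum += w
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i
    return best_sum, best_start, best_end


if __name__ == "__main__":
    print(best_interval([2, -3, 4, -1, 2, -5, 3]))  # prints (5, 3, 5)
```

Handling the constraints $u_i(I) > K_i$ or a quotient objective needs more than this scan, which is where the methods named in the abstract come in.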
Saed SAMADI Akinori NISHIHARA Nobuo FUJII
It is shown that two-dimensional linear phase FIR digital filters with various shapes of frequency response can be designed and realized as modular array structures free of multiplier coefficients. The design can be performed by judicious selection of two low order linear phase transfer functions to be used at each module as kernel filters. Regular interconnection of the modules in L rows and K columns conditioned with boundary coefficients 1, 0 and 1/2 results in higher order digital filters. The kernels should be chosen appropriately to, first, generate the desired shape of frequency response characteristic and, second, lend themselves to multiplierless realization. When these two requirements are satisfied, the frequency response can be refined to possess narrower transition bands by adding rows and columns. General properties of the frequency response of the array are investigated, resulting in theorems that serve as valuable tools for appropriate selection of the kernels. Several design examples are given. The array structures enjoy several favorable features. Specifically, their regularity and lack of multiplier coefficients make them suitable for high-speed systolic VLSI implementation. Computational complexity of the structure is also studied.
Masahisa SHIMIZU Yasuhiro OUE Kazumasa OHNISHI Toru KITAMURA
Because a massively parallel computer processes vast amounts of data and generates many access requests from multiple processors simultaneously, parallel secondary storage requires large capacity and high concurrency. One effective method of implementation of such secondary storage is to use disk arrays which have multiple disks connected in parallel. In this paper, we propose a parallel file access method named DECODE (dynamic express changing of data entry) in which load balancing of each disk is achieved by dynamic determination of the write data position. For resolution of the problem of data fragmentation which is caused by the relocation of data during a write process, the concept of "Equivalent Area" is introduced. We have performed a preliminary performance evaluation using software simulation under various access statuses by changing the access pattern, access size and stripe size and confirmed the effectiveness of load balancing with this method.
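The idea of choosing the write position dynamically for load balancing can be sketched as a greedy placement that sends each block to the disk with the shortest pending queue. This is an assumed, simplified policy for illustration only; it is not the DECODE algorithm and does not model the "Equivalent Area" concept. The function name `place_writes` and its parameters are hypothetical.

```python
def place_writes(queue_lengths, num_blocks):
    """Greedily assign num_blocks write blocks to disks.

    queue_lengths: pending request count per disk (assumed load metric).
    Returns (placement, loads): the disk index chosen for each block and
    the resulting per-disk load. Ties go to the lowest disk index.
    """
    loads = list(queue_lengths)
    placement = []
    for _ in range(num_blocks):
        # Pick the currently least-loaded disk.
        disk = min(range(len(loads)), key=loads.__getitem__)
        loads[disk] += 1
        placement.append(disk)
    return placement, loads


if __name__ == "__main__":
    print(place_writes([3, 0, 1], 4))  # prints ([1, 1, 2, 1], [3, 3, 2])
```

A fixed striping scheme would instead map block numbers to disks statically, regardless of the current queue lengths.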
Satoshi OKUDE Tetsuya SAKAI Masaaki SUDOH Akira WADA Ryozo YAMAUCHI
A novel technique is proposed to fabricate a chirped fiber Bragg grating utilizing thermal diffusion of core dopant. The chirped grating is written with a uniform period by using a UV exposure technique in a fiber whose effective index of the guided mode varies along its length. Thermal diffusion of the core dopant is employed to realize this change of the effective index. Through the thermal diffusion process, the effective index of the fiber decreases from its initial value. When the grating is written in the diffused core region, its reflection wavelength becomes shorter than that in the non-diffused region. A continuous change of effective index is required for making a chirped grating, so the fiber is heated by a non-uniform heat source. When a uniform grating is written in this region, the reflection wavelength changes smoothly along the fiber length although the grating period is constant. By optimizing the fiber parameters to realize a highly chirped grating, we have obtained a typical one whose bandwidth is 14.1 nm at half maximum and whose maximum rejection in transmission is 29 dB. Additionally, the proposed method has the advantage of controlling the chirp profile with high mechanical reliability.
Mitsuru KAWAMOTO Kiyotoshi MATSUOKA Masahiro OYA
This paper proposes a new method for recovering the original signals from their linear mixtures observed by the same number of sensors. It is performed by identifying the linear transform from the sources to the sensors, using only the sensor signals. The only assumption on the source signals is essentially that they are statistically mutually independent. In order to perform the 'blind' identification, time-correlation information in the observed signals is utilized. The most important feature of the method is that the full information of the available time-correlation data (second-order statistics) is evaluated, in contrast to the conventional methods. To this end, an information-theoretic cost function is introduced, and the unknown linear transform is found by minimizing it. The proposed method gives a more stable solution than the conventional methods.
Nobuo SHIGA Kenji OTOBE Nobuhiro KUWATA Ken-ichiro MATSUZAKI Shigeru NAKAJIMA
The application of pulse-doped GaAs MESFET's to a power amplifier module is discussed in this paper. The epitaxial layer structure was redesigned to have a dual pulse-doped structure for power applications, achieving a sufficient gate-drain breakdown voltage with excellent linearity. The measured load-pull characteristics of the redesigned device for the minimum power consumption design are presented. This device was shown to have almost twice the power-added efficiency of a conventional ion-implanted GaAs MESFET. Two kinds of power amplifiers were designed and fabricated, achieving Pout of 28.6 dBm at IM3 of -40 dBc with Pdc of 8 W and Pout of 33.0 dBm at IM3 of -40 dBc with Pdc of 32 W, respectively.
Shigeki SAKAGUCHI Shin-ichi TODOROKI
We propose low Rayleigh scattering Na2O-MgO-SiO2 (NMS) glass as a candidate material for low-loss optical fibers. This glass exhibits Rayleigh scattering which is only 0.4 times that of silica glass, and a theoretical evaluation suggests that it is dominated by density fluctuation. An investigation of the optical properties of NMS glass reveals that a minimum loss of 0.06 dB/km is expected at a wavelength of 1.6 µm and that the zero-material dispersion wavelength is found in the 1.5 µm band. To establish the waveguide structure, we evaluated the feasibility of using F-doped NMS (NMS-F) glass as a cladding layer for an NMS core and found that it is suitable because it exhibits low relative scattering (e.g. 0.7) and is versatile in terms of viscosity matching. We also describe an attempt to draw optical fibers using the double crucible technique.
The first optimizing compiler was developed at IBM in order to prove that high level language programming could be as efficient as hand-coded machine language. Computer architecture and compiler optimization interacted through a feedback loop, from the high-level language computer architectures of the 1970s to the RISC machines of the 1980s. In the supercomputing community, the availability of effective vectorizing compilers delivered easy-to-use performance in the 1980s to the present. These compilers were successful at least in part because they could predict poor performance spots in the program and report these to users. This fostered a feedback loop between programmers and compilers to develop high performance programs. Future optimizing compilers for high performance computers and supercomputers will have to take advantage of both feedback loops.
This paper reviews analog LSI design issues for optical transmission applications, covering ultra-high-speed transmission over 10 Gb/s, multi-Gb/s systems, optical interconnection systems, and optical access. For future system development, further advancements are required not only in optical device technology but also in LSI technology. Increasingly sophisticated circuit design techniques are needed to lower power and operating voltage, increase integration, and eliminate external elements and adjustments.
Rene PERALTA Masahiro MAMBO Eiji OKAMOTO
We describe our implementation of the Hypercube variation of the Multiple Polynomial Quadratic Sieve (HMPQS) integer factorization algorithm on a Parsytec GC computer with 128 processors. HMPQS is a variation on the Quadratic Sieve (QS) algorithm which inspects many quadratic polynomials looking for quadratic residues with small prime factors. The polynomials are organized as the nodes of an n-dimensional cube. We report on the performance of our implementations on factoring several large numbers for the Cunningham Project.
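The core test in any Quadratic Sieve variant, checking whether a polynomial value factors completely over a set of small primes, can be sketched as follows. This minimal example uses trial division and the single basic polynomial Q(x) = x^2 - n; it is not the hypercube polynomial organization or the sieving procedure of HMPQS, and the function names, the factor base, and the toy modulus 8051 are all illustrative assumptions.

```python
def smooth_exponents(value, factor_base):
    """Return the exponent of each prime in factor_base if |value| factors
    completely over factor_base, otherwise None."""
    value = abs(value)
    if value == 0:
        return None
    exponents = []
    for p in factor_base:
        e = 0
        while value % p == 0:
            value //= p
            e += 1
        exponents.append(e)
    return exponents if value == 1 else None


def find_relations(n, factor_base, start, count):
    """Scan x = start, ..., start + count - 1 and collect (x, Q(x), exponents)
    for every x whose Q(x) = x*x - n is smooth over factor_base."""
    relations = []
    for x in range(start, start + count):
        q = x * x - n
        exponents = smooth_exponents(q, factor_base)
        if exponents is not None:
            relations.append((x, q, exponents))
    return relations


if __name__ == "__main__":
    print(find_relations(8051, [2, 3, 5, 7], 90, 4))
    # prints [(90, 49, [0, 0, 0, 2])]
```

In this toy case the single relation already gives 90^2 ≡ 7^2 (mod 8051), so gcd(90 - 7, 8051) = 83 is a nontrivial factor. For large numbers, many relations must be collected and combined by linear algebra over GF(2).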
Naohisa TAKAHASHI Takeshi MIEI
We present a general framework with which we can evaluate the flexibility and efficiency of various replay systems for parallel programs. In our approach, program monitoring is modeled by making a virtual dataflow program graph, referred to as a VDG, that includes all the instructions executed by the program. The behavior of the program replay is modeled on the parallel interpretation of a VDG based on two basic parallel execution models for dataflow program graphs: a data-driven model and a demand-driven model. Previous attempts to replay parallel programs, known as Instant Replay and P-Sequence, are also modeled as variations of the data-driven replay, i.e. the data-driven interpretation of a VDG. We show that the demand-driven replay, i.e. the demand-driven interpretation of a VDG, is more flexible in program replay than the data-driven replay since it allows better control of parallelism and a more selective replay. We also show that we can implement a demand-driven replay that requires almost the same amount of data to be saved during program monitoring as does the data-driven replay, and which eliminates any centralized bottleneck during program monitoring by optimizing the demand propagation and using an effective data structure.
Hirohisa YOKOTA Emiko OKITSU Yutaka SASAKI
Thermally-diffused expanded core (TEC) techniques yield fibers whose mode fields are expanded by thermal diffusion of core dopants. The techniques are effective for reducing splice or connection losses between different kinds of fibers, and are applied to the integration of thin film optical devices in fiber networks, the fabrication of chirped fiber gratings, and so on. In the practical use of TEC techniques, the fibers are heated to a high temperature of about 1650 °C because microburners are used to keep the processing time short. The mode field diameter expansion (MFDE) ratio, which is defined as the ratio of the mode field diameter in the fiber section with the expanded core to that in the unexpanded section, is desired to be more than 2.0 from the viewpoint of loss reduction in industrial uses of the TEC techniques. When the TEC techniques are applied to polarization-maintaining optical fibers (PM fibers), such as PANDA fibers, both core dopants and stress applying part (SAP) dopants diffuse simultaneously. As a result, the MFDE ratio achievable without mode field deformation is less than 2.0 in conventional PANDA fibers, which are practically used as PM fibers. In this paper, a PANDA fiber design suitable for the TEC techniques is newly proposed. The fiber has a 1.28 µm cutoff wavelength, and the mode field diameter is about 11 µm before core expansion at 1.3 µm wavelength.
Kaoru WATANABE Hiroshi TAMURA Keisuke NAKANO Masakazu SENGOKU
In this paper we extend the p-collection problem to a flow network with lower bounds, and call the extended problem the lower-bounded p-collection problem. First we discuss the complexity of this problem and show that it is NP-hard even for a network with path structure. Next we present a linear time algorithm for the lower-bounded 1-collection problem in a network with tree structure, and a pseudo-polynomial time algorithm of dynamic programming type for the lower-bounded p-collection problem in a network with tree structure. Using the pseudo-polynomial time algorithm, we show an exponential algorithm for the lower-bounded p-collection problem that is efficient in a connected network with few cycles.
Kunihiko KOZARU Atsushi KINOSHITA Tomohisa WADA Yutaka ARITA Michihiro YAMADA
This paper presents Super-CMOS SRAM process technology that integrates bipolar and CMOS transistors in a chip while adding only one ion implantation step and no lithography mask steps to the conventional CMOS SRAM process. The Super-CMOS SRAM process therefore has the same process cost as the CMOS SRAMs, while it achieves higher access speeds. In order to demonstrate the Super-CMOS SRAM, we have developed a 3.3 V/5 V 256 kb SRAM using 0.4 µm Super-CMOS process technology. By applying bipolar transistors to the sense amplifier circuits, a high-speed access time of 5.8 ns with a 3.0 V power supply is successfully achieved.
Masataka MINAMI Nagatoshi OHKI Hiroshi ISHIDA Toshiaki YAMANAKA Akihiro SHIMIZU Koichiro ISHIBASHI Akira SATOH Tokuo KURE Takashi NISHIDA Takahiro NAGANO
A high-performance microprocessor-compatible small size full CMOS SRAM cell technology for under 1.8-V operation has been developed. Less than 1-µm spacing between the nMOSFETs and pMOSFETs is achieved by using a retrograde well combined with SSS-OSELO technology. To connect the gates of a driver nMOSFET and a load pMOSFET directly, a 0.3-µm n-gate load pMOSFET, formed by amorphous-Si-film through-channel implantation, is merged with a 0.25-µm p-gate pMOSFET for the peripheral circuits. The memory cell area is reduced by using a mask-free contact process for the local interconnect, which includes titanium-nitride wet-etching using a plasma-TEOS silicon-dioxide mask. The newly developed memory cell was demonstrated using 0.25-µm CMOS process technology. A 6.93-µm², 1-V-operation full CMOS SRAM cell with a high-performance circuit was achieved by a simple fabrication process.