Hiroaki NISHI Ken-ichiro ANJO Tomohiro KUDOH Hideharu AMANO
JUMP-1 is currently under development by seven Japanese universities to establish techniques for building an efficient distributed shared memory on a massively parallel processor. It provides a coherent cache with a reduced hierarchical bit-map directory scheme to achieve cost-effective, high-performance management. Messages for the coherent cache are transferred through a fat tree on the RDT (Recursive Diagonal Torus) interconnection network. The RDT router supports versatile functions, including multicast and acknowledge combining, for the reduced hierarchical bit-map directory scheme. Built in 0.5 µm BiCMOS SOG technology, it can transfer all packets in synchronization with a single CPU clock (50 MHz). Long coaxial cables (up to 4 m) are driven directly by the ECL interface of this chip. Using dual-port RAM, the packet buffers allow a flit of a packet to be pushed and pulled simultaneously.
Since the introduction of magnetoresistive (MR) heads, the areal density of hard disk drives (HDDs) has been increasing at a rate of 60% a year, and has now reached 1.4 Gb/sq. in. The data rate has also been increasing at a rate of 40% or more, and this has recently become a key factor in the ability of multimedia applications to transfer stored data rapidly from the HDD to the PC or workstation. Currently, data rates of around 150 Mb/sec are being implemented in products. In this study, key technologies for increasing both the areal density and the data rate of HDDs are proposed. If they are implemented, an areal density of around 10 Gb/sq. in. and a data rate of 200 Mb/sec or more can be achieved.
Adel CHERIF Masato SUZUKI Takuya KATAYAMA
We present a novel replication technique for parallel applications in which instances of the replicated application are active on different groups of processors called replicas. The replication technique is based on the FTAG (Fault Tolerant Attribute Grammar) computation model, a functional and attribute-based model. The technique implements "active parallel replication": all replicas are active, and each concurrently computes a different piece of the application's parallel code. In our model, replicas cooperate not only to detect and mask failures but also to perform parallel computation. The replication mechanisms are supported by the FTAG run-time system and are fully application-transparent. Several novel mechanisms for checkpointing and recovery are developed. During rollback recovery, only the part of the computation that was detected as faulty is discarded. The replication technique takes full advantage of parallel computing to reduce overall computation time.
Kumud KASHYAP Tadahiro WADA Masaaki KATAYAMA Takaya YAMAZATO Akira OGAWA
For mobile communication systems with code division multiple access (CDMA), a new modulation scheme, π/2-shift BPSK, is proposed. Its performance has been evaluated in terms of relative out-of-band power, bit-error rate (BER), and spectral efficiency. The results show that the proposed scheme has an advantage over conventional BPSK, conventional QPSK, and π/4-shift QPSK under nonlinear amplification.
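As a rough illustration of the modulation named above, the sketch below maps bits to π/2-shift BPSK symbols using the common textbook definition (a BPSK symbol rotated by an additional π/2 per symbol interval). The abstract does not give the authors' exact formulation, so the function names and the mapping are assumptions. It also measures the largest phase step between consecutive symbols, which is the property usually cited as helpful under nonlinear amplification.

```python
import cmath
import math

def bpsk(bits):
    """Conventional BPSK: bit 1 -> +1, bit 0 -> -1 (assumed mapping)."""
    return [complex(1.0 if b else -1.0, 0.0) for b in bits]

def pi2_shift_bpsk(bits):
    """pi/2-shift BPSK (textbook form): rotate the k-th BPSK symbol by k*pi/2."""
    return [s * cmath.exp(1j * math.pi / 2 * k) for k, s in enumerate(bpsk(bits))]

def max_phase_step(symbols):
    """Largest absolute phase change between consecutive symbols, in radians."""
    return max(abs(cmath.phase(b / a)) for a, b in zip(symbols, symbols[1:]))

bits = [1, 0, 1, 1, 0, 0, 1, 0]
plain_step = max_phase_step(bpsk(bits))              # pi: transitions pass through the origin
shifted_step = max_phase_step(pi2_shift_bpsk(bits))  # pi/2: every transition is a quarter turn
```

With the extra rotation, every transition is ±π/2, so the signal trajectory never passes through the origin, whereas plain BPSK produces phase jumps of π whenever the bit changes.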
Wei HUANG Essam A. SOUROUR Masao NAKAGAWA
A microcellular radio direct-sequence code division multiple access (DS-CDMA) system that uses optical links to connect its base stations to a central station is a cost-effective solution for efficient spectrum reuse to meet the growing demand for mobile communications. In addition to the inherent multiuser interference (MUI) of CDMA signals, the system capacity is significantly reduced by nonlinear distortion (NLD) caused by the nonlinearity of the optical link. In this paper, a two-stage cancellation technique is introduced into the system to cancel both the MUI and the NLD. It is performed at the receiver of the central station, where the random ingredients of all user signals are estimated, and the MUI and the NLD are rebuilt and removed from the received signal. The validity of the cancellation technique is analyzed theoretically and confirmed by numerical results. The analytical method and its results are also applicable to other general nonlinear CDMA systems.
Nobuo FUNABIKI Junji KITAMICHI Seishi NISHIKAWA
This paper presents a digital neural network approach to the multilayer channel routing problem with the objective of crosstalk minimization. As VLSI fabrication technology advances, the reduction of crosstalk between interconnection wires on a chip has become an important consideration in VLSI design because of closer interwire spacing and circuit operation at higher frequencies. Our neural network is composed of N × M × L digital neurons with one-bit output and seven-bit input for the N-net-M-track-2L-layer problem and uses a set of integer parameters, which makes it well suited to implementation in digital technology. The digital neural network directly seeks a routing solution that satisfies the routing constraint and the crosstalk constraint simultaneously. Heuristic methods are introduced to improve the convergence property. The performance is evaluated by solving 10 benchmark problems, including Deutsch's difficult example, in 2-10 layers. Among the existing neural networks, the digital neural network is the first to achieve the lower bound solution in terms of the number of tracks in every instance. Through extensive simulation runs, it provides the best maximum crosstalks of nets for valid routing solutions of the benchmark problems in multilayer channels.
Shinya FUKUMOTO Hiromi MIYAJIMA Kazuya KISHIDA Yoji NAGASAWA
In this paper we propose evaluating the "goodness" of models using the information criterion AIC. The AIC is a statistic that estimates the badness of models. When constructing fuzzy rules, we usually aim to minimize the inference error and the number of rules. However, these conditions are criteria for acquiring an optimum rule model with respect to the training data. In the general case of fuzzy reasoning, we aim to minimize the inference error not only for the given training data but also for unknown data. We have therefore introduced a new information criterion based on AIC as the criterion for appraising the acquired fuzzy rules. Experimental results are given to show the validity of using AIC.
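The trade-off between inference error and model size can be illustrated with the standard AIC, which is -2 times the maximized log-likelihood plus 2 times the number of free parameters. The sketch below uses the usual Gaussian-error form, n·ln(RSS/n) + 2k, with additive constants dropped. It does not reproduce the new criterion proposed in the paper, and treating the parameter count of a fuzzy rule set as `num_params` is an assumption made only for illustration.

```python
import math

def aic_gaussian(residuals, num_params):
    """Standard AIC under i.i.d. Gaussian errors, up to an additive constant.

    residuals  : inference errors on the training data
    num_params : number of free parameters in the model (assumed here to
                 grow with the number of fuzzy rules)
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * num_params

# Hypothetical comparison: a small rule set with slightly larger errors
# versus a large rule set with slightly smaller errors.
small_model = aic_gaussian([0.100] * 20, num_params=4)
large_model = aic_gaussian([0.095] * 20, num_params=12)
preferred = "small" if small_model < large_model else "large"
```

In this made-up example the lower AIC goes to the smaller model: the modest gain in training error does not offset the penalty for the extra parameters.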
Yoshihiro OKAMOTO Minoru SOUMA Shin TOMIMOTO Hidetoshi SAITO Hisashi OSAWA
A punctured convolutional coded PR4ML system for digital magnetic recording, which applies a punctured coding method to the convolutional code and records the punctured code sequences on two tracks, is proposed. In this study, the bit error rate performance of the proposed system is obtained by computer simulation taking account of partial erasure, which is one of the nonlinear distortions at high densities, and it is compared with those of a conventional 8/9 coded PR4ML system and an I-NRZI coded PR4ML system. The results show that the proposed system is hardly affected by partial erasure and exhibits good performance in high-density recording. A bit error rate of 10^-4 can be achieved with SNRs of approximately 13.2 dB and 9.1 dB less than those of the conventional 8/9 coded and I-NRZI coded PR4ML systems, respectively, at a normalized linear density of 3.
Toshiyuki SUZUKI Tomohiro MITSUGI
This paper reports the thermal stability of particulate media, which include Co-Fe oxide, CrO2, and thick and thin MP tapes. By measuring the time decay of magnetization at room temperature, fluctuation fields were obtained as a function of reverse applied field. It was clarified that the fluctuation field has a constant and minimum value when the reverse applied field is equal to the coercivity. Minimum fluctuation fields for the four particulate tapes were measured at several environmental temperatures ranging from -75 to +100 °C. It was also clarified that the fluctuation field normalized by remanence coercivity increases as the environmental temperature increases for all tapes, indicating that it is a good measure of thermal stability. Activation volumes were also deduced as a function of temperature.
Xiaoxing ZHANG Xiayu NI Masahiro IWAHASHI Noriyoshi KAMBAYASHI
In this paper, an implementation of a first-order active complex filter with a variable parameter, using operational transconductance amplifiers (OTAs) and grounded capacitors, is presented. The proposed configurations can be used as a key building block to realize high-order active complex filters with a variable parameter in cascade and leapfrog configurations. Experimental results, which are in good agreement with the theoretical responses, are also given to demonstrate the feasibility of the proposed configurations.
Yuji IWAHORI Robert J. WOODHAM Masahiro OZAKI Hidekazu TANAKA Naohiro ISHII
An implementation of photometric stereo is described in which all directions of illumination are close to and rotationally symmetric about the viewing direction. This has practical value but gives rise to a problem that is numerically ill-conditioned. Ill-conditioning is overcome in two ways. First, many more than the theoretical minimum number of images are acquired. Second, principal components analysis (PCA) is used as a linear preprocessing technique to determine a reduced dimensionality subspace to use as input. The approach is empirical. The ability of a radial basis function (RBF) neural network to do non-parametric functional approximation is exploited. One network maps image irradiance to surface normal. A second network maps surface normal to image irradiance. The two networks are trained using samples from a calibration sphere. Comparison between the actual input and the inversely predicted input is used as a confidence estimate. Results on real data are demonstrated.
Kunimaro TANAKA Yoshinori NEGISHI Kyosuke YOSHIMOTO Yasunori TAKAHASHI
Small-scale video-on-demand systems will be necessary in the future. Cluster drives, which use optical disk drives, are a good buffer memory for this purpose because the cost per megabyte is low. An ordinary optical cluster drive has many SCSI buses, and up to seven optical drives are connected to each SCSI bus. One drive from each bus is assembled to make a group of a cluster drive. The difference between the SCSI bus data transfer rate and the sustained disk transfer rate enables the cluster drive to be simplified. Several drives on an SCSI bus make a sub-group, and the video data is striped onto those sub-groups. When the total data transfer rate from the disks within a sub-group exceeds the bus transfer rate, some drives cannot acquire the bus. When the drives connected to one SCSI bus are not identical, the block size of the data to be recorded on each drive has to be adjusted so that the maximum effective data transfer rate can be obtained. The effective data transfer rate is maximized when the cycle times of the slow and fast drives are set to be identical, where one cycle consists of command time, minimum bus free time, disk read time, and bus transfer time.
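The block-size adjustment described above can be sketched numerically. The cycle is modeled as command time + minimum bus free time + disk read time + bus transfer time, with the last two terms taken to be proportional to the block size. That proportionality, the function names, and all numeric values below are assumptions for illustration and are not taken from the paper.

```python
def cycle_time(block_bytes, command_s, bus_free_s, disk_rate, bus_rate):
    """One cycle: command + minimum bus free + disk read + bus transfer (seconds)."""
    return command_s + bus_free_s + block_bytes / disk_rate + block_bytes / bus_rate

def block_for_cycle(target_cycle_s, command_s, bus_free_s, disk_rate, bus_rate):
    """Block size (bytes) that makes a drive's cycle equal to target_cycle_s."""
    overhead = command_s + bus_free_s
    if target_cycle_s <= overhead:
        raise ValueError("target cycle must exceed the fixed overhead")
    return (target_cycle_s - overhead) / (1.0 / disk_rate + 1.0 / bus_rate)

# Hypothetical drives sharing one bus (rates in bytes per second).
BUS_RATE = 10e6
slow = dict(command_s=0.002, bus_free_s=0.001, disk_rate=1e6, bus_rate=BUS_RATE)
fast = dict(command_s=0.002, bus_free_s=0.001, disk_rate=2e6, bus_rate=BUS_RATE)

target = 0.1  # common cycle time in seconds
slow_block = block_for_cycle(target, **slow)
fast_block = block_for_cycle(target, **fast)
```

Under these assumptions the faster drive is given the larger block, so that both drives complete one cycle in the same time.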
Nobuo FUNABIKI Junji KITAMICHI Seishi NISHIKAWA
This paper presents a neural network of massively interconnected digital neurons for the total coloring problem. Given a graph G(V, E), the goal of this NP-complete problem is to find a color assignment on the vertices in V and the edges in E with the minimum number of colors such that no adjacent or incident pair of elements in V and E receives the same color. Graph coloring is a basic combinatorial optimization problem with a variety of practical applications. The neural network consists of (N+M)L neurons for the N-vertex-M-edge-L-color problem. Using digital neurons with binary outputs and range-limited non-negative integer inputs together with a set of integer parameters, our digital neural network is well suited to implementation on digital circuits. The performance is evaluated through simulations on random graphs with known lower bounds on the number of colors. With the help of heuristic methods, the digital neural network of up to 530,656 neurons always finds a solution to the NP-complete problem within a constant number of iteration steps on synchronous parallel computation.
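The constraint in the problem statement above can be made concrete with a small validity checker. This is only a sketch of the problem definition, not of the neural network itself. The graph representation and function names are assumptions, while the neuron count (N+M)L follows the abstract.

```python
def is_total_coloring(edges, vertex_color, edge_color):
    """Check that no adjacent or incident pair of elements shares a color.

    edges        : list of (u, v) tuples
    vertex_color : dict vertex -> color
    edge_color   : dict (u, v) -> color, keyed exactly as in `edges`
    """
    for (u, v) in edges:
        # Adjacent vertices must differ.
        if vertex_color[u] == vertex_color[v]:
            return False
        # An edge must differ from both of its endpoints.
        if edge_color[(u, v)] in (vertex_color[u], vertex_color[v]):
            return False
    # Edges that share an endpoint must differ.
    for i, e in enumerate(edges):
        for f in edges[i + 1:]:
            if set(e) & set(f) and edge_color[e] == edge_color[f]:
                return False
    return True

def neuron_count(num_vertices, num_edges, num_colors):
    """(N + M) * L neurons for the N-vertex-M-edge-L-color problem."""
    return (num_vertices + num_edges) * num_colors

# Triangle: three colors suffice for a total coloring.
triangle = [(0, 1), (1, 2), (0, 2)]
v_col = {0: 0, 1: 1, 2: 2}
e_col = {(0, 1): 2, (1, 2): 0, (0, 2): 1}
valid = is_total_coloring(triangle, v_col, e_col)
```

For the triangle, each edge takes the color of the opposite vertex, and the corresponding network would need (3+3)·3 = 18 neurons.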
Tadayoshi HORITA Itsuo TAKANAMI
The authors previously proposed a reconfigurable architecture called the "XL-scheme" in order to cope with processor element (PE) faults as well as link faults. However, they described an algorithm for compensating only for link faults. They determined the potential ability to tolerate faults of the XL-scheme for simultaneous faults of links and PEs, and left a reconstruction algorithm for simultaneous PE and link faults to be studied in the future. This paper briefly explains the XL-scheme and gives a reconstruction algorithm for simultaneous PE and link faults. The algorithm first replaces faulty PEs with healthy ones and then replaces faulty links with healthy ones. We then compute the reliabilities of the mesh-arrays with simultaneous PE and link faults by simulation. We compare the reliability of the XL-scheme with that of the one-and-half track switch model. It is seen that the former is much larger than the latter. Furthermore, we show the result for processing time.
The construction of fault-tolerant processor arrays with interconnections of cube-connected cycles (CCCs) by using an advanced spare-connection scheme for k-out-of-n redundancies called "generalized additional bypass linking" is described. The connection scheme uses bypass links with wired OR connections to spare processing elements (PEs) without external switches, and can reconfigure complete arrays by tolerating faulty portions in these PEs and links. The spare connections are designed as a node-coloring problem of a CCC graph with a minimum distance of 3: the chromatic numbers corresponding to the number of spare PE connections were evaluated theoretically. The proposed scheme can be used for constructing various k-out-of-n configurations capable of quick broadcasting by using spare circuits, and is superior to conventional schemes in terms of extra PE connections and reconfiguration control. In particular, it allows construction of optimal r-fault-tolerant configurations that provide r spare PEs and r extra connections per PE for CCCs with 4x PEs (x: integer) in each cycle.
Sadaki HIROSE Satoshi OKAWA Haruhiko KIMURA
Let ℒ be any class of languages, ℒ' be one of the classes of context-free, context-sensitive, and recursively enumerable languages, and Σ be any alphabet. In this paper, we show that if the following statement (1) holds, then statement (2) holds. (1) For any language L in ℒ over Σ, there exist an alphabet Γ including Σ, a homomorphism h: Γ* → Σ* defined by h(a) = a for a ∈ Σ and h(a) = λ (the empty word) for a ∈ Γ−Σ, a Dyck language D over Γ, and a language L_1 in ℒ' over Γ such that L = h(D ∩ L_1). (2) For any language L in ℒ over Σ, there exist an alphabet X_k of k pairs of matching parentheses, a Dyck reduction Red over X_k, and a language L_2 in ℒ' over Σ ∪ X_k such that L = Red(L_2) ∩ Σ*. We also give an application of this result.
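To make the Dyck reduction concrete, the sketch below applies it to a single word by repeatedly cancelling an opening parenthesis that is immediately followed by its matching closing parenthesis. This follows the usual definition of a Dyck reduction. The abstract does not spell out the definition, so the one-sided cancellation rule, the sample parenthesis alphabet, and the function names are assumptions.

```python
# Hypothetical parenthesis alphabet X_k with k = 2 pairs.
PAIRS = {"(": ")", "[": "]"}

def dyck_reduce(word, pairs=PAIRS):
    """Return the irreducible form of `word` under the rules  x x' -> empty."""
    stack = []
    for ch in word:
        # Cancel only when the previous surviving symbol is the matching opener.
        if stack and pairs.get(stack[-1]) == ch:
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)

def survives_in_sigma_star(word, sigma, pairs=PAIRS):
    """True if Red(word) contains only letters of sigma, i.e. it lies in Sigma*."""
    return all(ch in sigma for ch in dyck_reduce(word, pairs))

reduced_ok = dyck_reduce("a()b[()]c")   # all parentheses cancel
reduced_stuck = dyck_reduce("(a)")      # '(' and ')' are not adjacent, nothing cancels
```

Intersecting with Σ* then keeps only those reduced words in which every parenthesis has been cancelled, which is the role the intersection plays in statement (2).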
This paper first proposes a new approach to designing high-quality, low-diameter, small mean-internode-distance (MID), k-subcubic-connected cyclic (k-sccc) networks. The approach is a modification of the k-cubic-connected cyclic (k-ccc) network in which there are N = k·2^(k-1) nodes instead of the N = k·2^k nodes of the k-ccc network. The special features of this network are: (1) it fills the gap between the numbers of nodes in k-ccc and (k+1)-ccc networks while retaining a constant number of links (3) per node, and (2) it allows higher quality, with smaller diameters and mean internode distances, than hypercube networks with the same numbers of nodes. A second novel approach consists of a k+-sccc network with the same number of nodes as the k-ccc but with smaller diameters and mean internode distances. A generalized k-ccc network formed by N = k·2^m nodes is introduced for n-cube and k-ccc (modified or normal) networks that allows minimum network quality to be obtained, where m may or may not be equal to k. A routing algorithm for 4-sccc is also presented.
This paper proposes a new multiuser detector, quasi-decorrelating detector (QDD), for a synchronous CDMA system. The QDD has the same complexity as that of decorrelator detector (DD) although it uses feedback loops, the number of which is adjustable to balance the near-far resistance and noise enhancement. The results show that the QDD outperforms the DD under various operational conditions. The impact of different spreading codes on the performance of the QDD is studied. It is shown that the Gold code is the best spreading code suitable for the QDD.
Hitoshi YAMAUCHI Takayuki MAEDA Hiroaki KOBAYASHI Tadao NAKAMURA
The multipass rendering method based on the global illumination model can generate the most photo-realistic images. However, since the multipass rendering method is very time consuming, it is impractical in the industrial world. This paper discusses a massively parallel processing approach to fast image synthesis by the multipass rendering method. In particular, we focus on the performance evaluation of view-dependent object-space parallel processing on the (Mπ)², which was proposed in our previous paper. We also propose two kinds of distributed frame buffer system, named the cached frame buffer and the multistage-interconnected frame buffer. These frame buffer systems can solve the access conflict problem on the frame buffer. The simulation results show that the (Mπ)² has scalable performance. For example, the (Mπ)² with more than 4000 processing elements can achieve an efficiency of over 50%. We also show that both of the proposed distributed frame buffer systems can relieve the overhead due to frame buffer access in the (Mπ)² when a large number of high-performance processing elements are adopted in the system.
Tadayuki SAKAKIBARA Katsuyoshi KITAI Tadaaki ISOBE Shigeko YAZAWA Teruo TANAKA Yasuhiro INAGAMI Yoshiko TAMAKI
We present a scalable parallel memory architecture with a skew scheme in which the permanent-concentration-free strides, if any, do not depend on the number of ways of parallel memory interleaving. Permanent-concentration is a kind of memory access conflict. With conventional skew schemes, the permanent-concentration-free strides depend on the number of banks (or bank groups) in the parallel memory (that is, the number of ways of parallel memory interleaving). We analyze two causes of conflicts: permanent-concentration, which occurs when memory access requests concentrate in a limited number of banks (or bank groups), and transient-concentration, which occurs when memory access requests transiently concentrate in some banks (or bank groups). We have identified permanent-concentration-free strides that are independent of the number of banks (or bank groups) by treating the two kinds of concentration separately. The strategy is to increase the size of the address block used for shifting the address assignment to the parallel memory in order to reduce permanent-concentrations, and to make the size of the buffer for each bank (or bank group) match that address block size in order to absorb transient-concentrations. The skew scheme uses the same address block size for memory systems with different numbers of banks (or bank groups). As a result, scalability of the permanent-concentration-free strides is achieved independent of the number of banks (or bank groups) in the parallel memory.
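The notion of a stride that concentrates accesses on a few banks can be illustrated with a generic skewed interleaving. The sketch below is a conventional textbook-style skew, not the scheme proposed in the paper: the bank index is shifted by one each time the address crosses a block boundary. The mapping functions and the `block` parameter are assumptions introduced for illustration only.

```python
def simple_bank(addr, banks):
    """Plain interleaving: consecutive addresses go to consecutive banks."""
    return addr % banks

def skewed_bank(addr, banks, block):
    """Generic skew: shift the bank index by one for every `block` addresses."""
    return (addr + addr // block) % banks

def banks_touched(bank_of, stride, count):
    """Set of banks hit by `count` accesses at a constant stride from address 0."""
    return {bank_of(i * stride) for i in range(count)}

BANKS = 8
# With a stride equal to the number of banks, plain interleaving sends every
# request to the same bank, while the skewed mapping spreads them out.
plain = banks_touched(lambda a: simple_bank(a, BANKS), stride=8, count=8)
skewed = banks_touched(lambda a: skewed_bank(a, BANKS, block=8), stride=8, count=8)
```

In this toy setting the plain mapping hits one bank and the skewed mapping hits all eight. Which strides remain conflict-free under a given skew, and how buffering absorbs short-lived collisions, is the subject of the analysis summarized above.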