Yuetsu KODAMA Hirohumi SAKANE Mitsuhisa SATO Hayato YAMANA Shuichi SAKAI Yoshinori YAMAGUCHI
Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor for tolerating communication latency. The EM-X uses multithreading to overlap computation with communication. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. Direct remote memory access is designed to overlap remote memory operations with thread execution. An 80-processor prototype of the EM-X was developed and has been operational since December 1995. We execute several programs on the machine and evaluate how effectively the EM-X overlaps computation with communication to tolerate communication latency in high-performance parallel computing.
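The idea of extending FIFO-ordered thread invocation with priorities can be sketched in software as follows. This is only an illustrative model, not the EM-X hardware: the two priority levels, the `PacketQueue` class, and the packet names are assumptions made for the example, since the abstract does not say how many levels exist or which packets are prioritized.

```python
import heapq
from itertools import count


class PacketQueue:
    """Hypothetical two-level packet queue; FIFO order is kept within a level."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival counter, used as the FIFO tie-breaker

    def enqueue(self, packet, high_priority=False):
        level = 0 if high_priority else 1  # lower value is served first
        heapq.heappush(self._heap, (level, next(self._seq), packet))

    def dequeue(self):
        # Pop the oldest packet of the most urgent level.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PacketQueue()
q.enqueue("thread A")
q.enqueue("remote read reply", high_priority=True)
q.enqueue("thread B")

order = [q.dequeue() for _ in range(len(q))]
print(order)
```

With every packet at the same level the queue degenerates to plain FIFO, which is the sense in which a priority policy extends rather than replaces the FIFO policy.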
There is a growing demand for reliability beyond what current RAID can provide, and users demand various levels of data reliability. An efficient data placement scheme called RM2, which makes a disk array system resistant to double disk failures, has been proposed in [10]. In this paper, we consider how to choose an optimal striping unit for RM2, particularly when no workload information is available except the read/write ratio. For experimental purposes, we develop a disk array simulator that incorporates RM2 as one of its data placement schemes, alongside the schemes of the RAID levels. For disk read operations, it is shown that RM2 has an optimal striping unit of 4/3T for large requests and 8/3T for small requests, where T represents the size of a single track. It is also shown that, if any disk write operations are involved, the optimal striping unit becomes 1/3T for large requests and 8/3T for small requests.
Kazuhiko MOGI Masaru KITSUREGAWA
RAID5 disk arrays provide high performance and high reliability at reasonable cost. However, RAID5 suffers a performance penalty during block updates. In this paper, we propose a method, named Virtual Striping, to improve the small-write performance of RAID5 disk arrays. Instead of updating each block independently, this method buffers a number of updates, generates a new stripe composed of the newly updated blocks, and then writes the full stripe back to disk. To make free space for write operations, a new garbage collection strategy is employed, in which the linkage of blocks in a parity stripe is changed. The LFS (log-structured file system) based storage management scheme also writes new blocks onto a large free area, but it uses copying garbage collection. In this paper, we compare the performance of both methods through simulation. Although the write cost of Virtual Striping is higher than that of the LFS-based method, Virtual Striping performs better than the LFS-based method. This is due to the high efficiency of garbage collection in Virtual Striping.
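The small-write penalty and the benefit of writing a full stripe can be illustrated with a minimal sketch. It models only standard RAID5 XOR parity and a simple I/O count; it does not model Virtual Striping's block linkage or garbage collection, and the helper names (`xor_blocks`, `small_write`, `full_stripe_write`) and the four-block stripe are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(stripe, parity, index, new_block):
    """Read-modify-write of one block: read old data and old parity,
    then write new data and new parity (4 disk I/Os)."""
    old_block = stripe[index]
    new_parity = xor_blocks([parity, old_block, new_block])
    stripe[index] = new_block
    return new_parity, 4


def full_stripe_write(new_blocks):
    """Parity comes from the buffered blocks alone: no reads,
    one write per data block plus one parity write."""
    return xor_blocks(new_blocks), len(new_blocks) + 1


stripe = [bytes([i] * 4) for i in (1, 2, 3, 4)]
parity = xor_blocks(stripe)
updates = [bytes([i] * 4) for i in (9, 8, 7, 6)]

# Update every block independently.
ios_small = 0
for i, block in enumerate(updates):
    parity, ios = small_write(stripe, parity, i, block)
    ios_small += ios

# Buffer the same updates and write them as one new stripe.
full_parity, ios_full = full_stripe_write(updates)

print(ios_small, ios_full)
```

Both paths end with the same parity, but the independent updates cost 16 disk I/Os in this model against 5 for the buffered full-stripe write.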
Takashi YOKOTA Hiroshi MATSUOKA Kazuaki OKAMOTO Hideo HIRONO Shuichi SAKAI
This paper discusses a massively parallel interconnection scheme for multithreaded architecture and introduces a new class of direct interconnection networks called the hierarchical Multidimensional Directed Cycles Ensemble (hMDCE). Its suitability for massively parallel systems is discussed. The network is evolved from the Multidimensional Directed Cycles Ensemble (MDCE) network by substituting a lower-level sub-network for each node. The new network addresses some serious problems caused by the increasing scale of parallel systems, such as longer latency, limited throughput and high implementation cost. This paper first introduces the MDCE network and then presents and examines in detail the hierarchical MDCE network. The bisection bandwidth of hMDCE is considerably reduced from that of its ancestor MDCE, and the network achieves significantly higher throughput and lower latency under some practical implementation constraints. The gate count and delay time of the compiled circuit for the routing function are insignificant. These results reveal that the hMDCE network is an important candidate for massively parallel systems interconnection.
Hani C. YEHIA Kazuya TAKEDA Fumitada ITAKURA
The objective of this paper is to find a parametric representation for the vocal-tract log-area function that is directly and simply related to basic acoustic characteristics of the human vocal-tract. The importance of this representation is associated with the solution of the articulatory-to-acoustic inverse problem, where a simple mapping from the articulatory space onto the acoustic space can be very useful. The method is as follows: Firstly, given a corpus of log-area functions, a parametric model is derived following a factor analysis technique. After that, the articulatory space, defined by the parametric model, is filled with approximately uniformly distributed points, and the corresponding first three formant frequencies are calculated. These formants define an acoustic space onto which the articulatory space maps. In the next step, an independent component analysis technique is used to determine acoustic and articulatory coordinate systems whose components are as independent as possible. Finally, using singular value decomposition, acoustic and articulatory coordinate systems are rotated so that each of the first three components of the articulatory space has major influence on one, and only one, component of the acoustic space. An example showing how the proposed model can be applied to the solution of the articulatory-to-acoustic inverse problem is given at the end of the paper.
Masahiro AGU Kazuo YAMANAKA Hiroki TAKAHASHI
"Stable phase locked states" are found amongst the equilibria of the phasor model, known as a generalized Hopfield model having complex-valued local states on the unit circle with centre at the origin. The asynchronous updating rule is assumed, and the energy-decreasing characteristic is used to investigate a property of the equilibrium states. Some of the equilibria are shown to be "fragile" in the sense that the energy is not locally convex. It is also shown that the local convexity of the energy is assured by a sort of consistency between the equilibrium and the connection weights.
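The energy-decreasing property under asynchronous updating can be checked numerically with a small sketch. The abstract gives neither the energy function nor the update rule, so the following assumes a common form of the phasor model: a Hermitian weight matrix w with zero diagonal, energy E = -(1/2) Σ conj(s_j) w_jk s_k, and an update that sets one unit to the normalized local field. The network size and the random weights are also arbitrary choices for the example.

```python
import cmath
import random


def energy(w, s):
    """Assumed energy: E = -1/2 * sum_jk conj(s_j) * w_jk * s_k (real for Hermitian w)."""
    n = len(s)
    total = sum(s[j].conjugate() * w[j][k] * s[k]
                for j in range(n) for k in range(n))
    return -0.5 * total.real


def async_update(w, s, j):
    """Move unit j onto the unit circle in the direction of its local field."""
    field = sum(w[j][k] * s[k] for k in range(len(s)))
    if abs(field) > 1e-12:  # leave the state unchanged if the field vanishes
        s[j] = field / abs(field)


random.seed(0)
n = 6

# Random Hermitian weights with zero diagonal (an assumption of this sketch).
w = [[0j] * n for _ in range(n)]
for j in range(n):
    for k in range(j + 1, n):
        w[j][k] = complex(random.gauss(0, 1), random.gauss(0, 1))
        w[k][j] = w[j][k].conjugate()

# Random initial phases on the unit circle.
s = [cmath.exp(1j * random.uniform(0, 2 * cmath.pi)) for _ in range(n)]

energies = [energy(w, s)]
for sweep in range(20):
    for j in range(n):
        async_update(w, s, j)
        energies.append(energy(w, s))

print(energies[0], energies[-1])
```

Under these assumptions, the part of the energy that depends on unit j is -Re(conj(s_j) h_j), where h_j is the local field, so aligning s_j with h_j can never raise the energy. The recorded sequence is therefore non-increasing up to rounding error.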
Tadayoshi HORITA Itsuo TAKANAMI
Various reconfiguration schemes against faults in mesh-connected processor arrays have been proposed. One of them, a mesh-connected processor array model based on single-track switches, was proposed in [1]. The model has the advantage of inherently simple routing hardware. Furthermore, the 2-track switch model [2] and the multiple-track switch model [3] were proposed to enhance the yields and reliabilities of arrays. However, in these models, the simplicity of the routing hardware is somewhat lost because multiple tracks are used for each row and column. In this paper, we present a built-in self-reconstruction approach for mesh-connected processor arrays that are partitioned into sub-arrays, each using single-track switches. Spare PEs located on the boundaries of the sub-arrays compensate for faulty PEs in these sub-arrays. First, we formulate a reconfiguration algorithm for partitioned mesh arrays using a Hopfield-type neural network, and then investigate its reconfiguration performance in terms of survival rates, array reliabilities, and processing time by computer simulations. The results show that high reliabilities are achieved while processing time is short and the hardware overhead (links and switches) required for reconstruction is the same as that for the track switch model. Next, we present a hardware implementation of the neural algorithm so that a built-in self-reconfigurable scheme may be realized.
Yoshiaki SAITOH Akira KANKE Isamu SHINOZAKI Tohru KIRYU Jun'ichi HORI
Adapting the principle of parametron oscillation, a small implantable temperature sensor requiring no internal power supply is described. Since this sensor's oscillation frequency is half that of the excitation frequency, the oscillated signal can be measured from the reception side, free of any signal interference, simply by positioning the sensor and the excitation antenna so that 1) they are separated by up to 95 cm in air; 2) a 41 cm gap, the phantom equivalent of the thickness of the human abdomen, is maintained between them. In the temperature-dependent quartz resonator sensor, oscillation occurs only when frequency and temperature correspond. The excitation power is then adjusted so that the frequency bandwidth narrows. As a result, the margin of error in measuring the temperature is minimized (0.07).
Tetsuya MIYASHITA Tatsuo UCHIDA
To overcome the problem of narrow viewing angle in active matrix liquid crystal displays (LCDs) in the twisted nematic mode (TN mode), we have proposed a new LCD mode using a bend-alignment cell with an optical compensator. In this new mode, we have successfully obtained a black state with almost no leakage over a wide viewing angle range, together with very fast response. We describe the fundamental principle and design rule of the optical compensator and discuss the properties obtained in theoretical and experimental terms.
This letter investigates the stability of active two-port networks with some restrictions on load and source terminations, and obtains stability conditions consisting of two inequalities. Because the terminations that make an active two-port network stable can be obtained from these inequalities, the stability conditions are very useful for designing high-frequency amplifiers, especially tuned amplifiers.
Saed SAMADI Akinori NISHIHARA Nobuo FUJII
A class of type 1 linear phase FIR digital filters is proposed. The filter can be realized using a parallel, modular and regular array structure. It is shown that, under some simple constraints, the constituent modules of the array can be realized free of multiplier coefficients. Such two-dimensional mesh arrays are especially suitable for realization with special-purpose systolic hardware for high-speed digital signal processing tasks. Compared to the array structure proposed by the authors for multiplierless realization of maximally flat FIR digital filters, this class needs fewer adders to fulfill the same magnitude response requirements. Another attractive property of the proposed array is that a number of highpass or lowpass filters with different passband widths can be realized simultaneously in a very economical way.
Because the match phase in OPS5-type production systems accounts for most of the system's execution time and memory accesses, we proposed hash-based parallel production systems, CPPS (Clustered Parallel Production Systems), based on the RETE algorithm for distributed memory parallel computers, or multicomputers, to reduce this bottleneck. CPPS was effective in speeding up the match phase, but still left room for optimization. In this paper, we introduce software cache techniques to the memory nodes in the CPPS as one such optimization, and implement it on a multicomputer, nCUBE2. The benchmark results show that the CPPS with the software cache is about 2-fold faster than the original, and more than 7-fold faster than the simple hash method proposed by Acharya et al. for a large-scale problem. The speed-up can be attributed to decreased communication costs.
Yen-Wei CHEN Zensho NAKAO Shinichi TAMURA
An attenuation correction method was proposed for laser-produced plasma emission computed tomography (ECT), which is based on a relation of the attenuation coefficient and the emission coefficient in plasma. Simulation results show that the reconstructed images are dramatically improved in comparison to the reconstructions without attenuation correction.
Hideo SAITO Etsuo NAKAGAWA Tetsuya MATSUSHITA Fusayuki TAKESHITA Yasuhiro KUBO Shuichi MATSUI Kazutoshi MIYAZAWA Yasuyuki GOTO
Fluorinated liquid crystal compounds having fluorophenyl, difluorophenyl and trifluorophenyl moieties combined with ester linkages, 1,2-ethylenes and covalent bonds were prepared and checked for their physical properties, i.e. mesophases, dielectric and optical anisotropy, viscosity, pretilt angle and threshold voltage. By introducing fluorine atom(s) into the molecules, optical anisotropy and threshold voltage decreased, though the nematic temperature range diminished. The investigated compounds were all chemically stable, and by using them, nematic liquid crystalline mixtures having low threshold voltage, low viscosity, large optical anisotropy and wide nematic ranges, suitable for AM-LCDs, could be obtained.
Noritaka SHIGEI Hiromi MIYAJIMA Takayuki ISHIZAKA Sadayuki MURASHIMA
To enhance fabrication yield for processor arrays, many reconfiguration schemes for replacing faulty processing elements (PE's) with spare PE's have been proposed. An array grid model based on single tracks is one such model. For this model, some algorithms for reconfiguring processor arrays have been proposed. However, an algorithm that can reconfigure the array whenever the array is reconfigurable has not yet been proposed. This paper presents two types of methods for reconfiguration of processor arrays. Both types use indirect replacements for reconfiguring arrays. For an indirect replacement of a faulty non-spare PE, one type has a fixed direction, while the other has at most four directions, among which one is chosen. For the former, we consider several distributions of spare PE's, and computer simulations show a tendency in terms of the differences among the distributions. The latter algorithms consist of two phases. In the first phase, rows and columns of spare PE's are decided in accordance with a rule. Several rules for deciding spare PE's are considered in this paper. In the second phase, faulty non-spare PE's are replaced with healthy spare PE's. The performance of the algorithms is evaluated by simulations, and a tendency is shown in terms of the differences in the disposition of spare PE's.
Toshinori YAMADA Koji YAMAMOTO Shuichi UENO
Motivated by the design of fault-tolerant multiprocessor interconnection networks, this paper considers the following problem: Given a positive integer t and a graph H, construct a graph G from H by adding a minimum number Δ(t, H) of edges such that even after deleting any t edges from G the remaining graph contains H as a subgraph. We estimate Δ(t, H) for the hypercube and torus, which are well-known as important interconnection networks for multiprocessor systems. If we denote the hypercube and the square torus on N vertices by QN and DN respectively, we show, among others, that Δ(t, QN) = O(tN log(log N/t + log 2e)) for any t and N (t 2), and Δ(1, DN) = N/2 for N even.
Kunio SAKAKIBARA Jiro HIROKAWA Makoto ANDO Naohisa GOTO
In the design of a large slotted waveguide array, evaluation of mutual couplings between the slots is time consuming. This paper proposes an effective approximate analysis of the external mutual couplings using a periodic boundary condition. A simple design procedure is verified for a two-dimensional slot array.
Efficient parallel algorithms for several problems on proper circular arc graphs are presented in this paper. These problems include finding a maximum matching, partitioning into a minimum number of induced subgraphs each of which has a Hamiltonian cycle (path), partitioning into induced subgraphs each of which has a Hamiltonian cycle (path) with at least k vertices for a given k, and adding a minimum number of edges to make the graph contain a Hamiltonian cycle (path). It is shown here that the above problems can all be solved in logarithmic time with a linear number of EREW PRAM processors, or in constant time with a linear number of BSR processors. A more important part of this work is perhaps the extension of basic BSR to allow simultaneous multiple BROADCAST instructions.
Kazumasa KOBAYASHI Kouji YAMANO Hideki KOKUBUN Kiichi KOBAYASHI
A new high-speed decoding algorithm for difference-set cyclic codes (DSCCs), and the design and implementation of a 50 MHz CMOS LSI for decoding the (1057, 813) DSCC, are presented. The algorithm, called modified threshold decoding, makes it possible to introduce an arbitrary number of pipeline stages into feedback loops in decoding circuits. A prototype LSI containing about 13k logic gates was fabricated using 1 µm CMOS gate-array technology. The power consumption is less than 750 mW at a 50 MHz clock rate. The LSI is suitable for digital data transmission systems having an I/O data rate of up to 25 MBPS. It is being used in experimental set-ups targeted at future digital broadcasting systems. The proposed algorithm has an important advantage for much longer codes, as it has the potential to be used in the high-speed decoding of DSCCs having a code length longer than 1057.
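Difference-set cyclic codes are defined from perfect difference sets, in which every nonzero residue occurs exactly once as a difference of two elements; this property is what gives threshold decoding its orthogonal parity checks. The sketch below only checks that defining property on two small textbook sets. It is not the modified threshold decoding algorithm, the function name `is_perfect_difference_set` is hypothetical, and the small sets are illustrative stand-ins rather than the set behind the (1057, 813) code (note that 1057 = 32^2 + 32 + 1).

```python
def is_perfect_difference_set(elements, modulus):
    """True if every nonzero residue arises exactly once as a difference a - b."""
    seen = [0] * modulus
    for a in elements:
        for b in elements:
            if a != b:
                seen[(a - b) % modulus] += 1
    return all(times == 1 for times in seen[1:])


# Length 7 = 2^2 + 2 + 1, set of size 3.
small = is_perfect_difference_set([0, 1, 3], 7)

# Length 21 = 4^2 + 4 + 1, set of size 5.
medium = is_perfect_difference_set([0, 2, 7, 8, 11], 21)

# A set with a repeated difference (1 - 0 == 2 - 1) fails the check.
bad = is_perfect_difference_set([0, 1, 2], 7)

print(small, medium, bad)
```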
Hiraku OKADA Takeshi SATO Takaya YAMAZATO Masaaki KATAYAMA Akira OGAWA
In this paper, we analyze the throughput and delay performance of the CDMA unslotted ALOHA system, taking packet retransmission into account. We also clarify the stability of the system. Based on these results, we propose optimal retransmission control (ORC) to improve performance. The ORC is a scheme that prevents the system from drifting to an undesirable operating point by controlling the birth rate of retransmitted packets. As a result, it is shown that the throughput and delay performance of the system with the ORC is better than without it, and that the system does not drift to an undesirable operating point.