Atsushi KUSUNOKI Mitsuru TANAKA
Polarization transformation characteristics of a stratified slab consisting of uniaxial chiral layers are investigated. It is assumed that a plane electromagnetic wave with arbitrary polarization is normally incident from free space on the stratified slab, which is located on a dielectric substrate. Note that the electric field inside a uniaxial chiral layer is expressed as a sum of four plane waves with different wavenumbers. The wavenumbers are found by seeking non-trivial solutions of Maxwell's equations combined with the constitutive relations. The electric field components of the transmitted and reflected waves are obtained with a chain-matrix formalism. The powers and the Stokes parameters of the two waves are expressed in terms of their electric field components. As is well known, the Stokes parameters uniquely describe every possible state of polarization of a plane wave. Numerical results are presented for two types of uniaxial chiral structure. The cross- and co-polarized powers and the Stokes parameters of the transmitted and reflected waves are computed for a linearly polarized incident plane wave. The results demonstrate a significant polarization transformation of the transmitted wave. It is then shown that the stratified slab can be used as an efficient polarization-transforming transmission filter active within a certain frequency band.
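The Stokes parameters mentioned above follow directly from the complex amplitudes of the two orthogonal field components. A minimal Python sketch, assuming phasors Ex and Ey of a wave propagating along z; the sign of S3 (the handedness) depends on the time convention, and the convention chosen here is an assumption, as are the function names:

```python
import math


def stokes_parameters(ex: complex, ey: complex) -> tuple:
    """Return (S0, S1, S2, S3) for complex field phasors Ex, Ey.

    Assumed convention: S3 = 2 Im(Ex* Ey); with the opposite time
    convention the sign of S3 (the handedness) flips.
    """
    ex, ey = complex(ex), complex(ey)
    cross = ex.conjugate() * ey           # Ex* Ey
    s0 = abs(ex) ** 2 + abs(ey) ** 2      # total power
    s1 = abs(ex) ** 2 - abs(ey) ** 2      # x-linear vs y-linear
    s2 = 2.0 * cross.real                 # +45 deg vs -45 deg linear
    s3 = 2.0 * cross.imag                 # circular component
    return s0, s1, s2, s3


def degree_of_polarization(s: tuple) -> float:
    """Fraction of the total power that is polarized (1 for a pure plane wave)."""
    s0, s1, s2, s3 = s
    return math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2) / s0
```

For an x-polarized wave, `stokes_parameters(1, 0)` gives S = (1, 1, 0, 0), with |Ex|² as the co-polarized and |Ey|² as the cross-polarized power; a polarization-transforming slab shows up as power moving out of S1 into S2 or S3.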
Recently, progress has been made in the electrical modeling of conductors embedded in arbitrary dielectrics using circuit-oriented techniques. These models usually arise in conjunction with VLSI-type circuits. Such models have many applications today in EMI and EIP (Electrical Interconnect and Package) analysis, as well as in the microwave circuit area. Practical problems involve a multitude of hardware components and demand a wide spectrum of both time- and frequency-domain solution techniques. In this paper we consider circuit-oriented techniques for the solution of these problems. Specifically, we outline the three-dimensional Partial Element Equivalent Circuit (PEEC) full-wave modeling approach and review recent progress in this area.
A family of nonce-based authentication and key distribution protocols based on the trusted third-party model is proposed. The protocols are not only efficient in terms of computation and communication but also secure against on-line and off-line password guessing attacks. This paper introduces a new concept of implicit, or indirect, challenge-response authentication, which combines the processes of identity authentication and data integrity assurance during key distribution and makes the entire protocol more concise and efficient. Within the proposed family, a specific protocol can be chosen so that the secure session key to be distributed is selected by a specific participant. Detailed security analyses of every protocol are given.
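The idea of letting one response both authenticate an identity and protect data integrity can be illustrated with a generic nonce-based challenge-response in which a MAC covers the fresh nonce together with the key-distribution payload. A minimal Python sketch using the standard `hmac`, `hashlib` and `secrets` modules; it is not the proposed protocol family, and the function names are hypothetical. It assumes a high-entropy shared key: if the key were derived from a low-entropy password, an eavesdropper holding a (nonce, tag) pair could mount exactly the off-line guessing attack the paper's protocols are designed to resist.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16  # fixed-length nonce, so nonce + payload parses unambiguously


def make_nonce() -> bytes:
    """Fresh random challenge; freshness is what defeats replay."""
    return secrets.token_bytes(NONCE_LEN)


def respond(key: bytes, nonce: bytes, payload: bytes) -> bytes:
    """One tag binds the challenge and the payload: it proves key possession
    (authentication) and detects any change to the payload (integrity)."""
    return hmac.new(key, nonce + payload, hashlib.sha256).digest()


def verify(key: bytes, nonce: bytes, payload: bytes, tag: bytes) -> bool:
    expected = respond(key, nonce, payload)
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

Because the payload (for example, an encrypted session key) is covered by the same tag as the challenge, no separate integrity check is needed, which is the kind of saving the abstract attributes to implicit challenge-response.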
Masayuki YAMAGUCHI Akihisa YAMADA Toshihiro NAKAOKA Takashi KAMBE Nagisa ISHIURA
This paper presents a novel way of evaluating the architecture of embedded custom DSPs, which helps designers optimize the datapath configuration and the instruction set. Given a datapath structure, the method evaluates performance in terms of the estimated number of steps needed to execute the target program on the datapath. A new concept, the "parallel constraint," is introduced, which enables evaluation of the impact of instruction format design on performance without explicitly specifying the instruction format. The number of execution steps is estimated by a combination of static and dynamic analysis, which enables fast and precise estimation of actual performance at an early design stage. We have developed an architecture evaluation system based on the presented method and applied it to several actual signal processor designs. We demonstrate the accuracy of the estimation and the usefulness of the method through these applications.
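The combination of static and dynamic analysis can be pictured as a static, per-basic-block schedule length under a given issue constraint, weighted by profiled block execution counts. A minimal Python sketch; representing the "parallel constraint" as a list of allowed issue groups (sets of functional-unit classes that may share one step), the greedy list scheduler, and the names `Op`, `schedule_length` and `estimate_total_steps` are all illustrative assumptions, not the authors' evaluation system:

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    unit: str                                       # functional-unit class, e.g. "alu"
    preds: list[str] = field(default_factory=list)  # ops this one depends on


def schedule_length(ops: list[Op], allowed_groups: list[frozenset]) -> int:
    """Static analysis: greedy list scheduling of one basic block.

    In each step, the ready ops are issued under whichever allowed group
    admits the most of them; each unit class is used at most once per step.
    """
    done: set[str] = set()
    remaining = list(ops)
    steps = 0
    while remaining:
        ready = [op for op in remaining if all(p in done for p in op.preds)]
        best: list[Op] = []
        for group in allowed_groups:
            picked, used = [], set()
            for op in ready:
                if op.unit in group and op.unit not in used:
                    picked.append(op)
                    used.add(op.unit)
            if len(picked) > len(best):
                best = picked
        if not best:
            raise ValueError("no allowed group can issue any ready op")
        steps += 1
        done.update(op.name for op in best)
        remaining = [op for op in remaining if op.name not in done]
    return steps


def estimate_total_steps(block_lengths: dict, exec_counts: dict) -> int:
    """Dynamic analysis: weight each block's static length by its profiled count."""
    return sum(length * exec_counts.get(b, 0) for b, length in block_lengths.items())
```

With `[frozenset({"alu", "mul"})]`, two independent ALU and multiply ops share a step; splitting the group into `{"alu"}` and `{"mul"}`, as a more restrictive instruction format would, lengthens the block without the format itself ever being written down.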
Mitsuru MARUYAMA Naohisa TAKAHASHI Takeshi MIEI Tsuyoshi OGURA Tetsuo KAWANO Satoru YAGI
A parallel IP router that uses off-the-shelf workstations and interconnecting switches is presented. This router, called CORErouter-I, is a medium-grained, functionally distributed parallel system consisting of four kinds of processors: for routing, routing-table searching, servicing, and line interfacing. Issues related to the implementation of CORErouter-I are also discussed, especially routing protocol processing and packet forwarding. The performance characteristics of CORErouter-I are clarified through several experiments that evaluate maximum throughput, analyze packet-forwarding time, and estimate the effect of parallel processing on the route-flapping problem.
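Routing-table searching amounts to longest-prefix matching of the destination address. A minimal Python sketch using the standard `ipaddress` module; the `RoutingTable` class is hypothetical, and its linear scan over prefixes sorted by length is chosen for clarity, not taken from CORErouter-I (production routers typically use tries or similar structures):

```python
import ipaddress
from typing import Optional


class RoutingTable:
    def __init__(self) -> None:
        self._routes: list = []  # (IPv4Network, next_hop) pairs

    def add(self, prefix: str, next_hop: str) -> None:
        self._routes.append((ipaddress.IPv4Network(prefix), next_hop))
        # longest prefixes first, so the first match is the longest match
        self._routes.sort(key=lambda r: r[0].prefixlen, reverse=True)

    def lookup(self, address: str) -> Optional[str]:
        addr = ipaddress.IPv4Address(address)
        for network, next_hop in self._routes:
            if addr in network:
                return next_hop
        return None  # no route and no default route
```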
Edoardo CHARBON Enrico MALAVASI Paolo MILIOZZI Alberto SANGIOVANNI-VINCENTELLI
In this paper we propose a comprehensive approach to physical design based on the constraint paradigm. Bounds on the most critical circuit parasitics are automatically generated to help designers and/or physical design tools meet a set of high-level specifications. The constraint generation engine is based on constrained optimization, in which various parasitic effects on interconnect and devices are accounted for and handled differently according to their statistical behavior and their effect on performance.
Go HASEGAWA Hiroyuki OHSAKI Masayuki MURATA Hideo MIYAHARA
We investigate the performance of TCP over ATM networks by simulation. At the ATM layer, we consider (1) rate-based control of the ABR service class, (2) an EPD (Early Packet Discard) technique applied to the UBR service class, and (3) EPD with per-VC accounting for fairness enhancement applied to the UBR service class. For the comparison, we adopt a multi-hop network model in which multiple ATM switches are interconnected. In such a network, unfairness among connections can arise from differences in the number of hops and/or the round-trip times of the connections. Simulation results show that the rate-based control of ABR achieves the highest throughput and the best fairness in most circumstances. However, the performance of TCP over ABR degrades once cell loss occurs because of inappropriate control parameter settings. To avoid this degradation, we investigate a parameter set suitable for TCP over the ABR service. Parameter tuning improves the performance of TCP over ABR, but only to a limited extent. We therefore consider TCP over ABR with EPD enhancement, in which the EPD technique is incorporated into ABR. Finally, we consider a multimedia network environment in which VBR traffic exists in addition to ABR/UBR traffic, to investigate whether the above observations apply to a more general model. The simulation experiments show that similar results are obtained, but also that the parameters of the rate-based congestion control must be chosen carefully, taking into account the existence of VBR traffic. We therefore discuss a method for determining appropriate control parameters.
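The EPD policy compared above can be sketched as a switch output buffer that decides, at the first cell of each packet, whether to accept that whole packet, so that buffer space is not wasted on cells of packets that TCP will have to retransmit anyway. A minimal Python sketch; the `EpdBuffer` class, its methods, and the capacity and threshold values in the tests are hypothetical, and neither per-VC accounting nor ABR rate control is modeled:

```python
from collections import deque


class EpdBuffer:
    """Output buffer of an ATM switch port with Early Packet Discard (sketch)."""

    def __init__(self, capacity: int, threshold: int):
        self.capacity = capacity    # buffer size in cells
        self.threshold = threshold  # EPD threshold in cells
        self.queue = deque()        # queued cells, identified by their VC
        self.discarding = set()     # VCs whose current packet is being dropped
        self.dropped_cells = 0

    def arrive(self, vc: int, first_cell: bool) -> bool:
        """Offer one cell of a packet on `vc`; return True if it is queued."""
        if first_cell:
            # EPD: the accept/reject decision is made once, at the packet's first cell
            if len(self.queue) >= self.threshold:
                self.discarding.add(vc)
            else:
                self.discarding.discard(vc)
        if vc not in self.discarding and len(self.queue) >= self.capacity:
            # buffer overflow in mid-packet: drop the remainder of this packet too
            self.discarding.add(vc)
        if vc in self.discarding:
            self.dropped_cells += 1
            return False
        self.queue.append(vc)
        return True

    def depart(self):
        """Transmit (remove) the cell at the head of the queue, if any."""
        return self.queue.popleft() if self.queue else None
```

Cells of a packet that was accepted below the threshold keep being queued even after the queue grows past it; only new packets are refused, which is the difference from plain cell-level tail drop.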
Naoki HONDA Takashi KOMAKINE Kazuhiro OUCHI
A modified frequency-domain method for analyzing nonlinear waveform distortion in the magnetic recording process is presented. The measurement technique combines a 5th-harmonic measurement technique, which uses a specific 30-bit pattern including dibits, with a precompensation technique for the dibits. The 5th-harmonic voltage ratio given by the former technique includes the contributions of NLTS (nonlinear transition shift) and PE (partial erasure) in the dibits. The latter precompensation technique is employed to evaluate PE as the minimum of the 5th-harmonic voltage ratio. The true NLTS can then be estimated from the total distortion and the evaluated PE. The high accuracy of the technique was confirmed by an examination using a pulse pattern generator with varied phase and amplitude. Finally, the effects of medium properties such as coercivity and squareness on the nonlinear distortions were investigated by applying the technique to particulate flexible media. NLTS increased from 3.5% to 7% with increasing squareness, while PE was less than 6% for any squareness at a recording density of 76 kFRPI. As coercivity increased, NLTS and PE decreased. The direction of NLTS for Ba-ferrite media agreed with that for a perpendicular Co-Cr thin-film medium.
Yoshihiro OKAMOTO Minoru SOUMA Shin TOMIMOTO Hidetoshi SAITO Hisashi OSAWA
A punctured convolutional coded PR4ML system for digital magnetic recording is proposed, which applies a punctured coding method to the convolutional code and records the punctured code sequences on two tracks. In this study, the bit error rate performance of the proposed system is obtained by computer simulation taking account of partial erasure, one of the nonlinear distortions at high densities, and is compared with those of a conventional 8/9 coded PR4ML system and an I-NRZI coded PR4ML system. The results show that the proposed system is hardly affected by partial erasure and performs well in high-density recording. At a normalized linear density of 3, a bit error rate of 10⁻⁴ is achieved with SNRs approximately 13.2 dB and 9.1 dB lower than those of the conventional 8/9 coded and I-NRZI coded PR4ML systems, respectively.
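Puncturing deletes coded bits according to a periodic pattern in order to raise the code rate of a convolutional code. A minimal Python sketch, assuming the common rate-1/2, constraint-length-3 code with octal generators (7, 5) and the puncturing pattern ((1, 1), (1, 0)), which keeps 3 of every 4 coded bits and so yields rate 2/3; the paper's actual code, its two-track recording format and the PR4ML detector are not modeled, and the function names are hypothetical:

```python
def conv_encode(bits, g1=0b111, g2=0b101, k=3):
    """Rate-1/2 feedforward convolutional encoder; returns (c1, c2) per input bit."""
    state = 0  # the previous k-1 input bits, most recent in the highest position
    out = []
    for b in bits:
        reg = (b << (k - 1)) | state          # current bit followed by past bits
        c1 = bin(reg & g1).count("1") % 2     # parity over the taps of g1
        c2 = bin(reg & g2).count("1") % 2     # parity over the taps of g2
        out.append((c1, c2))
        state = reg >> 1                      # shift the register by one
    return out


def puncture(pairs, pattern=((1, 1), (1, 0))):
    """Keep coded bit c_i at time t only if pattern[t % period][i] is 1."""
    out = []
    for t, (c1, c2) in enumerate(pairs):
        keep1, keep2 = pattern[t % len(pattern)]
        if keep1:
            out.append(c1)
        if keep2:
            out.append(c2)
    return out
```

For the input 1011 the encoder produces 11 10 00 01, and puncturing removes the second bit of every other pair, leaving six coded bits for four information bits.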
Rafiqul ISLAM Yoshikazu MIYANAGA Koji TOCHINAI
This paper presents a new multi-clustering network for intelligent data classification. In this network, the first layer is a self-organized clustering layer and the second layer is a restricted clustering layer with a neighborhood mechanism. A new clustering algorithm is developed for this system to make efficient use of parallel processors. This parallel algorithm allows the nodes of the network to be processed independently, minimizing the data communication load among processors. With parallel processors, the calculation cost becomes much lower than that of conventional networks. For example, a 4-processor parallel computing system reduced the time taken for data classification to 26.75% of that of a single-processor system without degrading classification performance.
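Reading the 26.75% figure as the ratio of 4-processor to single-processor classification time (an assumption, since the abstract does not define it further), the implied speedup and parallel efficiency are:

```latex
S_4 = \frac{T_1}{T_4} = \frac{1}{0.2675} \approx 3.74, \qquad
E_4 = \frac{S_4}{4} \approx 0.93
```

That is, about 93% of ideal linear speedup on four processors.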
Andrzej CICHOCKI Shun-ichi AMARI Jianting CAO
In this paper we develop a new family of on-line adaptive learning algorithms for blind separation of time-delayed and convolved sources. The algorithms are derived for feedforward and fully connected feedback (recurrent) neural networks on the basis of a modified natural gradient approach. The proposed algorithms can be considered generalizations and extensions of existing algorithms for instantaneous mixtures of unknown source signals. Preliminary computer simulations confirm the validity and high performance of the proposed algorithms.
Shinhaeng LEE Shin'ichiro OMACHI Hirotomo ASO
Linear programming techniques are useful in many diverse applications, such as production planning and energy distribution. Finding an optimal solution of a linear programming problem requires repeated computations and takes a lot of processing time, so special-purpose hardware has been sought for high-speed linear programming. This paper proposes a systolic array for solving linear programming problems using the revised simplex method, a typical linear programming algorithm. This paper also proposes a modified systolic array that can solve very large linear programming problems.
Tadayuki SAKAKIBARA Katsuyoshi KITAI Tadaaki ISOBE Shigeko YAZAWA Teruo TANAKA Yasuhiro INAGAMI Yoshiko TAMAKI
We present a scalable parallel memory architecture with a skew scheme in which the permanent-concentration-free strides, if any, do not depend on the number of ways of parallel memory interleaving. Permanent-concentration is a kind of memory access conflict. With conventional skew schemes, the permanent-concentration-free strides depend on the number of banks (or bank groups) in parallel memory (= the number of ways of parallel memory interleaving). We analyze two causes of conflicts: permanent-concentration, which occurs when memory access requests concentrate in a limited number of banks (or bank groups) in parallel memory, and transient-concentration, which occurs when memory access requests transiently concentrate in some banks (or bank groups). By dealing with the two kinds of concentration separately, we have identified permanent-concentration-free strides that are independent of the number of banks (or bank groups) in parallel memory. The strategy is to increase the size of the address block used for shifting the address assignment to the parallel memory, in order to reduce permanent-concentrations, and to match the size of the buffer for each bank (or bank group) to the size of that address block, in order to absorb transient-concentrations. The skew scheme uses the same address-block size for shifting address assignment in memory systems with different numbers of banks (or bank groups). As a result, scalability of the permanent-concentration-free strides is achieved independently of the number of banks (or bank groups) in parallel memory.
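The difference between plain interleaving and a block-shifted skew can be seen by counting which banks a constant-stride access stream touches. A minimal Python sketch; the mapping bank = (a + a // B) mod N, with shift block size B and N banks, is a generic illustrative skew assumed here, not the paper's exact scheme, and the function names are hypothetical:

```python
from collections import Counter


def bank_interleaved(addr: int, n_banks: int) -> int:
    """Plain low-order interleaving."""
    return addr % n_banks


def bank_skewed(addr: int, n_banks: int, block: int) -> int:
    """Shift the bank assignment by one after every `block` consecutive addresses."""
    return (addr + addr // block) % n_banks


def bank_histogram(mapping, stride: int, count: int) -> Counter:
    """How many of `count` stride-`stride` accesses land in each bank."""
    return Counter(mapping(i * stride) for i in range(count))
```

With N = 8 banks and stride 8, plain interleaving sends all 64 accesses to bank 0 (permanent concentration), while the skew spreads them over all eight banks. With B = 32, consecutive accesses hit the same bank in runs of four: a transient concentration of the kind that a per-bank buffer sized to the shift block is meant to absorb.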
For the solution of linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the quasi-minimal residual (IQMR) method that uses the Lanczos process as a major component and combines elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived so that all inner products and matrix-vector multiplications of a single iteration step are independent, and the communication time required for inner products can be efficiently overlapped with computation time. Therefore, the cost of global communication on parallel distributed-memory computers can be significantly reduced. The resulting IQMR algorithm maintains the favorable properties of the Lanczos process without increasing computational costs. The efficiency of this method is demonstrated by numerical experiments carried out on a massively parallel distributed-memory computer, the Parsytec GC/PowerPlus.
Hitoshi YAMAUCHI Takayuki MAEDA Hiroaki KOBAYASHI Tadao NAKAMURA
The multipass rendering method based on the global illumination model can generate the most photo-realistic images. However, because the multipass rendering method is very time-consuming, it is impractical for industrial use. This paper discusses a massively parallel processing approach to fast image synthesis by the multipass rendering method. In particular, we focus on the performance evaluation of view-dependent object-space parallel processing on the (Mπ)², which was proposed in our previous paper. We also propose two kinds of distributed frame buffer system, named the cached frame buffer and the multistage-interconnected frame buffer. These frame buffer systems solve the access conflict problem on the frame buffer. The simulation results show that the (Mπ)² has scalable performance; for example, the (Mπ)² with more than 4000 processing elements can achieve an efficiency of over 50%. We also show that both proposed distributed frame buffer systems can relieve the overhead due to frame buffer access in the (Mπ)² when a large number of high-performance processing elements are adopted in the system.
Vijay K. JAIN Tadasse GHIRMAI Susumu HORIGUCHI
Advanced scientific and engineering problems require massively parallel computing. Critical to the design, and ultimately the performance, of such computing systems is the interconnection network binding the computing elements, just as the cardiovascular network is to the human body. This paper develops a new interconnection network, "Tori connected mESHes (TESH)," consisting of a k-ary n-cube connection of supernodes that comprise meshes of lower-level nodes. Its key features are the following: it is hierarchical, thus allowing exploitation of computation locality as well as easy expansion (up to a million processors), and it appears to be well suited for 3-D VLSI implementation, for it requires far fewer vertical wires than almost all known multicomputer networks. Presented in the paper are the architecture of the new network, node addressing and message routing, 3-D VLSI/ULSI considerations, and application of the network to massively parallel computing. Specifically, we discuss the mapping onto the network of stack filtering, a hardware-oriented technique for order-statistic image filtering.
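Stack filtering rests on threshold decomposition: an integer signal is sliced into binary signals at every level, each slice is filtered by the same positive Boolean function, and the binary outputs are summed. A minimal Python sketch for the special case where that function is the majority vote, which reproduces the median filter; the window size, the test signal and the function names are illustrative assumptions, and the mapping onto TESH is not shown:

```python
def median_filter(signal, window):
    """Reference: direct sliding-window median (odd window, valid region only)."""
    half = window // 2
    return [sorted(signal[i - half:i + half + 1])[half]
            for i in range(half, len(signal) - half)]


def median_via_threshold_decomposition(signal, window, max_level):
    """Stack-filter form of the median for integers in [0, max_level]."""
    half = window // 2
    out = [0] * (len(signal) - 2 * half)
    for level in range(1, max_level + 1):
        # binary slice: 1 where the sample is at or above this level
        binary = [1 if x >= level else 0 for x in signal]
        for j, i in enumerate(range(half, len(signal) - half)):
            # binary median = majority vote, a positive Boolean function
            ones = sum(binary[i - half:i + half + 1])
            out[j] += 1 if ones > half else 0
    return out
```

Because each threshold level is filtered independently using only bit-level operations, the levels can be assigned to separate, simple processing elements, which is what makes the technique hardware oriented.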
Wen-Yew LIANG Chung-Ta KING Feipei LAI
This paper introduces an object-based distributed shared memory (DSM) system called Adsmith. The primary goal of Adsmith is to provide a low-cost, portable, and efficient DSM for networks of workstations (NOW). Adsmith achieves this goal by being built as a user-level library on top of PVM, a widely supported communication subsystem, and by incorporating many traffic reduction and latency hiding techniques. Issues involved in the design of Adsmith and our solution strategies are discussed. A preliminary performance evaluation of Adsmith on a network of Pentium computers is presented. The results show that programs developed with Adsmith can achieve performance comparable to that of programs developed with PVM.
Adel CHERIF Masato SUZUKI Takuya KATAYAMA
We present a novel replication technique for parallel applications in which instances of the replicated application are active on different groups of processors, called replicas. The replication technique is based on the FTAG (Fault Tolerant Attribute Grammar) computation model, a functional, attribute-based model. The technique implements "active parallel replication": all replicas are active and concurrently compute different pieces of the application's parallel code. In our model, replicas cooperate not only to detect and mask failures but also to perform parallel computation. The replication mechanisms are supported by the FTAG run-time system and are fully application-transparent. Several novel mechanisms for checkpointing and recovery are developed. In our model, only the part of the computation detected as faulty is discarded during rollback recovery. The replication technique takes full advantage of parallel computing to reduce overall computation time.
Takayuki SAITO Yoshiyasu TAKEFUJI
The graph partitioning problem is a well-known combinatorial problem with many applications, including VLSI circuit design and task allocation in distributed computer systems. In this paper, a novel neural network using the maximum neuron model is proposed for the m-way graph partitioning problem. An undirected graph with weighted nodes and weighted edges is partitioned into several subsets. The objective of partitioning is to minimize the sum of the weights on cut edges while keeping the sizes of the subsets balanced. The proposed algorithm was compared with the genetic algorithm. The experimental results show that the proposed neural network is better than or comparable to the other existing methods for the m-way graph partitioning problem in terms of computation time and solution quality.
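The objective and the maximum neuron model can be made concrete as follows: each vertex has m neurons, and only the neuron with the largest input fires, so every vertex belongs to exactly one subset by construction. A minimal Python sketch of that output rule and of a cut-plus-balance objective; the quadratic balance penalty and its weight `lam` are assumptions, the function names are hypothetical, and the paper's neuron input update rule is not reproduced:

```python
def max_neuron_output(u):
    """For each vertex (row of inputs u), fire only the neuron with the largest input."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


def cut_weight(edges, part):
    """Sum of the weights of edges whose endpoints lie in different subsets."""
    return sum(w for (i, j), w in edges.items() if part[i] != part[j])


def partition_loads(node_w, part, m):
    """Total node weight assigned to each of the m subsets."""
    loads = [0.0] * m
    for i, p in enumerate(part):
        loads[p] += node_w[i]
    return loads


def energy(edges, node_w, part, m, lam=1.0):
    """Cut weight plus an assumed quadratic penalty on subset-size imbalance."""
    target = sum(node_w) / m
    imbalance = sum((load - target) ** 2 for load in partition_loads(node_w, part, m))
    return cut_weight(edges, part) + lam * imbalance
```

Because the maximum neuron rule always yields a valid one-subset-per-vertex assignment, no penalty term is needed to enforce that constraint; only the cut and the balance terms remain.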
Toshiyuki SUZUKI Tomohiro MITSUGI
This paper reports the thermal stability of particulate media, including Co-Fe oxide, CrO2, and thick and thin MP tapes. By measuring the time decay of magnetization at room temperature, fluctuation fields were obtained as a function of the reverse applied field. It was found that the fluctuation field has a constant, minimum value when the reverse applied field is equal to the coercivity. Minimum fluctuation fields for the four particulate tapes were measured at several environmental temperatures ranging from -75 to +100 °C. It was also found that the fluctuation field normalized by the remanence coercivity increases with environmental temperature for all tapes, indicating that it is a good measure of thermal stability. Activation volumes were also deduced as a function of temperature.
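A common way to obtain the fluctuation field from a magnetization decay is H_f = S / χ_irr, where S = dM/d(ln t) is the magnetic viscosity coefficient and χ_irr is the irreversible susceptibility; the activation volume then follows as V = k_B T / (M_s H_f). The abstract does not state which definitions were used, so the following Python sketch assumes these standard ones, CGS units (erg, emu/cm³, Oe), and a separately measured χ_irr; the function names are hypothetical:

```python
import math

K_B_CGS = 1.380649e-16  # Boltzmann constant in erg/K


def viscosity_coefficient(times, magnetization):
    """Least-squares slope S = dM/d(ln t) of the magnetization decay."""
    x = [math.log(t) for t in times]
    n = len(x)
    mx = sum(x) / n
    my = sum(magnetization) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, magnetization))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def fluctuation_field(s, chi_irr):
    """H_f = |S| / chi_irr; magnitudes are used so the decay direction does not matter."""
    return abs(s) / abs(chi_irr)


def activation_volume_cgs(temperature_k, ms, hf):
    """V = k_B T / (M_s H_f), in cm^3 for M_s in emu/cm^3 and H_f in Oe."""
    return K_B_CGS * temperature_k / (ms * hf)
```

Fitting against ln t, rather than t, reflects the logarithmic decay usually observed in thermally activated magnetization reversal.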