Junya KIYOHARA Tsutomu KAWABATA
We study Lempel-Ziv-Yokoo algorithm [1, Algorithm 4] for universal data compression. In this paper, we give a simpler implementation of Lempel-Ziv-Yokoo algorithm than the original one [1, Algorithm 4] and show its asymptotic optimality for a stationary ergodic source.
Overheads caused by frequently communicating α-β values among numerous parallel search processes not only degrade greatly the performance of existing parallel α-β search algorithm but also make it impractical to implement these algorithms in parallel hardware. To solve this problem, the proposed architecture reduces the overheads by using specially designed multi-value arbiters to compare and global broadcasting buses to communicate α-β values. In addition, the architecture employs a set of α-β search control units (α-β SCU's) with distributed α-β registers to accelerate the search by searching all subtrees in parallel. Simulation results show that the proposed parallel architecture with 1444 (38 38) (α-β SCU's) searching in parallel can achieve 179 folds of speed-up. To verify the parallel architecture, we implemented a VLSI chip with 3 α-β SCU's. The chip can achieve a search speed of 13,381,345 node-visits per second, which is more than three orders of improvement over that of existing parallel algorithms.
Takeshi UMEDA Katsumi SAKAKIBARA Masao KASAHARA
It is shown that most of the binary images of generalized algebraic-geometric codes meet the Varshamov-Gilbert bound from the viewpoint of the average binary weight enumerator.
This paper presents an O(mn log n) time algorithm for an approximate string matching problem, in which a pattern string may contain variable length don't care characters. This problem is important for searching DNA sequences or amino acid sequences.
Masazumi KURIHARA Shojiro SAKATA Kingo KOBAYASHI
In this paper we propose a class of byte-error-correcting codes derived from algebraic curves which is a generalization on the Reed-Solomon codes, and present their fast parallel decoding algorithm. Our algorithm can correct up to (m + b -θ)/2b byte-errors for the byte length b, where m + b -θ + 1dG for the Goppa designed distance dG. This decoding algorithm can be parallelized. In this algorithm, for our code over the finite field GF (q), the total complexity for finding byte-error locations is O (bt(t + q - 1)) with time complexity O (t(t + q - 1)) and space complexity O(b), and the total complexity for finding error values is O (bt(b + q - 1)) with time complexity O (b(b + q - 1)) and space complexity O (t), where t(m + b -θ)/2b. Our byte-error-correcting algorithm is superior to the conventional fast decoding algorithm for randomerrors in regard to the number of correcting byte-errors in several cases.
Hiroki NAKAMURA Takaya YAMAZATO Masaaki KATAYAMA Akira OGAWA
In this paper, we evaluate the bit error rate (BER) performance of a coordinate interleaved trellis coded QPSK with T-algorithm. We employ a coordinate interleaving which breaks up burst errors, caused by fading, more effectively than symbol interleaving. We employ a rate-1/2 convolutional codes and the performance is evaluated on Rayleigh fading channels in terms of bit error rate (BER) by analysis and computer simulation. We consider using of the code which having a long effective code length (ECL). For this reason, we employ a decoder based on T-algorithm instead of Viterbi algorithm to avoid the complexity in the decoding. As the results, we achieve satisfactory BER performance with a slight computation in the decoding algorithm and the finite interleaving size.
Akihiro FUJIWARA Michiko INOUE Toshimitsu MASUZAWA Hideo FUJIWARA
The medial axis transform (MAT) is an image representation scheme. For a binary image, the MAT is defined as a set of upright maximal squares which consist of pixels of value l entirely. The MAT plays an important role in image understanding. This paper presents a parallel algorithm for computing the MAT of an n n binary image. We show that the algorithm can be performed in O(log n) time using n2/log n processors on the EREW PRAM and in O(log log n) time using n2/log log n processors on the common CRCW PRAM. We also show that the algorithm can be performed in O(n2/p2 + n) time on a p p mesh and in O(n2/p2 + (n log p)/p) time on a p2 processor hypercube (for 1 p n). The algorithm is cost optimal on the PRAMs, on the mesh (for 1 p n) and on the hypercube (for 1 p n/log n).
Takashi YOKOTA Hiroshi MATSUOKA Kazuaki OKAMOTO Hideo HIRONO Shuichi SAKAI
This paper discusses a massively parallel interconnection scheme for multithreaded architecture and introduces a new class of direct interconnection networks called the hierarchical Multidimensional Directed Cycles Ensemble (hMDCE). Its suitability for massively parallel systems is discussed. The network is evolved from the Multidimensional Directed Cycles Ensemble (MDCE) network, where each node is substituted by lower-level sub-networks. The new network addresses some serious problems caused by the increasing scale of parallel systems, such as longer latency, limited throughput and high implementation cost. This paper first introduces the MDCE network and then presents and examines in detail the hierarchical MDCE network. Bisection bandwidth of hMDCE is considerably reduced from its ancestor MDCE and the network performs significantly higher throughput and lower latency under some practical implementation constraints. The gate count and delay time of the compiled circuit for the routing function are insignificant. These results reveal that the hMDCE network is an important candidate for massively parallel systems interconnection.
THe decimation-in-time (DIT) and the decimation-in-frequency (DIF) algorithms are the most well-known fast algorithms for computing the discrete Fourier transform(DFT). These algorithms constitute the basis of the fast Fourier transform (FFT) implementations, including the pipeline implementation and other parallel configurations. This paper derives an alternative generalization of the algorithms which applies for sequences whose lengths are not a power of two. The treatment is consistent with the radix-two DIF and DIT algorithms, and the generalization is useful for utilizing the accumulated technologies of the FFT algorithm for such sequences.
Miki HASEYAMA Yoshihiro AKETA Hideo KITAJIMA
In this paper, quantization method which can keep the phase and gain characteristics of a reference filter is proposed. The proposed method uses a genetic algorithm and a simulated annealing algorithm. The objective function used in this method is described with two kinds of weighting functions for identifying the phase and gain characteristics respectively. Therefore, the quantization accuracy on the gain characteristic is independent of the accuracy on the phase characteristic. Further, the proposed algorithm can be applied to any types of filters, because the chromosome expresses only their coefficients values. The efficiency of the proposed algorithm is verified by some experiments.
Masahito TOMIZAWA Yoshiaki YAMABAYASHI Nobuyuki KAWASE Yukio KOBAYASHI
This paper provides an architectural study of optical ring trunk-transmission networks using either Time Division Multiplexing (TDM) or Wavelength Division Multiplexing (WDM). A timeslot arrangement algorithm for distributed controlled TDM rings is proposed that minimizes the number of slots (wavelengths) required in bi-directional ring networks. This algorithm is applied in a straightforward manner to wavelength arrangement in WDM ring networks. The technique, characterized by timeslot (or wavelength) conversion, realizes common add/drop procedures in all Add/Drop Multiplexers (ADMs) when they are connected logically in a mesh topology. A self-healing algorithm is also proposed for network restoration. It offers good performance in terms of protection line-capacity, restoration delay, and survivability against multiple failures.
Efficient parallel algorithms for several problems on proper circular arc graphs are presented in this paper. These problems include finding a maximum matching, partitioning into a minimum number of induced subgraphs each of which has a Hamiltonian cycle (path), partitioning into induced subgraphs each of which has a Hamiltonian cycle (path) with at least k vertices for a given k, and adding a minimum number of edges to make the graph contain a Hamiltonian cycle (path). It is shown here that the above problems can all be solved in logarithmic time with a linear number of EREW PRAM processors, or in constant time with a linear number of BSR processors. A more important part of this work is perhaps the extension of basic BSR to allow simultaneous multiple BROADCAST instructions.
Byung-Gook LEE Ki Yong LEE Souguil ANN
This paper considers the estimation of speech parameters and their enhancement using an approach based on the estimation-maximization (EM) algorithm, when only noisy speech data is available. The distribution of the excitation source for the speech signal is assumed as a mixture of two Gaussian probability distribution functions with differing variances. This mixture assumption is experimentally valid for removing the residual excitation signal. The assumption also is found to be effective in enhancing noise-corrupted speech. We adaptively estimate the speech parameters and analyze the characteristics of its excitation source in a sequential manner. In the maximum likelihood estimation scheme we utilize the EM algorithm, and employ a detection and an estimation step for the parameters. For speech enhancement we use Kalman filtering for the parameters obtained from the above estimation procedure. The estimation and maximization procedures are closely coupled. Simulation results using synthetic and real speech vindicate the improved performance of our algorithm in noisy situations, with an increase of about 3 dB in terms of output SNR compared to conventional Gaussian assumption. The proposed algorithm also may be noteworthy in that it needs no voiced/unvoiced decision logic, due to the use of the residual approach.
This paper presents a method for mechanically transforming a parallel algorithm on an original network so that the algorithm can work on a target network. It is assumed that the networks are of cube-type such as the shuffle-exchange network, omega network, and hypercube. Were those networks isomorphic to each other, the algorithm transformation is an easy task. The proposed transformation method is based on a novel graphembedding scheme <φ: δ, κ, π, ψ>. In addition to the dilating operation δ of the usual embedding scheme <φ: δ>, the novel scheme uses three primitive graph-transformation operations; κ (= δ-1) for contracting a path into a node, π for pipelining a graph, and ψ (= π-1) for folding a pipelined graph. By applying the primitive operations, the cube-type networks can be transformed so as to be isomorphic to each other. Relationships between the networks are represented by the composition of applied operations. With the isomorphic mapping φ, an algorithm in a node of the original network can be simulated in the corresponding node(s) of the target network. Thus the algorithm transformation is reduced to routine work.
2D (two-dimensional) convolution is a basic operation in image processing and requires intensive computation. Although the SIMD model is considered suitable for 2D convolution, previous 2D convolution algorithms on the SIMD model assume unbounded number of PEs (Processing Elements) available, which we call unbounded case. Unbounded case could not be satisfied on real computers. In this paper, time-optimal data-parallel 2D convolution is studied on mesh-connected SIMD computers with bounded number of PEs. Because the optimal computation complexity is not difficult to achieve, the main concern of this paper is how to achieve optimal communication complexity. Firstly the lower bound computation complexity is analyzed. Then the lower bound communication complexities are analyzed under two typical data-distribution strategies: block-mapping and cyclic-mapping. Based on the analysis result, an optimal algorithm is presented under the block-mapping. The algorithm achieves the lower bound complexity both in computation and in communication.
Tadayoshi HORITA Itsuo TAKANAMI
Various reconfiguration schemes against faults of mesh-connected processor arrays have been proposed. As one of them, the mesh-connected processor arrays model based on single-track switches was proposed in [1]. The model has an advantage of its inherent simplicity of the routing hardware. Furthermore, the 2 track switch model [2] and the multiple track switch model [3] were proposed to enhance yields and reliabilities of arrays. However, in these models, Simplicity of the routing hardware is somewhat lost because multiple tracks are used for each row and column. In this paper, we present a builtin self-reconstruction approach for mesh-connected processor arrays which are partitioned into sub-arrays each using single-track switches. Spare PEs which are located on the boundaries of the sub-arrays compensate faulty PEs in these sub-arrays. First, we formulate a reconfigulation algorithm for partitioned mesh-arrays using a Hopfield-type neural network, and then its performance for reconfigulation in terms of survival rates and reliabilities of arrays and processing time are investigated by computer simulations. From the results, we can see that high reliabilites are achieved while processing time is a little and hardware overhead (links and switches) required for reconstruction is as same as that for the track switch model. Next, we present a hardware implementation of the neural algorithm so that a built-in self-reconfigurable scheme may be realized.
Yen-Wei CHEN Zensho NAKAO Shinichi TAMURA
An attenuation correction method was proposed for laser-produced plasma emission computed tomography (ECT), which is based on a relation of the attenuation coefficient and the emission coefficient in plasma. Simulation results show that the reconstructed images are dramatically improved in comparison to the reconstructions without attenuation correction.
Atsushi IWATA Rauf IZMAILOV Duan-Shin LEE Bhaskar SENGUPTA G. RAMAMURTHY Hiroshi SUZUKI
We propose a new QOS routing algorithm for finding a path that guarantees several quality of service (QOS) parameters requested by users, for ATM networks. It is known that a routing problem is NP-complete, if the number of additive QOS parameters, such as delay and cost, are more than or equal to two. Although a number of heuristic algorithms have been proposed recently to solve this problem, the appropriate choice of routing algorithms is still an open issue. In this paper, we propose a new heuristic routing algorithm, while being compliant with PNNI routing and signaling specification in the ATM Forum. The performance of algorithms is evaluated by simulation with a various network topologies and loading scenarios. This simulation results demonstrate that the proposed scheme improves the performance while reducing computational complexity.
Fumie TAGA Hiroshi SHIMOTAHIRA
It is an important problem in fields of radar, sonar, and so on to estimate parameters of closely spaced multiple signals. The MUSIC algorithm with the forward-backward (FB) spatial smoothing is considered as the most effective technique at present for the problem with coherent signals in a variety of fields. We have applied this in Laser Microvision. Recently, Shimotahira has proposed the Kernel MUSIC algorithm, which is applicable to cases when signal vectors and noise vectors are orthogonal. It also utilizes Gaussian elimination of the covariance matrix instead of eigenvalue analysis to estimate noise vectors. Although the amount of computation by the Kernel MUSIC algorithm became lighter than that of the conventional MUSIC algorithm, the covariance matrix was formed to estimate noise vectors and also all noise vectors were used to analyze the MUSIC eigenspectrum. The heaviest amount of computation in the Kernel MUSIC algorithm exists in the transformation of the covariance matrix and the analysis of the MUSIC eigenspectrum. We propose a more straightforward algorithm to estimate noise vectors without forming a covariance matrix, easier algorithm to analyze the MUSIC eigenspectrum. The superior characteristics will be demonstrated by results of numerical simulation.
High-resolution algorithms for the detection and estimation of Directions Of Arrival (DOA) such as MUSIC, lead to accurate results but require the computation of the noise-subspace through an expensive covariance matrix eigendecomposition. Adaptive estimators of the noise-subspace can be very useful in a non-stationary environment when the convergence is possible with a few number of snapshots. Some adaptive methods are presented showing that an indirect noise-subspace estimation through a signal subspace estimation can be advantageous both in terms of convergence rate and computation complexity during each update. Some computer simulations examples showing performances are provided.