Chia-Chi CHU Herng-Jer LEE Ming-Hong LAI Wu-Shiung FENG
This work proposes a new method for RLCG interconnect model-order reductions in consideration with the adjoint network. Relationships between an original MNA network and its corresponding adjoint MNA network will be explored first. It will be shown that the congruence transformation matrix used in the one-sided projection can be constructed by using the bi-orthogonal bases developed from the Lanczos-type algorithms. In particular, if the multi-port driving-point impedance of RLCG interconnect circuits is the main concern, the transfer functions and system moments of the adjoint network can be directly calculated from those of the original RLCG interconnect network by exploring symmetric properties of the MNA formulation. Therefore, the cost of constructing the congruence transformation matrix can be simplified by up to 50% of the previous methods. Comparative studies among various standard methods and the proposed methods are also investigated. Experimental results on large-scale RLCG interconnect circuits will demonstrate the accuracy and the efficiency of the proposed method.
Chuliang WENG Minglu LI Xinda LU
The computational grid provides a promising platform for the deployment of various high-performance computing applications. Problem in implementing computational grid environments is how to effectively use various resources in the system, such as CPU cycle, memory, communication network, and data storage. There are many effective heuristic algorithms for scheduling in the computational grid, however most scheduling strategies have no theoretical guarantee at all. In this paper, a cost-based online scheduling algorithm is presented for job assignment in the grid environment with theoretical guarantee. Firstly, a scheduling framework is described, where the grid environment is characterized, and the online job model is defined. Secondly, the cost-based online scheduling algorithm is presented where costs of resources are exponential functions of their loads, and the performance of this algorithm is theoretically analyzed against the performance of the optimal offline algorithm. Finally, we implement the algorithm in the grid simulation environment, and compare the performance of the presented algorithm with the other three algorithms, and experimental results indicate that the cost-based online scheduling algorithm can outperform the other three online algorithms.
The belief propagation (BP) algorithm is a tool with which one can calculate beliefs, marginal probabilities, of probabilistic networks without loops (e.g., Bayesian networks) in a time proportional to the number of nodes. For networks with loops, it may not converge and, even if it converges, beliefs may not be equal to exact marginal probabilities although its application is known to give remarkably good results such as in the coding theory. Tatikonda and Jordan show a theoretical result on the convergence of the algorithm for probabilistic networks with loops in terms of the theory of Markov random fields on trees and give a sufficient condition of the convergence of the algorithm. In this paper, we discuss the "impatient" update rule as well as the "lazy" update rule discussed in Tatikonda and Jordan. In the viewpoint of the theory of Markov random fields, it is shown that the rule for updating both gives essentially the same results and the impatient update rule is expected to converge faster than the lazy one. Numerical experiments are also given.
Ping-Hung CHIANG Ding-Bing LIN Hsueh-Jyh LI
By applying the differential space-time block code (DSTBC) to wireless multicarrier transmission, Diggavi et al. were the first to propose the two-input-multiple-output (2IMO) differentially space-time-time block coded OFDM (TT-OFDM) system. In this paper, we propose three novel differentially transmit-diversity block coded OFDM (DTDBC-OFDM) systems, namely, the FT-, FF-, and TF-OFDM systems. For instance, the TF-OFDM stands for the differentially space-time-frequency block coded OFDM. Moreover, the noncoherent maximum-likelihood sequence detector (NSD), and its three special cases, namely, the noncoherent one-shot detector, the linearly predictive decision-feedback (DF) detector, and the linearly predictive Viterbi receiver are incorporated to the 2IMO DTDBC-OFDM systems. Furthermore, a simple closed-form BER expression for the systems utilizing the noncoherent one-shot detector in the time-varying multipath Rayleigh fading channels is given. Numerical results have revealed that 2IMO DTDBC-OFDM systems employing the noncoherent one-shot detector can obtain significant performance improvement. However, when few antennas are available, the implementation of the linearly predictive DF detector or the linearly predictive Viterbi receiver is necessary for achieving better performance.
Toshiya MASHIMA Takanori FUKUOKA Satoshi TAOKA Toshimasa WATANABE
The 2-vertex-connectivity augmentation problem for a specified set of vertices of a graph with degree constraints, 2VCA-SV-DC, is defined as follows: "Given an undirected graph G = (V,E), a specified set of vertices S ⊆V with |S|3 and a function g:V→Z+∪{∞}, find a smallest set E' of edges such that (V,E ∪E') has at least two internally-disjoint paths between any pair of vertices in S and such that vertex-degree increase of each v ∈V by the addition of E' to G is at most g(v), where Z+ is the set of nonnegative integers." This paper shows a linear time algorithm for 2VCA-SV-DC.
Hua ZHENG Shingo OMURA Koichi WADA
We consider a network, where a special data called certificate is issued between two users, and all certificates issued by the users in the network can be represented by a directed graph. For any two users u and v, when u needs to send a message to v securely, v's public-key is needed. The user u can obtain v's public-key using the certificates stored in u and v. We need to disperse the certificates to the users such that when a user wants to send a message to the other user securely, there are enough certificates in them to get the reliable public-key. In this paper, when a certificate graph and a set of communication requests are given, we consider the problem to disperse the certificates among the nodes in the network, such that the communication requests are satisfied and the total number of certificates stored in the nodes is minimized. We formulate this problem as MINIMUM CERTIFICATE DISPERSAL (MCD for short). We show that MCD is NP-Complete, even if its input graph is restricted to a strongly connected graph. We also present a polynomial-time 2-approximation algorithm MinPivot for strongly connected graphs, when the communication requests satisfy some restrictions. We introduce some graph classes for which MinPivot can compute optimal dispersals, such as trees, rings, and some Cartesian products of graphs.
In parallelizing compilers on distributed memory systems, distributions of irregular sized array blocks are provided for load balancing and irregular problems. The irregular data redistribution is different from the regular block-cyclic redistribution. This paper is devoted to scheduling message for irregular data redistribution that attempt to obtain suboptimal solutions while satisfying the minimal communication costs condition and the minimal step condition. Based on the list scheduling, an efficient algorithm is developed and its experimental results are compared with previous algorithms. The improved list algorithm provides more chance for conflict messages in its relocation phase, since it allocates conflict messages through methods used in a divide-and-conquer algorithm and a relocation algorithm proposed previously. The method of selecting the smallest relocation cost guarantees that the improved list algorithm is more efficient than the other two in average.
Genetic algorithms are a general problem-solving technique that has been widely used in computational biology. In this paper, we present a framework to map hierarchical parallel genetic algorithms for protein folding problems onto computational grids. By using this framework, the two level communication parts of hierarchical parallel genetic algorithms are separated. Thus both parts of the algorithm can evolve independently. This permits users to experiment with alternative communication models on different levels conveniently. The underlying programming techniques are based on generic programming, a programming technique suited for the generic representation of abstract concepts. This allows the framework to be built in a generic way at application level and thus provides good extensibility and flexibility. Experiments show that it can lead to significant runtime savings on PC clusters and computational grids.
Jinhee CHUN Kunihiko SADAKANE Takeshi TOKUYAMA
In [5], the following pyramid construction problem was proposed: Given nonnegative valued functions ρ and µ in d variables, we consider the optimal pyramid maximizing the total parametric gain of ρ against µ. The pyramid can be considered as the optimal unimodal approximation of ρ relative to µ, and can be applied to hierarchical data segmentation. In this paper, we give efficient algorithms for a couple of two-dimensional pyramid construction problems.
Paola FLOCCHINI Antonio Mesa ENRIQUES Linda PAGLI Giuseppe PRENCIPE Nicola SANTORO
We consider the problem of computing the optimal swap edges of a shortest-path tree. This problem arises in designing systems that offer point-of-failure shortest-path rerouting service in presence of a single link failure: if the shortest path is not affected by the failed link, then the message will be delivered through that path; otherwise, the system will guarantee that, when the message reaches the node where the failure has occurred, the message will then be re-routed through the shortest detour to its destination. There exist highly efficient serial solutions for the problem, but unfortunately because of the structures they use, there is no known (nor foreseeable) efficient distributed implementation for them. A distributed protocol exists only for finding swap edges, not necessarily optimal ones. We present two simple and efficient distributed algorithms for computing the optimal swap edges of a shortest-path tree. One algorithm uses messages containing a constant amount of information, while the other is tailored for systems that allow long messages. The amount of data transferred by the protocols is the same and depends on the structure of the shortest-path spanning-tree; it is no more, and sometimes significantly less, than the cost of constructing the shortest-path tree.
An algorithm is described for solving the node-to-set disjoint paths problem in bi-rotator graphs, which are obtained by making each edge of a rotator graph bi-directional. The algorithm is of polynomial order of n for an n-bi-rotator graph. It is based on recursion and divided into three cases according to the distribution of destination nodes in the classes into which the nodes in a bi-rotator graph are categorized. We estimated that it obtains 2n-3 disjoint paths with a time complexity of O(n5), that the sum of the path lengths is O(n3), and that the length of the maximum path is O(n2). Computer experiment showed that the average execution time was O(n3.9) and, the average sum of the path lengths was O(n3.0).
In wavelength-routed optical networks, wavelength converters are considered as one of the most critical network resources because they can significantly reduce the blocking probability, but still remain quite expensive. Unfortunately, previous wavelength assignment algorithms have seldom considered their presence. Therefore, in this paper, we propose a novel dynamic algorithm that can minimize the number of wavelength translations. Our algorithm establishes lightpaths by connecting a minimum number of wavelength-continuous segments. We mathematically prove the correctness of our algorithm. Then, we carry out extensive performance evaluations over three typical topologies with full-range or limited-range converters to compare our proposed algorithm with first-fit and most-used algorithms. The simulations show that, to obtain similar blocking performance, our algorithm requires much fewer converters, or the same number of converters but with smaller conversion ranges. From another perspective, with the same conversion capacity, our algorithm can significantly improve the blocking performance. Our algorithm is also scalable due to its polynomial time complexity and insignificant local signaling overhead.
Lei PAN Wenhui ZHANG Arthur ASUNCION Ming Kin LAI Michael B. DILLENCOURT Lubomir F. BIC Laurence T. YANG
The Navigational Programming (NavP) methodology is based on the principle of self-migrating computations. It is a truly incremental methodology for developing parallel programs: each step represents a functioning program, and each intermediate program is an improvement over its predecessor. The transformations are mechanical and straightforward to apply. We illustrate our methodology in the context of matrix multiplication, showing how the transformations lead from a sequential program to a fully parallel program. The NavP methodology is conducive to new ways of thinking that lead to ease of programming and high performance. Even though our parallel algorithm was derived using a sequence of mechanical transformations, it displays certain performance advantages over the classical handcrafted Gentleman's Algorithm.
Kazuaki YAMAGUCHI Sumio MASUDA
The 2-terminal shortest path problem is to find a shortest path between two specified vertices in a given graph G. In this paper, we consider this problem in the following situation: G is given before the two vertices are specified so that some preprocessing is allowed to reduce the response time. We present a method for calculating lower bounds of the length of the shortest path for any pair of vertices. Experimental results show that the A* algorithm with our method performs much better than previous methods.
Ming-Der SHIEH Jun-Hong CHEN Chien-Ming WU
Montgomery algorithm has demonstrated its effectiveness in applications like cryptosystems. Most of the existing works on finding the Montgomery inverse of an element over the Galois field are based on the software implementation, which is then extended to derive the scalable hardware architecture. In this work, we consider a fundamental change at the algorithmic level and eliminate the potential problems in hardware implementation which makes the resulting modified Montgomery inverse algorithm over GF(2m) very suitable for hardware realization. Due to its structural simplicity, the modified algorithm can be easily mapped onto a high-speed and possibly low-complexity circuit. Experimental results show that our development can achieve both the area and speed advantages over the previous work when the inversion operation over GF(2m) is under consideration and the improvement becomes more significant when we increase the value of m as in the applications of cryptosystems. The salient property of our development sustains the high-speed operation as well as low hardware complexity over a wide range of m for commercial cryptographic applications and makes it suitable for both the scalable architecture and direct hardware implementation.
Let G = (V,E) be an edge weighted graph with n vertices and m edges. For a given integer p with 1 < p < n, we call a set X V of p vertices a p-maximizer if X has a property that the edge-connectivity of the graph obtained by contracting X into a single vertex is no less than that of the graph obtained by contracting any other subset of p vertices. In this paper, we first show that there always exists an ordering v1,v2,...,vn of vertices in V such that, for each i = 2,3,...,n - 1, set {v1,v2,...,vi} is an i-maximizer. We give an O(mn + n2log n) time algorithm for finding such an ordering and then show an application to the source location problem.
Makoto SUGITA Mitsuru KAWAZOE Hideki IMAI
We clarify a relation between the XL algorithm and known Grobner basis algorithms. The XL algorithm was proposed to be a more efficient algorithm to solve a system of algebraic equations under a special condition, without calculating a whole Grobner basis. But in our result, it is shown that to solve a system of algebraic equations with a special condition under which the XL algorithm works is equivalent to calculate the reduced Grobner basis of the ideal associated with the system. Moreover we show that the XL algorithm is a Grobner basis algorithm which can be represented as a redundant variant of a known Grobner basis algorithm F4.
Chang Wook AHN Rudrapatna S. RAMAKRISHNA
An efficient clustering strategy for estimation of distribution algorithms (EDAs) is presented. It is used for properly fitting probabilistic models that play an important role in guiding search direction. To this end, a fitness-aided ordering scheme is devised for deciding the input sequence of samples (i.e., individuals) for clustering. It can effectively categorise the individuals by using the (available) information about fitness landscape. Moreover, a virtual leader is introduced for providing a reliable reference for measuring the distance from samples to its own cluster. The proposed algorithm incorporates them within the framework of random the leader algorithm (RLA). Experimental results demonstrate that the proposed approach is more effective than the existing ones with regard to probabilistic model fitting.
Yusuke TSUDA Jonah GAMBA Tetsuya SHIMAMURA
An efficient adaptation technique of the delay is introduced for accomplishing more accurate adaptive linear equalization of nonminimum phase channels. It is focused that the filter structure and adaptation procedure of the adaptive Butler-Cantoni (ABC) equalizer is very suitable to deal with a variable delay for each iteration, compared with a classical adaptive linear transversal equalizer (LTE). We derive a cost function by comparing the system mismatch of an optimum equalizer coefficient vector with an equalizer coefficient vector with several delay settings. The cost function is square of difference of absolute values of the first element and the last element for the equalizer coefficient vector. The delay adaptation method based on the cost function is developed, which is involved with the ABC equalizer. The delay is adapted by checking the first and last elements of the equalizer coefficient vector and this results in an LTE providing a lower mean square error level than the other LTEs with the same order. We confirm the performance of the ABC equalizer with the delay adaptation method through computer simulations.
The least squares (LS) and the weighted least squares (WLS) algorithms are well known procedures that are used in the design of quadrature mirror filters (QMFs). It is known that these design techniques suffer from pass-band anomaly under certain conditions. A recent method, that overcomes pass-band anomaly for LS QMFs using a frequency sampling design for the initial filter, is extended to WLS design in this letter. A comparison between the modified LS and WLS designs based on experimental results is presented. Although WLS designs have been reported to have superior near-equiripple stop-band performance as compared to LS designs, we find that this is not always true. Specifically, LS designs, with inherent computational and robustness advantages, also have better peak stop-band ripple and transition bandwidth at higher cut-off frequencies than WLS.