Mohd SHAHRIZAN OTHMAN Aleksandar SHURBEVSKI Hiroshi NAGAMOCHI
Given an edge-weighted bipartite digraph G=(A,B;E), the Bipartite Traveling Salesman Problem (BTSP) asks to find the minimum cost of a Hamiltonian cycle of G, or determine that none exists. When |A|=|B|=n, the BTSP can be solved using polynomial space in O*(42nnlog n) time by using the divide-and-conquer algorithm of Gurevich and Shelah (SIAM Journal of Computation, 16(3), pp.486-502, 1987). We adapt their algorithm for the bipartite case, and show an improved time bound of O*(42n), saving the nlog n factor.
Satoshi TAOKA Tadachika OKI Toshiya MASHIMA Toshimasa WATANABE
The k-edge-connectivity augmentation problem with multipartition constraints (kECAMP, for short) is defined by “Given a multigraph G=(V,E) and a multipartition π={V1,...,Vr} (r≥2) of V, that is, $V = igcup_{h = 1}^r V_h$ and Vi∩Vj=∅ (1≤i
Toru FUJITA Koji NAKANO Yasuaki ITO Daisuke TAKAFUJI
The main contribution of this paper is to present an efficient GPU implementation of bulk computation of the CKY parsing for a context-free grammar, which determines if a context-free grammar derives each of a lot of input strings. The bulk computation is to execute the same algorithm for a lot of inputs in turn or at the same time. The CKY parsing is to determine if a context-free grammar derives a given string. We show that the bulk computation of the CKY parsing can be implemented in the GPU efficiently using Bitwise Parallel Bulk Computation (BPBC) technique. We also show the rule minimization technique and the dynamic scheduling method for further acceleration of the CKY parsing on the GPU. The experimental results using NVIDIA TITAN X GPU show that our implementation of the bitwise-parallel CKY parsing for strings of length 32 takes 395µs per string with 131072 production rules for 512 non-terminal symbols.
As autonomous underwater vehicles (AUVs) have been widely used to perform cooperative works with sensor nodes for data-gathering, the need for long-range AUVs has further grown to support the long-duration cooperation with sensor nodes. However, as existing data-gathering protocols for the cooperative works have not considered AUVs' energy consumption, AUVs can deplete their energy more quickly before fulfilling their missions. The objective of this work is to develop an AUV based data-gathering protocol that maximizes the duration for the cooperative works. Simulation results show that the proposed protocol outperforms existing protocols with respect to the long-range AUVs.
Haibo DAI Chunguo LI Luxi YANG
In this letter, we propose two robust and distributed game-based algorithms, which are the modifications of two algorithms proposed in [1], to solve the joint base station selection and resource allocation problem with imperfect information in heterogeneous cellular networks (HCNs). In particular, we repeatedly sample the received payoffs in the exploitation stage of each algorithm to guarantee the convergence when the payoffs of some users (UEs) in [1] cannot accurately be acquired for some reasons. Then, we derive the rational sampling number and prove the convergence of the modified algorithms. Finally, simulation results demonstrate that two modified algorithms achieve good convergence performances and robustness in the incomplete information scheme.
Wenyun GAO Xi CHEN Dexiu HU Haisheng XU
This paper presents non-iterative cooperative/parallel Kalman filtering algorithms for decentralized network navigation, in which mobile nodes cooperate in both spatial and temporal domains to infer their positions. We begin by presenting an augmented minimum-mean-square error (MMSE) estimator for centralized navigation network, and then decouple it into a set of local sub-ones each corresponding to a mobile node; all these sub-estimators work in parallel and cooperatively — with the state estimates exchanging between neighbors — to provide results similar to those obtained by the augmented one. After that, we employ the approximation methods that adopted in the conventional nonlinear Kalman filters to calculate the second-order terms involved in these sub-estimators, and propose a decentralized cooperative/parallel Kalman filtering based network navigation framework. Finally, upon the framework, we present two cooperative/parallel Kalman filtering algorithms corresponding to the extended and unscented Kalman filters respectively, and compare them with conventional decentralized methods by simulations to show the superiority.
Zhi-Zhong CHEN Tatsuie TSUKIJI Hiroki YAMADA
It is a well-known and useful problem to find a matching in a given graph G whose size is at most a given parameter k and whose weight is maximized (over all matchings of size at most k in G). In this paper, we consider two natural extensions of this problem. One is to find t disjoint matchings in a given graph G whose total size is at most a given parameter k and whose total weight is maximized, where t is a (small) constant integer. Previously, only the special case where t=2 was known to be fixed-parameter tractable. In this paper, we show that the problem is fixed-parameter tractable for any constant t. When t=2, the time complexity of the new algorithm is significantly better than that of the previously known algorithm. The other is to find a set of vertex-disjoint paths each of length 1 or 2 in a given graph whose total length is at most a given parameter k and whose total weight is maximized. As interesting applications, we further use the algorithms to speed up several known approximation algorithms (for related NP-hard problems) whose approximation ratio depends on a fixed parameter 0<ε<1 and whose running time is dominated by the time needed for exactly solving the problems on graphs in which each connected component has at most ε-1 vertices.
Ro-Yu WU Jou-Ming CHANG Sheng-Lung PENG Chun-Liang LIU
Left-weight sequences (LW-sequences for short) are in common currency for encoding binary trees. In [16], Wu et al. proposed an algorithm associated with tree rotations for listing all binary trees in diverse representations including LW-sequences. In particular, such a list of LW-sequences is generated in Gray-code order. In this paper, based on this ordering, we present efficient ranking and unranking algorithms. For binary trees with n internal nodes, the time complexity and the space requirement in each of our ranking and unranking algorithms are O(n2) and O(n), respectively.
In this paper we discuss approximation algorithms for the ELEMENT-DISJOINT STEINER TREE PACKING problem (Element-STP for short). For a graph G=(V,E) and a subset of nodes T⊆V, called terminal nodes, a Steiner tree is a connected, acyclic subgraph that contains all the terminal nodes in T. The goal of Element-STP is to find as many element-disjoint Steiner trees as possible. Element-STP is known to be APX-hard even for |T|=3 [1]. It is also known that Element-STP is NP-hard to approximate within a factor of Ω(log |V|) [3] and there is an O(log |V|)-approximation algorithm for Element-STP [2],[4]. In this paper, we provide a $lceil rac{|T|}{2} ceil$-approximation algorithm for Element-STP on graphs with |T| terminal nodes. Furthermore, we show that the approximation ratio of 3 for Element-STP on graphs with five terminal nodes can be improved to 2.
Due to its on-demand and pay-as-you-go properties, cloud computing has become an attractive alternative for HPC applications. However, communication-intensive applications with complex communication patterns still cannot be performed efficiently on cloud platforms, which are equipped with MapReduce technologies, such as Hadoop and Spark. In particular, one major obstacle is that MapReduce's simple programming model cannot explicitly manipulate data transfers between compute nodes. Another obstacle is cloud's relatively poor network performance compared with traditional HPC platforms. The traditional Strassen's algorithm of square matrix multiplication has a recursive and complex pattern on the HPC platform. Therefore, it cannot be directly applied to the cloud platform. In this paper, we demonstrate how to make Strassen's algorithm with complex communication patterns “cloud-friendly”. By reorganizing Strassen's algorithm in an iterative pattern, we completely separate its computations and communications, making it fit to MapReduce programming model. By adopting a novel data/task parallel strategy, we solve Strassen's data dependency problems, making it well balanced. This is the first instance of Strassen's algorithm in MapReduce-style systems, which also matches Strassen's communication lower bound. Further experimental results show that it achieves a speedup ranging from 1.03× to 2.50× over the classical Θ(n3) algorithm. We believe the principle can be applied to many other complex scientific applications.
Hirotoshi HONMA Yoko NAKAJIMA Yuta IGARASHI Shigeru MASUYAMA
A hinge vertex is a vertex in an undirected graph such that there exist two vertices whose removal makes the distance between them longer than before. Identifying hinge vertices in a graph can help detect critical nodes in communication network systems, which is useful for making them more stable. For finding them, an O(n3) time algorithm was developed for a simple graph, and, linear time algorithms were developed for interval and permutation graphs, respectively. Recently, the maximum detour hinge vertex problem is defined by Honma et al. For a hinge vertex u in a graph, the detour degree of u is the largest value of distance between any pair of x and y (x and y are adjacent to u) by removing u. A hinge vertex with the largest detour degree in G is defined as the maximum detour hinge vertex of G. This problem is motivated by practical applications, such as network stabilization with a limited cost, i.e., by enhancing the reliability of the maximum detour hinge vertex, the stability of the network is much improved. We previously developed an O(n2) time algorithm for solving this problem on an interval graph. In this study, we propose an algorithm that identifies the maximum detour hinge vertex on a permutation graph in O(n2) time, where n is the number of vertices in the graph.
Parinya CHALERMSOOK Hiroshi IMAI Vorapong SUPPAKITPAISARN
In this letter, we derive two lower bounds for the number of terms in a double-base number system (DBNS), when the digit set is {1}. For a positive integer n, we show that the number of terms obtained from the greedy algorithm proposed by Dimitrov, Imbert, and Mishra [1] is $Thetaleft(rac{log n}{log log n} ight)$. Also, we show that the number of terms in the shortest double-base chain is Θ(log n).
Hiroshi IMAI Vorapong SUPPAKITPAISARN
In this paper, we improve a width-3 joint sparse form proposed by Okeya, Katoh, and Nogami. After the improvement, the representation can attain an asymtotically optimal complexity found in our previous work. Although claimed as optimal by the authors, the average computation time of multi-scalar multiplication obtained by the representation is 563/1574n+o(n)≈0.3577n+o(n). That number is larger than the optimal complexity 281/786n+o(n)≈0.3575n+o(n) found in our previous work. To optimize the width-3 joint sparse form, we add more cases to the representation. After the addition, we can show that the complexity is updated to 281/786n+o(n)≈0.3575n+o(n), which implies that the modified representation is asymptotically optimal. Compared to our optimal algorithm in the previous work, the modified width-3 joint sparse form uses less dynamic memory, but it consumes more static memory.
Masataka IKEDA Hiroshi NAGAMOCHI
Computing an invariant of a graph such as treewidth and pathwidth is one of the fundamental problems in graph algorithms. In general, determining the pathwidth of a graph is NP-hard. In this paper, we propose several reduction methods for decreasing the instance size without changing the pathwidth, and implemented the methods together with an exact algorithm for computing pathwidth of graphs. Our experimental results show that the number of vertices in all chemical graphs in NCI database decreases by our reduction methods by 53.81% in average.
Yichao LU Xiao PENG Guifen TIAN Satoshi GOTO
Majority-logic algorithms are devised for decoding non-binary LDPC codes in order to reduce computational complexity. However, compared with conventional belief propagation algorithms, majority-logic algorithms suffer from severe bit error performance degradation. This paper presents a low-complexity reliability-based algorithm aiming at improving error correcting ability of majority-logic algorithms. Reliability measures for check nodes are novelly introduced to realize mutual update between variable message and check message, and hence more efficient reliability propagation can be achieved, similar to belief-propagation algorithm. Simulation results on NB-LDPC codes with different characteristics demonstrate that our algorithm can reduce the bit error ratio by more than one order of magnitude and the coding gain enhancement over ISRB-MLGD can reach 0.2-2.0dB, compared with both the ISRB-MLGD and IISRB-MLGD algorithms. Moreover, simulations on typical LDPC codes show that the computational complexity of the proposed algorithm is closely equivalent to ISRB-MLGD algorithm, and is less than 10% of Min-max algorithm. As a result, the proposed algorithm achieves a more efficient trade-off between decoding computational complexity and error performance.
Hirotoshi HONMA Yoko NAKAJIMA Yuta IGARASHI Shigeru MASUYAMA
Consider a simple undirected graph G = (V,E) with vertex set V and edge set E. Let G-u be a subgraph induced by the vertex set V-{u}. The distance δG(x,y) is defined as the length of the shortest path between vertices x and y in G. The vertex u ∈ V is a hinge vertex if there exist two vertices x,y ∈ V-{u} such that δG-u(x,y)>δG(x,y). Let U be a set consisting of all hinge vertices of G. The neighborhood of u is the set of all vertices adjacent to u and is denoted by N(u). We define d(u) = max{δG-u(x,y) | δG-u(x,y)>δG(x,y),x,y ∈ N(u)} for u ∈ U as detour degree of u. A maximum detour hinge vertex problem is to find a hinge vertex u with maximum d(u) in G. In this paper, we proposed an algorithm to find the maximum detour hinge vertex on an interval graph that runs in O(n2) time, where n is the number of vertices in the graph.
Yuma INOUE Takahisa TODA Shin-ichi MINATO
Pattern-avoiding permutations are permutations where none of the subsequences matches the relative order of a given pattern. Pattern-avoiding permutations are related to practical and abstract mathematical problems and can provide simple representations for such problems. For example, some floorplans, which are used for optimizing very-large-scale integration (VLSI) circuit design, can be encoded into pattern-avoiding permutations. The generation of pattern-avoiding permutations is an important topic in efficient VLSI design and mathematical analysis of patten-avoiding permutations. In this paper, we present an algorithm for generating pattern-avoiding permutations, and extend this algorithm beyond classical patterns to generalized patterns with more restrictions. Our approach is based on the data structure πDDs, which can represent a permutation set compactly and has useful set operations. We demonstrate the efficiency of our algorithm by computational experiments.
The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. It is assumed that warps (or groups of threads) on the DMM and the UMM work synchronously in a round-robin manner. However, warps work asynchronously in real GPUs, in the sense that they are randomly (or arbitrarily) dispatched for execution. The first contribution of this paper is to introduce asynchronous versions of these models in which warps are arbitrarily dispatched. In addition, we assume that threads can execute the “syncthreads” instruction for barrier synchronization. Since the barrier synchronization operation may be costly, we should evaluate and minimize the number of barrier synchronization operations executed by parallel algorithms. The second contribution of this paper is to show a parallel algorithm to the sum of n numbers in optimal computing time and few barrier synchronization steps. Our parallel algorithm computes the sum of n numbers in O(n/w+llog n) time units and O(log l/log w+log log w) barrier synchronization steps using wl threads on the asynchronous UMM with width w and latency l. Since the computation of the sum takes at least Ω(n/w+llog n) time units, this algorithm is time optimal. Finally, we show that the prefix-sums of n numbers can also be computed in O(n/w+llog n) time units and O(log l/log w+log log w) barrier synchronization steps using wl threads.
This paper presents a GPU (Graphics Processing Units) implementation of dynamic programming for the optimal polygon triangulation. Recently, GPUs can be used for general purpose parallel computation. Users can develop parallel programs running on GPUs using programming architecture called CUDA (Compute Unified Device Architecture) provided by NVIDIA. The optimal polygon triangulation problem for a convex polygon is an optimization problem to find a triangulation with minimum total weight. It is known that this problem for a convex n-gon can be solved using the dynamic programming technique in O(n3) time using a work space of size O(n2). In this paper, we propose an efficient parallel implementation of this O(n3)-time algorithm on the GPU. In our implementation, we have used two new ideas to accelerate the dynamic programming. The first idea (adaptive granularity) is to partition the dynamic programming algorithm into many sequential kernel calls of CUDA, and to select the best parameters for the size and the number of blocks for each kernel call. The second idea (sliding and mirroring arrangements) is to arrange the working data for coalesced access of the global memory in the GPU to minimize the memory access overhead. Our implementation using these two ideas solves the optimal polygon triangulation problem for a convex 8192-gon in 5.57 seconds on the NVIDIA GeForce GTX 680, while a conventional CPU implementation runs in 1939.02 seconds. Thus, our GPU implementation attains a speedup factor of 348.02.
Warin WATTANAPORNPROM Prabhas CHONGSTITVATANA
This article introduces the Coincidence Algorithm (COIN) to solve several multimodal puzzles. COIN is an algorithm in the category of Estimation of Distribution Algorithms (EDAs) that makes use of probabilistic models to generate solutions. The model of COIN is a joint probability table of adjacent events (coincidence) derived from the population of candidate solutions. A unique characteristic of COIN is the ability to learn from a negative sample. Various experiments show that learning from a negative example helps to prevent premature convergence, promotes diversity and preserves good building blocks.