Workflow nets are a standard way for modeling and analyzing workflows. There are two aspects in a workflow: definition and instance. In form of workflow nets, a workflow definition and a workflow instance are respectively represented as a net structure and a marking. The correctness of the workflow definition can be checked by using a workflow nets' property, called soundness. On the other hand, the correctness of the workflow instance can be checked by using a Petri nets' well-known property, called reachability. The reachability problem is known to be intractable. In this paper, we have shown that the reachability problem for (i) sound extended free-choice workflow nets with a marking representing one workflow instance or (ii) acyclic well-structured workflow nets with a marking representing one or more workflow instances can be solved in polynomial time.
This paper introduces a comparison of three automatic gait generation methods for quadruped robots: GA (Genetic Algorithm), GP (genetic programming) and CPG (Central Pattern Generator). It aims to provide a useful guideline for the selection of gait generation methods. GA-based approaches seek to optimize paw locus in Cartesian space. GP-based techniques generate joint trajectories using regression polynomials. The CPGs are neural circuits that generate oscillatory output from an input coming from the brain. Optimizations for the three proposed methods are executed and analyzed using a Webots simulation of the quadruped robot built by Bioloid. The experimental comparisons and analyses provided herein will be an informative guidance for research of gait generation method.
This paper proposes an adaptive marker coding (AMC) for correction of insertion/deletion/substitution errors. Unlike the conventional marker codings which select marker-bit values deterministically, the AMC adaptively reverses the first and last bits of each marker as well as bits surrounding the marker. Decoding is based on a forward-backward algorithm which takes into account the dependency of bit-values around the marker. Evaluation shows that, for a channel with insertion/deletion error probability 1.8×10-2, the decoded BER of existing marker coding of rate 9/16 is 4.25×10-3, while that of the proposed coding with the same code rate is 1.73×10-3.
Ittetsu TANIGUCHI Kohei AOKI Hiroyuki TOMIYAMA Praveen RAGHAVAN Francky CATTHOOR Masahiro FUKUI
A fast and accurate architecture exploration for high performance and low energy VLIW data-path is proposed. The main contribution is a method to find Pareto optimal FU structures, i.e., the optimal number of FUs and the best instruction assignment for each FU. The proposed architecture exploration method is based on GA and enables the effective exploration of vast solution space. Experimental results showed that proposed method was able to achieve fast and accurate architecture exploration. For most cases, the estimation error was less than 1%.
A method for efficiently estimating the time-varying spectra of nonstationary autoregressive (AR) signals is derived using an indefinite matrix-based sliding window fast linear prediction (ISWFLP). In the linear prediction, the indefinite matrix plays a very important role in sliding an exponentially weighted finite-length window over the prediction error samples. The resulting ISWFLP algorithm successively estimates the time-varying AR parameters of order N at a computational complexity of O(N) per sample. The performance of the AR parameter estimation is superior to the performances of the conventional techniques, including the Yule-Walker, covariance, and Burg methods. Consequently, the ISWFLP-based AR spectral estimation method is able to rapidly track variations in the frequency components with a high resolution and at a low computational cost. The effectiveness of the proposed method is demonstrated by the spectral analysis results of a sinusoidal signal and a speech signal.
1+1 protection provides instantaneous proactive recovery from any single link failure by duplicating and sending the same source data onto two disjoint paths. Other resource efficient recovery techniques to deal with single link failure require switching operations at least at both ends, which restrict instantaneous recovery. However, the 1+1 protection technique demands at least double network resources. Our goal is to minimize the resources required for 1+1 protection while maintaining the advantage of instantaneous recovery. It was reported that the network coding (NC) technique reduces resource utilization in 1+1 protection, and in order to determine an optimum NC aware set of routes that minimizes the required network resources for 1+1 protection, an Integer Quadratic Programming (IQP) formulation has already been addressed. Solving an IQP problem requires large amount of memory (cannot be determined exactly) and special algorithms by the mathematical programming solver. In this paper our contributions consist of two parts. First, we formulate the optimization problem, corresponding to the IQP model, as an Integer Linear Programming (ILP) formulation, which is solvable by any linear programming solver, and so its memory and time requirements are smaller. However, the presented ILP model works well in small-scale and medium-scale networks, but fails to support large-scale networks due to excessive memory requirements and calculation time. Second, to deal with these issues, a heuristic algorithm is proposed to determine the best possible NC aware set of routes in large-scale networks. Numerical results show that our strategies achieve almost double the resource saving effect than the conventional minimal-cost routing policy in the examined medium-scale and large scale networks.
Guifang SHAO Wupeng HONG Tingna WANG Yuhua WEN
An improved genetic algorithm is employed to optimize the structure of (C60)N (N≤25) fullerene clusters with the lowest energy. First, crossover with variable precision, realized by introducing the hamming distance, is developed to provide a faster search mechanism. Second, the bit string mutation and feedback mutation are incorporated to maintain the diversity in the population. The interaction between C60 molecules is described by the Pacheco and Ramalho potential derived from first-principles calculations. We compare the performance of the Improved GA (IGA) with that of the Standard GA (SGA). The numerical and graphical results verify that the proposed approach is faster and more robust than the SGA. The second finite differential of the total energy shows that the (C60)N clusters with N=7, 13, 22 are particularly stable. Performance with the lowest energy is achieved in this work.
Baeksop KIM Jiseong KIM Jungmin SO
This letter presents a scheme to improve the running time of exemplar-based image inpainting, first proposed by Criminisi et al. In the exemplar-based image inpainting, a patch that contains unknown pixels is compared to all the patches in the known region in order to find the best match. This is very time-consuming and hinders the practicality of Criminisi's method to be used in real time. We show that a simple bounding algorithm can significantly reduce number of distance calculations, and thus the running time. Performance of the bounding algorithm is affected by the order of patches that are compared, as well as the order of pixels in a patch. We present pixel and patch ordering schemes that improve the performance of bounding algorithms. Experiments with well-known images used in inpainting literature show that the proposed reordering scheme can reduce running time of the bounding algorithm up to 50%.
This paper presents a GPU (Graphics Processing Units) implementation of dynamic programming for the optimal polygon triangulation. Recently, GPUs can be used for general purpose parallel computation. Users can develop parallel programs running on GPUs using programming architecture called CUDA (Compute Unified Device Architecture) provided by NVIDIA. The optimal polygon triangulation problem for a convex polygon is an optimization problem to find a triangulation with minimum total weight. It is known that this problem for a convex n-gon can be solved using the dynamic programming technique in O(n3) time using a work space of size O(n2). In this paper, we propose an efficient parallel implementation of this O(n3)-time algorithm on the GPU. In our implementation, we have used two new ideas to accelerate the dynamic programming. The first idea (adaptive granularity) is to partition the dynamic programming algorithm into many sequential kernel calls of CUDA, and to select the best parameters for the size and the number of blocks for each kernel call. The second idea (sliding and mirroring arrangements) is to arrange the working data for coalesced access of the global memory in the GPU to minimize the memory access overhead. Our implementation using these two ideas solves the optimal polygon triangulation problem for a convex 8192-gon in 5.57 seconds on the NVIDIA GeForce GTX 680, while a conventional CPU implementation runs in 1939.02 seconds. Thus, our GPU implementation attains a speedup factor of 348.02.
Masaki KOHANA Shusuke OKAMOTO Atsuko IKEGAMI
This paper describes a near-optimal allocation method for web-based multi-player online role-playing games (MORPGs), which must be able to cope with a large number of users and high frequency of user requests. Our previous work introduced a dynamic data reallocation method. It uses multiple web servers and divides the entire game world into small blocks. Each ownership of block is allocated to a web server. Additionally, the ownership is reallocated to the other web server according to the user's requests. Furthermore, this block allocation was formulated as a combinational optimization problem. And a simulation based experiment with an exact algorithm showed that our system could achieve 31% better than an ad-hoc approach. However, the exact algorithm takes too much time to solve a problem when the problem size is large. This paper proposes a meta-heuristic approach based on a tabu search to solve a problem quickly. A simulation result shows that our tabu search algorithm can generate solutions, whose average correctness is only 1% different from that of the exact algorithm. In addition, the average calculation time for 50 users on a system with five web servers is about 25.67 msec while the exact algorithm takes about 162 msec. An evaluation for a web-based MORPG system with our tabu search shows that it could achieve 420 users capacity while 320 for our previous system.
Tianyang DONG Jianwei SHI Jing FAN Ling ZHANG
Rule engine technologies have been widely used in the development of enterprise information systems. However, these rule-based systems may suffer the problem of low performance, when there is a large amount of facts data to be matched with the rules. The way of cluster or grid to construct rule engines can flexibly expand system processing capability by increasing cluster scale, and acquire shorter response time. In order to speed up pattern matching in rule engine, a double hash filter approach for alpha network, combined with beta node indexing, is proposed to improve Rete algorithm in this paper. By using fact type node in Rete network, a hash map about ‘fact type - fact type node’ is built in root node, and hash maps about ‘attribute constraint - alpha node’ are constructed in fact type nodes. This kind of double hash mechanism can speed up the filtration of facts in alpha network. Meanwhile, hash tables with the indexes calculated through fact objects, are built in memories of beta nodes, to avoid unnecessary iteration in the join operations of beta nodes. In addition, rule engine based on this improved Rete algorithm is applied in the enterprise information systems. The experimental results show that this method can effectively speed up the pattern matching, and significantly decrease the response time of the application systems.
Yichao LU Gang HE Guifen TIAN Satoshi GOTO
Recently, non-binary low-density parity-check (NB-LDPC) codes starts to show their superiority in achieving significant coding gains when moderate codeword lengths are adopted. However, the overwhelming decoding complexity keeps NB-LDPC codes from being widely employed in modern communication devices. This paper proposes a hybrid message-passing decoding algorithm which consumes very low computational complexity. It achieves competitive error performance compared with conventional Min-max algorithm. Simulation result on a (255,174) cyclic code shows that this algorithm obtains at least 0.5dB coding gain over other state-of-the-art low-complexity NB-LDPC decoding algorithms. A partial-parallel NB-LDPC decoder architecture for cyclic NB-LDPC codes is also developed based on this algorithm. Optimization schemes are employed to cut off hard decision symbols in RAMs and also to store only part of the reliability messages. In addition, the variable node units are redesigned especially for the proposed algorithm. Synthesis results demonstrate that about 24.3% gates and 12% memories can be saved over previous works.
The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. We then go on to show that $Omega({nover w}+{nlover p}+llog n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $sqrt{n} imessqrt{n}$ can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
In this letter, we present a novel interference-aware clustering scheme for cell broadcasting service. The proposed approach is based on a genetic algorithm for re-clustering. Using the genetic algorithm, the suggested method efficiently re-clusters the user nodes when the relays fail in receiving the cell broadcasting message from the base station. The simulation results exhibit that the proposed clustering scheme can maintain much higher capacity than the conventional clustering scheme in the cases of relay outage. The re-clustering method based on genetic algorithm also shows lower complexity than the re-clustering approach based on exhaustive search.
Wei HOU Tadashi FUJINO Toshiharu KOJIMA
Lattice-reduction (LR) technique has been adopted to improve the performance and reduce the complexity in MIMO data detection. This paper presents an improved quantization scheme for LR aided MIMO detection based on Gram-Schmidt orthogonalization. For the LR aided detection, the quantization step applies the simple rounding operation, which often leads to the quantization errors. Meanwhile, these errors may result in the detection errors. Hence the purpose of the proposed detection is to further solve the problem of degrading the performance due to the quantization errors in the signal estimation. In this paper, the proposed quantization scheme decreases the quantization errors using a simple tree search with a threshold function. Through the analysis and the simulation results, we observe that the proposed detection can achieve the nearly optimal performance with very low complexity, and require a little additional complexity compared to the conventional LR-MMSE detection in the high Eb/N0 region. Furthermore, this quantization error reduction scheme is also efficient even for the high modulation order.
Md. Ezharul ISLAM Nobuo FUNABIKI Toru NAKANISHI Kan WATANABE
Nowadays, with spreads of inexpensive small communication devices, a number of wireless local area networks (WLANs) have been deployed even in the same building for the Internet access services. Their wireless access-points (APs) are often independently installed and managed by different groups such as departments or laboratories in a university or a company. Then, a user host can access to multiple WLANs by detecting signals from their APs, which increases the energy consumption and the operational cost. It may also degrade the communication performance by increasing interferences. In this paper, we present an AP aggregation approach to solve these problems in multiple WLAN environments by aggregating deployed APs of different groups into limited ones using virtual APs. First, we formulate the AP aggregation problem as a combinatorial optimization problem and prove the NP-completeness of its decision problem. Then, we propose its heuristic algorithm composed of five phases. We verify the effectiveness through extensive simulations using the WIMNET simulator.
Hui ZHAO Shuqiang YANG Hua FAN Zhikun CHEN Jinghu XU
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of an MapReduce cluster running lots of independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors to improve computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suitable for MapReduce environment, there are some in-used schedulers for the popular open-source Hadoop MapReduce implementation, however, they can not well optimize both factors. Our main objective is to minimize total flowtime of all jobs, given it's a strong NP-hard problem, we adopt some effective heuristics to seek satisfied solution. In this paper, we formalize the scheduling problem as job selection problem, a load balance aware job selection algorithm is proposed, in task level we design a strict data locality tasks scheduling algorithm for map tasks on map machines and a load balance aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
We designed multilayer wavelength-selective reflector films by stacking thin-films of transparent polymer. The optimum structure of the multilayer is determined using a combination of characteristic matrix method and a version of genetic algorithm. Such multilayer films can be used in LCD devices to enhance the color saturation of the display.
Watid PHAKPHISUT Patanasak PROMPAKDEE Pornchai SUPNITHI
In this paper, we propose the construction of quasi-cyclic (QC) LDPC codes based on the modified progressive edge-growth (PEG) algorithm to achieve the maximum local girth. Although the previously designed QC-LDPC codes based on the PEG algorithm has more flexible code rates than the conventional QC-LDPC code, in the design process, multiple choices of the edges may be chosen. In the proposed algorithm, we aim to maximize the girth property by choosing the suitable edges and thus improve the error correcting performance. Simulation results show that the QC-LDPC codes constructed from the proposed method give higher proportion of high local girths than other methods, particularly, at high code rates. In addition, the proposed codes offer superior bit error rate and block error rate performances to the previous PEG-QC codes over the additive white Gaussian noise (AWGN) channel.
Chih-Ming CHEN Ying-ping CHEN Tzu-Ching SHEN John K. ZAO
LT codes are the first practical rateless codes whose reception overhead totally depends on the degree distribution adopted. The capability of LT codes with a particular degree distribution named robust soliton has been theoretically analyzed; it asymptotically approaches the optimum when the message length approaches infinity. However, real applications making use of LT codes have finite number of input symbols. It is quite important to refine degree distributions because there are distributions whose performance can exceed that of the robust soliton distribution for short message length. In this work, a practical framework that employs evolutionary algorithms is proposed to search for better degree distributions. Our experiments empirically prove that the proposed framework is robust and can customize degree distributions for LT codes with different message length. The decoding error probabilities of the distributions found in the experiments compare well with those of robust soliton distributions. The significant improvement of LT codes with the optimized degree distributions is demonstrated in the paper.