Satoshi DENNO Shuhei MAKABE Yafei HOU
This paper proposes a non-linear overloaded MIMO detector that outperforms the conventional soft-input maximum likelihood detector (MLD) with less computational complexity. We propose iterative log-likelihood ratio (LLR) estimation and multi stage LLR estimation for the proposed detector to achieve such superior performance. While the iterative LLR estimation achieves better BER performance, the multi stage LLR estimation makes the detector less complex than the conventional soft-input maximum likelihood detector (MLD). The computer simulation reveals that the proposed detector achieves about 0.6dB better BER performance than the soft-input MLD with about half of the soft-input MLD's complexity in a 6×3 overloaded MIMO OFDM system.
Takashi FUCHINO Takashi HARADA Ken TANAKA Kenji MIKAWA
Packet classification is used to determine the behavior of incoming packets in network devices according to defined rules. As it is achieved using a linear search on a classification rule list, a large number of rules will lead to longer communication latency. To solve this, the problem of finding the order of rules minimizing the latency has been studied. Misherghi et al. and Harada et al. have proposed a problem that relaxes to policy-based constraints. In this paper, we show that the Relaxed Optimal Rule Ordering (RORO) for the allowlist is NP-hard, and by reducing from this we show that RORO for the general rule list is NP-hard. We also propose a heuristic algorithm based on the greedy method for an allowlist. Furthermore, we demonstrate the effectiveness of our method using ClassBench, which is a benchmark for packet classification algorithms.
Mohammed BALAL SIDDIQUI Mirza TARIQ BEG Syed NASEEM AHMAD
Binary Decision Diagrams (BDDs) are an important data structure for the design of digital circuits using VLSI CAD tools. The ordering of variables affects the total number of nodes and path length in the BDDs. Finding a good variable ordering is an optimization problem and previously many optimization approaches have been implemented for BDDs in a number of research works. In this paper, an optimization approach based on Spider Monkey Optimization (SMO) algorithm is proposed for the BDD variable ordering problem targeting number of nodes and longest path length. SMO is a well-known swarm intelligence-based optimization approach based on spider monkeys foraging behavior. The proposed work has been compared with other latest BDD reordering approaches using Particle Swarm Optimization (PSO) algorithm. The results obtained show significant improvement over the Particle Swarm Optimization method. The proposed SMO-based method is applied to different benchmark digital circuits having different levels of complexities. The node count and longest path length for the maximum number of tested circuits are found to be better in SMO than PSO.
Masamune NOMURA Yuki NAKAMURA Hiroo TARAO Amane TAKEI
This paper describes the effectiveness of the geometric multi-grid method in a current density analysis using a numerical human body model. The scalar potential finite difference (SPFD) method is used as a numerical method for analyzing the current density inside a human body due to contact with charged objects in a low-frequency band, and research related to methods to solve faster large-scale simultaneous equations based on the SPFD method has been conducted. In previous research, the block incomplete Cholesky conjugate gradients (ICCG) method is proposed as an effective method to solve the simultaneous equations faster. However, even though the block ICCG method is used, many iterations are still needed. Therefore, in this study, we focus on the geometric multi-grid method as a method to solve the problem. We develop the geometric-multi-grid method and evaluate performances by comparing it with the block ICCG method in terms of computation time and the number of iterations. The results show that the number of iterations needed for the geometric multi-grid method is much less than that for the block ICCG method. In addition, the computation time is much shorter, depending on the number of threads and the number of coarse grids. Also, by using multi-color ordering, the parallel performance of the geometric multi-grid method can be greatly improved.
Takashi HARADA Ken TANAKA Kenji MIKAWA
Recent years have witnessed a rapid increase in cyber-attacks through unauthorized accesses and DDoS attacks. Since packet classification is a fundamental technique to prevent such illegal communications, it has gained considerable attention. Packet classification is achieved with a linear search on a classification rule list that represents the packet classification policy. As such, a large number of rules can result in serious communication latency. To decrease this latency, the problem is formalized as optimal rule ordering (ORO). In most cases, this problem aims to find the order of rules that minimizes latency while satisfying the dependency relation of the rules, where rules ri and rj are dependent if there is a packet that matches both ri and rj and their actions applied to packets are different. However, there is a case in which although the ordering violates the dependency relation, the ordering satisfies the packet classification policy. Since such an ordering can decrease the latency compared to an ordering under the constraint of the dependency relation, we have introduced a new model, called relaxed optimal rule ordering (RORO). In general, it is difficult to determine whether an ordering satisfies the classification policy, even when it violates the dependency relation, because this problem contains unsatisfiability. However, using a zero-suppressed binary decision diagram (ZDD), we can determine it in a reasonable amount of time. In this paper, we present a simulated annealing method for RORO which interchanges rules by determining whether rules ri and rj can be interchanged in terms of policy violation using the ZDD. The experimental results show that our method decreases latency more than other heuristics.
Yuta UKON Koji YAMAZAKI Koyo NITTA
Advanced information-processing services based on cloud computing are in great demand. However, users want to be able to customize cloud services for their own purposes. To provide image-processing services that can be optimized for the purpose of each user, we propose a technique for chaining image-processing functions in a CPU-field programmable gate array (FPGA) coupled server architecture. One of the most important requirements for combining multiple image-processing functions on a network, is low latency in server nodes. However, large delay occurs in the conventional CPU-FPGA architecture due to the overheads of packet reordering for ensuring the correctness of image processing and data transfer between the CPU and FPGA at the application level. This paper presents a CPU-FPGA server architecture with a real-time packet reordering circuit for low-latency image processing. In order to confirm the efficiency of our idea, we evaluated the latency of histogram of oriented gradients (HOG) feature calculation as an offloaded image-processing function. The results show that the latency is about 26 times lower than that of the conventional CPU-FPGA architecture. Moreover, the throughput decreased by less than 3.7% under the worst-case condition where 90 percent of the packets are randomly swapped at a 40-Gbps input rate. Finally, we demonstrated that a real-time video monitoring service can be provided by combining image processing functions using our architecture.
Pietro NANNIPIERI Gianmarco DINELLI Luca FANUCCI
Data rate requirements, from consumer application to automotive and aerospace grew rapidly in the last years. This led to the development of a series of communication protocols (i.e. Ethernet, PCI-Express, RapidIO and SpaceFibre), which use more than one communication lane, both to speed up data rate and to increase link reliability. Some of these protocols, such as SpaceFibre, are able to detect real-time changes in the number of active lanes and to adapt the data flow appropriately, providing a flexible solution, robust to lane failures. This results in a real time varying data path in the lower layers of the data handling system. The aim of this paper is to propose the architecture of a hardware block capable of reading a fixed number of words from a host FIFO and shaping them on a real time variable number of words equal to the number of active lanes.
Huu-Anh TRAN Heyan HUANG Phuoc TRAN Shumin SHI Huu NGUYEN
Word order is one of the most significant differences between the Chinese and Vietnamese. In the phrase-based statistical machine translation, the reordering model will learn reordering rules from bilingual corpora. If the bilingual corpora are large and good enough, the reordering rules are exact and coverable. However, Chinese-Vietnamese is a low-resource language pair, the extraction of reordering rules is limited. This leads to the quality of reordering in Chinese-Vietnamese machine translation is not high. In this paper, we have combined Chinese dependency relation and Chinese-Vietnamese word alignment results in order to pre-order Chinese word order to be suitable to Vietnamese one. The experimental results show that our methodology has improved the machine translation performance compared to the translation system using only the reordering models of phrase-based statistical machine translation.
Lixin WANG Yutong LU Wei ZHANG Yan LEI
One of the patterns that the design of parallel file systems has to solve stems from the difficulty of handling the metadata-intensive I/O generated by parallel applications accessing a single large directory. We demonstrate a middleware design called SFS to support existing parallel file systems for distributed and scalable directory service. SFS distributes directory entries over data servers instead of metadata servers to offer increased scalability and performance. Firstly, SFS exploits an adaptive directory partitioning based on extendible hashing to support concurrent and unsynchronized partition splitting. Secondly, SFS describes an optimization based on recursive split-ordering that emphasizes speeding up the splitting process. Thirdly, SFS applies a write-optimized index structure to convert slow, small, random metadata updates into fast, large, sequential writes. Finally, SFS gracefully tolerates stale mapping at the clients while maintaining the correctness and consistency of the system. Our performance results on a cluster of 32-servers show our implementation can deliver more than 250,000 file creations per second on average.
Masaki NAKANISHI Miki MATSUYAMA Yumi YOKOO
Quantum computer simulators play an important role when we evaluate quantum algorithms. Quantum computation can be regarded as parallel computation in some sense, and thus, it is suitable to implement a simulator on hardware that can process a lot of operations in parallel. In this paper, we propose a hardware quantum computer simulator. The proposed simulator is based on the register reordering method that shifts and swaps registers containing probability amplitudes so that the probability amplitudes of target basis states can be quickly selected. This reduces the number of large multiplexers and improves clock frequency. We implement the simulator on an FPGA. Experiments show that the proposed simulator has scalability in terms of the number of quantum bits, and can simulate quantum algorithms faster than software simulators.
Yongsoo JOO Sangsoo PARK Hyokyung BAHN
Application prefetchers improve application launch performance on HDDs through either I/O reordering or I/O interleaving, but there has been no proposal to combine the two techniques. We present a new algorithm to combine both approaches, and demonstrate that it reduces cold start launch time by 50%.
Multi-hop cooperative communication has been investigated in order to overcome disadvantages such as fading, obstruction and low power. In addition, with the goal of increasing access capacity, the orthogonal frequency division multiplexing (OFDM) modulation is being advanced as a solution. In this paper, we propose the approach of relay ordering in a Decode-and-Forward OFDM scheme. Combining techniques such as maximal ratio combining and selection combining are employed at receivers and approximate outage capacity probabilities are derived for evaluating system performance over frequency selective Rayleigh fading channels. Final, the expressions are validated by Monte-Carlo simulations, and are used to compare with the same scheme based relay selection.
Yeon-Soo LEE Hyoung-Gyu LEE Hae-Chang RIM Young-Sook HWANG
In phrase-based statistical machine translation, long distance reordering problem is one of the most challenging issues when translating syntactically distant language pairs. In this paper, we propose a novel reordering model to solve this problem. In our model, reordering is affected by the overall structures of sentences such as listings, reduplications, and modifications as well as the relationships of adjacent phrases. To this end, we reflect global syntactic contexts including the parts that are not yet translated during the decoding process.
Zezhong LI Hideto IKEDA Junichi FUKUMOTO
In most phrase-based statistical machine translation (SMT) systems, the translation model relies on word alignment, which serves as a constraint for the subsequent building of a phrase table. Word alignment is usually inferred by GIZA++, which implements all the IBM models and HMM model in the framework of Expectation Maximum (EM). In this paper, we present a fully Bayesian inference for word alignment. Different from the EM approach, the Bayesian inference makes use of all possible parameter values rather than estimating a single parameter value, from which we expect a more robust inference. After inferring the word alignment, current SMT systems usually train the phrase table from Viterbi word alignment, which is prone to learn incorrect phrases due to the word alignment mistakes. To overcome this drawback, a new phrase extraction method is proposed based on multiple Gibbs samples from Bayesian inference for word alignment. Empirical results show promising improvements over baselines in alignment quality as well as the translation performance.
Amir Masoud GHAREHBAGHI Masahiro FUJITA
In this paper, we have addressed the problem of ordering transactions in network-on-chips (NoCs) for post-silicon validation. The main idea is to extract the order of the transactions from the local partial orders in each NoC tile based on a set of “happened-before” rules, assuming transactions do not have a timestamp. The assumption is based on the fact that implementation and usage of a global time as timestamp in such systems may not be practical or efficient. When a new transaction is received in a tile, we send special messages to the neighboring tiles to inform them regarding the new transaction. The process of sending those special messages continues recursively in all the tiles that receive them until another such special message is detected. This way, we relate local orders of different tiles with each other. We show that our method can reconstruct the correct transaction orders when communication delays are deterministic. We have shown the effectiveness of our method by correctly ordering the transaction in NoCs with mesh and torus topologies with different sizes from 5*5 to 9*9. Also, we have implemented the proposed method in hardware to show its feasibility.
Seokhyun YOON Kangwoon SEO Taehyun JEON
This letter addresses antenna ordering to improve the performance of the MIMO detectors in [4], where two low complexity MIMO detectors have been proposed based on either fully-connected or ring type pair-wise Markov random field (MRF). The former was shown to be better than the latter, while being more complex. The objective of this letter is to make the performance of the detector based on ring-type MRF (with complexity of O(2M 22m)) close to or better than that of fully-connected MRF (with complexity of O(M (M-1)22m)), by applying appropriate antenna ordering. The simulation results validate the proposed antenna ordering methods.
Chooi-Ling GOH Taro WATANABE Eiichiro SUMITA
While phrase-based statistical machine translation systems prefer to translate with longer phrases, this may cause errors in a free word order language, such as Japanese, in which the order of the arguments of the predicates is not solely determined by the predicates and the arguments can be placed quite freely in the text. In this paper, we propose to reorder the arguments but not the predicates in Japanese using a dependency structure as a kind of reordering. Instead of a single deterministically given permutation, we generate multiple reordered phrases for each sentence and translate them independently. Then we apply a re-ranking method using a discriminative approach by Ranking Support Vector Machines (SVM) to re-score the multiple reordered phrase translations. In our experiment with the travel domain corpus BTEC, we gain a 1.22% BLEU score improvement when only 1-best is used for re-ranking and 4.12% BLEU score improvement when n-best is used for Japanese-English translation.
Coscheduling has been gained a resurgence of interest as an effective technique to enhance the performance of parallel applications in multi-programmed clusters. However, existing coscheduling schemes do not adequately handle priority boost conflicts, leading to significantly degraded performance. To address this problem, in our previous study, we devised a novel algorithm that reorders the scheduling sequence of conflicting processes based on the rescheduling latency of their correspondents in remote nodes. In this paper, we exhaustively explore the design issues and implementation details of our contention-aware coscheduling scheme over Myrinet-based cluster system. We also practically analyze the impact of various system parameters and job characteristics on the performance of all considered schemes on a heterogeneous Linux cluster using a generic coscheduling framework. The results show that our approach outperforms existing schemes (by up to 36.6% in avg. job response time), reducing both boost conflict ratio and overall message delay.
This letter investigates the space complexity of the sender buffer in a TCP variant, TCP-PR, to deal with packet reordering. Our finding is that with the SACK option used, TCP-PR requires the sender buffer of (β+1) pipesize where β indicates the number of RTTs that must pass before packet loss is detected.
John Russell LANE Akihiro NAKAO
Multipath routing and the ability to simultaneously use multiple network paths has long been proposed as a means for meeting the reliability and performance improvement goals of a next generation Internet. However, its use causes out-of-order packet delivery, which is well known to hinder TCP performance. While next-generation transport protocols will no doubt better cope with this phenomenon, a complete switch to these new protocols cannot be made on all devices "overnight"; the reality is that we will be forced to continue using TCP on such multipath networks well after deployment of a future Internet is complete. In this paper, we investigate the use of best-effort packet reordering -- an optional network layer service for improving the performance of any TCP session in the presence of out-of-order packet delivery. Such a service holds the promise of allowing unmodified TCP to take advantage of the reliability and performance gains offered by a future multipath-enabled Internet without suffering the adverse performance effects commonly associated with out-of-order packet delivery. Our experiments test the performance of two common TCP variants under packet dispersion with differing numbers of paths and amounts of inter-path latency variance. They were conducted using multipath network and packet reorderer implementations implemented within the Emulab testbed. Our results demonstrate that a simple best-effort reordering service can insulate TCP from the type of reordering that might be expected from use of packet dispersion over disjoint paths in a wide-area network, and is capable of providing significant performance benefits with few ill side-effects.