Atef IBRAHIM Hamed ELSIMARY Abdullah ALJUMAH
This paper presents novel reconfigurable semi-systolic array architecture for the Smith-Waterman with an affine gap penalty algorithm to align protein sequences optimized for shorter database sequences. This architecture has been modified to enable hardware reuse rather than replicating processing elements of the semi-systolic array in multiple FPGAs. The proposed hardware architecture and the previously published conventional one are described at the Register Transfer Level (RTL) using VHDL language and implemented using the FPGA technology. The results show that the proposed design has significant higher normalized speedup (up to 125%) over the conventional one for query sequence lengths less than 512 residues. According to the UniProtKB/TrEMBL protein database (release 2015_05) statistics, the largest number of sequences (about 80%) have sequence length less than 512 residues that makes the proposed design outperforms the conventional one in terms of speed and area in this sequence lengths range.
Takaaki DEGUCHI Yoshiaki TANIGUCHI Go HASEGAWA Yutaka NAKAMURA Norimichi UKITA Kazuhiro MATSUDA Morito MATSUOKA
In this paper, we propose a workload assignment policy for reducing power consumption by air conditioners in data centers. In the proposed policy, to reduce the air conditioner power consumption by raising the temperature set points of the air conditioners, the temperatures of all server back-planes are equalized by moving workload from the servers with the highest temperatures to the servers with the lowest temperatures. To evaluate the proposed policy, we use a computational fluid dynamics simulator for obtaining airflow and air temperature in data centers, and an air conditioner model based on experimental results from actual data center. Through evaluation, we show that the air conditioners' power consumption is reduced by 10.4% in a conventional data center. In addition, in a tandem data center proposed in our research group, the air conditioners' power consumption is reduced by 53%, and the total power consumption of the whole data center is exhibited to be reduced by 23% by reusing the exhaust heat from the servers.
Qian DENG Li GUO Jiaru LIN Zhihui LIU
In this paper, we propose an efficient regularized zero-forcing (RZF) precoding method that has lower hardware resource requirements and produces a shorter delay to the first transmitted symbol compared with truncated polynomial expansion (TPE) that is based on Neumann series in massive multiple-input multiple-output (MIMO) systems. The proposed precoding scheme, named matrix decomposition-polynomial expansion (MDPE), essentially applies a matrix decomposition algorithm based on polynomial expansion to significantly reduce full matrix multiplication computational complexity. Accordingly, it is suitable for real-time hardware implementations and high-mobility scenarios. Furthermore, the proposed method provides a simple expression that links the optimization coefficients to the ratio of BS/UTs antennas (β). This approach can speed-up the convergence to the matrix inverse by a matrix polynomial with small terms and further reduce computation costs. Simulation results show that the MDPE scheme can rapidly approximate the performance of the full precision RZF and optimal TPE algorithm, while adaptively selecting matrix polynomial terms in accordance with the different β and SNR situations. It thereby obtains a high average achievable rate of the UTs under power allocation.
Kazuaki NAKAMURA Takuya FUNATOMI Atsushi HASHIMOTO Mayumi UEDA Michihiko MINOH
The amount of seasonings used during food preparation is quite important information for modern people to enable them to cook delicious dishes as well as to take care for their health. In this paper, we propose a near real-time automated system for measuring and recording the amount of seasonings used during food preparation. Our proposed system is equipped with two devices: electronic scales and a camera. Seasoning bottles are basically placed on the electronic scales in the proposed system, and the scales continually measure the total weight of the bottles placed on them. When a chef uses a certain seasoning, he/she first picks up the bottle containing it from the scales, then adds the seasoning to a dish, and then returns the bottle to the scales. In this process, the chef's picking and returning actions are monitored by the camera. The consumed amount of each seasoning is calculated as the difference in weight between before and after it is used. We evaluated the performance of the proposed system with experiments in 301 trials in actual food preparation performed by seven participants. The results revealed that our system successfully measured the consumption of seasonings in 60.1% of all the trials.
Nagao OGINO Yuto NAKAMURA Shigehiro ANO
A threshold secret sharing scheme can realize reliable delivery of important content using redundant routes through a network. Furthermore, multicast delivery of threshold secret shared content can achieve efficient resource utilization thanks to the application of multicast and network coding techniques to multiple pieces of the content. Nevertheless, a tradeoff exists between reliability and efficiency if multicast content delivery uses network coding. This paper proposes a flexible multicast delivery scheme for threshold secret shared content that can control the tradeoff between reliability and efficiency. The proposed scheme classifies all the pieces obtained from the original content into multiple groups, and each group is subjected to network coding independently. An optimization procedure is proposed for the multicast delivery scheme, which involves two different heuristic delivery route computation methods applicable to large-scale networks. Evaluation results show that the optimized multicast delivery scheme adopting an appropriate grouping method and classifying the pieces into a suitable number of groups can minimize the required link bandwidth while satisfying a specified content loss probability requirement.
Orthogonal frequency division multiplexing (OFDM) channel estimation is the key technique used in broadband wireless networks. The Doppler frequency caused by fast mobility environments will cause inter-carrier interference (ICI) and degrade the performance of OFDM systems. Due to the severe ICI, channel estimation becomes a difficult task in higher mobility scenarios. Our aim is to propose a pilot-aided channel estimation method that is robust to high Doppler frequency with low computational complexity and pilot overheads. In this paper, the time duration of each estimate covers multiple consecutive OFDM symbols, named a “window”. A close-form of polynomial channel modeling is derived. The proposed method is initialized to the least squares (LS) estimates of the channels corresponding to the time interval of the pilot symbols within the window. Then, the channel interpolation is performed in the entire window. The results of computer simulations and computation complexity evaluations show that the proposed technique is robust to high Doppler frequency with low computation complexity and low pilot overheads. Compared with the state-of-the-art method and some conventional methods, the new technique proposed here has much lower computational complexity while offering comparable performance.
Anxin LI Anass BENJEBBOUR Xiaohang CHEN Huiling JIANG Hidetoshi KAYAMA
Non-orthogonal multiple access (NOMA) utilizing the power domain and advanced receiver has been considered as one promising multiple access technology for further cellular enhancements toward the 5th generation (5G) mobile communications system. Most of the existing investigations into NOMA focus on the combination of NOMA with orthogonal frequency division multiple access (OFDMA) for either downlink or uplink. In this paper, we investigate NOMA for uplink with single carrier-frequency division multiple access (SC-FDMA) being used. Differently from OFDMA, SC-FDMA requires consecutive resource allocation to a user equipment (UE) in order to achieve low peak to average power ratio (PAPR) transmission by the UE. Therefore, sophisticated designs of scheduling algorithm for NOMA with SC-FDMA are needed. To this end, this paper investigates the key issues of uplink NOMA scheduling such as UE grouping method and resource widening strategy. Because the optimal schemes have high computational complexity, novel schemes with low computational complexity are proposed for practical usage for uplink resource allocation of NOMA with SC-FDMA. On the basis of the proposed scheduling schemes, the performance of NOMA is investigated by system-level simulations in order to provide insights into the suitability of using NOMA for uplink radio access. Key issues impacting NOMA performance are evaluated and analyzed, such as scheduling granularity, UE number and the combination with fractional frequency reuse (FFR). Simulation results verify the effectiveness of the proposed algorithms and show that NOMA is a promising radio access technology for 5G systems.
Takuya NISHIDA Yu-ichi HAYASHI Takaaki MIZUKI Hideaki SONE
Assume that Alice, Bob, and Carol, each of whom privately holds a one-bit input, want to learn the output of some Boolean function, say the majority function, of their inputs without revealing more of their own secret inputs than necessary. In this paper, we show that such a secure three-input function evaluation can be performed with a deck of real cards; specifically, the three players can learn only the output of the function using eight physical cards — four black and four red cards — with identical backs.
Mohd Anuaruddin BIN AHMADON Shingo YAMAGUCHI
The number of states is a very important matter for model checking approach in Petri net's analysis. We first gave a formal definition of state number calculation problem: For a Petri net with an initial state (marking), how many states does it have? Next we showed the problem cannot be solved in polynomial time for a popular subclass of Petri nets, known as free choice workflow nets, if P≠NP. Then we proposed a polynomial time algorithm to solve the problem by utilizing a representational bias called as process tree. We also showed effectiveness of the algorithm through an application example.
Toru MANO Takeru INOUE Kimihiro MIZUTANI Osamu AKASHI
Network virtualization is one of the promising technologies that can increase flexibility, diversity, and manageability of networks. Building optimal virtual networks across multiple domains is getting much attention, but existing studies were based on an unrealistic assumption, that is, providers' private information can be disclosed; as is well known, providers never actually do that. In this paper, we propose a new method that solves this multi-domain problem without revealing providers' private information. Our method uses an advanced secure computation technique called multi-party computation (MPC). Although MPC enables existing unsecured methods to optimize virtual networks securely, it requires very large time to finish the optimization due to the MPC's complex distributed protocols. Our method, in contrast, is designed to involve only a small number of MPC operations to find the optimal solution, and it allows providers to execute a large part of the optimization process independently without heavy distributed protocols. Evaluation results show that our method is faster than an existing method enhanced with MPC by several orders of magnitude. We also unveil that our method has the same level of embedding cost.
Golf is a solitaire game, where the object is to move all cards from a 5×8 rectangular layout of cards to the foundation. A top card in each column may be moved to the foundation if it is either one rank higher or lower than the top card of the foundation. If no cards may be moved, then the top card of the stock may be moved to the foundation. We prove that the generalized version of Golf Solitaire is NP-complete.
In this paper, we explore possibilities and difficulties to prove super-quadratic formula size lower bounds from the following aspects. First, we consider recursive Boolean functions and prove their general formula size upper bounds. We also discuss recursive Boolean functions based on exact 2-bit functions. We show that their formula complexity are at least Ω(n2). Hence they can be candidate Boolean functions to prove super-quadratic formula size lower bounds. Next, we consider the reason of the difficulty of resolving the formula complexity of the majority function in contrast with the parity function. In particular, we discuss the structure of an optimal protocol partition for the Karchmer-Wigderson communication game.
Keisuke DOHI Koji OKINA Rie SOEJIMA Yuichiro SHIBATA Kiyoshi OGURI
In this paper, we discuss performance modeling of 3-D stencil computing on an FPGA accelerator with a high-level synthesis environment, aiming for efficient exploration of user-space design parameters. First, we analyze resource utilization and performance to formulate these relationships as mathematical models. Then, in order to evaluate our proposed models, we implement heat conduction simulations as a benchmark application, by using MaxCompiler, which is a high-level synthesis tool for FPGAs, and MaxGenFD, which is a domain specific framework of the MaxCompiler for finite-difference equation solvers. The experimental results with various settings of architectural design parameters show the best combination of design parameters for pipeline structure can be systematically found by using our models. The effects of changing arithmetic accuracy and using data stream compression are also discussed.
Forty Thieves is a solitaire game with two 52-card decks. The object is to move all cards from ten tableau piles of four cards to eight foundations. Each foundation is built up by suit from ace to king of the same suit, and each tableau pile is built down by suit. You may move the top card from any tableau pile to a tableau or foundation pile, and from the stock to a foundation pile. We prove that the generalized version of Forty Thieves is NP-complete.
Ryo KIKUCHI Dai IKARASHI Koki HAMADA Koji CHIDA
Secret sharing (SS) has been extensively studied as for both secure data storage and a fundamental building block for multiparty computation (MPC). Recently, Kikuchi et al. proposed a passively and unconditionally secure conversion protocol that converts from a share of a ramp scheme to another of homomorphic SS scheme. The share-size of the ramp scheme is small, and the homomorphic SS scheme is a class of SS schemes that includes Shamir's and replicated SS schemes, which are convenient for MPC. Therefore, their protocol is a conversion from an SS scheme whose share-size is small to MPC-friendly SS schemes, and can be applied to reduce the amount of data storage while maintaining extendibility to MPC. We propose five unconditionally and actively secure protocols in the honest majority. In this paper, we consider a privacy and correctness as security requirement and does not consider a robustness: A cheat caused by an active adversary must be detected. These protocols consist of two conversion protocols, two reveal protocols and a protocol generating specific randomness. Main protocols among them are two conversion protocols for bilateral conversion between a ramp scheme and linear SS scheme, and the others are building blocks of the main protocols. Linear SS scheme is a subset of homomorphic SS scheme but includes both Shamir's and replicated SS schemes. Therefore, these main protocols are conversions between an SS scheme whose share-size is small to MPC-friendly SS schemes. These main protocols are unconditionally and actively secure so if MPC protocols used after the conversion are actively secure, the whole system involving SS scheme, conversion, and MPC protocols can be unconditionally and actively secure by using our main protocols. One of our two main protocols is the first to convert from MPC-friendly SS schemes to the ramp scheme. This enhances applications, such as secure backup, of the conversion protocol. Other than the two main protocols, we propose a protocol for generating specific randomnesses and two reveal protocols as building blocks. The latter two reveal protocols are actively and unconditionally secure in the honest majority and requires O(n||F||)-bit communication per revealing, and we believe that it is independently interest.
Ryo KIKUCHI Koji CHIDA Dai IKARASHI Wakaha OGATA Koki HAMADA Katsumi TAKAHASHI
Secret sharing scheme (SS) has been extensively studied since SSs are important not only for secure data storage but also as a fundamental building block for multiparty computation (MPC). For an application to secure data storage, the share size of SS is an important factor. For an application to a building block for MPC, the extendibility to MPC is needed. Computationally secure SSs and a ramp scheme have a small share size but there have been few studies concerning their MPC. In contrast, there have been many studies about MPC on Shamir's and replicated SSs while their share size is large. We consider an application scenario of SS such as applying SSs to secure data storage service with MPC. In this application, users store their data in servers through SS, and sometimes the servers perform MPC as an optional feature. In this case, the extendibility to MPC is needed and good code-efficiency is preferable. We propose a new computational SS, and show how to convert shares of our SS and a ramp SS to those of multiparty-friendly SS such as Shamir's and replicated SS. This enables one to secretly-share data compactly and extend secretly-shared data to MPC if needed.
Qiang YAO Keita TAKAHASHI Toshiaki FUJII
In recent years, ray space (or light field in other literatures) photography has become popular in the area of computer vision and image processing, and the capture of a ray space has become more significant to these practical applications. In order to handle the huge data problem in the acquisition stage, original data are compressively sampled in the first place and completely reconstructed later. In this paper, in order to achieve better reconstruction quality and faster reconstruction speed, we propose a statistically weighted model in the reconstruction of compressively sampled ray space. This model can explore the structure of ray space data in an orthogonal basis, and integrate this structure into the reconstruction of ray space. In the experiment, the proposed model can achieve much better reconstruction quality for both 2D image patch and 3D image cube cases. Especially in a relatively low sensing ratio, about 10%, the proposed method can still recover most of the low frequency components which are of more significance for representation of ray space data. Besides, the proposed method is almost as good as the state-of-art technique, dictionary learning based method, in terms of reconstruction quality, and the reconstruction speed of our method is much faster. Therefore, our proposed method achieves better trade-off between reconstruction quality and reconstruction time, and is more suitable in the practical applications.
Xinjie GUAN Xili WAN Ryoichi KAWAHARA Hiroshi SAITO
With the advent of high speed links, online flow measurement for, e.g., flow round trip time (RTT), has become difficult due to the enormous demands placed on computational resources. Most existing measurement methods are designed to count the numbers of flows or sizes of flows, but we address the flow RTT measurement, which is an important QoS metric for network management and cannot be measured with existing measurement methods. We first adapt a standard Bloom Filter (BF) for the flow RTT distribution estimation. However, due to the existence of multipath routing and Syn flooding attacks, the standard BF does not perform well. We further design the double-deletion bloom filter (DDBF) scheme, which alleviates potential hash collisions of the standard BF by explicitly deleting used records and implicitly deleting out-of-date records. Because of these double deletion operations, the DDBF accurately estimates the RTT distribution of TCP flows with limited memory space, even with the appearance of multipath routing and Syn flooding attacks. Theoretical analysis indicates that the DDBF scheme achieves a higher accuracy with a constant and smaller amount of memory compared with the standard BF. In addition, we validate our scheme using real traces and demonstrate significant memory-savings without degrading accuracy.
Naoya ONIZAWA Warren J. GROSS Takahiro HANYU Vincent C. GAUDET
Stochastic decoding provides ultra-low-complexity hardware for high-throughput parallel low-density parity-check (LDPC) decoders. Asynchronous stochastic decoding was proposed to demonstrate the possibility of low power dissipation and high throughput in stochastic decoders, but decoding might stop before convergence due to “lock-up”, causing error floors that also occur in synchronous stochastic decoding. In this paper, we introduce a wire-delay dependent (WDD) scheduling algorithm for asynchronous stochastic decoding in order to reduce the error floors. Instead of assigning the same delay to all computation nodes in the previous work, different computation delay is assigned to each computation node depending on its wire length. The variation of update timing increases switching activities to decrease the possibility of the “lock-up”, lowering the error floors. In addition, the WDD scheduling algorithm is simplified for the hardware implementation in order to eliminate time-averaging and multiplication functions used in the original WDD scheduling algorithm. BER performance using a regular (1024, 512) (3,6) LDPC code is simulated based on our timing model that has computation and wire delay estimated under ASPLA 90nm CMOS technology. It is demonstrated that the proposed asynchronous decoder achieves a 6.4-9.8× smaller latency than that of the synchronous decoder with a 0.25-0.3 dB coding gain.
Hoon RYU Jung-Lok YU Duseok JIN Jun-Hyung LEE Dukyun NAM Jongsuk LEE Kumwon CHO Hee-Jung BYUN Okhwan BYEON
We discuss a new high performance computing service (HPCS) platform that has been developed to provide domain-neutral computing service under the governmental support from “EDucation-research Integration through Simulation On the Net” (EDISON) project. With a first focus on technical features, we not only present in-depth explanations of the implementation details, but also describe the strengths of the EDISON platform against the successful nanoHUB.org gateway. To validate the performance and utility of the platform, we provide benchmarking results for the resource virtualization framework, and prove the stability and promptness of the EDISON platform in processing simulation requests by analyzing several statistical datasets obtained from a three-month trial service in the initiative area of computational nanoelectronics. We firmly believe that this work provides a good opportunity for understanding the science gateway project ongoing for the first time in Republic of Korea, and that the technical details presented here can be served as an useful guideline for any potential designs of HPCS platforms.