In this paper, we explore possibilities and difficulties to prove super-quadratic formula size lower bounds from the following aspects. First, we consider recursive Boolean functions and prove their general formula size upper bounds. We also discuss recursive Boolean functions based on exact 2-bit functions. We show that their formula complexity are at least Ω(n2). Hence they can be candidate Boolean functions to prove super-quadratic formula size lower bounds. Next, we consider the reason of the difficulty of resolving the formula complexity of the majority function in contrast with the parity function. In particular, we discuss the structure of an optimal protocol partition for the Karchmer-Wigderson communication game.
Toru MANO Takeru INOUE Kimihiro MIZUTANI Osamu AKASHI
Network virtualization is one of the promising technologies that can increase flexibility, diversity, and manageability of networks. Building optimal virtual networks across multiple domains is getting much attention, but existing studies were based on an unrealistic assumption, that is, providers' private information can be disclosed; as is well known, providers never actually do that. In this paper, we propose a new method that solves this multi-domain problem without revealing providers' private information. Our method uses an advanced secure computation technique called multi-party computation (MPC). Although MPC enables existing unsecured methods to optimize virtual networks securely, it requires very large time to finish the optimization due to the MPC's complex distributed protocols. Our method, in contrast, is designed to involve only a small number of MPC operations to find the optimal solution, and it allows providers to execute a large part of the optimization process independently without heavy distributed protocols. Evaluation results show that our method is faster than an existing method enhanced with MPC by several orders of magnitude. We also unveil that our method has the same level of embedding cost.
Ahmed SHALABY Ikki FUJIWARA Michihiro KOIBUCHI
Recently network bandwidth becomes a performance concern particularly for collective communication since bisection bandwidths of supercomputers become far less than their full bisection bandwidths. In this context we propose the use of a network coding technique to reduce the number of unicasts and the size of data transferred in latency-sensitive collective communications in supercomputers. Our proposed network coding scheme has a hierarchical multicasting structure with intra-group and inter-group unicasts. Quantitative analysis show that the aggregate path hop counts by our hierarchical network coding decrease as much as 94% when compared to conventional unicast-based multicasts. We validate these results by cycle-accurate network simulations. In 1,024-switch networks, the network reduces the execution time of collective communications as much as 70%. We also show that our hierarchical network coding is beneficial for any packet size.
Golf is a solitaire game, where the object is to move all cards from a 5×8 rectangular layout of cards to the foundation. A top card in each column may be moved to the foundation if it is either one rank higher or lower than the top card of the foundation. If no cards may be moved, then the top card of the stock may be moved to the foundation. We prove that the generalized version of Golf Solitaire is NP-complete.
Keehang KWON Kyunghwan PARK Mi-Young PARK
To represent interactive objects, we propose a choice-disjunctive declaration statement of the form $S add R$ where S, R are the (procedure or field) declaration statements within a class. This statement has the following semantics: request the user to choose one between S and R when an object of this class is created. This statement is useful for representing interactive objects that require interaction with the user.
Kei KINOSHITA Yoshiki YAMAGUCHI Daisuke TAKANO Tomoyuki OKAMURA Tetsuhiko YAO
This paper seeks to improve power-performance efficiency of embedded systems by the use of dynamic reconfiguration. Programmable logic devices (PLDs) have the competence to optimize the power consumption by the use of partial and/or dynamic reconfiguration. It is a non-exclusive approach, which can use other power-reduction techniques simultaneous, and thus it is applicable to a myriad of systems. The power-performance improvement by dynamic reconfiguration was evaluated through an augmented reality system that translates Japanese into English. It is a wearable and mobile system with a head-mounted display (HMD). In the system, the computing core detects a Japanese word from an input video frame and the translated term will be output to the HMD. It includes various image processing approaches such as pattern recognition and object tracking, and these functions run sequentially. The system does not need to prepare all functions simultaneously, which provides a function by reconfiguration only when it is needed. In other words, by dynamic reconfiguration, the spatiotemporal module-based pipeline can introduce the reduction of its circuit amount and power consumption compared to the naive approach. The approach achieved marked improvements; the computational speed was the same but the power consumption was reduced to around $rac{1}{6}$.
Yu PENG Shouyi YIN Leibo LIU Shaojun WEI
Coarse-grained Reconfigurable Architecture (CGRA) is a promising mobile computing platform that provides both high performance and high energy efficiency. In an application, loop nests are usually mapped onto CGRA for further acceleration, so optimizing the mapping is an important goal for design of CGRAs. Moreover, obviously almost all of mobile devices are powered by batteries, how to reduce energy consumption also becomes one of primary concerns in using CGRAs. This paper makes three contributions: a) Proposing an energy consumption model for CGRA; b) Formulating loop nests mapping problem to minimize the battery charge loss; c) Extract an efficient heuristic algorithm called BPMap. Experiment results on most kernels of the benchmarks and real-life applications show that our methods can improve the performance of the kernels and lower the energy consumption.
The fast multipole method (FMM) for N-body simulations is attracting much attention since it requires minimal communication between computing nodes. We implemented hardware pipelines specialized for the FMM on an FPGA device, the GRAPE-9. An N-body simulation with 1.6×107 particles ran 16 times faster than that on a CPU. Moreover the particle-to-particle stage of the FMM on the GRAPE-9 executed 2.5 times faster than on a GPU in a limited case.
Forty Thieves is a solitaire game with two 52-card decks. The object is to move all cards from ten tableau piles of four cards to eight foundations. Each foundation is built up by suit from ace to king of the same suit, and each tableau pile is built down by suit. You may move the top card from any tableau pile to a tableau or foundation pile, and from the stock to a foundation pile. We prove that the generalized version of Forty Thieves is NP-complete.
Zule XU Seungjong LEE Masaya MIYAHARA Akira MATSUZAWA
We present a time-to-digital converter (TDC) achieving sub-picosecond resolution and high precision for all-digital phase-locked-loops (ADPLLs). The basic idea is using a charge pump to translate time interval into charge, and a successive-approximation-register-analog-to-digital converter (SAR-ADC) to quantize the charge. With this less complex configuration, high resolution, high precision, low power, and small area can be achieved all together. We analyzed the noise contribution from the charge pump and describe detailed design and implementation for sizing the capacitor and transistors, with the awareness of noise and linearity. The analysis demonstrates the proposed TDC capable of sub-picosecond resolution and high precision. Two prototype chips were fabricated in 65nm CMOS with 0.06mm2, and 0.018mm2 core areas, respectively. The achieved resolutions are 0.84ps and 0.80ps, in 8-bit and 10-bit range, respectively. The measured single-shot-precisions range from 0.22 to 0.6ps, and from 0.66 to 1.04ps, respectively, showing consistent trends with the analysis. Compared with state-of-the-arts, best performance balance has been achieved.
Up until now, the best public key encryption with multi-dimensional range query (PKMDRQ) scheme has two problems which need to be resolved. One is that the scheme is selectively secure. The other is that the time of decryption is long. To address these problems, we present a method of converting a predicate encryption supporting inner product (IPE) scheme into a PKMDRQ scheme. By taking advantage of this approach, an instance is also proposed. The comparison between the previous work and ours shows that our scheme is more efficient over the time complexity. Moreover, our scheme is adaptively secure.
Keiko TAGUCHI Andrew FINCH Seiichi YAMAMOTO Eiichiro SUMITA
In this article we present a novel corpus-based method for inducing romanization systems for languages through a bilingual alignment of transliteration word pairs. First, the word pairs are aligned using a non-parametric Bayesian approach, and then for each grapheme sequence to be romanized, a particular romanization is selected according to a user-specified criterion. As far as we are aware, this paper is the only one to describe a method for automatically deriving complete romanization systems. Unlike existing human-derived romanization systems, the proposed method is able to discover induced romanization systems tailored for specific purposes, for example, for use in data mining, or efficient user input methods. Our experiments study the romanization of four totally different languages: Russian, Japanese, Hindi and Myanmar. The first two languages already have standard romanization systems in regular use, Hindi has a large number of diverse systems, and Myanmar has no standard system for romanization. We compare our induced romanization system to existing systems for Russian and Japanese. We find that the systems so induced are almost identical to Russian, and 69% identical to Japanese. We applied our approach to the task of transliteration mining, and used Levenshtein distance as the romanization selection criterion. Our experiments show that our induced romanization system was able to match the performance of the human created system for Russian, and offer substantially improved mining performance for Japanese. We provide an analysis of the mechanism our approach uses to improve mining performance, and also analyse the differences in characteristics between the induced system for Japanese and the official Japanese Nihon-shiki system. In order to investigate the limits of our approach, we studied the romanization of Myanmar, a low-resource language with a large vocabulary of graphemes. We estimate the approximate corpus size required to effectively romanize the most frequency k graphemes in the language for all values of k up to 1800.
Lifeng HE Bin YAO Xiao ZHAO Yun YANG Yuyan CHAO Atsushi OHTA
This paper proposes a graph-theory-based Euler number computing algorithm. According to the graph theory and the analysis of a mask's configuration, the Euler number of a binary image in our algorithm is calculated by counting four patterns of the mask. Unlike most conventional Euler number computing algorithms, we do not need to do any processing of the background pixels. Experimental results demonstrated that our algorithm is much more efficient than conventional Euler number computing algorithms.
Keisuke DOHI Koji OKINA Rie SOEJIMA Yuichiro SHIBATA Kiyoshi OGURI
In this paper, we discuss performance modeling of 3-D stencil computing on an FPGA accelerator with a high-level synthesis environment, aiming for efficient exploration of user-space design parameters. First, we analyze resource utilization and performance to formulate these relationships as mathematical models. Then, in order to evaluate our proposed models, we implement heat conduction simulations as a benchmark application, by using MaxCompiler, which is a high-level synthesis tool for FPGAs, and MaxGenFD, which is a domain specific framework of the MaxCompiler for finite-difference equation solvers. The experimental results with various settings of architectural design parameters show the best combination of design parameters for pipeline structure can be systematically found by using our models. The effects of changing arithmetic accuracy and using data stream compression are also discussed.
Jiang LI Yusuke ATSUMARI Hiromasa KUBO Yuichi OGISHIMA Satoru YOKOTA Hakaru TAMUKOH Masatoshi SEKINE
A processing system with multiple field programmable gate array (FPGA) cards is described. Each FPGA card can interconnect using six I/O (up, down, left, right, front, and back) terminals. The communication network among FPGAs is scalable according to user design. When the system operates multi-dimensional applications, transmission efficiency among FPGA improved through user-adjusted dimensionality and network topologies for different applications. We provide a fast and flexible circuit configuration method for FPGAs of a multi-dimensional FPGA array. To demonstrate the effectiveness of the proposed method, we assess performance and power consumption of a circuit that calculated 3D Poisson equations using the finite difference method.
Zhangjie FU Xingming SUN Qi LIU Lu ZHOU Jiangang SHU
Cloud computing is becoming increasingly popular. A large number of data are outsourced to the cloud by data owners motivated to access the large-scale computing resources and economic savings. To protect data privacy, the sensitive data should be encrypted by the data owner before outsourcing, which makes the traditional and efficient plaintext keyword search technique useless. So how to design an efficient, in the two aspects of accuracy and efficiency, searchable encryption scheme over encrypted cloud data is a very challenging task. In this paper, for the first time, we propose a practical, efficient, and flexible searchable encryption scheme which supports both multi-keyword ranked search and parallel search. To support multi-keyword search and result relevance ranking, we adopt Vector Space Model (VSM) to build the searchable index to achieve accurate search results. To improve search efficiency, we design a tree-based index structure which supports parallel search to take advantage of the powerful computing capacity and resources of the cloud server. With our designed parallel search algorithm, the search efficiency is well improved. We propose two secure searchable encryption schemes to meet different privacy requirements in two threat models. Extensive experiments on the real-world dataset validate our analysis and show that our proposed solution is very efficient and effective in supporting multi-keyword ranked parallel searches.
Ryo KIKUCHI Dai IKARASHI Koki HAMADA Koji CHIDA
Secret sharing (SS) has been extensively studied as for both secure data storage and a fundamental building block for multiparty computation (MPC). Recently, Kikuchi et al. proposed a passively and unconditionally secure conversion protocol that converts from a share of a ramp scheme to another of homomorphic SS scheme. The share-size of the ramp scheme is small, and the homomorphic SS scheme is a class of SS schemes that includes Shamir's and replicated SS schemes, which are convenient for MPC. Therefore, their protocol is a conversion from an SS scheme whose share-size is small to MPC-friendly SS schemes, and can be applied to reduce the amount of data storage while maintaining extendibility to MPC. We propose five unconditionally and actively secure protocols in the honest majority. In this paper, we consider a privacy and correctness as security requirement and does not consider a robustness: A cheat caused by an active adversary must be detected. These protocols consist of two conversion protocols, two reveal protocols and a protocol generating specific randomness. Main protocols among them are two conversion protocols for bilateral conversion between a ramp scheme and linear SS scheme, and the others are building blocks of the main protocols. Linear SS scheme is a subset of homomorphic SS scheme but includes both Shamir's and replicated SS schemes. Therefore, these main protocols are conversions between an SS scheme whose share-size is small to MPC-friendly SS schemes. These main protocols are unconditionally and actively secure so if MPC protocols used after the conversion are actively secure, the whole system involving SS scheme, conversion, and MPC protocols can be unconditionally and actively secure by using our main protocols. One of our two main protocols is the first to convert from MPC-friendly SS schemes to the ramp scheme. This enhances applications, such as secure backup, of the conversion protocol. Other than the two main protocols, we propose a protocol for generating specific randomnesses and two reveal protocols as building blocks. The latter two reveal protocols are actively and unconditionally secure in the honest majority and requires O(n||F||)-bit communication per revealing, and we believe that it is independently interest.
Pham Thanh GIANG Kenji NAKAGAWA
In this paper, we propose a new cross-layer scheme Cooperation between channel Access control and TCP Rate Adaptation (CATRA) aiming to manage TCP flow contention in multi-hop ad hoc networks. CATRA scheme collects useful information from MAC and physical layers to estimate channel utilization of the station. Based on this information, we adjust Contention Window (CW) size to control the contention between stations. It can also achieve fair channel access for fair channel access of each station and the efficient spatial channel usage. Moreover, the fair value of bandwidth allocation for each flow is calculated and sent to the Transport layer. Then, we adjust the sending rate of TCP flow to solve the contention between flows and the throughput of each flow becomes fairer. The performance of CATRA is examined on various multi-hop network topologies by using Network Simulator (NS-2).
Hsiao-Yun LI Shiu-Cheng CHEN Jia-Shiang FU
An artificial transmission line with variable capacitors as its shunt elements, also known as a nonlinear transmission line, can be used to generate pulsed waveforms with short durations. In this work, the variable capacitors are implemented using ferroelectric materials. Analysis and experimental results of such a ferroelectric-based artificial transmission line are presented. The differential equation that describes the nonlinear transmission line is derived and solved. The analytical expression for the solitary waves propagating along the line is found. An artificial transmission line is fabricated using thin-film barium--strontium--titanate capacitors and commercially available chip inductors. The fabrication process of the ferroelectric-based artificial transmission line is described. On-wafer characterization of the line is performed. Measurement results show that, with proper dc bias and substantial input power, a sinusoidal input waveform turns into a bell-shaped pulse train at the output, demonstrating the pulse-shaping capability of the ferroelectric-based artificial transmission line.
Ryo KIKUCHI Koji CHIDA Dai IKARASHI Wakaha OGATA Koki HAMADA Katsumi TAKAHASHI
Secret sharing scheme (SS) has been extensively studied since SSs are important not only for secure data storage but also as a fundamental building block for multiparty computation (MPC). For an application to secure data storage, the share size of SS is an important factor. For an application to a building block for MPC, the extendibility to MPC is needed. Computationally secure SSs and a ramp scheme have a small share size but there have been few studies concerning their MPC. In contrast, there have been many studies about MPC on Shamir's and replicated SSs while their share size is large. We consider an application scenario of SS such as applying SSs to secure data storage service with MPC. In this application, users store their data in servers through SS, and sometimes the servers perform MPC as an optional feature. In this case, the extendibility to MPC is needed and good code-efficiency is preferable. We propose a new computational SS, and show how to convert shares of our SS and a ramp SS to those of multiparty-friendly SS such as Shamir's and replicated SS. This enables one to secretly-share data compactly and extend secretly-shared data to MPC if needed.