Yan DING Huaimin WANG Lifeng WEI Songzheng CHEN Hongyi FU Xinhai XU
MapReduce is commonly used as a parallel massive data processing model. When deploying it as a service over the open systems, the computational integrity of the participants is becoming an important issue due to the untrustworthy workers. Current duplication-based solutions can effectively solve non-collusive attacks, yet most of them require a centralized worker to re-compute additional sampled tasks to defend collusive attacks, which makes the worker a bottleneck. In this paper, we try to explore a trusted worker scheduling framework, named VAWS, to detect collusive attackers and assure the integrity of data processing without extra re-computation. Based on the historical results of verification, we construct an Integrity Attestation Graph (IAG) in VAWS to identify malicious mappers and remove them from the framework. To further improve the efficiency of identification, a verification-couple selection method with the IAG guidance is introduced to detect the potential accomplices of the confirmed malicious worker. We have proven the effectiveness of our proposed method on the improvement of system performance in theoretical analysis. Intensive experiments show the accuracy of VAWS is over 97% and the overhead of computation is closed to the ideal value of 2 with the increasing of the number of map tasks in our scheme.
Bu-Ching LIN Juinn-Dar HUANG Jing-Yang JOU
The notion of multiple constant multiplication (MCM) is extensively adopted in digital signal processing (DSP) applications such as finite impulse filter (FIR) designs. A set of adders is utilized to replace regular multipliers for the multiplications between input data and constant filter coefficients. Though many algorithms have been proposed to reduce the total number of adders in an MCM block for area minimization, they do not consider the actual bitwidth of each adder, which may not estimate the hardware cost well enough. Therefore, in this article we propose a bitwidth-aware MCM optimization algorithm that focuses on minimizing the total number of adder bits rather than the adder count. It first builds a subexpression graph based on the given coefficients, derives a set of constraints for adder bitwidth minimization, and then optimally solves the problem through integer linear programming (ILP). Experimental results show that the proposed algorithm can effectively reduce the required adder bit count and outperforms the existing state-of-the-art techniques.
Takayuki NOZAKI Kenta KASAI Kohichi SAKANIWA
In this paper, we propose a message passing decoding algorithm which lowers decoding error rates in the error floor regions for non-binary low-density parity-check (LDPC) codes transmitted over the binary erasure channel (BEC) and the memoryless binary-input output-symmetric (MBIOS) channels. In the case for the BEC, this decoding algorithm is a combination with belief propagation (BP) decoding and maximum a posteriori (MAP) decoding on zigzag cycles, which cause decoding errors in the error floor region. We show that MAP decoding on the zigzag cycles is realized by means of a message passing algorithm. Moreover, we extend this decoding algorithm to the MBIOS channels. Simulation results demonstrate that the decoding error rates in the error floor regions by the proposed decoding algorithm are lower than those by the BP decoder.
Yusuke KOZAWA Toshiyuki AMAGASA Hiroyuki KITAGAWA
Probabilistic frequent itemset mining, which discovers frequent itemsets from uncertain data, has attracted much attention due to inherent uncertainty in the real world. Many algorithms have been proposed to tackle this problem, but their performance is not satisfactory because handling uncertainty incurs high processing cost. To accelerate such computation, we utilize GPUs (Graphics Processing Units). Our previous work accelerated an existing algorithm with a single GPU. In this paper, we extend the work to employ multiple GPUs. Proposed methods minimize the amount of data that need to be communicated among GPUs, and achieve load balancing as well. Based on the methods, we also present algorithms on a GPU cluster. Experiments show that the single-node methods realize near-linear speedups, and the methods on a GPU cluster of eight nodes achieve up to a 7.1 times speedup.
Koichi MORIYAMA Simón Enrique ORTIZ BRANCO Mitsuhiro MATSUMOTO Ken-ichi FUKUI Satoshi KURIHARA Masayuki NUMAO
In standard fighting videogames, users usually prefer playing against other users rather than against machines because opponents controlled by machines are in a rut and users can memorize their behaviors after repetitive plays. On the other hand, human players adapt to each other's behaviors, which makes fighting videogames interesting. Thus, in this paper, we propose an artificial agent for a fighting videogame that can adapt to its users, allowing users to enjoy the game even when playing alone. In particular, this work focuses on combination attacks, or combos, that give great damage to the opponent. The agent treats combos independently, i.e., it is composed of a subagent for predicting combos the user executes, that for choosing combos the agent executes, and that for controlling the whole agent. Human users evaluated the agent compared to static opponents, and the agent received minimal negative ratings.
Akira FUJIMAKI Masamitsu TANAKA Ryo KASAGI Katsumi TAKAGI Masakazu OKADA Yuhi HAYAKAWA Kensuke TAKATA Hiroyuki AKAIKE Nobuyuki YOSHIKAWA Shuichi NAGASAWA Kazuyoshi TAKAGI Naofumi TAKAGI
We describe a large-scale integrated circuit (LSI) design of rapid single-flux-quantum (RSFQ) circuits and demonstrate several reconfigurable data-path (RDP) processor prototypes based on the ISTEC Advanced Process (ADP2). The ADP2 LSIs are made up of nine Nb layers and Nb/AlOx/Nb Josephson junctions with a critical current density of 10kA/cm2, allowing higher operating frequencies and integration. To realize truly large-scale RSFQ circuits, careful design is necessary, with several compromises in the device structure, logic gates, and interconnects, balancing the competing demands of integration density, design flexibility, and fabrication yield. We summarize numerical and experimental results related to the development of a cell-based design in the ADP2, which features a unit cell size reduced to 30-µm square and up to four strip line tracks in the unit cell underneath the logic gates. The ADP LSIs can achieve ∼10 times the device density and double the operating frequency with the same power consumption per junction as conventional LSIs fabricated using the Nb four-layer process. We report the design and test results of RDP processor prototypes using the ADP2 cell library. The RDP processors are composed of many arrays of floating-point units (FPUs) and switch networks, and serve as accelerators in a high-performance computing system. The prototypes are composed of two-dimensional arrays of several arithmetic logic units instead of FPUs. The experimental results include a successful demonstration of full operation and reconfiguration in a 2×2 RDP prototype made up of 11.5k junctions at 45GHz after precise timing design. Partial operation of a 4×4 RDP prototype made up of 28.5k-junctions is also demonstrated, indicating the scalability of our timing design.
Yusuke MIZUNO Kazunobu KONDO Takanori NISHINO Norihide KITAOKA Kazuya TAKEDA
Blind source separation is a technique that can separate sound sources without such information as source location, the number of sources, and the utterance content. Multi-channel source separation using many microphones separates signals with high accuracy, even if there are many sources. However, these methods have extremely high computational complexity, which must be reduced. In this paper, we propose a computational complexity reduction method for blind source separation based on frequency domain independent component analysis (FDICA) and examine temporal data that are effective for source separation. A frame with many sound sources is effective for FDICA source separation. We assume that a frame with a low kurtosis has many sound sources and preferentially select such frames. In our proposed method, we used the log power spectrum and the kurtosis of the magnitude distribution of the observed data as selection criteria and conducted source separation experiments using speech signals from twelve speakers. We evaluated the separation performances by the signal-to-interference ratio (SIR) improvement score. From our results, the SIR improvement score was 24.3dB when all the frames were used, and 23.3dB when the 300 frames selected by our criteria were used. These results clarified that our proposed selection criteria based on kurtosis and magnitude is effective. Furthermore, we significantly reduced the computational complexity because it is proportional to the number of selected frames.
This paper proposes the state observer design for feedforward nonlinear systems with delayed output. It is shown that by using the Lyapunov-Krasovskii functional, the proposed design method ensures the asymptotic stability of estimation error for an arbitrarily large output delay. Finally, an illustrative example is given in order to show the effectiveness of our design method.
Hyun-Tae KIM Jinung AN Chang Wook AHN
In this paper, a new evolutionary approach to recommender systems is presented. The aim of this work is to develop a new recommendation method that effectively adapts and immediately responds to the user's preference. To this end, content-based filtering is judiciously utilized in conjunction with interactive evolutionary computation (IEC). Specifically, a fitness-based truncation selection and a feature-wise crossover are devised to make full use of desirable properties of promising items within the IEC framework. Moreover, to efficiently search for proper items, the content-based filtering is modified in cooperation with data grouping. The experimental results demonstrate the effectiveness of the proposed approach, compared with existing methods.
Masafumi TAKEMATSU Junichi HONDA Yuki KIMURA Kazunori UCHIDA
This paper is concerned with a method to reduce the computation time of the Discrete Ray Tracing Method (DRTM) which was proposed to numerically analyze electromagnetic fields above Random Rough Surfaces (RRSs). The essence of DRTM is firstly to search rays between source and receiver and secondly to compute electric fields based on the traced rays. In the DRTM, the method discretizes not only RRSs but also ray tracing procedure. In order to reduce computation time for ray searching, the authors propose to modify the conventional algorithm discretizing RRSs with equal intervals to a new one which discretizes them with unequal intervals according to their profiles. The authors also use an approximation of Fresnel function which enables us to reduce field computation time. The authors discuss the reduction rate for computation time of the DRTM from the numerical view points of ray searching and field computation. Finally, this paper shows how much computation time is reduced by the new method.
Per-User Unitary Rate Control (PU2RC) performs poorly when the number of users is small and suffers from the sum-rate ceiling effect in the high signal-to-noise ratio (SNR) regime. In this paper, we propose a multimode transmission (MMT) strategy to overcome these inherent shortcomings of PU2RC. In the proposed MMT strategy, the transmitter finds out the optimal transmission mode and schedules users using each user's instantaneous channel quality information (CQI) parameters. First we assume that each user's CQI parameters are perfectly reported in order to introduce the proposed MMT strategy. Then we consider the quantization of CQI parameters using codebooks designed by the Lloyd algorithm. Moreover, we modify the CQI parameters to improve the system's robustness against quantization error. Finally, in order to reduce the quantization error, we design a hierarchical codebook to jointly quantize the modified CQI parameters by considering the correlation between them. Simulation results show that the proposed MMT strategy effectively overcomes the shortcomings of PU2RC and is robust against low quantization level of CQI parameters.
In this paper, we study the impact of opportunistic user scheduling on the outage probability of cognitive radio (CR) multiple-input multiple-output (MIMO) systems in the high power region where the peak transmit power constraint is higher than the peak interference constraint. The primary contributions of this paper are the derivation of exact closed-form expressions of the proposed scheduled CR-MIMO systems for outage probability and asymptotic analysis to quantify the diversity order and signal to noise ratio (SNR) gain. Through exact analytical results, we provide the achievable outage probability of the proposed scheduled systems as a function of SNR. Also, through asymptotic analysis, we show that the scheduled CR-MIMO systems provide some diversity order gain over the non-scheduled CR-MIMO systems which comes from multi-user diversity (MUD). Also, the SNR gain of the proposed scheduled systems is identical to that of the non-scheduled CR-MIMO systems.
The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. It is assumed that warps (or groups of threads) on the DMM and the UMM work synchronously in a round-robin manner. However, warps work asynchronously in real GPUs, in the sense that they are randomly (or arbitrarily) dispatched for execution. The first contribution of this paper is to introduce asynchronous versions of these models in which warps are arbitrarily dispatched. In addition, we assume that threads can execute the “syncthreads” instruction for barrier synchronization. Since the barrier synchronization operation may be costly, we should evaluate and minimize the number of barrier synchronization operations executed by parallel algorithms. The second contribution of this paper is to show a parallel algorithm to the sum of n numbers in optimal computing time and few barrier synchronization steps. Our parallel algorithm computes the sum of n numbers in O(n/w+llog n) time units and O(log l/log w+log log w) barrier synchronization steps using wl threads on the asynchronous UMM with width w and latency l. Since the computation of the sum takes at least Ω(n/w+llog n) time units, this algorithm is time optimal. Finally, we show that the prefix-sums of n numbers can also be computed in O(n/w+llog n) time units and O(log l/log w+log log w) barrier synchronization steps using wl threads.
Shuichi INOKUCHI Takahiro ITO Mitsuhiko FUJIO Yoshihiro MIZOGUCHI
We introduce the notion of 'Composition', 'Union' and 'Division' of cellular automata on groups. A kind of notions of compositions was investigated by Sato [10] and Manzini [6] for linear cellular automata, we extend the notion to general cellular automata on groups and investigated their properties. We observe the all unions and compositions generated by one-dimensional 2-neighborhood cellular automata over Z2 including non-linear cellular automata. Next we prove that the composition is right-distributive over union, but is not left-distributive. Finally, we conclude by showing reformulation of our definition of cellular automata on group which admit more than three states. We also show our formulation contains the representation using formal power series for linear cellular automata in Manzini [6].
Hiroshi YAMADA Shuntaro TONOSAKI Kenji KONO
Infrastructure as a Service (IaaS), a form of cloud computing, is gaining attention for its ability to enable efficient server administration in dynamic workload environments. In such environments, however, updating the software stack or content files of virtual machines (VMs) is a time-consuming task, discouraging administrators from frequently enhancing their services and fixing security holes. This is because the administrator has to upload the whole new disk image to the cloud platform via the Internet, which is not yet fast enough that large amounts of data can be transferred smoothly. Although the administrator can apply incremental updates directly to the running VMs, he or she has to carefully consider the type of update and perform operations on all running VMs, such as application restarts. This is a tedious and error-prone task. This paper presents a technique for synchronizing VMs with less time and lower administrative burden. We introduce the Virtual Disk Image Repository, which runs on the cloud platform and automatically updates the virtual disk image and the running VMs with only the incremental update information. We also show a mechanism that performs necessary operations on the running VM such as restarting server processes, based on the types of files that are updated. We implement a prototype on Linux 2.6.31.14 and Amazon Elastic Compute Cloud. An experiment shows that our technique can synchronize VMs in an order-of-magnitude shorter time than the conventional disk-image-based VM method. Also, we discuss limitations of our technique and some directions for more efficient VM updates.
Takahiro ITO Daisuke ANZAI Jianqing WANG
Tracking capsule endoscope location is one of the promising applications offered by implant body area networks (BANs). When tracking the capsule endoscope location, i.e., continuously localize it, it is effective to take the weighted sum of its past locations to its present location, in other words, to low-pass filter its past locations. Furthermore, creating an exact mathematical model of location transition will improve tracking performance. Therefore, in this paper, we investigate two tracking methods with received signal strength indicator (RSSI)-based localization in order to solve the capsule endoscope location tracking problem. One of the two tracking methods is finite impulse response (FIR) filter-based tracking, which tracks the capsule endoscope location by averaging its past locations. The other one is particle filter-based tracking in order to deal with a nonlinear transition model on the capsule endoscope. However, the particle filter requires that the particle weight is calculated according to its condition (namely, its likelihood value), while the transition model on capsule endoscope location has some model parameters which cannot be estimated from the received wireless signal. Therefore, for the purpose of applying the particle filter to capsule endoscope tracking, this paper makes some modifications in the resampling step of the particle filter algorithm. Our computer simulation results demonstrate that the two tracking methods can improve the performance as compared with the conventional maximum likelihood (ML) localization. Furthermore, we confirm that the particle filter-based tracking outperforms the conventional FIR filter-based tracking by taking the realistic capsule endoscope transition model into consideration.
Daisuke OCHI Hideaki KIMATA Yoshinori KUSACHI Kosuke TAKAHASHI Akira KOJIMA
Due to the recent progress made in camera and network environments, on-line video services enable people around the world to watch or share high-quality HD videos that can record a wider angle without losing objects' details in each image. As a result, users of these services can watch videos in different ways with different ROIs (Regions of Interest), especially when there are multiple objects in a scene, and thus there are few common ways for them to transfer their impressions for each scene directly. Posting messages is currently the usual way but it does not sufficiently enable all users to transfer their impressions. To transfer a user's impressions directly and provide users with a richer video watching experience, we propose a system that enables them to extract their favorite parts of videos as ROI trajectories through simple and intuitive manipulation of their tablet device. It also enables them to share a recorded trajectory with others after stabilizing it in a manner that should be satisfactory to every user. Using statistical analysis of user manipulations, we have demonstrated an approach to trajectory stabilization that can eliminate undesirable or uncomfortable elements due to tablet-specific manipulations. The system's validity has been confirmed by subjective evaluations.
This paper proposes a new optimization problem and several implementation algorithms for energy-efficient clouds where energy efficiency is measured by the number of physical machines that can be removed from operation and turned off. The optimization problem is formulated is such a way that solutions are considered favorable not only when the number of migrations is minimized but also when the resulting layout has more free physical machines which can therefore be turned off to save electricity.
Traditional face swapping technologies require that the faces of source images and target images have similar pose and appearance (usually frontal). For overcoming this limit in applications this paper presents a pose-free face swapping method based on personalized 3D face modeling. By using a deformable 3D shape morphable model, a photo-realistic 3D face is reconstructed from a single frontal view image. With the aid of the generated 3D face, a virtual source image of the person with the same pose as the target face can be rendered, which is used as a source image for face swapping. To solve the problem of illumination difference between the target face and the source face, a color transfer merging method is proposed. It outperforms the original color transfer method in dealing with the illumination gap problem. An experiment shows that the proposed face reconstruction method is fast and efficient. In addition, we have conducted experiments of face swapping in a variety of scenarios such as children's story book, role play, and face de-identification stripping facial information used for identification, and promising results have been obtained.
Youwen ZHU Tsuyoshi TAKAGI Rong HU
Recently, Yuan et al. (IEEE Infocom'13, pp.2652-2660) proposed an efficient secure nearest neighbor (SNN) search scheme on encrypted cloud database. Their scheme is claimed to be secure against the collusion attack of query clients and cloud server, because the colluding attackers cannot infer the encryption/decryption key. In this letter, we observe that the encrypted dataset in Yuan's scheme can be broken by the collusion attack without deducing the key, and present a simple but powerful attack to their scheme. Experiment results validate the high efficiency of our attacking approach. Additionally, we also indicate an upper bound of collusion-resistant ability of any accurate SNN query scheme.