Guangbo WANG Jianhua WANG Zhencheng GUO
Self-updating encryption (SUE) is a new cryptographic scheme produced in the recent work of Lee, Choi, Lee, Park and Yung (Asiacrypt 2013) to achieve a time-updating mechanism for revocation. In SUE, a ciphetext and a private key are associated with the time and a user can decrypt a ciphertext only if its time is earlier than that of his private key. But one drawback is the encryption computational overhead scales with the size of the time which makes it a possible bottleneck for some applications. To address this problem, we provide a new technique for the SUE that splits the encryption algorithm into two phases: an offline phase and an online phase. In the offline phase, an intermediate ciphertext header is generated before it knows the concrete encryption time. Then an online phase is implemented to rapidly generate an SUE ciphertext header when the time becomes known by making use of the intermediate ciphertext header. In addition, two different online encryption constructions are proposed in view of different time level taking 50% as the boundary. At last, we prove the security of our scheme and provide the performance analysis which shows that the vast majority of computational overhead can be moved to the offline phase. One motivating application for this technique is resource-constrained mobile devices: the preparation work can be done when the mobile devices are plugged into a power source, then they can later rapidly perform SUE operations on the move without significantly consuming the battery.
Ken-ichi IWATA Mitsuharu ARIMURA
A generalization of compression via substring enumeration (CSE) for k-th order Markov sources with a finite alphabet is proposed, and an upper bound of the codeword length of the proposed method is presented. We analyze the worst case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method asymptotically converges to the optimal one for k-th order Markov sources with a finite alphabet if the length n of a source string tends to infinity.
Soramichi AKIYAMA Takahiro HIROFUCHI Ryousei TAKANO Shinichi HONIDEN
Live migration plays an important role on improving efficiency of cloud data centers by enabling dynamically replacing virtual machines (VMs) without disrupting services running on them. Although many studies have proposed acceleration mechanisms of live migration, IO-intensive VMs still suffer from long total migration time due to a large amount of page cache. Existing studies for this problem either force the guest OS to delete the page cache before a migration, or they do not consider dynamic characteristics of cloud data centers. We propose a parallel and adaptive transfer of page cache for migrating IO-intensive VMs which (1) does not delete the page cache and is still fast by utilizing the storage area network of a data center, and (2) achieves the shortest total migration time without tuning hand-crafted parameters. Experiments showed that our method reduces total migration time of IO-intensive VMs up to 33.9%.
Wei-Kai CHENG Jui-Hung HUNG Yi-Hsuan CHIU
As the increasing complexity of chip design, reducing both power consumption and clock skew becomes a crucial research topic in clock network synthesis. Among various clock network synthesis approaches, clock tree has less power consumption in comparison with clock mesh structure. In contrast, clock mesh has a higher tolerance of process variation and hence is easier to satisfy the clock skew constraint. To reduce the power consumption of clock mesh network, an effective way is to minimize the wire capacitance of stub wires. In addition, integration of clock gating and register clustering techniques on clock mesh network can further reduce dynamic power consumption. In this paper, under both enable timing constraint and clock skew constraint, we propose a methodology to reduce the switching capacitance by non-uniform clock mesh synthesis, clock gate insertion and register clustering. In comparison with clock mesh synthesis and clock gating technique individually, experimental results show that our methodology can improve both the clock skew and switching capacitance efficiently.
A Field Programmable Gate Array (FPGA) with fine-grained body biasing shows satisfactory static power reduction. Contrarily, the FPGA incurs high overhead because additional body bias selectors and electrical isolation regions are needed to program the threshold voltage (Vt) of elemental circuits such as MUX, buffer and LUT in the FPGA. In this paper, low overhead design of FPGA with fine-grained body biasing is described. The FPGA is designed and fabricated on 65-nm SOTB CMOS technology. By not only adopting a customized design rule specifying that reliability is verified by TEGs but downsizing a body bias selector, the FPGA tile area becomes small by 39% compared with the conventional design, resulting in 900 FPGA tiles with 4,4000 programmable Vt regions. In addition, the chip performance is evaluated by implementing 32-bit binary counter in the supply voltage range of 0.5V from 1.2V. The counter circuit operates at a frequency of 72MHz and 14MHz with the supply voltage of 1.2V and 0.5V respectively. The static power saving of 80% in elemental circuits of the FPGA at 0.5-V supply voltage and 0.5-V reverse body bias voltage is achieved in the best case. In the whole chip including configuration memory and body bias selector in addition to elemental circuits, effective static power reduction around 30% is maintained by applying 0.3-V reverse body bias voltage at each supply voltage.
Yuechao LU Fumihiko INO Kenichi HAGIHARA
This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU) device. Our proposed method extends a previous method by increasing the cache hit rate so as to speed up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm by utilizing the following three strategies: (1) a loop organization strategy that identifies the best tradeoff point between the cache hit rate and the number of off-chip memory accesses; (2) a data structure that exploits high locality within a layered texture; and (3) a fully pipelined strategy for hiding file input/output (I/O) time with GPU execution and data transfer times. We implement our proposed method on NVIDIA's latest Maxwell architecture and provide tuning guidelines for adjusting the execution parameters, which include the granularity and shape of thread blocks as well as the granularity of I/O data to be streamed through the pipeline, which maximizes reconstruction performance. Our experimental results show that it took less than three minutes to reconstruct a 20483-voxel volume from 1200 20482-pixel projection images on a single GPU; this translates to a speedup of approximately 1.47 as compared to the previous method.
Peixin CHEN Yilun WU Jinshu SU Xiaofeng WANG
The key escrow problem and high computational cost are the two major problems that hinder the wider adoption of hierarchical identity-based signature (HIBS) scheme. HIBS schemes with either escrow-free (EF) or online/offline (OO) model have been proved secure in our previous work. However, there is no much EF or OO scheme that has been evaluated experimentally. In this letter, several EF/OO HIBS schemes are considered. We study the algorithmic complexity of the schemes both theoretically and experimentally. Scheme performance and practicability of EF and OO models are discussed.
Faster-than-Nyquist (FTN) signaling is investigated for quasi-static flat fading massive multiple-input multiple-output (MIMO) systems. In FTN signaling, pulse trains are sent at a symbol rate higher than the Nyquist rate to increase the transmission rate. As a result, inter-symbol interference occurs inevitably for flat fading channels. This paper assesses the information-theoretically achievable rate of MIMO FTN signaling based on the optimum joint equalization and multiuser detection. The replica method developed in statistical physics is used to evaluate the achievable rate in the large-system limit, where the dimensions of input and output signals tend to infinity at the same rate. An analytical expression of the achievable rate is derived for general modulation schemes in the large-system limit. It is shown that FTN signaling does not improve the channel capacity of massive MIMO systems, and that FTN signaling with quadrature phase-shift keying achieves the channel capacity for all signal-to-noise ratios as the symbol period tends to zero.
Tatsuyuki MATSUSHITA Shinji YAMANAKA Fangming ZHAO
Peer-to-peer (P2P) networks have attracted increasing attention in the distribution of large-volume and frequently accessed content. In this paper, we mainly consider the problem of key leakage in secure P2P content distribution. In secure content distribution, content is encrypted so that only legitimate users can access the content. Usually, users (peers) cannot be fully trusted in a P2P network because malicious ones might leak their decryption keys. If the redistribution of decryption keys occurs, copyright holders may incur great losses caused by free riders who access content without purchasing it. To decrease the damage caused by the key leakage, the individualization of encrypted content is necessary. The individualization means that a different (set of) decryption key(s) is required for each user to access content. In this paper, we propose a P2P content distribution scheme resilient to the key leakage that achieves the individualization of encrypted content. We show the feasibility of our scheme by conducting a large-scale P2P experiment in a real network.
This letter considers a cognitive radio (CR) network where multiple secondary downlinks coexist with a primary network. The primary user (PU) is assumed to be protected by the interference outage constraint with only channel distribution information (CDI) being available at the secondary users (SUs). The power allocation problem to maximize the sum outage capacity of the SUs under the interference outage constraint and the transmit power constraint is investigated. Due to the difficulty in obtaining the optimal solution, we propose a heuristic power allocation algorithm based on the bisection search method that can guarantee to satisfy both the interference outage and the transmit power constraints. It is shown that the proposed algorithm converges fast and outperforms other reference algorithms.
Takumi HONDA Yasuaki ITO Koji NAKANO
In this paper, we present a GPU implementation of bulk multiple-length multiplications. The idea of our GPU implementation is to adopt a warp-synchronous programming technique. We assign each multiple-length multiplication to one warp that consists of 32 threads. In parallel processing using multiple threads, usually, it is costly to synchronize execution of threads and communicate within threads. In warp-synchronous programming technique, however, execution of threads in a warp can be synchronized instruction by instruction without any barrier synchronous operations. Also, inter-thread communication can be performed by warp shuffle functions without accessing shared memory. The experimental results show that our GPU implementation on NVIDIA GeForce GTX 980 attains a speed-up factor of 52 for 1024-bit multiple-length multiplication over the sequential CPU implementation. Moreover, we use this 1024-bit multiple-length multiplication for larger size of bits as a sub-routine. The GPU implementation attains a speed-up factor of 21 for 65536-bit multiple-length multiplication.
Junji TAKEMASA Yuki KOIZUMI Toru HASEGAWA
Energy efficiency is an important requirement to forth-coming NDN (Named Data Networking) networks and caching inherent to NDN is a main driver of energy reduction in such networks. This paper addresses the research question “Does caching really reduce the energy consumption of the entire network?”. To answer the question, we precisely estimate how caching reduces energy consumption of forth-coming commercial NDN networks by carefully considering configurations of NDN routers. This estimation reveals that energy reduction due to caching depends on energy-proportionality of NDN routers.
Yusaku HAYAMIZU Tomohiko YAGYU Miki YAMAMOTO
Communication infrastructures under the influence of the disaster strike, e.g., earthquake, will be partitioned due to the significant damage of network components such as base stations. The communication model of the Internet bases on a location-oriented ID, i.e., IP address, and depends on the DNS (Domain Name System) for name resolution. Therefore such damage remarkably deprives the reachability to the information. To achieve robustness of information retrieval in disaster situation, we try to apply CCN/NDN (Content-Centric Networking/Named-Data Networking) to information networks fragmented by the disaster strike. However, existing retransmission control in CCN is not suitable for the fragmented networks with intermittent links due to the timer-based end-to-end behavior. Also, the intermittent links cause a problem for cache behavior. In order to resolve these technical issues, we propose a new packet forwarding scheme with the dynamic routing protocol which resolves retransmission control problem and cache control scheme suitable for the fragmented networks. Our simulation results reveal that the proposed caching scheme can stably store popular contents into cache storages of routers and improve cache hit ratio. And they also reveal that our proposed packet forwarding method significantly improves traffic load, energy consumption and content retrieval delay in fragmented networks.
Zhengyu ZHU Zhongyong WANG Zheng CHU Di ZHANG
In this letter, we consider robust beamforming optimization for a multiuser multiple-input single-output system with simultaneous wireless information and power transmission (SWIPT) for the case of imperfect channel state information. Adopting the ellipsoidal uncertainty on channel vector, the robust beamforming design are reformulated as convex semi-definite programming (SDP) by rank-one relaxation. To reduce the complexity, an ellipsoidal uncertainty on channel covariance is studied to derive the equivalent form of original problem. Simulation results are provided to demonstrate the effectiveness of the proposed schemes.
We define the topological entropy of the discretized Markov transformations. Previously, we obtained the topological entropy of the discretized dyadic transformation. In this research, we obtain the topological entropy of the discretized golden mean transformation. We also generalize this result and give the topological entropy of the discretized Markov β-transformations with the alphabet Σ={0,1,…,k-1} and the set F={(k-1)c,…,(k-1)(k-1)}(1≤c≤k-1) of (k-c) forbidden blocks, whose underlying transformations exhibit a wide class of greedy β-expansions of real numbers.
Norisato SUGA Toshihiro FURUKAWA
In this letter, we show the new signal power estimation method base on the subspace projection. This work mainly contributes to the SINR estimation problem because, in this research, the signal power estimation is implicitly or explicitly performed. The difference between our method and the conventional method related to this topic is the exploitation of the subspace character of the signals constructing the observed signal. As tools to perform subspace operation, we apply orthogonal projection and oblique projection which can extracts desired parameters. In the proposed scheme, the statistics of the projected observed signal by these projection are used to estimate the parameters.
A Reed-Solomon (RS) decoder is designed based on the pipelined recursive Euclidean algorithm in the key equation solution. While the Euclidean algorithm uses less Galois multipliers than the modified Euclidean (ME) and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, division between two elements in Galois field is required. By implementing the division with a multi-cycle Galois inverter and a serial Galois multiplier, the proposed key equation solver architecture achieves lower complexity than the conventional ME and RiBM based architectures. The proposed RS (255,239) decoder reduces the hardware complexity by 25.9% with 6.5% increase in decoding latency.
Tatsuro KOJO Masashi TAWADA Masao YANAGISAWA Nozomu TOGAWA
Non-volatile memories are paid attention to as a promising alternative to memory design. Data stored in them still may be destructed due to crosstalk and radiation. We can restore the data by using error-correcting codes which require extra bits to correct bit errors. Further, non-volatile memories consume ten to hundred times more energy than normal memories in bit-writing. When we configure them using error-correcting codes, it is quite necessary to reduce writing bits. In this paper, we propose a method to generate a bit-write-reducing code with error-correcting ability. We first pick up an error-correcting code which can correct t-bit errors. We cluster its codeswords and generate a cluster graph satisfying the S-bit flip conditions. We assign a data to be written to each cluster. In other words, we generate one-to-many mapping from each data to the codewords in the cluster. We prove that, if the cluster graph is a complete graph, every data in a memory cell can be re-written into another data by flipping at most S bits keeping error-correcting ability to t bits. We further propose an efficient method to cluster error-correcting codewords. Experimental results show that the bit-write-reducing and error-correcting codes generated by our proposed method efficiently reduce energy consumption. This paper proposes the world-first theoretically near-optimal bit-write-reducing code with error-correcting ability based on the efficient coding theories.
Dooheon YANG Minyoung YOON Sangwook NAM
This paper proposes a multiway power divider for wideband (4:1) beamforming arrays. The divider's input reflection characteristic (S11) is achieved using a multisection stepped-impedance transformer. Moreover, the divider's isolation (S32) bandwidth is increased by incorporating inductors and capacitors in addition to the conventional resistor only isolation networks of the divider. The analysis of the proposed divider and comparison with the previous research model was conducted with four-way configuration. A prototype of a wideband eight-way power divider is fabricated and measured. The measured fractional bandwidth is about 137% from 1.3 to 6.8GHz with the -10dB criteria of input reflection (S11), output reflection (S22) and isolation (S32) simultaneously.
In this paper, we propose a novel decomposition method to segment multiple object regions simultaneously in cluttered videos. This method formulates object regions segmentation as a labeling problem in which we assign object IDs to the superpixels in a sequence of video frames so that the unary color matching cost is low, the assignment induces compact segments, and the superpixel labeling is consistent through time. Multi-object segmentation in a video is a combinatorial problem. We propose a binary linear formulation. Since the integer linear programming is hard to solve directly, we relax it and further decompose the relaxation into a sequence of much simpler max-flow problems. The proposed method is guaranteed to converge in a finite number of steps to the global optimum of the relaxation. It also has a high chance to obtain all integer solution and therefore achieves the global optimum. The rounding of the relaxation result gives an N-approximation solution, where N is the number of objects. Comparing to directly solving the integer program, the novel decomposition method speeds up the computation by orders of magnitude. Our experiments show that the proposed method is robust against object pose variation, occlusion and is more accurate than the competing methods while at the same time maintains the efficiency.