Chien-Hui LIAO Charles H.-P. WEN
Hotspots occur frequently in 3D multi-core processors (3D-MCPs), and they may adversely impact both the reliability and lifetime of a system. We present a new thermally constrained task scheduler based on a thermal-pattern-aware voltage assignment (TPAVA) to reduce hotspots in and optimize the performance of 3D-MCPs. By analyzing temperature profiles of different voltage assignments, TPAVA pre-emptively assigns different initial operating-voltage levels to cores for reducing temperature increase in 3D-MCPs. The proposed task scheduler consists of an on-line allocation strategy and a new voltage-scaling strategy. In particular, the proposed on-line allocation strategy uses the temperature-variation rates of the cores and takes into two important thermal behaviors of 3D-MCPs that can effectively minimize occurrences of hotspots in both thermally homogeneous and heterogeneous 3D-MCPs. Furthermore, a new vertical-grouping voltage scaling (VGVS) strategy that considers thermal correlation in 3D-MCPs is used to handle thermal emergencies. Experimental results indicate that, when compared to a previous online thermally constrained task scheduler, the proposed task scheduler can reduce hotspot occurrences by approximately 66% (71%) and improve throughput by approximately 8% (2%) in thermally homogeneous (heterogeneous) 3D-MCPs. These results indicate that the proposed task scheduler is an effective technique for suppressing hotspot occurrences and optimizing throughput for 3D-MCPs subject to thermal constraints.
Rong CHEN Cunqian FENG Sisan HE Yi RAO
The extraction of micro-motion parameters is deeply influenced by the precision of estimation on translational motion parameters. Based on the periodicity of micro-motion, the quadratic polynomial fitting is carried out among range delays to align envelope. The micro-motion component of phase information is eliminated by conjugate multiplication after which the translational motion parameters are estimated. Then the translational motion is precisely compensated through the third order polynomial fitting. Results of simulation demonstrate that the algorithm put forward here can realize the precise compensation for translational motion parameters even under an environment with low signal noise ratio (SNR).
Ryusuke IMADA Katsuhide FUJITA
Sponsored search is a mechanism that shows the appropriate advertisements (ads) according to search queries. The orders and payments of ads are determined by the auction. However, the externalities which give effects to CTR and haven't been considered in some existing works because the mechanism with externalities has high computational cost. In addition, some algorithms which can calculate the approximated solution considering the externalities within the polynomial-time are proposed, however, it assumed that one bidder can propose only a single ad. In this paper, we propose the approximation allocation algorithm that one bidder can offer many ads considering externalities. The proposed algorithm employs the concept of the combinatorial auction in order to consider the combinational bids. In addition, the proposed algorithm can find the approximated allocation by the dynamic programming. Moreover, we prove the computational complexity and the monotonicity of the proposed mechanism, and demonstrate computational costs and efficiency ratios by changing the number of ads, slots and maximum bids. The experimental results show that the proposed algorithm can calculate 0.7-approximation solution even though the full search can't find solutions in the limited times.
Wataru NAKAMURA Hirosuke YAMAMOTO Terence CHAN
In this paper, we treat (k, L, n) ramp secret sharing schemes (SSSs) that can detect impersonation attacks and/or substitution attacks. First, we derive lower bounds on the sizes of the shares and random number used in encoding for given correlation levels, which are measured by the mutual information of shares. We also derive lower bounds on the success probabilities of attacks for given correlation levels and given sizes of shares. Next we propose a strong (k, L, n) ramp SSS against substitution attacks. As far as we know, the proposed scheme is the first strong (k, L, n) ramp SSSs that can detect substitution attacks of at most k-1 shares. Our scheme can be applied to a secret SL uniformly distributed over GF(pm)L, where p is a prime number with p≥L+2. We show that for a certain type of correlation levels, the proposed scheme can achieve the lower bounds on the sizes of the shares and random number, and can reduce the success probability of substitution attacks within nearly L times the lower bound when the number of forged shares is less than k. We also evaluate the success probability of impersonation attack for our schemes. In addition, we give some examples of insecure ramp SSSs to clarify why each component of our scheme is essential to realize the required security.
Yuta NAKAHARA Shota SAITO Toshiyasu MATSUSHIMA
A new type of spatially coupled low density parity check (SCLDPC) code is proposed. This code has two benefits. (1) This code requires less number of iterations to correct the erasures occurring through the binary erasure channel in the waterfall region than that of the usual SCLDPC code. (2) This code has lower error floor than that of the usual SCLDPC code. Proposed code is constructed as a coupled chain of the underlying LDPC codes whose code lengths exponentially increase as the position where the codes exist is close to the middle of the chain. We call our code spatially “Mt. Fuji” coupled LDPC (SFCLDPC) code because the shape of the graph representing the code lengths of underlying LDPC codes at each position looks like Mt. Fuji. By this structure, when the proposed SFCLDPC code and the original SCLDPC code are constructed with the same code rate and the same code length, L (the number of the underlying LDPC codes) of the proposed SFCLDPC code becomes smaller and M (the code lengths of the underlying LDPC codes) of the proposed SFCLDPC code becomes larger than those of the SCLDPC code. These properties of L and M enables the above reduction of the number of iterations and the bit error rate in the error floor region, which are confirmed by the density evolution and computer simulations.
Sen MORIYA Kana KIKUCHI Hiroshi SASANO
In this study, we consider techniques to search for high-rate punctured convolutional code (PCC) encoders by rearranging row vectors of identical-encoder generator matrices. One well-known method to obtain a good PCC encoder is to perform an exhaustive search of all candidates. However, this approach is time-intensive. An exhaustive search with a rate RG=1/2 original encoder requires a relatively short time, whereas that with an RG=1/3 or lower original encoder takes significantly longer. The encoders with lower-rate original encoders are expected to create better PCC encoders. Thus, this paper proposes a method that uses exhaustive search results with rate RG=1/2 original encoders, and rearranges row vectors of identical-encoder generator matrices to create PCCs with a lower rate original code. Further, we provide PCC encoders obtained by searches that utilize our method.
Kazuyoshi TSUCHIYA Yasuyuki NOGAMI Satoshi UEHARA
A pseudorandom number generator is widely used in cryptography. A cryptographic pseudorandom number generator is required to generate pseudorandom numbers which have good statistical properties as well as unpredictability. An m-sequence is a linear feedback shift register sequence with maximal period over a finite field. M-sequences have good statistical properties, however we must nonlinearize m-sequences for cryptographic purposes. A geometric sequence is a binary sequence given by applying a nonlinear feedforward function to an m-sequence. Nogami, Tada and Uehara proposed a geometric sequence whose nonlinear feedforward function is given by the Legendre symbol. They showed the geometric sequences have good properties for the period, periodic autocorrelation and linear complexity. However, the geometric sequences do not have the balance property. In this paper, we introduce geometric sequences of two types and show some properties of interleaved sequences of the geometric sequences of two types. These interleaved sequences have the balance property and double the period of the geometric sequences by the interleaved structure. Moreover, we show correlation properties and linear complexity of the interleaved sequences. A key of our observation is that the second type geometric sequence is the complement of the left shift of the first type geometric sequence by half-period positions.
Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA
Scaling supply voltage (VDD) and threshold voltage (Vth) dynamically has a strong impact on energy efficiency of CMOS LSI circuits. Techniques for optimizing VDD and Vth simultaneously under dynamic workloads are thus widely investigated over the past 15 years. In this paper, we refer to the optimum pair of VDD and Vth, which minimizes the energy consumption of a circuit under a specific performance constraint, as a minimum energy point (MEP). Based on the simple transregional models of a CMOS circuit, this paper derives a simple necessary and sufficient condition for the MEP operation. The simple condition helps find the MEP of CMOS circuits. Measurement results using standard-cell based memories (SCMs) fabricated in a 65-nm process technology also validate the condition derived in this paper.
In this paper, a study on the design and implementation of uniform 4-level quantizers for soft-decision decodings for binary linear codes is shown. Simulation results on quantized Viterbi decoding with a 4-level quantizer for the (64,42,8) Reed-Muller code show that the optimum stepsize, which is derived from the cutoff rate, gives an almost optimum error performance. In addition, the simulation results show that the case where the number of optimum codewords is larger than the one for a received sequence causes non-negligible degradation on error performance at high SN ratios of Eb/N0.
Jianbin ZHOU Dajiang ZHOU Li GUO Takeshi YOSHIMURA Satoshi GOTO
This paper presents a measurement-domain intra prediction coding framework that is compatible with compressive sensing (CS)-based image sensors. In this framework, we propose a low-complexity intra prediction algorithm that can be directly applied to measurements captured by the image sensor. We proposed a structural random 0/1 measurement matrix, embedding the block boundary information that can be extracted from the measurements for intra prediction. Furthermore, a low-cost Very Large Scale Integration (VLSI) architecture is implemented for the proposed framework, by substituting the matrix multiplication with shared adders and shifters. The experimental results show that our proposed framework can compress the measurements and increase coding efficiency, with 34.9% BD-rate reduction compared to the direct output of CS-based sensors. The VLSI architecture of the proposed framework is 9.1 Kin area, and achieves the 83% reduction in size of memory bandwidth and storage for the line buffer. This could significantly reduce both the energy consumption and bandwidth in communication of wireless camera systems, which are expected to be massively deployed in the Internet of Things (IoT) era.
Kotaro TERADA Masao YANAGISAWA Nozomu TOGAWA
As application hardware designs and implementations in a short term are required, high-level synthesis is more and more essential EDA technique nowadays. In deep-submicron era, interconnection delays are not negligible even in high-level synthesis thus distributed-register and -controller architectures (DR architectures) have been proposed in order to cope with this problem. It is also profitable to take data-bitwidth into account in high-level synthesis. In this paper, we propose a bitwidth-aware high-level synthesis algorithm using operation chainings targeting Tiled-DR architectures. Our proposed algorithm optimizes bitwidths of functional units and utilizes the vacant tiles by adding some extra functional units to realize effective operation chainings to generate high performance circuits without increasing the total area. Experimental results show that our proposed algorithm reduces the overall latency by up to 47% compared to the conventional approach without area overheads by eliminating unnecessary bitwidths and adding efficient extra FUs for Tiled-DR architectures.
Tao LIU Chengqian XU Yubo LI Xiaoyu CHEN
In this correspondence, two types of multiple binary zero correlation zone (ZCZ) sequence sets with inter-set zero cross-correlation zone (ZCCZ) are constructed. Based on orthogonal matrices with order N×N, multiple binary ZCZ sequence sets with inter-set even and odd ZCCZ lengthes are constructed, each set is an optimal ZCZ sequence set with parameters (2N2, N, N+1)-ZCZ, among these ZCZ sequence sets, sequences possess ideal cross-correlation property within a zone of length 2Z or 2Z+1. These resultant multiple ZCZ sequence sets can be used in quasi-synchronous CDMA systems to remove the inter-cell interference (ICI).
Tso-Cho CHEN Erl-Huei LU Chia-Jung LI Kuo-Tsang HUANG
In this paper, a weighted multiple bit flipping (WMBF) algorithman for decoding low-density parity-check (LDPC) codes is proposed first. Then the improved WMBF algorithm which we call the efficient weighted bit-flipping (EWBF) algorithm is developed. The EWBF algorithm can dynamically choose either multiple bit-flipping or single bit-flipping in each iteration according to the log-likelihood ratio of the error probability of the received bits. Thus, it can efficiently increase the convergence speed of decoding and prevent the decoding process from falling into loop traps. Compared with the parallel weighted bit-flipping (PWBF) algorithm, the EWBF algorithm can achieve significantly lower computational complexity without performance degradation when the Euclidean geometry (EG)-LDPC codes are decoded. Furthermore, the flipping criterion does not require any parameter adjustment.
Feng KE Yue ZHANG Yuanyi DENG Yuehua DING
A relay assignment scheme is proposed in this paper that minimizes the mean delay of transmission for energy harvesting (EH) cooperative communication systems, whose source node and relay nodes are all equipped with energy harvesters. We jointly consider the long-term channel side information (CSI) and energy side information (ESI) of all nodes, and formulate the delay minimization problem as an integer programming problem. To solve this problem, a refined cyclic coordinate method (RCCM) is proposed that considers the cases of fixed-packet-length (FPL) and variable-packet-length (VPL) transmission. Simulation results show that the proposed scheme achieves performance close to that of the real-time relay selection (RRS) scheme with instantaneous CSI and ESI, which gives upper bound of the performance. Moreover, compared with the simple relay rotation (SRR) scheme where each relay has equal service time, the performance of the proposed scheme is significantly improved.
Yasushi FUKUDA Zule XU Takayuki KAWAHARA
In an IoT system, neural networks have the potential to perform advanced information processing in various environments. To clarify this, the robustness of a restricted Boltzmann machine (RBM) used for deep neural networks, such as a deep belief network (DBN), was studied in this paper. Even if memory or logic errors occurred in the circuit operating in the RBM while pre-training the DBN, they did not affect the identification rate of the DBN, showing the robustness of the RBM. In addition, robustness against soft errors was evaluated. The soft errors had almost no influence on the RBM unless they were as large as 1012 times or more in the 50-nm CMOS process.
Muhammad ALFIAN AMRIZAL Atsuya UNO Yukinori SATO Hiroyuki TAKIZAWA Hiroaki KOBAYASHI
Coordinated checkpointing is a widely-used checkpoint/restart protocol for fault-tolerance in large-scale HPC systems. However, this protocol will involve massive amounts of I/O concentration, resulting in considerably high checkpoint overhead and high energy consumption. This paper focuses on speculative checkpointing, a CPR mechanism that allows for temporal distribution of checkpointings to avoid I/O concentration. We propose execution time and energy models for speculative checkpointing, and investigate energy-performance characteristics when speculative checkpointing is adopted in exascale systems. Using these models, we study the benefit of speculative checkpointing over coordinated checkpointing under various realistic scenarios for exascale HPC systems. We show that, compared to coordinated checkpointing, speculative checkpointing can achieve up to a 11% energy reduction at the cost of a relatively-small increase in the execution time. In addition, a significant energy-performance trade-off is expected when the system scale exceeds 1.2 million nodes.
Shunsuke YAGAI Masato OGUCHI Miyuki NAKANO Saneyasu YAMAGUCHI
In data centers, large numbers of computers are run simultaneously. These computers consume an enormous amount of energy. Several challenges related to this issue have been published. An energy-efficient storage management method that cooperates with applications was one effective approach. In this method, data and storage devices are managed using application support and the power consumption of storage devices is significantly decreased. However, existing studies do not take the virtualized environment into account. Recently, many data-intensive applications have been run in a virtualized environment, such as the cloud computing environment. In this paper, we focus on a virtualized environment wherein multiple virtual machines run on a physical computer and a data intensive application runs on each virtual machine. We discuss a method for reducing storage device power consumption using application support. First, we propose two storage management methods using application information. One method optimizes the inter-HDD file layout. This method removes frequently-accessed files from a certain HDD and switches the HDD to power-off mode. To balance loads and reduce seek distances, this method separates a heavily accessed file and consolidates files in a virtual machine with low access frequency. The other method optimizes the intra-HDD file layout, in addition to performing inter-HDD optimization. This method places frequently accessed files near each other. Second, we present our experimental results and demonstrate that the proposed methods can create sufficiently long HDD access intervals that power-off mode can be used, and thereby, reduce the power consumption of storage devices.
Runzi ZHANG Jinlin WANG Yiqiang SHENG Xiao CHEN Xiaozhou YE
Cache affinity has been proved to have great impact on the performance of packet processing applications on multi-core platforms. Flow-based packet scheduling can make the best of data cache affinity with flow associated data and context structures. However, little work on packet scheduling algorithms has been conducted when it comes to instruction cache (I-Cache) affinity in modified pipelining (MPL) architecture for multi-core systems. In this paper, we propose a protocol-aware packet scheduling (PAPS) algorithm aiming at maximizing I-Cache affinity at protocol dependent stages in MPL architecture for multi-protocol processing (MPP) scenario. The characteristics of applications in MPL are analyzed and a mapping model is introduced to illustrate the procedure of MPP. Besides, a stage processing time model for MPL is presented based on the analysis of multi-core cache hierarchy. PAPS is a kind of flow-based packet scheduling algorithm and it schedules flows in consideration of both application-level protocol of flows and load balancing. Experiments demonstrate that PAPS outperforms the Round Robin algorithm and the HRW-based (HRW) algorithm for MPP applications. In particular, PAPS can eliminate all I-Cache misses at protocol dependent stage and reduce the average CPU cycle consumption per packet by more than 10% in comparison with HRW.
This letter describes the development and implementation of the lane detection system accelerated by the neuromorphic hardware. Because the neuromorphic hardware has inherently parallel nature and has constant output latency regardless the size of the knowledge, the proposed lane detection system can recognize various types of lanes quickly and efficiently. Experimental results using the road images obtained in the actual driving environments showed that white and yellow lanes could be detected with an accuracy of more than 94 percent.
Ryuta KAWANO Hiroshi NAKAHARA Ikki FUJIWARA Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO
End-to-end network latency has become an important issue for parallel application on large-scale high performance computing (HPC) systems. It has been reported that randomly-connected inter-switch networks can lower the end-to-end network latency. This latency reduction is established in exchange for a large amount of routing information. That is, minimal routing on irregular networks is achieved by using routing tables for all destinations in the networks. In this work, a novel distributed routing method called LOREN (Layout-Oriented Routing with Entries for Neighbors) to achieve low-latency with a small routing table is proposed for irregular networks whose link length is limited. The routing tables contain both physically and topologically nearby neighbor nodes to ensure livelock-freedom and a small number of hops between nodes. Experimental results show that LOREN reduces the average latencies by 5.8% and improves the network throughput by up to 62% compared with a conventional compact routing method. Moreover, the number of required routing table entries is reduced by up to 91%, which improves scalability and flexibility for implementation.