Ervianto ABDULLAH Satoshi FUJITA
Recently, Peer-to-Peer Content Delivery Networks (P2P CDNs) have attracted considerable attention as a cost-effective way to disseminate digital content to paid users in a scalable and dependable manner. However, due to their peer-to-peer nature, they face a threat from “colluders” who pay for the content but illegally share it with unauthorized peers. The detection of colluders is therefore a crucial task for P2P CDNs to preserve the rights of content holders and paid users. In this paper, we propose two colluder detection schemes for P2P CDNs. The first scheme is based on the reputation collected from all peers participating in the network, and the second scheme improves the quality of colluder identification by using a technique that is well known in the field of system-level diagnosis. The performance of the schemes is evaluated by simulation. The simulation results indicate that even when 10% of authorized peers are colluders, our schemes identify all colluders without causing misidentifications.
Hui ZHAO Shuqiang YANG Hua FAN Zhikun CHEN Jinghu XU
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors in improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. Several schedulers are in use for the popular open-source Hadoop MapReduce implementation; however, they cannot optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Given that this is a strongly NP-hard problem, we adopt effective heuristics to seek a satisfactory solution. We formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality task scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
Nan WU Hua WANG Hongjie ZHAO Jingming KUANG
This paper studies the performance of code-aided (CA) soft-information based carrier phase recovery, which iteratively exploits the extrinsic information from the channel decoder to improve the accuracy of phase synchronization. To tackle the problem of strong coupling between phase recovery and decoding, a semi-analytical model is proposed to express the distribution of extrinsic information as a function of phase offset. A piecewise approximation of the hyperbolic tangent function is employed to linearize the expression of the soft symbol decision. Building on this model, the open-loop characteristic and closed-loop performance of the CA iterative soft decision-directed (ISDD) carrier phase synchronizer are derived in closed form. Monte Carlo simulation results corroborate that the proposed expressions are able to characterize the performance of CA ISDD carrier phase recovery for systems with different channel codes.
The throughput rate of Viterbi decoding (VD) is not limited by the speed of functional units when look-ahead computation techniques are used. The disadvantages of the look-ahead computation in VD are the hardware complexity and the decode latency. In this paper, implementation methods of the look-ahead ACS computation are proposed to improve the hardware efficiency and reduce the latency where the hardware efficiency and the latency can be balanced with a single parameter.
Most scientists other than computer scientists do not want to spend effort on performance tuning by rewriting their MPI applications. In addition, the number of processing elements available to them is increasing year by year. On large-scale parallel systems, the number of messages accumulated in a message buffer tends to increase in some of their applications. Since searching the message queue in MPI is time-consuming, scalable system-side acceleration is needed for those systems. In this paper, a support function named LHS (Limited-length Head Separation) is proposed. Its performance in searching the message buffer and its hardware cost are evaluated. LHS accelerates message buffer searching by switching the location in which limited-length heads of messages are stored. It exploits effects such as an increased cache hit rate on the host, with partial off-loading to hardware. With LHS on an FPGA-based network interface card (NIC) named DIMMnet-2, message buffer searching is accelerated 14.3 times when the order of message reception differs from the receiver's expectation. This absolute performance is 38.5 times higher than that of IBM BlueGene/P, although the frequency is 8.5 times lower than that of BlueGene/P. LHS has higher scalability than ALPU in performance per frequency. Since these results are obtained with partially on-loaded linear searching on an old Pentium®4, the performance gap will increase with a state-of-the-art CPU. Therefore, LHS is more suitable for larger parallel systems. Discussions on adopting the proposed method in state-of-the-art processors and systems are also presented.
Ahmadou Dit Adi CISSE Michihiro KOIBUCHI Masato YOSHIMI Hidetsugu IRIE Tsutomu YOSHINAGA
Silicon photonics Networks-on-Chip (NoCs) have emerged as an attractive solution to alleviate the high power consumption of traditional electronic interconnects. In this paper, we propose a fully optical ring NoC that combines static and dynamic wavelength allocation communication mechanisms. A different wavelength-channel is statically allocated to each destination node for lightweight communication. Contention among simultaneous communication requests from multiple source nodes to the same destination is resolved by token-based arbitration for the particular wavelength-channel. For heavy-load communication, a multiwavelength-channel is made available by a request issued at execution time from the source node to a special node that manages dynamic allocation of the shared multiwavelength-channel among all nodes. We combine these static and dynamic communication mechanisms in the same network and introduce selection techniques based on message size and congestion information. Using a photonic NoC simulator based on Phoenixsim, we evaluate our architecture under uniform random, neighbor, and hotspot traffic patterns. Simulation results show that our proposed fully optical ring NoC achieves good performance by using appropriate static and dynamic channels based on the selection techniques. We also show that our architecture can reduce the energy consumption required for arbitration by more than half compared to hybrid photonic ring and mesh NoCs. A comparison with several previous works in terms of architecture hardware cost shows that our architecture can be an attractive, cost-performance-efficient interconnection infrastructure for future SoCs and CMPs.
Zhou JIN Xiao WU Dan NIU Yasuaki INOUE
Recently, the compound element pseudo transient analysis (CEPTA) method has come to be regarded as an efficient, practical method for finding DC operating points of nonlinear circuits when the Newton-Raphson method fails. For the previous CEPTA method, an effective SPICE3 implementation algorithm was proposed that does not expand the Jacobian matrix. However, the limitation on step size was not well considered. As a result, non-convergence problems occur, and simulation efficiency remains a major challenge for current LSI nonlinear circuits, especially for practical large-scale circuits. Therefore, in this paper, we propose a new SPICE3 implementation algorithm and an embedding algorithm, which determines where to insert the pseudo capacitors, for the CEPTA method. The proposed implementation algorithm places no limitation on step size and can significantly improve simulation efficiency. Considering the existence of various types of circuits, we also extend the possible embedding positions. Numerical examples demonstrate the improvement in simulation efficiency and convergence performance.
The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters: the number $p$ of threads, the width $w$ of the memory, and the memory access latency $l$. We first show that the sum of $n$ numbers can be computed in $O(\frac{n}{w}+\frac{nl}{p}+l\log n)$ time units on the DMM and the UMM. We then go on to show that $\Omega(\frac{n}{w}+\frac{nl}{p}+l\log n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of $n$ numbers in $O(\frac{n}{w}+\frac{nl}{p}+l\log n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $\sqrt{n}\times\sqrt{n}$ can be computed in $O(\frac{n}{w}+\frac{nl}{p}+l\log n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
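For reference, the following is a minimal sequential sketch of what the prefix-sums and the summed area table compute. It is not the parallel DMM/UMM algorithm of the paper and does not model the number of threads, the memory width, or the latency. The function names `prefix_sums`, `summed_area_table`, and `region_sum` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def prefix_sums(values):
    """Inclusive prefix sums: out[i] = values[0] + ... + values[i]."""
    return list(accumulate(values))


def summed_area_table(matrix):
    """Return sat where sat[i][j] is the sum of matrix[r][c] for r <= i, c <= j."""
    # Pass 1: prefix sums along each row.
    rows = [prefix_sums(row) for row in matrix]
    # Pass 2: prefix sums down each column of the row-summed matrix.
    cols = [prefix_sums(col) for col in zip(*rows)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


def region_sum(sat, top, left, bottom, right):
    """Sum over the inclusive rectangle [top..bottom] x [left..right]."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]
    if left > 0:
        total -= sat[bottom][left - 1]
    if top > 0 and left > 0:
        # This corner was subtracted twice, so add it back once.
        total += sat[top - 1][left - 1]
    return total
```

Once the table is built, any rectangular sum needs at most four table lookups, which is why the table is commonly built from two passes of prefix sums.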
Hiroyuki MIYAZAKI Tatsunori OBARA Fumiyuki ADACHI
In this paper, joint transmit/receive frequency-domain equalization (FDE) is proposed for analog network coded (ANC) single-carrier (SC) bi-directional multi-antenna relay. In the proposed scheme, diversity transmission using transmit FDE is performed at the relay station (RS) equipped with multiple antennas, while receive FDE is carried out at the base station (BS) and the mobile terminal (MT), each equipped with a single antenna. The transmit and receive FDE weights are jointly optimized so as to minimize the end-to-end mean square error (MSE). We evaluate the throughput performance by computer simulation and show that the joint transmit/receive FDE obtains spatial and frequency diversity gains and accordingly achieves better throughput performance than either transmit FDE only or receive FDE only. It is also shown that ANC SC bi-directional multi-antenna relay can extend the communication coverage area for a given required throughput compared to conventional direct transmission.
Guifang SHAO Wupeng HONG Tingna WANG Yuhua WEN
An improved genetic algorithm is employed to optimize the structure of (C60)N (N≤25) fullerene clusters with the lowest energy. First, crossover with variable precision, realized by introducing the Hamming distance, is developed to provide a faster search mechanism. Second, bit string mutation and feedback mutation are incorporated to maintain diversity in the population. The interaction between C60 molecules is described by the Pacheco and Ramalho potential derived from first-principles calculations. We compare the performance of the Improved GA (IGA) with that of the Standard GA (SGA). The numerical and graphical results verify that the proposed approach is faster and more robust than the SGA. The second finite difference of the total energy shows that the (C60)N clusters with N=7, 13, 22 are particularly stable. Performance with the lowest energy is achieved in this work.
This paper addresses high-level synthesis (HLS) using dual-edge-triggered flip-flops (DETFFs) as memory elements. In DETFF-based HLS, the duty cycle becomes a manageable resource for improving timing performance. To utilize the duty cycle radically, a programmable duty cycle (PDC) mechanism is built into this HLS and captured by a new HLS task named PDC scheduling. As a first step toward DETFF-based HLS with PDC, the execution time minimization problem is formulated for given results of operation scheduling. A linear program is presented to solve this problem in polynomial time. As a next step, the problem of simultaneous operation scheduling and PDC scheduling for the same objective is tackled. A mixed integer linear programming (MILP)-based approach is presented to solve this problem. The experimental results show that the MILP can reduce the execution time for several benchmarks.
Hidenori KUWAKADO Shoichi HIROSE
A hash function is an important primitive for cryptographic protocols. Since the algorithms of well-known hash functions are almost entirely serial, it seems difficult for them to take full advantage of recent multi-core processors. This paper proposes a multilane hashing (MLH) mode that achieves both high parallelism and high security. The MLH mode is designed in such a way that the processing speed is almost linear in the number of processors. Since the MLH mode uses an existing hash function as a black box, it is applicable to any hash function. The bound on the indifferentiability of the MLH mode from a random oracle is beyond the birthday bound on the output length of the underlying primitive.
Yichao LU Gang HE Guifen TIAN Satoshi GOTO
Recently, non-binary low-density parity-check (NB-LDPC) codes have started to show their superiority in achieving significant coding gains when moderate codeword lengths are adopted. However, the overwhelming decoding complexity keeps NB-LDPC codes from being widely employed in modern communication devices. This paper proposes a hybrid message-passing decoding algorithm with very low computational complexity. It achieves competitive error performance compared with the conventional Min-max algorithm. Simulation results on a (255,174) cyclic code show that this algorithm obtains at least 0.5 dB coding gain over other state-of-the-art low-complexity NB-LDPC decoding algorithms. A partial-parallel NB-LDPC decoder architecture for cyclic NB-LDPC codes is also developed based on this algorithm. Optimization schemes are employed to eliminate the storage of hard decision symbols in RAMs and to store only part of the reliability messages. In addition, the variable node units are redesigned specifically for the proposed algorithm. Synthesis results demonstrate that about 24.3% of the gates and 12% of the memory can be saved compared with previous works.
Mitsuru OHTAKE Kousuke TOBARI Masaaki FUTAMOTO
Co/Pd multilayer films are prepared on fcc-Pd underlayers of (001), (011), and (111) orientations hetero-epitaxially grown on MgO single-crystal substrates at room temperature. The effects of underlayer orientation, Co and Pd layer thicknesses, and repetition number of the Co/Pd bi-layer on the structure and the magnetic properties are investigated. fcc-Co/fcc-Pd multilayer films of (001), (011), and (111) orientations grow epitaxially on the Pd underlayers of (001), (011), and (111) orientations, respectively. A flatter and sharper Co/Pd interface is formed in the order of (011) < (111) < (001) orientation. Atomic mixing around the Co/Pd interface is enhanced by deposition of thinner Co and Pd layers, and a Co-Pd alloy phase is formed. As the repetition number increases (that is, as the thicknesses of the Co and Pd layers decrease), perpendicular magnetic anisotropy is promoted. Stronger perpendicular anisotropy is observed in the order of film orientation of (001) < (011) < (111). The perpendicular anisotropy of the Co/Pd multilayer film is considered to originate from two sources: the interface anisotropy and the magnetocrystalline anisotropy associated with Co-Pd lattice shrinkage along the perpendicular direction. To enhance the perpendicular anisotropy of the Co/Pd multilayer film, it is important to align the film orientation to (111) and to enhance the lattice distortion along the perpendicular direction.
Wei TIAN Yue WANG Xiuming SHAN Jian YANG
In this paper, we propose a robust registration method named Bounded-Variables Least Median of Squares (BVLMS). Through the interaction between Bounded-Variables Least Squares (BVLS) and Least Median of Squares (LMS), it overcomes both misassociations and ill-conditioning. Simulation results demonstrate the feasibility of this new registration method.
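To illustrate the Least Median of Squares idea on its own, here is a minimal sketch for a straight-line fit. It is not the BVLMS registration method: it enforces no variable bounds, it is not tied to any registration model, and the exhaustive search over point pairs is an assumption made to keep the example deterministic. The name `lms_line_fit` is hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_line_fit(points):
    """Fit y = slope * x + intercept by minimizing the median squared residual.

    Candidate lines are those passing through each pair of input points
    (an exhaustive variant chosen here for determinism on small inputs).
    Returns (slope, intercept), or None if no valid pair exists.
    """
    best = None
    best_score = float("inf")
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # A vertical line cannot be written as y = slope * x + b.
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        residuals = [(y - (slope * x + intercept)) ** 2 for x, y in points]
        score = median(residuals)
        if score < best_score:
            best_score = score
            best = (slope, intercept)
    return best
```

Because the criterion is the median rather than the sum of squared residuals, a minority of grossly wrong points (for example, misassociated measurements) does not pull the fit away from the majority.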
Jinghua YAN Xiaochun YUN Hao LUO Zhigang WU Shuzhuang ZHANG
Traffic classification has recently gained much attention in both academic and industrial research communities. Many machine learning methods have been proposed to tackle this problem and have shown good results. However, when applied to traffic with out-of-sequence packets, the accuracy of existing machine learning approaches decreases dramatically. We observe that the main reason is that out-of-sequence packets change the spatial representation of feature vectors, so the linear mapping relation among features that machine learning approaches rely on no longer holds. To address this problem, this paper proposes an Improved Dynamic Time Warping (IDTW) method, which can align two feature vectors using non-linear alignment. Experimental results on two real traces show that IDTW achieves better classification accuracy in out-of-sequence traffic classification than existing machine learning approaches.
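For readers unfamiliar with non-linear alignment, the following is a minimal sketch of classic dynamic time warping between two numeric feature vectors. It is the textbook algorithm, not the IDTW method of the paper, whose improvements are not described in the abstract. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between sequences a and b.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j],
    where each step may advance in a, in b, or in both.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # Assumed local cost.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Unlike an element-by-element comparison, this alignment lets one element match several elements of the other sequence, so two vectors that differ only by a repeated or shifted value can still be at distance zero.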
Hiroyuki YASUDA Mikio HASEGAWA
We propose a natural synchronization scheme for wireless uncoupled devices that requires no signal exchange among them. Our proposed scheme uses only natural environmental fluctuations, such as the temperature or humidity of the air and environmental sounds, to synchronize the uncoupled devices. The proposed synchronization is based on the noise-induced synchronization phenomenon, in which uncoupled nonlinear oscillators synchronize with each other merely by adding identical common noise to each of them. According to the theory of this phenomenon, the oscillators can also be synchronized by noise sequences that are not perfectly identical. Since natural environmental fluctuations collected at neighboring locations are similar to each other and their cross-correlation is high, our proposed scheme, which achieves synchronization using only natural environmental fluctuations, can be realized. As an application of the proposed synchronization, we consider wireless sensor networks, for which synchronization is important for reducing power consumption through intermittent data transmission. We collect environmental fluctuations using the wireless sensor network devices. Our results show that the wireless sensor network devices can be synchronized using only the natural signals, such as temperature and humidity, collected independently at each wireless sensor device.
Vakhtang JANDIERI Kiyotoshi YASUMOTO Young-Ki CHO
Electromagnetic scattering and radiation in a cylindrical electromagnetic bandgap (EBG) structure are analyzed. The radiated field from a line source placed inside the eccentric configuration of the cylindrical EBG structure, and from a plane wave incident on the cylindrical EBG structure, is studied numerically based on the method proposed by the authors in their earlier papers. Using the developed formulation, it is shown for the first time that when the cylindrical EBG is illuminated by a plane wave at particular resonance frequencies, the field is strongly enhanced or shaded inside the cylindrical EBG structure, and that this effect depends on the angle of incidence of the plane wave. We give a deep physical insight into this phenomenon based on the Lorentz reciprocity relation for cylindrical structures.
Fanxin ZENG Xiaoping ZENG Zhenyu ZHANG Guixin XUAN
In an orthogonal frequency division multiplexing (OFDM) communication system, two users use the same frequencies and number of sub-carriers so as to increase spectrum efficiency. When the codewords employed by them form a Golay complementary sequence (CS) mate, the system enjoys an upper bound on the peak-to-mean envelope power ratio (PMEPR) as low as 4. This letter presents a construction method for producing S16-QAM and A16-QAM Golay CS mates, which attain the PMEPR upper bound of 4. When used as a Golay CS pair, they have a PMEPR upper bound of 2, which is the same as those in both [18] and [17]. However, neither [18] nor [17] can produce such mates.
Pooia LALBAKHSH Bahram ZAERI Ali LALBAKHSH
The paper introduces a novel pheromone update strategy to improve the functionality of ant colony optimization algorithms. This modification tries to extend the search area through an optimistic reinforcement strategy in which not only the most desirable sub-solution is reinforced in each step, but some of the other partial solutions with acceptable levels of optimality are also favored. It therefore makes the other potential solutions more likely to be selected by subsequent artificial ants, yielding a more exhaustive algorithm through increased overall exploration. The modifications can be adopted in all ant-based optimization algorithms; however, this paper focuses on two static problems: the travelling salesman problem and classification rule mining. To work on these challenging problems, we considered two ACO algorithms, ACS (Ant Colony System) and AntMiner 3.0, and modified their pheromone update strategies. As shown by simulation experiments, the novel pheromone update method can improve the behavior of both algorithms with respect to almost all the performance evaluation metrics.