Taisaku ISHIWATA Yoshinao SHIRAKI
In this paper, we propose a rectangular weighting function that can be used in the method of iteratively reweighted least squares (IRWLS) for designing equiripple all-pass IIR filters. The purpose of introducing this weighting function is to improve the convergence performance in the solution of the IRWLS. The height of each rectangle is designed to be equal to the local maximum of each ripple, and the width of each rectangle is designed so that the area of each rectangle becomes equal to the area of each ripple. Here, the ripple is the absolute value of the phase error. We show experimentally that the convergence performance in the solution of the IRWLS can be improved by using the proposed weighting function.
Xiongxin ZHAO Zhixiang CHEN Xiao PENG Dajiang ZHOU Satoshi GOTO
In this paper, we propose a synthesizable LDPC decoder IP core for the WiMAX system with high parallelism and enhanced error-correcting performance. By taking the advantages of both layered scheduling and fully-parallel architecture, the decoder can fully support multi-mode decoding specified in WiMAX with the parallelism much higher than commonly used partial-parallel layered LDPC decoder architecture. 6-bit quantized messages are split into bit-serial style and 2bit-width serial processing lines work concurrently so that only 3 cycles are required to decode one layer. As a result, 12∼24 cycles are enough to process one iteration for all the code-rates specified in WiMAX. Compared to our previous bit-serial decoder, it doubles the parallelism and solves the message saturation problem of the bit-serial arithmetic, with minor gate count increase. Power synthesis result shows that the proposed decoder achieves 5.83pJ/bit/iteration energy efficiency which is 46.8% improvement compared to state-of-the-art work. Furthermore, an advanced dynamic quantization (ADQ) technique is proposed to enhance the error-correcting performance in layered decoder architecture. With about 2% area overhead, 6-bit ADQ can achieve the error-correcting performance close to 7-bit fixed quantization with improved error floor performance.
The objective of this research is to design a high-performance NAND flash memory system with a data buffer. The proposed buffer system in the NAND flash memory consists of two parts, i.e., a fully associative temporal buffer for temporal locality and a fully associative spatial buffer for spatial locality. We propose a new operating mechanism for reducing overhead of flash memory, that is, erase and write operations. According to our simulation results, the proposed buffer system can reduce the write and erase operations by about 73% and 79% for spec application respectively, compared with a fully associative buffer with two times more space. Futhermore, the average memory access time can improve by about 60% compared with other large buffer systems.
Cheng ZHANG Bo GU Kyoko YAMORI Sugang XU Yoshiaki TANAKA
Due to network users' different time-preference, network traffic load usually significantly differs at different time. In traffic peak time, network congestion may happen, which make the quality of service for network users deteriorate. There are essentially two ways to improve the quality of services in this case: (1) Network service providers (NSPs) over-provision network capacity by investment; (2) NSPs use time-dependent pricing (TDP) to reduce the traffic at traffic peak time. However, over-provisioning network capacity can be costly. Therefore, some researchers have proposed TDP to control congestion as well as improve the revenue of NSP. But to the best of our knowledge, all of the literature related time-dependent pricing scheme only consider the monopoly NSP case. In this paper, a duopoly NSP case is studied. The NSPs try to maximize their overall revenue by setting time-dependent price, while users choose NSP by considering their own preference, congestion status in the networks and the price set by the NSPs. Analytical and experimental results show that the TDP benefits the NSPs, but the revenue improvement is limited due to the competition effect.
Tomoya OHTA Satoshi DENNO Masahiro MORIKURA
This paper proposes a novel heterodyne multiband multiple-input multiple-output (MIMO) receiver with baseband automatic gain control (AGC) for cognitive radios. The proposed receiver uses heterodyne reception implemented with a wide-passband band-pass filter in the radio frequency (RF) stage to be able to receive signals in arbitrary frequency bands. Even when an RF Hilbert transformer is utilized in the receiver, image-band interference occurs due to the imperfection of the Hilbert transformer. In the receiver, analog baseband AGC is introduced to prevent the baseband signals exceeding the voltage reference of analog-to-digital converters (ADCs). This paper proposes a novel technique to estimate the imperfection of the Hilbert transformer in the heterodyne multiband MIMO receiver with baseband AGC. The proposed technique estimates not only the imperfection of the Hilbert transformer but also the AGC gain ratio, and analog devices imperfection in the feedback loop, which enables to offset the imperfection of the Hilbert transformer. The performance of the proposed receiver is verified by using computer simulations. As a result, the required resolution of the ADC is 9 bits in the proposed receiver. Moreover, the proposed receiver has less computational complexity than that with the baseband interference cancellation unless a frequency band is changed every 9 packets or less.
Guifang SHAO Wupeng HONG Tingna WANG Yuhua WEN
An improved genetic algorithm is employed to optimize the structure of (C60)N (N≤25) fullerene clusters with the lowest energy. First, crossover with variable precision, realized by introducing the hamming distance, is developed to provide a faster search mechanism. Second, the bit string mutation and feedback mutation are incorporated to maintain the diversity in the population. The interaction between C60 molecules is described by the Pacheco and Ramalho potential derived from first-principles calculations. We compare the performance of the Improved GA (IGA) with that of the Standard GA (SGA). The numerical and graphical results verify that the proposed approach is faster and more robust than the SGA. The second finite differential of the total energy shows that the (C60)N clusters with N=7, 13, 22 are particularly stable. Performance with the lowest energy is achieved in this work.
Most of scientists except computer scientists do not want to make efforts for performance tuning with rewriting their MPI applications. In addition, the number of processing elements which can be used by them is increasing year by year. On large-scale parallel systems, the number of accumulated messages on a message buffer tends to increase in some of their applications. Since searching message queue in MPI is time-consuming, system side scalable acceleration is needed for those systems. In this paper, a support function named LHS (Limited-length Head Separation) is proposed. Its performance in searching message buffer and hardware cost are evaluated. LHS accelerates searching message buffer by means of switching location to store limited-length heads of messages. It uses the effects such as increasing hit rate of cache on host with partial off-loading to hardware. Searching speed of message buffer when the order of message reception is different from the receiver's expectation is accelerated 14.3 times with LHS on FPGA-based network interface card (NIC) named DIMMnet-2. This absolute performance is 38.5 times higher than that of IBM BlueGene/P although the frequency is 8.5times slower than BlueGene/P. LHS has higher scalability than ALPU in the performance per frequency. Since these results are obtained with partially on loaded linear searching on old Pentium®4, performance gap will increase using state of art CPU. Therefore, LHS is more suitable for larger parallel systems. The discussions for adopting proposed method to state of art processors and systems are also presented.
Ahmadou Dit Adi CISSE Michihiro KOIBUCHI Masato YOSHIMI Hidetsugu IRIE Tsutomu YOSHINAGA
Silicon photonics Network-on-Chips (NoCs) have emerged as an attractive solution to alleviate the high power consumption of traditional electronic interconnects. In this paper, we propose a fully optical ring NoC that combines static and dynamic wavelength allocation communication mechanisms. A different wavelength-channel is statically allocated to each destination node for light weight communication. Contention of simultaneous communication requests from multiple source nodes to the destination is solved by a token based arbitration for the particular wavelength-channel. For heavy load communication, a multiwavelength-channel is available by requesting it in execution time from source node to a special node that manages dynamic allocation of the shared multiwavelength-channel among all nodes. We combine these static and dynamic communication mechanisms in a same network that introduces selection techniques based on message size and congestion information. Using a photonic NoC simulator based on Phoenixsim, we evaluate our architecture under uniform random, neighbor, and hotspot traffic patterns. Simulation results show that our proposed fully optical ring NoC presents a good performance by utilizing adequate static and dynamic channels based on the selection techniques. We also show that our architecture can reduce by more than half, the energy consumption necessary for arbitration compared to hybrid photonic ring and mesh NoCs. A comparison with several previous works in term of architecture hardware cost shows that our architecture can be an attractive cost-performance efficient interconnection infrastructure for future SoCs and CMPs.
Xiang WANG Yan JIA Ruhua CHEN Hua FAN Bin ZHOU
Text categorization, especially short text categorization, is a difficult and challenging task since the text data is sparse and multidimensional. In traditional text classification methods, document texts are represented with “Bag of Words (BOW)” text representation schema, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used the Wikipedia-concept-based document representation method to take the place of traditional BOW model for text classification. In order to overcome the weakness of ignoring the semantic relationships among terms in document representation model and utilize rich semantic knowledge in Wikipedia, we constructed a semantic matrix to enrich Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
In P2P applications, networks are formed by devices belonging to independent users. Therefore, routing hotspots or routing congestions are typically created by an unanticipated new event that triggers an unanticipated surge of users to request streaming service from some particular nodes; and a challenging problem is how to provide incentive mechanisms to allocation bandwidth more fairly in order to avoid congestion and other short backs for P2P QoS. In this paper, we study P2P bandwidth game — the bandwidth allocation in P2P networks. Unlike previous works which focus either on routing or on forwarding, this paper investigates the game theoretic mechanism to incentivize node's real bandwidth demands and propose novel method that avoid congestion proactively, that is, prior to a congestion event. More specifically, we define an incentive-compatible pricing vector explicitly and give theoretical proofs to demonstrate that our mechanism can provide incentives for nodes to tell the true bandwidth demand. In order to apply this mechanism to the P2P distribution applications, we evaluate our mechanism by NS-2 simulations. The simulation results show that the incentive pricing mechanism can distribute the bandwidth fairly and effectively and can also avoid the routing hotspot and congestion effectively.
The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. We then go on to show that $Omega({nover w}+{nlover p}+llog n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $sqrt{n} imessqrt{n}$ can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
Jing-Chao LI Yi-Bing LI Shouhei KIDERA Tetsuo KIRIMOTO
As a consequence of recent developments in communications, the parameters of communication signals, such as the modulation parameter values, are becoming unstable because of time-varying SNR under electromagnetic conditions. In general, it is difficult to classify target signals that have time-varying parameters using traditional signal recognition methods. To overcome this problem, this study proposes a novel recognition method that works well even for such time-dependent communication signals. This method is mainly composed of feature extraction and classification processes. In the feature extraction stage, we adopt Shannon entropy and index entropy to obtain the stable features of modulated signals. In the classification stage, the interval gray relation theory is employed as suitable for signals with time-varying parameter spaces. The advantage of our method is that it can deal with time-varying SNR situations, which cannot be handled by existing methods. The results from numerical simulation show that the proposed feature extraction algorithm, based on entropy characteristics in time-varying SNR situations,offers accurate clustering performance, and the classifier, based on interval gray relation theory, can achieve a recognition rate of up to 82.9%, even when the SNR varies from -10 to -6 dB.
Hui ZHAO Shuqiang YANG Hua FAN Zhikun CHEN Jinghu XU
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of an MapReduce cluster running lots of independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors to improve computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suitable for MapReduce environment, there are some in-used schedulers for the popular open-source Hadoop MapReduce implementation, however, they can not well optimize both factors. Our main objective is to minimize total flowtime of all jobs, given it's a strong NP-hard problem, we adopt some effective heuristics to seek satisfied solution. In this paper, we formalize the scheduling problem as job selection problem, a load balance aware job selection algorithm is proposed, in task level we design a strict data locality tasks scheduling algorithm for map tasks on map machines and a load balance aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
Shinichi NISHIZAWA Tohru ISHIHARA Hidetoshi ONODERA
This paper propose a structure of standard cells where the P/N boundary ratio of each cell can be independently customized for near-threshold operation. Lowering the supply voltage is one of the most promising approaches for reducing the power consumption of VLSI circuit, however, this causes an increase of imbalance between rise and fall delays for cells having transistor stacks. Conventional cell library with fixed P/N boundary is not efficient to compensate this delay imbalance. Proposed structure achieves individual P/N boundary ratio optimization for each standard cell, therefore it cancels the imbalance between rise and fall delays at the expense of cell area. Proposed structure is verified using measured result of Ring Oscillator circuits and simulation result of benchmark circuits in 65nm CMOS. The experiments with ISCAS'85 benchmark circuits demonstrate that the standard cell library consisting of the proposed cells reduces the power consumption of the benchmark circuits by 16% on average without increasing the circuit area, compared to that of the same circuit synthesized with a library which is not optimized for the near-threshold operation.
Yuelei XIAO Yumin WANG Liaojun PANG Shichong TAN
To solve the problems of the existing trusted network access protocols for Wireless Local Area Network (WLAN) mesh networks, we propose a new trusted network access protocol for WLAN mesh networks, which is abbreviated as WMN-TNAP. This protocol implements mutual user authentication and Platform-Authentication between the supplicant and Mesh Authenticator (MA), and between the supplicant and Authentication Server (AS) of a WLAN mesh network, establishes the key management system for the WLAN mesh network, and effectively prevents the platform configuration information of the supplicant, MA and AS from leaking out. Moreover, this protocol is proved secure based on the extended Strand Space Model (SSM) for trusted network access protocols.
Shin-ya ABE Youhua SHI Kimiyoshi USAMI Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms.
Xinyuan CAI Chunheng WANG Baihua XIAO Yunxue SHAO
Face verification is the task of determining whether two given face images represent the same person or not. It is a very challenging task, as the face images, captured in the uncontrolled environments, may have large variations in illumination, expression, pose, background, etc. The crucial problem is how to compute the similarity of two face images. Metric learning has provided a viable solution to this problem. Until now, many metric learning algorithms have been proposed, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method, which learns an explicit mapping from the original space to an optimal subspace using deep Independent Subspace Analysis (ISA) network. Compared to the linear or kernel based metric learning methods, the proposed deep ISA network is a deep and local learning architecture, and therefore exhibits more powerful ability to learn the nature of highly variable dataset. We evaluate our method on the Labeled Faces in the Wild dataset, and results show superior performance over some state-of-the-art methods.
In this paper, we propose a new buffer map notification scheme for Peer-to-Peer Video-on-Demand systems (P2P VoDs) which support VCR operations such as fast-forward, fast-backward, and seek. To enhance the fluidity of such VCR operations, we need to refine the size of each piece as small as possible. However, such a refinement significantly degrades the performance of buffer map notification schemes with respect to the overhead, piece availability and the efficiency of resource utilizations. The basic idea behind our proposed scheme is to use a piece-based buffer map with a segment-based buffer map in a complementary manner. The result of simulations indicates that the proposed scheme certainly increases the accuracy of the information on the piece availability in the neighborhood with a sufficiently low cost, which reduces the intermittent waiting time of each peer by more than 40% even under a situation in which 50% of peers conduct the fast-forward operation over a range of 30% of the entire video.
Ryousei TAKANO Hidemoto NAKADA Takahiro HIROFUCHI Yoshio TANAKA Tomohiro KUDOH
A virtual machine (VM) migration is useful for improving flexibility and maintainability in cloud computing environments. However, VM monitor (VMM)-bypass I/O technologies, including PCI passthrough and SR-IOV, in which the overhead of I/O virtualization can be significantly reduced, make VM migration impossible. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), for enabling migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without the virtualization overhead during normal operations. SymVirt allows a VMM to cooperate with a message passing layer on the guest OS, then it realizes VM-level migration and checkpoint/restart by using a combination of a user-level dynamic device configuration and coordination of distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and the Open MPI system. All PCI devices, including Infiniband, Ethernet, and Myrinet, are supported without implementing specific para-virtualized drivers; and it is not necessary to modify either of the MPI runtime and applications. Using the proposed mechanism, we demonstrate reactive and proactive FT mechanisms on a virtualized Infiniband cluster. We have confirmed the effectiveness using both a memory intensive micro benchmark and the NAS parallel benchmark.
Md. Ezharul ISLAM Nobuo FUNABIKI Toru NAKANISHI Kan WATANABE
Nowadays, with spreads of inexpensive small communication devices, a number of wireless local area networks (WLANs) have been deployed even in the same building for the Internet access services. Their wireless access-points (APs) are often independently installed and managed by different groups such as departments or laboratories in a university or a company. Then, a user host can access to multiple WLANs by detecting signals from their APs, which increases the energy consumption and the operational cost. It may also degrade the communication performance by increasing interferences. In this paper, we present an AP aggregation approach to solve these problems in multiple WLAN environments by aggregating deployed APs of different groups into limited ones using virtual APs. First, we formulate the AP aggregation problem as a combinatorial optimization problem and prove the NP-completeness of its decision problem. Then, we propose its heuristic algorithm composed of five phases. We verify the effectiveness through extensive simulations using the WIMNET simulator.