In this letter, the tradeoff between symbol rate and diversity gain of Space-Time Block Codes (STBCs) with linear receivers is considered. It is known that Group Orthogonal-Toeplitz Codes (GOTCs) can achieve a good tradeoff with linear receivers. However, the symbol rate of GOTCs is limited to that of the base Orthogonal Space-Time Block Codes (OSTBCs). We propose to simply change the GOTC base codes from OSTBCs to Quasi-Orthogonal Space-Time Block Codes (Q-OSTBCs). Q-OSTBCs can improve the symbol rate of GOTCs at the expense of diversity gain. Simulation results show that Q-OSTBC based GOTCs can improve the tradeoff between symbol rate and diversity gain over that of the original GOTCs.
Dong-Won KUM Ajmal KHAN You-Ze CHO
This paper proposes an efficient broadcast scheme based on traffic density measurement to mitigate broadcast storms in Vehicular Ad Hoc Networks (VANETs). In a VANET, the number of vehicles that rebroadcast a message is closely related to the message's collision ratio, so a well-designed broadcast scheme should take traffic density into account when rebroadcasting a message. The proposed scheme introduces a traffic density measurement method and a broadcast scheme for VANETs. It is based on the slotted p-persistence scheme, but the rebroadcast procedure is enhanced and the rebroadcast probability p is controlled dynamically according to the estimated traffic density. Simulation results demonstrate that the proposed scheme outperforms existing schemes in terms of end-to-end delay and collision ratio.
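The density-to-probability control described above can be sketched as follows. This is an illustrative sketch, not the paper's exact control law: the function names, the reference density `density_ref`, and the clamping bounds `p_min`/`p_max` are all our own assumptions.

```python
import random

def rebroadcast_probability(estimated_density, p_max=1.0, p_min=0.05,
                            density_ref=20.0):
    """Map the estimated local traffic density (vehicles in radio range)
    to a rebroadcast probability: the denser the traffic, the lower p,
    so fewer vehicles rebroadcast and fewer collisions occur."""
    if estimated_density <= 0:
        return p_max  # sparse traffic: always rebroadcast
    p = p_max * density_ref / (density_ref + estimated_density)
    return max(p_min, min(p_max, p))

def should_rebroadcast(estimated_density):
    """Bernoulli trial with the density-dependent probability."""
    return random.random() < rebroadcast_probability(estimated_density)
```

In a slotted p-persistence scheme, a vehicle that decides to rebroadcast would additionally wait a distance-dependent number of slots before transmitting; that timing logic is omitted here.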
Seyed Amir HASHEMI Hassan GHAFOORIFARD Abdolali ABDIPOUR
In this paper, using the Linear Time Variant (LTV) phase noise model and considering the higher-order harmonics generated in the oscillator output signal, a more general formula for the transformation of the excess phase to the output signal is presented. Unlike the basic LTV model, which assumes that the total carrier power lies within the fundamental harmonic, the proposed model assumes that the total carrier power is distributed among all output harmonics. For the first harmonic, the developed expressions reduce to the basic LTV formulas. Simulation and experimental results are used to validate the model.
Mamoru OHARA Takashi YAMAGUCHI
In numerical simulations using massively parallel computers like GPGPUs (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since the computation results are sometimes too large to hold in uncompressed form, it is desirable to compress the data before storing it. In addition, considering the overhead of transferring data between device and host memories, it is preferable to compress the data as part of the parallel computation performed on the devices. Traditional compression methods for floating-point numbers do not always exhibit good parallelism. In this paper, we propose a new compression method for massively parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of the compression ratio and throughput obtained from experimental implementations of the proposed method running on CPUs and GPUs.
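The idea of rearranging successive floating-point values before compression can be illustrated with a byte-plane transpose, a common building block in this family of methods. The sketch below is our own assumption, not the paper's implementation: it groups bytes of equal significance so that a general-purpose compressor (zlib here, standing in for whatever back-end the device would use) sees long runs of similar bytes.

```python
import struct
import zlib

def interleave_bytes(values):
    """Pack doubles and transpose their bytes so that bytes of equal
    significance (e.g., all sign/exponent bytes) become contiguous.
    In a smooth simulation field these byte planes are highly
    redundant and compress well."""
    raw = struct.pack(f'<{len(values)}d', *values)
    width = 8  # bytes per IEEE 754 double
    return bytes(raw[i * width + k]
                 for k in range(width) for i in range(len(values)))

def deinterleave_bytes(planes, count):
    """Inverse transpose: recover the original doubles."""
    width = 8
    raw = bytes(planes[k * count + i]
                for i in range(count) for k in range(width))
    return list(struct.unpack(f'<{count}d', raw))

# a smooth field, like one time step of a typical simulation
field = [1.0 + i * 1e-6 for i in range(1024)]
compressed = zlib.compress(interleave_bytes(field))
restored = deinterleave_bytes(zlib.decompress(compressed), len(field))
```

On a GPU the same transpose maps naturally onto one thread per value, which is what makes this style of preprocessing attractive for in-device compression.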
In wireless sensor networks, unbalanced energy consumption and transmission collisions are two inherent problems that can significantly reduce network lifetime. This letter proposes an unequal clustering and TDMA-like scheduling mechanism (UCTSM) based on a gradient sinking model in wireless sensor networks. It integrates unequal clustering and TDMA-like transmission scheduling to balance the energy consumption among cluster heads and reduce transmission collisions. Simulation results show that UCTSM balances the energy consumption among the cluster heads, saves nodes' energy, and thus improves the network lifetime.
Xiangdong CHEN Gwanggil JEON Jechang JEONG
In this letter, an intra-field deinterlacing algorithm based on a gradient inverse weighted filtering (GIWF) interpolator is proposed. The proposed algorithm consists of three steps. We first interpolate the missing line with simple strategies in the working window. Then we calculate the coefficients of the gradient-inverse-weighted filters by exploiting the local gray-level gradients among the neighboring pixels. Finally, we interpolate the missing line using the proposed GIWF interpolator. Experiments show that the proposed algorithm provides superior performance in terms of both objective and subjective image quality.
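A gradient-inverse-weighted interpolation step of this kind can be sketched for a single missing line as below. This is a generic sketch under our own assumptions (three candidate edge directions and an `eps` term to avoid division by zero), not the paper's exact three-step algorithm.

```python
def giwf_interpolate(above, below, eps=1e-3):
    """Interpolate one missing line of an interlaced field from the
    lines above and below it. Each candidate direction is weighted by
    the inverse of its local gray-level gradient, so low-gradient
    (edge-aligned) directions dominate the result."""
    n = len(above)
    out = []
    for x in range(n):
        candidates = []
        for dx in (-1, 0, 1):                # three candidate edge directions
            xa, xb = x + dx, x - dx
            if 0 <= xa < n and 0 <= xb < n:
                grad = abs(above[xa] - below[xb])
                weight = 1.0 / (grad + eps)  # inverse-gradient weight
                candidates.append((weight, (above[xa] + below[xb]) / 2.0))
        total = sum(w for w, _ in candidates)
        out.append(sum(w * v for w, v in candidates) / total)
    return out
```

On a flat region all directions agree and the filter reduces to plain line averaging; near an edge, the direction crossing the edge gets a large gradient and thus a small weight.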
Arne KUTZNER Pok-Son KIM Won-Kwang PARK
We propose a family of algorithms for efficient merging on contemporary GPUs, such that each algorithm requires O(m log(n/m + 1)) element comparisons, where m and n are the sizes of the input sequences with m ≤ n. According to the lower bounds for merging, all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallel algorithm that splits a merging problem of size 2^l into 2^i subproblems of size 2^(l-i), for an arbitrary i with 0 ≤ i ≤ l. This algorithm represents a merger for i = l, but it is rather inefficient in that case. The efficiency is boosted by moving to a two-stage approach, in which the splitting process stops at some predetermined level and transfers control to several block-mergers operating in parallel. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust::merge function that is part of the Thrust library. To assess the value of our merging technique in the context of sorting, we construct and evaluate a MergeSort on top of it. In our benchmarks the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU-optimized variant of QuickSort.
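The splitting stage can be illustrated on the CPU with the classic "merge path" (co-rank) binary search, which finds, for each target output position, how many elements come from each input so that the subproblems become fully independent. The names and structure below are our own illustrative choices, not the authors' GPU code.

```python
def split_merge(a, b, parts):
    """Partition the merge of sorted sequences a and b into `parts`
    independent subproblems of (near-)equal output size; each piece
    could then be handled by an independent block-merger."""
    n = len(a) + len(b)
    pieces = []
    done_a = done_b = 0
    for p in range(1, parts + 1):
        k = p * n // parts              # end of this piece in the merged output
        lo, hi = max(0, k - len(b)), min(k, len(a))
        while lo < hi:                  # binary search for the co-rank i
            i = (lo + hi) // 2
            if a[i] < b[k - i - 1]:     # i is still too small
                lo = i + 1
            else:
                hi = i
        i, j = lo, k - lo               # i elements from a, j from b so far
        pieces.append((a[done_a:i], b[done_b:j]))
        done_a, done_b = i, j
    return pieces
```

Because the first k merged elements are exactly the k smallest, every element of piece p is no larger than every element of piece p+1, so merging the pieces independently and concatenating yields the full merge.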
Wei ZHANG Li RUAN Mingfa ZHU Limin XIAO Jiajun LIU Xiaolan TANG Yiduo MEI Ying SONG Yuzhong SUN
In order to reduce cost and improve efficiency, many data centers adopt virtualization solutions. Virtualization allows multiple virtual machines to be hosted on a single physical server, but this poses new challenges for resource management. Web workloads, which are dominant in data centers, are known to vary dynamically with time. In order to meet an application's service level agreement (SLA), allocating resources to virtual machines has become an important challenge in virtualized server environments, especially when dealing with fluctuating workloads and complex server applications. User experience is an important manifestation of the SLA and is attracting increasing attention. In this paper, the SLA is defined by server-side response time. Traditional resource allocation based on resource utilization has some drawbacks. We argue that dynamic resource allocation based directly on real-time user experience is more reasonable and also has practical significance. To address the problem, we propose a system architecture that combines response time measurements and analysis of user experience for resource allocation. An optimization model is introduced to dynamically allocate the resources among virtual machines. When resources are insufficient, we provide service differentiation and guarantee the resource requirements of higher-priority applications first. We evaluate our proposal using TPC-W and Webbench. The experimental results show that our system can judiciously allocate system resources, helps stabilize applications' user experience, and reduces the mean deviation of user experience from desired targets.
Naoya MAKI Takayuki NISHIO Ryoichi SHINKUMA Tatsuya MORI Noriaki KAMIYAMA Ryoichi KAWAHARA Tatsuro TAKAHASHI
In content services where people purchase and download large-volume contents, minimizing network traffic is crucial for the service provider and the network operator, since they want to lower the cost charged for bandwidth and the cost of network infrastructure, respectively. Traffic localization is an effective way of reducing network traffic: traffic is localized when a client can obtain the requested content files from a nearby altruistic client instead of from the source servers. The peer-assisted content distribution network (CDN) concept can reduce overall traffic with this mechanism and enables service providers to minimize traffic without deploying or borrowing distributed storage. To localize traffic effectively, content files that are likely to be requested by many clients should be cached locally. This paper presents a novel traffic engineering scheme for peer-assisted CDN models. Its key idea is to control the behavior of clients through a content-oriented incentive mechanism. This approach enables us to optimize traffic flows by letting altruistic clients download the content files that are most likely to contribute to localizing traffic among clients. To make altruistic clients request the desired files, we combine content files while keeping the price equal to that of a single content file. This paper presents a solution for optimizing the selection of content files to be combined so that cross traffic in the network is minimized. We also give a model for analyzing the upper-bound performance, along with numerical results.
Chunghan LEE Hirotake ABE Toshio HIROTSU Kyoji UMEMURA
Predicting network throughput is important for network-aware applications. Network throughput depends on a number of factors, and many throughput prediction methods have been proposed. However, many of these methods suffer from the facts that the distribution of traffic fluctuation is unclear and that the scale and bandwidth of networks are rapidly increasing. Furthermore, virtual machines are used as platforms in many network research and service fields, and they can affect network measurement. A prediction method that uses pairs of differently sized connections has been proposed. This method, which we call the connection pair, features a small probe transfer over TCP that can be used to predict the throughput of a large data transfer. We focus on measurements, analyses, and modeling for precise prediction results. We first clarify that the actual throughput for the connection pair changes non-linearly and monotonically, with noise. Second, we built a previously proposed predictor using the same training data sets as our proposed method, and found it unsuitable for capturing the above characteristics. We propose a throughput prediction method based on the connection pair that uses ν-support vector regression with a polynomial kernel to handle prediction models represented as non-linear, continuous monotonic functions. The prediction results of our method are more accurate than those of the previous predictor. Moreover, under an unstable network state, the drop in accuracy is also smaller than that of the previous predictor.
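To illustrate how a polynomial-kernel regressor can capture a smooth, monotonic probe-to-throughput relationship, here is a minimal pure-Python sketch. Two hedges: we use kernel ridge regression as a lightweight stand-in for ν-SVR (which additionally optimizes an ε-insensitive loss with support-vector sparsity), and every name and parameter below is our own assumption, not the paper's setup.

```python
def poly_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel k(x, y) = (x . y + coef0)^degree."""
    return (sum(xi * yi for xi, yi in zip(x, y)) + coef0) ** degree

def fit_kernel_regressor(X, y, lam=1e-8, degree=3):
    """Kernel ridge regression: solve (K + lam*I) alpha = y by naive
    Gaussian elimination with partial pivoting. The fitted model
    f(x) = sum_i alpha_i k(x_i, x) can represent non-linear,
    continuous monotonic trends such as a probe-to-throughput map."""
    n = len(X)
    A = [[poly_kernel(X[i], X[j], degree) + (lam if i == j else 0.0)
          for j in range(n)] + [y[i]] for i in range(n)]
    for c in range(n):                      # forward elimination
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    alpha = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        alpha[r] = (A[r][n] - sum(A[r][k] * alpha[k]
                                  for k in range(r + 1, n))) / A[r][r]
    return lambda x: sum(a * poly_kernel(xi, x, degree)
                         for a, xi in zip(alpha, X))
```

In practice one would use a library implementation (e.g., an off-the-shelf ν-SVR) rather than this hand-rolled solver; the sketch only shows why a polynomial kernel suits a continuous monotonic target.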
Yasin OGE Takefumi MIYOSHI Hideyuki KAWASHIMA Tsutomu YOSHINAGA
A novel design is proposed to implement highly parallel stream join operators on a field-programmable gate array (FPGA), by adapting the handshake join algorithm for hardware implementation. The proposed design is evaluated in terms of hardware resource usage, maximum clock frequency, and performance. Experimental results indicate that the proposed implementation can handle considerably high input rates, especially at low match rates. Results of simulations conducted to optimize the sizes of the buffers in the join and merge units give new intuition regarding static and adaptive buffer tuning in handshake join.
Chamidu ATUPELAGE Hiroshi NAGAHASHI Masahiro YAMAGUCHI Tokiya ABE Akinori HASHIGUCHI Michiie SAKAMOTO
Histopathology is the microscopic anatomical study of body tissues and is widely used for cancer diagnosis. Generally, pathologists examine the structural deviation of cellular and sub-cellular components to diagnose the malignancy of body tissues. These judgments are often subjective, depending on the pathologist's skill and personal experience. Computational diagnosis tools may circumvent these limitations and improve the reliability of diagnostic decisions. This paper proposes a prostate image classification method that extracts textural behavior using multifractal analysis. Fractal geometry describes the complexity of self-similar structures as a non-integer exponent called the fractal dimension. Natural complex structures (or images) are not self-similar, so a single exponent (the fractal dimension) may not be adequate to describe their complexity. Multifractal analysis has been introduced to describe the complexity as a spectrum of fractal dimensions. Based on multifractal computation for digital images, we obtain two textural feature descriptors: i) local irregularity, α, and ii) global regularity, f(α). We exploit these multifractal feature descriptors with a texton-dictionary-based classification model to discriminate cancerous from non-cancerous tissue in histopathology images of H&E-stained prostate biopsy specimens. Moreover, we examine three other feature descriptors, the Gabor filter bank, the LM filter bank, and Haralick features, to benchmark the performance of the proposed method. Experimental results indicate that the proposed multifractal feature descriptor outperforms the other feature descriptors, achieving over 94% classification accuracy.
Yasumichi TAKAI Masanori HASHIMOTO Takao ONOYE
This paper investigates power gating implementations that mitigate power supply noise. We focus on the body connection of power-gated circuits, and examine the amount of power supply noise induced by power-on rush current as well as the contribution of a power-gated circuit as a decoupling capacitance during the sleep mode. To determine the best implementation, we designed and fabricated a test chip in a 65 nm process. Experimental results from measurement and simulation reveal that the power-gated circuit with a body-tied structure in a triple-well is the best implementation in terms of the following three points: power supply noise due to rush current, the contribution of decoupling capacitance during the sleep mode, and the leakage reduction achieved by power gating.
Norimichi UKITA Kazuki MATSUDA
This paper proposes a method for reconstructing accurate 3D surface points. To this end, robust and dense reconstruction with Shape-from-Silhouettes (SfS) and accurate multiview stereo are integrated. Unlike the gradual shape shrinking and/or brute-force large-space search of existing space carving approaches, our method obtains 3D points by SfS and stereo independently, and then selects the correct ones from among them. The point selection is performed in accordance with the spatial consistency and smoothness of the 3D point coordinates and normals, and the globally optimal points are selected by graph cuts. Experimental results with several subjects containing complex shapes demonstrate that our method outperforms existing approaches as well as our previous method.
Junbo ZHANG Fuping PAN Bin DONG Qingwei ZHAO Yonghong YAN
This paper presents our investigation into improving the performance of our previous automatic reading quality assessment system. The baseline system calculates the average Phone Log-Posterior Probability (PLPP) over all phones in the speech to be assessed, and uses this average as the reading quality assessment feature. In this paper, we present three improvements. First, we cluster the triphones, calculate the average of the normalized PLPP for each cluster separately, and use these averages as a multi-dimensional assessment feature vector instead of the original one-dimensional feature. This method is simple but effective, reducing the difference between machine and manual scores by a relative 30.2%. Second, in order to assess the reading rhythm, we train Gaussian Mixture Models (GMMs) that capture each triphone's relative duration under standard pronunciation. Using the GMMs, we calculate the probability that the relative duration of each phone conforms to the standard pronunciation, and the average of these probabilities is added to the assessment feature vector as another dimension, which reduced the score difference between machine and manual scoring by a relative 9.7%. Third, we detect Filled Pauses (FP) by analyzing the formant curve, calculate the relative duration of the FPs, and add it to the assessment feature vector as a further dimension. This reduced the score difference between machine and manual scoring by a further relative 10.2%. Finally, when the features extracted by the three methods are used together, the score difference between machine and manual scoring is reduced by a relative 43.9% compared to the baseline system.
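The first improvement, per-cluster averaging of normalized PLPP values, amounts to a simple grouped mean. The sketch below is illustrative only: the triphone clustering and the PLPP normalization themselves are outside its scope, and all names are our own.

```python
def plpp_features(phone_scores, cluster_of):
    """Turn a list of (triphone, normalized PLPP) pairs into a
    multi-dimensional feature: one average per triphone cluster,
    instead of a single global average."""
    sums = {}
    for phone, plpp in phone_scores:
        c = cluster_of[phone]
        total, count = sums.get(c, (0.0, 0))
        sums[c] = (total + plpp, count + 1)
    return {c: total / count for c, (total, count) in sums.items()}
```

The resulting per-cluster averages form the feature vector handed to the scoring model, one dimension per cluster.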
The impact of clock skew on circuit timing increases rapidly as technology scales. As a result, it becomes important to deal with clock skew at the early stages of circuit design. This paper presents a novel datapath design that aims at mitigating the impact of clock skew in high-level synthesis, by integrating margin (evaluated as the maximum number of clock cycles available to absorb clock skew) and ordered clocking into high-level synthesis tasks. As a first attempt at the proposed datapath design, this paper presents a 0-1 integer linear programming formulation that focuses on register binding to achieve the minimum cost (the minimum number of registers) under a given scheduling result. Experimental results show that optimal results can be obtained without increasing the latency, with only a few extra registers compared to traditional high-level synthesis designs.
Bin SHENG Pengcheng ZHU Xiaohu YOU
In OFDM systems, the correlation of the cyclic prefix (CP) with its corresponding part at the end of the symbol can be used to estimate the maximum Doppler spread. However, the estimation accuracy of this CP-based method is seriously affected by the inter-symbol interference (ISI) generated in multipath channels. In this letter, we propose an enhanced CP-based method that is immune to ISI and can hence obtain an unbiased estimate of the auto-correlation function in multipath channels.
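For reference, the basic CP correlation that both the original and the enhanced method build on can be sketched as follows. This shows only the raw auto-correlation estimate; mapping it to the maximum Doppler spread (e.g., via the Jakes-model relation R(τ) ∝ J0(2π f_d τ)) and the ISI-robust enhancement are beyond this sketch, and the function and parameter names are our own.

```python
def cp_autocorrelation(symbols, n_fft, cp_len):
    """Average correlation of each cyclic-prefix sample with its copy
    n_fft samples later. Each entry of `symbols` is one received OFDM
    symbol of length cp_len + n_fft (CP included); samples may be real
    or complex."""
    acc = 0.0
    count = 0
    for sym in symbols:
        for n in range(cp_len):
            acc += sym[n] * sym[n + n_fft].conjugate()
            count += 1
    return acc / count
```

In a noiseless, time-invariant channel without ISI, each CP sample equals its copy, so the estimate reduces to the average signal power; Doppler-induced channel variation over the n_fft-sample lag is what lowers the correlation.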
Xiaobo ZHOU Xin HE Khoirul ANWAR Tad MATSUMOTO
In this paper, we reformulate the issue related to wireless mesh networks (WMNs) from the viewpoint of the Chief Executive Officer (CEO) problem, and provide a practical solution to a simple instance of the problem. It is well known that the CEO problem is a theoretical basis for sensor networks. The problem investigated in this paper is described as follows: an originator broadcasts its binary information sequence to several forwarding nodes (relays) over Binary Symmetric Channels (BSCs); the originator's information sequence suffers from independent random binary errors; the forwarding nodes simply interleave and encode the received bit sequence and then forward it to the final destination (FD) over Additive White Gaussian Noise (AWGN) channels, without making heavy efforts to correct errors that may occur in the originator-relay links. Hence, this strategy reduces the complexity of the relay significantly. A joint iterative decoding technique at the FD is proposed that utilizes knowledge of the correlation due to the errors occurring in the link between the originator and the forwarding nodes (referred to as the intra-link). The bit-error-rate (BER) performances show that the originator's information can be reconstructed at the FD even with a very simple coding scheme. We provide a BER performance comparison between the joint and separate decoding strategies. The simulation results show that excellent performance can be achieved by the proposed system. Furthermore, extrinsic information transfer (EXIT) chart analysis is performed to investigate the convergence properties of the proposed technique, with the aim of, in part, optimizing the code rate at the originator.
Youhua SHI Nozomu TOGAWA Masao YANAGISAWA
Scan-based side-channel attacks on hardware implementations of cryptographic algorithms have been shown to pose a serious security threat. Unlike existing scan-based attacks, our work observes that, besides the secret-related registers, some non-secret registers can also be misused to help a hacker retrieve secret keys. In this paper, we first present a scan-based side-channel attack on AES that makes use of the round counter registers, which have received little attention in previous works, to demonstrate the potential security threat in designs with scan chains. We then discuss the requirements for secure design-for-test and propose a secure scan scheme that preserves all the advantages and simplicity of traditional scan test while significantly improving security with negligible design overhead for cryptographic hardware implementations.
Orthogonal frequency division multiplexing (OFDM) has great advantages such as high spectrum efficiency and robustness against multipath fading. To enhance these advantages, Hermite-symmetric subcarrier coding for OFDM, which is used in transmission systems such as the asymmetric digital subscriber line (ADSL) and multiband OFDM in ultra-wideband (UWB) communications, is very attractive. The subcarrier coding forces the imaginary part of the OFDM signal to zero, so that another data sequence can be transmitted simultaneously in the quadrature channel. To theoretically verify the effectiveness of Hermite-symmetric subcarrier coding in wireless OFDM (HC-OFDM) systems, we derive closed-form equations for the bit error rate (BER) and throughput over fading channels. Our analytical results theoretically show that HC-OFDM systems achieve improved performance owing to the effect of the subcarrier coding.