In parallelizing compilers on distributed memory systems, distributions of irregular sized array blocks are provided for load balancing and irregular problems. The irregular data redistribution is different from the regular block-cyclic redistribution. This paper is devoted to scheduling message for irregular data redistribution that attempt to obtain suboptimal solutions while satisfying the minimal communication costs condition and the minimal step condition. Based on the list scheduling, an efficient algorithm is developed and its experimental results are compared with previous algorithms. The improved list algorithm provides more chance for conflict messages in its relocation phase, since it allocates conflict messages through methods used in a divide-and-conquer algorithm and a relocation algorithm proposed previously. The method of selecting the smallest relocation cost guarantees that the improved list algorithm is more efficient than the other two in average.
Hui WANG Sabine VAN HUFFEL Guan GUI Qun WAN
This paper studies the problem of recovering an arbitrarily distributed sparse matrix from its one-bit (1-bit) compressive measurements. We propose a matrix sketching based binary method iterative hard thresholding (MSBIHT) algorithm by combining the two dimensional version of BIHT (2DBIHT) and the matrix sketching method, to solve the sparse matrix recovery problem in matrix form. In contrast to traditional one-dimensional BIHT (BIHT), the proposed algorithm can reduce computational complexity. Besides, the MSBIHT can also improve the recovery performance comparing to the 2DBIHT method. A brief theoretical analysis and numerical experiments show the proposed algorithm outperforms traditional ones.
Tianruo ZHANG Chen LIU Minghui WANG Satoshi GOTO
This paper proposes a region-of-interest (ROI) based H.264 encoder and the VLSI architecture of the ROI detection algorithm. In ROI based video coding system, pre-processing unit to detect ROI should only introduce low computational complexity overhead due to the low power requirement. The Macroblocks (MBs) in ROIs are detected sequentially in the same order of H.264 encoding to satisfy the MB level pipelining of ROI detector and H.264 encoder. ROI detection is performed in a novel estimation-and-verification process with an ROI contour template. Proposed architecture can be configured to detect either single ROI or multiple ROIs in each frame and the throughput of single detection mode is 5.5 times of multiple detection mode. 98.01% and 97.89% of MBs in ROIs can be detected in single and multiple detection modes respectively. Hardware cost of proposed architecture is only 4.68 k gates. Detection speed is 753 fps for CIF format video at the operation frequency of 200 MHz in multiple detection mode with power consumption of 0.47 mW. Compared with previous fast ROI detection algorithms for video coding application, the proposed architecture obtains more accurate and smaller ROI. Therefore, more efficient ROI based computation complexity and compression efficiency optimization can be implemented in H.264 encoder.
Bo ZHAO Guangming YU Tao CHEN Pengpeng CHEN Huazhong YANG Hui WANG
A low-power low-noise intermediate-frequency (IF) circuit is proposed for Gaussian frequency shift keying (GFSK) low-IF receivers. The proposed IF circuit is realized by an all-analog architecture composed of a couple of limiting amplifiers (LAs) and received signal strength indicators (RSSIs), a couple of band-pass filters (BPFs), a frequency detector (FD), a low-pass filter (LPF) and a slicer. The LA and RSSI are realized by an optimized combination of folded amplifiers and current subtractor based rectifiers to avoid the process induced depressing on accuracy. In addition, taking into account the nonlinearity and static current of rectifiers, we propose an analytical model as an accurate approximation of RSSIs' transfer character. An active-RC based GFSK demodulation scheme is proposed, and then both low power consumption and a large dynamic range are obtained. The chip is implemented with HJTC 0.18 µm CMOS technology and measured under an intermediate frequency of 200 kHz, a data rate of 100 kb/s and a modulation index of 1. The RSSI has a dynamic range of 51 dB with a logarithmic linearity error of less than 1 dB, and the slope is 23.9 mV/dB. For 0.1% bit error ratio (BER), the proposed IF circuit has the minimum input signal-to-noise ratio (SNR) of 5 dB and an input dynamic range of 55.4 dB, whereas it can tolerate a frequency offset of -3%+9.5% at 6 dB input SNR. The total power consumption is 5.655.89 mW.
In this paper, we explicitly construct a large class of symmetric Boolean functions on 2k variables with algebraic immunity not less than d, where integer k is given arbitrarily and d is a given suffix of k in binary representation. If let d = k, our constructed functions achieve the maximum algebraic immunity. Remarkably, 2⌊ log2k ⌋ + 2 symmetric Boolean functions on 2k variables with maximum algebraic immunity are constructed, which are much more than the previous constructions. Based on our construction, a lower bound of symmetric Boolean functions with algebraic immunity not less than d is derived, which is 2⌊ log2d ⌋ + 2(k-d+1). As far as we know, this is the first lower bound of this kind.
Leiou WANG Donghui WANG Chengpeng HAO
SUMPLE, one of important signal combining approaches, its combining loss increases when a sensor in an array fails. A novel failure detection circuit for SUMPLE is proposed by using variability index. This circuit can effectively judge whether a sensor fails or not. Simulation results validate its effectiveness with respect to the existing algorithms.
Xun HE Xin JIN Minghui WANG Dajiang ZHOU Satoshi GOTO
This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16 bits SIMD MAC instructions, and vertical vector access. Eight cores with a 4-ports L2 cache are connected by CIB bus as a cluster. Four clusters are connected by mesh network. This hierarchical network can provide more than 192 GB/s low latency inter-core BW in average. The 4-ports L2 cache architecture is also designed to provide 192 GB/s L2 cache BW. To reduce coherence operation in large-scale SMP, an application specified protocol is proposed. Compared with MOESI, 67.8% of L1 cache energy can be saved in 32 cores case. The whole system including 32 vector cores, 256 KB L2 cache, 64-bit DDRII PHY and two PLL units, occupy 25 mm2 in 65 nm CMOS. It can achieve a peak performance of 375 GMACs and 98 GMACs/W at 1.2 V.
Luyang LI Linhui WANG Dong ZHENG Qinlan ZHAO
Construction of multiple output functions is one of the most important problems in the design and analysis of stream ciphers. Generally, such a function has to be satisfied with several criteria, such as high nonlinearity, resiliency and high algebraic degree. But there are mutual restraints among the cryptographic parameters. Finding a way to achieve the optimization is always regarded as a hard task. In this paper, by using the disjoint linear codes and disjoint spectral functions, two classes of resilient multiple output functions are obtained. It has been proved that the obtained functions have high nonlinearity and high algebraic degree.
Huihui WANG Hitoshi OHNUKI Hideaki ENDO Mitsuru IZUMI
Thin film glucose biosensors were fabricated with organic/inorganic hybrid films based on glucose oxidase (GOx) and Prussian Blue nano-clusters. The biosensors composed of hybrid films were characterized by the low operating potential and the advantage to interference-free detection. In this research, we employed two kinds of thin films for GOx immobilization: Langmuir-Blodgett (LB) and self-assembled monolayer (SAM). The LB film immobilizes GOx in its inside through the electrostatic force, while the SAM immobilizes GOx with the covalent bond. The sensors with LB film produced a relatively high current signal, while the non-linear behavior and a low stability were recognized. On the other hand, the sensors with SAM presented a good linear relationship and a very stable performance.
Selective attention mechanism, plays an important role in human visual perception, can be investigated by developing an approach to perceiving the multi-meaningful-dotted-pattern in a color blindness plate (CBP). In this Letter, a perception model driven by a simple active vision mechanism is presented for the image segmentation and understanding of a CBP. Experiments show that to understand one meaningful pattern in an image containing multi-meaningful patterns, the active visual search (i.e., pattern attention) is a very useful function.
Hui WANG Shigeyuki OSADA Tokumi YOKOHIRA Kiyohiko OKAYAMA Nariyoshi YAMAI
In order to improve TCP performance, the use of a PEP (Performance Enhancing Proxy) has been proposed. The PEP operates on a router along a TCP connection. When a data packet arrives at the PEP, it forwards the packet to the destination host, transmits the corresponding ACK (premature ACK) to the source host on behalf of the destination host, and stores a copy of the packet in a local buffer (PEP buffer) in case the packet needs to be retransmitted. In this paper, in accordance with a strategy that keeps the number of prematurely acknowledged packets in the PEP buffer below a fixed threshold (watermark) value, we investigate the relation between the watermark value and the average throughput. Extensive simulations show that the results can be roughly classified into two cases. In the first case, the average throughput becomes larger for larger watermark values and becomes a constant value when the watermark value is over a certain value. In the second case, although the average throughput becomes larger for lager watermark value in the same way, it decreases when the watermark value is over a certain value. We also show that the latter (former) case can occur more easily as the propagation delay in the input side network of the PEP becomes smaller (larger) and the propagation delay in the output side network of the PEP becomes larger (smaller), and also show that the latter (former) case can occur more easily as the transmission speed in the input side network becomes larger (smaller) and the transmission speed in the output side network becomes smaller (larger) while the PEP buffer capacity becomes smaller (larger).
Lingyun XIANG Xinhui WANG Chunfang YANG Peng LIU
In order to prevent the synonym substitution breaking the balance among frequencies of synonyms and improve the statistical undetectability, this paper proposed a novel linguistic steganography based on synonym run-length encoding. Firstly, taking the relative word frequency into account, the synonyms appeared in the text are digitized into binary values and expressed in the form of runs. Then, message are embedded into the parities of runs' lengths by self-adaptively making a positive or negative synonym transformation on boundary elements of two adjacent runs, while preserving the number of relative high and low frequency synonyms to reduce the embedding distortion. Experimental results have shown that the proposed synonym run-length encoding based linguistic steganographic algorithm makes fewer changes on the statistical characteristics of cover texts than other algorithms, and enhances the capability of anti-steganalysis.
Shuaihui WANG Guyu HU Zhisong PAN Jin ZHANG Dong LI
Signed networks are ubiquitous in the real world. It is of great significance to study the problem of community detection in signed networks. In general, the behaviors of nodes in a signed network are rational, which coincide with the players in the theory of game that can be used to model the process of the community formation. Unlike unsigned networks, signed networks include both positive and negative edges, representing the relationship of friends and foes respectively. In the process of community formation, nodes usually choose to be in the same community with friends and between different communities with enemies. Based on this idea, we proposed a game theory model to address the problem of community detection in signed networks. Taking nodes as players, we build a gain function based on the numbers of positive edges and negative edges inside and outside a community, and prove the existence of Nash equilibrium point. In this way, when the game reaches the Nash equilibrium state, the optimal strategy space for all nodes is the result of the final community division. To systematically investigate the performance of our method, elaborated experiments on both synthetic networks and real-world networks are conducted. Experimental results demonstrate that our method is not only more accurate than other existing algorithms, but also more robust to noise.
Bing LUO Qimei CUI Hui WANG Xiaofeng TAO Ping ZHANG
It is known that traditional water-filling provides a closed form solution for capacity maximization in frequency-selective channels or fading channels with adaptive modulation. However, the solution is derived from a maximum mutual information argument with a single total power constraint. Motivated by the new technology of coordinated multiple point transmission (CoMP), this paper considers a novel power allocation scheme for a frequency-selective fading channel with multiple coordinated transmission points (CTP) transmission, in which each CTP has a power constraint and an individual channel state information (CSI). In order to maximize the channel's throughput, closed form solutions are obtained by solving a non-convex constrained optimization problem. The solution turns out to take the form of traditional WF and also combined with some regular cooperative feature. Based on the derived solution, we firstly investigate a joint water-filling (Jo-WF) power allocation scheme and a new iterative Jo-WF algorithm. Numerical results are presented to verify the optimality of the derived scheme and to show throughput gains over traditional non-coordinated water-filling (WF) and equal power allocation (EPA). Considering the flexibility of CTP's category, e.g., base station or relay station, it is known that the derived Jo-WF power allocation scheme can be valid for any coordinated networks such as next-generation cellular networks or ad-hoc networks.
Yu PAN Guyu HU Zhisong PAN Shuaihui WANG Dongsheng SHAO
Detecting community structures and analyzing temporal evolution in dynamic networks are challenging tasks to explore the inherent characteristics of the complex networks. In this paper, we propose a semi-supervised evolutionary clustering model based on symmetric nonnegative matrix factorization to detect communities in dynamic networks, named sEC-SNMF. We use the results of community partition at the previous time step as the priori information to modify the current network topology, then smooth-out the evolution of the communities and reduce the impact of noise. Furthermore, we introduce a community transition probability matrix to track and analyze the temporal evolutions. Different from previous algorithms, our approach does not need to know the number of communities in advance and can deal with the situation in which the number of communities and nodes varies over time. Extensive experiments on synthetic datasets demonstrate that the proposed method is competitive and has a superior performance.
Zheng-Liang HUANG Fa-Xin YU Shu-Ting ZHANG Hao LUO Ping-Hui WANG Yao ZHENG
GaAs MMICs (Monolithic Microwave Integrated Circuits) reliability is a critical part of the overall reliability of the thermal solution in semiconductor devices. With MMICs reliability improved, GaAs MMICs failure rates will reach levels which are impractical to measure with conventional methods in the near future. This letter proposes a methodology to predict the GaAs MMICs reliability by combining empirical and statistical methods based on zero-failure GaAs MMICs life testing data. Besides, we investigate the effect of accelerated factors on MMICs degradation and make a comparison between the Weibull and lognormal distributions. The method has been used in the reliability evaluation of GaAs MMICs successfully.
An aggregate signature scheme,which is an extension of ordinary signature, allows anyone to compress n signatures of n messages from n signers into a single short signature for reducing the size multiple individual signatures. Recently, Liu et al. proposed an efficient certificateless aggregate signature scheme with shorter public key size, constant AS size and with constant pairing computations. Although they proved that the scheme has existential unforgeability against adaptive chosen messages attacks. However, in this paper, two concrete attacks are proposed to show that Liu et al.'s scheme actually does not reach the security as they claimed.
Jiahui WANG Hideo SAITO Makoto KIMURA Masaaki MOCHIMARU Takeo KANADE
Recently, researches and developments for measuring and modeling of the human body have been receiving much attention. Our aim is to reconstruct an accurate shape of a human foot from multiple camera images, which can capture dynamic behavior of the object. In this paper, a foot-shape database is used for accurate reconstruction of human foot. By using Principal Component Analysis, the foot shape can be represented with new meaningful variables. The dimensionality of the data is also reduced. Thus, the shape of object can be recovered efficiently, even though the object is partially occluded in some input views. To demonstrate the proposed method, two kinds of experiments are presented: reconstruction of human foot in a virtual reality environment with CG multi-camera images, and in real world with eight CCD cameras. In the experiments, the reconstructed shape error with our method is around 2 mm in average, while the error is more than 4 mm with conventional volume intersection method.
Software-defined networking (SDN) decouples the control and forwarding of network devices, providing benefits such as simplified control. However, due to cost constraints and other factors, SDN is difficult to fully deploy. It has been proposed that SDN devices can be incrementally deployed in a traditional IP network, i.e., hybrid SDN, to provide partial SDN benefits. Studies have shown that better traffic engineering performance can be achieved by modifying the coverage and placement of SDN devices in hybrid SDN, because they can influence the behavior of legacy switches through certain strategies. However, it is difficult to develop and execute a traffic engineering strategy in hybrid SDN. This article proposes a routing algorithm to achieve approximate load balancing, which minimizes the maximum link utilization by using the optimal solution of linear programming and merging the minimum split traffic flows. A multipath forwarding mechanism under the same problem is designed to optimize transmission time. Experiments show that our algorithm has certain advantages in link utilization and transmission time compared to traditional distributed routing algorithms like OSPF and some hybrid SDN routing mechanisms. Furthermore, our algorithm can approximate the control effect of full SDN when the deployment rate of SDN devices is 40%.
Minghui WANG Xun HE Xin JIN Satoshi GOTO
Stereo-view and multi-view video formats are heavily investigated topics given their vast application potential. Depth Image Based Rendering (DIBR) system has been developed to improve Multiview Video Coding (MVC). Depth image is introduced to synthesize virtual views on the decoder side in this system. Depth image is a piecewise image, which is filled with sharp contours and smooth interior. Contours in a depth image show more importance than interior in view synthesis process. In order to improve the quality of the synthesized views and reduce the bitrate of depth image, a contour based coding strategy is proposed. First, depth image is divided into layers by different depth value intervals. Then regions, which are defined as the basic coding unit in this work, are segmented from each layer. The region is further divided into the contour and the interior. Two different procedures are employed to code contours and interiors respectively. A vector-based strategy is applied to code the contour lines. Straight lines in contours cost few of bits since they are regarded as vectors. Pixels, which are out of straight lines, are coded one by one. Depth values in the interior of a region are modeled by a linear or nonlinear formula. Coefficients in the formula are retrieved by regression. This process is called interior painting. Unlike conventional block based coding method, the residue between original frame and reconstructed frame (by contour rebuilt and interior painting) is not sent to decoder. In this proposal, contour is coded in a lossless way whereas interior is coded in a lossy way. Experimental results show that the proposed Contour Based Depth map Coding (CBDC) achieves a better performance than JMVC (reference software of MVC) in the high quality scenarios.