Hai DAI NGUYEN Anh DUC LE Masaki NAKAGAWA
This paper applies deep learning to the recognition of online handwritten mathematical symbols. Recently, various deep learning architectures such as convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs) and long short-term memory (LSTM) RNNs have been applied to fields such as computer vision, speech recognition and natural language processing, where they have outperformed previous state-of-the-art methods on various tasks. In this paper, max-out-based CNNs and bidirectional LSTM (BLSTM) networks are applied to image patterns created from online patterns and to the original online patterns, respectively, and the two are then combined. They are compared with traditional recognition methods, namely MRFs and MQDFs, through recognition experiments on the CROHME database, along with analysis and explanation.
Bima Sena Bayu DEWANTARA Jun MIURA
This paper proposes a novel appearance-based descriptor for estimating head orientation. Our descriptor is inspired by the Weber-based feature, which has been successfully used for robust texture analysis, and the gradient, which performs well for shape analysis. To further enhance the orientation differences, we combine them with an analysis of the intensity deviation. The position of a pixel and its intrinsic intensity are also considered. All features are then assembled into a per-pixel feature vector. The information carried by each pixel is combined using a covariance matrix to alleviate the influence of rotations and illumination. As a result, our descriptor is compact and works at high speed. We also apply a weighting scheme, called Block Importance Feature using Genetic Algorithm (BIF-GA), to improve the performance of our descriptor by selecting and accentuating the important blocks. Experiments on three head pose databases demonstrate that the proposed method outperforms the current state-of-the-art methods. The proposed method can also be combined with a head detection and tracking system to estimate human head orientation in real applications.
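The step of pooling per-pixel feature vectors into a covariance matrix can be sketched as follows. This is only an illustration under simplifying assumptions: the feature vector here contains just position, intensity and absolute gradients, whereas the Weber-based and intensity-deviation features and the BIF-GA weighting from the abstract are omitted. The function names `pixel_features` and `covariance_descriptor` are hypothetical.

```python
def pixel_features(img):
    """Return one feature vector [x, y, I, |gx|, |gy|] per pixel.

    img is a list of rows of numbers. Gradients use central differences
    with replicated borders (an assumption made for this sketch).
    """
    h, w = len(img), len(img[0])
    feats = []
    for y in range(h):
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            feats.append([float(x), float(y), float(img[y][x]), abs(gx), abs(gy)])
    return feats


def covariance_descriptor(img):
    """Sample covariance matrix (d x d) of the per-pixel feature vectors."""
    feats = pixel_features(img)
    n, d = len(feats), len(feats[0])
    mean = [sum(f[k] for f in feats) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in feats:
        centred = [f[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += centred[a] * centred[b]
    return [[v / (n - 1) for v in row] for row in cov]
```

The descriptor size depends only on the number of features per pixel (here 5 x 5), not on the size of the image region, which is one reason such a descriptor can stay compact.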
Huifeng GUO Dianhui CHU Yunming YE Xutao LI Xixian FAN
Ranking, an important task in information systems, has many applications, such as document/webpage retrieval, collaborative filtering and advertising. The last decade has witnessed a growing interest in the study of learning to rank as a means to leverage training information in a system. In this paper, we propose a new learning-to-rank method, BLM-Rank, which uses a linear function to score samples and models the pairwise preference of samples based on their scores under a Bayesian framework. A stochastic gradient approach is adopted to maximize the posterior probability in BLM-Rank. For industrial practice, we have also implemented the proposed algorithm on a graphics processing unit (GPU). Experimental results on LETOR demonstrate that the proposed BLM-Rank method outperforms state-of-the-art methods, including RankSVM-Struct, RankBoost, AdaRank-NDCG, AdaRank-MAP and ListNet. Moreover, the results show that the GPU implementation of the BLM-Rank method is ten to eleven times faster than its CPU counterpart in the training phase, and one to four times faster in the testing phase.
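As a rough illustration of the idea described above (a linear scoring function, a pairwise preference likelihood, and stochastic gradient ascent on the posterior), the following Python sketch may help. It is not the authors' BLM-Rank implementation: the logistic preference likelihood, the zero-mean Gaussian prior (the `reg` term), and all function and parameter names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear scoring function w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs, dim, lr=0.1, reg=0.01, epochs=200, seed=0):
    """Fit w by stochastic gradient ascent on an assumed log-posterior.

    pairs: list of (x_preferred, x_other) feature vectors.
    Assumed objective per pair: log sigmoid(w . (x_pref - x_other))
    minus (reg / 2) * ||w||^2, i.e. a Gaussian prior on w.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        order = list(pairs)
        rng.shuffle(order)
        for x_pos, x_neg in order:
            diff = [a - b for a, b in zip(x_pos, x_neg)]
            # d/dw log sigmoid(w . diff) = (1 - sigmoid(w . diff)) * diff
            g = 1.0 - sigmoid(score(w, diff))
            w = [wi + lr * (g * di - reg * wi) for wi, di in zip(w, diff)]
    return w
```

After training, items are ranked by sorting them on `score(w, x)` in descending order. Each update touches only one pair, which is the kind of fine-grained work that could be parallelized, although how the authors map it onto the GPU is not described here.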
Leida LI Yu ZHOU Jinjian WU Jiansheng QIAN Beijing CHEN
Image retouching is fundamental in photography and is widely used to improve the perceptual quality of a low-quality image. Traditional image quality metrics are designed for degraded images, so they are limited in evaluating the quality of retouched images. This letter presents a RETouched Image QUality Evaluation (RETIQUE) algorithm that measures structure and color changes between the original and retouched images. Structure changes are measured by gradient similarity. Colorfulness and saturation are used to measure color changes. The overall quality score of a retouched image is computed as a linear combination of gradient similarity and color similarity. The performance of RETIQUE is evaluated on the public Digitally Retouched Image Quality (DRIQ) database. Experimental results demonstrate that the proposed metric outperforms state-of-the-art metrics.
Cameras are increasingly mounted on cars to assist drivers. These cameras often have severe radial distortion because of their wide view angle, and it is sometimes necessary to compensate for it fully automatically in the field. We have previously proposed such a method, which uses the entropy of the histogram of oriented gradients (HOG) to evaluate the goodness of the compensation. Its performance was satisfactory, but the computational burden was too heavy for drive assistance devices. In this report, we discuss a method to speed up the algorithm and obtain a new lightweight algorithm that is feasible for such devices. We also show more comprehensive performance evaluation results than those in the previous reports.
In this paper, we consider a distributed power control scheme that can maximize overall capacity of an interference-limited wireless system in which the same radio resource is spatially reused among different transmitter-receiver pairs. This power control scheme employs a gradient-descent method in each transmitter, which adapts its own transmit power to co-channel interference dynamically to maximize the total weighted sum rate (WSR) of the system over a given interval. The key contribution in this paper is to propose a common feedback channel, over which a backward physical signal is accumulated for computing the gradient of the transmit power in each transmitter, thereby significantly reducing signaling overhead for the distributed power control. We show that the proposed power control scheme can achieve almost 95% of its theoretical upper WSR bound, while outperforming the non-power-controlled system by roughly 63% on average.
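The gradient-based power update can be illustrated with a small centralized sketch. It assumes a standard interference-channel rate model, R_i = log2(1 + g_ii p_i / (noise + sum over j != i of g_ij p_j)), and a projected gradient step on the WSR. The common feedback channel proposed in the paper, which lets each transmitter obtain its gradient from an accumulated backward signal, is not modeled here. The names `gain`, `weights`, `noise`, `p_max` and `step` are hypothetical.

```python
import math


def weighted_sum_rate(p, gain, weights, noise):
    """WSR in bit/s/Hz; gain[i][j] is the gain from transmitter j to receiver i."""
    n = len(p)
    total = 0.0
    for i in range(n):
        interference = noise + sum(gain[i][j] * p[j] for j in range(n) if j != i)
        total += weights[i] * math.log2(1.0 + gain[i][i] * p[i] / interference)
    return total


def wsr_gradient(p, gain, weights, noise):
    """Analytic gradient of the WSR with respect to each transmit power."""
    n = len(p)
    ln2 = math.log(2.0)
    interf = [noise + sum(gain[i][j] * p[j] for j in range(n) if j != i)
              for i in range(n)]
    signal = [gain[i][i] * p[i] for i in range(n)]
    grad = []
    for k in range(n):
        # Benefit to link k's own rate.
        own = weights[k] * gain[k][k] / (ln2 * (interf[k] + signal[k]))
        # Harm caused to every other link through co-channel interference.
        harm = sum(weights[i] * signal[i] * gain[i][k]
                   / (ln2 * interf[i] * (interf[i] + signal[i]))
                   for i in range(n) if i != k)
        grad.append(own - harm)
    return grad


def power_control(p, gain, weights, noise, p_max, step=0.05, iters=200):
    """Projected gradient steps that keep each power within [0, p_max]."""
    for _ in range(iters):
        g = wsr_gradient(p, gain, weights, noise)
        p = [min(p_max, max(0.0, pk + step * gk)) for pk, gk in zip(p, g)]
    return p
```

The `harm` term is the part that requires information from other links. In this sketch it is computed from a globally known gain matrix, which is the signaling cost that a shared feedback signal would aim to avoid.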
Blur is one of the most common distortion types and greatly impacts image quality. Most existing no-reference (NR) image blur metrics produce scores without a fixed range, so it is hard to judge the extent of blur directly. This letter presents an NR perceptual blur metric using Saliency Guided Gradient Similarity (SGGS), which produces blur scores with a fixed range of (0,1). A blurred image is first reblurred using a Gaussian low-pass filter, producing a heavily blurred image. With this reblurred image as reference, a local blur map is generated by computing the gradient similarity. Finally, visual saliency is employed in the pooling to adapt to the characteristics of the human visual system (HVS). The proposed metric features a fixed range, fast computation and better consistency with the HVS. Experiments demonstrate its advantages.
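The reblur-and-compare pipeline can be sketched in a few lines of Python. The sketch rests on several assumptions that are not taken from the letter: a repeated [1, 2, 1]/4 binomial filter stands in for the Gaussian low-pass filter, the similarity uses the common form (2ab + c) / (a^2 + b^2 + c), and saliency is replaced by an optional per-pixel `weights` map that defaults to uniform pooling. All names are hypothetical.

```python
import math


def binomial_blur(img):
    """One pass of a separable [1, 2, 1]/4 low-pass filter, replicated borders."""
    h, w = len(img), len(img[0])
    tmp = [[(row[max(x - 1, 0)] + 2.0 * row[x] + row[min(x + 1, w - 1)]) / 4.0
            for x in range(w)] for row in img]
    return [[(tmp[max(y - 1, 0)][x] + 2.0 * tmp[y][x] + tmp[min(y + 1, h - 1)][x]) / 4.0
             for x in range(w)] for y in range(h)]


def gradient_magnitude(img):
    """Central-difference gradient magnitude with replicated borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def blur_score(img, passes=2, c=1e-3, weights=None):
    """Weighted mean gradient similarity between img and its reblurred copy.

    The result lies in (0, 1]; values near 1 mean reblurring changed little,
    i.e. the input was already blurry. Pixel values are assumed to be in [0, 1].
    """
    reblurred = img
    for _ in range(passes):
        reblurred = binomial_blur(reblurred)
    g_in = gradient_magnitude(img)
    g_re = gradient_magnitude(reblurred)
    num = den = 0.0
    for y in range(len(img)):
        for x in range(len(img[0])):
            a, b = g_in[y][x], g_re[y][x]
            sim = (2.0 * a * b + c) / (a * a + b * b + c)
            wt = 1.0 if weights is None else weights[y][x]
            num += wt * sim
            den += wt
    return num / den
```

Because every local similarity is bounded by 1 and strictly positive for c > 0, the pooled score inherits a fixed range regardless of image content, which is the property the letter emphasizes.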
Jiu XU Ning JIANG Wenxin YU Heming SUN Satoshi GOTO
In this paper, a feature named Non-Redundant Gradient Semantic Local Binary Patterns (NRGSLBP) is proposed for human detection as a modified version of the conventional Semantic Local Binary Patterns (SLBP). This feature is calculated for both intensity and gradient magnitude images so that texture and gradient information are combined. Moreover, to the best of our knowledge, non-redundant patterns are adopted on SLBP for the first time, allowing better discrimination. Compared with SLBP, NRGSLBP incurs no additional cost in feature dimensions, and its computational complexity is considerably smaller than that of other features. Experimental results on several datasets show that the detection rate of our proposed feature exceeds those of other features such as Histogram of Oriented Gradients (HOG), Histogram of Templates (HOT), Bidirectional Local Template Patterns (BLTP), Gradient Local Binary Patterns (GLBP), SLBP and Covariance matrix (COV).
In this paper, the multicell distributed beamforming (MDBF) design problem of suppressing intra-cell interference (InCI) and inter-cell interference (ICI) is studied. First, in order to decrease the InCI and ICI caused by a user, we propose a gradient-iteration altruistic algorithm to derive the beamforming vectors. The convergence of the proposed iterative algorithm is proved. Second, a metric function is established to restrict the ICI and maximize the cell rate. This function depends only on local channel state information (CSI) and does not need additional CSI. Moreover, an MDBF algorithm with the metric function is proposed. This algorithm uses gradient iteration to maximize the metric function and thereby improve the sum rate of the cell. Finally, simulation results demonstrate that the proposed algorithm can achieve higher cell rates while suppressing InCI and ICI more effectively than the traditional ones.
Gaoxing CHEN Lei SUN Zhenyu LIU Takeshi IKENAGA
High efficiency video coding (HEVC) is a video compression standard that outperforms the predecessor H.264/AVC by doubling the compression efficiency. To enhance the intra prediction accuracy, 35 intra prediction modes were used in the prediction units (PUs), with partition sizes ranging from 4 × 4 to 64 × 64 in HEVC. However, the manifold prediction modes dramatically increase the encoding complexity. This paper proposes a fast mode- and depth-decision algorithm based on edge detection and reconfiguration to alleviate the large computational complexity in intra prediction with trivial degradation in accuracy. For mode decision, we propose pixel gradient statistics (PGS) and mode refinement (MR). PGS uses pixel gradient information to assist in selecting the prediction mode after rough mode decision (RMD). MR uses the neighboring mode information to select the best PU mode (BPM). For depth decision, we propose a partition reconfiguration algorithm to replace the original partitioning order with a more reasonable structure, by using the smoothness of the coding unit as a criterion in deciding the prediction depth. Smoothness detection is based on the PGS result. Experiment results show that the proposed method saves about 41.50% of the original processing time with little degradation (BD bitrate increased by 0.66% and BDPSNR decreased by 0.060dB) in the coding gain.
Jong-Woong KIM Joon-Hyuk CHANG Sang Won NAM Dong Kook KIM Jong Won SHIN
In this paper, we propose a speech-presence uncertainty estimation to improve the global soft decision-based speech enhancement technique by using the spectral gradient scheme. The conventional soft decision-based speech enhancement technique uses a fixed ratio (Q) of the a priori speech-presence and speech-absence probabilities to derive the speech-absence probability (SAP). In contrast, we adaptively change Q according to the spectral gradient between the current and past frames as well as the status of the voice activity in the previous two frames. As a result, distinct values of Q are assigned to each frequency in each frame in order to improve the performance of the SAP by tracking robust a priori information on speech presence over time.
Keisuke DOHI Kazuhiro NEGI Yuichiro SHIBATA Kiyoshi OGURI
We present an external-memory-free, deeply pipelined FPGA implementation that includes HOG feature extraction and AdaBoost classification. To fit our design into a compact FPGA, we introduce some simplifications of the algorithm and make aggressive use of stream-oriented architectures. We compare our simplified fixed-point scheme with the original floating-point scheme in terms of quality of results, and the results suggest that the negative impact of the simplified scheme for hardware implementation is limited. We empirically show that our system is able to detect humans in 640×480 VGA images at up to 112 FPS on a Xilinx Virtex-5 XC5VLX50 FPGA.
Chunsheng HUA Yasushi MAKIHARA Yasushi YAGI
In this paper, we propose a pedestrian detection algorithm based on both appearance and motion features to achieve high detection accuracy when applied to complex scenes. Here, a pedestrian's appearance is described by a histogram of oriented spatial gradients, and his/her motion is represented by another histogram of temporal gradients computed from successive frames. Since pedestrians typically exhibit not only their human shapes but also unique human movements generated by their arms and legs, the proposed algorithm is particularly powerful in discriminating a pedestrian from a cluttered situation, where some background regions may appear to have human shapes, but their motion differs from human movement. Unlike the algorithm based on a co-occurrence feature descriptor where significant generalization errors may arise owing to the lack of extensive training samples to cover feature variations, the proposed algorithm describes the shape and motion as unique features. These features enable us to train a pedestrian detector in the form of a spatio-temporal histogram of oriented gradients using the AdaBoost algorithm with a relatively small training dataset, while still achieving excellent detection performance. We have confirmed the effectiveness of the proposed algorithm through experiments on several public datasets.
Ning XIE Hirotaka HACHIYA Masashi SUGIYAMA
Oriental ink painting, called Sumi-e, is one of the most distinctive painting styles and has attracted artists around the world. Major challenges in Sumi-e simulation are to abstract complex scene information and reproduce smooth and natural brush strokes. To automatically generate such strokes, we propose to model the brush as a reinforcement learning agent, and let the agent learn the desired brush-trajectories by maximizing the sum of rewards in the policy search framework. To achieve better performance, we provide elaborate design of actions, states, and rewards specifically tailored for a Sumi-e agent. The effectiveness of our proposed approach is demonstrated through experiments on Sumi-e simulation.
Dong-Ju KIM Sang-Heon LEE Myoung-Kyu SHON
This paper proposes a novel face recognition approach using a centralized gradient pattern image and image covariance-based facial feature extraction algorithms, i.e. a two-dimensional principal component analysis and an alternative two-dimensional principal component analysis. The centralized gradient pattern image is obtained by AND operation of a modified center-symmetric local binary pattern image and a modified local directional pattern image, and it is then utilized as input image for the facial feature extraction based on image covariance. To verify the proposed face recognition method, the performance evaluation was carried out using various recognition algorithms on the Yale B, the extended Yale B and the CMU-PIE illumination databases. From the experimental results, the proposed method showed the best recognition accuracy compared to different approaches, and we confirmed that the proposed approach is robust to illumination variation.
In this letter, we propose a novel search approach to blur kernel estimation for defocused image restoration. An adaptive binary search on consensus is the main contribution of our research. It is based on binary search and random sample consensus (RANSAC). Moreover, an evaluation function that uses a histogram of the gradient distribution is proposed for assessing restored images. Simulations on an image benchmark dataset show that the proposed algorithm can estimate the blur kernels, on average, 15.14% more accurately than other defocused image restoration algorithms.
Xiangdong CHEN Gwanggil JEON Jechang JEONG
In this letter, an intra-field deinterlacing algorithm based on a gradient inverse weighted filtering (GIWF) interpolator is proposed. The proposed algorithm consists of three steps. First, we interpolate the missing line with simple strategies in the working window. Second, we calculate the coefficients of the gradient-weighted filters by exploiting the local gray gradients among the neighboring pixels. Finally, we interpolate the missing line using the proposed GIWF interpolator. Experiments show that the proposed algorithm provides superior performance in terms of both objective and subjective image quality.
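The three steps can be sketched as follows. This is a minimal illustration under assumptions, not the letter's exact interpolator: the initial estimate is a plain line average, the working window is the three nearest pixels on the lines above and below, and each weight is the inverse of the absolute difference between a neighbor and the initial estimate plus a constant `eps`. The function name and parameters are hypothetical.

```python
def deinterlace_giwf(field, eps=1.0):
    """Build a full frame from one field (a list of known rows).

    Known rows are copied; each missing row between two known rows is
    interpolated with inverse-gradient weights. The final missing row,
    which has no row below it, simply repeats the last known row.
    """
    h, w = len(field), len(field[0])
    frame = []
    for y in range(h):
        up = field[y]
        frame.append(list(up))
        if y + 1 == h:
            frame.append(list(up))
            break
        down = field[y + 1]
        row = []
        for x in range(w):
            # Step 1: simple initial estimate of the missing pixel.
            initial = (up[x] + down[x]) / 2.0
            # Step 2: weights from local gray-level differences.
            cols = (max(x - 1, 0), x, min(x + 1, w - 1))
            neighbours = [up[i] for i in cols] + [down[i] for i in cols]
            weights = [1.0 / (abs(v - initial) + eps) for v in neighbours]
            # Step 3: normalized weighted average as the final value.
            total = sum(wt * v for wt, v in zip(weights, neighbours))
            row.append(total / sum(weights))
        frame.append(row)
    return frame
```

Neighbors that differ strongly from the initial estimate, which typically lie across an edge, receive small weights, so the interpolation tends to follow the edge instead of smearing across it.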
Xueqing LI Qi WEI Fei QIAO Huazhong YANG
This paper introduces balanced switching schemes to compensate for linear and quadratic gradient errors in the unary current source array of a current-steering digital-to-analog converter (DAC). A novel algorithm is proposed to avoid the accumulation of gradient errors, yielding much smaller integral nonlinearities (INLs) than conventional switching schemes. Switching scheme examples with different numbers of current cells are also presented, including symmetric arrays and non-symmetric arrays in round and square outlines. (a) For symmetric arrays, where each cell is divided into two parallel concentric ones, the simulated INLs of the proposed round and square switching schemes are less than 25% and 40% of those of conventional switching schemes, respectively. This improvement is achieved by canceling linear errors and reducing accumulated quadratic errors to near the absolute lower bound, using the proposed balanced algorithm. (b) For non-symmetric arrays, i.e. arrays where cells are not divided into parallel ones, linear errors cannot be canceled, and the accumulated INL varies with different quadratic error distribution centers. In this case, the proposed algorithm strictly controls the accumulation of quadratic gradient errors and, unlike the algorithm for symmetric arrays, also strictly controls linear errors in two orthogonal directions simultaneously. As a result, the INLs of the proposed non-symmetric switching schemes are less than 64% of those of conventional switching schemes.
Yusuke KUWAHARA Yusuke IWAMATSU Kensaku FUJII Mitsuji MUNEYASU Masakazu MORIMOTO
In this paper, we propose a normalization method that divides the gradient vector by the sum of the diagonal and two adjoining elements of the matrix expressing the correlation between the components of the discrete Fourier transform (DFT) of the reference signal used for the identification of an unknown system. The proposed method can thereby improve the estimation speed of the coefficients of the adaptive filter.
This paper addresses conjugate-gradient (CG) based pilot-assisted channel estimation and equalization in doubly selective channels for orthogonal frequency division multiplexing (OFDM) block transmissions. With the help of the discrete prolate spheroidal sequence, which shows flat mean-square error (MSE) curves for the reconstructed channels in the presence of Doppler frequency mismatch, a basis expansion model for a parsimonious channel representation over multiple OFDM blocks is developed, a system equation for the least square channel estimation under widely used pilot lattices, where the pilot symbols are irregularly placed in the subcarrier domain, is formulated by introducing carving matrices, and the standard CG method is applied to the system. Relying on the CG method again, the linear minimum mean-square error channel equalization is pursued without performing any matrix inversion, while elevating the convergence speed of the iterative algorithm with a simple preconditioner. Finally, we validate our schemes with numerical experiments on the integrated services digital broadcasting-terrestrial system in doubly-selective channels and determine the normalized MSE and uncoded bit error rate.
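How a CG iteration solves a least-squares problem without any matrix inversion can be illustrated with a small real-valued sketch. It applies preconditioned CG to the regularized normal equations (A^T A + lam I) x = A^T y with a diagonal (Jacobi) preconditioner. This is a generic textbook formulation: the paper's complex-valued OFDM system model, basis expansion, carving matrices and its particular preconditioner are not reproduced, and all names are hypothetical. The sketch assumes every column of A is nonzero or lam > 0, so that the diagonal is positive.

```python
def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_least_squares(A, y, lam=0.0, tol=1e-10, max_iter=100):
    """Solve min ||A x - y||^2 + lam ||x||^2 by Jacobi-preconditioned CG."""
    n = len(A[0])
    # Diagonal of A^T A + lam I, used as the preconditioner.
    diag = [sum(A[i][j] ** 2 for i in range(len(A))) + lam for j in range(n)]

    def apply_normal(v):
        # (A^T A + lam I) v using only matrix-vector products.
        return [g + lam * vi for g, vi in zip(rmatvec(A, matvec(A, v)), v)]

    x = [0.0] * n
    r = rmatvec(A, y)                      # residual for the start x = 0
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Np = apply_normal(p)
        alpha = rz / dot(p, Np)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ni for ri, ni in zip(r, Np)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Each iteration costs two matrix-vector products and a few vector operations, and the preconditioner only rescales the residual elementwise, which is why a simple preconditioner can speed up convergence at little extra cost.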