Haipeng WANG Tianlin WANG Feng XU Kazuo OUCHI
In this paper, the Getis statistic is applied to ALOS- PALSAR (Advanced Land Ovserving Satellite-Phased Array L-band Synthetic Aperture Radar) images for assessing the building damage caused by the Wenchuan earthquake in 2008. As a proposed image analysis, a simulated building image using mapping and projection algorithm is first presented for analysis of the Getis statistic. The results show the high accuracy of the assessment of the proposed approach. The Getis statistic is then applied to two ALOS-PALSAR images acquired before and after the Wenchuan earthquake to assess the level of building damage. Results of the Getis statistic show that the damage level is approximately 81%.
Qingbo WU Jian XIONG Bing LUO Chao HUANG Linfeng XU
In this paper, we propose a novel joint rate distortion optimization (JRDO) model for intra prediction coding. The spatial prediction dependency is exploited by modeling the distortion propagation with a linear fitting function. A novel JRDO based Lagrange multiplier (LM) is derived from this model. To adapt to different blocks' distortion propagation characteristics, we also introduce a generalized multiple Lagrange multiplier (MLM) framework where some candidate LMs are used in the RDO process. Experiment results show that our proposed JRDO-MLM scheme is superior to the H.264/AVC encoder.
Yinan LIU Qingbo WU Linfeng XU Bo WU
Traditional action recognition approaches use pre-defined rigid areas to process the space-time information, e.g. spatial pyramids, cuboids. However, most action categories happen in an unconstrained manner, that is, the same action in different videos can happen at different places. Thus we need a better video representation to deal with the space-time variations. In this paper, we introduce the idea of mining spatial temporal saliency. To better handle the uniqueness of each video, we use a space-time over-segmentation approach, e.g. supervoxel. We choose three different saliency measures that take not only the appearance cues, but also the motion cues into consideration. Furthermore, we design a category-specific mining process to find the discriminative power in each action category. Experiments on action recognition datasets such as UCF11 and HMDB51 show that the proposed spatial temporal saliency video representation can match or surpass some of the state-of-the-art alternatives in the task of action recognition.
Wei HONG Ke WU Hongjun TANG Jixin CHEN Peng CHEN Yujian CHENG Junfeng XU
In this paper, the research advances in SIW-like (Substrate Integrated Waveguide-like) guided wave structures and their applications in the State Key Laboratory of Millimeter Waves of China is reviewed. Our work is concerned with the investigations on the propagation characteristics of SIW, half-mode SIW (HMSIW) and the folded HMSIW (FHMSIW) as well as their applications in microwave and millimeter wave filters, diplexers, directional couplers, power dividers, antennas, power combiners, phase shifters and mixers etc. Selected results are presented to show the interesting features and advantages of those new techniques.
Hai-Feng XU Song-Yu YU Ci WANG
A novel image resizing algorithm is proposed. In our method, three steps are included in the downsampling: the first-round downsampling, the interim upsampling and the second-round downsampling. The downsampling operation unit size is selected between one single 1616 block size and four 88 block sizes during the first-round downsampling processing. To distinguish the selected downsampling operation unit size, the interim upsampling and the second-round downsampling is required. The DCT coefficients of the interim upsampling image indicate the selected downsampling unit size. The DCT coefficients are converted by some way like lifting step and simultaneously downsampled in the second round. The information about selected operator unit size is contained in the final downsampling image. Experimental results demonstrate the proposed method achieves better result than the relevant existing method.
Jianping HU Tiefeng XU Hong LI
This paper presents a novel low-power register file based on adiabatic logic. The register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers. The storage-cell array is based on the conventional memory cell. All the circuits except the storage-cell array employ CPAL (complementary pass-transistor adiabatic logic) to recover the charge of large node capacitance on address decoders, bit-lines and word-lines in fully adiabatic manner. The minimization of energy consumption was investigated by choosing the optimal size of CPAL circuits for large load capacitance. The power consumption of the proposed adiabatic register file is significantly reduced because the energy transferred to the large capacitance buses is mostly recovered. The energy and functional simulations are performed using the net-list extracted from the layout. HSPICE simulation results indicate that the proposed register file attains energy savings of 65% to 85% as compared to the conventional CMOS implementation for clock rates ranging from 25 to 200 MHz.
Hao SAN Hajime KONAGAYA Feng XU Atsushi MOTOZAWA Haruo KOBAYASHI Kazumasa ANDO Hiroshi YOSHIDA Chieto MURAYAMA Kanichi MIYAZAWA
This paper proposes novel feedforward architecture of the second-order multibit ΔΣAD modulator with single DAC-feedback topology. The ΔΣAD modulator realizes high resolution by oversampling and noise shaping techniques. However, its SNDR (Signal to Noise and Distortion Ratio) is limited by the dynamic range of the input signal and non-idealities of circuit building blocks, particularly by the harmonic distortion in amplifier circuits. A full feedforward ΔΣAD modulator structure has the signal transfer function of unity under ideal circumstances, which means that the signal swings through the loop filter become smaller compared with a feedbacked ΔΣAD modulator. Therefore, the harmonic distortion generated inside the loop filter can be significantly reduced in the feedforward structure because the effect of non-idealities in amplifiers can be suppressed when signal swing is small. Moreover, the reduction of the internal signal swings also relaxes output swing requirements for amplifiers with low supply voltage. However, in conventional feedforward ΔΣAD modulator, an analog adder is needed before quantizer, and especially in a multibit modulator, an additional amplifier is necessary to realize the summation of feedforward signals, which leads to extra chip area and power dissipation. In this paper, we propose a novel architecture of a feedforward ΔΣAD modulator which realizes the summation of feedforward signals without additional amplifier. The proposed architecture is functionally equivalent to the conventional one but with smaller chip area and lower power dissipation. We conducted MATLAB and SPICE simulations to validate the proposed architecture and modulator circuits.
Kai TAN Qingbo WU Fanman MENG Linfeng XU
Saliency quality assessment aims at estimating the objective quality of a saliency map without access to the ground-truth. Existing works typically evaluate saliency quality by utilizing information from saliency maps to assess its compactness and closedness while ignoring the information from image content which can be used to assess the consistence and completeness of foreground. In this letter, we propose a novel multi-information fusion network to capture the information from both the saliency map and image content. The key idea is to introduce a siamese module to collect information from foreground and background, aiming to assess the consistence and completeness of foreground and the difference between foreground and background. Experiments demonstrate that by incorporating image content information, the performance of the proposed method is significantly boosted. Furthermore, we validate our method on two applications: saliency detection and segmentation. Our method is utilized to choose optimal saliency map from a set of candidate saliency maps, and the selected saliency map is feeded into an segmentation algorithm to generate a segmentation map. Experimental results verify the effectiveness of our method.
Qingbo WU Linfeng XU Zhengning WANG
In this letter, we propose a novel intra prediction coding scheme for H.264/AVC. Based on our proposed minimum distance prediction (MDP) scheme, the optimal reference samples for predicting the current pixel can be adaptively updated corresponding to different video contents. The experimental results show that up to 2 dB and 1 dB coding gains can be achieved with the proposed method for QCIF and CIF sequences respectively.
Haipeng WANG Feng XU Ya-Qiu JIN Kazuo OUCHI
An inversion method of bridge height over water by polarimetric synthetic aperture radar (SAR) is developed. A geometric ray description to illustrate scattering mechanism of a bridge over water surface is identified by polarimetric image analysis. Using the mapping and projecting algorithm, a polarimetric SAR image of a bridge model is first simulated and shows that scattering from a bridge over water can be identified by three strip lines corresponding to single-, double-, and triple-order scattering, respectively. A set of polarimetric parameters based on the de-orientation theory is applied to analysis of three types scattering, and the thinning-clustering algorithm and Hough transform are then employed to locate the image positions of these strip lines. These lines are used to invert the bridge height. Fully polarimetric image data of airborne Pi-SAR at X-band are applied to inversion of the height and width of the Naruto Bridge in Japan. Based on the same principle, this approach is also applicable to spaceborne ALOSPALSAR single-polarization data of the Eastern Ocean Bridge in China. The results show good feasibility to realize the bridge height inversion.
Fuxiang LIU Chen ZANG Lei LI Chunfeng XU Jingmin LUO
Aiming at the different abilities of the defogging algorithms in different fog concentrations, this paper proposes a fog image classification algorithm for a small and imbalanced sample dataset based on a convolution neural network, which can classify the fog images in advance, so as to improve the effect and adaptive ability of image defogging algorithm in fog and haze weather. In order to solve the problems of environmental interference, camera depth of field interference and uneven feature distribution in fog images, the CutBlur-Gauss data augmentation method and focal loss and label smoothing strategies are used to improve the accuracy of classification. It is compared with the machine learning algorithm SVM and classical convolution neural network classification algorithms alexnet, resnet34, resnet50 and resnet101. This algorithm achieves 94.5% classification accuracy on the dataset in this paper, which exceeds other excellent comparison algorithms at present, and achieves the best accuracy. It is proved that the improved algorithm has better classification accuracy.
Jianfeng XU Satoshi KOMORITA Kei KAWAMURA
We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.
Jianfeng XU Koichi TAKAGI Shigeyuki SAKAZAWA
This paper presents a system for automatic generation of dancing animation that is synchronized with a piece of music by re-using motion capture data. Basically, the dancing motion is synthesized according to the rhythm and intensity features of music. For this purpose, we propose a novel meta motion graph structure to embed the necessary features including both rhythm and intensity, which is constructed on the motion capture database beforehand. In this paper, we consider two scenarios for non-streaming music and streaming music, where global search and local search are required respectively. In the case of the former, once a piece of music is input, the efficient dynamic programming algorithm can be employed to globally search a best path in the meta motion graph, where an objective function is properly designed by measuring the quality of beat synchronization, intensity matching, and motion smoothness. In the case of the latter, the input music is stored in a buffer in a streaming mode, then an efficient search method is presented for a certain amount of music data (called a segment) in the buffer with the same objective function, resulting in a segment-based search approach. For streaming applications, we define an additional property in the above meta motion graph to deal with the unpredictable future music, which guarantees that there is some motion to match the unknown remaining music. A user study with totally 60 subjects demonstrates that our system outperforms the stat-of-the-art techniques in both scenarios. Furthermore, our system improves the synthesis speed greatly (maximal speedup is more than 500 times), which is essential for mobile applications. We have implemented our system on commercially available smart phones and confirmed that it works well on these mobile phones.
Nii L. SOWAH Qingbo WU Fanman MENG Liangzhi TANG Yinan LIU Linfeng XU
In this paper, we improve upon the accuracy of existing tracklet generation methods by repairing tracklets based on their quality evaluation and detection propagation. Starting from object detections, we generate tracklets using three existing methods. Then we perform co-tracklet quality evaluation to score each tracklet and filtered out good tracklet based on their scores. A detection propagation method is designed to transfer the detections in the good tracklets to the bad ones so as to repair bad tracklets. The tracklet quality evaluation in our method is implemented by intra-tracklet detection consistency and inter-tracklet detection completeness. Two propagation methods; global propagation and local propagation are defined to achieve more accurate tracklet propagation. We demonstrate the effectiveness of the proposed method on the MOT 15 dataset
Tiecheng SONG Linfeng XU Chao HUANG Bing LUO
In this paper, a simple yet efficient texture representation is proposed for texture classification by exploring the joint statistics of local quantized patterns (jsLQP). In order to combine information of different domains, the Gaussian derivative filters are first employed to obtain the multi-scale gradient responses. Then, three feature maps are generated by encoding the local quantized binary and ternary patterns in the image space and the gradient space. Finally, these feature maps are hybridly encoded, and their joint histogram is used as the final texture representation. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art LBP based and even learning based methods for texture classification.
Hai-Feng XU Song-Yu YU Ci WANG
Based on the theory of block projection onto convex sets (BPOCS), a novel de-blocking algorithm is proposed. A new smoothness constraint set (SCS) is used to remove the unnecessary high frequencies. In addition, an adaptive quantization constraint set (AQCS) is employed to suppress error in the smoothing process. The proposed size and position of new SCS are different from traditional ones. Extensive experimental results are provided to demonstrate that the proposed method can achieve better image quality with fewer iterations.
Linfeng XU Liaoyuan ZENG Zhengning WANG
In this letter, we use the saliency maps obtained by several bottom-up methods to learn a model to generate a bottom-up saliency map. In order to consider top-down image semantics, we use the high-level features of objectness and background probability to learn a top-down saliency map. The bottom-up map and top-down map are combined through a two-layer structure. Quantitative experiments demonstrate that the proposed method and features are effective to predict human fixation.
Jianfeng XU Toshihiko YAMASAKI Kiyoharu AIZAWA
3D video, which consists of a sequence of mesh models, can reproduce dynamic scenes containing 3D information. To summarize 3D video, a key frame extraction method is developed using rate-distortion (R-D) trade-off. For this purpose, an effective feature vector is extracted for each frame. Shot detection is performed using the feature vectors as a preprocessing followed by key frame extraction. Simple but reasonable definitions of rate and distortion are presented. Based on an assumption of linearity, an R-D curve is generated in each shot, where the locations of the key frames are optimized. Finally, R-D trade-off can be achieved by optimizing a cost function using a Lagrange multiplier, where the number of key frames is optimized in each shot. Therefore, our system will automatically determine the best locations and the number of key frames in the sense of R-D trade-off. Our experimental results show the extracted key frames are compact and faithful to the original 3D video.
Yinan LIU Qingbo WU Liangzhi TANG Linfeng XU
In this paper, we propose a novel self-supervised learning of video representation which is capable to anticipate the video category by only reading its short clip. The key idea is that we employ the Siamese convolutional network to model the self-supervised feature learning as two different image matching problems. By using frame encoding, the proposed video representation could be extracted from different temporal scales. We refine the training process via a motion-based temporal segmentation strategy. The learned representations for videos can be not only applied to action anticipation, but also to action recognition. We verify the effectiveness of the proposed approach on both action anticipation and action recognition using two datasets namely UCF101 and HMDB51. The experiments show that we can achieve comparable results with the state-of-the-art self-supervised learning methods on both tasks.
Jianfeng XU Hong LI Wen-Yan YIN Junfa MAO Le-Wei LI
The element-by-element finite element method (EBE-FEM) combined with the preconditioned conjugate gradient (PCG) technique is employed in this paper to calculate the coupling capacitances of multi-level high-density three-dimensional interconnects (3DIs). All capacitive coupling 3DIs can be captured, with the effects of all geometric and physical parameters taken into account. It is numerically demonstrated that with this hybrid method in the extraction of capacitances, an effective and accurate convergent solution to the Laplace equation can be obtained, with less memory and CPU time required, as compared to the results obtained by using the commercial FEM software of either MAXWELL 3D or ANSYS.