Yeon-Soo LEE Hyoung-Gyu LEE Hae-Chang RIM Young-Sook HWANG
In phrase-based statistical machine translation, long-distance reordering is one of the most challenging problems when translating between syntactically distant language pairs. In this paper, we propose a novel reordering model to address this problem. In our model, reordering is affected by the overall structure of the sentence, such as listings, reduplications, and modifications, as well as by the relationships between adjacent phrases. To this end, we reflect global syntactic contexts, including the parts that are not yet translated, during the decoding process.
Kazuhiro KOBAYASHI Tomoki TODA Hironori DOI Tomoyasu NAKANO Masataka GOTO Graham NEUBIG Sakriani SAKTI Satoshi NAKAMURA
The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determine perceptions of a song. In this paper, we describe an investigation of acoustic features that have an effect on the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the varieties of voices that singers can produce are limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion. This technique makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on a modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than by prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age without adversely affecting the perceived individuality.
Numerous studies have focused on improving bag of features (BOF), histogram of oriented gradients (HOG) and scale-invariant feature transform (SIFT). However, few works have attempted to learn the connection between them, even though the latter two are widely used as local feature descriptors for the former. Motivated by the resemblance between BOF and HOG/SIFT in descriptor construction, we improve the performance of HOG/SIFT by a) interpreting HOG/SIFT as a variant of BOF in descriptor construction, and then b) introducing recently proposed BOF approaches such as locality preservation, data-driven vocabulary, and spatial information preservation into the descriptor construction of HOG/SIFT, which yields the BOF-driven HOG/SIFT. Experimental results show that the BOF-driven HOG/SIFT outperform the original ones in pedestrian detection (for HOG) and in scene matching and image classification (for SIFT). Our proposed BOF-driven HOG/SIFT can easily replace the original HOG/SIFT in current systems since they are generalized versions of the original ones.
Chunyi SONG Takeshi MATSUMURA Hiroshi HARADA
Some key challenges remain to be overcome before spectrum sensing can be widely used to identify spectrum opportunities in the TV bands. To fulfill the strict sensing requirement specified by the FCC, a comprehensive sensing algorithm, which produces high SNR gain and maintains sensing robustness under complex noise conditions, needs to be implemented. In addition, carefully designed physical features and an improved cost-performance ratio are also essential if a prototype is to be commercialized. To the best of our knowledge, no success has been announced in developing a sensing prototype that fulfills both the FCC sensing requirement and the above-mentioned features. In this paper, we introduce a recently developed sensing prototype for Japanese digital TV signals of ISDB-T. The prototype operates in the Japanese UHF TV band of 470-770 MHz and can reliably identify the presence or absence of an ISDB-T signal at the level of -114 dBm in a 6 MHz channel. Moreover, it has constrained size and weight, and is capable of accurately measuring the power of an ISDB-T signal at an extremely low level. Efforts to reduce cost have also been made by avoiding the use of expensive electronic components and devices. Both laboratory and field tests are performed to evaluate its sensing performance and power measurement capability. In the laboratory tests, sensing performance under conditions of adjacent channel interference and frequency offset, as well as power measurement accuracy, is checked. In the field tests, the prototype is mounted in a vehicle and is checked for its capability to identify the presence of purposely broadcast ISDB-T signals at some fixed locations and also while the vehicle is moving.
Wen ZHOU Chunheng WANG Baihua XIAO Zhong ZHANG Yunxue SHAO
Recognizing human action in complex scenes is a challenging problem in computer vision. Some action-unrelated concepts, such as camera position, can significantly affect the appearance of local spatio-temporal features, and the performance of methods based on low-level features therefore degrades. In this letter, we define an action-unrelated concept, the position of the camera, as a high-level feature. We observe that camera position features can serve as a prior to local spatio-temporal features for human action recognition. We encode this prior by modeling interactions between spatio-temporal features and camera position features. We infer camera position features from local spatio-temporal features via these interactions. The parameters of this model are estimated by a new max-margin algorithm. We evaluate the proposed method on the KTH, IXMAS and YouTube action datasets. Experimental results show the effectiveness of the proposed method.
Akio OHTA Katsunori MAKIHARA Seiichi MIYAZAKI Masao SAKURABA Junichi MUROTA
An SiO2/Si-cap/Si0.55Ge0.45 heterostructure was fabricated on p-type Si(100) and strained silicon on insulator (SOI) substrates by low pressure chemical vapor deposition (LPCVD) and subsequent thermal oxidation in an O2 + H2 gas mixture. Chemical bonding features and valence band offsets in the heterostructures were evaluated by using high-resolution x-ray photoelectron spectroscopy (XPS) measurements and thinning the stack layers with a wet chemical solution.
Motoki FUKUSIMA Akio OHTA Katsunori MAKIHARA Seiichi MIYAZAKI
We have fabricated Pt/Si-rich oxide (SiOx)/TiN stacked MIM diodes and studied the impact of the structural asymmetry on their resistive switching characteristics. XPS analyses show that a TiON interfacial layer was formed during the SiOx deposition on TiN by RF sputtering in an Ar + O2 gas mixture. After the fabrication of Pt top electrodes on the SiOx layer, followed by an electro-forming process, distinct bipolar-type resistive switching was confirmed. For the resistive switching from the high to the low resistance state, the so-called SET process, there is no need to set the current compliance. Considering the higher dielectric constant of TiON compared with SiOx, the interfacial TiON layer can contribute to regulating the current flow through the diode. The clockwise resistive switching, in which the reduction and oxidation (Red-Ox) reactions can occur near the TiN bottom electrode, shows lower RESET voltages and better switching endurance than the counter-clockwise switching, where the Red-Ox reactions can take place near the top Pt electrode. The result implies good repeatability of the Red-Ox reactions at the interface between SiOx and TiON/TiN, in consideration of the relatively high diffusibility of oxygen atoms through Pt.
Akio OHTA Katsunori MAKIHARA Mitsuhisa IKEDA Hideki MURAKAMI Seiichiro HIGASHI Seiichi MIYAZAKI
We have investigated the impact of O2 annealing after SiOx deposition on the switching behavior to gain a better understanding of the resistance switching mechanism, especially the role of oxygen deficiency in the SiOx network. Although resistive random access memories (ReRAMs) with SiOx annealed at 300°C and sandwiched between Pt electrodes showed unipolar-type resistance switching characteristics, the switching behaviors were barely detectable for the samples annealed at temperatures over 500°C. Taking into account the average oxygen content in the SiOx films evaluated by XPS measurements, oxygen vacancies in SiOx play an important role in resistance switching. Also, the results of conductive AFM measurements suggest that the formation and disruption of a conducting filament path are mainly responsible for the resistance switching behavior of SiOx.
Rui XU Yasushi HIRANO Rie TACHIBANA Shoji KIDO
Computer-aided diagnosis (CAD) systems for diffuse lung diseases (DLD) are required to help radiologists read high-resolution computed tomography (HRCT) scans. An important task in developing such CAD systems is to make computers automatically recognize typical pulmonary textures of DLD on HRCT. In this work, we proposed a bag-of-features based method for the classification of six kinds of DLD patterns: consolidation (CON), ground-glass opacity (GGO), honeycombing (HCM), emphysema (EMP), nodular (NOD) and normal tissue (NOR). In order to successfully apply the bag-of-features based method to this task, we focused on designing suitable local features and the classifier. Considering that pulmonary textures are characterized not only by CT values but also by shapes, we proposed a set of local features based on statistical measures calculated from both CT values and eigenvalues of Hessian matrices. Additionally, we designed a support vector machine (SVM) classifier by optimizing parameters related to both the kernels and the soft-margin penalty constant. We collected 117 HRCT scans from 117 subjects for experiments. Three experienced radiologists were asked to review the data, and the regions where they agreed that typical textures existed were used to generate 3009 3D volumes of interest (VOIs) with a size of 32×32×32. These VOIs were separated into two sets. One set was used for training and tuning parameters, and the other set was used for evaluation. The overall recognition accuracy of the proposed method was 93.18%. The precisions/sensitivities for each texture were 96.67%/95.08% (CON), 92.55%/94.02% (GGO), 97.67%/99.21% (HCM), 94.74%/93.99% (EMP), 81.48%/86.03% (NOD) and 94.33%/90.74% (NOR). Additionally, experimental results showed that the proposed method performed better than four kinds of baseline methods, including two state-of-the-art methods for the classification of DLD textures.
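The per-texture precision/sensitivity pairs and the overall accuracy quoted above are the usual quantities derived from a confusion matrix. The following is a minimal sketch of that bookkeeping; the function name `per_class_metrics` and the 3-class counts are hypothetical and invented purely for illustration, and they are not the data or the evaluation code of the paper.

```python
def per_class_metrics(confusion, labels):
    """Return (overall accuracy, {label: (precision, sensitivity)}).

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = {}
    for k, name in enumerate(labels):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_positive / predicted_k if predicted_k else 0.0
        sensitivity = true_positive / actual_k if actual_k else 0.0
        metrics[name] = (precision, sensitivity)
    return correct / total, metrics


# Toy counts, invented for illustration only (not results from the paper).
labels = ["CON", "GGO", "NOR"]
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
accuracy, metrics = per_class_metrics(confusion, labels)
```

Precision is computed over a column (everything predicted as a class) and sensitivity over a row (everything that truly belongs to a class), which is why a class such as NOD can have the two values differ noticeably.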
Wei ZHAO Rui XU Yasushi HIRANO Rie TACHIBANA Shoji KIDO Narufumi SUGANUMA
This paper describes a computer-aided diagnosis (CAD) method to classify pneumoconiosis on HRCT images. In Japan, pneumoconiosis is divided into 4 types according to the density of nodules: Type 1 (no nodules), Type 2 (few small nodules), Type 3-a (numerous small nodules) and Type 3-b (numerous small nodules and presence of large nodules). Because most pneumoconiotic nodules are small and irregularly shaped, only a few nodules can be detected by conventional nodule extraction methods, which affects the classification of pneumoconiosis. To improve the performance of nodule extraction, we propose a filter based on analysis of the eigenvalues of the Hessian matrix. The classification of pneumoconiosis is performed in the following steps. First, the large nodules are extracted and cases of Type 3-b are recognized. Second, for the remaining cases, the small nodules are detected and false positives are eliminated. Third, we adopt a bag-of-features-based method to generate input vectors for a support vector machine (SVM) classifier. Finally, cases of Type 1, 2 and 3-a are classified. The proposed method was evaluated on 175 HRCT scans of 112 subjects. The average classification accuracy is 90.6%. The experimental results show that our method would be helpful for classifying pneumoconiosis on HRCT.
Cheng CHENG Bilan ZHU Masaki NAKAGAWA
This paper presents an approach based on character recognition to searching for keywords in on-line handwritten Japanese text. It employs an on-line character classifier, an off-line classifier, or a combined classifier, which produces recognition candidates, and it searches for keywords in the lattice of candidates. It integrates the scores of individual character recognition and of the geometric context. We use quadratic discriminant function (QDF) or support vector machine (SVM) models to evaluate the geometric features of individual characters and the relationships between characters. This paper also presents an approach based on feature matching that employs on-line or off-line features. We evaluate three recognition-based methods, two feature-matching-based methods, and ideal cases of the latter, and conclude that the approach based on character recognition outperforms that based on feature matching.
Masaki KOBAYASHI Keisuke KAMEYAMA
In camera-based object recognition and classification, surface color is one of the most important characteristics. However, apparent object color may differ significantly according to the illumination and surface conditions. Such variation can be an obstacle to utilizing color features. Geusebroek et al.'s color invariants can be a powerful tool for characterizing object color regardless of illumination and surface conditions. In this work, we analyze the process of estimating the color invariants from RGB images, and propose a novel invariant color feature based on the elementary invariants that respects the circular continuity residing in the mapping between colors and their invariants. Experiments show that the use of the proposed invariant in combination with luminance helps improve the retrieval performance of partial object image matching under varying illumination conditions.
Michal KAWULOK Jolanta KAWULOK Bogdan SMOLKA
Image colorization is a semi-automatic process of adding colors to monochrome images and videos. Using existing methods, the required human assistance can be limited to annotating the image with color scribbles or selecting a reference image, from which the colors are transferred to a source image or video sequence. In the work reported here, we have explored how to exploit textural information to improve this process. For every scribbled image we determine the discriminative textural feature domain. After that, the whole image is projected onto the feature space, which makes it possible to estimate the textural similarity between any two pixels. For single image colorization based on a set of color scribbles, our contribution lies in using the proposed feature space domain rather than the luminance channel. In the case of color transfer used for colorization of video sequences, the feature space is generated based on a reference image, and textural similarity is used to match the pixels between the reference and source images. We have conducted extensive experimental validation, which confirmed the importance of using textural information and demonstrated that our method significantly improves colorization results.
Muhammad Rasyid AQMAR Koichi SHINODA Sadaoki FURUI
Variations in walking speed have a strong impact on gait-based person identification. We propose a method that is robust against walking-speed variations. It is based on a combination of cubic higher-order local auto-correlation (CHLAC), gait silhouette-based principal component analysis (GSP), and a statistical framework using hidden Markov models (HMMs). The CHLAC features capture the within-phase spatio-temporal characteristics of each individual, the GSP features retain more shape/phase information for better gait sequence alignment, and the HMMs classify the ID of each gait even when walking speed changes nonlinearly. We compared the performance of our method with other conventional methods using five different databases, SOTON, USF-NIST, CMU-MoBo, TokyoTech A and TokyoTech B. The proposed method was equal to or better than the others when the speed did not change greatly, and it was significantly better when the speed varied across and within a gait sequence.
Gibran FUENTES PINEDA Hisashi KOGA Toshinori WATANABE
We present a scalable approach to automatically discovering particular objects (as opposed to object categories) from a set of images. The basic idea is to search for local image features that consistently appear in the same images under the assumption that such co-occurring features underlie the same object. We first represent each image in the set as a set of visual words (vector quantized local image features) and construct an inverted file to memorize the set of images in which each visual word appears. Then, our object discovery method proceeds by searching the inverted file and extracting visual word sets whose elements tend to appear in the same images; such visual word sets are called co-occurring word sets. Because of unstable and polysemous visual words, a co-occurring word set typically represents only a part of an object. We observe that co-occurring word sets associated with the same object often share many visual words with one another. Hence, to obtain the object models, we further cluster highly overlapping co-occurring word sets in an agglomerative manner. Remarkably, we accelerate both extraction and clustering of co-occurring word sets by Min-Hashing. We show that the models generated by our method can effectively discriminate particular objects. We demonstrate our method on the Oxford buildings dataset. In a quantitative evaluation using a set of ground truth landmarks, our method achieved higher scores than the state-of-the-art methods.
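The inverted-file and Min-Hashing steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the toy image collection, and the parameters `num_hashes` and `num_tables` are hypothetical, explicit random permutations stand in for whatever hash family the authors use, and the agglomerative clustering of overlapping word sets is omitted.

```python
import random
from collections import defaultdict


def build_inverted_file(images):
    """Map each visual word to the set of image ids in which it appears."""
    inverted = defaultdict(set)
    for image_id, words in enumerate(images):
        for word in words:
            inverted[word].add(image_id)
    return inverted


def minhash_candidate_sets(inverted, num_images, num_hashes=2, num_tables=20, seed=0):
    """Group visual words whose image sets agree on num_hashes min-hash values.

    Words that appear in similar sets of images collide with high probability,
    so each multi-word bucket is a candidate co-occurring word set.
    """
    rng = random.Random(seed)
    candidates = set()
    for _ in range(num_tables):
        # Each random permutation of image ids defines one min-hash function.
        perms = []
        for _ in range(num_hashes):
            perm = list(range(num_images))
            rng.shuffle(perm)
            perms.append(perm)
        buckets = defaultdict(list)
        for word, image_ids in inverted.items():
            key = tuple(min(perm[i] for i in image_ids) for perm in perms)
            buckets[key].append(word)
        for words in buckets.values():
            if len(words) > 1:
                candidates.add(frozenset(words))
    return candidates


# Toy collection: words "a" and "b" always appear together; "e" never appears with them.
images = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["d", "e"], ["e"]]
inverted = build_inverted_file(images)
candidates = minhash_candidate_sets(inverted, num_images=len(images))
```

Two words collide under one min-hash function with probability equal to the Jaccard similarity of their image sets, so words with identical image sets always share a bucket and words with disjoint image sets never do.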
Mitsuru AMBAI Nugraha P. UTAMA Yuichi YOSHIDA
Histogram-based image features such as HoG, SIFT and histogram of visual words are generally represented as high-dimensional, non-negative vectors. We propose a supervised method of reducing the dimensionality of histogram-based features by using non-negative matrix factorization (NMF). We define a cost function for supervised NMF that consists of two terms. The first term is the generalized divergence term between an input matrix and a product of factorized matrices. The second term is the penalty term that reflects prior knowledge on a training set by assigning predefined constants to cannot-links and must-links in pairs of training data. A multiplicative update rule for minimizing the newly-defined cost function is also proposed. We tested our method on a task of scene classification using histograms of visual words. The experimental results revealed that each of the low-dimensional basis vectors obtained from the proposed method only appeared in a single specific category in most cases. This interesting characteristic not only makes it easy to interpret the meaning of each basis but also improves the power of classification.
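For orientation, the first term of the cost function, the generalized divergence between the input matrix and the product of the factor matrices, can be minimized with the well-known multiplicative updates of Lee and Seung. The sketch below shows only that unsupervised part in plain Python; the supervised penalty term on must-links and cannot-links and the authors' modified update rule are not reproduced here, and the function names and toy histogram matrix are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def generalized_kl(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) = sum(V log(V/WH) - V + WH)."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / (wh + eps))
            total += wh - v
    return total


def nmf_kl(V, rank, iterations=100, seed=0, eps=1e-12):
    """Factorize V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def ratio():
        WH = matmul(W, H)
        return [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]

    history = []
    for _ in range(iterations):
        R = ratio()
        for a in range(rank):  # multiplicative update of H
            col_sum = sum(W[i][a] for i in range(n)) + eps
            for j in range(m):
                H[a][j] *= sum(W[i][a] * R[i][j] for i in range(n)) / col_sum
        R = ratio()
        for a in range(rank):  # multiplicative update of W
            row_sum = sum(H[a][j] for j in range(m)) + eps
            for i in range(n):
                W[i][a] *= sum(H[a][j] * R[i][j] for j in range(m)) / row_sum
        history.append(generalized_kl(V, matmul(W, H)))
    return W, H, history


# Toy matrix of non-negative histogram counts, invented for illustration.
V = [[5, 1, 0, 2], [4, 2, 0, 1], [0, 1, 6, 3], [1, 0, 5, 4]]
W, H, history = nmf_kl(V, rank=2)
```

Because every update multiplies a non-negative entry by a non-negative factor, the factors stay non-negative without any explicit projection step, which is the property that makes this family of updates convenient for histogram features.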
In this paper, we present an approach to detecting speech presence in which the decision rule is based on a combination of multiple features using a sigmoid function. Minimum classification error (MCE) training is used to adjust the weights of the combination. The features, consisting of three parameters (the ratio of ZCR, the spectral energy, and the spectral entropy), are combined linearly with weights derived from the sub-band domain. First, the Bark-scale wavelet decomposition (BSWD) is used to split the input speech into 24 critical sub-bands. Next, the feature parameters are derived from the selected frequency sub-bands to form robust voice feature parameters. In order to discard seriously corrupted frequency sub-bands, a strategy of adaptive frequency sub-band extraction (AFSE) dependent on the sub-band SNR is then applied so that only the useful frequency sub-bands are used. Finally, these three feature parameters, which consider only the useful sub-bands, are combined through a sigmoid-type function incorporating optimal weights based on MCE training to classify each frame as speech-present or speech-absent. Experimental results show that the performance of the proposed algorithm is superior to that of standard methods such as G.729B and AMR2.
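The decision rule described above, a weighted linear combination of features passed through a sigmoid, can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the function names, the bias term, the threshold of 0.5, and the example weights are hypothetical, and the weights are not obtained by MCE training. The spectral entropy function uses the standard Shannon entropy of the normalized power spectrum.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in nats) of a power spectrum normalized to sum to 1."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    probs = [p / total for p in power_spectrum if p > 0]
    return -sum(p * math.log(p) for p in probs)


def speech_presence_score(features, weights, bias=0.0):
    """Sigmoid of a weighted linear combination of the frame features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def is_speech(features, weights, bias=0.0, threshold=0.5):
    """Label a frame as speech-present when the score exceeds the threshold."""
    return speech_presence_score(features, weights, bias) > threshold


# Hypothetical feature values and weights for a single frame.
frame_features = [1.0, 2.0, 0.5]
example_weights = [0.5, 1.0, -0.2]
decision = is_speech(frame_features, example_weights, bias=-1.0)
```

A flat spectrum gives the maximum entropy, log of the number of bins, while a single spectral peak gives zero, which is why entropy is a useful cue for separating noise-like frames from voiced speech.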
Generally, two problems of bag-of-features in image retrieval are still considered unsolved: one is that spatial information about descriptors is not employed well, which affects the accuracy of retrieval; the other is the trade-off between vocabulary size and precision, which determines the storage and retrieval performance. In this paper, we propose a novel approach called Hilbert scan based bag-of-features (HS-BoF) for image retrieval. First, we study a Hilbert scan based tree representation (HSBT), which is built on the local descriptors while spatial relationships are added to the nodes by a novel grouping rule, resulting in a tree structure for each image. Further, we give two ways of producing codebooks based on HSBT: a multi-layer codebook and a multi-size codebook. Owing to the properties of Hilbert scanning and the merits of our grouping method, sub-regions of the tree are not only flexible with respect to the distribution of local patches but also have hierarchical relations. Extensive experiments on Caltech-256, 13-scene and 1 million ImageNet images show that HS-BoF obtains higher accuracy with less memory usage.
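As background for the Hilbert scan mentioned above, the sketch below orders 2D grid positions along a Hilbert curve using the standard coordinate-to-index conversion. It illustrates only the scanning order; the tree construction, the grouping rule, and the codebooks of HS-BoF are not reproduced, and the function names and grid size are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_scan(points, n):
    """Sort (x, y) grid positions by their order along the Hilbert curve."""
    return sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))


# Scan every cell of a 4 x 4 grid.
cells = [(x, y) for x in range(4) for y in range(4)]
order = hilbert_scan(cells, 4)
```

Consecutive cells in the resulting order are always grid neighbors, so positions that are close in the 1D scan are also close in the image, which is the locality property that makes a Hilbert scan attractive for grouping local descriptors.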
Improving clustering performance remains a challenging issue for most document clustering methods. Document clustering based on semantic features is highly efficient. However, the method sometimes fails to cluster some documents, such as highly articulated documents. In order to improve the clustering of complex documents using semantic features, this paper proposes a document clustering method that uses terms of the condensing document clusters and fuzzy association to efficiently cluster specific documents into meaningful topics based on the document set. The proposed method improves the quality of document clustering because it can extract documents from the perspective of the terms of the cluster topics using semantic features and synonyms, which can also better represent the inherent structure of the document in connection with the document cluster topics. The experimental results demonstrate that the proposed method can achieve better document clustering performance than other methods.
Chao LIAO Guijin WANG Quan MIAO Zhiguo WANG Chenbo SHI Xinggang LIN
Robust local image features have become crucial components of many state-of-the-art computer vision algorithms. Due to limited hardware resources, computing local features on embedded systems is not an easy task. In this paper, we propose an efficient parallel computing framework for speeded-up robust features (SURF) oriented towards multi-DSP based embedded systems. We optimize the modules in SURF to better utilize the capability of DSP chips. We also design a compact data layout to adapt to the limited memory resources and to increase data access bandwidth. A data-driven barrier and workload balancing schemes are presented to synchronize the chips working in parallel and reduce the overall cost. The experiment shows that our implementation achieves competitive time efficiency compared with related works.