
Keyword Search Result

[Keyword] scene (66 hits)

Results 21-40 of 66

  • Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning

    Miquel ESPI  Masakiyo FUJIMOTO  Tomohiro NAKATANI  

     
    PAPER-Speech and Hearing

    Publicized: 2015/06/23  Vol: E98-D No:10  Page(s): 1799-1807

    We present a method for recognizing acoustic events in conversation scenarios where speech usually overlaps with other acoustic events. While speech is usually considered the most informative acoustic event in a conversation scene, it does not always carry all the information. Non-speech events, such as a door knock, footsteps, or keyboard typing, can reveal aspects of the scene that speakers miss or avoid mentioning. Moreover, robustly detecting these events could further support speech enhancement and recognition systems by providing useful cues about the surrounding scenario and noise. In acoustic event detection, state-of-the-art techniques are typically based on derived features (e.g., MFCCs or Mel-filter-bank outputs), which have successfully parameterized the spectrogram of speech but sacrifice resolution and detail when targeting other kinds of events. In this paper, we propose a method that learns features in an unsupervised manner from high-resolution spectrogram patches (a patch being a number of consecutive frame features stacked together) and integrates them within a deep neural network framework to detect and classify acoustic events. Superiority over both previous work in the field and similar approaches based on derived features has been assessed by statistical measures and evaluation on the CHIL2007 corpus, an annotated database of seminar recordings.
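
    A minimal sketch of the patch representation described above (not the authors' code; the sampling rate, FFT size, and patch length are assumed values):

    ```python
    import numpy as np
    import librosa

    def spectrogram_patches(wav_path, n_fft=1024, hop=256, frames_per_patch=15):
        """Stack consecutive spectrogram frames into high-resolution patches.

        frames_per_patch is an illustrative value, not taken from the paper.
        """
        y, sr = librosa.load(wav_path, sr=16000)
        spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))  # (freq, time)
        log_spec = np.log1p(spec)
        patches = [
            log_spec[:, t:t + frames_per_patch].ravel()  # one patch = stacked frames
            for t in range(0, log_spec.shape[1] - frames_per_patch, frames_per_patch)
        ]
        return np.array(patches)  # each row is one input to the event-classifier DNN
    ```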

  • Offline Vehicle Detection at Night Using Center Surround Extremas

    Naoya KOSAKA  Ryota OGURA  Gosuke OHASHI  

     
    PAPER

    Vol: E98-A No:8  Page(s): 1727-1734

    Recently, Intelligent Transport Systems (ITS) have been actively researched and developed. As part of ITS, vehicles are detected from images taken by an in-vehicle camera. Against this background, the authors have been working on nighttime vehicle detection. To evaluate the accuracy of this detection, a gold standard is required. At present, gold standards are created manually, but manual vehicle detection takes time. Accordingly, a system that detects vehicles accurately without human help is needed to evaluate the accuracy of real-time vehicle detection. The purpose of this study is therefore to automatically detect vehicles in nighttime images taken by an in-vehicle camera, with high accuracy, in offline processing. To detect vehicles, we focus on the brightness of headlights and taillights, because it is difficult to detect vehicles from their shape in nighttime driving scenes. The proposed method uses Center Surround Extremas (CenSurE) to detect blobs; CenSurE exploits the difference in brightness between the lights and their surroundings. However, the blobs obtained by CenSurE also include objects other than headlights and taillights, such as streetlights and delineators. To distinguish such blobs, they are tracked in inverse time, and vehicles are detected using tags based on the characteristics of each object. Although every object appears from the same point in forward-time processing, each object appears from a different place in the image in inverse-time processing, allowing blobs to be tracked and tagged easily. To evaluate the effectiveness of the proposed method, a vehicle detection experiment was conducted on nighttime driving scenes captured by an in-vehicle camera. The results of the proposed method were nearly equivalent to those of manual detection.
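
    OpenCV's contrib package includes a CenSurE-style detector (the Star detector), so a rough blob-detection sketch along these lines might be as follows; the detector parameters are assumptions, and opencv-contrib-python is required:

    ```python
    import cv2

    # Load a nighttime frame in grayscale; bright headlight/taillight blobs
    # stand out against the dark surroundings.
    frame = cv2.imread("night_frame.png", cv2.IMREAD_GRAYSCALE)

    # CenSurE is implemented in OpenCV contrib as the Star detector.
    censure = cv2.xfeatures2d.StarDetector_create(
        maxSize=45,            # largest center-surround filter size (assumed)
        responseThreshold=30,  # minimum center-surround response (assumed)
    )
    keypoints = censure.detect(frame)

    # Each keypoint is a candidate light blob; inverse-time tracking, as in
    # the paper, would then separate vehicle lights from streetlights.
    print(f"{len(keypoints)} candidate blobs detected")
    ```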

  • Hue-Preserving Unsharp-Masking for Color Image Enhancement

    Zihan YU  Kiichi URAHAMA  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2014/09/22  Vol: E97-D No:12  Page(s): 3236-3238

    We propose an unsharp-masking technique that preserves the hue of colors in images. The method magnifies the contrast of colors and spatially sharpens textures, with an adaptively controlled contrast magnification ratio. We show by experiments that the method enhances the color tone of photographs while keeping their perceptual scene depth.
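
    The letter's exact formulation is not reproduced here, but a common way to preserve hue during unsharp masking is to sharpen only the intensity and rescale all three color channels by the same per-pixel gain; a sketch under that assumption:

    ```python
    import cv2
    import numpy as np

    def hue_preserving_unsharp(img_bgr, sigma=3.0, amount=0.6):
        """Sharpen intensity only; a common gain on R, G, B leaves hue unchanged.

        sigma and amount are illustrative constants, not the paper's
        adaptive magnification ratio.
        """
        img = img_bgr.astype(np.float32) + 1e-6
        intensity = img.mean(axis=2)                            # simple intensity channel
        blurred = cv2.GaussianBlur(intensity, (0, 0), sigma)
        sharpened = intensity + amount * (intensity - blurred)  # classic unsharp mask
        gain = np.clip(sharpened / intensity, 0.0, 3.0)         # per-pixel common gain
        out = np.clip(img * gain[..., None], 0, 255)
        return out.astype(np.uint8)
    ```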

  • Adaptive Rate Control Mechanism in H.264/AVC for Scene Changes

    Jiunn-Tsair FANG  Zong-Yi CHEN  Chen-Cheng CHAN  Pao-Chi CHANG  

     
    PAPER-Image

    Vol: E97-A No:12  Page(s): 2625-2632

    Rate control, which regulates the bitrate of video coding, is critical to time-sensitive video applications over networks. However, the H.264/AVC reference rate control does not respond to scene changes, so transmission quality deteriorates when a scene change occurs. In this work, a scene change is detected by comparing the ratio of the sum of absolute differences (SAD) between two consecutive frames. When a scene change is detected, the proposed method, modified from the H.264/AVC reference software, re-assigns a quantization parameter (QP) value to regulate the bitrate. Because inter-prediction works poorly for a scene-changed frame, the proposed method estimates the frame's complexity from its content and builds a separate Q-R model to assign the QP. The adaptive rate control mechanism presented in this study can quickly respond to the heavy bitrate increase caused by a scene change. Simulation results show that the proposed method improves the average peak signal-to-noise ratio (PSNR) by approximately 1.1 dB, with a smaller buffer size, compared with the reference software JM version 17.2.
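
    A simplified stand-in for the SAD-ratio scene-change test might look as follows; the threshold and the precise ratio definition are assumptions, not values from the paper:

    ```python
    import numpy as np

    def is_scene_change(prev_frame, cur_frame, prev_sad, ratio_threshold=2.0):
        """Flag a scene change when the frame-to-frame SAD jumps sharply."""
        sad = np.abs(cur_frame.astype(np.int32) - prev_frame.astype(np.int32)).sum()
        change = prev_sad > 0 and sad / prev_sad > ratio_threshold
        return change, sad

    # On a detected change the encoder would re-assign QP from a content-based
    # Q-R model instead of trusting inter-prediction statistics for that frame.
    ```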

  • Discriminative Reference-Based Scene Image Categorization

    Qun LI  Ding XU  Le AN  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2014/07/22  Vol: E97-D No:10  Page(s): 2823-2826

    A discriminative reference-based method for scene image categorization is presented in this letter. Reference-based image classification combined with K-SVD has proven to be a simple, efficient, and effective approach to scene image categorization: it learns a subspace spanned by a randomly selected reference set and uses it to represent images. A good reference set should be both representative and discriminative; more specifically, the reference-set subspace should span the data space well while maintaining low redundancy. To select reference images automatically, we adapt the affinity propagation algorithm, based on data similarity, to gather a reference set that is both representative and discriminative. We apply the discriminative reference-based method to scene categorization on several benchmark datasets. Extensive experimental results demonstrate that the proposed scene categorization method with the selected reference set achieves better performance and higher efficiency than state-of-the-art methods.
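
    As an illustration of exemplar selection by affinity propagation (using scikit-learn and placeholder features, not the authors' pipeline):

    ```python
    import numpy as np
    from sklearn.cluster import AffinityPropagation

    # image_features: one row per candidate image (assumed precomputed, e.g.
    # bag-of-words vectors); AP picks exemplars without fixing their number.
    rng = np.random.default_rng(0)
    image_features = rng.normal(size=(200, 64))  # placeholder data

    ap = AffinityPropagation(random_state=0).fit(image_features)
    reference_set = image_features[ap.cluster_centers_indices_]
    print(f"selected {len(reference_set)} reference images")
    ```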

  • Combining LBP and SIFT in Sparse Coding for Categorizing Scene Images

    Shuang BAI  Jianjun HOU  Noboru OHNISHI  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E97-D No:9  Page(s): 2563-2566

    Local descriptors such as the Local Binary Pattern (LBP) and the Scale Invariant Feature Transform (SIFT) are widely used in computer vision applications, and they emphasize different aspects of image content. In this letter, we propose combining them in sparse coding for categorizing scene images. First, we extract LBP and SIFT features from training images on a regular grid. Then a visual-word codebook is constructed for each feature. The obtained LBP and SIFT codebooks are used to create a two-dimensional table in which each entry corresponds to a pair of an LBP visual word and a SIFT visual word. Given an input image, LBP and SIFT features extracted from the same positions of the image are encoded together by sparse coding. Spatial max pooling is then adopted to determine the image representation. The resulting image representations are converted into one-dimensional features and classified by SVM classifiers. Finally, we conduct extensive experiments on the Scene Categories 8 and MIT 67 Indoor Scene datasets to evaluate the proposed method. The results demonstrate that combining features in the proposed manner is effective for scene categorization.
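
    A rough sketch of the encode-then-pool step, assuming scikit-learn's SparseCoder and a random placeholder codebook (the letter builds its codebook from the 2-D LBP/SIFT word table):

    ```python
    import numpy as np
    from sklearn.decomposition import SparseCoder

    rng = np.random.default_rng(0)
    joint_dict = rng.normal(size=(256, 128))        # 256 atoms, 128-dim features
    joint_dict /= np.linalg.norm(joint_dict, axis=1, keepdims=True)

    patch_features = rng.normal(size=(500, 128))    # LBP+SIFT per image position

    coder = SparseCoder(dictionary=joint_dict,
                        transform_algorithm="lasso_lars",
                        transform_alpha=0.1)
    codes = coder.transform(patch_features)         # sparse code per position
    image_repr = np.abs(codes).max(axis=0)          # spatial max pooling
    ```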

  • Scene Text Character Recognition Using Spatiality Embedded Dictionary

    Song GAO  Chunheng WANG  Baihua XIAO  Cunzhao SHI  Wen ZHOU  Zhong ZHANG  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E97-D No:7  Page(s): 1942-1946

    This paper models spatial layout beyond the traditional spatial pyramid (SP) in the coding/pooling scheme for scene text character recognition. Specifically, we propose a novel method to build a dictionary, called the spatiality embedded dictionary (SED), in which each codeword represents a particular character stroke and is associated with a local response region. The promising results outperform those of other state-of-the-art algorithms.

  • Improvements of Local Descriptor in HOG/SIFT by BOF Approach

    Zhouxin YANG  Takio KURITA  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E97-D No:5  Page(s): 1293-1303

    Numerous studies have focused on improving the bag of features (BOF), histogram of oriented gradients (HOG), and scale invariant feature transform (SIFT). However, few works have attempted to learn the connection between them, even though the latter two are widely used as local feature descriptors for the former. Motivated by the resemblance between BOF and HOG/SIFT in descriptor construction, we improve the performance of HOG/SIFT by a) interpreting HOG/SIFT as a variant of BOF in descriptor construction, and then b) introducing recently proposed BOF approaches such as locality preservation, data-driven vocabularies, and spatial information preservation into the descriptor construction of HOG/SIFT, which yields BOF-driven HOG/SIFT. Experimental results show that the BOF-driven HOG/SIFT outperforms the original descriptors in pedestrian detection (for HOG) and in scene matching and image classification (for SIFT). The proposed BOF-driven HOG/SIFT can easily replace the original HOG/SIFT in current systems, since it is a generalized version of the original.
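
    To illustrate the BOF reading of HOG, the sketch below contrasts hard orientation binning (classic HOG) with soft assignment to neighboring bins, one of the BOF-style locality-preserving ideas mentioned above; it is an analogy, not the authors' exact formulation:

    ```python
    import numpy as np

    def cell_histogram(angles, n_bins=9, soft=True):
        """Orientation histogram of one HOG cell; angles assumed in [0, pi).

        Hard assignment mirrors classic HOG; soft assignment spreads each
        gradient over the two nearest bins, BOF-style.
        """
        bin_width = np.pi / n_bins
        pos = angles / bin_width - 0.5
        lo = np.floor(pos).astype(int) % n_bins   # lower neighboring bin
        hi = (lo + 1) % n_bins                    # upper neighboring bin
        w_hi = pos - np.floor(pos)
        hist = np.zeros(n_bins)
        if soft:
            np.add.at(hist, lo, 1.0 - w_hi)
            np.add.at(hist, hi, w_hi)
        else:
            np.add.at(hist, np.round(pos).astype(int) % n_bins, 1.0)
        return hist
    ```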

  • Scene Character Detection and Recognition with Cooperative Multiple-Hypothesis Framework

    Rong HUANG  Palaiahnakote SHIVAKUMARA  Yaokai FENG  Seiichi UCHIDA  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E96-D No:10  Page(s): 2235-2244

    To handle the variety of scene characters, we propose a cooperative multiple-hypothesis framework consisting of an image operator set module, an Optical Character Recognition (OCR) module, and an integration module. Multiple image operators, activated by multiple parameters, probe suspected character regions. The OCR module is then applied to each suspected region and returns multiple candidates with weight values for later integration. Without the aid of heuristic rules that impose constraints on segmentation area, aspect ratio, color consistency, text-line orientation, etc., the integration module automatically prunes redundant detections/recognitions and fills in missing ones. The proposed framework bridges the gap between scene character detection and recognition, in the sense that a practical OCR engine is effectively leveraged for result refinement. In addition, the proposed method performs detection and recognition at the character level, which enables dealing with special scenarios such as single characters, text along arbitrary orientations, or text along curves. We perform experiments on the benchmark ICDAR 2011 Robust Reading Competition dataset, which includes a text localization task and a word recognition task. The quantitative results demonstrate that multiple hypotheses outperform a single hypothesis and are comparable with state-of-the-art methods in terms of recall, precision, F-measure, character recognition rate, total edit distance, and word recognition rate. Moreover, two additional experiments confirm the simplicity of parameter setting in the proposed framework.

  • Indoor Scene Classification Based on the Bag-of-Words Model of Local Feature Information Gain

    Rong WANG  Zhiliang WANG  Xirong MA  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E96-D No:4  Page(s): 984-987

    For the problem of indoor home scene classification, this paper proposes a bag-of-words (BOW) model based on the information gain of local features. The experimental results show that the method not only improves performance but also reduces computation. Consequently, this method outperforms the state-of-the-art approach.
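
    A plausible form of the information-gain criterion for ranking visual words (the letter's exact weighting scheme may differ) is sketched below:

    ```python
    import numpy as np

    def information_gain(word_presence, labels):
        """Information gain of one visual word for class discrimination.

        word_presence: boolean per image; labels: class id per image.
        """
        def entropy(y):
            _, counts = np.unique(y, return_counts=True)
            p = counts / counts.sum()
            return -(p * np.log2(p)).sum()

        gain = entropy(labels)
        for value in (True, False):
            mask = word_presence == value
            if mask.any():
                gain -= mask.mean() * entropy(labels[mask])
        return gain  # higher gain = more discriminative visual word
    ```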

  • Robust Scene Categorization via Scale-Rotation Invariant Generative Model and Kernel Sparse Representation Classification

    Jinjun KUANG  Yi CHAI  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E96-D No:3  Page(s): 758-761

    This paper presents a novel scale-rotation invariant generative model (SRIGM) and a kernel sparse representation classification (KSRC) method for scene categorization. Recently, sparse representation classification (SRC) methods have been highly successful in a number of image processing tasks. Despite its popularity, the SRC framework lacks the ability to handle multi-class data with high inter-class similarity or high intra-class variation. We propose the kernel random coordinate descent (KRCD) algorithm for ℓ1 minimization in the kernel space under the KSRC framework, which allows the proposed method to obtain satisfactory classification accuracy when inter-class similarity is high. The training samples are partitioned at multiple scales and rotated at different resolutions to create a generative model that is invariant to scale and rotation changes. This model enables the KSRC framework to overcome the high intra-class variation problem in scene categorization. The experimental results show that the proposed method obtains more stable performance than other existing state-of-the-art scene categorization methods.

  • Efficiently Finding Individuals from Video Dataset

    Pengyi HAO  Sei-ichiro KAMATA  

     
    PAPER-Video Processing

    Vol: E95-D No:5  Page(s): 1280-1287

    We are interested in retrieving video shots or videos containing particular people from a video dataset. Owing to large variations in pose, illumination conditions, occlusions, hairstyles, and facial expressions, face tracks have recently been studied for face recognition, face retrieval, and name labeling in videos. However, when the number of face tracks is very large, conventional methods, which match all or some pairs of faces between face tracks, become ineffective. Therefore, in this paper, an efficient method for finding a given person in a video dataset is presented. In addition to studying face tracks within a single video, we also consider how to organize all the faces across the videos in a dataset and how to improve search quality in the query process. Different videos may include the same person; thus, managing individuals across videos is useful for their retrieval. The proposed method has the following three points. (i) Face tracks of the same person appearing for a period in each video are first connected on the basis of scene information with a time constraint, and then all the people in one video are organized by a proposed hierarchical clustering method. (ii) After the organizational structure of all the people in one video is obtained, the people are organized into an upper layer by affinity propagation. (iii) Finally, in the query process, a re-measuring method based on the index structure of the videos is performed to improve retrieval accuracy. We also build a video dataset containing six types of videos: films, TV shows, educational videos, interviews, press conferences, and domestic activities. The formation of face tracks in the six types of videos is first studied; then experiments are performed on this video dataset, which contains more than 1 million faces and 218,786 face tracks. The results show that the proposed approach achieves high search quality with a short search time.

  • Image Categorization Using Scene-Context Scale Based on Random Forests

    Yousun KANG  Hiroshi NAGAHASHI  Akihiro SUGIMOTO  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E94-D No:9  Page(s): 1809-1816

    Scene context plays an important role in scene analysis and object recognition. Among various sources of scene context, we focus on the scene-context scale, i.e., the effective scale of the local context used to classify an image pixel in a scene. This paper presents random-forests-based image categorization using the scene-context scale. Random forests, ensembles of randomized decision trees, are extremely fast in both training and testing, so classification, clustering, and regression can be performed in real time. We train multi-scale texton forests, which efficiently provide both a hierarchical clustering into semantic textons and local classification at various scale levels. The scene-context scale can be estimated from the entropy of the leaf nodes in the multi-scale texton forests. For image categorization, we combine the classified category distributions at each scale with the estimated scene-context scale. We evaluate the method on the MSRC21 segmentation dataset and find that using the scene-context scale improves image categorization performance; our results outperform the state-of-the-art in image categorization accuracy.
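
    A small sketch of the entropy-based scale estimate described above; the threshold logic is an assumed simplification of the paper's procedure:

    ```python
    import numpy as np

    def leaf_entropy(class_histogram):
        """Shannon entropy of a leaf node's class distribution.

        Low entropy at a fine scale suggests the local context already
        suffices to classify the pixel.
        """
        p = np.asarray(class_histogram, dtype=float)
        p = p[p > 0] / p.sum()
        return -(p * np.log2(p)).sum()

    def scene_context_scale(leaf_histograms_per_scale, max_entropy=1.0):
        """Pick the smallest scale whose leaf is already confident enough."""
        for scale, hist in enumerate(leaf_histograms_per_scale):
            if leaf_entropy(hist) <= max_entropy:
                return scale
        return len(leaf_histograms_per_scale) - 1
    ```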

  • Global Selection vs Local Ordering of Color SIFT Independent Components for Object/Scene Classification

    Dan-ni AI  Xian-hua HAN  Guifang DUAN  Xiang RUAN  Yen-wei CHEN  

     
    PAPER-Pattern Recognition

    Vol: E94-D No:9  Page(s): 1800-1808

    This paper addresses the problem of ordering the color SIFT descriptors in independent component analysis for image classification. Component ordering is of great importance for image classification, since it is the foundation of feature selection. To select distinctive and compact independent components (ICs) of the color SIFT descriptors, we propose two ordering approaches based on local variation, namely localization-based IC ordering and sparseness-based IC ordering. We evaluate the performance of the proposed methods, the conventional IC selection method (global-variation-based component selection), and the original color SIFT descriptors on object and scene databases, and obtain the following two main results. First, the proposed methods obtain acceptable classification results in comparison with the original color SIFT descriptors. Second, the highest classification rate is obtained by the global selection method on the scene database, while the local ordering methods give the best performance on the object database.

  • Scene Categorization with Classified Codebook Model

    Xu YANG  De XU  Songhe FENG  Yingjun TANG  Shuoyan LIU  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E94-D No:6  Page(s): 1349-1352

    This paper presents an efficient yet powerful codebook model, named the classified codebook model, for categorizing natural scene images. Current codebook models typically resort to a large codebook to obtain higher performance for scene categorization, which severely limits the practical applicability of the model. Our model formulates the codebook model within the theory of vector quantization and thus uses the well-known technique of classified vector quantization for scene-category modeling. The significant feature of our model is that it benefits scene categorization, especially at small codebook sizes, while saving much of the computational complexity of quantization. We evaluate the proposed model on a well-known challenging scene dataset: 15 Natural Scenes. The experiments demonstrate that our model decreases the computation time for codebook generation and achieves better performance for scene categorization, with the gain becoming more pronounced at small codebook sizes.
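
    Classified vector quantization can be illustrated by fitting one small sub-codebook per scene class, e.g. with k-means; the codebook sizes and clustering details here are assumptions:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def classified_codebook(features, labels, words_per_class=20):
        """Classified VQ: one small sub-codebook per scene class."""
        codebook = []
        for c in np.unique(labels):
            km = KMeans(n_clusters=words_per_class, n_init=10, random_state=0)
            km.fit(features[labels == c])
            codebook.append(km.cluster_centers_)
        return np.vstack(codebook)  # concatenated class-specific visual words
    ```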

  • Multi-Scale Multi-Level Generative Model in Scene Classification

    Wenjie XIE  De XU  Yingjun TANG  Geng CUI  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E94-D No:1  Page(s): 167-170

    Previous work shows that the probabilistic Latent Semantic Analysis (pLSA) model is one of the best generative models for scene categorization and can obtain an acceptable classification accuracy. However, this method uses a fixed number of topics to construct the final image representation; in this way, it restricts the image description to one level of visual detail and cannot attain a higher accuracy rate. To solve this problem, we propose a novel generative model, referred to as the multi-scale multi-level probabilistic Latent Semantic Analysis model (msml-pLSA). The method consists of two parts: a multi-scale part, which extracts visual details from the image at diverse resolutions, and a multi-level part, which concatenates multiple levels of topic representation to model the scene. The msml-pLSA model allows fine and coarse local image detail to be described in one framework. The proposed method is evaluated on the well-known scene classification dataset with 15 scene categories, and experimental results show that the proposed msml-pLSA model improves classification accuracy compared with typical classification methods.

  • Position-Invariant Robust Features for Long-Term Recognition of Dynamic Outdoor Scenes

    Aram KAWEWONG  Sirinart TANGRUAMSUB  Osamu HASEGAWA  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E93-D No:9  Page(s): 2587-2601

    A novel Position-Invariant Robust Feature, designated PIRF, is presented to address the problem of highly dynamic scene recognition. A PIRF is obtained by identifying existing local features (i.e., SIFT) that have wide-baseline visibility within a place (one place comprises multiple sequential images). These wide-baseline visible features are then represented as a single PIRF, computed as the average of all descriptors associated with it. PIRFs are particularly robust against highly dynamic changes in a scene: a single PIRF can be matched correctly against many features from many dynamic images. This paper also describes an approach to using these features for scene recognition. Recognition proceeds by matching an individual PIRF to a set of features from test images, with subsequent majority voting to identify the place with the highest number of matched PIRFs. The PIRF system is trained and tested on more than 2000 outdoor omnidirectional images and on the COLD datasets. Despite its simplicity, PIRF offers a markedly better recognition rate for dynamic outdoor scenes (ca. 90%) than other features. Additionally, a robot navigation system based on PIRF (PIRF-Nav) outperforms other incremental topological mapping methods in terms of time (70% less) and memory. The number of PIRFs can be reduced further to cut the time while retaining high accuracy, which makes the method suitable for long-term recognition and localization.
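
    A simplified reading of the PIRF construction, matching SIFT descriptors between consecutive images and averaging the persistent ones (the paper tracks persistence across a whole place, which is omitted here):

    ```python
    import numpy as np
    import cv2

    def extract_pirfs(gray_images, ratio=0.75):
        """Average SIFT descriptors that persist across consecutive images."""
        sift = cv2.SIFT_create()
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        descs = [sift.detectAndCompute(img, None)[1] for img in gray_images]

        pirfs = []
        for d0, d1 in zip(descs, descs[1:]):
            if d0 is None or d1 is None:
                continue
            for pair in matcher.knnMatch(d0, d1, k=2):
                if len(pair) < 2:
                    continue
                m, n = pair
                if m.distance < ratio * n.distance:  # Lowe's ratio test
                    pirfs.append((d0[m.queryIdx] + d1[m.trainIdx]) / 2.0)
        return np.array(pirfs)
    ```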

  • Color Independent Components Based SIFT Descriptors for Object/Scene Classification

    Dan-ni AI  Xian-hua HAN  Xiang RUAN  Yen-wei CHEN  

     
    PAPER-Pattern Recognition

    Vol: E93-D No:9  Page(s): 2577-2586

    In this paper, we present a novel color-independent-components-based SIFT descriptor (termed CIC-SIFT) for object/scene classification. We first learn an efficient color transformation matrix based on independent component analysis (ICA), which is adapted to each category in a database. The ICA-based color transformation can enhance contrast between the objects and the background in an image. We then compute CIC-SIFT descriptors over all three transformed color independent components. Since the ICA-based color transformation can boost the objects and suppress the background, the proposed CIC-SIFT can extract more effective and discriminative local features for object/scene classification. A comparison is performed among seven SIFT descriptors, and the experimental classification results show that our proposed CIC-SIFT is superior to the other conventional SIFT descriptors.
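
    The ICA color-transform step could be sketched as follows with scikit-learn's FastICA (the per-category adaptation and SIFT computation on the transformed channels are omitted):

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA

    def color_independent_components(images):
        """Learn a 3x3 ICA color transform from the RGB pixels of one category."""
        pixels = np.vstack([img.reshape(-1, 3) for img in images]).astype(float)
        ica = FastICA(n_components=3, random_state=0)
        ica.fit(pixels)
        return ica  # ica.transform(img.reshape(-1, 3)) yields the CIC channels
    ```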

  • Discriminating Semantic Visual Words for Scene Classification

    Shuoyan LIU  De XU  Songhe FENG  

     
    PAPER-Pattern Recognition

    Vol: E93-D No:6  Page(s): 1580-1588

    The bag-of-visual-words representation has recently become popular for scene classification. However, learning the visual words in an unsupervised manner is problematic when patches with similar appearances correspond to distinct semantic concepts. This paper proposes a novel supervised learning framework that takes full advantage of label information to address this problem. Specifically, Gaussian mixture modeling (GMM) is first applied to obtain a "semantic interpretation" of patches using scene labels: each scene induces a probability density on the low-level visual feature space, and patches are represented as vectors of posterior probabilities over scene semantic concepts. Then the Information Bottleneck (IB) algorithm is introduced to cluster the patches into "visual words" in a supervised manner, from the perspective of semantic interpretation; this operation maximizes the semantic information of the visual words. Once the visual words are obtained, the frequency of the visual words appearing in a given image forms a histogram, which can subsequently be used for scene categorization with a Support Vector Machine (SVM) classifier. Experiments on a challenging dataset show that the proposed visual words perform scene classification better than most existing methods.
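
    A sketch of the GMM-based "semantic interpretation" step, fitting one mixture per scene and describing each patch by normalized likelihoods (the component count and normalization are assumptions):

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def semantic_interpretation(train_patches, train_scene_ids, test_patches):
        """Represent patches as posterior probabilities over scene classes.

        One GMM per scene is fit on that scene's patch features; a patch is
        then described by its likelihoods under each scene model, normalized
        as a posterior under equal scene priors.
        """
        scenes = np.unique(train_scene_ids)
        models = [
            GaussianMixture(n_components=5, random_state=0).fit(
                train_patches[train_scene_ids == s])
            for s in scenes
        ]
        log_lik = np.column_stack([m.score_samples(test_patches) for m in models])
        p = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)  # posterior per scene concept
    ```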

  • Natural Scene Classification Based on Integrated Topic Simplex

    Yingjun TANG  De XU  Xu YANG  Qifang LIU  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E92-D No:9  Page(s): 1811-1814

    We present a novel model, named the Integrated Latent Topic Model (ILTM), to learn and recognize natural scene categories. Unlike previous work, which considered the discrepancies and common properties among categories separately, our approach combines universal topics shared by all categories with specific topics from each category. As a result, the model produces a few specific topics plus more generic topics shared among categories, and each category is represented in a different topic simplex, which correlates well with human scene understanding. We investigate the classification performance on a variety of scene category tasks. The experiments show that our model outperforms latent-space methods while using less training data.

Results 21-40 of 66