
Keyword Search Result

[Keyword] scene (66 hits)

Results 41-60 of 66

  • An Accurate Scene Segmentation Method Based on Graph Analysis Using Object Matching and Audio Feature

    Makoto YAMAMOTO, Miki HASEYAMA
    PAPER-Speech/Audio
    Vol: E92-A No:8, Page(s): 1883-1891

    A method for accurate scene segmentation using two kinds of directed graph, obtained from object matching and from audio features, is proposed. Generally, audiovisual materials such as broadcast programs and movies contain repeated appearances of similar shots that include frames of the same background, object or place, and such shots belong to a single scene. Many scene segmentation methods based on this idea have been proposed; however, since they use color information as visual features, they cannot provide accurate segmentation results when the color features change between shots whose frames include the same object, due to camera operations such as zooming and panning. To solve this problem, the proposed method realizes scene segmentation through two novel approaches. In the first approach, object matching is performed between frames belonging to different shots. From these matching results, repeated appearances of shots whose frames include the same object can be found and represented as a directed graph. In the second approach, the proposed method generates another directed graph that represents repeated appearances of shots with similar audio features. By the combined use of these two directed graphs, the degradation in segmentation accuracy that results from using only one kind of graph is avoided, and accurate scene segmentation is realized. Experimental results obtained by applying the proposed method to actual broadcast programs verify its effectiveness.
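
    The grouping idea can be illustrated with a minimal sketch (Python; the similarity test below is a toy stand-in for the paper's object matching and audio comparison, not its actual algorithm): shots judged to repeat each other are linked by directed edges, and a scene boundary is kept only where no edge spans it.

    from typing import Callable

    def scene_boundaries(shots: list, similar: Callable) -> list:
        """Place a boundary after shot k only if no repetition edge spans it."""
        n = len(shots)
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if similar(shots[i], shots[j])]
        return [k for k in range(n - 1)
                if not any(i <= k < j for i, j in edges)]

    # Toy usage: shots labelled by the object they show.
    print(scene_boundaries(["desk", "anchor", "desk", "field", "goal", "field"],
                           lambda a, b: a == b))  # -> [2]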

  • Category Constrained Learning Model for Scene Classification

    Yingjun TANG, De XU, Guanghua GU, Shuoyan LIU
    LETTER-Image Recognition, Computer Vision
    Vol: E92-D No:2, Page(s): 357-360

    We present a novel model, named Category Constraint-Latent Dirichlet Allocation (CC-LDA), to learn and recognize natural scene categories. Previous work had to resort to an additional classifier after obtaining the image's topic representation. Our model incorporates the category information into topic inference, so every category is represented by its own topic simplex and topic count, which is consistent with human cognitive habits. The distinctive feature of our model is that it can discriminate among categories without an additional classifier, at the same time as it computes the topic representation. We investigate classification performance on tasks with varying numbers of scene categories. The experiments demonstrate that our learning model achieves better performance with less training data.
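
    CC-LDA itself builds the category constraint into the inference; as a loose, hedged approximation of that effect (not the paper's method), the sketch below fits one scikit-learn topic model per category, each with its own topic count, and classifies by per-model likelihood with no separate classifier. All names are illustrative.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    def fit_models(hists_by_cat, n_topics_by_cat, seed=0):
        """One LDA per category: each category gets its own topic simplex."""
        models = {}
        for cat, X in hists_by_cat.items():
            lda = LatentDirichletAllocation(
                n_components=n_topics_by_cat[cat], random_state=seed)
            models[cat] = lda.fit(np.asarray(X))
        return models

    def classify(models, hist):
        """Pick the category whose model best explains the visterm counts."""
        x = np.asarray(hist).reshape(1, -1)
        return max(models, key=lambda c: models[c].score(x))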

  • HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis

    Ji Hun PARK, Jae Sam YOON, Hong Kook KIM
    LETTER-Speech and Hearing
    Vol: E91-D No:9, Page(s): 2360-2364

    In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM), in order to incorporate the observation that mask information should be correlated over contiguous analysis frames. In other words, an HMM is used to estimate the mask information, represented as the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask information is then employed to separate the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we compare the performance of the proposed method with that of a Gaussian kernel-based estimation method in terms of speech recognition performance. As a result, the proposed HMM-based mask estimation method provides an average word error rate reduction of 61.4% compared with the Gaussian kernel-based mask estimation method.
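
    As a minimal sketch of the two-channel cues the mask is estimated from (Python; the window length and names are assumptions, and the HMM smoothing across frames is not shown): per time-frequency cell, the ILD is a log energy ratio and an interaural phase difference serves as a proxy for the ITD.

    import numpy as np
    from scipy.signal import stft

    def itd_ild_features(left, right, fs, nperseg=512):
        """Per-cell interaural cues from two-channel STFTs."""
        _, _, L = stft(left, fs=fs, nperseg=nperseg)
        _, _, R = stft(right, fs=fs, nperseg=nperseg)
        eps = 1e-12
        ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
        ipd = np.angle(L * np.conj(R))  # phase difference as an ITD proxy
        return ild, ipd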

  • Adaptively Combining Local with Global Information for Natural Scenes Categorization

    Shuoyan LIU, De XU, Xu YANG
    LETTER-Image Recognition, Computer Vision
    Vol: E91-D No:7, Page(s): 2087-2090

    This paper proposes the Extended Bag-of-Visterms (EBOV) representation for semantic scenes. Most previous representations are bag-of-visterms (BOV), where visterms refer to quantized local texture information. Our new representation extends the standard bag-of-visterms by introducing global texture information. In particular, we apply an adaptive weight to fuse the local and global information together in order to provide a better visterm representation. Given these representations, scene classification is performed with a pLSA (probabilistic Latent Semantic Analysis) model. The experimental results show that the appropriate use of global information improves scene classification performance compared with the BOV representation, which takes only local information into account.
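
    A minimal sketch of the fusion step (Python; the adaptive weight here is an entropy heuristic of ours, not the letter's formula): the normalized local BOV histogram and a global texture histogram are concatenated with complementary weights.

    import numpy as np

    def ebov(local_hist, global_hist):
        """Weighted concatenation of local (BOV) and global texture histograms."""
        h_l = np.asarray(local_hist, float); h_l /= h_l.sum() + 1e-12
        h_g = np.asarray(global_hist, float); h_g /= h_g.sum() + 1e-12
        p = h_l + 1e-12
        w = float(-(p * np.log(p)).sum() / np.log(len(p)))  # heuristic in [0, 1]
        return np.concatenate([w * h_l, (1.0 - w) * h_g])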

  • Hierarchical Decomposition of Depth Map Sequences for Representation of Three-Dimensional Dynamic Scenes

    Sung-Yeol KIM, Yo-Sung HO
    PAPER-Image Processing and Video Processing
    Vol: E90-D No:11, Page(s): 1813-1820

    In this paper, we propose a new scheme to represent three-dimensional (3-D) dynamic scenes using a hierarchical decomposition of depth maps. In the hierarchical decomposition, we split a depth map into four types of images: regular mesh, boundary, feature point, and number-of-layer (NOL) images. A regular mesh image is obtained by down-sampling a depth map. A boundary image is generated by gathering pixels of the depth map in edge regions. To generate feature point images, we select pixels of the depth map in non-edge regions according to their influence on the shape of the 3-D surface, and convert the selected pixels into images. An NOL image contains structural information for managing the other three image types. To render a frame of a 3-D dynamic scene, we first generate an initial surface using the information in the regular mesh, boundary and NOL images. Then, we enhance the initial surface by adding the depth information of the feature point images. With the proposed scheme, we can represent consecutive 3-D scenes successfully within the framework of a multi-layer structure. Furthermore, we can compress the data of 3-D dynamic scenes represented by a mesh structure with a 2-D video coder.
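
    Two of the four decomposition images are easy to sketch (Python; the sampling step and edge threshold are assumptions of ours): the regular mesh image as a down-sampled depth map, and the boundary image as depth pixels kept only where the depth gradient is large.

    import numpy as np
    from scipy import ndimage

    def regular_mesh_image(depth, step=8):
        """Down-sample the depth map on a regular grid."""
        return depth[::step, ::step]

    def boundary_image(depth, thresh=10.0):
        """Keep depth pixels only in edge regions of the depth map."""
        d = depth.astype(float)
        mag = np.hypot(ndimage.sobel(d, axis=0), ndimage.sobel(d, axis=1))
        return np.where(mag > thresh, depth, 0)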

  • A High-Performance Architecture of Motion Adaptive De-interlacing with Reliable Interfield Information

    Chung-chi LIN, Ming-hwa SHEU, Huann-keng CHIANG, Chih-Jen WEI, Chishyan LIAW
    PAPER-Image
    Vol: E90-A No:11, Page(s): 2575-2583

    Scene changes occur frequently in film broadcasting and tend to degrade de-interlacing performance, producing blurring, jagged edges, and other artifacts. This paper presents an efficient VLSI architecture for video de-interlacing that takes scene changes into account to improve the quality of the video results. The de-interlacing architecture contains three main parts. The first is scene change detection, designed by examining the absolute pixel difference between two adjacent even or odd fields. The second is a background index mechanism for classifying the pixels of the input field as motion or non-motion. The third, a spatial-temporal edge-based median filter, handles the interpolation of the motion pixels. Compared with existing de-interlacing approaches, our architecture significantly improves the PSNRs of video sequences containing various scene changes, and it maintains better performance in other situations as well. The proposed architecture has been implemented as a VLSI chip in a UMC 0.18-µm CMOS technology process. The total gate count is 30114 and the layout area is about 710 × 710 µm. The power consumption is 39.78 mW at a working frequency of 128.2 MHz, which allows real-time de-interlacing for HDTV.
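
    The detection test can be sketched as follows (Python; the threshold is an assumption): each field is compared with the previous field of the same parity (even with even, odd with odd) by mean absolute pixel difference, and a scene change is flagged when the difference is large.

    import numpy as np

    def scene_changes(fields, thresh=30.0):
        """Flag field indices whose same-parity predecessor differs strongly."""
        flags = []
        for t in range(2, len(fields)):
            mad = np.mean(np.abs(fields[t].astype(float)
                                 - fields[t - 2].astype(float)))
            if mad > thresh:
                flags.append(t)
        return flags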

  • Fabrication of Microchannel with Thin Cover Layer for an Optical Waveguide MEMS Switch Based on Microfluidics

    Takuji IKEMOTO, Yasuo KOKUBUN
    PAPER-Micro/Nano Photonic Devices
    Vol: E90-C No:1, Page(s): 78-86

    We propose and demonstrate a new microchannel fabrication process based on the Damascene process, aimed at integrating photonic circuits with microchannels fabricated in a glass film. The microchannel is fabricated by forming a sacrificial layer with the Damascene process, forming the cover by sputter deposition, and then removing the sacrificial layer. A thin cover layer can be formed by this sacrificial method because the cover layer is supported by the sacrificial layer during film formation, and the cover is hermetically sealed since it is formed by radio-frequency (RF) sputter deposition. The cover thickness is 1 µm, and the channel width ranges from 3.5 to 8 µm. Using the proposed microchannel fabrication method, we prepared a microelectromechanical system (MEMS) optical switch based on microfluidics and confirmed its operation. This optical switch actuates a minute droplet of liquid injected into the microchannel using Maxwell's stresses. When the droplet is in the microchannel, light propagates straight through the waveguide and passes through the microchannel; when the droplet is absent, the light is totally reflected into a crossing waveguide. Since this fabrication method uses techniques common to the formation of copper wiring in IC chips, it is well suited to microchannel processing.

  • Chip-Level Performance Improvement Using Triple Damascene Wiring Design Concept for the 0.13 µm CMOS Generation and Beyond

    Noriaki ODA, Hiroyuki KUNISHIMA, Takashi KYOUNO, Kazuhiro TAKEDA, Tomoaki TANAKA, Toshiyuki TAKEWAKI, Masahiro IKEDA
    PAPER
    Vol: E89-C No:11, Page(s): 1544-1550

    A novel wiring design concept called "Triple Damascene" is presented. We propose a new technology that mixes wires of different thicknesses in one layer using a dual damascene process, without increasing the number of mask steps. In this technology, three types of grooves are opened simultaneously: deep trenches for thick wires are selectively opened, along with vias and shallow trenches. A design based on this technology yields a 30% reduction in wiring delay on the critical path. A 5% reduction in chip size is also obtained, owing to the reduced number of repeaters, for a typical high-performance multi-processing unit (MPU) in the 0.13 µm generation. Performance enhancement in an actual graphic MPU product is also demonstrated.

  • Robust Scene Extraction Using Multi-Stream HMMs for Baseball Broadcast

    Nguyen Huu BACH, Koichi SHINODA, Sadaoki FURUI
    PAPER-Image Processing and Video Processing
    Vol: E89-D No:9, Page(s): 2553-2561

    In this paper, we propose a robust statistical framework for extracting scenes from a baseball broadcast video. We apply multi-stream hidden Markov models (HMMs) to control the weights among different features. To achieve high robustness against new scenes, we use a common, simple structure for all the HMMs. In addition, scene segmentation and unsupervised adaptation are applied to achieve greater robustness against differences in environmental conditions among games. The F-measure of scene-extraction experiments for eight types of scene from 4.5 hours of digest data was 77.4%, and it increased to 78.7% when scene segmentation was applied. Furthermore, the unsupervised adaptation method improved precision by 2.7 points, to 81.4%. These results confirm the effectiveness of our framework.
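
    The multi-stream combination can be sketched in a few lines (Python; the weight estimation and the Viterbi decoder are outside this sketch): per-frame, per-state log-likelihoods from each feature stream are mixed with stream weights before decoding.

    import numpy as np

    def combined_loglik(stream_logliks, weights):
        """stream_logliks[s][t, j] = log b_j(o_t) for stream s; returns [t, j]."""
        assert abs(sum(weights) - 1.0) < 1e-6
        return sum(w * ll for w, ll in zip(weights, stream_logliks))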

  • Registration of Partial 3D Point Clouds Acquired from a Multi-view Camera for Indoor Scene Reconstruction

    Sehwan KIM, Woontack WOO
    PAPER
    Vol: E89-D No:1, Page(s): 62-72

    In this paper, a novel projection-based method is presented for registering partial 3D point clouds, acquired from a multi-view camera, for 3D reconstruction of an indoor scene. Conventional registration methods for partial 3D point clouds generally have high computational complexity and require much time for registration; moreover, they are not robust for 3D point clouds of low precision. To overcome these drawbacks, a projection-based registration method is proposed. First, depth images are refined based on temporal and spatial properties: the former excludes 3D points with large variation, while the latter fills holes by referring to four neighboring 3D points. Second, the 3D point clouds acquired from two views are projected onto the same image plane, and two-step integer mapping is applied to search for correspondences through a modified KLT. Fine registration is then carried out by minimizing distance errors over an adaptive search range. Finally, we compute a final color from the colors of corresponding points and reconstruct an indoor scene by applying the above procedure to consecutive scenes. The proposed method not only reduces computational complexity by searching for correspondences on a 2D image plane, but also enables effective registration even for 3D points of low precision. Furthermore, only a few color and depth images are needed to reconstruct an indoor scene, and the generated model can be used for both interaction with and navigation in a virtual environment.
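
    The projection step that moves the correspondence search onto a 2D plane can be sketched as below (Python; the pinhole intrinsics K are an illustrative assumption, not the paper's calibration).

    import numpy as np

    def project(points_xyz, K):
        """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
        uvw = points_xyz @ K.T
        return uvw[:, :2] / uvw[:, 2:3]

    # Illustrative intrinsics for a 640x480 camera.
    K = np.array([[525.0, 0.0, 320.0],
                  [0.0, 525.0, 240.0],
                  [0.0, 0.0, 1.0]])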

  • On the Polynomial Time Computability of Abstract Ray-Tracing Problems

    Shuji ISOBE, Tetsuo KURIYAMA, Masahiro MAMBO, Hiroki SHIZUYA
    PAPER
    Vol: E88-A No:5, Page(s): 1209-1213

    The abstract ray-tracing problem asks, for a given scene consisting of a light source, a light receiver and finitely many obstacles in a space, and a given positive integer ε, whether a ray going out from the light source can reach the light receiver with intensity at least ε. The problem is known to be PSPACE-hard, and it is very unlikely that an efficient algorithm exists to solve the problem without any restriction. In this paper, we show that the problem can be solved in polynomial time under some weak practical restrictions.
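
    One hedged intuition (ours, not necessarily the paper's argument) for why restrictions can tame the problem: if every interaction attenuates a ray by at least a factor ρ < 1, then any ray that still carries intensity ε has only logarithmically many interactions, as in the LaTeX fragment below.

    % If each obstacle attenuates intensity by at least \rho < 1, then a ray
    % starting at intensity I_0 that arrives with intensity \ge \varepsilon
    % satisfies I_0 \rho^k \ge \varepsilon, which bounds its depth k:
    \[
      I_0\,\rho^{k} \ge \varepsilon
      \quad\Longrightarrow\quad
      k \le \frac{\log(I_0/\varepsilon)}{\log(1/\rho)} .
    \]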

  • Scene-Adaptive Frame-Layer Rate Control for Low Bit Rate Video

    Jae-Young PYUN, Yoon KIM, Sung-Jea KO, HwangJun SONG
    LETTER-Source Coding/Image Processing
    Vol: E86-A No:10, Page(s): 2618-2622

    Rate control regulates the coded bit stream to satisfy a given bit rate condition while maintaining the quality of the coded video. However, most existing rate control algorithms for low bit rate video cannot handle scene changes properly, and visual quality consequently suffers. The test model TMN8 of H.263+ can be forced to skip frames after an abrupt scene change. In this letter, we propose a new frame-layer rate control that allocates bits to frames and controls frame skipping adaptively, based on a pre-analysis of future frames. Experimental results show that the proposed control method provides an effective alternative to existing frame-skipping methods, which cause motion jerkiness and quality degradation.
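
    A minimal sketch of frame-layer control with adaptive skipping (Python; the buffer limit and drain model are assumptions of ours, and the letter's pre-analysis of future frames is not modelled): each coded frame deposits its bits into a virtual buffer drained at the channel rate, and a frame is skipped when it would overflow the buffer.

    def skipped_frames(frame_bits, rate_bps, fps, buf_limit):
        """Return indices of frames skipped to keep the virtual buffer bounded."""
        buf, drain, skipped = 0.0, rate_bps / fps, []
        for t, bits in enumerate(frame_bits):
            if buf + bits - drain > buf_limit:
                skipped.append(t)      # skip: deposit nothing this period
            else:
                buf += bits
            buf = max(0.0, buf - drain)
        return skipped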

  • Polyhedral Description of Panoramic Range Data by Stable Plane Extraction

    Caihua WANG, Hideki TANAHASHI, Hidekazu HIRAYU, Yoshinori NIWA, Kazuhiko YAMAMOTO
    PAPER-Image Processing, Image Pattern Recognition
    Vol: E85-D No:9, Page(s): 1399-1408

    In this paper, we describe a novel technique for extracting a polyhedral description from panoramic range data of a scene taken by a panoramic laser range finder. First, we introduce a reasonable noise model for range data acquired with a laser radar range finder, and derive a simple and efficient approximate solution for the optimal fitting of a local plane to the range data under the assumed noise model. We then compute local surface normals using the proposed method and extract stable planar regions from the range data, using both the distribution of the local surface normals and their spatial information in the range image. Finally, we describe a method that builds a polyhedral description of the scene from the extracted stable planar regions of the panoramic range data, which cover a 360° field of view in a polar coordinate system. Experimental results on complex real range data show the effectiveness of the proposed method.
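
    The local plane fit at the core of this approach can be sketched with unweighted least squares (Python; the paper's noise-model weighting is omitted): the normal of the best-fit plane through a neighborhood of 3D points is the eigenvector of its scatter matrix with the smallest eigenvalue.

    import numpy as np

    def local_normal(neighborhood_xyz):
        """Unweighted least-squares plane normal for an (N, 3) neighborhood."""
        pts = neighborhood_xyz - neighborhood_xyz.mean(axis=0)
        w, v = np.linalg.eigh(pts.T @ pts)  # eigenvalues in ascending order
        return v[:, 0]                      # smallest-eigenvalue eigenvector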

  • A Probabilistic Approach to Plane Extraction and Polyhedral Approximation of Range Data

    Caihua WANG, Hideki TANAHASHI, Hidekazu HIRAYU, Yoshinori NIWA, Kazuhiko YAMAMOTO
    PAPER-Image Processing, Image Pattern Recognition
    Vol: E85-D No:2, Page(s): 402-410

    In this paper, we propose a probabilistic approach to deriving an approximate polyhedral description from range data. We first compare several least-squares-based methods for estimating local normal vectors and select the most robust one, based on a reasonable noise model of the range data. Second, we extract stable planar regions from the range data by examining the distributions of the local normal vectors together with their spatial information in the 2D range image. Instead of segmenting the range data completely, we use only the geometries of the extracted stable planar regions to derive a polyhedral description of the range data; curved surfaces are approximated by their extracted plane patches. Owing to the probabilistic approach, the proposed method can be expected to be robust against noise. Experimental results on real range data from different sources show the effectiveness of the proposed method.

  • A Multi-Resolution Image Understanding System Based on Multi-Agent Architecture for High-Resolution Images

    Keiji YANAI, Koichiro DEGUCHI
    PAPER
    Vol: E84-D No:12, Page(s): 1642-1650

    High-resolution images with more than one million pixels have recently become easily available. However, such images require much processing time and memory in an image understanding system. In this paper, we propose an integrated image understanding system for high-resolution images that combines multi-resolution analysis with a multi-agent-based architecture. The proposed system can handle a high-resolution image effectively without much extra cost. We implemented an experimental system for images of indoor scenes.
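
    The cost-saving idea can be sketched as coarse-to-fine processing (Python; the flagging criterion and down-sampling factor are stand-ins of ours for the system's agents): analyse a down-sampled copy first, then touch full-resolution pixels only where the coarse pass flags interest.

    import numpy as np

    def coarse_to_fine(image, factor=8, thresh=20.0):
        """Return full-resolution windows for cells flagged at coarse scale."""
        coarse = image[::factor, ::factor].astype(float)
        ys, xs = np.nonzero(np.abs(coarse - coarse.mean()) > thresh)
        return [image[y * factor:(y + 1) * factor,
                      x * factor:(x + 1) * factor]
                for y, x in zip(ys, xs)]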

  • Coordinate Transformation by Nearest Neighbor Interpolation for ISAR Fixed Scene Imaging

    Koichi SASAKI, Masaru SHIMIZU, Yasuo WATANABE
    PAPER
    Vol: E84-C No:12, Page(s): 1905-1909

    The reflection signal in inverse synthetic aperture radar is measured in polar coordinates defined by the object rotation angle and the frequency. Reconstructing fixed scene images requires transforming the polar-format data into the rectangular spatial frequency domain, which is then processed by the inverse Fourier transform. In this paper, a fast and flexible coordinate transformation method based on nearest neighbor interpolation utilizing Delaunay triangulation is first presented. Then, the errors induced in the transformed rectangular spatial frequency data and in the resulting fixed scene images are investigated by simulation, under a uniform plane wave transmit-receive mode over a swept frequency range of 120-160 GHz, and results demonstrating the validity of the coordinate transformation are presented.
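
    The reformatting step can be sketched as below (Python; the grid size is an assumption, and scipy's nearest-neighbour interpolator, which is k-d-tree based, stands in for the paper's Delaunay-based lookup): polar samples are mapped to rectangular spatial-frequency coordinates, resampled onto a regular grid, and inverted with a 2-D inverse FFT.

    import numpy as np
    from scipy.interpolate import griddata

    def polar_to_image(theta, freq, samples, n=256):
        """Resample polar-format data to a rectangular grid, then invert."""
        kx, ky = freq * np.cos(theta), freq * np.sin(theta)
        k = np.linspace(min(kx.min(), ky.min()), max(kx.max(), ky.max()), n)
        KX, KY = np.meshgrid(k, k)
        grid = griddata((kx.ravel(), ky.ravel()), samples.ravel(),
                        (KX, KY), method='nearest')
        return np.fft.ifft2(np.fft.ifftshift(grid))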

  • A Cumulative Distribution Function of Edge Direction for Road-Lane Detection

    Joon-Woong LEE, Un-Kun YI, Kwang-Ryul BAEK
    PAPER-Pattern Recognition
    Vol: E84-D No:9, Page(s): 1206-1216

    This paper describes a cumulative distribution function (CDF) of edge direction for detecting road lanes. Based on the assumptions that there are no abrupt changes in the direction and location of road lanes and that the intensity of lane boundaries differs from that of the background, the CDF is formulated by accumulating edge magnitude over edge directions. The CDF has distinctive peaks in the vicinity of lane directions, owing to the directional and positional continuity of a lane. To obtain lane-related information, we construct a scatter diagram by collecting edge pixels whose direction corresponds to a peak of the CDF, and then perform principal-axis-based line fitting on the scatter diagram. Because noise can make many similar features appear or disappear in an image, a recursive estimator of the CDF is introduced to prevent false alarms and missed detections, and a scene understanding index (SUI) is formulated from the statistical parameters of the CDF. The proposed algorithm has been implemented in real time on video data obtained from a test vehicle driven on a typical highway.
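
    The core statistic can be sketched as below (Python; the binning is an assumption): edge magnitudes are accumulated into direction bins, so lane-like directions appear as dominant bins, and the CDF is their normalized running sum.

    import numpy as np
    from scipy import ndimage

    def edge_direction_cdf(gray, nbins=180):
        """Accumulate edge magnitude over edge direction (mod 180 degrees)."""
        g = gray.astype(float)
        gx, gy = ndimage.sobel(g, axis=1), ndimage.sobel(g, axis=0)
        mag = np.hypot(gx, gy)
        ang = np.degrees(np.arctan2(gy, gx)) % 180.0
        hist, _ = np.histogram(ang, bins=nbins, range=(0, 180), weights=mag)
        return hist, np.cumsum(hist) / (hist.sum() + 1e-12)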

  • Motion Estimation and Compensation Hardware Architecture for a Scene-Adaptive Algorithm on a Single-Chip MPEG-2 Video Encoder

    Koyo NITTA, Toshihiro MINAMI, Toshio KONDO, Takeshi OGURA
    PAPER-VLSI Systems
    Vol: E84-D No:3, Page(s): 317-325

    This paper describes a unique motion estimation and compensation (ME/MC) hardware architecture for a scene-adaptive algorithm. By statistically analyzing the characteristics of the scene being encoded and controlling the encoding parameters accordingly, the encoder can enhance the quality of the decoded image. The most significant feature of the architecture is that the two ME/MC modules can work independently. Since a time interval can be inserted between the operations of the two modules, a scene-adaptive algorithm can be implemented in the architecture. The ME/MC architecture is integrated on a single-chip MPEG-2 video encoder.

  • Visualized Sound Retrieval and Categorization Using a Feature-Based Image Search Engine

    Katsunobu FUSHIKIDA, Yoshitsugu HIWATARI, Hideyo WAKI
    PAPER-Multimedia Pattern Processing
    Vol: E83-D No:11, Page(s): 1978-1985

    In this paper, visualized sound retrieval and categorization methods using a feature-based image search engine are evaluated, with efficient video scene querying as the target application. Color-coded patterns of the sound spectrogram are adopted as the visualized sound index. Sound categorization experiments were conducted using visualized sound databases including speech, bird song, musical sounds, insect chirping, and the soundtrack of sports video. The retrieval experiments show that a simple feature-based image search engine can be used effectively for visualized sound retrieval and categorization. Categorization experiments involving humans show that, after brief training, humans can perform at least rough categorization. These results suggest that visualized sound can be an effective means for efficient video scene querying.
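
    The visualized index can be sketched as below (Python; the colormap and window length are assumptions of ours): a log-magnitude spectrogram is normalized and colour-coded into an RGB image that an image search engine can then index.

    import numpy as np
    from scipy.signal import spectrogram
    from matplotlib import cm

    def sound_to_image(x, fs, nperseg=512):
        """Colour-coded log spectrogram as a (freq, time, RGB) image."""
        _, _, S = spectrogram(x, fs=fs, nperseg=nperseg)
        log_s = np.log10(S + 1e-12)
        norm = (log_s - log_s.min()) / (np.ptp(log_s) + 1e-12)
        return cm.viridis(norm)[..., :3]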

  • Modeling of Urban Scenes by Aerial Photographs and Simply Reconstructed Buildings

    Katsuyuki KAMEI, Wayne HOY, Takashi TAMADA, Kazuo SEO
    PAPER
    Vol: E83-D No:7, Page(s): 1441-1449

    In many fields, such as city administration and facilities management, there is an increasing number of requests for Geographic Information Systems (GIS) that provide users with automated mapping functions. A mechanism that displays 3D views of an urban scene is particularly required, because it would provide an intuitive and understandable environment for managing objects in the scene. In this paper, we present a new urban modeling system that combines image-based and geometry-based approaches. Our method is based on a new concept in which a wide urban area can be displayed with natural, photo-realistic images, and each object drawn in the view can be identified by pointing to it. First, to generate natural urban views from any viewpoint, we employ an image-based rendering method, Image Walkthrough, and modify it to handle aerial images; this method interpolates and generates natural views by assembling several source photographs. Next, to identify each object in the scene, we recover its shape using computer vision techniques (a geometry-based approach): the rough shape of each building is reconstructed from various aerial images, and its drawn position in the generated view is determined, making it possible to identify each building in an urban view. Combining both approaches yields a new style of urban information management: users can gain an intuitive understanding of the area and easily identify their target by generating natural views from any viewpoint and reconstructing the shapes of objects as needed. We have built a prototype system embodying this new concept of GIS, which has shown the validity of our method.
