
Keyword Search Result

[Keyword] scene (66 hits)

Showing 1-20 of 66 hits

  • Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation Open Access

    KuanChao CHU  Satoshi YAMAZAKI  Hideki NAKAYAMA  

     
    PAPER-Image Recognition, Computer Vision

Publicized: 2024/04/30 | Vol: E107-D No:9 | Page(s): 1239-1252

This work focuses on training dataset enhancement with informative relational triplets for Scene Graph Generation (SGG). Due to the lack of effective supervision, current SGG models perform poorly on informative relational triplets that have inadequate training samples. We therefore propose two novel training dataset enhancement modules: Feature Space Triplet Augmentation (FSTA) and Soft Transfer. FSTA leverages a feature generator trained to produce representations of objects in relational triplets, and its biased-prediction-based sampling efficiently augments artificial triplets, focusing on the challenging ones. In addition, we introduce Soft Transfer, which assigns soft predicate labels to general relational triplets to provide more effective supervision for informative predicate classes. Experimental results show that integrating FSTA and Soft Transfer achieves high levels of both Recall and mean Recall on the Visual Genome dataset, and the average of Recall and mean Recall is the highest among all existing model-agnostic methods.
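As a rough, hedged illustration of the Soft Transfer idea (not the paper's actual procedure), the sketch below turns a one-hot label for a general predicate into a soft distribution over informative predicate classes using an assumed affinity matrix; the predicate names and affinity values are hypothetical placeholders.

```python
import numpy as np

def soft_predicate_labels(coarse_idx, affinity, temperature=1.0):
    """Convert a coarse (general) predicate label into a soft distribution
    over informative predicate classes.

    affinity[i, j] is an assumed prior similarity between coarse predicate i
    and informative predicate j (e.g. derived from co-occurrence statistics).
    """
    logits = affinity[coarse_idx] / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical example: coarse predicate "on" spread over informative classes.
affinity = np.array([
    [2.0, 0.5, 0.1],   # "on"   -> {"sitting on", "standing on", "riding"}
    [0.2, 0.3, 2.5],   # "with" -> same informative classes
])
print(soft_predicate_labels(coarse_idx=0, affinity=affinity))
```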

  • Dual-Path Convolutional Neural Network Based on Band Interaction Block for Acoustic Scene Classification Open Access

    Pengxu JIANG  Yang YANG  Yue XIE  Cairong ZOU  Qingyun WANG  

     
    LETTER-Engineering Acoustics

Publicized: 2023/10/04 | Vol: E107-A No:7 | Page(s): 1040-1044

Convolutional neural networks (CNNs) are widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is used to gather time-frequency information between spectrum nodes, but it is challenging to adequately capture non-local relationships between frequency bands within a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on a band interaction block (DCNN-bi) for ASC, with the mel-spectrogram as the model’s input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature, and design three band interaction blocks (bi-blocks), connected between the two paths, to explore the relevant nodes across frequency bands. Combining the time-frequency information from the two paths, the three differently designed bi-blocks acquire non-local information and send it back to their respective paths. The experimental results indicate that the bi-block can substantially improve the baseline performance of the CNN: on the DCASE 2018 and DCASE 2020 datasets, the CNN exhibits performance improvements of 1.79% and 3.06%, respectively.
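A minimal PyTorch sketch of the dual-path idea, under assumptions: the mel-spectrogram is split along the frequency axis into low and high bands, each processed by its own convolutional stack, with a toy interaction block exchanging information between the paths. Layer sizes and the interaction design are illustrative, not the DCNN-bi configuration from the paper.

```python
import torch
import torch.nn as nn

class SimpleInteraction(nn.Module):
    """Toy stand-in for a band interaction block: mixes the two paths with 1x1 convs."""
    def __init__(self, ch):
        super().__init__()
        self.mix_low = nn.Conv2d(2 * ch, ch, kernel_size=1)
        self.mix_high = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, low, high):
        cat = torch.cat([low, high], dim=1)
        return low + self.mix_low(cat), high + self.mix_high(cat)

class DualPathASC(nn.Module):
    def __init__(self, n_classes=10, ch=32):
        super().__init__()
        def path():
            return nn.Sequential(
                nn.Conv2d(1, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            )
        self.low_path, self.high_path = path(), path()
        self.interact = SimpleInteraction(ch)
        self.head = nn.Linear(2 * ch, n_classes)

    def forward(self, mel):                       # mel: (B, 1, n_mels, time)
        n_mels = mel.shape[2]
        low, high = mel[:, :, : n_mels // 2], mel[:, :, n_mels // 2 :]
        low, high = self.low_path(low), self.high_path(high)
        low, high = self.interact(low, high)
        feat = torch.cat([low.mean(dim=(2, 3)), high.mean(dim=(2, 3))], dim=1)
        return self.head(feat)

logits = DualPathASC()(torch.randn(2, 1, 64, 128))   # e.g. 64 mel bins, 128 frames
```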

  • VTD-FCENet: A Real-Time HD Video Text Detection with Scale-Aware Fourier Contour Embedding Open Access

    Wocheng XIAO  Lingyu LIANG  Jianyong CHEN  Tao WANG  

     
    LETTER-Image Recognition, Computer Vision

Publicized: 2023/12/07 | Vol: E107-D No:4 | Page(s): 574-578

Video text detection (VTD) aims to localize text instances in videos and has wide applications in downstream tasks. To deal with the variation across scenes and text instances, existing VTD methods typically integrate multiple models and feature fusion strategies. Such sophisticated components can improve detection accuracy, but they limit suitability for real-time applications. This paper aims to achieve real-time VTD with an adaptive, lightweight, end-to-end framework. Unlike previous methods that represent text in the spatial domain, we model text instances in the Fourier domain. Specifically, we propose a scale-aware Fourier Contour Embedding (FCE) method, which not only models arbitrarily shaped text contours in videos as compact signatures, but also adaptively selects proper scales for backbone features during training. We then construct VTD-FCENet, which encodes temporal correlations between adjacent frames with scale-aware FCE in a lightweight and adaptive manner to achieve real-time VTD. Quantitative evaluations on the ICDAR2013 Video, Minetto, and YVT benchmark datasets show that VTD-FCENet achieves state-of-the-art or competitive detection accuracy while enabling real-time text detection on HD videos.
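To make the Fourier-domain contour representation concrete, here is a small numpy sketch of generic Fourier contour descriptors (not the paper's exact scale-aware FCE): a closed contour is treated as a complex signal, only the lowest-frequency coefficients are kept as a compact signature, and the contour can be resampled from that signature.

```python
import numpy as np

def fourier_signature(contour_xy, k=5):
    """Compact Fourier signature of a closed contour.

    contour_xy: (N, 2) points along the contour (assumed closed).
    Returns the 2k+1 lowest-frequency complex Fourier coefficients.
    """
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]          # complex representation
    coeffs = np.fft.fft(z) / len(z)
    idx = np.concatenate([np.arange(0, k + 1), np.arange(-k, 0)])
    return coeffs[idx]                                     # low-frequency terms only

def reconstruct(signature, n_points=200, k=5):
    """Resample the contour from its truncated Fourier signature."""
    t = np.arange(n_points) / n_points
    freqs = np.concatenate([np.arange(0, k + 1), np.arange(-k, 0)])
    z = sum(c * np.exp(2j * np.pi * f * t) for c, f in zip(signature, freqs))
    return np.stack([z.real, z.imag], axis=1)

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ellipse = np.stack([3 * np.cos(theta), np.sin(theta)], axis=1)   # toy "text contour"
sig = fourier_signature(ellipse, k=5)
approx = reconstruct(sig, n_points=200, k=5)
print(np.abs(approx - ellipse).max())    # small reconstruction error
```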

  • Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology

    Wenkai LIU  Lin ZHANG  Menglong WU  Xichang CAI  Hongxia DONG  

     
    PAPER-Artificial Intelligence, Data Mining

Publicized: 2023/10/23 | Vol: E107-D No:1 | Page(s): 83-92

The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea has three parts: (1) designing a special feature transformation module, based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the accompanying changes in feature perception ability; (2) designing a lightweight “drunken” model that matches the normal model's perception process; the model uses a multi-scale class residual block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module from the conventional model into the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task 1 show that our baseline system achieves 40.4% accuracy and 2.284 loss with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the “drunkard” mechanism, accuracy improves to 45.2% and the loss is reduced by 0.634, with 551.89K parameters and 23.6M MACs.
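The multi-scale class residual block is described only at a high level; the PyTorch sketch below shows one plausible reading under assumptions: parallel convolution branches with different kernel sizes whose outputs are fused and added back through a residual connection. Channel counts and kernel sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    """Parallel branches with different receptive fields, fused and added residually."""
    def __init__(self, channels=32):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.branch7 = nn.Conv2d(channels, channels, 7, padding=3)
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        multi = torch.cat([self.branch3(x), self.branch5(x), self.branch7(x)], dim=1)
        return self.act(x + self.fuse(multi))    # residual connection over fused scales

block = MultiScaleResidualBlock()
out = block(torch.randn(2, 32, 64, 64))
```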

  • An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification

    Pengxu JIANG  Yue XIE  Cairong ZOU  Li ZHAO  Qingyun WANG  

     
    LETTER-Engineering Acoustics

Publicized: 2023/02/06 | Vol: E106-A No:8 | Page(s): 1057-1061

    In human-computer interaction, acoustic scene classification (ASC) is one of the relevant research domains. In real life, the recorded audio may include a lot of noise and quiet clips, making it hard for earlier ASC-based research to isolate the crucial scene information in sound. Furthermore, scene information may be scattered across numerous audio frames; hence, selecting scene-related frames is crucial for ASC. In this context, an integrated convolutional neural network with a fusion attention mechanism (ICNN-FA) is proposed for ASC. Firstly, segmented mel-spectrograms as the input of ICNN can assist the model in learning the short-term time-frequency correlation information. Then, the designed ICNN model is employed to learn these segment-level features. In addition, the proposed global attention layer may gather global information by integrating these segment features. Finally, the developed fusion attention layer is utilized to fuse all segment-level features while the classifier classifies various situations. Experimental findings using ASC datasets from DCASE 2018 and 2019 indicate the efficacy of the suggested method.
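A minimal PyTorch sketch of attention pooling over segment-level embeddings, as a rough analogue of the global/fusion attention layers described above; the segment encoder, feature sizes, and the single-layer scoring function are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SegmentAttentionPool(nn.Module):
    """Score each segment embedding, then fuse them as a weighted sum."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, segs):                            # segs: (B, n_segments, dim)
        w = torch.softmax(self.score(segs), dim=1)      # (B, n_segments, 1)
        return (w * segs).sum(dim=1)                    # (B, dim)

class SegmentedASC(nn.Module):
    def __init__(self, dim=128, n_classes=10):
        super().__init__()
        # Stand-in segment encoder; in practice a CNN over each mel-spectrogram segment.
        self.encoder = nn.Sequential(nn.Linear(40, dim), nn.ReLU())
        self.pool = SegmentAttentionPool(dim)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, segment_feats):                   # (B, n_segments, 40)
        segs = self.encoder(segment_feats)
        return self.classifier(self.pool(segs))

logits = SegmentedASC()(torch.randn(4, 6, 40))          # 6 segments per clip
```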

  • Spatial-Temporal Aggregated Shuffle Attention for Video Instance Segmentation of Traffic Scene

    Chongren ZHAO  Yinhui ZHANG  Zifen HE  Yunnan DENG  Ying HUANG  Guangchen CHEN  

     
    PAPER-Image Processing and Video Processing

Publicized: 2022/11/24 | Vol: E106-D No:2 | Page(s): 240-251

To address the dispersion and misalignment of spatial focus regions in feature pyramid networks, as well as the insufficient capture of feature dependencies in both spatial and channel dimensions, this paper proposes spatial-temporal aggregated shuffle attention for video instance segmentation (STASA-VIS). First, a mixed subsampling (MS) module is designed to embed activation features from low-level target areas of the feature pyramid into the high-level layers, thereby aggregating spatial information on the target areas. Taking advantage of the coherent information across video frames, STASA-VIS uses the first of every five frames as a keyframe, propagates the keyframe feature maps of the pyramid layers forward in time, and fuses them with the mixed-subsampled features of non-keyframes to achieve temporally consistent feature aggregation. Finally, STASA-VIS embeds shuffle attention in the backbone to capture pixel-level pairwise relationships and channel-wise dependencies while reducing computation. Experimental results show that STASA-VIS reaches 41.2% segmentation accuracy and a test speed of 34 FPS, outperforming state-of-the-art one-stage video instance segmentation (VIS) methods in accuracy while achieving real-time segmentation.
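A small sketch of the keyframe propagation scheduling described above, under assumptions: the first of every five frames is treated as a keyframe whose (cached) feature map is blended with each following frame's features. The blending weight and the plain averaging fusion are placeholders, not the paper's fusion module.

```python
import numpy as np

def propagate_keyframe_features(frame_feats, period=5, alpha=0.5):
    """Fuse each non-keyframe feature map with the most recent keyframe's features.

    frame_feats: list of (C, H, W) arrays, one per frame.
    The first frame of every `period` frames is kept as-is as the keyframe;
    other frames are blended with the cached keyframe map (alpha is an assumed weight).
    """
    fused, key = [], None
    for t, feat in enumerate(frame_feats):
        if t % period == 0:
            key = feat                        # refresh the cached keyframe features
            fused.append(feat)
        else:
            fused.append(alpha * key + (1 - alpha) * feat)
    return fused

feats = [np.random.rand(8, 16, 16) for _ in range(12)]
out = propagate_keyframe_features(feats)
```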

  • Synthetic Scene Character Generator and Ensemble Scheme with the Random Image Feature Method for Japanese and Chinese Scene Character Recognition

    Fuma HORIE  Hideaki GOTO  Takuo SUGANUMA  

     
    PAPER-Image Recognition, Computer Vision

Publicized: 2021/08/24 | Vol: E104-D No:11 | Page(s): 2002-2010

Scene character recognition has been intensively investigated for a couple of decades because it has great potential in many applications, including automatic translation, signboard recognition, and reading assistance for the visually impaired. However, scene characters are difficult to recognize with sufficient accuracy owing to various types of noise and image distortion. In addition, Japanese scene character recognition is more challenging and requires a large amount of character data for training because thousands of character classes exist in the language. Some researchers have proposed training data augmentation techniques using Synthetic Scene Character Data (SSCD) to compensate for the shortage of training data. In this paper, we propose the Random Filter, a new method for SSCD generation, and introduce an ensemble scheme with the Random Image Feature (RI-Feature) method. Since no large Japanese scene character dataset has been available for evaluating recognition systems, we have also developed an open dataset, JPSC1400, which consists of a large number of real Japanese scene characters. We show that accuracy improves from 70.9% to 83.1% when the RI-Feature method is introduced into the ensemble scheme.
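The Random Filter itself is not specified in the abstract; the sketch below only illustrates the general idea of filter-based synthetic scene character generation: convolve a clean, font-rendered character image with a randomly generated kernel and add noise to imitate scene degradation. The kernel size, normalization, and noise level are assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def random_filter_augment(char_img, kernel_size=3, rng=None):
    """Apply a random normalized convolution kernel plus noise to a clean character image."""
    rng = rng if rng is not None else np.random.default_rng()
    kernel = rng.random((kernel_size, kernel_size))
    kernel /= kernel.sum()                     # keep overall brightness roughly stable
    blurred = convolve2d(char_img, kernel, mode="same", boundary="symm")
    noisy = blurred + rng.normal(0.0, 0.02, size=char_img.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = np.zeros((32, 32))
clean[8:24, 14:18] = 1.0                       # toy stand-in for a rendered glyph
augmented = random_filter_augment(clean, rng=np.random.default_rng(0))
```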

  • Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning

    Noriyuki TONAMI  Keisuke IMOTO  Ryosuke YAMANISHI  Yoichi YAMASHITA  

     
    PAPER-Speech and Hearing

Publicized: 2020/11/19 | Vol: E104-D No:2 | Page(s): 294-301

Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). Conventional methods address SED and ASC separately, even though sound events and acoustic scenes are closely related; for example, in the acoustic scene “office,” the sound events “mouse clicking” and “keyboard typing” are likely to occur. It is therefore expected that information on sound events and acoustic scenes will mutually benefit SED and ASC. In this paper, we propose multitask learning for joint analysis of sound events and acoustic scenes, in which the network layers holding information common to sound events and acoustic scenes are shared. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves the performance of SED and ASC by 1.31 and 1.80 percentage points in terms of F-score, respectively, compared with the conventional CRNN-based method.
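A minimal PyTorch sketch of the shared-encoder multitask structure, under assumptions: one recurrent encoder feeds both a frame-wise SED head and a clip-wise ASC head. The GRU encoder, layer sizes, and pooling are placeholders rather than the CRNN configuration used in the paper.

```python
import torch
import torch.nn as nn

class JointSedAsc(nn.Module):
    """Shared encoder with two heads: frame-wise event detection and clip-wise scene classification."""
    def __init__(self, n_mels=64, n_events=10, n_scenes=5, hidden=64):
        super().__init__()
        self.shared = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        self.sed_head = nn.Linear(2 * hidden, n_events)   # per-frame event activity
        self.asc_head = nn.Linear(2 * hidden, n_scenes)   # per-clip scene label

    def forward(self, x):                    # x: (B, frames, n_mels)
        h, _ = self.shared(x)                # (B, frames, 2*hidden)
        sed = torch.sigmoid(self.sed_head(h))     # multi-label activity per frame
        asc = self.asc_head(h.mean(dim=1))        # pooled over time for the scene
        return sed, asc

model = JointSedAsc()
sed_out, asc_out = model(torch.randn(2, 500, 64))
# Training would sum a binary cross-entropy SED loss and a cross-entropy ASC loss.
```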

  • Acceleration of Automatic Building Extraction via Color-Clustering Analysis Open Access

    Masakazu IWAI  Takuya FUTAGAMI  Noboru HAYASAKA  Takao ONOYE  

     
    LETTER-Computer Graphics

Vol: E103-A No:12 | Page(s): 1599-1602

In this paper, we accelerate an automatic building extraction method that uses a variational-inference Gaussian mixture model for color clustering. The improved method reduces computational time by applying color clustering to a reduced-resolution image. In our experiment with 106 scenery images, the improved method extracted buildings 86.54% faster than the conventional method. Furthermore, it increased extraction accuracy by 1.8% or more because the reduced image, which also contains fewer colors, prevents over-clustering.
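A minimal sketch of the reduced-resolution clustering idea using scikit-learn's variational-inference GMM; the downscaling factor, the number of mixture components, and the naive stride-based resizing are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_colors(image, scale=4, max_components=8):
    """Fit a variational-inference GMM to pixel colors of a reduced-resolution image,
    then label every full-resolution pixel with its most likely cluster.

    image: (H, W, 3) float array in [0, 1]; `scale` and `max_components` are assumed values.
    """
    small = image[::scale, ::scale]                       # cheap resolution reduction
    gmm = BayesianGaussianMixture(n_components=max_components, random_state=0)
    gmm.fit(small.reshape(-1, 3))
    labels = gmm.predict(image.reshape(-1, 3))
    return labels.reshape(image.shape[:2])

img = np.random.rand(120, 160, 3)                         # stand-in scenery image
label_map = cluster_colors(img)
```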

  • Graph Cepstrum: Spatial Feature Extracted from Partially Connected Microphones

    Keisuke IMOTO  

     
    PAPER-Speech and Hearing

Publicized: 2019/12/09 | Vol: E103-D No:3 | Page(s): 631-638

In this paper, we propose an effective and robust spatial feature extraction method for acoustic scene analysis utilizing partially synchronized and/or closely located distributed microphones. The proposed method introduces a new cepstrum feature that utilizes a graph-based basis transformation to extract spatial information from distributed microphones while taking into account whether each pair of microphones is synchronized and/or closely located. Specifically, in the proposed graph-based cepstrum, the log-amplitude of a multichannel observation is converted to a feature vector using the inverse graph Fourier transform, a basis transformation for signals on a graph. Results of experiments using real environmental sounds show that the proposed graph-based cepstrum robustly extracts spatial information with consideration of the microphone connections. Moreover, the results indicate that the proposed method classifies acoustic scenes more robustly than conventional spatial features when the observed sounds have a large synchronization mismatch between partially synchronized microphone groups.
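As a rough numpy illustration of the graph-based transform described above: build a graph Laplacian from a microphone-connection pattern and apply the inverse graph Fourier transform (the Laplacian eigenvector basis) to a vector of per-channel log-amplitudes. The adjacency weights and the two-group connection pattern are toy assumptions, not the paper's setup.

```python
import numpy as np

def graph_cepstrum(log_amplitude, adjacency):
    """Project per-microphone log-amplitudes onto the graph Fourier basis.

    log_amplitude: (n_mics,) log-amplitude of one time-frequency observation.
    adjacency: (n_mics, n_mics) symmetric weights (1 if a pair is synchronized /
    closely located, 0 otherwise) -- a toy connection pattern for illustration.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)    # columns form the graph Fourier basis
    return eigvecs @ log_amplitude            # inverse graph Fourier transform

# Two partially connected microphone groups: {0, 1, 2} and {3, 4}.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
feature = graph_cepstrum(np.log(np.random.rand(5) + 1e-6), A)
```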

  • Vision Based Nighttime Vehicle Detection Using Adaptive Threshold and Multi-Class Classification

    Yuta SAKAGAWA  Kosuke NAKAJIMA  Gosuke OHASHI  

     
    PAPER

Vol: E102-A No:9 | Page(s): 1235-1245

We propose a method that detects vehicles from in-vehicle monocular camera images captured during nighttime driving. Detecting vehicles from their shape is difficult at night, so many vehicle detection methods focusing on lights have been proposed. We detect bright spots by appropriate binarization based on the characteristics of vehicle lights, such as brightness and color. Because the detected bright spots include lights other than vehicle lights, we distinguish vehicle lights from other bright spots using Random Forest, a multiclass machine-learning classifier. Our method also makes effective use of the features of bright spots not associated with vehicles: vehicle detection is refined by weighting the Random Forest results according to the features of both vehicle and non-vehicle bright spots. We applied the proposed method to nighttime images and confirmed its effectiveness.
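A minimal scikit-learn sketch of the multiclass bright-spot classification step; the per-spot features, class set, and random training data are hypothetical placeholders used only to show the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-bright-spot features: [area, mean brightness, redness, height in image].
# Assumed classes: 0 = vehicle light, 1 = street lamp, 2 = reflection.
rng = np.random.default_rng(0)
X_train = rng.random((200, 4))
y_train = rng.integers(0, 3, size=200)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Class probabilities can then serve as per-spot weights when grouping
# bright spots into vehicle candidates, as the abstract suggests.
spot_features = rng.random((5, 4))
vehicle_prob = clf.predict_proba(spot_features)[:, 0]
```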

  • Recognition of Moving Object in High Dynamic Scene for Visual Prosthesis

    Fei GUO  Yuan YANG  Yang XIAO  Yong GAO  Ningmei YU  

     
    PAPER-Human-computer Interaction

Publicized: 2019/04/17 | Vol: E102-D No:7 | Page(s): 1321-1331

Currently, visual percepts generated by visual prostheses have low resolution, poorly controlled color, and restricted grayscale, which severely limits the ability of prosthetic implants to support visual tasks in daily scenes. Some studies have explored existing image processing techniques to improve the perception of objects in prosthetic vision; however, most of them extract moving objects and optimize visual percepts only in ordinary dynamic scenes, so the application of visual prostheses to highly dynamic daily-life scenes remains greatly limited. Hence, in this study, a novel unsupervised moving object segmentation model is proposed to automatically extract moving objects in highly dynamic scenes. In this model, foreground cues based on spatiotemporal edge features and background cues based on a boundary prior are exploited, and a moving-object proximity map is generated according to a manifold ranking function. The foreground and background cues are ranked simultaneously, and the moving objects are extracted by integrating the two ranking maps. The evaluation indicates that, compared with other methods, the proposed model uniformly highlights the moving object and preserves good boundaries in highly dynamic scenes. Based on this model, two optimization strategies are proposed to improve the perception of moving objects under simulated prosthetic vision. Experimental results demonstrate that these strategies can efficiently segment and enhance moving objects in highly dynamic scenes and significantly improve the recognition of moving objects for blind users.
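For reference, manifold ranking in its standard closed form, f = (I - alpha * S)^{-1} y, where S is the symmetrically normalized affinity between superpixels and y indicates the seed nodes. The toy affinity matrix and alpha below are assumptions; the paper's graph construction is not reproduced.

```python
import numpy as np

def manifold_ranking(W, query, alpha=0.99):
    """Standard manifold-ranking scores f = (I - alpha * S)^{-1} y.

    W: (n, n) symmetric affinity between superpixels (toy values here).
    query: (n,) indicator vector, 1 for seed nodes (e.g. boundary/background cues).
    """
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))           # symmetric normalization D^-1/2 W D^-1/2
    return np.linalg.solve(np.eye(len(W)) - alpha * S, query)

W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.1],
              [0.2, 0.1, 0.0]])
scores = manifold_ranking(W, query=np.array([1.0, 0.0, 0.0]))
```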

  • Bilateral Convolutional Activations Encoded with Fisher Vectors for Scene Character Recognition

    Zhong ZHANG  Hong WANG  Shuang LIU  Tariq S. DURRANI  

     
    LETTER-Image Recognition, Computer Vision

Publicized: 2018/02/02 | Vol: E101-D No:5 | Page(s): 1453-1456

    A rich and robust representation for scene characters plays a significant role in automatically understanding the text in images. In this letter, we focus on the issue of feature representation, and propose a novel encoding method named bilateral convolutional activations encoded with Fisher vectors (BCA-FV) for scene character recognition. Concretely, we first extract convolutional activation descriptors from convolutional maps and then build a bilateral convolutional activation map (BCAM) to capture the relationship between the convolutional activation response and the spatial structure information. Finally, in order to obtain the global feature representation, the BCAM is injected into FV to encode convolutional activation descriptors. Hence, the BCA-FV can effectively integrate the prominent features and spatial structure information for character representation. We verify our method on two widely used databases (ICDAR2003 and Chars74K), and the experimental results demonstrate that our method achieves better results than the state-of-the-art methods. In addition, we further validate the proposed BCA-FV on the “Pan+ChiPhoto” database for Chinese scene character recognition, and the experimental results show the good generalization ability of the proposed BCA-FV.

  • Detecting TV Program Highlight Scenes Using Twitter Data Classified by Twitter User Behavior and Evaluating It to Soccer Game TV Programs

    Tessai HAYAMA  

     
    PAPER-Datamining Technologies

Publicized: 2018/01/19 | Vol: E101-D No:4 | Page(s): 917-924

This paper presents a novel TV event detection method for automatically generating TV program digests using Twitter data. Previous studies of TV program digest generation based on Twitter data have developed TV event detection methods that analyze the frequency time series of tweets posted by users while watching a given TV program; however, most of these studies do not take into consideration differences in how Twitter is used, e.g., sharing information versus conversing. Since these different types of Twitter data are lumped together into one category, it is difficult to detect highlight scenes of TV programs and correctly extract their content from the Twitter data. Therefore, this paper presents a highlight scene detection method that automatically generates TV program digests based on Twitter data classified by Twitter user behavior. To confirm the effectiveness of the proposed method, experiments using 49 soccer game TV programs were conducted.
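As a simple illustration of the tweet-frequency analysis mentioned above (not the paper's detection algorithm, and without the per-behavior classification), the sketch below flags time bins whose tweet count exceeds a moving baseline; the window size and threshold factor are assumptions.

```python
import numpy as np

def detect_bursts(tweet_counts, window=5, factor=2.0):
    """Flag time bins whose tweet count exceeds `factor` times the trailing mean."""
    counts = np.asarray(tweet_counts, dtype=float)
    bursts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t].mean()
        if counts[t] > factor * max(baseline, 1.0):
            bursts.append(t)
    return bursts

# Toy per-minute tweet counts for a soccer broadcast; spikes suggest highlight scenes.
counts = [4, 5, 3, 6, 5, 4, 30, 28, 6, 5, 4, 22, 5]
print(detect_bursts(counts))    # indices of candidate highlight minutes
```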

  • A Novel Discriminative Feature Extraction for Acoustic Scene Classification Using RNN Based Source Separation

    Seongkyu MUN  Suwon SHON  Wooil KIM  David K. HAN  Hanseok KO  

     
    LETTER-Artificial Intelligence, Data Mining

Publicized: 2017/09/14 | Vol: E100-D No:12 | Page(s): 3041-3044

Various types of classifiers and feature extraction methods for acoustic scene classification have recently been proposed in the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge Task 1. The results of the final evaluation, however, showed that even the top-10 ranked teams achieved extremely low accuracy on particular class pairs with similar sounds. Because such sound classes are difficult to distinguish even by human ears, the conventional deep-learning-based feature extraction methods used by most DCASE participating teams are considered to face performance limitations. To address the low performance on similar class pairs, this letter proposes to employ recurrent neural network (RNN) based source separation for each class prior to the classification step. Since the system can effectively extract trained sound components using the RNN structure, the mid-layer of the RNN can be considered to capture discriminative information about the trained class. Therefore, this letter proposes to use this mid-layer information as a novel discriminative feature. The proposed feature shows an average classification rate improvement of 2.3% compared with the conventional method, which uses additional classifiers for the similar-class-pair issue.
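A minimal PyTorch sketch of the mid-layer feature idea under assumptions: a per-class separation RNN is trained with a reconstruction head, but its hidden-layer activations (rather than the separated output) are pooled and used as features. The GRU size and pooling are placeholders, not the letter's architecture.

```python
import torch
import torch.nn as nn

class SeparatorFeature(nn.Module):
    """RNN trained to reconstruct one target class; its hidden activations serve as features."""
    def __init__(self, n_bins=64, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)     # reconstruction head used only for training

    def forward(self, spec):                     # spec: (B, frames, n_bins)
        h, _ = self.rnn(spec)                    # (B, frames, hidden) mid-layer activations
        return self.out(h), h

model = SeparatorFeature()
recon, mid = model(torch.randn(2, 100, 64))
clip_feature = mid.mean(dim=1)                   # (B, hidden) pooled discriminative feature
```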

  • Enhanced Depiction of High Dynamic Images Using Tone Mapping Operator and Chromatic Adaptation Transform

    Ho-Hyoung CHOI  Byoung-Ju YUN  

     
    BRIEF PAPER

Vol: E100-C No:11 | Page(s): 1031-1034

The problem of reproducing high dynamic range (HDR) images on devices with a restricted dynamic range has gained a lot of interest in the computer graphics community. Various approaches to this issue exist, spanning several research areas, including computer graphics, image processing, color vision, and physiology; however, most of them suffer from several serious, well-known color distortion problems. Accordingly, this article presents a tone-mapping method comprising a tone-mapping operator and a chromatic adaptation transform. The operator combines linear and non-linear mapping using a visual gamma based on the contrast sensitivity function (CSF) and the key of the scene: the visual gamma automatically controls the dynamic range, without free parameters, while avoiding both luminance and hue shifts in the displayed images, and the key value represents whether the scene is subjectively light, normal, or dark. The resulting image is then processed with a chromatic adaptation transform that emphasizes human visual perception (HVP). Experimental results show that the proposed method yields better color rendering than the conventional method in both subjective and quantitative quality and in color reproduction.
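For reference, the "key" of a scene is commonly computed as the log-average luminance (as in Reinhard-style tone mapping), which can then drive a light/normal/dark decision. The thresholds below are illustrative assumptions, not the values used in this brief paper.

```python
import numpy as np

def key_of_scene(luminance, eps=1e-6):
    """Log-average luminance, the usual 'key' of a scene in tone mapping."""
    return float(np.exp(np.mean(np.log(luminance + eps))))

def classify_key(key, low=0.18 / 3, high=0.18 * 3):
    """Assumed thresholds: below `low` is dark, above `high` is light, otherwise normal."""
    if key < low:
        return "dark"
    if key > high:
        return "light"
    return "normal"

lum = np.random.rand(480, 640)        # stand-in HDR luminance channel
k = key_of_scene(lum)
print(k, classify_key(k))
```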

  • Scene Character Recognition Using Coupled Spatial Learning

    Zhong ZHANG  Hong WANG  Shuang LIU  Liang ZHENG  

     
    LETTER-Image Recognition, Computer Vision

Publicized: 2017/04/17 | Vol: E100-D No:7 | Page(s): 1546-1549

Feature representation, a key component of scene character recognition, has been widely studied, and a number of effective methods have been proposed. In this letter, we propose a novel method named coupled spatial learning (CSL) for scene character representation. Unlike existing methods, the proposed CSL method discovers the spatial context in both the dictionary learning and coding stages. Concretely, we build a spatial dictionary by preserving the corresponding positions of the codewords, and introduce a spatial coding strategy that uses a spatial regularization to model the relationships among features in Euclidean space. Based on the spatial dictionary and spatial coding, the spatial context can be effectively integrated into the visual representations. We verify our method on two widely used databases (ICDAR2003 and Chars74k), and the experimental results demonstrate that our method achieves competitive results compared with the state-of-the-art methods. In addition, we further validate the proposed CSL method on the Caltech-101 database for an image classification task, and the results show the good generalization ability of the proposed CSL.

  • Acoustic Scene Analysis Based on Hierarchical Generative Model of Acoustic Event Sequence

    Keisuke IMOTO  Suehiro SHIMAUCHI  

     
    PAPER-Acoustic event detection

Publicized: 2016/07/19 | Vol: E99-D No:10 | Page(s): 2539-2549

    We propose a novel method for estimating acoustic scenes such as user activities, e.g., “cooking,” “vacuuming,” “watching TV,” or situations, e.g., “being on the bus,” “being in a park,” “meeting,” utilizing the information of acoustic events. There are some methods for estimating acoustic scenes that associate a combination of acoustic events with an acoustic scene. However, the existing methods cannot adequately express acoustic scenes, e.g., “cooking,” that have more than one subordinate category, e.g., “frying ingredients” or “plating food,” because they directly associate acoustic events with acoustic scenes. In this paper, we propose an acoustic scene estimation method based on a hierarchical probabilistic generative model of an acoustic event sequence taking into account the relation among acoustic scenes, their subordinate categories, and acoustic event sequences. In the proposed model, each acoustic scene is represented as a probability distribution over their unsupervised subordinate categories, called “acoustic sub-topics,” and each acoustic sub-topic is represented as a probability distribution over acoustic events. Acoustic scene estimation experiments with real-life sounds showed that the proposed method could correctly extract subordinate categories of acoustic scenes.
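A minimal generative sampling sketch of the two-level structure described above (scene over sub-topics, sub-topic over events), analogous to topic models. The scene, sub-topic, and event names and all probabilities are invented placeholders for illustration; the paper's inference procedure is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
events = ["water_running", "chopping", "sizzling", "dish_clatter"]

# Assumed parameters: scene "cooking" mixes two latent sub-topics
# (e.g. "frying ingredients", "plating food"); each row sums to 1.
scene_over_subtopics = np.array([0.6, 0.4])
subtopic_over_events = np.array([
    [0.1, 0.4, 0.5, 0.0],    # sub-topic 0: frying ingredients
    [0.3, 0.0, 0.0, 0.7],    # sub-topic 1: plating food
])

def generate_event_sequence(length=10):
    """Sample an acoustic event sequence from the two-level generative model."""
    seq = []
    for _ in range(length):
        z = rng.choice(2, p=scene_over_subtopics)                # pick a sub-topic
        e = rng.choice(len(events), p=subtopic_over_events[z])   # pick an event
        seq.append(events[e])
    return seq

print(generate_event_sequence())
```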

  • LLC Revisit: Scene Classification with k-Farthest Neighbours

    Katsuyuki TANAKA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Image Recognition, Computer Vision

Publicized: 2016/02/08 | Vol: E99-D No:5 | Page(s): 1375-1383

This paper introduces a simple but effective way to boost scene classification performance through a novel approach to the LLC coding process. In our proposed method, a local descriptor is encoded not only with the k nearest visual words but also with the k farthest visual words to produce a more discriminative code. Since the proposed method is a simple modification of the image classification model, it can easily be integrated into existing BoF models with various coding and pooling schemes to boost their scene classification performance. Experiments on three scene datasets (15-Scenes, MIT-Indoor67, and Sun367) show that adding k-farthest visual words enhances scene classification performance more than increasing the number of nearest visual words.
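A compact numpy sketch of approximated LLC coding (Wang et al., CVPR 2010) extended, as this letter suggests, to also encode over the k farthest visual words; the codebook, descriptor, k, and regularization are toy assumptions, and the two codes are simply concatenated here.

```python
import numpy as np

def llc_code(x, codebook, k=5, beta=1e-4, farthest=False):
    """Approximated LLC coding over either the k nearest or the k farthest visual words."""
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[-k:] if farthest else np.argsort(dists)[:k]
    z = codebook[idx] - x                         # shift selected words to the descriptor
    C = z @ z.T + beta * np.eye(k)                # local covariance, regularized
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                  # sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = w
    return code

rng = np.random.default_rng(0)
B = rng.standard_normal((128, 64))                # 128 visual words, 64-D descriptors
x = rng.standard_normal(64)
code = np.concatenate([llc_code(x, B, farthest=False),
                       llc_code(x, B, farthest=True)])   # nearest + farthest code
```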

  • Nonlinear Regression of Saliency Guided Proposals for Unsupervised Segmentation of Dynamic Scenes

    Yinhui ZHANG  Mohamed ABDEL-MOTTALEB  Zifen HE  

     
    PAPER-Image Processing and Video Processing

Publicized: 2015/11/06 | Vol: E99-D No:2 | Page(s): 467-474

This paper proposes an efficient video object segmentation approach that is tolerant to complex scene dynamics. Unlike existing approaches that estimate object-like proposals on an intra-frame basis, the proposed approach employs a temporally consistent foreground hypothesis obtained by nonlinear regression of saliency-guided proposals across a video sequence. For this purpose, we first generate salient foreground proposals at the superpixel level by leveraging a saliency signature in the discrete cosine transform domain. We then use a random-forest-based nonlinear regression scheme to learn both appearance and shape features from salient foreground regions in all frames of a sequence. These features are used to rank every foreground proposal of the sequence, and we show that regions with high ranking scores correlate well with semantic foreground objects in dynamic scenes. Subsequently, we utilize a Markov Random Field to integrate both the appearance and motion coherence of the top-ranked object proposals. A temporal nonlinear regressor for generating salient object support regions significantly improves segmentation performance compared with using only per-frame objectness cues. Extensive experiments on challenging real-world video sequences validate the feasibility and superiority of the proposed approach for dynamic scene segmentation.
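For reference, the DCT-domain saliency signature (the sign of the DCT coefficients, inverse-transformed, squared, and smoothed) can be sketched as below; this is the generic image-signature saliency computed frame-wise, while the paper applies the signature at the superpixel-proposal level. The smoothing sigma is an assumption.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter

def saliency_signature(gray):
    """Image-signature saliency map: keep only the sign of the DCT coefficients."""
    sig = np.sign(dctn(gray, norm="ortho"))        # saliency signature in the DCT domain
    recon = idctn(sig, norm="ortho")               # back to the spatial domain
    sal = gaussian_filter(recon ** 2, sigma=3)     # squared and smoothed
    return sal / (sal.max() + 1e-12)

gray = np.random.rand(90, 120)                     # stand-in grayscale video frame
sal_map = saliency_signature(gray)
```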

Showing 1-20 of 66 hits