The search functionality is under construction.

Keyword Search Result

[Keyword] SCE(344hit)

1-20hit(344hit)

  • Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation Open Access

    KuanChao CHU  Satoshi YAMAZAKI  Hideki NAKAYAMA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2024/04/30
      Vol:
    E107-D No:9
      Page(s):
    1239-1252

    This work focuses on training dataset enhancement of informative relational triplets for Scene Graph Generation (SGG). Due to the lack of effective supervision, current SGG models perform poorly on informative relational triplets with inadequate training samples. Therefore, we propose two novel training dataset enhancement modules: Feature Space Triplet Augmentation (FSTA) and Soft Transfer. FSTA leverages a feature generator trained to generate representations of an object in relational triplets. The biased-prediction-based sampling in FSTA efficiently augments artificial triplets, focusing on the challenging ones. In addition, we introduce Soft Transfer, which assigns soft predicate labels to general relational triplets to provide more effective supervision for informative predicate classes. Experimental results show that integrating FSTA and Soft Transfer achieves high levels of both Recall and mean Recall on the Visual Genome dataset. The mean of Recall and mean Recall is the highest among all the existing model-agnostic methods.

  • Dual-Path Convolutional Neural Network Based on Band Interaction Block for Acoustic Scene Classification Open Access

    Pengxu JIANG  Yang YANG  Yue XIE  Cairong ZOU  Qingyun WANG  

     
    LETTER-Engineering Acoustics

      Publicized:
    2023/10/04
      Vol:
    E107-A No:7
      Page(s):
    1040-1044

    Convolutional neural networks (CNNs) are widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is utilized to gather time-frequency information between spectrum nodes. It is challenging to adequately express the non-local link between frequency domains in a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on band interaction block (DCNN-bi) for ASC, with the mel-spectrogram as the model’s input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature. Additionally, we have created three band interaction blocks (bi-blocks), connected between the two paths, to explore the pertinent nodes between various frequency bands. Combining the time-frequency information from the two paths, the bi-blocks with three distinct designs acquire non-local information and send it back to the respective paths. The experimental results indicate that the bi-block has the potential to substantially improve the initial performance of the CNN. Specifically, when applied to the DCASE 2018 and DCASE 2020 datasets, the CNN exhibited performance improvements of 1.79% and 3.06%, respectively.

  • VTD-FCENet: A Real-Time HD Video Text Detection with Scale-Aware Fourier Contour Embedding Open Access

    Wocheng XIAO  Lingyu LIANG  Jianyong CHEN  Tao WANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2023/12/07
      Vol:
    E107-D No:4
      Page(s):
    574-578

    Video text detection (VTD) aims to localize text instances in videos and has wide applications in downstream tasks. To deal with the variance of different scenes and text instances, existing VTD methods typically integrate multiple models and feature fusion strategies. A VTD method consisting of sophisticated components can improve detection accuracy, but may be too slow for real-time applications. This paper aims to achieve real-time VTD with an adaptive lightweight end-to-end framework. Unlike previous methods that represent text in the spatial domain, we model text instances in the Fourier domain. Specifically, we propose a scale-aware Fourier Contour Embedding (FCE) method, which not only models arbitrarily shaped text contours in videos as compact signatures, but also adaptively selects proper scales for features in a backbone in the training stage. Then, we construct VTD-FCENet to achieve real-time VTD, which encodes temporal correlations of adjacent frames with scale-aware FCE in a lightweight and adaptive manner. Quantitative evaluations were conducted on the ICDAR2013 Video, Minetto and YVT benchmark datasets, and the results show that our VTD-FCENet not only obtains state-of-the-art or competitive detection accuracy, but also allows real-time text detection on HD videos.

  • Precoder Optimization Using Data Correlation for Wireless Data Aggregation

    Ayano NAKAI-KASAI  Naoyuki HAYASHI  Tadashi WADAYAMA  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E107-B No:3
      Page(s):
    330-338

    In this paper, we consider precoder design for wireless data aggregation in sensor networks. The precoder optimization problem can be formulated as minimization of the mean squared error under transmit power and block diagonal constraints. We include the statistical correlation of data in the optimization problem; such correlation appears in typical applications but is ignored in conventional design methods. We propose precoder optimization algorithms based on projected gradient descent with projection onto the constraint sets. The proposed method can achieve better performance than the conventional methods that do not incorporate data correlation, especially when data are highly correlated. We also extend the proposed approach to the context of over-the-air computation.

  • Development and Photoluminescence Properties of Dinuclear Eu(III)-β-Diketonates with a Branched Tetraphosphine Tetraoxide Ligand for Potential Use in LEDs as Red Phosphors Open Access

    Hiroki IWANAGA  Fumihiko AIGA  Shin-ichi SASAOKA  Takahiro WAZAKI  

     
    INVITED PAPER

      Publicized:
    2023/08/03
      Vol:
    E107-C No:2
      Page(s):
    34-41

    In the field of micro-LED displays consisting of UV or blue LED arrays and phosphors, where the chips used are very small, the particle size of the phosphors must be small to suppress variation in hue between pixels. In particular, there is a strong demand for a red phosphor with a small particle size. However, the quantum yields of inorganic phosphors decrease as the particle size gets smaller. On the other hand, in the case of organic phosphors and complexes, quantum yields do not decrease when the particle size gets smaller because each molecule absorbs and emits light on its own. We focus on Eu(III) complexes as candidate red phosphors for micro-LED displays because the color purity of their photoluminescence spectra is high, and we have tried to enhance their photoluminescence intensity by coordinating non-ionic ligands, specifically newly designed phosphine oxide ligands. Non-ionic ligands generally have less influence on the properties of complexes than ionic ligands, but offer a high degree of flexibility in molecular design. We found a novel molecular design concept for phosphine oxide ligands that enhances the photoluminescence properties of Eu(III) complexes. In this work, novel dinuclear Eu(III)-β-diketonates with a branched tetraphosphine tetraoxide ligand, TDPBPO or TDPPPO, were developed. The ligands are designed to have two different phosphine oxide portions: one has aromatic substituents and the other has no aromatic substituent. The TDPBPO and TDPPPO ligands increase the absolute quantum yields of Eu(III)-β-diketonates. Eu(III)-β-diketonates with branched tetraphosphine tetraoxide ligands have sharp red emissions and excellent quantum yields, and are promising candidates for micro-LED displays, security media, and sensing owing to their pure and strong photoluminescence.

  • Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology

    Wenkai LIU  Lin ZHANG  Menglong WU  Xichang CAI  Hongxia DONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    83-92

    The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.

  • Gradient Descent Direction Random Walk MIMO Detection Using Intermediate Search Point

    Naoki ITO  Yukitoshi SANADA  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2023/07/24
      Vol:
    E106-B No:11
      Page(s):
    1192-1199

    In this paper, multi-input multi-output (MIMO) signal detection with random walk along a gradient descent direction using an intermediate search point is presented. As a low-complexity MIMO signal detection scheme, a gradient descent algorithm with Metropolis-Hastings (MH) methods has been proposed. Random walk along a gradient descent direction speeds up the MH-based search using the gradient of a least-squares cost function. However, the gradient vector may be discarded through QAM constellation quantization in some cases. For further performance improvement, this paper proposes an improved search scheme in which the gradient vector is stored for the next search iteration to generate an intermediate search point. The performance of the proposed scheme improves with higher-order modulation symbols as compared with that of a conventional scheme. Numerical results obtained through computer simulation show that the bit error rate (BER) performance improves by 5 dB at a BER of 10^-3 for 64QAM symbols in a 16×16 MIMO system.

  • On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/06/12
      Vol:
    E106-D No:9
      Page(s):
    1537-1545

    In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into inputs. We analyze the situation where noisy copies are newly generated and injected into inputs at each epoch, i.e., the case of using on-line noisy copies. Therefore, this article can also be viewed as an analysis of a method that injects noise into the training process through DA. We considered the training process under three training situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to ℓ2 regularization training, for which the variance of the injected noise is important whereas the number of copies is not. Moreover, we showed that DA with on-line copies apparently leads to an increase of the learning rate in the full-batch condition under the sum of squared errors and the mini-batch condition under the mean squared error. The apparent increase in learning rate and the regularization effect can be attributed to the original input and the additive noise in noisy copies, respectively. These results are confirmed in a numerical experiment, in which we found that our result can be applied to usual off-line DA in an under-parameterization scenario but not in an over-parameterization scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis of linear regression can be qualitatively applied to neural networks.
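The claimed link between on-line noisy copies and ℓ2 regularization can be illustrated with a minimal sketch. This is not the authors' experiment: it assumes a scalar linear model, a single noisy copy that replaces the inputs at each epoch, full-batch training under the mean squared error, and hypothetical values for `n`, `sigma`, `lr`, and `epochs`. Under these assumptions the expected gradient matches that of a ridge objective with penalty `sigma**2 * w**2`, so the time-averaged weight should sit near the ridge solution rather than the least-squares one.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical toy setup (all values are assumptions, not from the paper).
n, sigma, lr, epochs = 50, 0.5, 0.05, 4000
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]  # clean inputs on a grid
ys = [2.0 * x for x in xs]                          # noiseless targets, true slope 2

w, tail = 0.0, []
for epoch in range(epochs):
    # On-line noisy copy: fresh Gaussian noise is injected into the inputs each epoch.
    noisy = [x + random.gauss(0.0, sigma) for x in xs]
    # Full-batch gradient of the mean squared error on the noisy copy.
    grad = (2.0 / n) * sum((w * xt - y) * xt for xt, y in zip(noisy, ys))
    w -= lr * grad
    if epoch >= epochs // 2:
        tail.append(w)  # keep the second half to average out fluctuations

w_da = sum(tail) / len(tail)

sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
w_ridge = sxy / (sxx + n * sigma ** 2)  # minimizer of MSE + sigma^2 * w^2
w_ols = sxy / sxx                       # unregularized least squares

print(w_da, w_ridge, w_ols)
```

In this sketch the strength of the shrinkage is set by the noise variance `sigma**2`, which is consistent with the abstract's statement that the variance of the injected noise matters.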

  • An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification

    Pengxu JIANG  Yue XIE  Cairong ZOU  Li ZHAO  Qingyun WANG  

     
    LETTER-Engineering Acoustics

      Publicized:
    2023/02/06
      Vol:
    E106-A No:8
      Page(s):
    1057-1061

    In human-computer interaction, acoustic scene classification (ASC) is one of the relevant research domains. In real life, the recorded audio may include a lot of noise and quiet clips, making it hard for earlier ASC-based research to isolate the crucial scene information in sound. Furthermore, scene information may be scattered across numerous audio frames; hence, selecting scene-related frames is crucial for ASC. In this context, an integrated convolutional neural network with a fusion attention mechanism (ICNN-FA) is proposed for ASC. Firstly, segmented mel-spectrograms as the input of ICNN can assist the model in learning the short-term time-frequency correlation information. Then, the designed ICNN model is employed to learn these segment-level features. In addition, the proposed global attention layer may gather global information by integrating these segment features. Finally, the developed fusion attention layer is utilized to fuse all segment-level features while the classifier classifies various situations. Experimental findings using ASC datasets from DCASE 2018 and 2019 indicate the efficacy of the suggested method.

  • A State-Space Approach and Its Estimation Bias Analysis for Adaptive Notch Digital Filters with Constrained Poles and Zeros

    Yoichi HINAMOTO  Shotaro NISHIMURA  

     
    PAPER-Digital Signal Processing

      Publicized:
    2022/09/16
      Vol:
    E106-A No:3
      Page(s):
    582-589

    This paper deals with a state-space approach for adaptive second-order IIR notch digital filters with constrained poles and zeros. A simplified iterative algorithm is derived from the gradient-descent method to minimize the mean-squared output of an adaptive notch digital filter. Then, stability and parameter-estimation bias are analyzed for the simplified iterative algorithm. A numerical example is presented to demonstrate the validity and effectiveness of the proposed adaptive state-space notch digital filter and parameter-estimation bias analysis.

  • Spatial-Temporal Aggregated Shuffle Attention for Video Instance Segmentation of Traffic Scene

    Chongren ZHAO  Yinhui ZHANG  Zifen HE  Yunnan DENG  Ying HUANG  Guangchen CHEN  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2022/11/24
      Vol:
    E106-D No:2
      Page(s):
    240-251

    To address the dispersion and dislocation of spatial focus regions in feature pyramid networks and the insufficient acquisition of feature dependencies in both the spatial and channel dimensions, this paper proposes a spatial-temporal aggregated shuffle attention for video instance segmentation (STASA-VIS). First, a mixed subsampling (MS) module is designed to embed activating features from the low-level target area of the feature pyramid into the high level, so as to aggregate spatial information on the target area. Taking advantage of the coherent information in video frames, STASA-VIS uses the first of every 5 video frames as the keyframe, propagates the keyframe feature maps of the pyramid layers forward in the time domain, and fuses them with the non-keyframe mixed subsampled features to achieve time-domain consistent feature aggregation. Finally, STASA-VIS embeds shuffle attention in the backbone to capture the pixel-level pairwise relationship and dimensional dependencies among the channels and to reduce the computation. Experimental results show that the segmentation accuracy of STASA-VIS reaches 41.2% and the test speed reaches 34 FPS, which is better than state-of-the-art one-stage video instance segmentation (VIS) methods in accuracy and achieves real-time segmentation.

  • Convergence Acceleration via Chebyshev Step: Plausible Interpretation of Deep-Unfolded Gradient Descent

    Satoshi TAKABE  Tadashi WADAYAMA  

     
    PAPER-Numerical Analysis and Optimization

      Publicized:
    2022/01/25
      Vol:
    E105-A No:8
      Page(s):
    1110-1120

    Deep unfolding is a promising deep-learning technique, whose network architecture is based on expanding the recursive structure of existing iterative algorithms. Although deep unfolding realizes convergence acceleration, its theoretical aspects have not been revealed yet. This study details the theoretical analysis of the convergence acceleration in deep-unfolded gradient descent (DUGD) whose trainable parameters are step sizes. We propose a plausible interpretation of the learned step-size parameters in DUGD by introducing the principle of Chebyshev steps derived from Chebyshev polynomials. The use of Chebyshev steps in gradient descent (GD) enables us to bound the spectral radius of a matrix governing the convergence speed of GD, leading to a tight upper bound on the convergence rate. Numerical results show that Chebyshev steps numerically explain the learned step-size parameters in DUGD well.
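As a rough illustration of why Chebyshev steps can accelerate gradient descent, the sketch below compares them with the best constant step on a toy diagonal quadratic. It assumes the standard Chebyshev step sequence, whose reciprocals are the Chebyshev nodes on the interval between the smallest and largest Hessian eigenvalues; the eigenvalues, the horizon `T`, and the function names are hypothetical choices for illustration, and no learned (deep-unfolded) step sizes are involved.

```python
import math

def chebyshev_steps(lam_min, lam_max, T):
    """Step sizes whose reciprocals are the T Chebyshev nodes on [lam_min, lam_max]."""
    mid = (lam_max + lam_min) / 2.0
    half = (lam_max - lam_min) / 2.0
    return [1.0 / (mid + half * math.cos((2 * t + 1) * math.pi / (2 * T)))
            for t in range(T)]

def run_gd(eigs, steps, e0):
    """Gradient descent on f(x) = 0.5 * sum(eig_i * x_i**2).

    The minimizer is 0, so the iterate itself is the error vector.
    Returns the Euclidean norm of the error after all steps.
    """
    x = list(e0)
    for g in steps:
        x = [(1.0 - g * lam) * xi for lam, xi in zip(eigs, x)]
    return math.sqrt(sum(xi * xi for xi in x))

# Hypothetical toy problem: Hessian eigenvalues between 1 and 100.
eigs = [1.0, 10.0, 40.0, 70.0, 100.0]
T = 8
e0 = [1.0] * len(eigs)

cheb_err = run_gd(eigs, chebyshev_steps(1.0, 100.0, T), e0)
# Best constant step for a quadratic: 2 / (lam_min + lam_max).
const_err = run_gd(eigs, [2.0 / (1.0 + 100.0)] * T, e0)

print(cheb_err, const_err)
```

After the full block of `T` steps the Chebyshev schedule leaves a smaller error than the constant step on this problem, although individual Chebyshev steps may temporarily increase the error along some eigen-directions.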

  • On the Asymptotic Evaluation of the Physical Optics Approximation for Plane Wave Scattering by Circular Conducting Cylinders

    Ngoc Quang TA  Hiroshi SHIRAI  

     
    PAPER

      Publicized:
    2021/10/18
      Vol:
    E105-C No:4
      Page(s):
    128-136

    In this paper, the scattering far-field from a circular electric conducting cylinder has been analyzed by physical optics (PO) approximation for both H and E polarizations. The evaluation of radiation integrations due to the PO current is conducted numerically and analytically. While non-uniform and uniform asymptotic solutions have been derived by the saddle point method, a separate approximation has been made for forward scattering direction. Comparisons among our approximation, direct numerical integration and exact solution results yield a good agreement for electrically large cylinders.

  • A 6.5Gb/s Shared Bus Using Electromagnetic Connectors for Downsizing and Lightening Satellite Processor System

    Atsutake KOSUGE  Mototsugu HAMADA  Tadahiro KURODA  

     
    PAPER

      Publicized:
    2021/09/03
      Vol:
    E105-A No:3
      Page(s):
    478-486

    A 6.5Gb/s shared bus that uses a 65nm CMOS pulse transceiver chip with a low frequency equalizer and electromagnetic connectors based on two types of transmission line couplers is presented. The amount of backplane wiring is reduced by a factor of 1/16 and total connector volume by a factor of 1/246. It reduces the size and weight of a satellite processor system by 60%, increases the data rate by a factor of 2.6, and satisfies the EMC standard for withstanding the strong shock of rocket launch.

  • Multi-Agent Distributed Route Selection under Consideration of Time Dependency among Agents' Road Usage for Vehicular Networks

    Takanori HARA  Masahiro SASABE  Shoji KASAHARA  

     
    PAPER

      Publicized:
    2021/08/05
      Vol:
    E105-B No:2
      Page(s):
    140-150

    Traffic congestion in road networks has been studied as the congestion game in game theory. In the existing work, the road usage by each agent was assumed to be static during the whole time horizon of the agent's travel, as in the classical congestion game. This assumption, however, should be reconsidered because each agent sequentially uses roads composing the route. In this paper, we propose a multi-agent distributed route selection scheme based on a gradient descent method considering the time-dependency among agents' road usage for vehicular networks. The proposed scheme first estimates the time-dependent flow on each road by considering the agents' probabilistic occupation under the first-in-first-out (FIFO) policy. Then, it calculates the optimal route choice probability of each route candidate using the gradient descent method and the estimated time-dependent flow. Each agent finally selects one route according to the optimal route choice probabilities. We first prove that the proposed scheme can exponentially converge to the steady-state at the convergence rate inversely proportional to the product of the number of agents and that of individual route candidates. Through simulations under a grid-like network and a real road network, we show that the proposed scheme can improve the actual travel time by 5.1% and 2.5% compared with the conventional static-flow based approach, respectively. In addition, we demonstrate that the proposed scheme is robust against incomplete information sharing among agents, which would be caused by its low penetration ratio or limited transmission range of wireless communications.

  • Synthetic Scene Character Generator and Ensemble Scheme with the Random Image Feature Method for Japanese and Chinese Scene Character Recognition

    Fuma HORIE  Hideaki GOTO  Takuo SUGANUMA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/08/24
      Vol:
    E104-D No:11
      Page(s):
    2002-2010

    Scene character recognition has been intensively investigated for a couple of decades because it has a great potential in many applications including automatic translation, signboard recognition, and reading assistance for the visually-impaired. However, scene characters are difficult to recognize at sufficient accuracy owing to various noise and image distortions. In addition, Japanese scene character recognition is more challenging and requires a large amount of character data for training because thousands of character classes exist in the language. Some researchers proposed training data augmentation techniques using Synthetic Scene Character Data (SSCD) to compensate for the shortage of training data. In this paper, we propose a Random Filter which is a new method for SSCD generation, and introduce an ensemble scheme with the Random Image Feature (RI-Feature) method. Since there has not been a large Japanese scene character dataset for the evaluation of the recognition systems, we have developed an open dataset JPSC1400, which consists of a large number of real Japanese scene characters. It is shown that the accuracy has been improved from 70.9% to 83.1% by introducing the RI-Feature method to the ensemble scheme.

  • Adaptive Normal State-Space Notch Digital Filters: Algorithm and Frequency-Estimation Bias Analysis

    Yoichi HINAMOTO  Shotaro NISHIMURA  

     
    PAPER-Digital Signal Processing

      Publicized:
    2021/05/17
      Vol:
    E104-A No:11
      Page(s):
    1585-1592

    This paper investigates an adaptive notch digital filter that employs normal state-space realization of a single-frequency second-order IIR notch digital filter. An adaptive algorithm is developed to minimize the mean-squared output error of the filter iteratively. This algorithm is based on a simplified form of the gradient-descent method. Stability and frequency-estimation bias are analyzed for the adaptive iterative algorithm. Finally, a numerical example is presented to demonstrate the validity and effectiveness of the proposed adaptive notch digital filter and the frequency-estimation bias analysis.

  • A High-Speed PWM-Modulated Transceiver Network for Closed-Loop Channel Topology

    Kyongsu LEE  Jae-Yoon SIM  

     
    BRIEF PAPER

      Publicized:
    2020/12/18
      Vol:
    E104-C No:7
      Page(s):
    350-354

    This paper proposes a pulse-width modulated (PWM) signaling scheme to send clock and data over a pair of channels for an in-vehicle network where a closed chain of point-to-point (P2P) interconnections between electronic control units (ECUs) has been established. To improve the detection speed and margin of the proposed receiver, we also propose a novel clock and data recovery (CDR) scheme with a 0.5 unit-interval (UI) tuning range and a PWM generator utilizing 10 equally spaced phases. The feasibility of the proposed system has been demonstrated by successfully detecting 1.25 Gb/s data delivered via 3 ECUs and inter-channels in 180 nm CMOS technology. Compared to a previous study, the proposed system achieves better efficiency in terms of power, cost, and reliability.

  • Distributed UAVs Placement Optimization for Cooperative Communication

    Zhaoyang HOU  Zheng XIANG  Peng REN  Qiang HE  Ling ZHENG  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2020/12/08
      Vol:
    E104-B No:6
      Page(s):
    675-685

    In this paper, the distributed cooperative communication of unmanned aerial vehicles (UAVs) is studied, where the condition number (CN) and the inner product (InP) are used to measure the quality of communication links. By optimizing the relative position of UAVs, large channel capacity and stable communication links can be obtained. Using the spherical wave model under the line of sight (LOS) channel, the CN expression of the channel matrix is derived when there are Nt transmitters and two receivers in the system. In order to maximize channel capacity, we derive the UAVs position constraint equation (UAVs-PCE), and the constraint between BS element distance and carrier wavelength is analyzed. The result shows there is an area where, no matter how the UAVs' positions are adjusted, the CN is still very large. Then a special scenario is considered where UAVs form a rectangular lattice array, and the optimal constraint between communication distance and UAV distance is derived. After that, we derive the InP of the channel matrix and the gradient expression of the InP with respect to the UAVs' positions. The particle swarm optimization (PSO) algorithm is used to minimize the CN and the gradient descent (GD) algorithm is used to minimize the InP by optimizing the UAVs' positions iteratively. Both algorithms show great potential for optimizing the CN and InP, respectively. Furthermore, a hybrid algorithm named PSO-GD, combining the advantages of the two algorithms, is proposed to maximize the communication capacity with lower complexity. Simulations show that PSO-GD is more efficient than PSO and GD. PSO helps GD to break away from local extrema and provides better positions for GD, and GD can converge to an optimal solution quickly by using the gradient information based on the better positions. Simulations also reveal that a better channel can be obtained when those parameters satisfy the UAVs-PCE, and the theoretical analysis also explains the abnormal phenomena in simulations.

  • Light-YOLOv3: License Plate Detection in Multi-Vehicle Scenario

    Yuchao SUN  Qiao PENG  Dengyin ZHANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2021/02/22
      Vol:
    E104-D No:5
      Page(s):
    723-728

    With the development of the Internet of Vehicles, license plate detection technology is widely used, e.g., in smart cities and edge sensor monitoring. However, traditional license plate detection methods are based on license plate edge detection and are suitable only for limited situations, such as ample light and a favorable camera angle. Fortunately, deep learning networks represented by YOLOv3 can solve the problem, relying on strict conditions. Although YOLOv3 detects large targets well, it performs poorly on small targets and lacks real-time interactivity. Motivated by this, we present a faster and lightweight YOLOv3 model for multi-vehicle or under-illuminated image scenarios. Generally, our model can serve as a guideline for optimizing neural networks in multi-vehicle scenarios.
