Keyword Search Result

[Keyword] ATI(18690hit)

21-40hit(18690hit)

  • A CNN-Based Feature Pyramid Segmentation Strategy for Acoustic Scene Classification Open Access

    Ji XI  Yue XIE  Pengxu JIANG  Wei JIANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2024/03/26
      Vol:
    E107-D No:8
      Page(s):
    1093-1096

    Currently, a significant portion of acoustic scene classification (ASC) research is centered on Convolutional Neural Network (CNN) models. This preference is primarily due to the CNN’s ability to effectively extract time-frequency information from audio recordings of scenes by employing spectrum data as input. The expression of many dimensions can be achieved by utilizing 2D spectrum characteristics. Nevertheless, because spectrum properties differ from picture qualities, the same object can be interpreted differently depending on its position on the spectrum map. The lack of distinction between different aspects of input information in ASC-based CNN networks may result in a decline in system performance. Considering this, a feature pyramid segmentation (FPS) approach based on CNN is proposed. The proposed approach uses spectrum features as the input for the model. These features are split based on a preset scale, and each segment-level feature is then fed into the CNN network for learning. The high-level features output at all feature scales are fused and fed to a SoftMax classifier to categorize different scenes. The experiment provides evidence to support the efficacy of the FPS strategy and its potential to enhance the performance of the ASC system.
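
    The splitting step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say along which axis the features are split or which scales are used, so the time-axis split, the equal-length segments, and the scales (1, 2, 4) are all hypothetical, as are the function names. The CNN and the fusion step are not shown.

```python
from typing import Dict, List, Sequence

# Time-major spectrogram: spec[t][f] is the energy of frequency bin f at frame t.
Spectrogram = List[List[float]]


def split_into_segments(spec: Spectrogram, n_segments: int) -> List[Spectrogram]:
    """Split a spectrogram into n_segments contiguous chunks along the time axis."""
    n_frames = len(spec)
    if n_segments < 1 or n_segments > n_frames:
        raise ValueError("n_segments must be between 1 and the number of frames")
    # Integer boundaries spread as evenly as possible over the frames.
    bounds = [i * n_frames // n_segments for i in range(n_segments + 1)]
    return [spec[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def feature_pyramid(spec: Spectrogram,
                    scales: Sequence[int] = (1, 2, 4)) -> Dict[int, List[Spectrogram]]:
    """Build one pyramid level per scale (assumed scales; not from the abstract)."""
    return {scale: split_into_segments(spec, scale) for scale in scales}


# Toy input: 8 frames, 3 frequency bins, each frame filled with its own index.
toy_spec = [[float(t)] * 3 for t in range(8)]
pyramid = feature_pyramid(toy_spec)
```

    In a full system, each segment in `pyramid` would be passed to a CNN branch and the resulting features fused before classification.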

  • Tracking WebVR User Activities through Hand Motions: An Attack Perspective Open Access

    Jiyeon LEE  

     
    LETTER-Human-computer Interaction

      Publicized:
    2024/04/16
      Vol:
    E107-D No:8
      Page(s):
    1089-1092

    With the rapid advancement of graphics processing units (GPUs), Virtual Reality (VR) experiences have significantly improved, enhancing immersion and realism. However, these advancements also raise security concerns in VR. In this paper, I introduce a new attack leveraging known WebVR vulnerabilities to track the activities of VR users. The proposed attack leverages the user’s hand motion information exposed to web attackers, demonstrating the capability to identify consumed content, such as 3D images and videos, and pilfer private drawings created in a 3D drawing app. To achieve this, I employed a machine learning approach to process controller sensor data and devised techniques to extract sensitive activities during the use of target apps. The experimental results demonstrate that the viewed content in the targeted content viewer can be identified with 90% accuracy. Furthermore, I successfully obtained drawing outlines that precisely match the user’s original drawings without performance degradation, validating the effectiveness of the attack.

  • MDX-Mixer: Music Demixing by Leveraging Source Signals Separated by Existing Demixing Models Open Access

    Tomoyasu NAKANO  Masataka GOTO  

     
    PAPER-Music Information Processing

      Publicized:
    2024/04/05
      Vol:
    E107-D No:8
      Page(s):
    1079-1088

    This paper presents MDX-Mixer, which improves music demixing (MDX) performance by leveraging source signals separated by multiple existing MDX models. Deep-learning-based MDX models have improved their separation performances year by year for four kinds of sound sources: “vocals,” “drums,” “bass,” and “other”. Our research question is whether mixing (i.e., weighted sum) the signals separated by state-of-the-art MDX models can obtain either the best of everything or higher separation performance. Previously, in singing voice separation and MDX, there have been studies in which separated signals of the same sound source are mixed with each other using time-invariant or time-varying positive mixing weights. In contrast to those, this study is novel in that it allows for negative weights as well and performs time-varying mixing using all of the separated source signals and the music acoustic signal before separation. The time-varying weights are estimated by modeling the music acoustic signals and their separated signals by dividing them into short segments. In this paper we propose two new systems: one that estimates time-invariant weights using 1×1 convolution, and one that estimates time-varying weights by applying the MLP-Mixer layer proposed in the computer vision field to each segment. The latter model is called MDX-Mixer. Their performances were evaluated based on the source-to-distortion ratio (SDR) using the well-known MUSDB18-HQ dataset. The results show that the MDX-Mixer achieved higher SDR than the separated signals given by three state-of-the-art MDX models.
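
    The idea of mixing separated signals with time-varying, possibly negative weights can be illustrated with a minimal sketch. It assumes the weights are already given, one weight vector per short segment; how MDX-Mixer estimates them (the MLP-Mixer layer) is not reproduced here. The function name, the segment length, and the toy signals are hypothetical.

```python
from typing import List, Sequence


def mix_segments(signals: Sequence[Sequence[float]],
                 segment_weights: Sequence[Sequence[float]],
                 segment_len: int) -> List[float]:
    """Weighted sum of input signals with one weight vector per time segment.

    signals[k][t] is sample t of input k (separated sources and/or the
    original mixture). Weights may be negative.
    """
    n_samples = len(signals[0])
    mixed = [0.0] * n_samples
    for seg_idx, weights in enumerate(segment_weights):
        start = seg_idx * segment_len
        stop = min(start + segment_len, n_samples)
        for t in range(start, stop):
            mixed[t] = sum(w * sig[t] for w, sig in zip(weights, signals))
    return mixed


# Toy example: estimate "everything except drums" from the mixture and a
# separated drums signal. The negative weight subtracts the drums estimate.
mixture = [1.0, 2.0, 3.0, 4.0]
drums = [0.5, 0.5, 1.0, 1.0]
weights = [[1.0, -1.0],   # segment 0: subtract the drums estimate fully
           [1.0, -0.5]]   # segment 1: subtract only half of it
estimate = mix_segments([mixture, drums], weights, segment_len=2)
```

    Setting every segment to the same weight vector reduces this to the time-invariant case mentioned in the abstract.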

  • FSAMT: Face Shape Adaptive Makeup Transfer Open Access

    Haoran LUO  Tengfei SHAO  Shenglei LI  Reiko HISHIYAMA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2024/04/02
      Vol:
    E107-D No:8
      Page(s):
    1059-1069

    Makeup transfer is the process of applying the makeup style from one picture (reference) to another (source), allowing for the modification of characters’ makeup styles. To meet the diverse makeup needs of individuals or samples, the makeup transfer framework should accurately handle various makeup degrees, ranging from subtle to bold, and exhibit intelligence in adapting to the source makeup. This paper introduces a “3-level” adaptive makeup transfer framework, addressing facial makeup through two sub-tasks: 1. Makeup adaptation, utilizing feature descriptors and eyelid curve algorithms to classify 135 organ-level face shapes; 2. Makeup transfer, achieved by learning the reference picture from three branches (color, highlight, pattern) and applying it to the source picture. The proposed framework, termed “Face Shape Adaptive Makeup Transfer” (FSAMT), demonstrates superior results in makeup transfer output quality, as confirmed by experimental results.

  • Agent Allocation-Action Learning with Dynamic Heterogeneous Graph in Multi-Task Games Open Access

    Xianglong LI  Yuan LI  Jieyuan ZHANG  Xinhai XU  Donghong LIU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2024/04/03
      Vol:
    E107-D No:8
      Page(s):
    1040-1049

    In many real-world problems, a complex task is typically composed of a set of subtasks that follow a certain execution order. Traditional multi-agent reinforcement learning methods perform poorly in such multi-task cases, as they consider the whole problem as one task. For such multi-agent multi-task problems, heterogeneous relationships, i.e., subtask-subtask, agent-agent, and subtask-agent, are important characteristics that should be explored to improve learning performance. This paper proposes a dynamic heterogeneous graph based agent allocation-action learning framework. Specifically, a dynamic heterogeneous graph model is first designed to characterize how the heterogeneous relationships vary over time. Then a multi-subgraph partition method is devised to extract features of heterogeneous graphs. Leveraging the extracted features, a hierarchical framework is designed to learn the dynamic allocation of agents among subtasks, as well as cooperative behaviors. Experimental results demonstrate that our framework outperforms recent representative methods on two challenging tasks, i.e., SAVETHECITY and Google Research Football full game.

  • Confidence-Driven Contrastive Learning for Document Classification without Annotated Data Open Access

    Zhewei XU  Mizuho IWAIHARA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2024/04/19
      Vol:
    E107-D No:8
      Page(s):
    1029-1039

    Data sparsity has always been a problem in document classification, for which semi-supervised learning and few-shot learning are studied. An even more extreme scenario is to classify documents without any annotated data, using only category names. In this paper, we introduce a nearest neighbor search-based method, Con2Class, to tackle this tough task. We intend to produce embeddings for predefined categories and predict category embeddings for all the unlabeled documents in a unified embedding space, such that categories can be easily assigned by searching for the nearest predefined category in the embedding space. To achieve this, we propose confidence-driven contrastive learning, in which prompt-based templates are designed and an MLM-maintained contrastive loss is newly proposed to finetune a pretrained language model for embedding production. To deal with the issue that no annotated data is available to validate the classification model, we introduce a confidence factor to estimate the classification ability by evaluating the prediction confidence. The language model having the highest confidence factor is used to produce embeddings for similarity evaluation. Pseudo labels are then assigned by searching for the semantically closest category name, and these labels are further used to train a separate classifier following a progressive self-training strategy for final prediction. Our experiments on five representative datasets demonstrate the superiority of our proposed method over the existing approaches.
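
    The nearest-category assignment step can be sketched as below. The sketch assumes the embeddings have already been produced by some language model and uses cosine similarity as the closeness measure; the abstract does not name the metric, so that choice, the function names, and the toy vectors are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_pseudo_labels(doc_embeddings: Sequence[Sequence[float]],
                         category_embeddings: Dict[str, Sequence[float]]) -> List[str]:
    """Label each document with the category whose embedding is closest."""
    labels = []
    for doc in doc_embeddings:
        best = max(category_embeddings,
                   key=lambda name: cosine(doc, category_embeddings[name]))
        labels.append(best)
    return labels


# Toy 2-D embeddings (hypothetical values, not from any real model).
categories = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
documents = [[0.9, 0.1], [0.2, 0.8]]
pseudo_labels = assign_pseudo_labels(documents, categories)
```

    In the described pipeline, such pseudo labels would then be used to train a separate classifier; that stage is not shown.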

  • Unveiling Python Version Compatibility Challenges in Code Snippets on Stack Overflow Open Access

    Shiyu YANG  Tetsuya KANDA  Daniel M. GERMAN  Yoshiki HIGO  

     
    PAPER-Software Engineering

      Publicized:
    2024/04/16
      Vol:
    E107-D No:8
      Page(s):
    1007-1015

    Stack Overflow, a leading Q&A platform for developers, is a substantial reservoir of Python code snippets. Nevertheless, the incompatibility issues between Python versions, particularly Python 2 and Python 3, introduce substantial challenges that can potentially jeopardize the utility of these code snippets. This empirical study dives deep into the challenges of Python version inconsistencies on the interpretation and application of Python code snippets on Stack Overflow. Our empirical study exposes the prevalence of Python version compatibility issues on Stack Overflow. It further emphasizes an apparent deficiency in version-specific identification, a critical element that facilitates the identification and utilization of Python code snippets. These challenges, primarily arising from the lack of backward compatibility between Python’s major versions, pose significant hurdles for developers relying on Stack Overflow for code references and learning. This study, therefore, signifies the importance of proactively addressing these compatibility issues in Python code snippets. It advocates for enhanced tools and strategies to assist developers in efficiently navigating through the Python version complexities on platforms like Stack Overflow. By highlighting these concerns and providing a potential remedy, we aim to contribute to a more efficient and effective programming experience on Stack Overflow and similar platforms.

  • Evaluating PAM-4 Data Transmission Quality Using Multi-Dimensional Mapping of Received Symbols Open Access

    Yasushi YUMINAKA  Kazuharu NAKAJIMA  Yosuke IIJIMA  

     
    PAPER

      Publicized:
    2024/04/25
      Vol:
    E107-D No:8
      Page(s):
    985-991

    This study investigates a two/three-dimensional (2D/3D) symbol-mapping technique that evaluates data transmission quality based on a four-level pulse-amplitude modulation (PAM-4) symbol transition. Multi-dimensional symbol transition mapping facilitates the visualization of the degree of inter-symbol interference (ISI). The simulation and experimental results demonstrated that the 2D symbol mapping can evaluate the PAM-4 data transmission quality degraded by ISI and visualize the equalization effect. Furthermore, potential applications of 2D mapping and its extension to 3D mapping were explored.
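
    One plausible reading of a 2D symbol-transition map is a scatter of consecutive received samples, (previous, current). The sketch below builds those points and a 4×4 transition count grid. The abstract does not define the mapping precisely, so the pairing of adjacent samples, the normalized levels (-3, -1, 1, 3), and all names here are assumptions.

```python
from typing import List, Sequence, Tuple

# Normalized ideal PAM-4 amplitudes (a common convention, assumed here).
LEVELS = (-3.0, -1.0, 1.0, 3.0)


def nearest_level(sample: float) -> int:
    """Index of the ideal PAM-4 level closest to a received sample."""
    return min(range(len(LEVELS)), key=lambda i: abs(sample - LEVELS[i]))


def transition_points(samples: Sequence[float]) -> List[Tuple[float, float]]:
    """2D points (previous sample, current sample) for a scatter plot."""
    return list(zip(samples[:-1], samples[1:]))


def transition_histogram(samples: Sequence[float]) -> List[List[int]]:
    """Count transitions between decided levels: grid[prev_level][cur_level]."""
    grid = [[0] * len(LEVELS) for _ in LEVELS]
    for prev, cur in transition_points(samples):
        grid[nearest_level(prev)][nearest_level(cur)] += 1
    return grid


# Toy received samples with small amplitude errors.
received = [-3.1, -0.9, 1.2, 2.8, -1.1]
grid = transition_histogram(received)
```

    With little ISI, the points from `transition_points` cluster tightly around the 16 ideal level pairs; wider clusters suggest stronger interference.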

  • Extending Binary Neural Networks to Bayesian Neural Networks with Probabilistic Interpretation of Binary Weights Open Access

    Taisei SAITO  Kota ANDO  Tetsuya ASAI  

     
    PAPER

      Publicized:
    2024/04/17
      Vol:
    E107-D No:8
      Page(s):
    949-957

    Neural networks (NNs) fail to perform well or make excessive predictions when predicting out-of-distribution or unseen datasets. In contrast, Bayesian neural networks (BNNs) can quantify the uncertainty of their inference to solve this problem. Nevertheless, BNNs have not been widely adopted owing to their increased memory and computational cost. In this study, we propose a novel approach to extend binary neural networks by introducing a probabilistic interpretation of binary weights, effectively converting them into BNNs. The proposed approach can reduce the number of weights by half compared to the conventional method. A comprehensive comparative analysis with established methods like Monte Carlo dropout and Bayes by backprop was performed to assess the performance and capabilities of our proposed technique in terms of accuracy and capturing uncertainty. Through this analysis, we aim to provide insights into the advantages of this Bayesian extension.

  • New Bounds for Quick Computation of the Lower Bound on the Gate Count of Toffoli-Based Reversible Logic Circuits Open Access

    Takashi HIRAYAMA  Rin SUZUKI  Katsuhisa YAMANAKA  Yasuaki NISHITANI  

     
    PAPER

      Publicized:
    2024/05/10
      Vol:
    E107-D No:8
      Page(s):
    940-948

    We present a time-efficient lower bound κ on the number of gates in Toffoli-based reversible circuits that represent a given reversible logic function. For the characteristic vector s of a reversible logic function, κ(s) closely approximates σ-lb(s), which is known as a relatively efficient lower bound with respect to evaluation time and tightness. The primary contribution of this paper is that κ enables fast computation while maintaining a tightness approximately equal to that of σ-lb. We prove that the discrepancy between κ(s) and σ-lb(s) is at most one by providing upper and lower bounds on σ-lb in terms of κ. Subsequently, we show that κ can be calculated more efficiently than σ-lb. An algorithm for κ(s) with a complexity of 𝓞(n) is presented, where n is the dimension of s. Experimental results comparing κ and σ-lb are also given. The results demonstrate that the two lower bounds are equal for most reversible functions, and that the calculation of κ is faster than that of σ-lb by several orders of magnitude.

  • On Easily Reconstructable Logic Functions Open Access

    Tsutomu SASAO  

     
    PAPER

      Publicized:
    2024/04/16
      Vol:
    E107-D No:8
      Page(s):
    913-921

    This paper shows that sum-of-products expression (SOP) minimization yields generalization ability. We show this in three steps. First, various classes of SOPs are generated. Second, minterms of the SOPs are randomly selected to generate partially defined functions. Third, the original functions are reconstructed from the partially defined functions by SOP minimization. We consider Achilles heel functions, majority functions, monotone increasing cascade functions, functions generated from random SOPs, monotone increasing random SOPs, circle functions, and globe functions. As for the generalization ability, the presented method is compared with Naive Bayes, multi-level perceptron, support vector machine, JRIP, J48, and random forest. For these functions, in many cases, only 10% of the input combinations are sufficient to reconstruct more than 90% of the truth tables of the original functions.

  • 10-Gbit/s Data Transmission Using 120-GHz-Band Contactless Communication with SRR Integrated Glass Substrate Open Access

    Tomohiro KUMAKI  Akihiko HIRATA  Tubasa SAIJO  Yuma KAWAMOTO  Tadao NAGATSUMA  Osamu KAGAYA  

     
    PAPER-Microwaves, Millimeter-Waves

      Publicized:
    2024/02/08
      Vol:
    E107-C No:8
      Page(s):
    223-230

    We achieved 10-Gbit/s data transmission using a cutting-edge 120-GHz-band high-speed contactless communication technology, which allows seamless connection to a local area network (LAN) by simply placing devices on a desk. We propose a glass substrate-integrated rectangular waveguide that can control the permeability of the top surface to 120-GHz signals by contacting a dielectric substrate with the substrate. The top surface of the rectangular waveguide was replaced with a glass substrate on which split-ring resonators (SRRs) were integrated. The transmission loss of the waveguide with a glass substrate was 2.5 dB at 125 GHz. When a dielectric sheet with a line pattern formed on the contact surface was in contact with a glass substrate, the transmission loss from the waveguide to the dielectric sheet was 19.2 dB at 125 GHz. We achieved 10-Gbit/s data transmission by contacting a dielectric sheet to the SRR-integrated glass substrate.

  • Method for Estimating Scatterer Information from the Response Waveform of a Backward Transient Scattering Field Using TD-SPT Open Access

    Keiji GOTO  Toru KAWANO  Munetoshi IWAKIRI  Tsubasa KAWAKAMI  Kazuki NAKAZAWA  

     
    PAPER-Electromagnetic Theory

      Publicized:
    2024/01/23
      Vol:
    E107-C No:8
      Page(s):
    210-222

    This paper proposes a scatterer information estimation method using numerical data for the response waveform of a backward transient scattering field for both E- and H-polarizations when a two-dimensional (2-D) coated metal cylinder is selected as a scatterer. It is assumed that a line source and an observation point are placed at different locations. The four types of scatterer information covered in this paper are the relative permittivity of a surrounding medium, the relative permittivity of a coating medium layer and its thickness, and the radius of a coated metal cylinder. Specifically, a time-domain saddle-point technique (TD-SPT) is used to derive scatterer information estimation formulae from the amplitude intensity ratios (AIRs) of adjacent backward transient scattering field components. The estimates are obtained by substituting the numerical data of the response waveforms of the backward transient scattering field components into the estimation formulae and performing iterative calculations. Furthermore, a minimum thickness of a coating medium layer for which the estimation method is valid is derived, and two kinds of applicable conditions for the estimation method are proposed. The effectiveness of the scatterer information estimation method is verified by comparing the estimates with the set values. The noise tolerance and convergence characteristics of the estimation method and the method of controlling the estimation accuracy are also discussed.

  • Sum Rate Maximization for Multiuser Full-Duplex Wireless Powered Communication Networks Open Access

    Keigo HIRASHIMA  Teruyuki MIYAJIMA  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E107-B No:8
      Page(s):
    564-572

    In this paper, we consider an orthogonal frequency division multiple access (OFDMA)-based multiuser full-duplex wireless powered communication network (FD WPCN) system with beamforming (BF) at an energy transmitter (ET). The ET performs BF to efficiently transmit energy to multiple users while suppressing interference to an information receiver (IR). Multiple users operating in full-duplex mode harvest energy from the signals sent by the ET while simultaneously transmitting information to the IR using the harvested energy. We analytically demonstrate that the FD WPCN is superior to its half-duplex (HD) WPCN counterpart in the high-SNR regime. We propose a transmitter design method that maximizes the sum rate by determining the BF at the ET, power allocation at both the ET and users, and sub-band allocation. Simulation results show the effectiveness of the proposed method.

  • Differential Active Self-Interference Cancellation for Asynchronous In-Band Full-Duplex GFSK Open Access

    Shinsuke IBI  Takumi TAKAHASHI  Hisato IWAI  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E107-B No:8
      Page(s):
    552-563

    This paper proposes a novel differential active self-interference canceller (DASIC) algorithm for asynchronous in-band full-duplex (IBFD) Gaussian filtered frequency shift keying (GFSK), which is designed for wireless Internet of Things (IoT). In IBFD communications, where two terminals simultaneously transmit and receive signals in the same frequency band, there is an extremely strong self-interference (SI). The SI can be mitigated by an active SI canceller (ASIC), which subtracts an interference replica based on channel state information (CSI) from the received signal. The challenging problem is the realization of asynchronous IBFD for wireless IoT in indoor environments. In the asynchronous mode, pilot contamination is induced by the non-orthogonality between asynchronous pilot sequences. In addition, the transceiver suffers from analog front-end (AFE) impairments, such as phase noise. Due to these impairments, the SI cannot be canceled entirely at the receiver, resulting in residual interference. To address this issue, the DASIC incorporates the principle of the differential codec, whose differential structure makes it possible to suppress SI without estimating the CSI of the SI. Also, on the premise of using an error correction technique, iterative detection and decoding (IDD) is applied to improve the detection capability by exchanging the extrinsic log-likelihood ratio (LLR) between the maximum a-posteriori probability (MAP) detector and the channel decoder. Finally, the validity of the DASIC algorithm is evaluated by computer simulations in terms of the packet error rate (PER). The results clearly demonstrate the possibility of realizing asynchronous IBFD.

  • Polling Schedule Algorithms for Data Aggregation with Sensor Phase Control in In-Vehicle UWB Networks Open Access

    Hajime MIGITA  Yuki NAKAGOSHI  Patrick FINNERTY  Chikara OHTA  Makoto OKUHARA  

     
    PAPER-Network

      Vol:
    E107-B No:8
      Page(s):
    529-540

    To enhance fuel efficiency and lower manufacturing and maintenance costs, in-vehicle wireless networks can facilitate the weight reduction of vehicle wire harnesses. In this paper, we utilize the Impulse Radio-Ultra Wideband (IR-UWB) of IEEE 802.15.4a/z for in-vehicle wireless networks because of its excellent signal penetration and robustness in multipath environments. Since clear channel assessment is optional in this standard, we employ polling control as a multiple access control to prevent interference within the system. The preamble overhead of IR-UWB in IEEE 802.15.4a/z is large; hence, aggregating as much sensor data as possible within each frame is more efficient. In this paper, we assume that reading out data from sensors and sending data to actuators are periodic and that their respective phases can be adjusted. Therefore, this paper proposes an integer linear programming-based scheduling algorithm that minimizes the number of transmitted frames by adjusting the read and write phases. Furthermore, we provide a heuristic algorithm that computes a sub-optimal but acceptable solution in a shorter time. Experimental validation shows that the data aggregation of the proposed algorithms is robust against interference.
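
    To see why adjusting phases reduces the number of frames, consider the toy model below. It is a brute-force illustration, not the integer linear programming formulation or the heuristic from the paper. It assumes discrete time slots, one frame per slot in which at least one device has data, and unlimited frame capacity; all names and numbers are hypothetical.

```python
from itertools import product
from math import lcm  # accepts multiple arguments from Python 3.9
from typing import Sequence, Tuple


def count_frames(periods: Sequence[int], phases: Sequence[int]) -> int:
    """Number of distinct slots with data within one hyperperiod.

    Device i produces data at slots phase_i, phase_i + period_i, ...
    Devices sharing a slot are aggregated into one frame (toy assumption).
    """
    hyperperiod = lcm(*periods)
    busy_slots = set()
    for period, phase in zip(periods, phases):
        busy_slots.update(range(phase, hyperperiod, period))
    return len(busy_slots)


def best_phases(periods: Sequence[int]) -> Tuple[int, ...]:
    """Exhaustively search phases (0 <= phase < period) for the fewest frames."""
    candidates = product(*(range(p) for p in periods))
    return min(candidates, key=lambda phases: count_frames(periods, phases))


# Two devices with periods 2 and 4 slots.
misaligned = count_frames((2, 4), (0, 1))   # data in slots {0, 2} and {1}
aligned = count_frames((2, 4), (0, 0))      # data in slots {0, 2} only
chosen = best_phases((2, 4))
```

    The exhaustive search grows exponentially with the number of devices, which is consistent with the abstract's motivation for a faster heuristic.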

  • A Dual-Branch Algorithm for Semantic-Focused Face Super-Resolution Reconstruction Open Access

    Qi QI  Liuyi MENG  Ming XU  Bing BAI  

     
    LETTER-Image

      Publicized:
    2024/03/18
      Vol:
    E107-A No:8
      Page(s):
    1435-1439

    In face super-resolution reconstruction, the interference caused by the texture and color of the hair region on the details and contours of the face region can negatively affect the reconstruction results. This paper proposes a semantic-based, dual-branch face super-resolution algorithm to address the issue of varying reconstruction complexities and mutual interference among different pixel semantics in face images. The algorithm clusters pixel semantic data to create a hierarchical representation, distinguishing between facial pixel regions and hair pixel regions. Subsequently, independent image enhancement is applied to these distinct pixel regions to mitigate their interference, resulting in a vivid, super-resolution face image.

  • Video Reflection Removal by Modified EDVR and 3D Convolution Open Access

    Sota MORIYAMA  Koichi ICHIGE  Yuichi HORI  Masayuki TACHI  

     
    LETTER-Image

      Publicized:
    2023/12/11
      Vol:
    E107-A No:8
      Page(s):
    1430-1434

    In this paper, we propose a method for video reflection removal using a video restoration framework with enhanced deformable networks (EDVR). We examine the effect of each module in EDVR on video reflection removal and modify the models using 3D convolutions. The performance of each modified model is evaluated in terms of the RMSE between the structural similarity (SSIM) and the smoothed SSIM representing temporal consistency.
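
    The temporal-consistency measure mentioned in this abstract can be sketched as follows, assuming a per-frame SSIM score is already available for each output frame. The abstract does not specify the smoothing, so the centered moving average, the window size of 3, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def temporal_rmse(ssim_per_frame: Sequence[float], window: int = 3) -> float:
    """RMSE between per-frame SSIM and its smoothed version.

    A small value means the quality changes slowly from frame to frame.
    """
    if not ssim_per_frame:
        raise ValueError("at least one frame is required")
    smoothed = moving_average(ssim_per_frame, window)
    squared = [(a - b) ** 2 for a, b in zip(ssim_per_frame, smoothed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical SSIM traces: one with a sharp dip, one with a mild dip.
flickering = temporal_rmse([0.9, 0.9, 0.6, 0.9, 0.9])
stable = temporal_rmse([0.9, 0.9, 0.88, 0.9, 0.9])
```

    Under these assumptions, a video whose quality flickers between frames yields a larger value than one whose quality stays nearly constant.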

  • Peak-to-Average Power Ratio Reduction Scheme in DCO-OFDM with a Combined Index Modulation and Convex Optimization Open Access

    Menglong WU  Jianwen ZHANG  Yongfa XIE  Yongchao SHI  Tianao YAO  

     
    LETTER-Communication Theory and Signals

      Publicized:
    2024/03/22
      Vol:
    E107-A No:8
      Page(s):
    1425-1429

    Direct-current biased optical orthogonal frequency division multiplexing (DCO-OFDM) exhibits a high peak-to-average power ratio (PAPR), which leads to nonlinear distortion in the system. To address this, this study proposes a scheme that combines direct-current biased optical orthogonal frequency division multiplexing with index modulation (DCO-OFDM-IM) and convex optimization algorithms. The proposed scheme utilizes partially activated subcarriers of the system to transmit constellation modulated symbol information, and transmits additional symbol information of the system through the combination of activated carrier indices. Additionally, a dither signal is added to the system’s idle subcarriers, and the convex optimization algorithm is applied to solve for the optimal values of this dither signal. By keeping the system’s peak power unchanged, the scheme enhances the system’s average transmission power and thus achieves a reduction in the PAPR. Experimental results indicate that at a complementary cumulative distribution function (CCDF) of 10⁻⁴, the proposed scheme reduces the PAPR by approximately 3.5 dB compared to the conventional DCO-OFDM system. Moreover, at a bit error rate (BER) of 10⁻³, the proposed scheme can lower the required signal-to-noise ratio (SNR) by about 1 dB relative to the traditional DCO-OFDM system. Therefore, the proposed scheme enables a more substantial reduction in PAPR and improvement in BER performance compared to the conventional DCO-OFDM approach.
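
    The reasoning that raising average power at a fixed peak lowers the PAPR can be checked with a few lines. The sketch computes PAPR in decibels for a real-valued time-domain signal; it does not implement index modulation or the convex optimization of the dither, and the toy signal values are hypothetical.

```python
import math
from typing import Sequence


def papr_db(signal: Sequence[float]) -> float:
    """Peak-to-average power ratio of a real-valued signal, in dB."""
    powers = [x * x for x in signal]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:
        raise ValueError("signal has zero average power")
    return 10.0 * math.log10(max(powers) / mean_power)


# Toy signal with one large peak, then the same peak with extra energy
# added to the quiet samples (standing in for a dither signal).
original = [0.0, 0.0, 0.0, 2.0]
dithered = [1.0, 1.0, 1.0, 2.0]

papr_before = papr_db(original)   # peak power 4, average power 1
papr_after = papr_db(dithered)    # peak power 4, average power 1.75
```

    Both signals share the same peak power, so the drop from `papr_before` to `papr_after` comes entirely from the higher average power.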

  • An Efficiency-Enhancing Wideband OFDM Dual-Function MIMO Radar-Communication System Design Open Access

    Yumeng ZHANG  

     
    LETTER-Communication Theory and Signals

      Publicized:
    2024/03/04
      Vol:
    E107-A No:8
      Page(s):
    1421-1424

    Integrated Sensing and Communication at terahertz band (ISAC-THz) is considered one of the promising technologies for future 6G. However, in the phase-shifter (PS) based massive multiple-input-multiple-output (MIMO) hybrid precoding system, due to the ultra-large bandwidth of the terahertz frequency band, the subcarrier channels with different frequencies have different equivalent spatial directions. Therefore, the hybrid beamforming at the transmitter will cause serious beam split problems. In this letter, we propose a dual-function radar communication (DFRC) precoding method based on the recently proposed delay-phase precoding structure for THz massive MIMO. By adding delay-phase components between the radio frequency chain and the frequency-independent PSs, the beam is aligned with the target physical direction over the entire bandwidth to reduce the loss caused by the beam splitting effect. Furthermore, we employ a hardware structure using true-time-delayers (TTDs) to realize the concept of frequency-dependent phase shifts. Theoretical analysis and simulation results show that the method can improve communication performance and, to a certain extent, make up for the performance loss caused by the dual-function trade-off between communication and radar.
