The search functionality is under construction.

Keyword Search Result

[Keyword] form(3161hit)

61-80hit(3161hit)

  • An In-Vehicle Auditory Signal Evaluation Platform based on a Driving Simulator

    Fuma SAWA  Yoshinori KAMIZONO  Wataru KOBAYASHI  Ittetsu TANIGUCHI  Hiroki NISHIKAWA  Takao ONOYE  

     
    PAPER-Acoustics

      Publicized:
    2023/05/22
      Vol:
    E106-A No:11
      Page(s):
    1368-1375

    Advanced driver-assistance systems (ADAS) play an important role in supporting safe driving by detecting potential risk factors beforehand and informing the driver of them. However, if too many ADAS services rely on visual technologies, the driver becomes increasingly burdened and fatigued, especially in the eyes. Drivers should be relieved of monitoring tasks other than the most important ones in order to alleviate this burden as much as possible. In recent years, in-vehicle auditory signals that assist safe driving have attracted attention as an alternative to visual cues. In this paper, we develop an in-vehicle auditory signal evaluation platform on an existing driving simulator. In addition, using in-vehicle auditory signals, we demonstrate with the developed platform the possibility of partially switching from purely visual tasks to a mix of visual and auditory tasks to alleviate the burden on drivers.

  • Evaluating Energy Consumption of Internet Services Open Access

    Leif Katsuo OXENLØWE  Quentin SAUDAN  Jasper RIEBESEHL  Mujtaba ZAHIDY  Smaranika SWAIN  

     
    INVITED PAPER

      Publicized:
    2023/06/15
      Vol:
    E106-B No:11
      Page(s):
    1036-1043

    This paper summarizes recent reports on the internet's energy consumption and the internet's benefits on climate actions. It discusses energy-efficiency and the need for a common standard for evaluating the climate impact of future communication technologies and suggests a model that can be adapted to different internet applications such as streaming, online reading and downloading. The two main approaches today are based on how much data is transmitted or how much time the data is under way. The paper concludes that there is a need for a standardized method to estimate energy consumption and CO2 emission related to internet services. This standard should include a method for energy-optimizing future networks, where every Wh will be scrutinized.
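
    The two estimation approaches named in this abstract (by data volume and by transfer time) can be contrasted in a minimal sketch. The function names and every numeric input below are hypothetical placeholders chosen for illustration, not values or a model taken from the paper.

```python
def energy_by_data_wh(data_gb, intensity_wh_per_gb):
    """Data-based approach: energy scales with the volume of data transmitted."""
    return data_gb * intensity_wh_per_gb


def energy_by_time_wh(duration_h, power_w):
    """Time-based approach: energy scales with how long the data is under way."""
    return duration_h * power_w


def co2_grams(energy_wh, grid_g_per_kwh):
    """Convert energy to CO2 using an assumed grid carbon intensity."""
    return energy_wh / 1000.0 * grid_g_per_kwh


if __name__ == "__main__":
    # Placeholder inputs only: 2 GB streamed over half an hour.
    by_data = energy_by_data_wh(data_gb=2.0, intensity_wh_per_gb=10.0)
    by_time = energy_by_time_wh(duration_h=0.5, power_w=8.0)
    print(by_data, by_time, co2_grams(by_data, grid_g_per_kwh=300.0))
```

    The same service can score very differently under the two approaches, which is one way to see why the abstract argues for a common standard.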

  • 40-GHz Band Photodiode-Integrated Phased Array Antenna Module for Analog-Radio over Fiber toward Beyond 5G Open Access

    Shinji NIMURA  Shota ISHIMURA  Kazuki TANAKA  Kosuke NISHIMURA  Ryo INOHARA  

     
    INVITED PAPER

      Publicized:
    2023/05/17
      Vol:
    E106-B No:11
      Page(s):
    1050-1058

    In 5th generation (5G) and Beyond 5G mobile communication systems, it is expected that numerous antennas will be densely deployed to realize ultra-broadband communication and uniform coverage. However, as the number of antennas increases, the total power consumption of all antennas will also increase, which has a negative impact on the environment and on the operating costs of telecommunication operators. Thus, it is necessary to simplify the antenna structure to suppress the power consumption of each antenna. On the other hand, as a way to realize ultra-broadband communication, millimeter waves will be utilized because they can transmit signals with a broader bandwidth than lower frequencies. However, since millimeter waves have a large propagation loss, their propagation distance is shorter than that of lower frequencies. Therefore, in order to extend the propagation distance, it is necessary to increase the equivalent isotropic radiated power by beamforming with a phased array antenna. In this paper, a phased array antenna module combined with analog radio over fiber (A-RoF) technology for 40-GHz millimeter waves is developed and evaluated for the first time. An 8×8 phased array antenna for 40-GHz millimeter waves with integrated photodiodes and RF chains has been developed, and an end-to-end transmission experiment including 20-km A-RoF transmission and 3-m over-the-air transmission from the developed phased array antenna has been conducted. The results showed that the 40-GHz RF signal after the end-to-end transmission satisfied the 3GPP signal quality requirements within ±50 degrees of the main beam direction.

  • Real-Time Detection of Fiber Bending and/or Optical Filter Shift by Machine-Learning of Tapped Raw Digital Coherent Optical Signals

    Yuichiro NISHIKAWA  Shota NISHIJIMA  Akira HIRANO  

     
    PAPER

      Publicized:
    2023/05/19
      Vol:
    E106-B No:11
      Page(s):
    1065-1073

    We have proposed an autonomous network diagnosis platform for the operation of future large-capacity, virtualized networks, including 5G and beyond-5G services. As one candidate for the information collection and analysis function blocks in the platform, we proposed novel optical sensing techniques that utilize tapped raw signal data acquired from digital coherent optical receivers. The raw signal data is captured before the various digital signal processing steps for demodulation. Therefore, it contains the waveform deformation and/or noise that the signal experiences through the transmission fibers. In this paper, we examine the detection of two possible failures in transmission lines, fiber bending and optical filter shift, by analyzing the above-mentioned raw signal data with the help of machine learning. For this purpose, we implemented Docker container applications in WhiteBox Cassini to acquire real-time raw signal data. We generated CNN models for the detection in off-line processing and used them for real-time detection. We confirmed successful detection of optical fiber bending and/or optical filter shift in real time with high accuracy. We also evaluated their tolerance against ASE noise and devised a novel approach to improve detection accuracy. In addition, we succeeded in detecting the failures even when they occurred simultaneously.

  • Spherical Style Deformation on Single Component Models

    Xuemei FENG  Qing FANG  Kouichi KONNO  Zhiyi ZHANG  Katsutsugu MATSUYAMA  

     
    PAPER-Computer Graphics

      Publicized:
    2023/08/22
      Vol:
    E106-D No:11
      Page(s):
    1891-1905

    In this study, we present a spherical style deformation algorithm for single component models that deforms the models with a spherical style while preserving the local details of the original models. Because 3D models have complex skeleton structures that consist of many components, the deformation around the connections between components is complicated, especially when mesh self-intersections must be prevented. To the best of our knowledge, no methods exist for achieving a spherical style either in a 3D model consisting of multiple components or in a single component. In this study, we focus on spherical style deformation of single component models. Accordingly, we propose a deformation method that transforms the input model with the spherical style while preserving the local details of the input model. Specifically, we define an energy function that combines the as-rigid-as-possible (ARAP) method and spherical features. The spherical term is defined as l2-regularization on a linear feature; accordingly, the corresponding optimization can be solved efficiently. We also observed that the results of our deformation depend on the quality of the input mesh. For instance, when the input mesh consists of many obtuse triangles, the spherical style deformation method fails. To address this problem, we propose an optional deformation method based on a convex hull proxy model as a complementary deformation method. Our proxy method constructs a proxy model of the input model and applies our deformation method to the proxy model to deform the input model by projection and interpolation. We have applied our proposed method to simple and complex shapes, compared our experimental results with the 3D geometric stylization method of normal-driven spherical shape analogies, and confirmed that our method successfully deforms models that are smooth, round, and curved. We also discuss the limitations and problems of our algorithm based on the experimental results.

  • Line Segment Detection Based on False Peak Suppression and Local Hough Transform and Application to Nuclear Emulsion

    Ye TIAN  Mei HAN  Jinyi ZHANG  

    This article has been retracted at the request of the authors.
     
    PAPER-Image Processing and Video Processing

      Publicized:
    2023/08/09
      Vol:
    E106-D No:11
      Page(s):
    1854-1867

    This paper mainly proposes a line segment detection method based on pseudo peak suppression and local Hough transform, which has good noise resistance and can solve the problems of short line segment missing detection, false detection, and oversegmentation. In addition, in response to the phenomenon of uneven development in nuclear emulsion tomographic images, this paper proposes an image preprocessing process that uses the “Difference of Gaussian” method to reduce noise and then uses the standard deviation of the gray value of each pixel to bundle and unify the gray value of each pixel, which can robustly obtain the linear features in these images. The tests on the actual dataset of nuclear emulsion tomographic images and the public YorkUrban dataset show that the proposed method can effectively improve the accuracy of convolutional neural network or vision in transformer-based event classification for alpha-decay events in nuclear emulsion. In particular, the line segment detection method in the proposed method achieves optimal results in both accuracy and processing speed, which also has strong generalization ability in high quality natural images.

  • Gaussian Mixture Bandpass Filter Design for Narrow Passband Width by Using a FIR Recursive Filter

    Yukihiko YAMASHITA  

     
    PAPER-Digital Signal Processing

      Publicized:
    2023/04/11
      Vol:
    E106-A No:10
      Page(s):
    1277-1285

    Bandpass filters (BPFs) are very important for extracting target signals and eliminating noise from received signals. A BPF whose frequency characteristic is a sum of Gaussian functions is called a Gaussian mixture BPF (GMBPF). In this research, we propose to implement the GMBPF approximately by the sum of several frequency components of the sliding Fourier transform (SFT) or the attenuated SFT (ASFT). Because a component of the SFT/ASFT can be approximately realized using finite impulse response (FIR) recursive filters, its computational complexity does not depend on the length of the impulse response. This property makes the GMBPF ideal for narrow bandpass filtering applications. We conducted experiments to demonstrate the advantages of the proposed GMBPF over FIR filters designed by a MATLAB function with regard to computational complexity.
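
    The claim that the cost per sample does not depend on the window length can be illustrated with the textbook sliding DFT recurrence. This is a generic sketch, not the paper's GMBPF or ASFT design; the function names, test signal, and window length are assumptions made for illustration.

```python
import cmath


def sliding_dft_bin(x, N, k):
    """Track DFT bin k over the last N samples with O(1) work per sample.

    Recurrence: X(n) = (X(n-1) + x(n) - x(n-N)) * exp(j*2*pi*k/N),
    with samples before the start of x treated as zero.
    """
    twiddle = cmath.exp(2j * cmath.pi * k / N)
    buf = [0.0] * N  # circular buffer holding the last N samples
    X = 0j
    out = []
    for n, sample in enumerate(x):
        oldest = buf[n % N]   # sample leaving the window
        buf[n % N] = sample   # sample entering the window
        X = (X + sample - oldest) * twiddle
        out.append(X)
    return out


def direct_dft_bin(window, k):
    """Reference O(N) evaluation of bin k for one window."""
    N = len(window)
    return sum(v * cmath.exp(-2j * cmath.pi * k * m / N)
               for m, v in enumerate(window))


if __name__ == "__main__":
    signal = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5]
    N, k = 4, 1
    tracked = sliding_dft_bin(signal, N, k)
    for n in range(N - 1, len(signal)):
        ref = direct_dft_bin(signal[n - N + 1:n + 1], k)
        print(n, abs(tracked[n] - ref) < 1e-9)
```

    Each update uses one complex multiplication and two additions regardless of N, which is the property that makes long, narrow-band windows cheap.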

  • Further Results on Autocorrelation of Vectorial Boolean Functions

    Zeyao LI  Niu JIANG  Zepeng ZHUO  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2023/03/27
      Vol:
    E106-A No:10
      Page(s):
    1305-1310

    In this paper, we study the properties of the sum-of-squares indicator of vectorial Boolean functions. Firstly, we give the upper bound of $\sum_{u\in \mathbb{F}_2^n, v\in \mathbb{F}_2^m}\mathcal{W}_F^3(u,v)$. Secondly, based on the Walsh-Hadamard transform, we give a secondary construction of vectorial bent functions. Further, three kinds of sum-of-squares indicators of vectorial Boolean functions are defined by the autocorrelation function, and the lower and upper bounds of the sum-of-squares indicators are derived. Finally, we study the sum-of-squares indicators with respect to several equivalence relations, and obtain the sum-of-squares indicator that has the best cryptographic properties.
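
    For readers unfamiliar with the quantity being bounded, the following sketch evaluates the Walsh transform of a vectorial Boolean function by brute force, assuming the standard definition W_F(u, v) = sum over x of (-1)^(v·F(x) + u·x). The lookup table F below is an arbitrary example, not a function from the paper, and the helper names are hypothetical.

```python
def parity(a):
    """Parity of the set bits of a (the F_2 inner product after a bitwise AND)."""
    return bin(a).count("1") & 1


def walsh(F, n, u, v):
    """Walsh transform of F: F_2^n -> F_2^m, given as a table of 2^n integers."""
    total = 0
    for x in range(1 << n):
        exponent = parity(v & F[x]) ^ parity(u & x)
        total += -1 if exponent else 1
    return total


def cubic_sum(F, n, m):
    """Sum of W_F(u, v)^3 over all u in F_2^n and v in F_2^m."""
    return sum(walsh(F, n, u, v) ** 3
               for u in range(1 << n) for v in range(1 << m))


if __name__ == "__main__":
    n, m = 3, 2
    F = [0, 1, 2, 3, 3, 2, 0, 1]  # arbitrary example table
    # Parseval's relation: for every v, the squares sum to 2^(2n).
    for v in range(1 << m):
        print(v, sum(walsh(F, n, u, v) ** 2 for u in range(1 << n)))
    print(cubic_sum(F, n, m))
```

    The Parseval check is a convenient sanity test for any implementation, since it holds for every function F.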

  • General Closed-Form Transfer Function Expressions for Fast Filter Bank

    Jinguang HAO  Gang WANG  Honggang WANG  Lili WANG  Xuefeng LIU  

     
    LETTER-Digital Signal Processing

      Publicized:
    2023/04/14
      Vol:
    E106-A No:10
      Page(s):
    1354-1357

    The existing literature focuses on the applications of the fast filter bank because of its excellent frequency responses and low complexity. However, the general transfer function expressions of the corresponding subfilters for a specific channel have not been addressed. To fill this gap, general closed-form transfer function expressions for the fast filter bank are derived in this paper. Firstly, the cascaded structure of the fast filter bank is modelled by a binary tree, with which the index of the subfilter at each stage within the channel can be determined. Then the transfer functions for the two outputs of a subfilter are expressed in a unified form. Finally, the general closed-form transfer functions for the channel and its corresponding subfilters are obtained by variable replacement if the prototype lowpass filters for the stages are given. Analytical results and simulations verify the general expressions. Such closed-form expressions lend themselves easily to analysis and to direct computation of the transfer functions and the frequency responses without the structure graph.

  • Theoretical Analysis of Fully Wireless-Power-Transfer Node Networks Open Access

    Hiroshi SAITO  

     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2023/05/10
      Vol:
    E106-B No:10
      Page(s):
    864-872

    The performance of a fully wireless-power-transfer (WPT) node network, in which each node transfers (and receives) energy through a wireless channel when it has sufficient (and insufficient) energy in its battery, was theoretically analyzed. The lost job ratio (LJR), namely the ratio of (i) the amount of jobs that cannot be done because the battery of a node runs out to (ii) the amount of jobs that should be done, is used as a performance metric. It describes the effect of the battery of each node running out and how much additional energy is needed. Although it is known that WPT can reduce the probability of the battery running out among a few nodes within a small area, the performance of a fully WPT network has not been clarified. By using stochastic geometry and first-passage-time analysis for a diffusion process, the expected LJR was theoretically derived. Numerical examples demonstrate that the key parameters determining the performance of the network are the node density, the threshold for switching status between “transferring energy” and “receiving energy,” and the parameters of power conversion. They also demonstrate the following: (1) The mean energy stored in the node battery decreases in the networks because of the loss caused by WPT, and a fully WPT network cannot decrease the probability of the battery running out under the current WPT efficiency. (2) When the saturation value of power conversion increases, a fully WPT network can decrease the probability of the battery running out, although the mean energy stored in the node battery still decreases in the networks. This result is explained by the fact that the variance of stored energy in each node battery becomes smaller due to the transfer of energy from nodes with sufficient energy to nodes with insufficient energy.

  • 1-D and 2-D Beam Steering Arrays Antennas Fed by a Compact Beamforming Network for Millimeter-Wave Communication

    Jean TEMGA  Koki EDAMATSU  Tomoyuki FURUICHI  Mizuki MOTOYOSHI  Takashi SHIBA  Noriharu SUEMATSU  

     
    PAPER-Antennas and Propagation

      Publicized:
    2023/04/11
      Vol:
    E106-B No:10
      Page(s):
    915-927

    In this article, a new Beamforming Network (BFN) realized in Broadside Coupled Stripline (BCS) is proposed to feed 1×4 and 2×2 array antennas in the 28-GHz band. The new BFN is composed only of couplers and phase shifters. It does not require any crossover, whereas the conventional Butler Matrix (BM) requires two crossovers. The tight coupling and low loss characteristics of the BCS allow the design of a compact and wideband BFN. The new BFN produces the phase differences of (±90°) and (±45°, ±135°) respectively in x- and y-directions. Its integration with a 1×4 linear array antenna reduces the array area by 70% with an improvement of the gain performance compared with the conventional array. The integration with a 2×2 array allows the realization of a full 2-D beam scanning. The proposed concept has been verified experimentally by measuring the fabricated prototypes of the BFN and the 1-D and 2-D patch array antennas. The measured 11.5 dBi and 11.3 dBi maximum gains are realized in θ0 = 14° and (θ0, φ0) = (45°,345°) directions respectively for the 1-D and 2-D patch arrays. The physical area of the fabricated BFN is only (0.37λ0×0.3λ0×0.08λ0), while the 1-D array and 2-D array antenna areas without feeding transmission lines are respectively (0.5λ0×2.15λ0×0.08λ0) and (0.9λ0×0.8λ0×0.08λ0).

  • Local-to-Global Structure-Aware Transformer for Question Answering over Structured Knowledge

    Yingyao WANG  Han WANG  Chaoqun DUAN  Tiejun ZHAO  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/06/27
      Vol:
    E106-D No:10
      Page(s):
    1705-1714

    Question-answering tasks over structured knowledge (i.e., tables and graphs) require the ability to encode structural information. Traditional pre-trained language models trained on linear-chain natural language cannot be directly applied to encode tables and graphs. The existing methods adopt the pre-trained models in such tasks by flattening structured knowledge into sequences. However, the serialization operation leads to the loss of the structural information of the knowledge. To better employ pre-trained transformers for structured knowledge representation, we propose a novel structure-aware transformer (SATrans) that injects the local-to-global structural information of the knowledge into the mask of the different self-attention layers. Specifically, in the lower self-attention layers, SATrans focuses on the local structural information of each knowledge token to learn a more robust representation of it. In the upper self-attention layers, SATrans further injects the global information of the structured knowledge to integrate the information among knowledge tokens. In this way, SATrans can effectively learn the semantic representation and structural information from the knowledge sequence and the attention mask, respectively. We evaluate SATrans on the table fact verification task and the knowledge base question-answering task. Furthermore, we explore two methods to combine symbolic and linguistic reasoning for these tasks to solve the problem that the pre-trained models lack symbolic reasoning ability. The experimental results reveal that the methods consistently outperform strong baselines on the two benchmarks.
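
    The idea of putting structure into per-layer attention masks can be sketched for a flattened table. The abstract does not define what counts as "local", so the rule used here (tokens may attend to cells in the same row or column) and all function names are assumptions for illustration only, not the SATrans design.

```python
def local_mask(cells):
    """cells[i] is the (row, col) of token i; 1 means attention is allowed.

    Assumed local rule: a token sees tokens in its own row or column.
    """
    n = len(cells)
    return [[1 if cells[i][0] == cells[j][0] or cells[i][1] == cells[j][1] else 0
             for j in range(n)]
            for i in range(n)]


def global_mask(cells):
    """Every token may attend to every other token."""
    n = len(cells)
    return [[1] * n for _ in range(n)]


def masks_for_layers(cells, num_layers, num_local_layers):
    """Lower layers get the local mask, upper layers the global one."""
    return [local_mask(cells) if layer < num_local_layers else global_mask(cells)
            for layer in range(num_layers)]


if __name__ == "__main__":
    # A 2x2 table flattened row by row.
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for layer, mask in enumerate(masks_for_layers(cells, 4, 2)):
        print(layer, mask)
```

    Because the structure lives in the mask rather than in the token sequence, a pre-trained transformer can be reused without changing its input format.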

  • Prior Information Based Decomposition and Reconstruction Learning for Micro-Expression Recognition

    Jinsheng WEI  Haoyu CHEN  Guanming LU  Jingjie YAN  Yue XIE  Guoying ZHAO  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2023/07/13
      Vol:
    E106-D No:10
      Page(s):
    1752-1756

    Micro-expression recognition (MER) draws intensive research interest because micro-expressions (MEs) can reveal genuine emotions. Prior information can guide the model to learn discriminative ME features effectively. However, most works focus on general models with a stronger representation ability that adaptively aggregate ME movement information in a holistic way, which may ignore the prior information and properties of MEs. To solve this issue, driven by the prior information that the category of an ME can be inferred from the relationship between the actions of different facial components, this work designs a novel model that conforms to this prior information and learns ME movement features in an interpretable way. Specifically, this paper proposes a Decomposition and Reconstruction-based Graph Representation Learning (DeRe-GRL) model to effectively learn high-level ME features. DeRe-GRL includes two modules: an Action Decomposition Module (ADM) and a Relation Reconstruction Module (RRM), where ADM learns action features of facial key components and RRM explores the relationship between these action features. Based on facial key components, ADM divides the geometric movement features extracted by the graph model-based backbone into several sub-features and learns a map matrix to map these sub-features into multiple action features; then, RRM learns weights to weight all action features and build the relationship between them. The experimental results demonstrate the effectiveness of the proposed modules, and the proposed method achieves competitive performance.

  • A Luminance Expansion Method for Displaying HDR Video in SDR Display

    Takashi YAMAZOE  Jinyu TANG  Gin INOUE  Kenji SUGIYAMA  

     
    LETTER-Vision

      Publicized:
    2023/06/27
      Vol:
    E106-A No:9
      Page(s):
    1220-1223

    HDR video can represent luminance up to 1200%; however, this is limited on an SDR display. In this study, we expand the high-luminance area in consideration of perceptual performance to improve the presentation performance of HDR video on an SDR display. Objective experiments show that the proposed method improves the presentation performance by up to 0.8 dB in WPSNR.

  • Receive Beamforming Designed for Massive Multi-User MIMO Detection via Gaussian Belief Propagation Open Access

    Takanobu DOI  Jun SHIKIDA  Daichi SHIRASE  Kazushi MURAOKA  Naoto ISHII  Takumi TAKAHASHI  Shinsuke IBI  

     
    PAPER

      Publicized:
    2023/03/08
      Vol:
    E106-B No:9
      Page(s):
    758-767

    This paper proposes two full-digital receive beamforming (BF) methods for low-complexity and high-accuracy uplink signal detection via Gaussian belief propagation (GaBP) at base stations (BSs) adopting massive multi-input multi-output (MIMO) for open radio access network (O-RAN). In addition, beyond fifth generation mobile communication (beyond 5G) systems will increase uplink capacity. In the scenarios such as O-RAN and beyond 5G, it is vital to reduce the cost of the BSs by limiting the bandwidth of fronthaul (FH) links, and the dimensionality reduction of the received signal based on the receive BF at a radio unit is a well-known strategy to reduce the amount of data transported via the FH links. In this paper, we clarify appropriate criteria for designing a BF weight considering the subsequent GaBP signal detection with the proposed methods: singular-value-decomposition-based BF and QR-decomposition-based BF with the aid of discrete-Fourier-transformation-based spreading. Both methods achieve the dimensionality reduction without compromising the desired signal power by taking advantage of a null space of channels. The proposed receive BF methods reduce correlations between the received signals in the BF domain, which improves the robustness of GaBP against spatial correlation among fading coefficients. Simulation results assuming realistic BS and user equipment arrangement show that the proposed methods improve detection capability while significantly reducing the computational cost.

  • A 2-D Beam Scanning Array Antenna Fed by a Compact 16-Way 2-D Beamforming Network in Broadside Coupled Stripline

    Jean TEMGA  Tomoyuki FURUICHI  Takashi SHIBA  Noriharu SUEMATSU  

     
    PAPER

      Publicized:
    2023/03/28
      Vol:
    E106-B No:9
      Page(s):
    768-777

    A 2-D beam scanning array antenna fed by a compact 16-way 2-D beamforming network (BFN) designed in Broadside Coupled Stripline (BCS) is addressed. The proposed 16-way 2-D BFN is formed by interconnecting two groups of 4x4 Butler Matrices (BMs). Each group is composed of four compact 4x4 BMs. The critical point of the design is to propose a simple and compact 4x4 BM without crossover in BCS to achieve a better transmission coefficient of the 16-way 2-D BFN with a reduced size of merely 0.8λ0×0.8λ0×0.04λ0. Moreover, the complexity of the interface connection between the 2-D BFN and the 4x4 patch array antenna is reduced by using probe feeding. The 16-way 2-D BFN is able to produce phase shifts of ±45° and ±135° in the x- and y-directions. The 2-D BFN is easily integrated under the 4x4 patch array to form a 2-D phased array capable of switching 16 beams in both elevation and azimuth directions. The area of the proposed 2-D beam scanning array antenna module has been significantly reduced to 2λ0×2λ0×0.04λ0. A prototype operating in the frequency range of 4-6 GHz is fabricated and measured to validate the concept. The measurement results agree well with the simulations.
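
    The link between a progressive phase shift and the resulting beam direction follows from the standard uniform-array relation sin θ = Δφ / (2π d/λ). The sketch below assumes an element spacing of half a wavelength and an idealized array; neither the spacing nor the sign convention is stated in the abstract, and the function name is hypothetical.

```python
import math


def beam_angle_deg(phase_step_deg, spacing_wavelengths=0.5):
    """Steering angle from broadside for a uniform linear array.

    phase_step_deg: phase difference between adjacent elements, in degrees.
    spacing_wavelengths: element spacing d / lambda (0.5 is an assumption).
    """
    s = phase_step_deg / (360.0 * spacing_wavelengths)
    if abs(s) > 1.0:
        raise ValueError("phase step too large for this spacing: no real beam angle")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    for step in (-135.0, -45.0, 45.0, 135.0):
        print(step, round(beam_angle_deg(step), 2))
```

    Under these assumptions a 45° step steers the beam to about 14.5° and a 135° step to about 48.6°; applying the relation independently in x and y gives the 16 beam positions of a 4x4 array.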

  • Performance of Broadcast Channel Using Hierarchical Modulation in OFDM Downlink

    Daiki MITAMURA  Mamoru SAWAHASHI  Yoshihisa KISHIYAMA  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2023/03/22
      Vol:
    E106-B No:9
      Page(s):
    844-854

    This paper proposes a multiple code block transmission scheme using hierarchical modulation (HM) for a broadcast channel in the orthogonal frequency division multiplexing (OFDM) downlink. We investigate the average bit error rate (BER) performance of two-layer HM using 16 quadrature amplitude modulation (QAM) and three-layer HM using 64QAM in multipath Rayleigh fading channels. In multiple code block transmission using HM, the basic information bits are demodulated and decoded to all users within a cell that satisfy the bit error rate (BER) requirement. Hence, we investigate non-uniform QAM constellations to find one that suppresses the loss in the average BER of the basic information bits for HM to a low level compared to that using the original constellation in which only the basic information bits are transmitted while simultaneously minimizing the loss in the average BER of the secondary and tertiary information bits from the original constellations in which the information bits of the respective layers are transmitted alone. Based on the path loss equations in the Urban Macro and Rural Macro scenarios, we also investigate the maximum distance from a base station (BS) for the information bits of each layer to attain the required average received signal-to-noise power ratio (SNR) that achieves the average BER of $10^{-3}$.

  • File Tracking and Visualization Methods Using a Network Graph to Prevent Information Leakage

    Tomohiko YANO  Hiroki KUZUNO  Kenichi MAGATA  

     
    PAPER

      Publicized:
    2023/06/20
      Vol:
    E106-D No:9
      Page(s):
    1339-1353

    Information leakage is a significant threat to organizations, and effective measures are required to protect information assets. As confidential files can be leaked through various paths, a countermeasure is necessary that prevents leakage along all of them, from simple drag-and-drop movements to complex transformations such as encryption and encoding. However, with existing methods it is difficult to take countermeasures tailored to each information leakage path. Furthermore, it is also necessary to create a visualization format in which information leakage can be found easily, together with a method that removes unnecessary parts while leaving the parts relevant to information leakage, in order to improve visibility. This paper proposes a new information leakage countermeasure method that incorporates file tracking and visualization. The file tracking component recursively extracts all events related to confidential files. Therefore, tracking is possible even when data have been transformed significantly from the original file. The visualization component represents the results of file tracking as a network graph. This allows security administrators to find information leakage even if a file is transformed through multiple events. Furthermore, by pruning the network graph using the frequency of past events, the indicators of information leakage can be more easily found by security administrators. In the experiments, network graphs were generated for two information leakage scenarios in which files were moved and copied. The visualization results were obtained according to the scenarios, and the network graph was pruned to reduce vertices by 17.6% and edges by 10.9%.
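
    The two components described in this abstract, tracking and pruning, can be sketched on a toy event log. The event format (source, destination, action), the file names, and the pruning rule (drop edges that occurred often in past logs, on the assumption that frequent events are routine) are all hypothetical; the abstract does not specify them.

```python
from collections import Counter, deque


def track(events, start):
    """Collect every event reachable from `start` by following src -> dst.

    Uses a queue instead of recursion; the set of visited files is the same.
    """
    by_src = {}
    for ev in events:
        by_src.setdefault(ev[0], []).append(ev)
    seen, edges, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for ev in by_src.get(node, []):
            edges.append(ev)  # each event becomes one edge of the graph
            if ev[1] not in seen:
                seen.add(ev[1])
                queue.append(ev[1])
    return edges


def prune(edges, history, max_count):
    """Keep only edges seen at most `max_count` times in past events."""
    freq = Counter(history)
    return [e for e in edges if freq[e] <= max_count]


if __name__ == "__main__":
    events = [
        ("secret.docx", "secret.zip", "compress"),
        ("secret.zip", "upload.bin", "encode"),
        ("notes.txt", "notes.bak", "copy"),
    ]
    graph = track(events, "secret.docx")
    history = [("secret.docx", "secret.zip", "compress")] * 5
    print(graph)
    print(prune(graph, history, max_count=3))
```

    Following destinations transitively is what lets the tracker keep up with a file after it has been compressed or encoded under a different name.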

  • Framework of Measuring Engagement with Access Logs Under Tracking Prevention for Affiliate Services

    Motoi IWASHITA  Hirotaka SUGITA  

     
    PAPER

      Publicized:
    2023/05/24
      Vol:
    E106-D No:9
      Page(s):
    1452-1460

    In recent years, the market size for internet advertising has been increasing with the expansion of the Internet. Among internet advertising technologies, affiliate services, which are performance-based, use cookies to track and measure the performance of affiliates. However, to safeguard personal information, cookies are increasingly being regulated, which raises concerns over whether normal cookie-based tracking works as intended. Therefore, in this study, the recent problems from the perspectives of affiliates, affiliate service providers, and advertisers are extracted, a framework for a cookie-independent engagement measurement method using access logs is proposed, and open issues for future affiliate services are discussed.

  • Siamese Transformer for Saliency Prediction Based on Multi-Prior Enhancement and Cross-Modal Attention Collaboration

    Fazhan YANG  Xingge GUO  Song LIANG  Peipei ZHAO  Shanhua LI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2023/06/20
      Vol:
    E106-D No:9
      Page(s):
    1572-1583

    Visual saliency prediction has improved dramatically since the advent of convolutional neural networks (CNNs). Although CNNs achieve excellent performance, they still cannot learn global and long-range contextual information well and lack interpretability because of the locality of convolution operations. We propose a saliency prediction model based on multi-prior enhancement and cross-modal attention collaboration (ME-CAS). Concretely, we design a transformer-based Siamese network architecture as the backbone for feature extraction. One of the transformer branches captures the context information of the image under the self-attention mechanism to obtain a global saliency map. At the same time, we build a prior learning module to learn the human visual center bias prior, contrast prior, and frequency prior. The multi-prior is input to the other Siamese branch to learn the detailed features of the underlying visual features and obtain the saliency map of local information. Finally, we use an attention calibration module to guide the cross-modal collaborative learning of global and local information and generate the final saliency map. Extensive experimental results demonstrate that the proposed ME-CAS achieves superior results on public benchmarks compared with competing saliency prediction models. Moreover, the multi-prior learning module enhances the expression of salient details in images and the interpretability of the model.
