The search functionality is under construction.

Keyword Search Result

[Keyword] TE(21534hit)

521-540hit(21534hit)

  • Compression of Vehicle and Pedestrian Detection Network Based on YOLOv3 Model

    Lie GUO  Yibing ZHAO  Jiandong GAO  

     
    PAPER-Intelligent Transportation Systems

      Publicized:
    2022/06/22
      Vol:
    E106-D No:5
      Page(s):
    735-745

    Commonly used object detection algorithms based on convolutional neural networks struggle to meet real-time requirements on embedded platforms because of their large model size, heavy computation, and long inference time. Model compression is therefore needed to reduce the amount of network computation and speed up inference. This paper compresses a vehicle and pedestrian detection network by pruning redundant parameters. The detection network is trained on the YOLOv3 model, with K-means++ used to cluster the anchor boxes. Because the number of targets per category in the dataset is unbalanced, detection accuracy is improved by changing the proportion of classification and regression losses for each category in the loss function. A layer and channel pruning algorithm is proposed that combines global channel pruning thresholds with the L1 norm, reducing both the time cost of the network layer transfer process and the amount of computation. Network layer fusion based on TensorRT is then applied, and inference is run in half-precision floating point to improve speed. Results show that the compressed network with 84% of channels and 15 Shortcut modules pruned reduces the size by 32% and the amount of calculation by 17%, while the inference time decreases to 21 ms, which is 1.48 times faster than the network with only 84% of channels pruned.
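
    The idea of pruning channels against a global threshold on the L1 norm can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the flat-list weight layout, and the rule of always keeping the strongest channel in each layer are assumptions made here for the example.

```python
def l1_norm(filter_weights):
    """L1 norm of one channel's (flattened) filter weights."""
    return sum(abs(w) for w in filter_weights)


def global_prune_mask(layers, prune_ratio):
    """Return keep/drop masks for every channel in every layer.

    layers: list of layers; each layer is a list of channels; each channel
            is a flat list of weights (hypothetical layout for this sketch).
    prune_ratio: fraction of all channels, pooled across layers, to drop.
    """
    norms = [[l1_norm(ch) for ch in layer] for layer in layers]
    pooled = sorted(n for layer in norms for n in layer)
    cut = int(len(pooled) * prune_ratio)
    # One threshold shared by all layers: the "global" part of the scheme.
    threshold = pooled[cut] if cut < len(pooled) else float("inf")

    masks = []
    for layer in norms:
        mask = [n >= threshold for n in layer]
        if not any(mask):
            # Assumption: never remove a whole layer; keep its strongest channel.
            mask[layer.index(max(layer))] = True
        masks.append(mask)
    return masks


layers = [
    [[0.1, -0.1], [1.0, 2.0]],   # layer 0: two channels
    [[0.05, 0.0], [0.5, 0.5]],   # layer 1: two channels
]
masks = global_prune_mask(layers, prune_ratio=0.5)
```

    A single pooled threshold lets layers with many weak channels lose more of them than layers whose channels are uniformly strong, which a fixed per-layer ratio would not do.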

  • An Improved Real-Time Object Tracking Algorithm Based on Deep Learning Features

    Xianyu WANG  Cong LI  Heyi LI  Rui ZHANG  Zhifeng LIANG  Hai WANG  

     
    PAPER-Object Recognition and Tracking

      Publicized:
    2022/01/07
      Vol:
    E106-D No:5
      Page(s):
    786-793

    Visual object tracking remains a challenging task in computer vision. During tracking, the shape and appearance of the target may change greatly, and because sufficient training samples are lacking, most online learning tracking algorithms hit performance bottlenecks. This paper proposes an improved real-time algorithm based on deep learning features that combines multi-feature fusion, multi-scale estimation, adaptive updating of the target model, and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are demonstrated through extensive comparative experiments against other leading algorithms on large benchmark datasets.

  • Learning Pixel Perception for Identity and Illumination Consistency Face Frontalization in the Wild

    Yongtang BAO  Pengfei ZHOU  Yue QI  Zhihui WANG  Qing FAN  

     
    PAPER-Person Image Generation

      Publicized:
    2022/06/21
      Vol:
    E106-D No:5
      Page(s):
    794-803

    Face frontalization synthesizes a frontal, realistic face image from a single profile face image and has a wide range of applications in face recognition. Although deep-learning-based face frontalization methods have made substantial progress in recent years, there is still no guarantee that the generated face preserves identity consistency and illumination consistency under large poses. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN), which learns to recover local high-frequency details and to preserve the identity and illumination of frontal face images in an uncontrolled environment. We first propose a Reslu block to obtain a richer feature representation and improve the convergence speed of training. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a 30,000 face pose dataset to learn about various uncontrolled field environments. Our dataset covers different ages and races as well as wild backgrounds, allowing us to handle other datasets and obtain better results. Finally, we introduce a discriminator used for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show that PFR-GAN generates high-quality, high-fidelity frontal face images and that our results are better than state-of-the-art results.

  • Enhanced Full Attention Generative Adversarial Networks

    KaiXu CHEN  Satoshi YAMANE  

     
    LETTER-Core Methods

      Publicized:
    2023/01/12
      Vol:
    E106-D No:5
      Page(s):
    813-817

    In this paper, we propose improved generative adversarial networks (GANs) with an attention module in the Generator, which enhances the effectiveness of the Generator. Furthermore, recent work has shown that Generator conditioning affects GAN performance. Leveraging this insight, we explore the effect of different normalization methods (spectral normalization and instance normalization) on the Generator and Discriminator. Moreover, an enhanced loss function, the Wasserstein divergence distance, alleviates the problem that the module is difficult to train in practice.

  • Prediction of Driver's Visual Attention in Critical Moment Using Optical Flow

    Rebeka SULTANA  Gosuke OHASHI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/01/26
      Vol:
    E106-D No:5
      Page(s):
    1018-1026

    In recent years, drivers' visual attention has been actively studied for driving automation technology. However, few models provide an insightful understanding of a driver's attention at various moments. All attention models process multi-level image representations with a two-stream or multi-stream network, which increases the computational cost because of the larger number of model parameters. Nevertheless, multi-level image representations such as optical flow play a vital role in tasks involving videos. Therefore, to reduce the computational cost of a two-stream network while still using multi-level image representations, this work proposes a single-stream model of a driver's visual attention in critical situations. The experiment was conducted using a publicly available critical driving dataset named BDD-A. Qualitative results confirm the effectiveness of the proposed model. Moreover, quantitative results show that the proposed model outperforms state-of-the-art visual attention models according to CC and SIM. Extensive ablation studies examine the presence of optical flow in the model, the position of optical flow in the spatial network, the convolution layers used to process optical flow, and the computational cost compared to a two-stream model.

  • Clustering-Based Neural Network for Carbon Dioxide Estimation

    Conghui LI  Quanlin ZHONG  Baoyin LI  

     
    LETTER-Intelligent Transportation Systems

      Publicized:
    2022/08/01
      Vol:
    E106-D No:5
      Page(s):
    829-832

    In recent years, applications of deep learning have facilitated the development of green intelligent transportation systems (ITS), and carbon dioxide estimation has become one of the important issues in green ITS. Furthermore, carbon dioxide estimation can be modelled as fuel consumption estimation. Therefore, a clustering-based neural network is proposed to analyze clusters according to fuel consumption behaviors and to obtain the estimated fuel consumption and the estimated carbon dioxide. In experiments, the mean absolute percentage error (MAPE) of the proposed method is only 5.61%, and the proposed method outperforms other methods.
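
    For readers unfamiliar with the metric, MAPE is the mean of the absolute errors expressed as a percentage of the true values. The sketch below uses the standard definition; the function name and the sample fuel-consumption numbers are made up for illustration and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same length and no actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical fuel-consumption readings and estimates (arbitrary units).
measured = [100.0, 200.0]
estimated = [110.0, 180.0]
error_percent = mape(measured, estimated)
```

    Because each error is divided by the true value, MAPE is scale-free, but it is undefined when a true value is zero.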

  • Maximizing External Action with Information Provision Over Multiple Rounds in Online Social Networks

    Masaaki MIYASHITA  Norihiko SHINOMIYA  Daisuke KASAMATSU  Genya ISHIGAKI  

     
    PAPER

      Publicized:
    2023/02/03
      Vol:
    E106-D No:5
      Page(s):
    847-855

    Online social networks (OSNs) have increased their impact on the real world, which motivates information senders to control the propagation of information to promote particular actions by online users. However, existing works on information provisioning seem to oversimplify the users' decision-making process, which involves information reception, internal actions of social networks, and external actions of social networks. In particular, characterizing the best practices of information provisioning that promote users' external actions is a complex task due to the complexity of the propagation process in OSNs, even when the variation of information is limited. Therefore, we propose a new information diffusion model that distinguishes user behaviors inside and outside of OSNs, and formulate an optimization problem to maximize the number of users who take the external actions by providing information over multiple rounds. We also define a robust provisioning policy for the problem, which selects a message sequence to maximize the expected number of desired users under the probabilistic uncertainty of OSN settings. Our experimental results suggest that there could exist an information provisioning policy that achieves nearly optimal solutions in different types of OSNs. Furthermore, we empirically demonstrate that the proposed robust policy can be such a universally optimal solution.

  • MicroState: An Anomaly Localization Method in Heterogeneous Microservice Systems

    Jingjing YANG  Yuchun GUO  Yishuai CHEN  

     
    PAPER

      Publicized:
    2023/01/13
      Vol:
    E106-D No:5
      Page(s):
    904-912

    Microservice architecture has been widely adopted for large-scale applications because of its scalability, flexibility, and reliability. However, it also poses new challenges in diagnosing the root causes of performance degradation. Existing methods rely on labeled data and suffer from a high computational burden. This paper proposes MicroState, an unsupervised and lightweight method to pinpoint the root cause with detailed descriptions. We decompose root cause diagnosis into element location and detailed reason identification. To mitigate the impact of element heterogeneity and dynamic invocations, MicroState generates elements' invoked states, quantifies elements' abnormality by warping-based state comparison, and infers the anomalous group. MicroState locates the root cause element by considering anomaly frequency and persistency. To locate the anomalous metric among diverse metrics, MicroState extracts metrics' trend features and evaluates metrics' abnormality based on their trend feature variation, which reduces the reliance on anomaly detectors. Our experimental evaluation based on public data of the Artificial Intelligence for IT Operations Challenge (AIOps Challenge 2020) shows that MicroState locates root cause elements with 87% precision and diagnoses anomaly reasons accurately.

  • Wide-Area and Long-Term Agricultural Sensing System Utilizing UAV and Wireless Technologies

    Hiroshi YAMAMOTO  Shota NISHIURA  Yoshihiro HIGASHIURA  

     
    INVITED PAPER

      Publicized:
    2023/02/08
      Vol:
    E106-D No:5
      Page(s):
    914-926

    To improve crop production and the efficiency of farming operations, IoT (Internet of Things) systems for remote monitoring have been attracting a lot of attention. Existing studies have proposed agricultural sensing systems in which environmental information is collected from many sensor nodes installed in farmland through wireless communications (e.g., Wi-Fi, ZigBee). In particular, Low-Power Wide-Area (LPWA) has drawn attention as a candidate wireless technology that can support vast farmland over a long time. However, long-distance communication is difficult to achieve even with LPWA because a clear line of sight is hard to maintain among the many obstacles in farmland, such as crops and agricultural machinery. In addition, a sensor node cannot run permanently on batteries because battery capacity is finite. On the other hand, Unmanned Aerial Vehicles (UAVs), which can move freely and stably in the sky, have been leveraged for agricultural sensor network systems. By using a UAV as the gateway of the sensor network, the gateway can move to an appropriate location to ensure a clear line of sight from the sensor nodes. In addition, the coverage area of the sensor network can be expanded as the UAV travels over a wide area, even when short-range and ultra-low-power wireless communication (e.g., Bluetooth Low Energy (BLE)) is adopted. Furthermore, various wireless technologies (e.g., wireless power transfer, wireless positioning) that may improve the coverage area and the lifetime of the sensor network have become available. Therefore, in this study, we propose and develop two new kinds of agricultural sensing systems utilizing a UAV and various wireless technologies. The objective of the proposed system is to provide a solution for wide-area and long-term sensing of vast farmland. Depending on which problem has priority, the proposed system chooses one of two designs.
    The first design aims at wide-area sensing and is therefore based on LPWA for wireless communication. In this design, to collect environmental information efficiently, the UAV autonomously travels in search of locations that maintain good LPWA communication properties to the sensor nodes dispersed over a wide area of farmland. The second design aims at long-term sensing and is therefore based on BLE, a typical short-range and ultra-low-power wireless communication technology. In this design, the UAV autonomously flies to the location of the sensor nodes and supplies power to them using a wireless power transfer technology, achieving battery-less sensor nodes. Experimental evaluations using a prototype system confirm that the combination of the UAV and various wireless technologies has the potential to achieve a wide-area and long-term sensing system for monitoring vast farmland.

  • Parallelization on a Minimal Substring Search Algorithm for Regular Expressions

    Yosuke OBE  Hiroaki YAMAMOTO  Hiroshi FUJIWARA  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2023/02/08
      Vol:
    E106-D No:5
      Page(s):
    952-958

    Consider a regular expression r of length m and a text string T of length n over an alphabet Σ. The RE minimal substring search problem is to find all minimal substrings of T matching r. Yamamoto proposed an O(mn)-time and O(m)-space algorithm using a Thompson automaton. In this paper, we improve Yamamoto's algorithm by introducing parallelism. The proposed algorithm runs in O(mn) time in the worst case and in O(mn/p) time in the best case, where p denotes the number of processors. In addition, we present a parameter related to the parallel time of the proposed algorithm. We evaluate the algorithm experimentally.
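
    To make the problem statement concrete, here is a brute-force reference sketch. It is not Yamamoto's Thompson-automaton algorithm nor the parallel version proposed in the paper; it relies on Python's backtracking `re` engine and is far slower than O(mn). It also assumes that "minimal" means a non-empty matching substring that contains no other matching substring, which the abstract does not spell out.

```python
import re


def minimal_substrings(pattern, text):
    """Return (start, end) spans of all minimal substrings of text matching pattern.

    A span is half-open: text[start:end]. Brute force, for illustration only.
    """
    rx = re.compile(pattern)
    n = len(text)

    # A minimal match must be the shortest match from its start position,
    # so collect at most one candidate per start.
    candidates = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            if rx.fullmatch(text, i, j):
                candidates.append((i, j))
                break

    # Drop any candidate that contains a different candidate.
    def contains_other(span):
        return any(
            other != span and span[0] <= other[0] and other[1] <= span[1]
            for other in candidates
        )

    return [span for span in candidates if not contains_other(span)]


spans = minimal_substrings(r"a.*b", "aab")
```

    In the usage line, the match `aab` at span (0, 3) is discarded because it contains the shorter match `ab` at span (1, 3).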

  • High-Precision Mobile Robot Localization Using the Integration of RAR and AKF

    Chen WANG  Hong TAN  

     
    PAPER-Information Network

      Publicized:
    2023/01/24
      Vol:
    E106-D No:5
      Page(s):
    1001-1009

    High-precision indoor positioning technology has gradually become a research hotspot for indoor mobile robots. Relax and Recover (RAR) is an indoor positioning algorithm that uses distance observations. The algorithm restores the robot's trajectory through curve fitting and does not require time synchronization of observations, and positioning can succeed with few observations. However, the algorithm has poor resistance to gross errors and cannot be used for real-time positioning. In this paper, while retaining the advantages of the original algorithm, we improve RAR with an adaptive Kalman filter (AKF) based on the innovation sequence to strengthen its resistance to gross errors. The improved algorithm can be used for real-time navigation and positioning. Experimental validation shows that the improved algorithm is significantly more accurate than the original RAR. Compared with the extended Kalman filter (EKF), accuracy also increases by 12.5%, so the improved algorithm can be used for high-precision positioning of indoor mobile robots.

  • Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

    Tianbin WANG  Ruiyang HUANG  Nan HU  Huansha WANG  Guanghan CHU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1010-1017

    Chinese Named Entity Recognition (NER) is a fundamental technology in Chinese natural language processing. It is widely used in information extraction, intelligent question answering, and knowledge graphs. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which hurts their performance. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. In a novel step, we integrate character-granularity semantic information into the vector space of characters and obtain vector representations containing semantic information through the attention mechanism. In addition, we determine the appropriate number of semantic layers through a comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume, and MSRA show that the model outperforms character-based LSTM baselines.

  • Subjective Difficulty Estimation of Educational Comics Using Gaze Features

    Kenya SAKAMOTO  Shizuka SHIRAI  Noriko TAKEMURA  Jason ORLOSKY  Hiroyuki NAGATAKI  Mayumi UEDA  Yuki URANISHI  Haruo TAKEMURA  

     
    PAPER-Educational Technology

      Publicized:
    2023/02/03
      Vol:
    E106-D No:5
      Page(s):
    1038-1048

    This study explores significant eye-gaze features that can be used to estimate subjective difficulty while reading educational comics. Educational comics have grown rapidly as a promising way to teach difficult topics using illustrations and texts. However, comics include a variety of information on one page, so automatically detecting learners' states such as subjective difficulty is difficult with approaches such as system log-based detection, which is common in the Learning Analytics field. In order to solve this problem, this study focused on 28 eye-gaze features, including the proposal of three new features called “Variance in Gaze Convergence,” “Movement between Panels,” and “Movement between Tiles” to estimate two degrees of subjective difficulty. We then ran an experiment in a simulated environment using Virtual Reality (VR) to accurately collect gaze information. We extracted features in two unit levels, page- and panel-units, and evaluated the accuracy with each pattern in user-dependent and user-independent settings, respectively. Our proposed features achieved an average F1 classification-score of 0.721 and 0.742 in user-dependent and user-independent models at panel unit levels, respectively, trained by a Support Vector Machine (SVM).

  • New Training Method for Non-Dominant Hand Pitching Motion Based on Reversal Trajectory of Dominant Hand Pitching Motion Using AR and Vibration

    Masato SOGA  Taiki MORI  

     
    PAPER-Educational Technology

      Publicized:
    2023/02/08
      Vol:
    E106-D No:5
      Page(s):
    1049-1058

    In this paper, we propose a new method for non-dominant limb training. When training the non-dominant limb, the learner aims to reproduce a target motion generated by reversing his or her own dominant-limb motion. In addition, we designed and developed an interface for the new method that lets the learner select the feedback type: one option uses AR and sound, and the other uses AR and vibration. We found that vibration feedback was effective for non-dominant hand training of pitching motion, while sound feedback was less effective than vibration.

  • Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network

    Wenkai LIU  Cuizhu QIN  Menglong WU  Wenle BAI  Hongxia DONG  

     
    LETTER-Human-computer Interaction

      Publicized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1081-1084

    Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.

  • Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions

    Yue XIE  Ruiyu LIANG  Zhenlin LIANG  Xiaoyan ZHAO  Wenhao ZENG  

     
    LETTER-Speech and Hearing

      Publicized:
    2023/02/21
      Vol:
    E106-D No:5
      Page(s):
    1098-1101

    To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify the important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the time accumulation characteristic of LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, modeled on the LSTM state update, to construct multi-head attention. This means that a nonlinear transformation replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
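
    The contrast between the two scoring functions can be sketched with the textbook forms below. The additive score shown is the generic tanh-based form, not the letter's LSTM-inspired variant, whose details the abstract does not give; the weight matrices `Wq`, `Wk` and the vector `v` are hypothetical parameters introduced only for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_score(q, k):
    """Score used in classical multi-head attention: q.k / sqrt(d)."""
    return dot(q, k) / math.sqrt(len(q))


def additive_score(q, k, Wq, Wk, v):
    """Generic additive score: v . tanh(Wq q + Wk k), nonlinear in q and k."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(Wq, q), matvec(Wk, k))]
    return dot(v, hidden)


def attend(scores, values):
    """Weighted sum of value vectors using softmax-normalised scores."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * val[d] for w, val in zip(weights, values)) for d in range(dim)]


identity = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 3.0], [3.0, 5.0]]

dot_scores = [scaled_dot_score(query, k) for k in keys]
add_scores = [additive_score(query, k, identity, identity, [1.0, 1.0]) for k in keys]
context = attend(add_scores, values)
```

    Only the scoring step differs between the two variants; the softmax and weighted sum in `attend` are shared.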

  • Convolution Block Feature Addition Module (CBFAM) for Lightweight and Fast Object Detection on Non-GPU Devices

    Min Ho KWAK  Youngwoo KIM  Kangin LEE  Jae Young CHOI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2023/01/24
      Vol:
    E106-D No:5
      Page(s):
    1106-1110

    This letter proposes a novel lightweight deep learning object detector named LW-YOLOv4-tiny, which incorporates the convolution block feature addition module (CBFAM). The novelty of LW-YOLOv4-tiny is the use of channel-wise convolution and element-wise addition in the CBFAM instead of utilizing the concatenation of different feature maps. The model size and computation requirement are reduced by up to 16.9 Mbytes, 5.4 billion FLOPs (BFLOPS), and 11.3 FPS, which is 31.9%, 22.8%, and 30% smaller and faster than the most recent version of YOLOv4-tiny. From the MSCOCO2017 and PASCAL VOC2012 benchmarks, LW-YOLOv4-tiny achieved 40.2% and 69.3% mAP, respectively.

  • Fish Detecting Using YOLOv4 and CVAE in Aquaculture Ponds with a Non-Uniform Strong Reflection Background

    Meng ZHAO  Junfeng WU  Hong YU  Haiqing LI  Jingwen XU  Siqi CHENG  Lishuai GU  Juan MENG  

     
    PAPER-Smart Agriculture

      Publicized:
    2022/11/07
      Vol:
    E106-D No:5
      Page(s):
    715-725

    Accurate fish detection is of great significance in aquaculture. However, non-uniform strong reflection in aquaculture ponds degrades the precision of fish detection. This paper combines YOLOv4 and CVAE to accurately detect fish in images with non-uniform strong reflection: the reflection is first removed, and the reflection-removed image is then passed on for fish detection. First, an improved YOLOv4 is applied to detect and mask the strongly reflective region, locating and labeling it for the subsequent reflection removal. Then, CVAE is combined with the improved YOLOv4 to infer the prior distribution of the reflection region and restore the region from that distribution, so that the reflection can be removed. To further improve the quality of the reflection-removed images, adversarial learning is added to CVAE. Finally, YOLOv4 is used to detect fish in the high-quality image. In addition, a new image dataset of pond-cultured Takifugu rubripes is constructed, which includes 1000 images with manually annotated fish; a synthetic dataset of 2000 images with strong reflection is also created and merged with the generated dataset for training and for verifying the robustness of the proposed method. Comprehensive experiments compare the proposed method with state-of-the-art fish detection methods without reflection removal on the generated dataset. The results show that the fish detection precision and recall of the proposed method improve by 2.7% and 2.4%, respectively.

  • Speech Enhancement for Laser Doppler Vibrometer Dealing with Unknown Irradiated Objects

    Chengkai CAI  Kenta IWAI  Takanobu NISHIURA  

     
    PAPER-Digital Signal Processing

      Publicized:
    2022/09/30
      Vol:
    E106-A No:4
      Page(s):
    647-656

    The acquisition of distant sound has long been a hot research topic. Since sound is caused by vibration, one of the best methods for measuring distant sound is to use a laser Doppler vibrometer (LDV). The laser has high directivity, which enables it to acquire sound from far away and makes it of great practical use for disaster relief and other situations. However, due to the vibration characteristics of the irradiated object itself and the reflectivity of its surface (or other reasons), the acquired sound often lacks frequency components in certain frequency bands and is mixed with obvious noise. Therefore, when using an LDV to acquire distant speech, the acquired speech signal must be enhanced in some way before the actual content of the speech can be recognized. Conventional speech enhancement methods are not generally applicable because of the various types of degradation in the observed speech. Moreover, while several speech enhancement methods for LDV have been proposed, they are only effective when the irradiated object is known. In this paper, we present a speech enhancement method for LDV that can deal with unknown irradiated objects. The proposed method is composed of noise reduction, pitch detection, power spectrum envelope estimation, power spectrum reconstruction, and phase estimation. Experimental results demonstrate the effectiveness of our method for enhancing speech acquired from unknown irradiated objects.

  • Joint Selection of Transceiver Nodes in Distributed MIMO Radar Network with Non-Orthogonal Waveforms

    Yanxi LU  Shuangli LIU  

     
    LETTER-Communication Theory and Signals

      Publicized:
    2022/10/18
      Vol:
    E106-A No:4
      Page(s):
    692-695

    In this letter, we consider the problem of jointly selecting transmitters and receivers in a distributed multi-input multi-output radar network for localization. Unlike previous works, we consider a more mathematically challenging but more general situation in which the transmitted signals are not perfectly orthogonal. Taking the Cramér-Rao lower bound as the performance metric, we propose a scheme for joint selection of transmitters and receivers (JSTR) that aims to optimize localization performance under a limited number of nodes. We propose a bi-convex relaxation to replace the resultant NP-hard non-convex problem. Using the bi-convexity, the surrogate problem can be solved efficiently by the nonlinear alternating direction method of multipliers. Simulation results reveal that the proposed algorithm performs very close to the computationally intensive but globally optimal exhaustive search method.
