Keyword Search Result

[Keyword] CA(12529hit)

601-620hit(12529hit)

  • DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

    Satoshi MIZOGUCHI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/07/30
      Vol:
    E104-D No:11
      Page(s):
    1971-1980

    We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
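The standardized moments mentioned in the abstract above can be illustrated with a minimal sketch. The function names and the squared-difference penalty are assumptions made for illustration only; the abstract does not give the exact form of the penalty term used in DNN training.

```python
def standardized_moment(samples, order):
    """Return the standardized moment of the given order.

    order=4 gives kurtosis; order=6 gives the sixth standardized moment.
    Population (biased) estimates are used for simplicity.
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    central = sum((x - mean) ** order for x in samples) / n
    return central / variance ** (order / 2)


def moment_matching_penalty(enhanced, reference, order=4):
    """Hypothetical penalty: squared gap between two standardized moments.

    This only illustrates the idea of "matching" a moment between an
    enhanced signal and a reference; it is not the paper's loss function.
    """
    gap = standardized_moment(enhanced, order) - standardized_moment(reference, order)
    return gap ** 2


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(standardized_moment(signal, 4))
    print(standardized_moment(signal, 6))
    print(moment_matching_penalty(signal, signal))
```

In this sketch, setting `order=6` switches from kurtosis matching to sixth-moment matching without changing anything else.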

  • Detecting Depression from Speech through an Attentive LSTM Network

    Yan ZHAO  Yue XIE  Ruiyu LIANG  Li ZHANG  Li ZHAO  Chengyu LIU  

     
    LETTER-Speech and Hearing

      Publicized:
    2021/08/24
      Vol:
    E104-D No:11
      Page(s):
    2019-2023

    Depression is a mental disorder that endangers people's health and affects the social order. As an efficient means of diagnosing depression, automatic depression detection has attracted considerable interest from researchers. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection that makes full use of the difference between depression and non-depression across time frames. The proposed model uses frame-level features, which capture the temporal information of depressive speech, to replace traditional statistical features as the input of the LSTM layers. To obtain richer multi-dimensional deep feature representations, the LSTM output is then passed to attention layers on both the time and feature dimensions. Then, we concatenate the outputs of the attention layers and feed the fused feature representation into the fully connected layer. Finally, the fully connected layer's output is passed to a softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy of 90.2% and outperforms the traditional LSTM network and the LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.

  • A Hybrid Retinex-Based Algorithm for UAV-Taken Image Enhancement

    Xinran LIU  Zhongju WANG  Long WANG  Chao HUANG  Xiong LUO  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2021/08/05
      Vol:
    E104-D No:11
      Page(s):
    2024-2027

    A hybrid Retinex-based image enhancement algorithm is proposed to improve the quality of images captured by unmanned aerial vehicles (UAVs) in this paper. Hyperparameters of the employed multi-scale Retinex with chromaticity preservation (MSRCP) model are automatically tuned via a two-phase evolutionary computing algorithm. In the two-phase optimization algorithm, the Rao-2 algorithm is applied to performing the global search and a solution is obtained by maximizing the objective function. Next, the Nelder-Mead simplex method is used to improve the solution via local search. Real UAV-taken images of bad quality are collected to verify the performance of the proposed algorithm. Meanwhile, four famous image enhancement algorithms, Multi-Scale Retinex, Multi-Scale Retinex with Color Restoration, Automated Multi-Scale Retinex, and MSRCP are utilized as benchmarking methods. Meanwhile, two commonly used evolutionary computing algorithms, particle swarm optimization and flower pollination algorithm, are considered to verify the efficiency of the proposed method in tuning parameters of the MSRCP model. Experimental results demonstrate that the proposed method achieves the best performance compared with benchmarks and thus the proposed method is applicable for real UAV-based applications.

  • Neural Network Calculations at the Speed of Light Using Optical Vector-Matrix Multiplication and Optoelectronic Activation

    Naoki HATTORI  Jun SHIOMI  Yutaka MASUDA  Tohru ISHIHARA  Akihiko SHINYA  Masaya NOTOMI  

     
    PAPER

      Publicized:
    2021/05/17
      Vol:
    E104-A No:11
      Page(s):
    1477-1487

    With the rapid progress of the integrated nanophotonics technology, the optical neural network architecture has been widely investigated. Since the optical neural network can complete the inference processing just by propagating the optical signal in the network, it is expected more than one order of magnitude faster than the electronics-only implementation of artificial neural networks (ANN). In this paper, we first propose an optical vector-matrix multiplication (VMM) circuit using wavelength division multiplexing, which enables inference processing at the speed of light with ultra-wideband. This paper next proposes optoelectronic circuit implementation for batch normalization and activation function, which significantly improves the accuracy of the inference processing without sacrificing the speed performance. Finally, using a virtual environment for machine learning and an optoelectronic circuit simulator, we demonstrate the ultra-fast and accurate operation of the optical-electronic ANN circuit.

  • An Analysis of Local BTI Variation with Ring-Oscillator in Advanced Processes and Its Impact on Logic Circuit and SRAM

    Mitsuhiko IGARASHI  Yuuki UCHIDA  Yoshio TAKAZAWA  Makoto YABUUCHI  Yasumasa TSUKAMOTO  Koji SHIBUTANI  Kazutoshi KOBAYASHI  

     
    PAPER

      Publicized:
    2021/05/25
      Vol:
    E104-A No:11
      Page(s):
    1536-1545

    In this paper, we present an analysis of local variability of bias temperature instability (BTI) by measuring Ring-Oscillators (RO) on various processes and its impact on logic circuit and SRAM. The evaluation results based on measuring ROs of a test elementary group (TEG) fabricated in 7nm Fin Field Effect Transistor (FinFET) process, 16/14nm generation FinFET processes and a 28nm planar process show that the standard deviations of Negative BTI (NBTI) Vth degradation (σ(ΔVthp)) are proportional to the square root of the mean value (µ(ΔVthp)) at any stress time, Vth flavors and various recovery conditions. While the amount of local BTI variation depends on the gate length, width and number of fins, the amount of local BTI variation at the 7nm FinFET process is slightly larger than other processes. Based on these measurement results, we present an analysis result of its impact on logic circuit considering measured Vth dependency on global NBTI in the 7nm FinFET process. We also analyse its impact on SRAM minimum operation voltage (Vmin) of static noise margin (SNM) based on sensitivity analysis and show non-negligible Vmin degradation caused by local NBTI.

  • A Synthesis Method Based on Multi-Stage Optimization for Power-Efficient Integrated Optical Logic Circuits

    Ryosuke MATSUO  Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  Akihiko SHINYA  Masaya NOTOMI  

     
    PAPER

      Publicized:
    2021/05/18
      Vol:
    E104-A No:11
      Page(s):
    1546-1554

    Optical logic circuits based on integrated nanophotonics attract significant interest due to their ultra-high-speed operation. However, the power dissipation of conventional optical logic circuits is exponential to the number of inputs of target logic functions. This paper proposes a synthesis method reducing power dissipation to a polynomial order of the number of inputs while exploiting the high-speed nature. Our method divides the target logic function into multiple sub-functions with Optical-to-Electrical (OE) converters. Each sub-function has a smaller number of inputs than that of the original function, which enables an exponential reduction of the power dissipated by an optical logic circuit representing the sub-function. The proposed synthesis method can mitigate the OE converter delay overhead by parallelizing sub-functions. We apply the proposed synthesis method to the ISCAS'85 benchmark circuits. The power consumption of the conventional circuits based on the Binary Decision Diagram (BDD) is at least three orders of magnitude larger than that of the optical logic circuits synthesized by the proposed method. The proposed method reduces the power consumption to about 100mW. The delay of almost all the circuits synthesized by the proposed method is kept less than four times the delay of the conventional BDD-based circuit.

  • An Anomalous Behavior Detection Method Utilizing Extracted Application-Specific Power Behaviors

    Kazunari TAKASAKI  Ryoichi KIDA  Nozomu TOGAWA  

     
    PAPER

      Publicized:
    2021/07/08
      Vol:
    E104-A No:11
      Page(s):
    1555-1565

    With the widespread use of Internet of Things (IoT) devices in recent years, we utilize a variety of hardware devices in our daily life. On the other hand, hardware security issues are emerging. Power analysis is one of the methods to detect anomalous behaviors, but it is hard to apply it to IoT devices where an operating system and various software programs are running. In this paper, we propose an anomalous behavior detection method for an IoT device by extracting application-specific power behaviors. First, we measure power consumption of an IoT device, and obtain the power waveform. Next, we extract an application-specific power waveform by eliminating a steady factor from the obtained power waveform. Finally, we extract feature values from the application-specific power waveform and detect an anomalous behavior by utilizing the local outlier factor (LOF) method. We conduct two experiments to show how our proposed method works: one runs three application programs and an anomalous application program randomly and the other runs three application programs in series and an anomalous application program very rarely. Application programs on both experiments are implemented on a single board computer. The experimental results demonstrate that the proposed method successfully detects anomalous behaviors by extracting application-specific power behaviors, while the existing approaches cannot.
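The final detection step in the abstract above relies on the local outlier factor (LOF). The following is a minimal pure-Python sketch of LOF scoring, not the authors' implementation: the two-dimensional feature vectors are made up for illustration, the name `lof_scores` is hypothetical, and ties at the k-th neighbour distance are ignored for brevity (duplicate points are also not handled).

```python
import math


def lof_scores(points, k):
    """Compute a simplified local outlier factor for each point.

    Scores near 1 indicate a density similar to the neighbours;
    scores well above 1 indicate a likely outlier.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours = []  # indices of the k nearest neighbours of each point
    k_dist = []      # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    def reach(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach(i, j) for j in neighbours[i]) for i in range(n)]

    # LOF: average neighbour density divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical feature values: four "normal" runs and one anomalous run.
    features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    for point, score in zip(features, lof_scores(features, k=2)):
        print(point, round(score, 2))
```

In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would normally be preferred; the sketch only shows why an isolated feature vector receives a score far above 1.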

  • Supply and Threshold Voltage Scaling for Minimum Energy Operation over a Wide Operating Performance Region

    Shoya SONODA  Jun SHIOMI  Hidetoshi ONODERA  

     
    PAPER

      Publicized:
    2021/05/14
      Vol:
    E104-A No:11
      Page(s):
    1566-1576

    A method for runtime energy optimization based on the supply voltage (Vdd) and the threshold voltage (Vth) scaling is proposed. This paper refers to the optimal voltage pair, which minimizes the energy consumption of LSI circuits under a target delay constraint, as a Minimum Energy Point (MEP). The MEP dynamically fluctuates depending on the operating conditions determined by a target delay constraint, an activity factor and a chip temperature. In order to track the MEP, this paper proposes a closed-form continuous function that determines the MEP over a wide operating performance region ranging from the above-threshold region down to the sub-threshold region. Based on the MEP determination formula, an MEP tracking algorithm is also proposed. The MEP tracking algorithm estimates the MEP even though the operating conditions widely change. Measurement results based on a 32-bit RISC processor fabricated in a 65-nm Silicon On Thin Buried oxide (SOTB) process technology show that the proposed method estimates the MEP within a 5% energy loss in comparison with the actual MEP operation.

  • A Modulus Factorization Algorithm for Self-Orthogonal and Self-Dual Quasi-Cyclic Codes via Polynomial Matrices Open Access

    Hajime MATSUI  

     
    LETTER-Coding Theory

      Publicized:
    2021/05/21
      Vol:
    E104-A No:11
      Page(s):
    1649-1653

    This study presents a construction method for self-orthogonal and self-dual quasi-cyclic codes that relies on factorization of modulus polynomials for cyclicity. The smaller-size generator polynomial matrices are used instead of the generator matrices as linear codes. An algorithm based on the Chinese remainder theorem finds the generator polynomial matrix on the original modulus from the ones constructed on each factor. This method enables us to efficiently construct and search these codes when factoring modulus polynomials into reciprocal polynomials.

  • Leakage-Resilient and Proactive Authenticated Key Exchange (LRP-AKE), Reconsidered

    SeongHan SHIN  

     
    PAPER

      Publicized:
    2021/08/05
      Vol:
    E104-D No:11
      Page(s):
    1880-1893

    In [31], Shin et al. proposed a Leakage-Resilient and Proactive Authenticated Key Exchange (LRP-AKE) protocol for credential services which provides not only a higher level of security against leakage of stored secrets but also secrecy of private key with respect to the involving server. In this paper, we discuss a problem in the security proof of the LRP-AKE protocol, and then propose a modified LRP-AKE protocol that has a simple and effective measure to the problem. Also, we formally prove its AKE security and mutual authentication for the entire modified LRP-AKE protocol. In addition, we describe several extensions of the (modified) LRP-AKE protocol including 1) synchronization issue between the client and server's stored secrets; 2) randomized ID for the provision of client's privacy; and 3) a solution to preventing server compromise-impersonation attacks. Finally, we evaluate the performance overhead of the LRP-AKE protocol and show its test vectors. From the performance evaluation, we can confirm that the LRP-AKE protocol has almost the same efficiency as the (plain) Diffie-Hellman protocol that does not provide authentication at all.

  • Metric-Combining Multiuser Detection Using Replica Cancellation with RTS and Enhanced CTS for High-Reliable and Low-Latency Wireless Communications

    Hideya SO  Kazuhiko FUKAWA  Hayato SOYA  Yuyuan CHANG  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2021/06/01
      Vol:
    E104-B No:11
      Page(s):
    1441-1453

    In unlicensed spectrum, wireless communications employing carrier sense multiple access with collision avoidance (CSMA/CA) suffer from longer transmission delay time as the number of user terminals (UTs) increases, because packet collisions are more likely to occur. To cope with this problem, this paper proposes a new multiuser detection (MUD) scheme that uses both request-to-send (RTS) and enhanced clear-to-send (eCTS) for high-reliable and low-latency wireless communications. As in conventional MUD scheme, the metric-combining MUD (MC-MUD) calculates log likelihood functions called metrics and accumulates the metrics for the maximum likelihood detection (MLD). To avoid increasing the number of states for MLD, MC-MUD forces the relevant UTs to retransmit their packets until all the collided packets are correctly detected, which requires a kind of central control and reduces the system throughput. To overcome these drawbacks, the proposed scheme, which is referred to as cancelling MC-MUD (CMC-MUD), deletes replicas of some of the collided packets from the received signals, once the packets are correctly detected during the retransmission. This cancellation enables new UTs to transmit their packets and then performs MLD without increasing the number of states, which improves the system throughput without increasing the complexity. In addition, the proposed scheme adopts RTS and eCTS. One UT that suffers from packet collision transmits RTS before the retransmission. Then, the corresponding access point (AP) transmits eCTS including addresses of the other UTs, which have experienced the same packet collision. To reproduce the same packet collision, these other UTs transmit their packets once they receive the eCTS. 
    Computer simulations under single-AP conditions evaluate an average carrier-to-interference ratio (CIR) range in which the proposed scheme is effective, and clarify that the transmission delay time of the proposed scheme is shorter than that of the conventional schemes. In two-AP environments that can cause the hidden terminal problem, it is demonstrated that the proposed scheme achieves shorter transmission delay times than the conventional scheme with RTS and conventional CTS.

  • Improving the Recognition Accuracy of a Sound Communication System Designed with a Neural Network

    Kosei OZEKI  Naofumi AOKI  Saki ANAZAWA  Yoshinori DOBASHI  Kenichi IKEDA  Hiroshi YASUDA  

     
    PAPER-Engineering Acoustics

      Publicized:
    2021/05/06
      Vol:
    E104-A No:11
      Page(s):
    1577-1584

    This study has developed a system that performs data communications using high frequency bands of sound signals. Unlike radio communication systems using advanced wireless devices, it only requires the legacy devices such as microphones and speakers employed in ordinary telephony communication systems. In this study, we have investigated the possibility of a machine learning approach to improve the recognition accuracy identifying binary symbols exchanged through sound media. This paper describes some experimental results evaluating the performance of our proposed technique employing a neural network as its classifier of binary symbols. The experimental results indicate that the proposed technique may have a certain appropriateness for designing an optimal classifier for the symbol identification task.

  • Specificity Analysis for Nonlinear Distorted Radiation Using 4.65GHz Band Massive Element Active Antenna System for 5G and Influence on Spatial Multiplexing Performance Open Access

    Takuji MOCHIZUKI  

     
    INVITED PAPER

      Publicized:
    2021/04/08
      Vol:
    E104-C No:10
      Page(s):
    543-551

    This paper reports the evaluation and simulated results of the nonlinear characteristics of the 4.65GHz Active Antenna System (AAS) for 5G mobile communication systems. The antenna element is composed of a ±45° dual polarization shared patch antenna, and the system is equipped with a total of 64 elements in a horizontal 8 × vertical 4 × 2 polarization configuration. A 32-element transceiver circuit was mounted on the back side of the antenna printed circuit board. With the above circuit configuration, a full digital beamforming method has been adopted that can realize high frequency utilization efficiency by using the Sub6GHz-band massive element AAS, and excellent spatial multiplexing performance by Massive MIMO has been pursued. However, it was found that the Downlink (DL) SINR (Signal to Interference and Noise Ratio) to each terminal deteriorated because of the nonlinear distorted radiation as the transmission output power was increased in the maximum rated direction. Therefore, it has been confirmed that the spatial multiplexing performance in the high output power region is significantly improved by installing DPD. In order to clarify the effect of nonlinear distorted radiation on spatial multiplexing performance, the radiation patterns were measured using OFDM signal (subcarrier spacing 60kHz × 1500 subcarriers in 90MHz bandwidth) in an anechoic chamber. Simulated analysis of the effect of nonlinear distortion on the null characteristic shows that the accuracy of nulls generated in each user terminal direction does not depend on the degree of nonlinearity, but is affected by the residual amplitude and phase variation among all transmitters and receivers after calibration (CAL). Therefore, it was clarified that the double compensation configuration of DPD and high-precision CAL is effective for achieving excellent Massive MIMO performance. This paper is based on the IEICE Japanese Transactions on Communications (Vol.J102-B, No.11, pp.816-824, Nov. 2019).

  • Robust and Efficient Homography Estimation Using Directional Feature Matching of Court Points for Soccer Field Registration

    Kazuki KASAI  Kaoru KAWAKITA  Akira KUBOTA  Hiroki TSURUSAKI  Ryosuke WATANABE  Masaru SUGANO  

     
    PAPER

      Publicized:
    2021/07/08
      Vol:
    E104-D No:10
      Page(s):
    1563-1571

    In this paper, we present an efficient and robust method for estimating Homography matrix for soccer field registration between a captured camera image and a soccer field model. The presented method first detects reliable field lines from the camera image through clustering. Constructing a novel directional feature of the intersection points of the lines in both the camera image and the model, the presented method then finds matching pairs of these points between the image and the model. Finally, Homography matrix estimations and validations are performed using the obtained matching pairs, which can reduce the required number of Homography matrix calculations. Our presented method uses possible intersection points outside image for the point matching. This effectively improves robustness and accuracy of Homography estimation as demonstrated in experimental results.

  • Overloaded Wireless MIMO Switching for Information Exchanging through Untrusted Relay in Secure Wireless Communication

    Arata TAKAHASHI  Osamu TAKYU  Hiroshi FUJIWARA  Takeo FUJII  Tomoaki OHTSUKI  

     
    PAPER

      Publicized:
    2021/03/30
      Vol:
    E104-B No:10
      Page(s):
    1249-1259

    Information exchange through a relay node is attracting attention for applying machine-to-machine communications. If the node demodulates the received signal in relay processing confidentially, the information leakage through the relay station is a problem. In wireless MIMO switching, the frequency spectrum usage efficiency can be improved owing to the completion of information exchange within a short time. This study proposes a novel wireless MIMO switching method for secure information exchange. An overloaded situation, in which the access nodes are one larger than the number of antennas in the relay node, makes the demodulation of the relay node difficult. The access schedule of nodes is required for maintaining the overload situation and the high information exchange efficiency. This study derives the equation model of the access schedule and constructs an access schedule with fewer time periods in the integer programming problem. From the computer simulation, we confirm that the secure capacity of the proposed MIMO switching is larger than that of the original one, and the constructed access schedule is as large as the ideal and minimum time period for information exchange completion.

  • An Optimistic Synchronization Based Optimal Server Selection Scheme for Delay Sensitive Communication Services Open Access

    Akio KAWABATA  Bijoy Chand CHATTERJEE  Eiji OKI  

     
    PAPER-Network System

      Publicized:
    2021/04/09
      Vol:
    E104-B No:10
      Page(s):
    1277-1287

    In distributed processing for communication services, a proper server selection scheme is required to reduce delay by ensuring the event occurrence order. Although a conservative synchronization algorithm (CSA) has been used to achieve this goal, an optimistic synchronization algorithm (OSA) can be feasible for synchronizing distributed systems. In comparison with CSA, which reproduces events in occurrence order before processing applications, OSA can be feasible to realize low delay communication as the processing events arrive sequentially. This paper proposes an optimal server selection scheme that uses OSA for distributed processing systems to minimize end-to-end delay under the condition that maximum status holding time is limited. In other words, the end-to-end delay is minimized based on the allowed rollback time, which is given according to the application designing aspects and availability of computing resources. Numerical results indicate that the proposed scheme reduces the delay compared to the conventional scheme.

  • Triplet Attention Network for Video-Based Person Re-Identification

    Rui SUN  Qili LIANG  Zi YANG  Zhenghui ZHAO  Xudong ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2021/07/21
      Vol:
    E104-D No:10
      Page(s):
    1775-1779

    Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts, where the first part introduces a residual attention subnetwork, which contains a channel attention module to capture cross-dimension dependencies by using rotation and transformation and a spatial attention module to focus on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian, and to reduce the weight of the incomplete pedestrian image to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets, iLIDS-VID, PRID2011 and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.

  • Code-Switching ASR and TTS Using Semisupervised Learning with Machine Speech Chain

    Sahoko NAKAYAMA  Andros TJANDRA  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/07/08
      Vol:
    E104-D No:10
      Page(s):
    1661-1677

    The phenomenon where a speaker mixes two or more languages within the same conversation is called code-switching (CS). Handling CS is challenging for automatic speech recognition (ASR) and text-to-speech (TTS) because it requires coping with multilingual input. Although CS text or speech may be found in social media, the datasets of CS speech and corresponding CS transcriptions are hard to obtain even though they are required for supervised training. This work adopts a deep learning-based machine speech chain to train CS ASR and CS TTS with each other with semisupervised learning. After supervised learning with monolingual data, the machine speech chain is then carried out with unsupervised learning of either the CS text or speech. The results show that the machine speech chain trains ASR and TTS together and improves performance without requiring the pair of CS speech and corresponding CS text. We also integrate language embedding and language identification into the CS machine speech chain in order to handle CS better by giving language information. We demonstrate that our proposed approach can improve the performance on both a single CS language pair and multiple CS language pairs, including the unknown CS excluded from training data.

  • Health Indicator Estimation by Video-Based Gait Analysis

    Ruochen LIAO  Kousuke MORIWAKI  Yasushi MAKIHARA  Daigo MURAMATSU  Noriko TAKEMURA  Yasushi YAGI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/07/09
      Vol:
    E104-D No:10
      Page(s):
    1678-1690

    In this study, we propose a method to estimate body composition-related health indicators (e.g., ratio of body fat, body water, and muscle, etc.) using video-based gait analysis. This method is more efficient than individual measurement using a conventional body composition meter. Specifically, we designed a deep-learning framework with a convolutional neural network (CNN), where the input is a gait energy image (GEI) and the output consists of the health indicators. Although a vast amount of training data is typically required to train network parameters, it is unfeasible to collect sufficient ground-truth data, i.e., pairs consisting of the gait video and the health indicators measured using a body composition meter for each subject. We therefore use a two-step approach to exploit an auxiliary gait dataset that contains a large number of subjects but lacks the ground-truth health indicators. At the first step, we pre-train a backbone network using the auxiliary dataset to output gait primitives such as arm swing, stride, the degree of stoop, and the body width — considered to be relevant to the health indicators. At the second step, we add some layers to the backbone network and fine-tune the entire network to output the health indicators even with a limited number of ground-truth data points of the health indicators. Experimental results show that the proposed method outperforms the other methods when training from scratch as well as when using an auto-encoder-based pre-training and fine-tuning approach; it achieves relatively high estimation accuracy for the body composition-related health indicators except for body fat-relevant ones.

  • Sketch Face Recognition via Cascaded Transformation Generation Network

    Lin CAO  Xibao HUO  Yanan GUO  Kangning DU  

     
    PAPER-Image

      Publicized:
    2021/04/01
      Vol:
    E104-A No:10
      Page(s):
    1403-1415

    Sketch face recognition refers to matching photos with sketches, which has effectively been used in various applications ranging from law enforcement agencies to digital entertainment. However, due to the large modality gap between photos and sketches, sketch face recognition remains a challenging task at present. To reduce the domain gap between the sketches and photos, this paper proposes a cascaded transformation generation network for cross-modality image generation and sketch face recognition simultaneously. The proposed cascaded transformation generation network is composed of a generation module, a cascaded feature transformation module, and a classifier module. The generation module aims to generate a high quality cross-modality image, the cascaded feature transformation module extracts high-level semantic features for generation and recognition simultaneously, and the classifier module is used to complete sketch face recognition. The proposed transformation generation network is trained in an end-to-end manner, and it improves the recognition accuracy by using the generated images. The recognition performance is verified on the UoM-SGFSv2, e-PRIP, and CUFSF datasets; experimental results show that the proposed method is better than other state-of-the-art methods.
