IEICE TRANSACTIONS on Information and Systems

  • Impact Factor: 0.72
  • Eigenfactor: 0.002
  • Article Influence: 0.1
  • CiteScore: 1.4

Volume E104-D No.10 (Publication Date: 2021/10/01)

  • Special Section on Formal Approaches
  • FOREWORD Open Access

    Fuyuki ISHIKAWA

    FOREWORD

      Page(s): 1514-1514
  • Formal Modeling and Verification of Concurrent FSMs: Case Study on Event-Based Cooperative Transport Robots

    Yoshinao ISOBE, Nobuhiko MIYAMOTO, Noriaki ANDO, Yutaka OIWA

    PAPER

      Publicized: 2021/07/08
      Page(s): 1515-1532

    In this paper, we demonstrate that a formal approach is effective for improving the reliability of cooperative robot designs whose control logic is expressed as concurrent FSMs (Finite State Machines), particularly in accordance with the standard FSM4RTC (FSM for Robotic Technology Components), through a case study of cooperative transport robots. In the case study, the FSMs are modeled in the formal specification language CSP (Communicating Sequential Processes) and checked with the model-checking tool FDR, and we show techniques for modeling and verifying cooperative robots implemented with the help of RTM (Robotic Technology Middleware).

  • Verification of Group Key Management of IEEE 802.21 Using ProVerif

    Ryoga NOGUCHI, Yoshikazu HANATANI, Kazuki YONEYAMA

    PAPER

      Publicized: 2021/07/14
      Page(s): 1533-1543

    Home Energy Management Systems (HEMS) contain devices from multiple manufacturers, and a large number of device groups must be managed according to several clustering situations. Because a common secret group key must be established among group members, the group key management scheme of IEEE 802.21 is used. However, no security verification of this scheme by formal methods has been reported. In this paper, we give the first formal verification of the secrecy and authenticity of the IEEE 802.21 group key management scheme against insider and outsider attacks, using ProVerif, an automatic verification tool for cryptographic protocols. We find a spoofing attack by an insider and a replay attack by an outsider against the basic scheme, and show that both attacks are prevented when the scheme is used with the digital signature option.

  • Special Section on Picture Coding and Image Media Processing
  • FOREWORD Open Access

    Toshiaki FUJII

    FOREWORD

      Page(s): 1544-1544
  • Image Based Coding of Spatial Probability Distribution on Human Dynamics Data

    Hideaki KIMATA, Xiaojun WU, Ryuichi TANIDA

    PAPER

      Publicized: 2021/06/24
      Page(s): 1545-1554

    The need for real-time use of human dynamics data is increasing. Meeting it requires improved databases for handling large amounts of data as well as highly accurate sensing of people's movements. A bitmap index format has been proposed for high-speed processing of data distributed over a two-dimensional space, and using the same format is expected to enable a service that processes search queries, reads out the desired data, visualizes it, and analyzes it. In this study, we propose a coding format that compresses human dynamics data to a target data size, in order to save storage as real-time human dynamics data continue to accumulate. In the proposed method, the spatial population distribution, expressed as a probability distribution, is approximated and compressed using the one-pixel, one-byte data format normally used for image coding. We use two kinds of approximation, namely the accuracy of probability and the precision of spatial location, to control the data size and the amount of information. For the accuracy of probability, we propose a non-linear mapping method for the spatial distribution; for the precision of spatial location, we propose spatially scalable layered coding that refines the mesh level of the spatial distribution. To enable additional detailed analysis, we also propose another scalable layered coding that improves the accuracy of the distribution. Experiments demonstrate that the proposed data approximation and coding format achieve sufficient approximation of the spatial population distribution under a given target data size.

  • Per-Pixel Water Detection on Surfaces with Unknown Reflectance

    Chao WANG, Michihiko OKUYAMA, Ryo MATSUOKA, Takahiro OKABE

    PAPER

      Publicized: 2021/07/06
      Page(s): 1555-1562

    Water detection is important for machine vision applications such as visual inspection and robot motion planning. In this paper, we propose an approach to per-pixel water detection on surfaces of unknown reflectance using a hyperspectral image. Our method is based on the spectral characteristics of water: water is transparent to visible light but translucent or opaque to near-infrared light, so the apparent near-infrared spectral reflectance of a surface becomes smaller than its original value when water is present on it. Specifically, we approximate the spectral reflectance by a linear combination of a small number of basis vectors and estimate the original near-infrared reflectance from the visible reflectance (which does not depend on the presence or absence of water) to detect water. Experiments using real images show that our method, which estimates the near-infrared spectral reflectance from the visible spectral reflectance, performs better than existing techniques.
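    The core idea (fit basis coefficients on the water-independent visible bands, predict near-infrared, and compare) can be sketched in Python as below. This is an illustrative sketch, not the authors' method: it assumes exactly two basis vectors, solves the least-squares fit via the 2x2 normal equations, and uses a hypothetical fixed ratio threshold as the decision rule; `fit_coeffs` and `is_wet` are made-up names.

```python
def fit_coeffs(basis_vis, observed_vis):
    """Least-squares (c1, c2) such that c1*b1 + c2*b2 approximates observed_vis."""
    b1, b2 = basis_vis
    a11 = sum(x * x for x in b1)
    a12 = sum(x * y for x, y in zip(b1, b2))
    a22 = sum(y * y for y in b2)
    r1 = sum(x * o for x, o in zip(b1, observed_vis))
    r2 = sum(y * o for y, o in zip(b2, observed_vis))
    det = a11 * a22 - a12 * a12  # nonzero when b1 and b2 are linearly independent
    return (a22 * r1 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det


def is_wet(basis_vis, basis_nir, observed_vis, observed_nir, ratio=0.8):
    """Hypothetical rule: wet if observed NIR is well below the NIR predicted from visible bands."""
    c1, c2 = fit_coeffs(basis_vis, observed_vis)
    n1, n2 = basis_nir
    predicted = [c1 * p + c2 * q for p, q in zip(n1, n2)]
    return sum(observed_nir) < ratio * sum(predicted)
```

    For example, with visible basis vectors [1, 1, 1] and [1, 0, -1], near-infrared parts [1, 1] and [0, 1], and visible observation [2.5, 2.0, 1.5], the fit gives (2.0, 0.5) and a predicted near-infrared reflectance of [2.0, 2.5]; an observed [1.0, 1.2] would then be flagged as wet.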

  • Robust and Efficient Homography Estimation Using Directional Feature Matching of Court Points for Soccer Field Registration

    Kazuki KASAI, Kaoru KAWAKITA, Akira KUBOTA, Hiroki TSURUSAKI, Ryosuke WATANABE, Masaru SUGANO

    PAPER

      Publicized: 2021/07/08
      Page(s): 1563-1571

    In this paper, we present an efficient and robust method for estimating the homography matrix for soccer field registration between a captured camera image and a soccer field model. The method first detects reliable field lines in the camera image through clustering. It then constructs a novel directional feature for the intersection points of the lines in both the camera image and the model, and uses it to find matching pairs of these points between the image and the model. Finally, homography matrix estimation and validation are performed using the obtained matching pairs, which reduces the required number of homography matrix calculations. The method also uses possible intersection points outside the image for point matching, which effectively improves the robustness and accuracy of homography estimation, as demonstrated in the experimental results.
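    Each candidate homography is computed from point correspondences. As a generic illustration (not the paper's code), the Python sketch below uses the standard direct linear transform with four matched points, fixing h33 = 1 and solving the resulting 8x8 linear system by Gaussian elimination; the directional feature matching and validation steps are not shown, and the function names are hypothetical. Fixing h33 = 1 fails for the rare homographies whose true h33 is zero.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def homography_from_4(src, dst):
    """3x3 homography mapping four src points onto four dst points (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H, x, y):
    """Project a point through H, including the perspective division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

    Reducing the number of matching hypotheses, as the paper does, directly reduces how many times a solve like this must be run and validated.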

  • Lossless Coding of HDR Color Images in a Floating Point Format Using Block-Adaptive Inter-Color Prediction

    Yuya KAMATAKI, Yusuke KAMEDA, Yasuyo KITA, Ichiro MATSUDA, Susumu ITOH

    LETTER

      Publicized: 2021/05/17
      Page(s): 1572-1575

    This paper proposes a lossless coding method for HDR color images stored in a floating point format called Radiance RGBE. In this method, the three mantissa parts and the common exponent part, each represented with 8-bit depth, are encoded using the block-adaptive prediction technique, with some modifications that take the data structure into account.
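    For readers unfamiliar with the format, the Python sketch below shows the conventional RGBE conversion used by common Radiance readers and writers: three 8-bit mantissas share one 8-bit exponent taken from the largest channel. It illustrates only the data structure that the coder operates on, not the proposed prediction method; the function names are illustrative.

```python
import math


def float_to_rgbe(r: float, g: float, b: float) -> tuple:
    """Pack linear RGB into (R, G, B, E) bytes with a shared exponent."""
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)  # very small values are stored as black
    m, e = math.frexp(v)  # v = m * 2**e with 0.5 <= m < 1
    scale = m * 256.0 / v
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)


def rgbe_to_float(rm: int, gm: int, bm: int, ex: int) -> tuple:
    """Unpack RGBE bytes into approximate linear RGB."""
    if ex == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, ex - (128 + 8))  # 2**(exponent - 128) / 256
    return (rm * f, gm * f, bm * f)
```

    For example, float_to_rgbe(1.0, 0.5, 0.25) yields (128, 64, 32, 129). Because only the largest channel gets the full 8-bit mantissa range, the three mantissa planes and the exponent plane have quite different statistics, which is why a coder for this format benefits from taking the data structure into account.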

  • Rolling Guidance Filter as a Clustering Algorithm

    Takayuki HATTORI, Kohei INOUE, Kenji HARA

    LETTER

      Publicized: 2021/05/31
      Page(s): 1576-1579

    We propose a generalization of the rolling guidance filter (RGF) to a similarity-based clustering (SBC) algorithm that can handle general vector data. The proposed RGF-based SBC algorithm makes the similarities between data clearer than the original similarity values computed from the original data. Based on these similarity values, we assign cluster labels to the data using an SBC algorithm. Experimental results show that the proposed algorithm achieves better clustering results than the naive application of the SBC algorithm to the original similarity values. Additionally, we study the convergence of a unimodal vector dataset to its mean vector.

  • Regular Section
  • Global Optimization Algorithm for Cloud Service Composition

    Hongwei YANG, Fucheng XUE, Dan LIU, Li LI, Jiahui FENG

    PAPER-Computer System

      Publicized: 2021/06/30
      Page(s): 1580-1591

    Service composition optimization is a classic NP-hard problem. How to quickly select high-quality services that meet user needs from a large number of candidate services is a hot topic in cloud service composition research. In this study, an efficient second-order beetle swarm optimization with global search ability is proposed to solve the cloud service composition optimization problem. First, the beetle antennae search algorithm is introduced into a modified particle swarm optimization algorithm, the population is initialized using a chaotic sequence, and modified nonlinear dynamic trigonometric learning factors are adopted to control the expanding capacity of the particles and the global convergence capability. Second, modified secondary oscillation factors are incorporated to increase the search precision and global search ability of the algorithm. An adaptive step adjustment is used to improve the stability of the algorithm. Experimental results on a real data set indicate that the proposed global optimization algorithm can solve web service composition optimization problems in a cloud environment. It exhibits excellent global search ability, comparatively fast convergence, favorable stability, and lower time cost.
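    The proposed optimizer builds on beetle antennae search (BAS). For orientation, the Python sketch below implements only the basic BAS update for minimization: sample a random unit direction, evaluate the cost at two "antennae" on either side, and step toward the cheaper one. It omits the particle swarm, chaotic initialization, learning factors, and second-order oscillation terms of the paper; the antenna length `d`, step size, and decay factor are illustrative assumptions.

```python
import math
import random


def bas_minimize(f, x0, iters=200, d=1.0, step=1.0, decay=0.95, seed=0):
    """Basic beetle antennae search (minimization); parameters are illustrative."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for _ in range(iters):
        # Random unit direction along which the two antennae point.
        b = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        right = [xi + d * bi for xi, bi in zip(x, b)]
        left = [xi - d * bi for xi, bi in zip(x, b)]
        diff = f(right) - f(left)
        sign = (diff > 0) - (diff < 0)
        # Step toward the antenna with the lower cost.
        x = [xi - step * bi * sign for xi, bi in zip(x, b)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        # Shrink the antenna length and the step size over time.
        d = decay * d + 0.01
        step *= decay
    return best_x, best_f
```

    Because each iteration needs only two extra cost evaluations per individual, BAS is cheap to embed in a swarm, which is presumably what makes it attractive as a component of the hybrid scheme described above.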

  • An Ising Machine-Based Solver for Visiting-Route Recommendation Problems in Amusement Parks

    Yosuke MUKASA, Tomoya WAKAIZUMI, Shu TANAKA, Nozomu TOGAWA

    PAPER-Computer System

      Publicized: 2021/07/08
      Page(s): 1592-1600

    In an amusement park, an attraction-visiting route that considers waiting and traveling times improves visitors' satisfaction and experience. To solve this problem, we focus on Ising machines, which are expected to solve combinatorial optimization problems at high speed by mapping them to Ising models or quadratic unconstrained binary optimization (QUBO) models. We propose a mapping of the visiting-route recommendation problem in amusement parks to a QUBO model so that it can be solved with Ising machines. Using an actual Ising machine, we obtained feasible solutions for the visiting-route recommendation problem one order of magnitude faster than the simulated annealing method, with almost the same accuracy.
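    A QUBO model minimizes E(x) = sum over i <= j of Q_ij x_i x_j for binary x. As a toy illustration of how ordering constraints become QUBO terms, the Python sketch below encodes "visit each of n attractions exactly once, one per time slot" with one-hot penalties plus a travel-cost term, then solves it by brute force in place of an Ising machine. The variable encoding, penalty weight, and cost matrix are assumptions for illustration; the paper's actual formulation (which also models waiting times) is not reproduced here.

```python
import itertools


def build_qubo(cost, penalty):
    """Toy QUBO: x[a*n + t] = 1 means attraction a is visited in time slot t."""
    n = len(cost)
    Q = {}

    def idx(a, t):
        return a * n + t

    def add(i, j, w):
        key = (min(i, j), max(i, j))
        Q[key] = Q.get(key, 0.0) + w

    # One-hot penalty A*(1 - sum x)^2 without its constant: -A on each
    # variable, +2A on each pair. Applied per time slot and per attraction.
    groups = [[idx(a, t) for a in range(n)] for t in range(n)]
    groups += [[idx(a, t) for t in range(n)] for a in range(n)]
    for g in groups:
        for i in g:
            add(i, i, -penalty)
        for i, j in itertools.combinations(g, 2):
            add(i, j, 2 * penalty)
    # Travel cost between attractions in consecutive slots (open route).
    for t in range(n - 1):
        for a in range(n):
            for b in range(n):
                if a != b:
                    add(idx(a, t), idx(b, t + 1), cost[a][b])
    return Q


def energy(Q, x):
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())


def brute_force(Q, n_vars):
    """Exhaustive search standing in for an Ising machine (viable only for tiny n)."""
    best = min(itertools.product((0, 1), repeat=n_vars), key=lambda x: energy(Q, x))
    return best, energy(Q, best)


def decode(x, n):
    """Route as the attraction visited in each time slot."""
    return [next(a for a in range(n) if x[a * n + t]) for t in range(n)]
```

    With cost = [[0, 1, 5], [1, 0, 2], [5, 2, 0]] and penalty 10, the minimum energy is 3 - 6 * 10 = -57, reached by the routes 0-1-2 and 2-1-0 (travel cost 3). The penalty must exceed any travel saving an infeasible assignment could gain; an Ising machine samples low-energy states of the same Q instead of enumerating all 2^(n*n) assignments.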

  • Supporting Proactive Refactoring: An Exploratory Study on Decaying Modules and Their Prediction

    Natthawute SAE-LIM, Shinpei HAYASHI, Motoshi SAEKI

    PAPER-Software Engineering

      Publicized: 2021/06/28
      Page(s): 1601-1615

    Code smells can be detected using tools such as static analyzers that detect code smells based on source code metrics. Developers perform refactoring activities based on the results of such detection tools to improve source code quality. However, such an approach can be considered reactive refactoring, i.e., developers react to code smells after they occur. This means that developers first suffer the effects of low-quality source code before they start solving code smells. In this study, we focus on proactive refactoring, i.e., refactoring source code before it becomes smelly. This approach would allow developers to maintain source code quality without having to suffer the impact of code smells. To support the proactive refactoring process, we propose a technique to detect decaying modules, which are non-smelly modules that are about to become smelly. We present empirical studies on open source projects with the aim of studying the characteristics of decaying modules. Additionally, to assist developers in the refactoring planning process, we study the use of a machine learning technique to predict decaying modules and report the factor that contributes most to the performance of the model under consideration.

  • Similarity Search in InterPlanetary File System with the Aid of Locality Sensitive Hash

    Satoshi FUJITA

    PAPER-Information Network

      Publicized: 2021/07/08
      Page(s): 1616-1623

    To realize information-centric networking, IPFS (InterPlanetary File System) generates a unique ContentID for each content by applying a cryptographic hash to the content itself. Although this could improve security against attacks such as falsification, it makes similarity search difficult to realize in the framework of IPFS, since the similarity of contents is not reflected in the proximity of their ContentIDs. To overcome this issue, we propose a method that applies a locality sensitive hash (LSH) to feature vectors extracted from contents and uses the result as the key of indexes stored in IPFS. Experiments with 10,000 random points corresponding to stored contents show that more than half of randomly given queries return a non-empty result for the similarity search, and yield an accurate result which is outside the σ confidence interval of an ordinary flooding-based method. Note that such a collection of random points corresponds to the worst-case scenario for the proposed scheme, since the performance of similarity search could improve when points and queries follow an uneven distribution.
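    The abstract does not name the LSH family used. As one common choice, the Python sketch below shows random-hyperplane LSH (SimHash) for cosine similarity: each bit records which side of a random hyperplane a feature vector falls on, so similar vectors tend to share most bits. Using such a bit string as an index key, in place of a cryptographic hash, is the kind of substitution the method relies on; `make_hyperplanes`, `lsh_key`, and `hamming` are hypothetical names.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors, one per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """Bit string with '1' where vec lies on the non-negative side of a hyperplane."""
    bits = []
    for p in planes:
        dot = sum(a * b for a, b in zip(vec, p))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def hamming(a, b):
    """Number of differing bits between two keys."""
    return sum(x != y for x, y in zip(a, b))
```

    For two vectors at angle θ, a single random hyperplane separates them with probability θ/π, so the expected Hamming distance between their keys grows with the angle. Unlike a cryptographic hash, scaling a vector by a positive constant leaves its key unchanged, while negating it flips every bit.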

  • Semi-Structured BitTorrent Protocol with Application to Efficient P2P Video Streaming

    Satoshi FUJITA

    PAPER-Information Network

      Publicized: 2021/07/08
      Page(s): 1624-1631

    In this paper, we propose a method to enhance the download efficiency of the BitTorrent protocol by introducing structure into the set of pieces generated from a shared file and into the swarm of peers downloading that file. More specifically, for the set of pieces, we introduce super-pieces called clusters, which aim to enlarge the granularity of request-and-reply management of pieces; for the swarm of peers, we organize a clique of several peers with similar upload capacity to improve the smoothness of the flow of pieces associated with a cluster. As the simulation results show, the proposed extensions significantly reduce the download time of the first 75% of downloaders, and thereby improve the performance of P2P-assisted video streaming such as Akamai NetSession and BitTorrent DNA.

  • Asymmetric Tobit Analysis for Correlation Estimation from Censored Data

    HongYuan CAO, Tsuyoshi KATO

    PAPER-Artificial Intelligence, Data Mining

      Publicized: 2021/07/19
      Page(s): 1632-1639

    Contamination of water resources with pathogenic microorganisms excreted in human feces is a worldwide public health concern. Fecal contamination is commonly surveilled by routine monitoring of one or a few types of microorganisms. To design a feasible routine for periodic monitoring and to control the risk of exposure to pathogens, reliable statistical algorithms for inferring correlations between microorganism concentrations in water need to be established. Moreover, because pathogens are often present at low concentrations, some measurements are likely to fall below the detection limit. This yields a pairwise left-censored dataset and complicates the computation of correlation coefficients. Correlation estimation errors can be reduced if undetected values are imputed more accurately. To obtain better imputations, we utilize side information and develop a new technique, the asymmetric Tobit model, an extension of the Tobit model that allows domain knowledge to be exploited effectively when fitting the model to a censored dataset. The empirical results demonstrate that imputation with domain knowledge is effective for this task.
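    As background on the model being extended, the standard (symmetric) Tobit model for left-censored data lets a value below the detection limit contribute the probability P(y* < limit) to the likelihood instead of a density. The Python sketch below computes that log-likelihood for a univariate normal and a simple conditional-mean imputation E[y* | y* < limit]; it is the baseline, not the paper's asymmetric variant, and the function names are illustrative.

```python
import math


def norm_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def norm_logcdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))  # log Phi(z)


def tobit_loglik(values, censored, limit, mu, sigma):
    """Log-likelihood of a left-censored normal sample.

    values[i] is used only when censored[i] is False; censored entries
    contribute log P(y* < limit) instead of a density term.
    """
    total = 0.0
    for v, c in zip(values, censored):
        total += norm_logcdf(limit, mu, sigma) if c else norm_logpdf(v, mu, sigma)
    return total


def impute_below_limit(limit, mu, sigma):
    """Conditional mean E[y* | y* < limit] of the fitted normal."""
    a = (limit - mu) / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))
    return mu - sigma * pdf / cdf
```

    Maximizing this likelihood over mu and sigma (for example, with a generic optimizer) uses the censored observations instead of discarding them or substituting the detection limit; the quality of the fitted distribution then determines how good the imputed values are.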

  • Discovering Multiple Clusters of Sightseeing Spots to Improve Tourist Satisfaction Using Network Motifs

    Tengfei SHAO, Yuya IEIRI, Reiko HISHIYAMA

    PAPER-Office Information Systems, e-Business Modeling

      Publicized: 2021/07/09
      Page(s): 1640-1650

    Tourist satisfaction plays a very important role in the development of local community tourism. For the development of tourist destinations in local communities, it is important to measure, maintain, and improve tourist destination loyalty over the medium to long term. It has been proven that improving tourist satisfaction is a major factor in improving tourist destination loyalty. Therefore, to improve tourist satisfaction in local communities, we identified multiple clusters of sightseeing spots and determined that tourist satisfaction can be increased based on these clusters. Our discovery flow can be summarized as follows. First, we extracted tourism keywords from guidebooks on sightseeing spots. We then constructed a complex network of tourists and sightseeing spots based on data collected from experiments conducted in Kyoto. Next, we added the corresponding tourism keywords to each sightseeing spot. Finally, by analyzing network motifs, we discovered multiple clusters of sightseeing spots that could be used to improve tourist satisfaction.

  • Mining Emergency Event Logs to Support Resource Allocation

    Huiling LI, Cong LIU, Qingtian ZENG, Hua HE, Chongguang REN, Lei WANG, Feng CHENG

    PAPER-Office Information Systems, e-Business Modeling

      Publicized: 2021/06/28
      Page(s): 1651-1660

    Effective emergency resource allocation is essential to guarantee successful emergency disposal, and it has become a research focus in the area of emergency management. Emergency event logs are accumulated in modern emergency management systems and can be analyzed to support effective resource allocation. This paper proposes a novel approach for efficient emergency resource allocation by mining emergency event logs. More specifically, an emergency event log with various attributes, e.g., emergency task name, emergency resource type (reusable or consumable), required resource amount, and timestamps, is first formalized. Then, a novel algorithm is presented to discover emergency response process models, represented as an extension of Petri nets with resource and time elements, from emergency event logs. Next, based on the discovered emergency response process models, the minimum requirements for both reusable and consumable resources are obtained, and two resource allocation strategies, i.e., the Shortest Execution Time (SET) strategy and the Least Resource Consumption (LRC) strategy, are proposed to support efficient emergency resource allocation decision-making. Finally, a chlorine tank explosion emergency case study is used to demonstrate the applicability and effectiveness of the proposed resource allocation approach.
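    To make the distinction between the two resource types concrete: for a reusable resource, a natural lower bound is the peak amount held by simultaneously running tasks, whereas consumable amounts simply add up. The Python sketch below computes both from scheduled task intervals with a sweep over start and end events. It assumes a hypothetical (start, end, resource, amount, kind) record format and does not reproduce the paper's process-model discovery or its exact definition of minimum requirements.

```python
def min_requirements(tasks):
    """Minimum resource amounts for a set of scheduled tasks.

    tasks: iterable of (start, end, resource, amount, kind), kind being
    "reusable" or "consumable" (a hypothetical record format). A reusable
    resource is held on [start, end) and returned when the task ends.
    """
    consumable = {}
    events = {}
    for start, end, res, amount, kind in tasks:
        if kind == "consumable":
            # Consumed resources are never returned, so amounts add up.
            consumable[res] = consumable.get(res, 0) + amount
        else:
            events.setdefault(res, []).extend([(start, amount), (end, -amount)])
    reusable = {}
    for res, evs in events.items():
        # Sorting (time, delta) puts releases before acquisitions at equal times.
        evs.sort()
        level = peak = 0
        for _, delta in evs:
            level += delta
            peak = max(peak, level)
        reusable[res] = peak
    return reusable, consumable
```

    For example, trucks held on [0, 4), [2, 6), and [4, 8) in amounts 2, 3, and 1 peak at 5 between times 2 and 4, so five trucks suffice even though six truck-assignments are made in total.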

  • Code-Switching ASR and TTS Using Semisupervised Learning with Machine Speech Chain

    Sahoko NAKAYAMA, Andros TJANDRA, Sakriani SAKTI, Satoshi NAKAMURA

    PAPER-Speech and Hearing

      Publicized: 2021/07/08
      Page(s): 1661-1677

    The phenomenon where a speaker mixes two or more languages within the same conversation is called code-switching (CS). Handling CS is challenging for automatic speech recognition (ASR) and text-to-speech (TTS) because it requires coping with multilingual input. Although CS text or speech may be found in social media, datasets of CS speech with corresponding CS transcriptions, which are required for supervised training, are hard to obtain. This work adopts a deep learning-based machine speech chain to train CS ASR and CS TTS with each other through semisupervised learning. After supervised learning with monolingual data, the machine speech chain is carried out with unsupervised learning on either CS text or CS speech. The results show that the machine speech chain trains ASR and TTS together and improves performance without requiring paired CS speech and CS text. We also integrate language embedding and language identification into the CS machine speech chain to handle CS better by providing language information. We demonstrate that our proposed approach can improve performance on both a single CS language pair and multiple CS language pairs, including unknown CS excluded from the training data.

  • Health Indicator Estimation by Video-Based Gait Analysis

    Ruochen LIAO, Kousuke MORIWAKI, Yasushi MAKIHARA, Daigo MURAMATSU, Noriko TAKEMURA, Yasushi YAGI

    PAPER-Image Recognition, Computer Vision

      Publicized: 2021/07/09
      Page(s): 1678-1690

    In this study, we propose a method to estimate body composition-related health indicators (e.g., the ratios of body fat, body water, and muscle) using video-based gait analysis. This method is more efficient than individual measurement using a conventional body composition meter. Specifically, we designed a deep-learning framework with a convolutional neural network (CNN), whose input is a gait energy image (GEI) and whose output consists of the health indicators. Although a vast amount of training data is typically required to train network parameters, it is infeasible to collect sufficient ground-truth data, i.e., pairs of a gait video and the health indicators measured with a body composition meter for each subject. We therefore use a two-step approach that exploits an auxiliary gait dataset containing a large number of subjects but lacking ground-truth health indicators. In the first step, we pre-train a backbone network on the auxiliary dataset to output gait primitives considered relevant to the health indicators, such as arm swing, stride, degree of stoop, and body width. In the second step, we add some layers to the backbone network and fine-tune the entire network to output the health indicators, even with a limited number of ground-truth data points. Experimental results show that the proposed method outperforms both training from scratch and an auto-encoder-based pre-training and fine-tuning approach, and achieves relatively high estimation accuracy for the body composition-related health indicators except for body fat-related ones.
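    A gait energy image (GEI) is conventionally the pixel-wise average of size-normalized, horizontally centered binary silhouettes over a gait cycle. The Python sketch below computes that average from already-aligned silhouettes given as nested lists; silhouette extraction, alignment, and the CNN itself are outside its scope, and the function name is illustrative.

```python
def gait_energy_image(silhouettes):
    """Pixel-wise mean of aligned binary silhouettes, each an H x W list of 0/1 values."""
    if not silhouettes:
        raise ValueError("need at least one silhouette frame")
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    total = [[0.0] * w for _ in range(h)]
    for frame in silhouettes:
        for y in range(h):
            for x in range(w):
                total[y][x] += frame[y][x]
    n = len(silhouettes)
    return [[v / n for v in row] for row in total]
```

    Pixels covered by the body in every frame (e.g., the torso) approach 1.0, while pixels swept only part of the time by swinging arms and legs take intermediate values, which is how a single image can carry cues such as arm swing and stride.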

  • Image Emotion Recognition Using Visual and Semantic Features Reflecting Emotional and Similar Objects

    Takahisa YAMAMOTO, Shiki TAKEUCHI, Atsushi NAKAZAWA

    PAPER-Image Recognition, Computer Vision

      Publicized: 2021/06/24
      Page(s): 1691-1701

    Visual sentiment analysis has many applications, including image captioning, opinion mining, and advertisement; however, it remains a difficult problem, and existing algorithms cannot produce satisfactory results. One difficulty in classifying images by emotion is that visual sentiments are evoked by different types of information: visual information, such as colors and textures, and semantic information, such as the types of objects evoking emotions and their combinations. In contrast to existing methods that use only visual information, this paper presents a novel algorithm for image emotion recognition that uses both types of information simultaneously. For semantic features, we introduce an object vector and a word vector. The object vector is created by an object detection method and reflects the objects present in an image. The word vector is created by transforming the names of detected objects through a word embedding model, so it is similar for semantically similar objects. These semantic features are concatenated with a visual feature extracted by a fine-tuned convolutional neural network (CNN), and classification is performed on the concatenated feature vector. Extensive evaluation experiments on emotional image datasets show that our method achieves the best accuracy among the compared existing methods on all but one dataset, with a maximum accuracy improvement of 4.54%.

  • Siamese Visual Tracking with Dual-Pipeline Correlated Fusion Network

    Ying KANG, Cong LIU, Ning WANG, Dianxi SHI, Ning ZHOU, Mengmeng LI, Yunlong WU

    PAPER-Image Recognition, Computer Vision

      Publicized: 2021/07/09
      Page(s): 1702-1711

    Siamese visual tracking, viewed as a problem of max-similarity matching to the target template, has attracted increasing attention in computer vision. However, current Siamese trackers find it hard to balance accuracy in real-time tracking with robustness in long-term tracking. This work proposes a new Siamese-based tracker with a dual-pipeline correlated fusion network (named ADF-SiamRPN), which uses one initial template for robust correlation and another, transient template with adaptive optimal feature selection for accurate correlation. A learnable correlation-response fusion network then combines the two pipelines to improve overall tracking performance. To compare ADF-SiamRPN with state-of-the-art trackers, we conduct extensive experiments on benchmarks including OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT, and TrackingNet. The results demonstrate that ADF-SiamRPN outperforms all the compared trackers and achieves the best balance between accuracy and robustness.
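    The matching step at the heart of Siamese trackers is a cross-correlation of a template against a larger search region, whose peak gives the target location. The Python sketch below shows that on raw 2D lists, plus a fixed-weight combination of two response maps as a simple stand-in for the paper's learnable fusion network; real trackers correlate learned CNN feature maps, and all names here are illustrative.

```python
def cross_correlate(search, template):
    """Valid-mode 2D cross-correlation of a search map with a template."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    return [
        [
            sum(search[i + a][j + b] * template[a][b] for a in range(th) for b in range(tw))
            for j in range(sw - tw + 1)
        ]
        for i in range(sh - th + 1)
    ]


def fuse(resp_a, resp_b, weight=0.5):
    """Fixed convex combination of two response maps (stand-in for a learned fusion)."""
    return [[weight * x + (1 - weight) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(resp_a, resp_b)]


def peak(resp):
    """(row, col) of the maximum response, i.e. the predicted target offset."""
    return max(
        ((i, j) for i in range(len(resp)) for j in range(len(resp[0]))),
        key=lambda p: resp[p[0]][p[1]],
    )
```

    In a dual-template design, one response map would come from the initial template and the other from the transient template, and the fused map's peak gives the new target position.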

  • Document-Level Neural Machine Translation with Associated Memory Network

    Shu JIANG, Rui WANG, Zuchao LI, Masao UTIYAMA, Kehai CHEN, Eiichiro SUMITA, Hai ZHAO, Bao-liang LU

    PAPER-Natural Language Processing

      Publicized: 2021/06/24
      Page(s): 1712-1723

    Standard neural machine translation (NMT) assumes that each sentence is independent of the document-level context. Most existing document-level NMT approaches settle for a shallow sense of global document-level information, while this work focuses on exploiting detailed document-level context by means of a memory network. The memory network's ability to detect the parts of memory most relevant to the current sentence offers a natural way to model rich document-level context. In this work, the proposed document-aware memory network is implemented to enhance the Transformer NMT baseline. Experiments on several tasks show that the proposed method significantly improves NMT performance over strong Transformer baselines and other related studies.

  • HBDCA: A Toolchain for High-Accuracy BRAM-Defined CNN Accelerator on FPGA with Flexible Structure

    Zhengjie LI, Jiabao GAO, Jinmei LAI

    PAPER-Biocybernetics, Neurocomputing

      Publicized: 2021/07/26
      Page(s): 1724-1733

    In recent years, FPGAs have become popular for CNN acceleration, and many CNN-to-FPGA toolchains have been proposed to deploy CNNs on FPGAs quickly. However, with these toolchains, updating the CNN requires regenerating the RTL code and re-implementing the design, which is time-consuming and may suffer from timing-closure problems. We therefore propose HBDCA, a toolchain and corresponding accelerator in which the CNN is defined by the content of BRAM. The toolchain integrates the Xilinx UpdateMEM utility, which updates BRAM content without re-synthesis or re-implementation. It also integrates TensorFlow Lite, which provides high-accuracy quantization. HBDCA supports 8-bit per-channel quantization of weights and 8-bit per-layer quantization of activations. Because upgrading the CNN on the accelerator may change its kernel size, the flexible structure of HBDCA supports kernel-level parallelism with three different sizes (3×3, 5×5, 7×7). HBDCA implements four types of parallelism in convolution layers and two types in fully-connected layers. To reduce the number of memory accesses, both spatial and temporal data-reuse techniques are applied to convolution and fully-connected layers. In particular, temporal reuse is adopted at both the row and column levels of the input feature map of a convolution layer, so data need only be read from BRAM once and are reused in the following clock cycle. Experiments show that, by updating the BRAM content with a single UpdateMEM command, three CNNs with different kernel sizes (3×3, 5×5, 7×7) can be implemented on HBDCA. Compared with the traditional design flow, UpdateMEM reduces development time by 7.6X-9.1X for different synthesis or implementation strategies. Compared with similar CNNs created by toolchains, HBDCA has smaller latency (9.97µs-50.73µs) and eliminates re-implementation when the CNN is updated. Compared with similar CNNs created by dedicated designs, HBDCA also has the smallest latency (9.97µs), the highest accuracy (99.14%), and the lowest power (1.391W). Compared with different CNNs created by a similar toolchain that also eliminates the re-implementation process, HBDCA achieves a higher speedup of 120.28X.
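    To illustrate what "8-bit per-channel quantization of weights" means, the Python sketch below applies symmetric int8 quantization with one scale per output channel (scale = max|w| / 127, integers clamped to [-127, 127], zero point 0). This mirrors the general shape of TensorFlow Lite's per-channel weight scheme but is a simplified stand-in, not the HBDCA toolchain's code; the function names are illustrative.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per output channel.

    weights: list of output channels, each a flat list of floats.
    Returns (integer values per channel, scale per channel).
    """
    q, scales = [], []
    for ch in weights:
        max_abs = max(abs(w) for w in ch)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q.append([max(-127, min(127, round(w / scale))) for w in ch])
        scales.append(scale)
    return q, scales


def dequantize(q, scales):
    """Map integers back to approximate real weights."""
    return [[v * s for v in ch] for ch, s in zip(q, scales)]
```

    The benefit shows up when channels have very different ranges: with a single per-tensor scale of 1/127 for weights spanning [-1, 1], values such as 0.01 and 0.02 in a small-magnitude channel would round to the integers 1 and 3, while a per-channel scale maps them to roughly 64 and 127.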

  • Diversity-Robust Acoustic Feature Signatures Based on Multiscale Fractal Dimension for Similarity Search of Environmental Sounds

    Motohiro SUNOUCHI, Masaharu YOSHIOKA

    PAPER-Music Information Processing

      Publicized: 2021/07/02
      Page(s): 1734-1748

    This paper proposes new acoustic feature signatures based on the multiscale fractal dimension (MFD), which are robust against the diversity of environmental sounds, for content-based similarity search. The diversity of sound sources and acoustic compositions is a typical characteristic of environmental sounds. Several acoustic features have been proposed for environmental sounds, among them the widely used Mel-Frequency Cepstral Coefficients (MFCCs), which describe frequency-domain features. However, in addition to frequency-domain features, environmental sounds have other important features in the time domain at various time scales. In our previous paper, we proposed the enhanced multiscale fractal dimension signature (EMFD) for environmental sounds. This paper extends EMFD using the kernel density estimation method, which improves performance on similarity search tasks. Furthermore, it proposes another acoustic feature signature based on MFD, namely the very-long-range multiscale fractal dimension signature (MFD-VL). The MFD-VL signature describes several features of the time-varying envelope over long periods of time. It is stable and robust against background noise and small fluctuations in the parameters of sound sources, which arise in field recordings. We discuss the effectiveness of these signatures in similarity sound search by comparing them with acoustic features proposed in the DCASE 2018 challenges. Owing to the unique descriptiveness of our proposed signatures, we confirmed that they are effective when used together with other acoustic features.

  • Simple Oblivious Routing Method to Balance Load in Network-on-Chip

    Jiao GUAN, Jueping CAI, Ruilian XIE, Yequn WANG, Jinzhi LAI

    LETTER-Computer System

      Publicized: 2021/06/30
      Page(s): 1749-1752

    This letter presents an oblivious and load-balanced routing (OLBR) method without virtual channels for 2D mesh Network-on-Chip (NoC). To balance the traffic load of the network and avoid deadlock, OLBR divides the network nodes into two regions. One region contains the nodes on the east and west sides of the NoC, in which packets are routed by the odd-even turn rule with Y-direction preference (OE-YX); the remaining nodes form the other region, in which packets are routed by the odd-even turn rule with alterable priority arbitration (OE-APA). Simulation results show that OLBR improves saturation throughput by 11.73% over related works and balances the traffic load over the entire network.

  • Research on a Prediction Method for Carbon Dioxide Concentration Based on an Optimized LSTM Network of Spatio-Temporal Data Fusion

    Jun MENG  Gangyi DING  Laiyang LIU  

     
    LETTER-Data Engineering, Web Information Systems

      Publicized:
    2021/07/08
      Page(s):
    1753-1757

    Observed carbon dioxide data from multiple heterogeneous sources differ in spatial and temporal resolution, and the quality of the observations is uncertain. To address this, a data fusion prediction model for observed multi-scale carbon dioxide concentration data is studied. First, a wireless carbon sensor network is created and gross errors are eliminated from the original dataset; the remaining valid data are then interpolated with the kriging method to generate a series of continuous surfaces that express specific features and provide unified, spatio-temporally normalized data for subsequent prediction models. Next, a long short-term memory (LSTM) network processes these continuous spatio-temporally normalized data to obtain a carbon dioxide concentration prediction model at any scale. Finally, experimental results show that the proposed method with spatio-temporal features is more accurate than the single-sensor monitoring method without spatio-temporal features.

  • Single Image Dehazing Algorithm Based on Modified Dark Channel Prior

    Hao ZHOU  Zhuangzhuang ZHANG  Yun LIU  Meiyan XUAN  Weiwei JIANG  Hailing XIONG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2021/07/14
      Page(s):
    1758-1761

    The single image dehazing algorithm based on the Dark Channel Prior (DCP) is widely known, and many dehazing algorithms based on DCP have since been proposed. However, we found that DCP is more effective when applied to RAW images, before the ISP pipeline. In addition, to address the failure of DCP in sky regions, we propose an algorithm that segments the sky region and compensates its transmission. Extensive subjective and objective evaluations demonstrate that the modified DCP (MDCP) substantially improves performance and is competitive with state-of-the-art methods.
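    For reference, the standard (unmodified) DCP computes the dark channel as a per-pixel minimum over the colour channels followed by a local minimum filter, and derives a coarse transmission map as t = 1 - ω · dark(I / A). The pure-Python sketch below shows only that baseline; it does not reproduce the letter's RAW-domain processing or sky compensation, and the airlight `A` is assumed to be already estimated. `patch=15` and `omega=0.95` are common defaults, not values taken from the letter.

```python
def dark_channel(img, patch=15):
    """img: H x W nested list of (r, g, b) tuples with values in [0, 1]."""
    h, w = len(img), len(img[0])
    # Per-pixel minimum over the three colour channels
    min_rgb = [[min(px) for px in row] for row in img]
    r = patch // 2
    # Minimum filter over a patch x patch window, clamped at the image borders
    return [[min(min_rgb[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def transmission(img, airlight, omega=0.95, patch=15):
    """Coarse DCP transmission map: t(x) = 1 - omega * dark(I / A)."""
    normalised = [[tuple(c / a for c, a in zip(px, airlight)) for px in row]
                  for row in img]
    return [[1.0 - omega * d for d in row]
            for row in dark_channel(normalised, patch)]
```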

  • Multi-Task Learning for Improved Recognition of Multiple Types of Acoustic Information

    Jae-Won KIM  Hochong PARK  

     
    LETTER-Speech and Hearing

      Publicized:
    2021/07/14
      Page(s):
    1762-1765

    We propose a new method that uses multi-task learning to improve the recognition of phonemes, speech emotions, and music genres. When tasks are closely related, multi-task learning can improve the performance of each task by learning a common feature representation for all of them. However, the recognition tasks considered in this study require different input signals (speech or music) at different time scales, resulting in input features with different characteristics. In addition, no training dataset labeled for all information sources is available. Considering these issues, we conduct multi-task learning as a sequential training process in which each input feature carries a label for only one information source. A comparative evaluation confirms that the proposed multi-task learning method outperforms conventional individual learning of each task on all recognition tasks.

  • A CNN-Based Optimal CTU λ Decision for HEVC Intra Rate Control

    Lili WEI  Zhenglong YANG  Zhenming WANG  Guozhong WANG  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2021/07/19
      Page(s):
    1766-1769

    Because HEVC intra rate control has no prior information to rely on during coding, it is difficult to obtain the optimal λ for every coding tree unit (CTU). In this paper, a convolutional neural network (CNN) based intra rate control is proposed. First, a CNN whose last layer has two output channels is used to predict the key parameters of the CTU R-λ curve. To train the CNN effectively, a combined loss function is built and the balance factor γ is explored to minimize the loss. Second, the initial CTU λ is calculated from the CNN's predictions and the allocated bits per pixel (bpp). Based on the rate-distortion optimization (RDO) of a frame, a spatial equation is derived relating the CTU λ to the frame λ. Finally, a CTU clipping function is used to obtain the optimal CTU λ for intra rate control. Experimental results show that the proposed algorithm significantly improves intra rate control performance while maintaining good rate control accuracy.
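    The R-λ model used in HEVC rate control relates λ to bits per pixel as λ = α · bpp^β, so once the curve parameters are predicted and the bpp is allocated, an initial CTU λ follows directly. The sketch below assumes the CNN's two outputs are α and β. The clipping ranges (a factor of 2^(1/3) around the previous CTU λ, then 2^(2/3) around the frame λ) reflect our understanding of the HM reference encoder and should be treated as an assumption, not as the letter's clipping function; the λ-to-QP mapping is HM's empirical formula.

```python
import math


def ctu_lambda(alpha, beta, bpp):
    """Initial CTU lambda from the R-lambda model: lambda = alpha * bpp ** beta."""
    return alpha * bpp ** beta


def clip_ctu_lambda(lam, frame_lam, prev_ctu_lam=None):
    """Clip a CTU lambda using HM-style ranges (assumed, not from the letter)."""
    if prev_ctu_lam is not None:
        # Keep neighbouring CTUs within a factor of 2^(1/3) of each other
        lam = min(max(lam, prev_ctu_lam * 2 ** (-1 / 3)), prev_ctu_lam * 2 ** (1 / 3))
    # Keep the CTU within a factor of 2^(2/3) of the frame lambda
    return min(max(lam, frame_lam * 2 ** (-2 / 3)), frame_lam * 2 ** (2 / 3))


def lambda_to_qp(lam):
    """HM's empirical mapping from lambda to QP, rounded to the nearest integer."""
    return int(math.floor(4.2005 * math.log(lam) + 13.7122 + 0.5))
```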

  • Unsupervised Building Damage Identification Using Post-Event Optical Imagery and Variational Autoencoder

    Daming LIN  Jie WANG  Yundong LI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2021/07/20
      Page(s):
    1770-1774

    Rapid building damage identification plays a vital role in rescue operations when disasters strike, especially when rescue resources are limited. Over the past years, supervised machine learning has made considerable progress in building damage identification. However, applying supervised machine learning remains challenging because 1) the massive number of samples in the current damage imagery is difficult to label and thus cannot satisfy the training requirements of deep learning, and 2) partially damaged and undamaged buildings are highly similar, which hinders accurate classification. By leveraging abundant samples from auxiliary domains, domain adaptation aims to transfer a classifier trained on historical damage imagery to the current task. However, traditional domain adaptation approaches do not fully consider category-specific information during feature adaptation, which can cause negative transfer. To address this issue, we propose a novel domain adaptation framework that individually aligns each category of the target domain with the corresponding category of the source domain. Our method combines a variational autoencoder (VAE) and a Gaussian mixture model (GMM). First, a GMM is established to characterize the distribution of the source domain. Then, a VAE is constructed to extract the features of the target domain. Finally, the Kullback-Leibler (KL) divergence is minimized to force the target-domain features to follow the source-domain GMM. Two damage detection tasks, using post-earthquake and post-hurricane imagery, are used to verify the effectiveness of our method. Experiments show that the proposed method improves on the conventional method by 4.4% and 9.5%, respectively.
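    When a VAE posterior and a GMM component are both diagonal Gaussians, the KL term between them has a closed form. The sketch below computes that term and picks the source-domain GMM component closest to an encoded target sample. `kl_diag_gauss` and `nearest_component` are hypothetical helpers, and the letter's actual loss (for example, how components are weighted or assigned) may differ.

```python
import math


def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag var_q) || N(mu_p, diag var_p) ), summed over dimensions."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def nearest_component(mu_q, var_q, components):
    """Index of the (mean, variance) GMM component closest in KL to the posterior."""
    return min(range(len(components)),
               key=lambda c: kl_diag_gauss(mu_q, var_q, *components[c]))
```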

  • Triplet Attention Network for Video-Based Person Re-Identification

    Rui SUN  Qili LIANG  Zi YANG  Zhenghui ZHAO  Xudong ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2021/07/21
      Page(s):
    1775-1779

    Video-based person re-identification (re-ID) aims to retrieve a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Because of the dynamic nature of video, background clutter and occlusion are more serious problems than in image-based person re-ID. In this letter, we present a novel triplet attention network (TriANet) that simultaneously uses temporal, spatial, and channel context information through the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part is a residual attention subnetwork containing a channel attention module, which captures cross-dimension dependencies through rotation and transformation, and a spatial attention module, which focuses on pedestrian features. In the second part, a time attention module estimates a quality score for each pedestrian image and reduces the weight of incomplete pedestrian images to alleviate the occlusion problem. We evaluate the proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experiments show that the proposed method achieves state-of-the-art results.

  • PSTNet: Crowd Flow Prediction by Pyramidal Spatio-Temporal Network

    Enze YANG  Shuoyan LIU  Yuxin LIU  Kai FANG  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2021/04/12
      Page(s):
    1780-1783

    Crowd flow prediction in high-density urban scenes underlies a wide range of intelligent transportation and smart city applications and has become a significant topic in urban computing. In this letter, a CNN-based framework called the Pyramidal Spatio-Temporal Network (PSTNet) is proposed for crowd flow prediction. Spatial encoding is employed to represent external factors spatially, a prior pyramid enhances feature dependence across spatial scale distances and temporal spans, and a post pyramid then fuses the heterogeneous multi-scale spatio-temporal features. Experimental results on TaxiBJ and MobileBJ demonstrate that the proposed PSTNet outperforms state-of-the-art methods.

  • Gradient Corrected Approximation for Binary Neural Networks

    Song CHENG  Zixuan LI  Yongsen WANG  Wanbing ZOU  Yumei ZHOU  Delong SHANG  Shushan QIAO  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2021/07/05
      Page(s):
    1784-1788

    Binary neural networks (BNNs), in which both activations and weights are aggressively quantized to {-1, +1}, can greatly accelerate the run-time performance of convolutional neural networks (CNNs) on edge devices by reducing computational complexity and memory footprint. However, the non-differentiable binarizing function used in BNNs makes binarized models hard to optimize and causes significant performance degradation relative to full-precision models. Many previous works corrected the backward gradient of the binarizing function with improved versions of the straight-through estimator (STE) or through gradual approximation, but did not analyze or handle the gradient suppression problem. We therefore propose a novel gradient corrected approximation (GCA) method that matches the discrepancy between the binarizing function and its backward gradient in a gradual and stable way. Our work makes two primary contributions. The first is to approximate the backward gradient of the binarizing function with a simple leaky-steep function with a variable window size. The second is to correct the gradient approximation by standardizing the backward gradient propagated through the binarizing function. Experimental results show that the proposed method outperforms the baseline by 1.5% in Top-1 accuracy on the ImageNet dataset without introducing extra computation cost.
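    To make the gradient problem concrete, the sketch below contrasts the forward sign function with two common backward surrogates: the clipped straight-through estimator, which passes the upstream gradient only inside a window |x| ≤ `window`, and a gradual tanh(kx) approximation, whose gradient k(1 − tanh²(kx)) shrinks toward zero away from the origin as k grows. Neither is the letter's leaky-steep function or its gradient standardization; the function names and parameters are hypothetical.

```python
import math


def binarize(x):
    """Forward pass: sign function onto {-1, +1} (0 is mapped to +1 here)."""
    return 1.0 if x >= 0 else -1.0


def ste_grad(x, upstream, window=1.0):
    """Clipped straight-through estimator: pass the gradient where |x| <= window."""
    return upstream if abs(x) <= window else 0.0


def tanh_approx_grad(x, upstream, k=1.0):
    """Gradual approximation: gradient of tanh(k * x); larger k is closer to sign."""
    t = math.tanh(k * x)
    return upstream * k * (1.0 - t * t)
```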

  • Fitness-Distance Balance with Functional Weights: A New Selection Method for Evolutionary Algorithms

    Kaiyu WANG  Sichen TAO  Rong-Long WANG  Yuki TODO  Shangce GAO  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2021/07/21
      Page(s):
    1789-1792

    In 2019, a new selection method named fitness-distance balance (FDB) was proposed. FDB has been shown to significantly improve the search capability of evolutionary algorithms, but it lacks flexibility across different optimization problems. To address this issue, we propose a functional weights-enhanced FDB (FW). The functional weights replace the fixed weights of the original FDB with values randomly generated from a distribution function, enabling the algorithm to select more suitable individuals during the search. As a case study, FW is incorporated into the spherical search algorithm. Experimental results on various IEEE CEC2017 benchmark functions demonstrate the effectiveness of FW.
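    A minimal sketch of FDB-style selection, assuming a minimization problem: each candidate's score is w · (normalized fitness) + (1 − w) · (normalized Euclidean distance to the current best), and the highest-scoring candidate is chosen. `fw_select` draws w at random on every call to mimic the idea of functional weights; the uniform distribution is a placeholder, since the abstract does not name the distribution function. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def fdb_select(population, fitness, w=0.5):
    """Index chosen by fitness-distance balance (minimization assumed)."""
    best = min(range(len(population)), key=lambda i: fitness[i])
    dist = [math.dist(p, population[best]) for p in population]

    def norm(values):
        lo, hi = min(values), max(values)
        return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

    # Lower objective value is better, so negate before min-max normalizing
    norm_fit = norm([-f for f in fitness])
    norm_dist = norm(dist)
    scores = [w * f + (1 - w) * d for f, d in zip(norm_fit, norm_dist)]
    return max(range(len(population)), key=lambda i: scores[i])


def fw_select(population, fitness, rng=random):
    """FW-style variant: a fresh random weight per call (uniform is a placeholder)."""
    return fdb_select(population, fitness, w=rng.random())
```

    With `w=1.0` the rule reduces to greedy selection of the best individual, and with `w=0.0` it picks the individual farthest from the best; intermediate weights trade exploitation against diversity.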