Ryoma SENDA Yoshiaki TAKATA Hiroyuki SEKI
A pushdown system (PDS) is known as an abstract model of recursive programs. For PDS, model checking methods have been studied and applied to various software verification such as interprocedural data flow analysis and malware detection. However, PDS cannot manipulate data values from an infinite domain. A register PDS (RPDS) is an extension of PDS by adding registers to deal with data values in a restricted way. This paper proposes algorithms for LTL model checking problems for RPDS with simple and regular valuations, which are labelings of atomic propositions to configurations with reasonable restriction. First, we introduce RPDS and related models, and then define the LTL model checking problems for RPDS. Second, we give algorithms for solving these problems and also show that the problems are EXPTIME-complete. As practical examples, we show solutions of a malware detection and an XML schema checking in the proposed framework.
Xiang WANG Xin LU Meiming FU Jiayi LIU Hongyan YANG
Leveraging on Network Function Virtualization (NFV) and Software Defined Networking (SDN), network slicing (NS) is recognized as a key technology that enables the 5G Infrastructure Provider (InP) to support diversified vertical services over a shared common physical infrastructure. 5G end-to-end (E2E) NS is a logical virtual network that spans across the 5G network. Existing works on improving the reliability of the 5G mainly focus on reliable wireless communications, on the other hand, the reliability of an NS also refers to the ability of the NS system to provide continued service. Hence, in this work, we focus on enhancing the reliability of the NS to cope with physical network node failures, and we investigate the NS deployment problem to improve the reliability of the system represented by the NS. The reliability of an NS is enhanced by two means: firstly, by considering the topology information of an NS, critical virtual nodes are backed up to allow failure recovery; secondly, the embedding of the augmented NS virtual network is optimized for failure avoidance. We formulate the embedding of the augmented virtual network (AVN) to maximize the survivability of the NS system as the survivable AVN embedding (S-AVNE) problem through an Integer Linear Program (ILP) formulation. Due to the complexity of the problem, a heuristic algorithm is introduced. Finally, we conduct intensive simulations to evaluate the performance of our algorithm with regard to improving the reliability of the NS system.
Uuganbayar GANBOLD Junya SATO Takuya AKASHI
Horizon detection is useful in maritime image processing for various purposes, such as estimation of camera orientation, registration of consecutive frames, and restriction of the object search region. Existing horizon detection methods are based on edge extraction. For accuracy, they use multiple images, which are filtered with different filter sizes. However, this increases the processing time. In addition, these methods are not robust to blurting. Therefore, we developed a horizon detection method without extracting the candidates from the edge information by formulating the horizon detection problem as a global optimization problem. A horizon line in an image plane was represented by two parameters, which were optimized by an evolutionary algorithm (genetic algorithm). Thus, the local and global features of a horizon were concurrently utilized in the optimization process, which was accelerated by applying a coarse-to-fine strategy. As a result, we could detect the horizon line on high-resolution maritime images in about 50ms. The performance of the proposed method was tested on 49 videos of the Singapore marine dataset and the Buoy dataset, which contain over 16000 frames under different scenarios. Experimental results show that the proposed method can achieve higher accuracy than state-of-the-art methods.
This article proposes to apply the auto-correlation function (ACF), bispectrum analysis, and convolutional neural networks (CNN) to implement radar emitter identification (REI) based on intrapulse features. In this work, we combine ACF with bispectrum for signal feature extraction. We first calculate the ACF of each emitter signal, and then the bispectrum of the ACF and obtain the spectrograms. The spectrum images are taken as the feature maps of the radar emitters and fed into the CNN classifier to realize automatic identification. We simulate signal samples of different modulation types in experiments. We also consider the feature extraction method directly using bispectrum analysis for comparison. The simulation results demonstrate that by combining ACF with bispectrum analysis, the proposed scheme can attain stronger robustness to noise, the spectrograms of our approach have more pronounced features, and our approach can achieve better identification performance at low signal-to-noise ratios.
Yu YAO Yuena MA Jingjie LV Hao SONG Qiang FU
In this paper, a special class of two-generator quasi-twisted (QT) codes with index 2 will be presented. We explore the algebraic structure of the class of QT codes and the form of their Hermitian dual codes. A sufficient condition for self-orthogonality with Hermitian inner product is derived. Using the class of Hermitian self-orthogonal QT codes, we construct two new binary quantum codes [[70, 42, 7]]2, [[78, 30, 10]]2. According to Theorem 6 of Ref.[2], we further can get 9 new binary quantum codes. So a total of 11 new binary quantum codes are obtained and there are 10 quantum codes that can break the quantum Gilbert-Varshamov (GV) bound.
Tomoya ITSUBO Michihiro KOIBUCHI Hideharu AMANO Hiroki MATSUTANI
Since deep learning workloads perform a large number of matrix operations on training data, GPUs (Graphics Processing Units) are efficient especially for the training phase. A cluster of computers each of which equips multiple GPUs can significantly accelerate the deep learning workloads. More specifically, a back-propagation algorithm following a gradient descent approach is used for the training. Although the gradient computation is still a major bottleneck of the training, gradient aggregation and optimization impose both communication and computation overheads, which should also be reduced for further shortening the training time. To address this issue, in this paper, multiple GPUs are interconnected with a PCI Express (PCIe) over 10Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected with network switches, gradient aggregation and optimizers (e.g., SGD, AdaGrad, Adam, and SMORMS3) are offloaded to FPGA-based 10GbE switches between remote GPUs; thus, the gradient aggregation and parameter optimization are completed in the network. The proposed FPGA-based 10GbE switches with the four optimizers are implemented on NetFPGA-SUME board. Their resource utilizations are increased by PEs for the optimizers, and they consume up to 56% of the resources. Evaluation results using four remote GPUs connected via the proposed FPGA-based switch demonstrate that these optimizers are accelerated by up to 3.0x and 1.25x compared to CPU and GPU implementations, respectively. Also, the gradient aggregation throughput by the FPGA-based switch achieves up to 98.3% of the 10GbE line rate.
Ryosuke KURAMOCHI Hiroki NAKAHARA
Convolutional neural networks (CNNs) are widely used for image processing tasks in both embedded systems and data centers. In data centers, high accuracy and low latency are desired for various tasks such as image processing of streaming videos. We propose an FPGA-based low-latency CNN inference for randomly wired convolutional neural networks (RWCNNs), whose layer structures are based on random graph models. Because RWCNNs have several convolution layers that have no direct dependencies between them, our architecture can process them efficiently using a pipeline method. At each layer, we need to use the calculation results of multiple layers as the input. We use an FPGA with HBM2 to enable parallel access to the input data with multiple HBM2 channels. We schedule the order of execution of the layers to improve the pipeline efficiency. We build a conflict graph using the scheduling results. Then, we allocate the calculation results of each layer to the HBM2 channels by coloring the graph. Because the pipeline execution needs to be properly controlled, we developed an automatic generation tool for hardware functions. We implemented the proposed architecture on the Alveo U50 FPGA. We investigated a trade-off between latency and recognition accuracy for the ImageNet classification task by comparing the inference performances for different input image sizes. We compared our accelerator with a conventional accelerator for ResNet-50. The results show that our accelerator reduces the latency by 2.21 times. We also obtained 12.6 and 4.93 times better efficiency than CPU and GPU, respectively. Thus, our accelerator for RWCNNs is suitable for low-latency inference.
Computing the Lempel-Ziv Factorization (LZ77) of a string is one of the most important problems in computer science. Nowadays, it has been widely used in many applications such as data compression, text indexing and pattern discovery, and already become the heart of many file compressors like gzip and 7zip. In this paper, we show a linear time algorithm called Xone for computing the LZ77, which has the same space requirement with the previous best space requirement for linear time LZ77 factorization called BGone. Xone greatly improves the efficiency of BGone. Experiments show that the two versions of Xone: XoneT and XoneSA are about 27% and 31% faster than BGoneT and BGoneSA, respectively.
Sashi NOVITASARI Sakriani SAKTI Satoshi NAKAMURA
Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to the target language in real-time. Such systems can be achieved by performing low-latency processing in ASR (automatic speech recognition) module before passing the output to MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several studies recently proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than the standard attention-based ASR because they have to decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR) that learns the knowledge from attention-based non-incremental ASR for a low delay end-to-end speech recognition. ISR comes with a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiment shows that AT-ISR achieves a comparable performance to the non-incremental ASR when the incremental recognition begins after the speech utterance reaches 25% of the complete utterance length. Additional experiments to investigate the effect of ISR on translation tasks are also performed. The focus is to find the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR resulted in the best translation quality with the lowest WER and the lowest uncovered-word rate.
Bitcoin is one of popular cryptocurrencies widely used over the world, and its blockchain technology has attracted considerable attention. In Bitcoin system, it has been reported that transactions are prioritized according to transaction fees, and that transactions with high priorities are likely to be confirmed faster than those with low priorities. In this paper, we consider performance modeling of Bitcoin-blockchain system in order to characterize the transaction-confirmation time. We first introduce the Bitcoin system, focusing on proof-of-work, the consensus mechanism of Bitcoin blockchain. Then, we show some queueing models and its analytical results, discussing the implications and insights obtained from the queueing models.
Kimiko MOTONAKA Tomoya KOSEKI Yoshinobu KAJIKAWA Seiji MIYOSHI
The Volterra filter is one of the digital filters that can describe nonlinearity. In this paper, we analyze the dynamic behaviors of an adaptive signal-processing system including the Volterra filter by a statistical-mechanical method. On the basis of the self-averaging property that holds when the tapped delay line is assumed to be infinitely long, we derive simultaneous differential equations in a deterministic and closed form, which describe the behaviors of macroscopic variables. We obtain the exact solution by solving the equations analytically. In addition, the validity of the theory derived is confirmed by comparison with numerical simulations.
Zhiyuan JIANG Yijie HUANG Shunqing ZHANG Shugong XU
In a heterogeneous unreliable multiaccess network, wherein terminals share a common wireless channel with distinct error probabilities, existing works have shown that a persistent round-robin (RR-P) scheduling policy can be arbitrarily worse than the optimum in terms of Age of Information (AoI) under standard Automatic Repeat reQuest (ARQ). In this paper, practical Hybrid ARQ (HARQ) schemes which are widely-used in today's wireless networks are considered. We show that RR-P is very close to optimum with asymptotically many terminals in this case, by explicitly deriving tight, closed-form AoI gaps between optimum and achievable AoI by RR-P. In particular, it is rigorously proved that for RR-P, under HARQ models concerning fading channels (resp. finite-blocklength regime), the relative AoI gap compared with the optimum is within a constant of 6.4% (resp. 6.2% with error exponential decay rate of 0.5). In addition, RR-P enjoys the distinctive advantage of implementation simplicity with channel-unaware and easy-to-decentralize operations, making it favorable in practice. A further investigation considering constraint imposed on the number of retransmissions is presented. The performance gap is indicated through numerical simulations.
Ruicong ZHI Caixia ZHOU Junwei YU Tingting LI Ghada ZAMZMI
Pain is an essential physiological phenomenon of human beings. Accurate assessment of pain is important to develop proper treatment. Although self-report method is the gold standard in pain assessment, it is not applicable to individuals with communicative impairment. Non-verbal pain indicators such as pain related facial expressions and changes in physiological parameters could provide valuable insights for pain assessment. In this paper, we propose a multimodal-based Stream Integrated Neural Network with Different Frame Rates (SINN) that combines facial expression and biomedical signals for automatic pain assessment. The main contributions of this research are threefold. (1) There are four-stream inputs of the SINN for facial expression feature extraction. The variant facial features are integrated with biomedical features, and the joint features are utilized for pain assessment. (2) The dynamic facial features are learned in both implicit and explicit manners to better represent the facial changes that occur during pain experience. (3) Multiple modalities are utilized to identify various pain states, including facial expression and biomedical signals. The experiments are conducted on publicly available pain datasets, and the performance is compared with several deep learning models. The experimental results illustrate the superiority of the proposed model, and it achieves the highest accuracy of 68.2%, which is up to 5% higher than the basic deep learning models on pain assessment with binary classification.
Hiroshi HASEGAWA Takuma YASUDA Yojiro MORI Ken-ichi SATO
We propose an efficient network upgrade and expansion method that can make the most of the next generation channel resources to accommodate further increases in traffic. Semi-flexible grid configuration and two cost metrics are introduced to establish a regularity in frequency assignment and minimize disturbance in the upgrade process; both reduce the fragmentation in frequency assignment and the number of fibers necessary. Various investigations of different configurations elucidate that the number of fibers necessary is reduced about 10-15% for any combination of upgrade scenario, channel frequency bandwidth, and topology adopted.
Shuhei NISHIYAMA Chonho LEE Tomohiro MASHITA
In this work, an optimization method for the 3D container loading problem with multiple constraints is proposed. The method consists of a genetic algorithm to generate an arrangement of cargo and a fitness evaluation using a physics simulation. The fitness function considers not only the maximization of the container density and fitness value but also several different constraints such as weight, stack-ability, fragility, and orientation of cargo pieces. We employed a container shaking simulation for the fitness evaluation to include constraint effects during loading and transportation. We verified that the proposed method successfully provides the optimal cargo arrangement for small-scale problems with about 10 pieces of cargo.
Lijun GAO Zhenyi BIAN Maode MA
DoS (Denial of Service) attacks are becoming one of the most serious security threats to global networks. We analyze the existing DoS detection methods and defense mechanisms in depth. In recent years, K-Means and improved variants have been widely examined for security intrusion detection, but the detection accuracy to data is not satisfactory. In this paper we propose a multi-dimensional space feature vector expansion K-Means model to detect threats in the network environment. The model uses a genetic algorithm to optimize the weight of K-Means multi-dimensional space feature vector, which greatly improves the detection rate against 6 typical Dos attacks. Furthermore, in order to verify the correctness of the model, this paper conducts a simulation on the NSL-KDD data set. The results show that the algorithm of multi-dimensional space feature vectors expansion K-Means improves the recognition accuracy to 96.88%. Furthermore, 41 kinds of feature vectors in NSL-KDD are analyzed in detail according to a large number of experimental training. The feature vector of the probability positive return of security attack detection is accurately extracted, and a comparison chart is formed to support subsequent research. A theoretical analysis and experimental results show that the multi-dimensional space feature vector expansion K-Means algorithm has a good application in the detection of DDos attacks.
You GAO Yun-Fei YAO Lin-Zhi SHEN
Permutation polynomials over finite fields have been widely studied due to their important applications in mathematics and cryptography. In recent years, 2-to-1 mappings over finite fields were proposed to build almost perfect nonlinear functions, bent functions, and the semi-bent functions. In this paper, we generalize the 2-to-1 mappings to m-to-1 mappings, including their construction methods. Some applications of m-to-1 mappings are also discussed.
Michiharu NAKAMURA Eisuke FUKUDA Yoshimasa DAIDO Keiichi MIZUTANI Takeshi MATSUMURA Hiroshi HARADA
Non-linear behavioral models play a key role in designing digital pre-distorters (DPDs) for non-linear power amplifiers (NLPAs). In general, more complex behavioral models have better capability, but they should be converted into simpler versions to assist implementation. In this paper, a conversion from a complex fifth order inverse of a parallel Wiener (PRW) model to a simpler memory polynomial (MP) model is developed by using frequency domain expressions. In the developed conversion, parameters of the converted MP model are calculated from those of original fifth order inverse and frequency domain statistics of the transmit signal. Since the frequency domain statistics of the transmit signal can be precalculated, the developed conversion is deterministic, unlike the conventional conversion that identifies a converted model from lengthy input and output data. Computer simulations are conducted to confirm that conversion error is sufficiently small and the converted MP model offers equivalent pre-distortion to the original fifth order inverse.
Ryosuke SUGA Kazuto OSHIMA Tomoki UWANO
In this paper, a planar balun having simple and compact features with slit ground was proposed. The operating frequency can be designed by the length and position of the defected ground slits. The 20 dB bandwidth of the common mode rejection ratio of the measuring balun was over 90%.
Hongjie XU Jun SHIOMI Hidetoshi ONODERA
Hardware accelerators are designed to support a specialized processing dataflow for everchanging deep neural networks (DNNs) under various processing environments. This paper introduces two hardware properties to describe the cost of data movement in each memory hierarchy. Based on the hardware properties, this paper proposes a set of evaluation metrics that are able to evaluate the number of memory accesses and the required memory capacity according to the specialized processing dataflow. Proposed metrics are able to analytically predict energy, throughput, and area of a hardware design without detailed implementation. Once a processing dataflow and constraints of hardware resources are determined, the proposed evaluation metrics quickly quantify the expected hardware benefits, thereby reducing design time.