Stanislav SEDUKHIN Yoichi TOMIOKA Kohei YAMAMOTO
In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure, or shape, of the Tensor Processing Engine (TPE) for CNN acceleration is systematically searched and evaluated. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for CNN acceleration. Because inter-block image data independence is supported, multiple such TPEs can be used together for additional CNN acceleration. Moreover, it is shown that the proposed TPE can be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the results of a real implementation of an SSD model to which a state-of-the-art channel pruning technique is applied.
Shuhei TAMATE Yutaka TABUCHI Yasunobu NAKAMURA
In this paper, we review the basic components of superconducting quantum computers. We mainly focus on the packaging and wiring technologies required to realize large-scale superconducting quantum computers.
The objective of the critical node problem is to minimize the pairwise connectivity of the residual graph after removing a specified number of nodes. From a mathematical modeling perspective, the more fragmented components there are and the more evenly the disconnected subgraphs are distributed, the better the quality of the solution. Based on this observation, we propose a new Cluster Expansion Method for the Critical Node Problem (CEMCNP), which on the one hand exploits a contraction mechanism to greedily simplify the sparse graph model, and on the other hand adopts an incremental cluster expansion approach to keep the size of each formed component within a reasonable limit. The proposed algorithm also relies heavily on the idea of multi-start iterative local search, and introduces a diversified late acceptance local search strategy to balance diversification and intensification during the neighborhood search. Extensive evaluations show that CEMCNP outperforms KBV on 35 of the 42 benchmark instances, while also attaining 3 of the previous best results on the challenging instances. In addition, CEMCNP performs on par with the existing MANCNP and VPMS algorithms on 22 of the 42 graph models while using fewer node exchange operations.
In this study, we aim to improve the performance of audio source separation for monaural mixture signals. For monaural audio source separation, semisupervised nonnegative matrix factorization (SNMF) can achieve higher separation performance by employing small supervised signals. In particular, penalized SNMF (PSNMF) with an orthogonality penalty is an effective method. PSNMF forces the two basis matrices for the target and nontarget sources to be orthogonal to each other, which improves the separation accuracy. However, the conventional orthogonality penalty is based on an inner product and does not properly affect the estimation of the basis matrix because of the scale indeterminacy between the basis and activation matrices in NMF. To cope with this problem, a new PSNMF that penalizes the cosine similarity between the basis matrices is proposed. The experimental comparison shows the efficacy of the proposed cosine similarity penalty in supervised audio source separation.
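The scale issue described in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' formulation: it assumes each basis matrix is stored as a list of basis vectors and that both penalties sum squared pairwise terms. All function names and the example vectors are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def inner_product_penalty(F, H):
    """Sum of squared inner products over all (target, nontarget) basis pairs."""
    return sum(dot(f, h) ** 2 for f in F for h in H)

def cosine_penalty(F, H):
    """Sum of squared cosine similarities over all basis pairs (assumed form)."""
    total = 0.0
    for f in F:
        for h in H:
            total += (dot(f, h) / math.sqrt(dot(f, f) * dot(h, h))) ** 2
    return total

def scale(M, c):
    """Multiply every basis vector in M by the scalar c."""
    return [[c * x for x in v] for v in M]

# Illustrative basis vectors (assumed values, not data from the paper).
F = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]  # target bases
H = [[1.0, 1.0, 0.0]]                   # nontarget basis

# Shrinking F (and compensating in the activations) leaves the NMF model
# unchanged, yet it shrinks the inner-product penalty without making the
# bases any more orthogonal. The cosine penalty is unaffected.
print(inner_product_penalty(F, H), inner_product_penalty(scale(F, 0.1), H))
print(cosine_penalty(F, H), cosine_penalty(scale(F, 0.1), H))
```

Because the cosine similarity normalizes each vector, rescaling a basis matrix cannot reduce the penalty; only a change in the angle between the bases can.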
Yu DAI Zijian ZHOU Fangguo ZHANG Chang-An ZHAO
Pairing computations on elliptic curves with odd prime embedding degrees are rarely studied because of their low efficiency. Recently, Clarisse, Duquesne and Sanders proposed two new curves with odd prime embedding degrees, BW13-P310 and BW19-P286, which are suitable for some special cryptographic schemes. In this paper, we propose efficient methods to compute the optimal ate pairing on this type of curve, instantiated by the BW13-P310 curve. We first extend the technique of lazy reduction to the finite field arithmetic. Then, we present a new method to execute Miller's algorithm. Compared with the standard Miller iteration formulas, the new ones provide a more efficient software implementation of pairing computations. Finally, we give a fast formula to perform the final exponentiation. Our implementation results indicate that the pairing can be computed efficiently, although it is slower than that over the BLS12-P446 curve at the same security level.
Ryota YOSHIMURA Ichiro MARUTA Kenji FUJIMOTO Ken SATO Yusuke KOBAYASHI
Particle filters have been widely used for state estimation problems in nonlinear and non-Gaussian systems. Their performance depends on the given system and measurement models, which need to be designed by the user for each target system. This paper proposes a novel method to design these models for a particle filter. It is a numerical optimization method in which the particle filter design process is cast into the framework of reinforcement learning by assigning the randomness included in both models of the particle filter to the policy of reinforcement learning. In this method, estimation by the particle filter is performed repeatedly, and the parameters that determine both models are gradually updated according to the estimation results. The advantage is that it can optimize various objective functions, such as the estimation accuracy of the particle filter, the variance of the particles, the likelihood of the parameters, and the regularization term of the parameters. We derive the conditions that guarantee that the optimization calculation converges with probability 1. Furthermore, to show that the proposed method can be applied to practical-scale problems, we design a particle filter for mobile robot localization, which is an essential technology for autonomous navigation. Numerical simulations demonstrate that the proposed method further improves the localization accuracy compared with the conventional method.
Da LI Yuanyuan WANG Rikuya YAMAMOTO Yukiko KAWAI Kazutoshi SUMIYA
Recently, machine learning approaches and the analysis of user movement history on mobile devices have attracted much attention. Generally, text data must be fed into a word embedding tool to obtain word vectors as a preprocessing step for machine learning. However, mobile devices can hardly afford the huge cost of high-dimensional vector calculation. Thus, a low-cost approach to analyzing user behavior and user movement history should be considered. To address this issue, we first convert the zip code and street house number, instead of textual address information, into vectors to reduce the cost of spatial vector calculation. Second, we propose a low-cost, high-performance method for calculating semantic and physical (real) distances that uses zip-code-based vectors. Finally, to verify the validity of the proposed method, we use US zip code data to calculate both semantic and physical distances and compare the results with those of the previous method. The experimental results show that the proposed method significantly improves the performance of distance calculation while keeping the cost low.
Wen SHAO Rei KAWAKAMI Takeshi NAEMURA
Previous studies on anomaly detection in videos have trained detectors to perform reconstruction and prediction tasks on normal data, so that frames on which task performance is low are detected as anomalies during testing. This paper proposes a new approach that involves sorting video clips by using a generative network structure. Our approach learns spatial contexts from appearances and temporal contexts from the order relationship of the frames. Experiments were conducted on four datasets, and we categorized the anomalous sequences by appearance and motion. Evaluations were conducted not only on each total dataset but also on each of the categories. Our method improved detection performance both on anomalies that differ from normal data in appearance and on those that differ in motion. Moreover, combining our approach with a prediction method produced improvements in precision at a high recall.
Convolutional neural networks (CNNs) have made extraordinary progress in image classification tasks. However, applying a CNN directly to image manipulation detection is less effective. To address this problem, we propose an image filtering layer and a multi-scale feature fusion module, which guide the model to perform image manipulation detection more accurately and effectively. A series of experiments shows that our model improves image manipulation detection compared with previous work.
Hao FANG Chi-Hua CHEN Dewang CHEN Feng-Jang HWANG
Aiming for accurate data-driven predictions of passenger walking time, this study proposes a novel neuron-network-based mixture probability (NNBMP) model with repetition learning (RL) to estimate the probability density distribution of passenger walking time (PWT) in the metro station. Our experiments on Fuzhou metro stations demonstrate that the proposed NNBMP-RL model achieved a mean absolute error, mean square error, and mean absolute percentage error of 0.0078, 1.33 × 10⁻⁴, and 19.41%, respectively, and that it outperformed all seven compared models. The developed NNBMP model, which accurately fits the PWT distribution in the metro station, is readily applicable to microscopic analyses of passenger flow.
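For readers unfamiliar with the three error measures reported above, the following is a minimal sketch of their standard definitions. How the paper samples the true and estimated densities is not stated in the abstract, so the inputs here are simply two equal-length lists of illustrative values, and the function names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_square_error(y_true, y_pred):
    """MSE: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; assumes every true value is nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only (not data from the paper).
y_true = [2.0, 4.0]
y_pred = [1.0, 5.0]
print(mean_absolute_error(y_true, y_pred))
print(mean_square_error(y_true, y_pred))
print(mean_absolute_percentage_error(y_true, y_pred))
```

MAPE is relative to the true value, which is why it can be large (as in the 19.41% reported above) even when the absolute errors are small.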
This letter proposes a post-processing method to improve the smoothness and safety of the path for an autonomous vehicle navigating in an urban environment. The proposed method transforms the initial path given by local path planning algorithms using a stochastic approach to improve its smoothness and safety. Using the proposed method, the initial path is efficiently transformed by iteratively updating the position of each waypoint within it. The proposed method also guarantees the feasibility of the transformed path. Experimental results verify that the proposed method can improve the smoothness and safety of the initial path and ensure the feasibility of the transformed path.
Yanyan ZHANG Meiling SHEN Wensheng YANG
We propose a target detection network (RMF-Net) based on a multi-scale strategy to solve the problems of large differences in detection scale and mutual occlusion, which result in inaccurate localization. A multi-layer feature fusion module and a multi-expansion dilated convolution pyramid module were designed based on the ResNet-101 residual network. The ability of the network to express the multi-scale features of the target is improved by combining the shallow and deep features of the target and expanding the receptive field of the network. Moreover, RoI Align pooling was introduced to mitigate the loss of anchor-box accuracy caused by multiple quantizations, thereby improving positioning accuracy. Finally, an AD-IoU loss function was designed, which adaptively optimises the distance between the prediction box and the real box by jointly considering the overlap rate, centre distance, and aspect ratio of the boxes, and which improves the detection accuracy for occluded targets. Ablation experiments on the RMF-Net model verified the effectiveness of each factor in improving the network detection accuracy. Comparative experiments with various target detection algorithms based on convolutional neural networks were conducted on the Pascal VOC2007 and Pascal VOC2012 datasets. The results demonstrated that RMF-Net exhibited strong scale adaptability at different occlusion rates. The detection accuracy reached 80.4% and 78.5% on the two datasets, respectively.
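The three geometric quantities that the AD-IoU loss is said to consider can be computed as in the sketch below. This is not the AD-IoU loss itself, whose definition is not given in the abstract; it only shows the overlap rate (IoU), centre distance, and aspect ratio for axis-aligned boxes, assuming a hypothetical `(x1, y1, x2, y2)` corner format.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def centre_distance(a, b):
    """Euclidean distance between the two box centres."""
    ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return math.hypot(ax - bx, ay - by)

def aspect_ratio(a):
    """Width divided by height; assumes a box with positive height."""
    return (a[2] - a[0]) / (a[3] - a[1])

# Illustrative boxes (assumed values).
pred = (0.0, 0.0, 2.0, 2.0)
real = (1.0, 1.0, 3.0, 3.0)
print(iou(pred, real), centre_distance(pred, real), aspect_ratio(pred))
```

A loss built only on IoU gives no gradient signal once boxes stop overlapping, which is one common motivation for adding centre-distance and aspect-ratio terms.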
Xiuping PENG Hongxiao LI Hongbin LIN
In this letter, the almost binary sequence (a sequence with a single zero element) is considered as a special class of binary sequence. Four new bounds on the cross-correlation of balanced (almost) binary sequences with period Q ≡ 1 (mod 4), under the precondition of out-of-phase autocorrelation values {-1} or {1, -3}, are first presented. Then, seven new pairs of balanced (almost) binary sequences of period Q with ideal or optimal autocorrelation values and meeting the lower cross-correlation bounds are proposed by using cyclotomic classes of order 4. These new bounds of (almost) binary sequences with period Q achieve smaller maximum out-of-phase autocorrelation values and cross-correlation values.
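The autocorrelation value sets {-1} and {1, -3} mentioned above can be observed on a small classical example. The sketch below is an illustration only: it uses Legendre sequences (cyclotomic classes of order 2), not the order-4 constructions of the letter, and the function names are hypothetical.

```python
def legendre_sequence(p, first):
    """Sequence of prime period p: +1 at quadratic residues, -1 at
    non-residues, and `first` at index 0 (1 -> binary, 0 -> almost binary)."""
    residues = {(x * x) % p for x in range(1, p)}
    return [first] + [1 if i in residues else -1 for i in range(1, p)]

def periodic_correlation(a, b):
    """Periodic cross-correlation of a and b at every shift t;
    passing the same sequence twice gives the autocorrelation."""
    n = len(a)
    return [sum(a[i] * b[(i + t) % n] for i in range(n)) for t in range(n)]

p = 13  # a prime period with p ≡ 1 (mod 4)
binary = legendre_sequence(p, first=1)
almost = legendre_sequence(p, first=0)

# Out-of-phase values are those at shifts t = 1, ..., p - 1.
print(sorted(set(periodic_correlation(binary, binary)[1:])))
print(sorted(set(periodic_correlation(almost, almost)[1:])))
print(periodic_correlation(binary, almost))
```

For such a period, the binary version takes the out-of-phase autocorrelation values {1, -3}, while replacing the first element with zero yields the single value -1.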
Tongzhou QU Zibin DAI Yanjiang LIU Lin CHEN Xianzhao XIA
The existing research on Amdahl's law is limited to multi/many-core processors and cannot be applied to the important parallel processing architecture of coarse-grained reconfigurable arrays. This paper studies the relation between the multi-level parallelism of block cipher algorithms and the architectural characteristics of coarse-grained reconfigurable arrays. We introduce the key variables that affect the performance of reconfigurable arrays, such as communication overhead and configuration overhead, into Amdahl's law. On this basis, we propose a performance model for the coarse-grained reconfigurable block cipher array (CGRBA) based on the extended Amdahl's law. In addition, this paper establishes an optimal integer nonlinear programming model, which can provide a parameter reference for the architecture design of CGRBA. The experimental results show that: (1) reducing the communication workload ratio and reasonably increasing the number of configuration pages can significantly improve the algorithm performance on CGRBA; (2) the communication workload ratio has a linear effect on the execution time.
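As a point of reference, the classical Amdahl's law and one simple way of adding overhead terms to it can be sketched as follows. The additive overhead model and its parameter names (`comm_ratio`, `config_ratio`) are assumptions made for illustration; the abstract does not give the paper's actual CGRBA model.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Classical Amdahl's law: speedup on n_units processing units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units)

def normalized_time(parallel_fraction, n_units, comm_ratio=0.0, config_ratio=0.0):
    """Execution time relative to the serial run, with hypothetical additive
    communication and configuration overheads (fractions of serial time)."""
    serial_part = 1.0 - parallel_fraction
    parallel_part = parallel_fraction / n_units
    return serial_part + parallel_part + comm_ratio + config_ratio

# Illustrative numbers only.
print(amdahl_speedup(0.9, 10))
for comm in (0.0, 0.05, 0.10):
    print(comm, normalized_time(0.9, 10, comm_ratio=comm))
```

In this additive form the execution time grows linearly with the communication ratio, which is at least consistent with result (2) above, although the paper's model may differ in its details.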
Fei ZHANG Peining ZHEN Dishan JING Xiaotang TANG Hai-Bao CHEN Jie YAN
With the rapid growth of smart and Internet of Things (IoT) devices, intrusion has become one of the major security issues of the Internet, and it is important to detect attacks and raise alarms in IoT systems. In this paper, a method based on the support vector machine (SVM) and principal component analysis (PCA) is used to detect attacks in smart IoT systems. An SVM with a nonlinear scheme is used for intrusion classification, and PCA is adopted for feature selection on the training and testing datasets. Experiments on the NSL-KDD dataset show that, for binary classification, the test accuracy of the proposed method reaches 82.2% with 16 features selected by PCA, which is almost the same as the result obtained with all 41 features; for multi-class classification, the test accuracy reaches 78.3% with 29 features selected by PCA, compared with 79.6% without feature selection. For Denial of Service (DoS) attacks, the detection accuracy of the proposed method is 8.8% higher than that of an existing artificial-neural-network-based method.
This paper evaluates Bluetooth Low Energy (BLE) positioning systems that use sparse training data through comparison experiments. The sparse training data are extracted from a database that contains enough data to realize highly accurate and precise positioning. First, we define the sparse training data for BLE positioning systems in terms of the data collection time and the numbers of smartphones, directions, beacons, and reference points. Next, positioning performance evaluation experiments are conducted in two indoor environments: a corridor as a one-dimensionally spread environment and a hall as a two-dimensionally spread environment. The algorithms compared are the conventional fingerprint algorithm and the hybrid algorithm previously proposed by the authors, which combines the proximity algorithm and the fingerprint algorithm. Based on the results, we confirm that the hybrid algorithm performs well in many cases even when using sparse training data. Consequently, the robustness of the hybrid algorithm to sparse training data is shown.
Jinkyu KANG Seongah JEONG Hoojin LEE
In this letter, we analyze the error rate performance of M-ary coherent free-space optical (FSO) communications under strong atmospheric turbulence. Specifically, we derive the exact error rates for M-ary phase shift keying (MPSK) and M-ary quadrature amplitude modulation (MQAM) based on the moment-generating function (MGF) under negative-exponential-distributed turbulence, where a maximum ratio combining (MRC) receiver is adopted to mitigate the turbulence effects. Additionally, by evaluating the asymptotic error rate in the high signal-to-noise ratio (SNR) regime, it is possible to effectively investigate and predict the error rate performance for various system configurations. The accuracy and the effectiveness of our theoretical analyses are verified via numerical results.
Zhimin GUO Jianfei CHEN Sheng ZHANG
Millimeter wave synthetic aperture interferometric radiometers (SAIR) are very powerful instruments that can effectively realize high-precision imaging detection. However, due to interference factors and complex near-field errors, the imaging quality of near-field SAIR is usually not ideal. To achieve better imaging results, a new fully connected imaging network (FCIN) is proposed for near-field SAIR. In FCIN, a fully connected network is first used to reconstruct the image directly from the visibility function, and a residual dense network is then used for image denoising and enhancement. The simulation results show that the proposed FCIN method achieves high imaging accuracy and shortens the imaging time.
Wen SHI Jianling LIU Jingyu ZHANG Yuran MEN Hongwei CHEN Deke WANG Yang CAO
Syndrome is a crucial principle of Traditional Chinese Medicine. Formula classification is an effective approach to discover herb combinations for the clinical treatment of syndromes. In this study, a local search based firefly algorithm (LSFA) for parameter optimization and feature selection of support vector machines (SVMs) for formula classification is proposed. Parameters C and γ of SVMs are optimized by LSFA. Meanwhile, the effectiveness of herbs in formula classification is adopted as a feature. LSFA searches for well-performing subsets of features to maximize classification accuracy. In LSFA, a local search of fireflies is developed to improve FA. Simulations demonstrate that the proposed LSFA-SVM algorithm outperforms other classification algorithms on different datasets. Parameters C and γ and the features are optimized by LSFA to obtain better classification performance. The performance of FA is enhanced by the proposed local search mechanism.
Software-defined networking (SDN) decouples the control and forwarding of network devices, providing benefits such as simplified control. However, due to cost constraints and other factors, SDN is difficult to fully deploy. It has been proposed that SDN devices can be incrementally deployed in a traditional IP network, i.e., hybrid SDN, to provide partial SDN benefits. Studies have shown that better traffic engineering performance can be achieved by modifying the coverage and placement of SDN devices in hybrid SDN, because they can influence the behavior of legacy switches through certain strategies. However, it is difficult to develop and execute a traffic engineering strategy in hybrid SDN. This article proposes a routing algorithm to achieve approximate load balancing, which minimizes the maximum link utilization by using the optimal solution of linear programming and merging the minimum split traffic flows. A multipath forwarding mechanism under the same problem is designed to optimize transmission time. Experiments show that our algorithm has certain advantages in link utilization and transmission time compared to traditional distributed routing algorithms like OSPF and some hybrid SDN routing mechanisms. Furthermore, our algorithm can approximate the control effect of full SDN when the deployment rate of SDN devices is 40%.
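The objective named above, the maximum link utilization, can be made concrete with a small sketch. This is not the article's routing algorithm: it only evaluates the objective for a given routing, using a hypothetical four-link topology, link capacities, and traffic volumes chosen for illustration.

```python
def max_link_utilization(capacity, routed_flows):
    """Return the largest load/capacity ratio over all links.

    capacity: dict mapping a directed link (u, v) to its capacity.
    routed_flows: list of (path, volume) pairs, where path is a node list.
    """
    load = {link: 0.0 for link in capacity}
    for path, volume in routed_flows:
        # Each consecutive node pair on the path is one traversed link.
        for link in zip(path, path[1:]):
            load[link] += volume
    return max(load[link] / capacity[link] for link in capacity)

# Two disjoint paths from s to t, every link with capacity 10 (assumed values).
capacity = {("s", "a"): 10.0, ("a", "t"): 10.0,
            ("s", "b"): 10.0, ("b", "t"): 10.0}

single_path = [(["s", "a", "t"], 12.0)]                          # no splitting
split_paths = [(["s", "a", "t"], 6.0), (["s", "b", "t"], 6.0)]   # even split

print(max_link_utilization(capacity, single_path))
print(max_link_utilization(capacity, split_paths))
```

Splitting the demand across both paths halves the maximum utilization in this toy case, which is the kind of improvement a load-balancing routing strategy aims for.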