Masayuki SHIMODA Youki SADA Ryosuke KURAMOCHI Shimpei SATO Hiroki NAKAHARA
In the realization of convolutional neural networks (CNNs) in resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI*, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, where a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.
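The equalization step described above can be illustrated with a minimal sketch. It assumes a magnitude-based criterion (keep the largest absolute weights in each filter), which the abstract does not specify; the function name `prune_filterwise` and the parameter `keep` are hypothetical, and retraining with distillation is not shown.

```python
def prune_filterwise(filters, keep):
    """Zero all but the `keep` largest-magnitude weights in each filter.

    Assumption: magnitude is the pruning criterion; the abstract only
    states that every filter ends up with the same number of nonzeros.
    """
    pruned = []
    for weights in filters:
        # Rank weight indices by descending |w|; ties go to the lower index.
        order = sorted(range(len(weights)), key=lambda i: (-abs(weights[i]), i))
        kept = set(order[:keep])
        pruned.append([w if i in kept else 0.0 for i, w in enumerate(weights)])
    return pruned


# Two flattened filters with four weights each (illustrative values).
filters = [
    [0.9, -0.1, 0.05, -0.7],
    [0.2, 0.3, -0.4, 0.01],
]
pruned = prune_filterwise(filters, keep=2)
nonzeros = [sum(1 for w in f if w != 0.0) for f in pruned]
print(pruned)
print(nonzeros)
```

Because every filter retains the same number of nonzero weights, each filter needs the same number of multiply-accumulate steps when zeros are skipped, which is what makes running filters concurrently straightforward.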
Hiroyuki OKUDA Nobuto SUGIE Tatsuya SUZUKI Kentaro HARAGUCHI Zibo KANG
Path planning and motion control are fundamental components for realizing safe and reliable autonomous driving. The division of roles between these two components, however, is somewhat obscure because of the strong mathematical interaction between them. This often results in redundant computation in the implementation. One attractive idea for overcoming this redundancy is simultaneous path planning and motion control (SPPMC) based on a model predictive control framework. SPPMC finds the optimal control input considering not only the vehicle dynamics but also various constraints that reflect physical limitations, safety requirements, and so on, to achieve the goal of a given behavior. When driving in a real traffic environment, decision making also interacts strongly with planning and control. This interaction is even more pronounced when several tasks are switched depending on context to realize higher-level tasks. This paper presents a basic idea for integrating decision making, path planning, and motion control that can be executed in real time. In particular, lane-changing behavior, together with the decision of its initiation, is selected as the target task. The proposed idea is based on nonlinear model predictive control and appropriate switching of its cost function and constraints. As a result, the decision of the initiation, the planning, and the control of the lane-changing behavior are achieved by solving a single optimization problem under several constraints such as safety. The validity of the proposed method is tested by using a vehicle simulator.
Chenxu WANG Yutong LU Zhiguang CHEN Junnan LI
Training deep learning (DL) networks is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High performance computing clusters, especially supercomputers, are equipped with abundant computing resources, storage resources, and efficient interconnects, which can train DL networks better and faster. In this paper, we propose a method to train DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which can make full use of hardware resources and greatly increase computational efficiency. Second, we present a two-level parameter synchronization scheme which can reduce communication overhead by transmitting parameters of the first-layer models in shared memory. Third, we optimize the parallel I/O by making each reader read data as contiguously as possible to avoid the high overhead of discontinuous data reading. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has substantial performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) can increase computing efficiency by about 20 times.
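The two-level synchronization idea can be sketched as follows, as an illustration only. The sketch assumes that gradients are first averaged among the workers of one node (the step that could use shared memory) and that the per-node averages are then averaged across nodes; the abstract does not give these details, and the names `two_level_average` and `flat_average` are hypothetical.

```python
def two_level_average(node_gradients):
    """Average gradients within each node, then across nodes.

    Assumption: every node hosts the same number of workers, so the mean
    of the per-node means equals the global mean.
    """
    # Level 1: intra-node averaging (cheap, e.g. via shared memory).
    node_means = [sum(grads) / len(grads) for grads in node_gradients]
    # Level 2: inter-node averaging (one value per node crosses the network).
    return sum(node_means) / len(node_means)


def flat_average(node_gradients):
    """Reference: average over all workers with no hierarchy."""
    flat = [g for grads in node_gradients for g in grads]
    return sum(flat) / len(flat)


# Two nodes with two workers each; one scalar gradient per worker.
node_gradients = [[1.0, 3.0], [5.0, 7.0]]
hierarchical = two_level_average(node_gradients)
flat = flat_average(node_gradients)
print(hierarchical, flat)
```

With equal workers per node the two results coincide, while only one value per node has to cross the interconnect in the second level.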
Haitao XIE Qingtao FAN Qian XIAO
Recommender systems (RS) continue to draw attention from academia, and collaborative filtering (CF) is the most successful technique for building RS. To overcome the inherent limitation of CF known as data sparsity, various solutions have been proposed to incorporate additional social information, such as trust networks, into recommendation processes. However, existing methods struggle with multi-source data integration (i.e., fusion of social information and ratings), which is the basis for similarity calculation of user preferences. To this end, we propose a social collaborative filtering method based on novel trust metrics. Firstly, we use Graph Convolutional Networks (GCNs) to learn the associations between social information and user ratings while considering the underlying social network structures. Secondly, we measure the direct-trust values between neighbors by representing multi-source data as user ratings on popular items, and then calculate the indirect-trust values based on trust propagation. Thirdly, we employ all trust values to create a social regularization term in user-item rating matrix factorization in order to avoid overfitting. The experiments on real datasets show that our approach outperforms other state-of-the-art methods in using multi-source data to alleviate data sparsity.
Tomohiro TAKAHASHI Katsumi KONISHI Kazunori URUMA Toshihiro FURUKAWA
This paper proposes an image inpainting algorithm based on multiple linear models and matrix rank minimization. Several inpainting algorithms have been previously proposed based on the assumption that an image can be modeled using autoregressive (AR) models. However, these algorithms perform poorly when applied to natural photographs because they assume that an image is modeled by a position-invariant linear model with a fixed model order. In order to improve inpainting quality, this work introduces a multiple AR model and proposes an image inpainting algorithm based on multiple matrix rank minimization with sparse regularization. In doing so, a practical algorithm is provided based on the iterative partial matrix shrinkage algorithm, with numerical examples showing the effectiveness of the proposed algorithm.
How to restore a virtual network after a substrate network failure (e.g., a link cut) is one of the key challenges of network virtualization. Traditional virtual network recovery (VNR) methods are mostly based on the idea of centralized control. However, if multiple virtual networks fail at the same time, their recovery processes are usually queued according to a specific priority, which may increase the average waiting time of users. In this letter, we study a distributed virtual network recovery (DVNR) method to improve virtual network recovery efficiency. We establish an exclusive virtual machine (VM) for each virtual network and process the recovery requests of multiple virtual networks in parallel. Simulation results show that the proposed DVNR method achieves a recovery success rate close to that of the centralized VNR method while yielding ~70% less average recovery time.
Takahiro OGAWA Keisuke MAEDA Miki HASEYAMA
An inpainting method via sparse representation based on a new phaseless quality metric is presented in this paper. Since power spectra, i.e., phaseless features, of local regions within images represent their texture characteristics more successfully than their pixel values do, a new quality metric based on these phaseless features is derived for image representation. Specifically, the proposed method enables sparse representation of target signals, i.e., target patches, including missing intensities by monitoring errors converged by phase retrieval as the novel phaseless quality metric. This is the main contribution of our study. In this approach, the phase retrieval algorithm used in our method has the following two important roles: (1) derivation of the new quality metric, which can be derived even for images including missing intensities, and (2) conversion of phaseless features, i.e., power spectra, to pixel values, i.e., intensities. Therefore, this novel approach solves the existing problem of not being able to use better features or better quality metrics for inpainting. Results of experiments showed that the proposed method using sparse representation based on the new phaseless quality metric outperforms previously reported methods that directly use pixel values for inpainting.
Lucas Saad Nogueira NUNES Jacir Luiz BORDIM Yasuaki ITO Koji NAKANO
The volume of digital information is growing at an extremely fast pace which, in turn, exacerbates the need for efficient mechanisms to find the presence of a pattern in an input text or a set of input strings. Combining the processing power of a Graphics Processing Unit (GPU) with matching algorithms seems a natural way to speed up the string-matching process. This work proposes a Parallel Rabin-Karp implementation (PRK) that incorporates a fast parallel prefix-sums algorithm to maximize parallelization and accelerate the matching verification. Given an input text T of length n and p patterns of length m, the proposed implementation finds all occurrences of the p patterns in T in O(m+q+n/τ+nm/q) time, where q is a sufficiently large prime number and τ is the available number of threads. Sequential and parallel versions of the PRK have been implemented. Experiments have been executed on p≥1 patterns of length m=10, 20, 30 characters, which are compared against a text string of length n=2^27. The results show that the parallel implementation of the PRK algorithm on an NVIDIA V100 GPU provides a speedup surpassing 372 times when compared to the sequential implementation and a speedup of 12.59 times against an OpenMP implementation running on a multi-core server with 128 threads. Compared to another prominent GPU implementation, the PRK implementation attained a speedup surpassing 37 times.
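For readers unfamiliar with the underlying technique, the following is a minimal sequential Rabin-Karp sketch for a single pattern. It is the textbook rolling-hash algorithm, not the parallel prefix-sums implementation described above; the function name, the default prime `q`, and the `base` value are illustrative assumptions.

```python
def rabin_karp(text, pattern, q=1_000_003, base=256):
    """Return all start indices where `pattern` occurs in `text`.

    Sequential textbook version; `q` (a prime modulus) and `base` are
    illustrative defaults, not values taken from the abstract.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, q)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % q
        t_hash = (t_hash * base + ord(text[i])) % q
    matches = []
    for s in range(n - m + 1):
        # A hash match may be a collision, so verify the characters.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % q
    return matches


hits = rabin_karp("abracadabra", "abra")
print(hits)
```

The verification step after each hash match is where collisions are filtered out; a larger prime `q` makes such collisions less frequent.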
Nozomi HAGA Jerdvisanop CHAKAROTHAI Keisuke KONNO
The impedance expansion method (IEM) is a circuit-modeling technique for electrically small devices based on the method of moments. In a previous study, a circuit model of a wireless power transfer (WPT) system was developed by utilizing the IEM and eigenmode analysis. However, this technique assumes that all the coupling elements (e.g., feeding loops and resonant coils) are in the absence of neighboring scatterers (e.g., bodies of vehicles). This study extends the theory of the IEM to obtain the circuit model of a WPT system in the vicinity of a perfectly conducting scatterer (PCS). The numerical results show that the proposed method can be applied to the frequencies at which the dimension of the PCS is less than approximately a quarter wavelength. In addition, the resulting circuit model is found to be valid in the operating frequency band.
Takumi FUJITSUKA Keigo TAKEUCHI
Pilot contamination is addressed in massive multiple-input multiple-output (MIMO) uplink. The main ideas of pilot decontamination are twofold: One is to design the transmission timing of pilot sequences such that the pilot transmission periods in different cells do not fully overlap with each other, as considered in previous works. The other is joint channel and data estimation via approximate message-passing (AMP) for bilinear inference. The convergence property of conventional AMP is poor in bilinear inference problems, so adaptive damping has been required to help conventional AMP converge. The main contribution of this paper is a modification of the update rules in conventional AMP to improve the convergence property of AMP. Numerical simulations show that the proposed AMP outperforms conventional AMP in terms of estimation performance when adaptive damping is not used. Furthermore, it achieves better performance than state-of-the-art methods based on subspace estimation when the power difference between cells is small.
Xiaoxuan GUO Renxi GONG Haibo BAO Zhenkun LU
It is well known that large-scale integration of wind power into the power system affects the economic and environmental objectives of power generation scheduling, and that the intermittency and randomness of wind power bring new challenges to traditional deterministic power generation scheduling. To deal with these problems, a multiobjective optimization dispatch method for wind-thermal power systems is proposed. The method proceeds as follows. First, a multiobjective interval power generation scheduling model of the wind-thermal power system is established by describing the wind speed at the wind farm as an interval variable, with minimization of the fuel cost and the pollution gas emission cost of the thermal power units chosen as the objective functions. Then, the optimistic and pessimistic Pareto frontiers of the multiobjective interval power generation scheduling are obtained by solving the model with an improved normal boundary intersection (NBI) method combined with a bilevel optimization method. Finally, the optimistic and pessimistic compromise solutions are determined by a distance evaluation method. The calculation results for the 16-unit 174-bus system show that the proposed method yields uniform optimistic and pessimistic Pareto frontiers and that the impact of wind speed interval uncertainty on the economic and environmental indicators can be quantified. In addition, it is verified that the Pareto front in the actual scenario lies between the optimistic and pessimistic Pareto fronts, and the influence of different wind power access levels on the optimistic and pessimistic Pareto fronts is analyzed.
Tsutomu INAMOTO Yoshinobu HIGAMI
In this paper, we aim to develop technologies for circuit fault diagnosis and propose a formulation of a measure of a test pattern for circuit fault diagnosis. Given a faulty circuit, fault diagnosis deduces the locations of the faults that have occurred in the circuit. Fault diagnosis is executed in software before failure analysis, in which engineers inspect physical defects, and it helps to improve the manufacturing process that yielded the faulty circuits. The heart of fault diagnosis is to distinguish between candidate faults by using test patterns, which are applied to the circuit-under-diagnosis (CUD); thus, test patterns that can distinguish as many faults as possible need to be generated. This fact motivates us to consider a test pattern measure based on the number of fault pairs that become distinguished by a test pattern. To the best of the authors' knowledge, computing that measure has required time of order O(N_F^2), where N_F denotes the number of candidate faults. Since N_F is generally large for real industrial circuits, the computational time of the measure is long even when a high-performance computer is used. The formulation proposed in this paper makes it possible to calculate the measure with a computational complexity of O(N_F log N_F), and thus the measure becomes useful for test pattern selection in fault diagnosis. In computational experiments, the effectiveness of the formulation is demonstrated through sample computational times of the measure calculated by the traditional and the proposed formulae and through thorough comparisons between several greedy heuristics based on the measure.
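One way to see why counting distinguished fault pairs need not take quadratic time is sketched below. It is an illustration under assumptions, not the formulation proposed in the paper: each candidate fault is assumed to produce one observable response to a test pattern, two faults are treated as distinguished when their responses differ, and the count is obtained by grouping equal responses instead of comparing every pair. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def distinguished_pairs_naive(responses):
    """Compare every pair of faults: O(N_F^2) comparisons."""
    return sum(1 for a, b in combinations(responses, 2) if a != b)


def distinguished_pairs_grouped(responses):
    """Count all pairs, then subtract pairs that share a response.

    Grouping by hashing is used here; sorting the responses first would
    give the same count in O(N_F log N_F) time.
    """
    n = len(responses)
    total_pairs = n * (n - 1) // 2
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(responses).values())
    return total_pairs - same_pairs


# Hypothetical output responses of six candidate faults to one test pattern.
responses = ["01", "01", "10", "11", "10", "01"]
naive = distinguished_pairs_naive(responses)
grouped = distinguished_pairs_grouped(responses)
print(naive, grouped)
```

Both functions return the same count; the grouped version touches each fault once rather than once per pair, which matters when the number of candidate faults is large.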
Yuta NAKAHARA Toshiyasu MATSUSHIMA
A spatially “Mt. Fuji” coupled (SFC) low-density parity-check (LDPC) ensemble is a modified version of the spatially coupled (SC) LDPC ensemble. Its decoding error probability in the waterfall region has been studied only experimentally. In this paper, we theoretically analyze it over the binary erasure channel by modifying the expected graph evolution (EGE) and covariance evolution (CE) that have been used to analyze the original SC-LDPC ensemble. In particular, we derive the initial condition modified for the SFC-LDPC ensemble. Then, unlike the SC-LDPC ensemble, the SFC-LDPC ensemble has a local minimum on the solution of the EGE and CE. Considering this property, we theoretically expect that the waterfall curve of the SFC-LDPC ensemble is steeper than that of the SC-LDPC ensemble. We also confirm this by numerical experiments.
Jinu GONG Hoojin LEE Rumin YANG Joonhyuk KANG
The two-ray (TR) fading model is one of the fading models used to represent a worst-case fading scenario. We derive exact closed-form expressions of the generalized moment generating function (G-MGF) for the TR fading model, which enable us to analyze numerous types of wireless communication applications. Among them, we derive several analytical results for the TR fading model, including the exact ergodic capacity along with asymptotic expressions and energy detection performance. Finally, we provide numerical results to validate our analysis.
Tsugumichi SHIBATA Yoshito KATO
Capacitive coupling of line-coded and DC-balanced digital signals is often used to eliminate steady bias current flow between the systems or components in various communication systems. A multi-layer ceramic chip capacitor is promising as a capacitor for very broadband signal coupling because of the high-frequency characteristics expected from the downsizing of the chip in recent years. The lower limit of the coupling bandwidth is determined by the capacitance, while the upper limit is affected by the parasitic inductance associated with the chip structure. In this paper, we investigate the coupling characteristics up to millimeter-wave frequencies by measurement and simulation. A phenomenon has been found in which a change in the current distribution in the chip structure occurs at high frequencies and the coupling characteristics are improved compared to the prediction based on the conventional equivalent circuit model. A new equivalent circuit model of the chip capacitor that can express this improvement is proposed.
Yuta MATSUMOTO Ken MISHINA Daisuke HISANO Akihiro MARUTA
In inter-data center networks where high transmission capacity and spectral efficiency are required, a 16QAM format is deployed. On the other hand, in intra-data center networks, a PAM4 format is deployed to meet the demand for a simple and low-cost transceiver configuration. For a seamless and effective connection of such heterogeneous networks without using optical-electrical-optical conversion, an all-optical modulation format conversion technique is required. In this paper, we propose an all-optical PAM4 to 16QAM modulation format conversion using a nonlinear optical loop mirror. The successful conversion operation from 2 × 26.6-Gbaud PAM4 signals to a 100-Gbps class 16QAM signal is verified by numerical simulation. Compared with an ideal 16QAM signal, the power penalty of the converted 16QAM signal can be kept within 0.51 dB.
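The rate figures above can be checked with a small idealized sketch. It assumes that one PAM4 stream maps to the in-phase axis and the other to the quadrature axis of the 16QAM signal; this is a simplification for counting symbols and bits, not a model of the nonlinear optical loop mirror, and the level values and names are illustrative.

```python
# Idealized, normalized PAM4 amplitude levels (assumption for illustration).
PAM4_LEVELS = (-3, -1, 1, 3)


def to_16qam(pam4_i, pam4_q):
    """Pair two PAM4 symbol streams as the I and Q parts of 16QAM symbols."""
    return [complex(i, q) for i, q in zip(pam4_i, pam4_q)]


# Every (I, Q) combination gives a distinct point: 4 x 4 = 16 points.
constellation = {complex(i, q) for i in PAM4_LEVELS for q in PAM4_LEVELS}

# Bit rate carried by two 26.6-Gbaud PAM4 streams at 2 bits per symbol.
symbol_rate_gbaud = 26.6
bits_per_pam4_symbol = 2
bit_rate_gbps = 2 * symbol_rate_gbaud * bits_per_pam4_symbol

symbols = to_16qam([3, -1], [1, -3])
print(len(constellation), bit_rate_gbps, symbols)
```

Two 2-bit streams at 26.6 Gbaud carry 106.4 Gbps in total, which matches one 4-bit 16QAM stream at the same symbol rate and is consistent with the "100-Gbps class" figure quoted above.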
Ryo SHIBATA Gou HOSOYA Hiroyuki YASHIMA
We propose a coding/decoding strategy that surpasses the symmetric information rate of a binary insertion/deletion (ID) channel and approaches the Markov capacity of the channel. The proposed codes comprise inner trellis codes and outer irregular low-density parity-check (LDPC) codes. The trellis codes are designed to mimic the transition probabilities of a Markov input process that achieves a high information rate, whereas the LDPC codes are designed to maximize an iterative decoding threshold in the superchannel (concatenation of the ID channels and trellis codes).
For low-density parity-check (LDPC) codes, the penalized decoding method based on the alternating direction method of multipliers (ADMM) can improve the decoding performance at low signal-to-noise ratios and also has low decoding complexity. There are three effective methods that could increase the ADMM penalized decoding speed, which are reducing the number of Euclidean projections in ADMM penalized decoding, designing an effective penalty function and selecting an appropriate layered scheduling strategy for message transmission. In order to further increase the ADMM penalized decoding speed, through reducing the number of Euclidean projections and using the vertical layered scheduling strategy, this paper designs a fast converging ADMM penalized decoding method based on the improved penalty function. Simulation results show that the proposed method not only improves the decoding performance but also reduces the average number of iterations and the average decoding time.
Liping ZHANG Zongqing LU Qingmin LIAO
This paper proposes a new and effective convolutional neural network model termed OFR-Net for optical flow refinement. The OFR-Net exploits the spatial correlation between images and optical flow fields. It adopts a pyramidal codec structure with residual connections, dense connections and skip connections within and between the encoder and decoder, to comprehensively fuse features of different scales, locally and globally. We also introduce a warp loss to restrict large displacement refinement errors. A series of experiments on the FlyingChairs and MPI Sintel datasets show that the OFR-Net can effectively refine the optical flow predicted by various methods.
Daichi FURUBAYASHI Yuta KASHIWAGI Takanori SATO Tadashi KAWAI Akira ENOKIHARA Naokatsu YAMAMOTO Tetsuya KAWANISHI
A new electro-optic modulator structure that compensates the third-order intermodulation distortion (IMD3) is introduced. The modulator includes two Mach-Zehnder modulators (MZMs) operating with frequency chirp, and the two modulated outputs are combined with an adequate phase difference. We show by theoretical analysis and numerical calculations that the IMD3 components in the receiver output can be selectively suppressed when the two MZMs operate with chirp parameters of opposite signs. The spectral power of the IMD3 components in the proposed modulator was more than 15 dB lower than that in a normal Mach-Zehnder modulator at modulation indices between 0.15π and 0.25π rad. The IMD3 compensation properties of the proposed modulator were experimentally confirmed by using a dual parallel Mach-Zehnder modulator (DPMZM) structure. We designed and fabricated the modulator with a single-chip structure and single-input operation by integrating a 180° hybrid coupler on the modulator substrate. Modulation signals were applied to each modulation electrode by the 180° hybrid coupler to set the chirp parameters of the two MZMs of the DPMZM. The properties of the fabricated modulator were measured by using 10 GHz two-tone signals. The measured IMD3 compensation performance agreed with the calculation. It was confirmed that the IMD3 compensation can be realized with the fabricated modulator structure.