Tomoya ITSUBO Michihiro KOIBUCHI Hideharu AMANO Hiroki MATSUTANI
Since deep learning workloads perform a large number of matrix operations on training data, GPUs (Graphics Processing Units) are efficient, especially for the training phase. A cluster of computers, each equipped with multiple GPUs, can significantly accelerate deep learning workloads. More specifically, a back-propagation algorithm following a gradient descent approach is used for the training. Although the gradient computation is still a major bottleneck of the training, gradient aggregation and optimization impose both communication and computation overheads, which should also be reduced to further shorten the training time. To address this issue, in this paper, multiple GPUs are interconnected with a PCI Express (PCIe) over 10Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected with network switches, gradient aggregation and optimizers (e.g., SGD, AdaGrad, Adam, and SMORMS3) are offloaded to FPGA-based 10GbE switches between remote GPUs; thus, the gradient aggregation and parameter optimization are completed in the network. The proposed FPGA-based 10GbE switches with the four optimizers are implemented on a NetFPGA-SUME board. Their resource utilization is increased by the PEs for the optimizers, and they consume up to 56% of the resources. Evaluation results using four remote GPUs connected via the proposed FPGA-based switch demonstrate that these optimizers are accelerated by up to 3.0x and 1.25x compared to CPU and GPU implementations, respectively. Also, the gradient aggregation throughput of the FPGA-based switch achieves up to 98.3% of the 10GbE line rate.
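To make the offloaded work concrete, the following is a minimal software sketch of gradient aggregation followed by an SGD or AdaGrad parameter update. It is only an illustration of the arithmetic, not the FPGA design described above; the function names, the choice of summing (rather than averaging) the gradients, and the learning-rate and epsilon values are assumptions made for this example.

```python
import math

def aggregate_gradients(worker_grads):
    """Element-wise sum of the gradient vectors received from each worker.
    (Assumption: gradients are summed; an average would also be plausible.)"""
    return [sum(vals) for vals in zip(*worker_grads)]

def sgd_step(params, grad, lr=0.1):
    """Plain SGD: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grad)]

def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    """AdaGrad: scale each step by the root of the accumulated squared gradients."""
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_params = [
        p - lr * g / (math.sqrt(a) + eps)
        for p, g, a in zip(params, grad, new_accum)
    ]
    return new_params, new_accum

# Two hypothetical workers, each contributing a two-element gradient.
grads = aggregate_gradients([[1.0, 2.0], [3.0, 4.0]])
params_sgd = sgd_step([1.0, 1.0], grads)
params_ada, state = adagrad_step([1.0, 1.0], grads, [0.0, 0.0])
```

In this sketch the aggregation and the update are separate functions, which mirrors the two stages (aggregation, then optimization) that the abstract says are completed in the network.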
Miho YAMAKURA Ryousei TAKANO Akram BEN AHMED Midori SUGAYA Hideharu AMANO
FPGA (Field Programmable Gate Array) based accelerators are attracting significant interest in cloud computing systems. Combining multi-FPGA systems with cloud computing brings a new perspective to reconfigurable computing research. However, the multi-tenancy of a multi-FPGA system has not been fully discussed in previous research. In this paper, we propose a multi-tenant resource management system, named FiC-RM, for a multi-FPGA cloud system. FiC-RM provides users with a set of FPGA resources according to their requirements and allows them to exclusively access FPGA boards and the interconnection network. To achieve this, we propose a placement algorithm, which is key to efficiently sharing the limited resources. We demonstrate that FiC-RM can control a practical-scale multi-FPGA system. Moreover, our simulation study shows that our placement algorithm achieves a 3 to 4% improvement in the average resource usage and a 20-second reduction in the response time compared to existing naive algorithms.
Hybrid storage techniques are useful methods to improve the cost performance for input-output (IO) intensive workloads. These techniques choose areas of concentrated IO accesses and migrate them to an upper tier to extract as much performance as possible through greater use of upper tier areas. Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system situated between non-volatile memories (NVMs) and solid-state drives (SSDs). ATSMF aims to reduce the average response time for IO accesses by migrating areas of concentrated IO access from an SSD to an NVM. When a concentrated IO access finishes, the system migrates these areas from the NVM back to the SSD. Unfortunately, the published ATSMF implementation temporarily consumes much NVM capacity upon migrating concentrated IO access areas to the NVM, because its algorithm executes NVM migration with high priority. As a result, it often delays the eviction to the SSD of areas in which IO concentrations have ended. Therefore, to reduce the consumption of NVM while maintaining the average response time, we developed new techniques for making ATSMF more practical. The first is a queue handling technique based on the number of IO accesses for NVM migration and eviction. The second is an eviction method that selects only write-accessed partial regions in finished areas. The third is a technique for variable eviction timing to balance the NVM consumption and average response time. Experimental results indicate that the average response times of the proposed ATSMF are almost the same as those of the published ATSMF, while the NVM consumption is three times lower in the best case.
Yuki KAJIWARA Junjun ZHENG Koichi MOURI
The amount of malware, including variants and new types, has been increasing dramatically over the years, posing one of the greatest cybersecurity threats today. To counteract such security threats, it is crucial to detect malware accurately and early enough. Recent advances in machine learning technology have brought increasing interest in malware detection, and a number of research studies have been conducted in the field. It is well known that malware detection accuracy largely depends on the training dataset used. Creating a suitable training dataset for efficient malware detection is thus crucial. Different works usually use their own datasets; therefore, a dataset is only effective for one detection method, and strictly comparing several methods using a common training dataset is difficult. In this paper, we focus on how to create a training dataset for efficiently detecting malware. To achieve our goal, the first step is to clarify the information that can accurately characterize malware. This paper concentrates on threads, treating them as important information for characterizing malware. Specifically, on the basis of the dynamic analysis log from Alkanet, a system call tracer, we obtain the thread information and classify the thread information processing into four patterns. Malware detection is then performed using the number of transitions of system calls appearing in a thread as a feature. Our comparative experimental results show that the primary thread information is important and useful for detecting malware with high accuracy.
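As an illustration of the feature mentioned above, the following sketch counts transitions between consecutive system calls in a single thread's trace. It is a hedged example only: the function names, the trace format (a list of call names), and the system call names are hypothetical and are not taken from Alkanet's actual log format.

```python
from collections import Counter

def transition_counts(syscalls):
    """Count how often each (previous call, next call) pair occurs in one thread."""
    return Counter(zip(syscalls, syscalls[1:]))

def to_feature_vector(counts, vocabulary):
    """Map transition counts onto a fixed, ordered list of transitions."""
    return [counts.get(pair, 0) for pair in vocabulary]

# Hypothetical per-thread trace; the call names are placeholders.
trace = ["open", "read", "read", "write", "close"]
counts = transition_counts(trace)
vocabulary = [("open", "read"), ("read", "read"), ("read", "write"), ("write", "open")]
features = to_feature_vector(counts, vocabulary)
```

A fixed vocabulary of transitions keeps the feature vector the same length for every thread, which is what most classifiers expect as input.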
Sashi NOVITASARI Sakriani SAKTI Satoshi NAKAMURA
Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to the target language in real-time. Such systems can be achieved by performing low-latency processing in the ASR (automatic speech recognition) module before passing the output to the MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several studies recently proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than the standard attention-based ASR because they have to decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR) that learns the knowledge from attention-based non-incremental ASR for low-delay end-to-end speech recognition. ISR comes with a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiment shows that AT-ISR achieves a comparable performance to the non-incremental ASR when the incremental recognition begins after the speech utterance reaches 25% of the complete utterance length. Additional experiments to investigate the effect of ISR on translation tasks are also performed. The focus is to find the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR resulted in the best translation quality with the lowest WER and the lowest uncovered-word rate.
Kimiko MOTONAKA Tomoya KOSEKI Yoshinobu KAJIKAWA Seiji MIYOSHI
The Volterra filter is one of the digital filters that can describe nonlinearity. In this paper, we analyze the dynamic behaviors of an adaptive signal-processing system including the Volterra filter by a statistical-mechanical method. On the basis of the self-averaging property that holds when the tapped delay line is assumed to be infinitely long, we derive simultaneous differential equations in a deterministic and closed form, which describe the behaviors of macroscopic variables. We obtain the exact solution by solving the equations analytically. In addition, the validity of the theory derived is confirmed by comparison with numerical simulations.
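For readers unfamiliar with the Volterra filter, the sketch below computes the output of a second-order Volterra filter driven by a tapped delay line. The truncation to second order, the zero-padding before the start of the signal, and all names are assumptions made for illustration; the abstract does not state the filter order analyzed in the paper.

```python
def volterra2_output(x, h1, h2):
    """Second-order Volterra filter:
    y(n) = sum_i h1[i] x(n-i) + sum_i sum_j h2[i][j] x(n-i) x(n-j)."""
    n_taps = len(h1)
    y = []
    for n in range(len(x)):
        # Tapped delay line, zero-padded before the first sample.
        taps = [x[n - i] if n - i >= 0 else 0.0 for i in range(n_taps)]
        linear = sum(h1[i] * taps[i] for i in range(n_taps))
        quadratic = sum(
            h2[i][j] * taps[i] * taps[j]
            for i in range(n_taps)
            for j in range(n_taps)
        )
        y.append(linear + quadratic)
    return y

# Hypothetical two-tap kernels.
y = volterra2_output([1.0, 2.0], h1=[1.0, 0.5], h2=[[0.1, 0.0], [0.0, 0.0]])
```

The quadratic term is what lets the filter describe nonlinearity; with `h2` set to all zeros the function reduces to an ordinary FIR filter.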
Zhiyuan JIANG Yijie HUANG Shunqing ZHANG Shugong XU
In a heterogeneous unreliable multiaccess network, wherein terminals share a common wireless channel with distinct error probabilities, existing works have shown that a persistent round-robin (RR-P) scheduling policy can be arbitrarily worse than the optimum in terms of Age of Information (AoI) under standard Automatic Repeat reQuest (ARQ). In this paper, practical Hybrid ARQ (HARQ) schemes, which are widely used in today's wireless networks, are considered. We show that RR-P is very close to optimum with asymptotically many terminals in this case, by explicitly deriving tight, closed-form AoI gaps between the optimum and the AoI achievable by RR-P. In particular, it is rigorously proved that for RR-P, under HARQ models concerning fading channels (resp. finite-blocklength regime), the relative AoI gap compared with the optimum is within a constant of 6.4% (resp. 6.2% with error exponential decay rate of 0.5). In addition, RR-P enjoys the distinctive advantage of implementation simplicity with channel-unaware and easy-to-decentralize operations, making it favorable in practice. A further investigation considering a constraint imposed on the number of retransmissions is presented. The performance gap is indicated through numerical simulations.
Tetsuya MANABE Koichi AIHARA Naoki KOJIMA Yusuke HIRAYAMA Taichi SUZUKI
This paper presents a design methodology for Wi-Fi round-trip time (RTT) ranging for lateration, based on performance evaluation experiments. Wi-Fi RTT-based lateration needs to operate multiple access points (APs) at the same time. However, the relationship between the number of APs in operation and ranging performance has not been clarified in conventional research. We therefore evaluate the ranging performance of Wi-Fi RTT for lateration, focusing on the number of APs and channel-usage conditions. As a result, we confirm that the ranging result acquisition rate decreases as the number of simultaneously operated APs and/or the channel-usage rate increases. In addition, based on a positioning performance comparison between Wi-Fi RTT-based lateration and the Wi-Fi fingerprint method, we clarify the points to note where positioning by Wi-Fi RTT-based lateration differs from conventional radio-intensity-based positioning. Consequently, we present a design methodology for Wi-Fi RTT ranging for lateration in terms of the following three points: the important indicators for evaluation, the strictness of the channel selection, and the number of APs to use. The design methodology will help to realize high-quality location-based services.
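The lateration step itself can be sketched as follows: each RTT is converted to a distance, and the position is found by linearized least squares over the AP ranges. This is a generic textbook formulation in two dimensions, not the procedure evaluated in the paper; the function names, the idealized conversion `d = c * RTT / 2` (which ignores processing delays), and the AP coordinates are assumptions for this example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def rtt_to_distance(rtt_seconds):
    """Idealized conversion: the signal covers the distance twice per round trip."""
    return SPEED_OF_LIGHT * rtt_seconds / 2.0

def laterate_2d(aps, ranges):
    """Estimate (x, y) from AP positions and measured ranges.

    Subtracting the first range equation from the others gives linear rows
    a*x + b*y = c, which are solved through the 2x2 normal equations.
    """
    (x0, y0), r0 = aps[0], ranges[0]
    s_aa = s_ab = s_bb = t_a = t_b = 0.0
    for (xi, yi), ri in zip(aps[1:], ranges[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = r0**2 - ri**2 + xi**2 + yi**2 - x0**2 - y0**2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        t_a += a * c
        t_b += b * c
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("APs are collinear or too few; position is ambiguous")
    x = (t_a * s_bb - t_b * s_ab) / det
    y = (s_aa * t_b - s_ab * t_a) / det
    return x, y

# Hypothetical layout: three APs and exact ranges to the point (3, 4).
aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [5.0, 65.0 ** 0.5, 45.0 ** 0.5]
position = laterate_2d(aps, ranges)
```

At least three non-collinear APs are needed for a unique 2D solution, which is one reason the number of simultaneously operated APs matters for this kind of positioning.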
Ruicong ZHI Caixia ZHOU Junwei YU Tingting LI Ghada ZAMZMI
Pain is an essential physiological phenomenon of human beings. Accurate assessment of pain is important to develop proper treatment. Although the self-report method is the gold standard in pain assessment, it is not applicable to individuals with communicative impairment. Non-verbal pain indicators such as pain-related facial expressions and changes in physiological parameters could provide valuable insights for pain assessment. In this paper, we propose a multimodal-based Stream Integrated Neural Network with Different Frame Rates (SINN) that combines facial expression and biomedical signals for automatic pain assessment. The main contributions of this research are threefold. (1) There are four-stream inputs of the SINN for facial expression feature extraction. The variant facial features are integrated with biomedical features, and the joint features are utilized for pain assessment. (2) The dynamic facial features are learned in both implicit and explicit manners to better represent the facial changes that occur during pain experience. (3) Multiple modalities are utilized to identify various pain states, including facial expression and biomedical signals. The experiments are conducted on publicly available pain datasets, and the performance is compared with several deep learning models. The experimental results illustrate the superiority of the proposed model, which achieves the highest accuracy of 68.2%, up to 5% higher than the basic deep learning models on pain assessment with binary classification.
Akira JINGUJI Shimpei SATO Hiroki NAKAHARA
Convolutional neural networks (CNNs) have a high recognition rate in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image recognition platforms because they achieve real-time performance at a low cost. However, a CNN has a large number of parameters called weights and internal data called feature maps, which pose a challenge for FPGAs in terms of performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches and allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces computational costs and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory buffering schedule for a split-CNN and implemented them on the PYNQ-Z1 FPGA board with a low-end FPGA. An experiment on classification using VGG16 shows that our implementation is 3.1 times faster than the GPU and 5.4 times faster than an existing FPGA implementation.
Hongcui WANG Pierre ROUSSEL Bruce DENBY
A Silent Speech Interface (SSI) is a sensor-based, Artificial Intelligence (AI) enabled system in which articulation is performed without the use of the vocal cords, resulting in a voice interface that conserves the ambient audio environment, protects private data, and also functions in noisy environments. Though portable SSIs based on ultrasound imaging of the tongue have obtained Word Error Rates rivaling those of acoustic speech recognition, SSIs remain relegated to the laboratory due to stability issues. Indeed, reliable extraction of acoustic features from ultrasound tongue images in real-life situations has proven elusive. Recently, Representation Learning has shown considerable success in learning underlying structure in noisy, high-dimensional raw data. In its unsupervised form, Representation Learning is able to reveal structure in unlabeled data, thus greatly simplifying the data preparation task. In the present article, a 3D Convolutional Neural Network (3DCNN) architecture is applied to unlabeled ultrasound images, and is shown to reliably predict future tongue configurations. By comparing the 3DCNN to a simple previous-frame predictor, it is possible to recognize tongue trajectories comprising transitions between regions of stability that correlate with formant trajectories in a spectrogram of the signal. Prospects for using the underlying structural representation to provide features for subsequent speech processing tasks are presented.
Lijun GAO Zhenyi BIAN Maode MA
DoS (Denial of Service) attacks are becoming one of the most serious security threats to global networks. We analyze the existing DoS detection methods and defense mechanisms in depth. In recent years, K-Means and its improved variants have been widely examined for security intrusion detection, but their detection accuracy is not satisfactory. In this paper, we propose a multi-dimensional space feature vector expansion K-Means model to detect threats in the network environment. The model uses a genetic algorithm to optimize the weights of the K-Means multi-dimensional space feature vector, which greatly improves the detection rate against 6 typical DoS attacks. Furthermore, in order to verify the correctness of the model, this paper conducts a simulation on the NSL-KDD data set. The results show that the multi-dimensional space feature vector expansion K-Means algorithm improves the recognition accuracy to 96.88%. Furthermore, the 41 kinds of feature vectors in NSL-KDD are analyzed in detail through a large number of training experiments. The feature vector of the probability positive return of security attack detection is accurately extracted, and a comparison chart is formed to support subsequent research. Theoretical analysis and experimental results show that the multi-dimensional space feature vector expansion K-Means algorithm is well suited to the detection of DDoS attacks.
Tomoko K. MATSUSHIMA Shoichiro YAMASAKI Kyohei ONO
This paper proposes a new class of signature codes for synchronous optical code-division multiple access (CDMA) and describes a general method for construction of the codes. The proposed codes can be obtained from generalized modified prime sequence codes (GMPSCs) based on extension fields GF(q), where q=p^m, p is a prime number, and m is a positive integer. It has been reported that optical CDMA systems using GMPSCs remove not only multi-user interference but also optical interference (e.g., background light) with a constant intensity during a slot of length q^2. Recently, the authors have reported that optical CDMA systems using GMPSCs also remove optical interference with intensity varying by blocks with a length of q. The proposed codes, referred to as p-chip codes in general and chip-pair codes in particular for the case of p=2, have the property of removing interference light with an intensity varying by shorter blocks with a length of p without requiring additional equipment. The present paper also investigates the algebraic properties and applications of the proposed codes.
You GAO Yun-Fei YAO Lin-Zhi SHEN
Permutation polynomials over finite fields have been widely studied due to their important applications in mathematics and cryptography. In recent years, 2-to-1 mappings over finite fields were proposed to build almost perfect nonlinear functions, bent functions, and semi-bent functions. In this paper, we generalize the 2-to-1 mappings to m-to-1 mappings, including their construction methods. Some applications of m-to-1 mappings are also discussed.
Michiharu NAKAMURA Eisuke FUKUDA Yoshimasa DAIDO Keiichi MIZUTANI Takeshi MATSUMURA Hiroshi HARADA
Non-linear behavioral models play a key role in designing digital pre-distorters (DPDs) for non-linear power amplifiers (NLPAs). In general, more complex behavioral models have better capability, but they should be converted into simpler versions to assist implementation. In this paper, a conversion from a complex fifth-order inverse of a parallel Wiener (PRW) model to a simpler memory polynomial (MP) model is developed by using frequency domain expressions. In the developed conversion, the parameters of the converted MP model are calculated from those of the original fifth-order inverse and the frequency domain statistics of the transmit signal. Since the frequency domain statistics of the transmit signal can be precalculated, the developed conversion is deterministic, unlike the conventional conversion that identifies a converted model from lengthy input and output data. Computer simulations are conducted to confirm that the conversion error is sufficiently small and that the converted MP model offers pre-distortion equivalent to that of the original fifth-order inverse.
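For orientation, a memory polynomial in its commonly used form computes each output sample as a sum of delayed input samples weighted by powers of their magnitude. The sketch below evaluates that generic form on a complex baseband signal. The function name, the coefficient layout, and the example coefficients are assumptions for illustration; the conversion from the PRW inverse described above is not reproduced here.

```python
def memory_polynomial(x, coeffs):
    """Evaluate y(n) = sum_k sum_m a[k][m] * x(n-m) * |x(n-m)|^(k-1).

    coeffs maps a nonlinearity order k (e.g. 1, 3, 5) to a list of complex or
    real coefficients, one per memory tap m. Samples before n = 0 count as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0j
        for order, taps in coeffs.items():
            for m, a in enumerate(taps):
                if n - m < 0:
                    continue  # no sample available before the start
                sample = x[n - m]
                acc += a * sample * abs(sample) ** (order - 1)
        y.append(acc)
    return y

# Hypothetical coefficients: a linear term with one memory tap and a cubic term.
signal = [1 + 0j, 2 + 0j]
out = memory_polynomial(signal, {1: [1.0, 0.5], 3: [0.1, 0.0]})
```

Because the model is linear in its coefficients, it is comparatively simple to implement, which is the motivation given above for converting a more complex model into MP form.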
Kazuya MATSUBAYASHI Naobumi MICHISHITA Hisashi MORISHITA
The composite right/left-handed (CRLH) coaxial line (CL) with wideband electromagnetic band gap (EBG) is applied to the wideband choke structure for a monocone antenna with short elements, and the resulting characteristics are considered. In the proposed antenna, impedance matching and leakage current suppression can be achieved across a wide band. The lowest frequency (|S11| ≤ -10dB) of the proposed antenna is about the same as that of the monocone antenna on an infinite ground plane. In addition, the radiation patterns of the proposed antenna are close to a figure of eight over a wide band. The proposed antenna is prototyped, and the validity of the simulation is verified through measurement.
Kenya TAJIMA Yoshihiro HIROHASHI Esmeraldo ZARA Tsuyoshi KATO
The multi-category support vector machine (MC-SVM) is one of the most popular machine learning algorithms. There are numerous MC-SVM variants, but a different optimization algorithm has been developed for each learning machine. In this study, we developed a new optimization algorithm that can be applied to several MC-SVM variants. The algorithm is based on the Frank-Wolfe framework, which requires solving two subproblems, direction-finding and line search, in each iteration. The contribution of this study is the discovery that both subproblems have a closed-form solution if the Frank-Wolfe framework is applied to the dual problem. Additionally, closed-form solutions for both the direction-finding and line search exist even for the Moreau envelopes of the loss functions. We used several large datasets to demonstrate that the proposed optimization algorithm rapidly converges and thereby improves the pattern recognition performance.
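The two subproblems of the Frank-Wolfe framework can be seen in a small generic example. The sketch below minimizes 0.5 * ||x - b||^2 over the probability simplex, where both the direction-finding step (pick the vertex with the smallest gradient entry) and the line search (exact step for a quadratic) happen to have closed forms. This toy problem is an assumption chosen for illustration; it is not the MC-SVM dual problem studied in the paper, and the function name is hypothetical.

```python
def frank_wolfe_simplex(b, iters=1000):
    """Minimize 0.5 * ||x - b||^2 over the probability simplex with Frank-Wolfe."""
    n = len(b)
    x = [1.0 / n] * n  # start at the centre of the simplex
    for _ in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Direction-finding: a linear function over the simplex is minimized
        # at the vertex whose gradient entry is smallest.
        i = min(range(n), key=lambda j: grad[j])
        d = [(1.0 if j == i else 0.0) - x[j] for j in range(n)]
        gd = sum(g * dj for g, dj in zip(grad, d))
        dd = sum(dj * dj for dj in d)
        if gd >= -1e-12 or dd == 0.0:
            break  # no descent direction left: the Frank-Wolfe gap is ~0
        # Line search: exact minimizer along d for a quadratic, clipped to [0, 1].
        gamma = min(1.0, -gd / dd)
        x = [xj + gamma * dj for xj, dj in zip(x, d)]
    return x

interior = frank_wolfe_simplex([0.2, 0.3, 0.5])  # target already in the simplex
vertex = frank_wolfe_simplex([2.0, 0.0, 0.0])    # solution is a vertex
```

Every iterate is a convex combination of simplex vertices, so the constraint is satisfied throughout without any projection step.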
Sooyong JEONG Sungdeok CHA Woo Jin LEE
Embedded software often interacts with multiple inputs from various sensors whose dependencies are often complex or only partially known to developers. With incomplete information on dependencies, testing is likely to be insufficient in detecting errors. We propose a method to enhance the testing coverage of embedded software by identifying subtle and often neglected dependencies using information contained in usage logs. Usage logs, traditionally used primarily for investigative purposes following accidents, can also make a useful contribution during the testing of embedded software. Our approach relies on first individually developing a behavioral model for each environmental input, performing compositional analysis while identifying feasible but untested dependencies from the usage log, and generating additional test cases that correspond to untested or insufficiently tested dependencies. Experimental evaluation was performed on an Android application named Gravity Screen as well as an Arduino-based wearable glove app. Whereas a conventional CTM-based testing technique achieved average branch coverage of 26% and 68% on these applications, respectively, the proposed technique achieved 100% coverage in both.
Maodudul HASAN Eisuke NISHIYAMA Ichihiko TOYODA
Herein, a novel self-oscillating active integrated array antenna (AIAA) is proposed for beam switching X-band applications. The proposed AIAA comprises four linearly polarized microstrip antenna elements, a Gunn oscillator, two planar magic-Ts, and two single-pole single-throw (SPST) switches. The in/anti-phase signal combination approach employing planar magic-Ts is adopted to attain bidirectional radiation patterns in the φ = 90° plane with a simple structure. The proposed antenna can switch its beam using the SPST switches. The antenna is analyzed through simulations, and a prototype of the antenna is fabricated and tested to validate the concept. The proposed concept is found to be feasible; the prototype has an effective isotropic radiated power of +15.98dBm, radiated power level of +4.28dBm, and cross-polarization suppression of better than 15dB. The measured radiation patterns are in good agreement with the simulation results.
Makoto YASUKAWA Yasushi MAKIHARA Toshinori HOSOI Masahiro KUBO Yasushi YAGI
Human gait analysis has been widely used in medical and health fields. It is essential to extract spatio-temporal gait features (e.g., single support duration, step length, and toe angle) by partitioning the gait phase and estimating the footprint position/orientation in such fields. Therefore, we propose a method to partition the gait phase given a foot position sequence using mutually constrained piecewise linear approximation with dynamic programming, which not only represents normal gait well but also pathological gait without training data. We also propose a method to detect footprints by accumulating toe edges on the floor plane during stance phases, which enables us to detect footprints more clearly than a conventional method. Finally, we extract four spatial/temporal gait parameters for accuracy evaluation: single support duration, double support duration, toe angle, and step length. We conducted experiments to validate the proposed method using two types of gait patterns, that is, healthy and mimicked hemiplegic gait, from 10 subjects. We confirmed that the proposed method could estimate the spatial/temporal gait parameters more accurately than a conventional skeleton-based method regardless of the gait pattern.