Peiqi ZHANG Shinya TAKAMAEDA-YAMAZAKI
Binary Neural Networks (BNNs) have binarized neuron and connection values so that their accelerators can be realized by extremely efficient hardware. However, there is a significant accuracy gap between BNNs and networks with wider bit-widths. Conventional BNNs binarize feature maps by static, globally unified thresholds, which makes the produced bipolar image lose local details. This paper proposes a multi-input activation function to enable adaptive thresholding for binarizing feature maps: (a) At the algorithm level, instead of operating on each input pixel independently, adaptive thresholding dynamically changes the threshold according to the pixels surrounding the target pixel. When optimizing weights, adaptive thresholding is equivalent to an accompanying depth-wise convolution inserted between the normal convolution and binarization. The accompanying weights in the depth-wise filters are ternarized and optimized end-to-end. (b) At the hardware level, adaptive thresholding is realized through a multi-input activation function, which is compatible with common accelerator architectures. Compact activation hardware with only one extra accumulator is devised. By implementing the proposed method on an FPGA, a 4.1% accuracy improvement over the original BNN is achieved with only 1.1% extra LUT resources. Compared with state-of-the-art methods, the proposed idea further increases network accuracy by 0.8% on the CIFAR-10 dataset and 0.4% on the ImageNet dataset.
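The idea of thresholding a pixel against its neighbours can be illustrated with a minimal sketch. It assumes a single-channel feature map, a 3x3 ternary depth-wise kernel with zero padding, and binarization by the sign of the accumulated value. The function names and the kernel values are hypothetical and are not taken from the paper.

```python
def static_binarize(fmap, threshold=0):
    """Binarize every pixel against one global threshold."""
    return [[1 if v >= threshold else -1 for v in row] for row in fmap]


def adaptive_binarize(fmap, kernel):
    """Binarize after a 3x3 depth-wise convolution with ternary weights.

    fmap   : list of rows of numbers (one channel of a feature map)
    kernel : 3x3 list of weights in {-1, 0, +1} (hypothetical values)
    The neighbours weighted by the kernel act as a per-pixel threshold.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0  # a single accumulator per output pixel
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:  # zero padding outside
                        acc += kernel[di + 1][dj + 1] * fmap[ni][nj]
            out[i][j] = 1 if acc >= 0 else -1
    return out


# A flat region with one slightly brighter pixel in the middle.
fmap = [[5, 5, 5],
        [5, 6, 5],
        [5, 5, 5]]
# Hypothetical kernel: compare each pixel with its left neighbour.
kernel = [[0, 0, 0],
          [-1, 1, 0],
          [0, 0, 0]]
static_out = static_binarize(fmap)
adaptive_out = adaptive_binarize(fmap, kernel)
```

In this toy input the global threshold maps every pixel to +1, whereas the neighbour-dependent version marks the falling edge to the right of the bright pixel with -1, which is the kind of local detail a static threshold discards.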
Siyi HU Makiko ITO Takahide YOSHIKAWA Yuan HE Hiroshi NAKAMURA Masaaki KONDO
Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite having efficient storage options against sparsity (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both integer and floating-point data used in SpMV kernels are handled plainly without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence conditions in some typical scientific computation benchmarks (based on SpMV kernels) are not degraded when adopting lower-precision floating-point data. Based on these findings, in this work, we propose a simple yet effective data compression scheme that can be extended to general-purpose computing architectures or, preferably, HPC systems. When it is adopted, a best-case speedup of 1.92x is achieved. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of the final results.
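For readers unfamiliar with the CSR layout mentioned above, the following minimal sketch shows an SpMV kernel over CSR arrays. It also includes a helper that rounds a value to 32-bit precision, as one illustration of how lower-precision data halves the bytes moved per value. The abstract does not specify the compression scheme, so the helper is an assumption for illustration only and not the authors' method; all names are hypothetical.

```python
import struct


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : non-zero entries, row by row
    col_idx : column index of each non-zero entry
    row_ptr : row r owns entries row_ptr[r] .. row_ptr[r + 1] - 1
    """
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def to_float32(v):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", v))[0]


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])  # [7.0, 6.0, 19.0]

# Storing values in 32 bits instead of 64 halves the bytes per value,
# at the cost of a small rounding error.
rounding_error = abs(to_float32(0.1) - 0.1)
```

Each non-zero costs one value load, one index load, and one gathered load from `x`, which is why the kernel tends to be limited by memory bandwidth rather than arithmetic.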
Ha DAO Quoc-Huy VO Tien-Huy PHAM Kensuke FUKUDA
Universities collect and process a massive amount of Personally Identifiable Information (PII) at registration and throughout interactions with individuals. However, student PII can be exposed to the public when documents are uploaded along with university notices without consent or awareness, which could put individuals at risk of a variety of scams, such as identity theft, fraud, or phishing. In this paper, we perform an in-depth analysis of student PII leakage at Vietnamese universities. To the best of our knowledge, we are the first to conduct a comprehensive study on student PII leakage in higher educational institutions. We find that 52.8% of Vietnamese universities leak student PII, including one or more types of personal data, in documents on their websites. It is important to note that the compromised PII includes sensitive types of data, such as student medical records and religion. Also, student PII leakage is not a new phenomenon; it has happened year after year since 2005. Finally, we present a study with 23 Vietnamese university employees who have worked on student PII to get a deeper understanding of this situation and envisage concrete solutions. The results are surprising: the employees are highly aware of the concept of student PII. However, student PII leakage still happens due to their working habits or the lack of a management system and regulation. Therefore, Vietnamese universities should take a more active stand to protect student data in this situation.
The aim of a computer-aided drawing therapy system in this work is to associate the drawings that a client makes with the client's mental state in quantitative terms. A case study is conducted on experimental data which contain both pastel drawings and mental state scores obtained from the same client in a psychotherapy program. To perform such association through colors, we translate a drawing to a color feature by measuring its representative colors as primary color rates. A primary color rate of a color is defined from a psychological primary color in such a way that it shows the rate of the emotional properties of the psychological primary color which is supposed to affect the color. To obtain several informative colors as representative ones of a drawing, we define two kinds of color: approximate colors extracted by color reduction, and area-averaged colors calculated from the approximate colors. A color analysis method for extracting representative colors from each drawing in a drawing sequence under the same conditions is presented. To estimate how closely a color feature is associated with a concurrent mental state, we propose a method of utilizing machine-learning classification. A practical way of building a classification model through training and validation on a very small dataset is presented. The classification accuracy reached by the model is considered as the degree of association of the color feature with the mental state scores given in the dataset. Experiments were carried out on given clinical data. Several kinds of color feature were compared in terms of their association with the same mental state. As a result, we identified the color feature with the highest degree of association. Also, primary color rates proved more effective in representing colors in psychological terms than RGB components. The experiments provide evidence that colors can be associated quantitatively with states of the human mind.
Takefumi KAWAKAMI Takanori IDE Kunihito HOKI Masakazu MURAMATSU
In this paper, we apply two methods in machine learning, dropout and semi-supervised learning, to a recently proposed method called CSQ-SDL, which uses deep neural networks for evaluating shift quality from time-series measurement data. When developing a new Automatic Transmission (AT), calibration takes place where many parameters of the AT are adjusted to realize a pleasant driving experience in all situations that occur on all roads around the world. Calibration requires an expert to visually assess the shift quality from the time-series measurement data of the experiments each time the parameters are changed, which is iterative and time-consuming. The CSQ-SDL was developed to shorten the time consumed by the visual assessment, and its effectiveness depends on acquiring a sufficient number of data points. In practice, however, data amounts are often insufficient. The methods proposed here can handle such cases. For cases wherein only a small number of labeled data points is available, we propose a method that uses dropout. For cases wherein the number of labeled data points is small but the number of unlabeled data points is sufficient, we propose a method that uses semi-supervised learning. Experiments show that while the former gives moderate improvement, the latter offers a significant performance improvement.
Jianbo WANG Haozhi HUANG Li SHEN Xuan WANG Toshihiko YAMASAKI
Image-to-image translation aims to learn a mapping between the source and target domains. To improve visual quality, the majority of previous works adopt multi-stage techniques to refine coarse results in a progressive manner. In this work, we present a novel approach for generating plausible details by only introducing a group of intermediate supervisions without cascading multiple stages. Specifically, we propose a Laplacian Pyramid Transformation Generative Adversarial Network (LapTransGAN) to simultaneously transform components in different frequencies from the source domain to the target domain within only one stage. Hierarchical perceptual and gradient penalization are utilized for learning consistent semantic structures and details at each pyramid level. The proposed model is evaluated based on various metrics, including the similarity in feature maps, reconstruction quality, segmentation accuracy, similarity in details, and qualitative appearances. Our experiments show that LapTransGAN can achieve a much better quantitative performance than both the supervised pix2pix model and the unsupervised CycleGAN model. Comprehensive ablation experiments are conducted to study the contribution of each component.
Chee Siang LEOW Hideaki YAJIMA Tomoki KITAGAWA Hiromitsu NISHIZAKI
Text detection is a crucial pre-processing step in optical character recognition (OCR) for the accurate recognition of text, including both fonts and handwritten characters, in documents. While current deep learning-based text detection tools can detect text regions with high accuracy, they often treat multiple lines of text as a single region. To perform line-based character recognition, it is necessary to divide the text into individual lines, which requires a line detection technique. This paper focuses on the development of a new approach to single-line detection in OCR that is based on the existing Character Region Awareness For Text detection (CRAFT) model and incorporates a deep neural network specialized in line segmentation. However, this new method may still detect multiple lines as a single text region when multi-line text with narrow spacing is present. To address this, we also introduce a post-processing algorithm to detect single text regions using the output of the single-line segmentation. Our proposed method successfully detects single lines, even in multi-line text with narrow line spacing, and hence improves the accuracy of OCR.
Recent studies have shown that concurrent transmission with precise time synchronization enables reliable and efficient flooding for wireless networks. However, most of them require all nodes in the network to forward packets a fixed number of times to reach the destination, which leads to unnecessary energy consumption in both one-to-one and many-to-one communication scenarios. In this letter, we propose G1M to address this issue by reducing redundant packet forwarding in concurrent transmissions. The evaluation of G1M shows that, compared with LWB, the average energy consumption of one-to-one and many-to-one transmission is reduced by 37.89% and 25%, respectively.
Atikur RAHMAN Nozomu KINJO Isao NAKANISHI
Person authentication using biometric information has recently become popular among researchers. User management based on biometrics is more reliable than that using conventional methods. To secure private information, it is necessary to build continuous authentication-based user management systems. Brain waves are suitable biometric modalities for continuous authentication. This study is based on biometric authentication using brain waves evoked by invisible visual stimuli. Invisible visual stimulation is adopted instead of visible stimulation to overcome the obstacles faced by a user when using a system. Invisible stimuli are realized by changing the intensity of the image and presenting high-speed stimulation. To ensure invisibility, stimuli of different intensities were tested, and stimuli with an intensity of 5% were confirmed to be invisible. To improve the verification performance, a continuous wavelet transform was adopted instead of the Fourier transform because it extracts both time and frequency information from the brain wave. The scalogram obtained by the wavelet transform was used as an individual feature and for synchronizing the template and test data. Furthermore, to improve the synchronization performance, the waveband was split based on the power distribution of the scalogram. A performance evaluation using 20 subjects showed an equal error rate of 3.8%.
Sei-ichiro KAMATA Tsunenori MINE
In 2014, the above paper entitled ‘Quasi-Linear Support Vector Machine for Nonlinear Classification’ was published by Zhou, et al. [1]. They proposed a quasi-linear kernel function for support vector machine (SVM). However, in this letter, we point out that this proposed kernel function is a part of the multiple kernel functions generated by well-known multiple kernel learning, which was proposed by Bach, et al. [2] in 2004. Since then, there have been many related papers on multiple kernel learning with several applications [3]. This letter verifies that the main kernel function proposed by Zhou, et al. [1] can be derived using multiple kernel learning algorithms [3]. In the kernel construction, Zhou, et al. [1] used Gaussian kernels, but the locality of additive Gaussian kernels or other kernels had already been discussed in the multiple kernel learning framework [4], [5]. In particular, additive Gaussian or other kernels were discussed in a tutorial at the major international conference ECCV 2012 [6]. The authors did not discuss these matters.
Fuma SAWA Yoshinori KAMIZONO Wataru KOBAYASHI Ittetsu TANIGUCHI Hiroki NISHIKAWA Takao ONOYE
Advanced driver-assistance systems (ADAS) generally play an important role in supporting safe driving by detecting potential risk factors beforehand and informing the driver of them. However, if too many services in ADAS rely on visual-based technologies, the driver becomes increasingly burdened and exhausted, especially in the eyes. Drivers should be relieved of monitoring tasks other than the most important ones in order to alleviate their burden as much as possible. In recent years, in-vehicle auditory signals that assist safe driving have attracted attention as an alternative to visual cues. In this paper, we developed an in-vehicle auditory signal evaluation platform in an existing driving simulator. In addition, using in-vehicle auditory signals, we demonstrate with the developed platform the possibility of partially switching from purely visual tasks to a mix of visual and auditory tasks to alleviate the burden on drivers.
Shinji FUKUMA Yoshiro IWAI Shin-ichiro MORI
We propose a fine-structure imaging method for the surface and interior of solid materials such as drill bits coated with TiN (titanium nitride). We call this method i-MSE (innovative MSE) since the fine structure is visualized with a local mechanical strength (the local erosion rate) which is obtained from a set of erosion depth profiles measured with the Micro Slurry-jet Erosion (MSE) test. The local erosion rate at each sampling point is estimated from the depth profile using a sliding window regression, and for the rest of the 2-dimensional points it is interpolated with the mean value coordinate technique. The interpolated rate is converted to a 2D image (i-MSE image) with a color map. The i-MSE image can distinguish layers if the testing material surface is composed of coats which have different resistance to erosion (erosive wear), whereas microscopic images such as those from an SEM (scanning electron microscope) and a calotest provide only appearance information, not physical characteristics. Experiments on some layered specimens show that i-MSE can be an effective tool to visualize the structure and to evaluate the mechanical characteristics of the surface and interior of solid materials.
Gouki OKADA Makoto NAKASHIZUKA
This paper presents a deep network based on unrolling the diffusion process with the morphological Laplacian. The diffusion process is an iterative algorithm that can solve the diffusion equation and represents time evolution with the Laplacian. The diffusion process is applied to the smoothing of images and has been extended with non-linear operators for various image processing tasks. In this study, we introduce the morphological Laplacian to the basic diffusion process and unroll it into deep networks. The morphological filters are non-linear operators with parameters that are referred to as structuring elements. The discrete Laplacian can be approximated with the morphological filters without multiplications. Owing to the non-linearity of the morphological filter with trainable structuring elements, the training uses error back propagation and the network of the morphology can be adapted to specific image processing applications. We introduce two extensions of the morphological Laplacian for deep networks. Since the morphological filters are realized with addition, max, and min, the error caused by the limited bit-length is not amplified. Consequently, the morphological parts of the network are implemented in unsigned 8-bit integers with single instruction, multiple data (SIMD) instructions to achieve fast computation on small devices. We applied the proposed network to image completion and Gaussian denoising. The results and computational time are compared with those of other denoising algorithms and deep networks.
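To make the multiplication-free claim concrete, the sketch below computes a one-dimensional morphological Laplacian using the common definition of dilation plus erosion minus twice the signal. The function name, the 1D setting, the replicated borders, and the flat three-tap structuring element are assumptions for illustration; the paper's two extensions and its trainable structuring elements are not reproduced here.

```python
def morph_laplacian_1d(f, b=(0, 0, 0)):
    """Morphological Laplacian: dilation(f, b) + erosion(f, b) - 2 * f.

    f : list of integer samples
    b : structuring element with odd length (flat when all zeros)
    Only additions, subtractions, max and min are used.
    Borders are handled by replicating the edge samples.
    """
    n = len(f)
    r = len(b) // 2
    out = []
    for x in range(n):
        dil = None
        ero = None
        for k in range(-r, r + 1):
            # dilation: max over k of f(x - k) + b(k)
            xd = min(max(x - k, 0), n - 1)
            d = f[xd] + b[k + r]
            # erosion: min over k of f(x + k) - b(k)
            xe = min(max(x + k, 0), n - 1)
            e = f[xe] - b[k + r]
            dil = d if dil is None else max(dil, d)
            ero = e if ero is None else min(ero, e)
        out.append(dil + ero - f[x] - f[x])
    return out


# Samples of x^2: the linear discrete Laplacian f[x-1] - 2 f[x] + f[x+1]
# equals 2 at every interior point.
signal = [0, 1, 4, 9, 16]
lap = morph_laplacian_1d(signal)
```

On this monotone signal the interior values of `lap` match the linear discrete Laplacian, since the maximum and minimum of each three-sample window are simply its two end samples.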
Hojun SHIMOYAMA Soh YOSHIDA Takao FUJITA Mitsuji MUNEYASU
Recent character detectors have been modeled using deep neural networks and have achieved high performance in various tasks, such as text detection in natural scenes and character detection in historical documents. However, existing methods cannot achieve high detection accuracy for wooden slips because of their multi-scale character sizes and aspect ratios, high character density, and close character-to-character distance. In this study, we propose a new U-Net-based character detection and localization framework that learns character regions and boundaries between characters. The proposed method enhances the learning performance of character regions by simultaneously learning the vertical and horizontal boundaries between characters. Furthermore, by adding simple and low-cost post-processing using the learned regions of character boundaries, it is possible to more accurately detect the location of a group of characters in a close neighborhood. In this study, we construct a wooden slip dataset. Experiments demonstrated that the proposed method outperformed existing character detection methods, including state-of-the-art character detection methods for historical documents.
For massive multiple-input multiple-output (MIMO) communication systems, simple linear detectors such as zero forcing (ZF) and minimum mean square error (MMSE) can achieve near-optimal detection performance with reduced computational complexity. However, such linear detectors always involve complicated matrix inversion, which suffers from high computational overhead in practical implementations. Owing to its massively parallel processing and efficient hardware implementation, the neural network has become a promising approach to signal processing for future wireless communications. In this paper, we first propose an efficient neural network, termed the PINN, to calculate the pseudo-inverses of any type of matrix based on the improved Newton's method. Through detailed analysis and derivation, the linear massive MIMO detectors are mapped onto PINNs, which can take full advantage of the research achievements of neural networks in both algorithms and hardware. Furthermore, an improved limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton method is studied as the learning algorithm of PINNs to achieve a better performance/complexity trade-off. Simulation results finally validate the efficiency of the proposed scheme.
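As background for computing a pseudo-inverse without explicit matrix inversion, the sketch below implements the classical Newton (Newton-Schulz) iteration X_{k+1} = X_k (2I - A X_k) for real matrices. This is the textbook iteration and not the improved variant or the neural-network mapping proposed in the paper. The function names, the starting point X_0 = A^T / (||A||_1 ||A||_inf), and the fixed iteration count are assumptions for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def newton_pinv(A, iters=30):
    """Approximate the pseudo-inverse of a real m x n matrix A.

    Iterates X <- X (2I - A X), which needs only matrix products.
    The starting scale 1 / (||A||_1 * ||A||_inf) keeps the iteration
    convergent because that product bounds the squared spectral norm.
    """
    m, n = len(A), len(A[0])
    norm1 = max(sum(abs(A[i][j]) for i in range(m)) for j in range(n))
    norminf = max(sum(abs(v) for v in row) for row in A)
    alpha = 1.0 / (norm1 * norminf)
    X = [[alpha * v for v in row] for row in transpose(A)]  # n x m
    for _ in range(iters):
        AX = matmul(A, X)  # m x m
        R = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(m)]
             for i in range(m)]
        X = matmul(X, R)
    return X


# Tall matrix: the pseudo-inverse is [[0.5, 0, 0], [0, 0.25, 0]].
P = newton_pinv([[2.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
# Square invertible matrix: the result approaches the ordinary inverse.
Q = newton_pinv([[4.0, 1.0], [2.0, 3.0]])
```

The iteration converges quadratically once the residual is small, and each step consists of two matrix products, which is the kind of regular computation that maps naturally onto layered, parallel hardware.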
Canonical decomposition for bipartite graphs, which was introduced by Fouquet, Giakoumakis, and Vanherpe (1999), is a decomposition scheme for bipartite graphs associated with modular decomposition. Weak-bisplit graphs are bipartite graphs totally decomposable (i.e., reducible to single vertices) by canonical decomposition. Canonical decomposition comprises series, parallel, and K+S decomposition. This paper studies a decomposition scheme comprising only parallel and K+S decomposition. We show that bipartite graphs totally decomposable by this decomposition are precisely P6-free chordal bipartite graphs. This characterization indicates that P6-free chordal bipartite graphs can be recognized in linear time using the recognition algorithm for weak-bisplit graphs presented by Giakoumakis and Vanherpe (2003).
We thank Kamata et al. (2023) [1] for their interest in our work [2], and for providing an explanation of the quasi-linear kernel from a viewpoint of multiple kernel learning. In this letter, we first give a summary of the quasi-linear SVM. Then we provide a discussion on the novelty of quasi-linear kernels against multiple kernel learning. Finally, we explain the contributions of our work [2].
Various optical fiber connectors have been developed during the 40 years since optical fiber communications systems were first put into practical use. This paper describes the key technologies for optical connectors and recent technical issues.
Masato YOSHIDA Kosuke KIMURA Toshihiko HIROOKA Keisuke KASAI Masataka NAKAZAWA
We compare the demodulation performance of an analog OTDM demultiplexing scheme and digitized OTDM demultiplexing with an ultrahigh-speed digital signal processor in a single-channel OTDM coherent Nyquist pulse transmission. We evaluated the demodulation performance for 40, 80, and 160Gbaud OTDM signals with a baseline rate of 10Gbaud. As a result, we clarified that the analog scheme performs significantly better since the bandwidth for handling the demultiplexed signal is as narrow as 10GHz regardless of the symbol rate. This enables us to use a low-speed A/D converter (ADC) with a large effective number of bits (ENOB). On the other hand, in the digital scheme, the higher the symbol rate becomes, the more bandwidth the receiver requires. Therefore, it is necessary to use an ultrahigh-speed ADC with a low ENOB for a 160Gbaud signal. We measured the ENOB of the ultrahigh-speed ADC used in the digital scheme and showed that the measured ENOB was approximately 1.5 bits lower than that of the low-speed ADC used in the analog scheme. This 1.5-bit decrease causes a large degradation in the demodulation performance obtained with the digital demultiplexing scheme.
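For context on the 1.5-bit figure, the standard relation between the ENOB of a converter and its signal-to-noise-and-distortion ratio can be written as follows. The symbol SINAD is not used in the abstract and is introduced here only for illustration; under this relation, a 1.5-bit loss corresponds to roughly 9 dB.

```latex
\mathrm{ENOB} = \frac{\mathrm{SINAD}_{\mathrm{dB}} - 1.76}{6.02}
\quad\Longrightarrow\quad
\Delta\mathrm{SINAD}_{\mathrm{dB}} \approx 6.02 \times 1.5 \approx 9.0
```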
Yuichiro NISHIKAWA Shota NISHIJIMA Akira HIRANO
We have proposed an autonomous network diagnosis platform for the operation of future large-capacity, virtualized networks, including 5G and beyond-5G services. As one candidate for the information collection and analysis function blocks in the platform, we proposed novel optical sensing techniques that utilize tapped raw signal data acquired from digital coherent optical receivers. The raw signal data is captured before the various digital signal processing steps for demodulation. Therefore, it contains the various waveform deformations and/or noise that the signal experiences through the transmission fibers. In this paper, we attempt to detect two possible failures in transmission lines, fiber bending and optical filter shift, by analyzing the above-mentioned raw signal data with the help of machine learning. For this purpose, we implemented Docker container applications in WhiteBox Cassini to acquire real-time raw signal data. We generated CNN models for the detection in off-line processing and used them for real-time detection. We confirmed successful detection of optical fiber bending and/or optical filter shift in real time with high accuracy. We also evaluated their tolerance against ASE noise and devised a novel approach to improve detection accuracy. In addition, we succeeded in detecting the failures even when they occur simultaneously.