Natural gradient descent is an optimization method for real-valued neural networks that was proposed from the viewpoint of information geometry. Here, we present an extension of natural gradient descent to complex-valued neural networks. Our idea is to use the Hermitian extension of the Fisher information matrix. Moreover, we generalize the projected natural gradient (PRONG), which is a fast natural gradient descent algorithm, to complex-valued neural networks. We also consider the advantage of complex-valued neural networks over real-valued neural networks. A useful property of complex numbers is that a rotation in the complex plane is expressed simply as a multiplication. By focusing on this property, we construct an output function for complex-valued neural networks that is invariant under rotation of the input. As a result, our complex-valued neural network can learn rotated data without data augmentation. Finally, through simulation of online character recognition, we demonstrate the effectiveness of the proposed approach.
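The rotation property mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' network or output function: `rotate` and `invariant_output` are hypothetical names, and the magnitude of a bias-free complex weighted sum is only one simple choice of rotation-invariant output, assumed here for illustration.

```python
import cmath
import math


def rotate(z: complex, theta: float) -> complex:
    """Rotate z about the origin by theta radians using one complex multiplication."""
    return z * cmath.exp(1j * theta)


def invariant_output(weights, inputs) -> float:
    """Hypothetical rotation-invariant output: magnitude of a bias-free weighted sum.

    Rotating every input by theta multiplies the sum by exp(i*theta),
    which leaves its magnitude unchanged.
    """
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return abs(weighted_sum)


# Illustrative values only (not taken from the paper).
weights = [0.5 + 0.2j, -0.3 + 0.8j]
inputs = [1.0 + 2.0j, -0.5 + 0.25j]
theta = math.pi / 3

rotated_inputs = [rotate(x, theta) for x in inputs]
original_out = invariant_output(weights, inputs)
rotated_out = invariant_output(weights, rotated_inputs)
```

Under these assumptions `original_out` and `rotated_out` agree up to floating-point error, which is the sense in which such an output needs no rotated training copies.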
Saya OHIRA Naoki TSUCHIYA Tetsuya MATSUMURA
We propose a three-dimensional (3D) sound processor architecture for consumer applications that includes super-directional modulation intellectual property (IP) and 3D sound processing IP. We also propose an automatic design environment for the 3D sound processing IP. This processor can generate realistic small sound fields in arbitrary spaces using ultrasound. In the 3D sound processing IP in particular, reproducing 3D audio requires reproducing the individual frequency characteristics of complex head-related transfer functions. For this reason, we have constructed an automatic design environment with high reconfigurability. This environment is based on high-level synthesis: given a parameter description file for filter design, it automatically generates a C-based algorithm simulator and synthesizes the IP hardware. It can reduce the design period to approximately 1/5 of that of conventional manual design. Using the automatic design environment, we designed a 3D sound processing IP experimentally. The designed IP is well suited to consumer applications in terms of hardware amount and power consumption.
Xiao-Yi ZHAO Chao-Yi DONG Peng ZHOU Mei-Jia ZHU Jing-Wen REN Xiao-Yan CHEN
This paper employs AlexNet, a deep convolutional neural network, to automatically diagnose damage on the surfaces of wind power generator blades. The original images of the blade surfaces were captured by the machine vision system of a 4-rotor UAV (unmanned aerial vehicle). First, an 8-layer AlexNet comprising 21 functional sub-layers in total was constructed and parameterized. Second, the AlexNet was trained with 10000 images and then tested in six rounds with 350 images. Finally, the test statistics show that the average accuracy of damage diagnosis by AlexNet is about 99.001%. We also trained and tested a traditional BP (back propagation) neural network, which has a 20-neuron input layer, a 5-neuron hidden layer, and a 1-neuron output layer, with the same image data. The average accuracy of damage diagnosis of the BP neural network is 19.424% lower than that of AlexNet. These results show that it is feasible to combine UAV image acquisition with a deep learning classifier to automatically diagnose damage to wind turbine blades in service.
Guangna ZHANG Yuanyuan GAO Huadong LUO Nan SHA Mingxi GUO Kui XU
In this paper, we investigate a novel joint multi-relay and jammer selection (JMRJS) scheme to improve the physical layer security of wireless networks. In the JMRJS scheme, all the relays that succeed in decoding the source signal are selected to assist in its transmission, while all the remaining relay nodes act as friendly jammers that disturb the eavesdroppers by broadcasting artificial noise. Based on the more general Nakagami-m fading channel, we analyze the security performance of the JMRJS scheme in protecting the source signal against eavesdropping. Exact closed-form expressions of the outage probability (OP) and intercept probability (IP) of the JMRJS scheme over Nakagami-m fading channels are derived. Moreover, we analyze the security-reliability tradeoff (SRT) of this scheme. Simulation results show that the SRT of the JMRJS scheme improves notably as the number of decode-and-forward (DF) relay nodes increases. When the transmit power is below a certain value, the SRT of the JMRJS scheme consistently outperforms those of the joint single-relay and jammer selection (JSRJS) scheme and the joint equal-relay and jammer selection (JERJS) scheme. In addition, the SRT of this scheme is always better than that of the multi-relay selection (MRS) scheme.
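The selection rule described in this abstract (relays that decode the source forward it, and the rest jam) can be sketched as follows. This is a simplified illustration, not the paper's system model: the function name, the SNR values, and the decoding criterion (capacity `log2(1 + snr)` at least equal to a target rate) are all assumptions made for the example.

```python
import math


def split_relays_and_jammers(relay_snrs, target_rate):
    """Partition relay indices into forwarding relays and friendly jammers.

    Assumption: a relay decodes the source successfully when the capacity of
    its source-to-relay link, log2(1 + snr), is at least target_rate.
    """
    forwarding, jammers = [], []
    for index, snr in enumerate(relay_snrs):
        capacity = math.log2(1.0 + snr)
        if capacity >= target_rate:
            forwarding.append(index)  # decoded: helps forward the source signal
        else:
            jammers.append(index)     # failed: broadcasts artificial noise
    return forwarding, jammers


# Illustrative linear-scale SNRs for four relays (hypothetical values).
relay_snrs = [0.5, 3.5, 7.0, 1.0]
forwarding, jammers = split_relays_and_jammers(relay_snrs, target_rate=2.0)
```

Every relay ends up in exactly one of the two roles, which is what distinguishes this scheme from selecting a single relay and a single jammer.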
Takahiro NISHIMURA Jacir Luiz BORDIM Yasuaki ITO Koji NAKANO
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine-grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution of this work is to present a Bitwise Parallel Bulk Computation (BPBC) technique to accelerate the Smith-Waterman Algorithm (SWA) with the affine gap penalty. Our idea is to convert this computation into a circuit simulation using the BPBC technique so that multiple instances are computed simultaneously. The proposed BPBC technique for the SWA has been implemented on the GPU and CPU. Experimental results show that the proposed BPBC for the SWA accelerates the computation by a factor of more than 646 compared with a single CPU implementation and by a factor of 6.9 compared with a multi-core CPU implementation with 160 threads.
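For reference, the computation being accelerated can be written as a plain sequential dynamic program. The sketch below is the standard three-matrix (Gotoh-style) Smith-Waterman recurrence with an affine gap penalty, not the BPBC circuit simulation itself. The scoring parameters are illustrative assumptions, and a gap of length k is assumed to cost `gap_open + (k - 1) * gap_extend`.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Return the best local alignment score of a and b with affine gap penalties."""
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j); E/F: ending in a gap.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # gap consuming characters of b
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # gap consuming characters of a
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

The sequence of operations does not depend on the input data, only on the sequence lengths. That is the oblivious property that makes bulk execution over many input pairs possible.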
Satoshi KINOSHITA Yoshinobu KAJIKAWA
Adaptive Volterra filters (AVFs) are usually used to identify nonlinear systems, such as loudspeaker systems, and ordinary adaptive algorithms can be used to update the filter coefficients of AVFs. However, AVFs require huge computational complexity even if the order of the AVF is constrained to the second order. Improving calculation efficiency is therefore an important issue for the real-time implementation of AVFs. In this paper, we propose a novel sub-band AVF with high calculation efficiency for second-order AVFs. The proposed sub-band AVF consists of four parts: input signal transformation for a single sub-band AVF, tap length determination to improve calculation efficiency, switching the number of sub-bands while maintaining the estimation accuracy, and an automatic search for an appropriate number of sub-bands. The proposed sub-band AVF can improve calculation efficiency for systems whose dominant nonlinear components are concentrated in a particular frequency band, such as loudspeakers. Simulation results demonstrate that the proposed sub-band AVF can realize higher estimation accuracy than conventional efficient AVFs.
Hand-dorsa vein recognition is addressed using the convolutional activations of a pre-trained deep convolutional neural network (DCNN). Specifically, a novel task-specific cross-convolutional-layer pooling is proposed to obtain a more representative and discriminative feature representation. Rigorous experiments on a self-established database achieve state-of-the-art recognition results, which demonstrates the effectiveness of the proposed model.
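To see why even a second-order Volterra filter is costly, the following sketch evaluates one output sample of a generic full-band second-order Volterra filter and counts its coefficients. It is not the proposed sub-band structure; the function names and numeric values are illustrative assumptions.

```python
def volterra2_output(x_window, h1, h2):
    """One output sample of a second-order Volterra filter.

    x_window[i] holds the delayed input x(n - i); h1 is the linear kernel and
    h2 the (tap x tap) quadratic kernel.
    """
    taps = len(x_window)
    linear = sum(h1[i] * x_window[i] for i in range(taps))
    quadratic = sum(
        h2[i][j] * x_window[i] * x_window[j]
        for i in range(taps)
        for j in range(taps)
    )
    return linear + quadratic


def num_coefficients(taps):
    """Coefficient count of the full (non-symmetrized) second-order filter."""
    return taps + taps * taps


# Tiny illustrative example with two taps (hypothetical values).
y = volterra2_output([1.0, 2.0], h1=[0.5, 0.25], h2=[[1.0, 0.0], [0.0, 1.0]])
coeffs_64 = num_coefficients(64)
```

The quadratic kernel grows with the square of the tap length, so shortening the tap length in each sub-band, as the abstract describes, directly reduces the dominant cost.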
Junjie SUN Chenyi ZHUANG Qiang MA
A travel route recommendation service that recommends a sequence of points of interest for tourists traveling in an unfamiliar city is a very useful tool in the field of location-based social networks. Although there are many web services and mobile applications that can help tourists to plan their trips by providing information about sightseeing attractions, travel route recommendation services are still not widely applied. One reason could be that most of the previous studies that addressed this task were based on the orienteering problem model, which mainly focuses on the estimation of a user-location relation (for example, a user preference). This assumes that a user receives a reward by visiting a point of interest and the travel route is recommended by maximizing the total rewards from visiting those locations. However, a location-location relation, which we introduce as a transition pattern in this paper, implies useful information such as visiting order and can help to improve the quality of travel route recommendations. To this end, we propose a travel route recommendation method by combining location and transition knowledge, which assigns rewards for both locations and transitions.
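The idea of rewarding both locations and transitions can be sketched with a small scoring function. This is a hypothetical illustration rather than the authors' model: the weighting parameter `alpha`, the reward tables, and the brute-force search are assumptions, and the time or distance budget of the orienteering problem is omitted for brevity.

```python
from itertools import permutations


def route_reward(route, location_reward, transition_reward, alpha=0.5):
    """Weighted sum of location rewards and rewards for consecutive transitions."""
    loc = sum(location_reward[p] for p in route)
    trans = sum(
        transition_reward.get((a, b), 0.0) for a, b in zip(route, route[1:])
    )
    return alpha * loc + (1.0 - alpha) * trans


def best_route(points, location_reward, transition_reward, alpha=0.5):
    """Exhaustively search all visiting orders (only feasible for a few points)."""
    return max(
        permutations(points),
        key=lambda r: route_reward(r, location_reward, transition_reward, alpha),
    )


# Hypothetical rewards: transition values encode which visiting orders are common.
location_reward = {"temple": 3.0, "museum": 2.0, "park": 1.0}
transition_reward = {
    ("temple", "museum"): 2.0,
    ("museum", "park"): 1.0,
    ("park", "temple"): 0.5,
}
forward = route_reward(["temple", "museum", "park"], location_reward, transition_reward)
backward = route_reward(["park", "museum", "temple"], location_reward, transition_reward)
best = best_route(list(location_reward), location_reward, transition_reward)
```

Both orders visit the same locations and so earn the same location reward. Only the transition term separates them, which is the extra signal a purely user-location model cannot use.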
Yuma KINOSHITA Kouki SEO Artit VISAVAKITCHAROEN Hitoshi KIYA
We propose a novel hue-preserving tone mapping scheme. Various tone mapping operations have been studied so far, but very few works address the color distortion caused by image tone mapping. First, we point out that LDR images produced from HDR ones by conventional tone mapping operators (TMOs) have some distortion in hue values due to clipping and rounding quantization. Next, we propose a novel method that allows LDR images to have the same maximally saturated color values as those of the HDR ones. LDR images generated by the proposed method have smaller hue degradation than those generated by conventional TMOs. Moreover, the proposed method is applicable to any TMO. In an experiment, the proposed method is demonstrated not only to produce images with small hue degradation but also to maintain well-mapped luminance, in terms of three objective metrics: TMQI, hue value in CIEDE2000, and the maximally saturated color on the constant-hue plane in the RGB color space.
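The effect of clipping on hue can be illustrated with a short sketch. It assumes one common formulation of the constant-hue plane, in which the maximally saturated color of an RGB pixel x is (x - min(x)) / (max(x) - min(x)). Whether this matches the paper's exact definition is an assumption, and the function names and pixel values are hypothetical.

```python
def maximally_saturated_color(rgb):
    """Maximally saturated color sharing the hue of rgb (assumed formulation)."""
    lo, hi = min(rgb), max(rgb)
    if hi == lo:
        return None  # achromatic pixel: hue is undefined
    return tuple((c - lo) / (hi - lo) for c in rgb)


def clip01(rgb):
    """Clip each channel to the displayable range [0, 1]."""
    return tuple(min(max(c, 0.0), 1.0) for c in rgb)


hdr_pixel = (2.0, 1.0, 0.5)                 # out-of-range HDR value
scaled = tuple(0.4 * c for c in hdr_pixel)  # uniform scaling into range
clipped = clip01(hdr_pixel)                 # naive clipping

c_hdr = maximally_saturated_color(hdr_pixel)
c_scaled = maximally_saturated_color(scaled)
c_clipped = maximally_saturated_color(clipped)
```

Under this formulation, uniform scaling leaves the maximally saturated color unchanged while per-channel clipping alters it. This is the kind of hue distortion the abstract attributes to conventional TMOs.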
Hiroto TANAKA Shinsuke MATSUMOTO Shinji KUSUMOTO
Over recent decades, numerous programming languages have evolved to embrace multiple paradigms, such as the fusion of object-oriented and functional programming. For example, Java, one of the most famous object-oriented programming languages, introduced a number of functional idioms in 2014. This evolution enables developers to benefit from both paradigms. However, we do not know how Java developers actually use functional idioms. Additionally, given that there are several criticisms of the idioms, the extent to which developers currently accept and use them remains unclear. In this paper, we investigate the actual usage of three functional idioms (Lambda Expression, Stream, and Optional) in Java projects by mining 100 projects containing approximately 130,000 revisions. From the mining results, we determined that Lambda Expression is used in 16% of all the examined projects, whereas Stream and Optional are used in only 2% to 3% of those projects. It appears that most Java developers avoid functional idioms simply to maintain compatibility across Java versions, while a number of developers accept these idioms for reasons of readability and runtime performance. When developers do adopt the idioms, a Lambda Expression frequently consists of a single statement, and Stream is used to operate on the elements of a collection. On the other hand, some developers use Optional with deprecated methods. We conclude that good usage of the idioms should be made widely known among developers.
Haitong YANG Guangyou ZHOU Tingting HE Maoxi LI
In this paper, we study domain adaptation of semantic role classification. Most systems use supervised methods for semantic role classification, but these methods often suffer severe performance drops on out-of-domain test data. The reason for the performance drops is that there are large feature differences between the source and target domains. This paper proposes a framework called Adversarial Domain Adaption Network (ADAN) to address domain adaptation of semantic role classification. The idea behind our method is that the proposed framework can derive domain-invariant features via adversarial learning and narrow the gap between the source and target feature spaces. To evaluate our method, we conduct experiments on the English portion of the CoNLL 2009 shared task. Experimental results show that our method can largely reduce the performance drop on out-of-domain test data.
Lianqiang LI Jie ZHU Ming-Ting SUN
Convolutional Neural Networks (CNNs) usually have millions or even billions of parameters, which makes them hard to deploy on mobile devices. In this work, we present a novel filter-level pruning method to alleviate this issue. More concretely, we first construct an undirected fully connected graph to represent a pre-trained CNN model. Then, we employ the spectral clustering algorithm to divide the graph into subgraphs, which is equivalent to clustering similar filters of the CNN into the same groups. After obtaining the grouping relationships among the filters, we keep one filter per group and retrain the pruned model. Compared with previous pruning methods that identify redundant filters heuristically, the proposed method can select the pruning candidates more reasonably and precisely. Experimental results also show that our proposed pruning method achieves significant improvements over the state of the art.
Neunghoe KIM Jongwook JEONG Mansoo HWANG
Free/libre open source software (FLOSS) is being rapidly adopted by many companies and organizations because it can be modified and used for free. Hence, the use of FLOSS could contribute not only to its originally intended benefits but also to the competence of its users. In this study, we analyzed the effect of using FLOSS on related competences. We investigated the change in competences among project participants through an empirical study conducted before and after their use of FLOSS. The results confirmed that the competences of the participants improved after using FLOSS.
Dokeun LEE Seongjin LEE Youjip WON
Indexing is one of the fields that can benefit from the byte-addressability and fast read/write speed of non-volatile memory (NVM). Existing index structures for NVM have been developed on the premise that the cache line size and the atomicity guarantee unit of NVM differ, and they try to overcome the consistency weakness arising from this difference. Overcoming this weakness requires expensive flush operations, which results in lower performance than a basic B+tree index. Recent studies have shown that the I/O unit of NVM can be matched with the atomicity guarantee unit under limited circumstances. In this paper, we propose a Cache line sized Atomic Write B+tree (CAWBT), a minimal B+tree structure designed for NVM that shows higher performance than a basic B+tree. CAWBT achieves almost the same performance as a basic B+tree without consistency guarantees and shows a remarkable performance improvement over other B+tree indexes for NVM.
Shuichiro HARUTA Hiromu ASAHINA Fumitaka YAMAZAKI Iwao SASASE
Detecting phishing websites is imperative. Among the various detection schemes, visual similarity-based approaches are promising. In these approaches, the visual features of a targeted legitimate website, referred to as signatures, are stored in a signature database (SDB) by the system administrator. Such approaches can only detect phishing websites whose signatures are highly similar to those in the SDB. Thus, the system administrator has to register multiple signatures to detect various phishing websites, which is very costly, and the system remains vulnerable to zero-day phishing attacks. To address this issue, an automatic signature update mechanism is needed. A naive way of automatically updating the SDB is to expand the scope of detection by adding the signatures of detected phishing websites to the SDB. However, the signatures used in previous approaches are not suitable for automatic updating, since they can differ greatly between the targeted legitimate website and the subspecies of phishing websites targeting it. Furthermore, the previous signatures can be easily manipulated by attackers. To overcome these problems, we propose a hue signature auto update system for visual similarity-based phishing detection with tolerance to zero-day attacks. Phishing websites targeting a certain legitimate website tend to use the targeted website's theme color to deceive users. In other words, users can easily recognize a phishing website if its hue information differs greatly from that of the targeted legitimate one (e.g., a red-colored Facebook is suspicious). Thus, the hue signature is a feature common to the targeted legitimate website and the subspecies of phishing websites, and it is difficult for attackers to change. Based on this notion, we argue that the hue signature fulfills the requirements of automatic SDB updating and robustness against manipulation by attackers. This commonness can effectively expand the scope of detection when automatic updating is applied to the hue signature. Through computer simulation with a real dataset, we demonstrate that our system achieves higher detection performance than the previous scheme.
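A hue-based signature of the kind this abstract describes might be sketched as a normalized hue histogram compared by histogram intersection. This is a hypothetical illustration only: the bin count, the treatment of gray pixels, the similarity measure, and the pixel values are assumptions, not the authors' signature definition.

```python
import colorsys


def hue_histogram(pixels, bins=12):
    """Normalized histogram of hue over 8-bit RGB pixels, ignoring gray pixels."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s == 0:
            continue  # white, gray, and black carry no hue information
        hist[min(int(h * bins), bins - 1)] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist


def histogram_intersection(p, q):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical screenshots reduced to pixel lists: a blue-themed legitimate
# page, a phishing copy with slightly different colors and layout, and an
# unrelated red-themed page.
legit = [(59, 89, 152)] * 90 + [(255, 255, 255)] * 10
phish = [(60, 90, 150)] * 80 + [(255, 255, 255)] * 20
other = [(200, 30, 30)] * 90 + [(255, 255, 255)] * 10

sim_phish = histogram_intersection(hue_histogram(legit), hue_histogram(phish))
sim_other = histogram_intersection(hue_histogram(legit), hue_histogram(other))
```

In this toy setting the copy keeps the theme hue and therefore stays close to the legitimate signature even though its layout proportions differ, while the red page does not match at all.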
Ryo MASUMURA Taichi ASAMI Takanobu OBA Sumitaka SAKAUCHI Akinori ITO
This paper demonstrates latent word recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of recurrent neural network language models (RNN-LMs) and latent word language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, and LW-LMs are robust for out-of-domain tasks owing to their latent word space modeling. However, RNN-LMs cannot explicitly capture hidden relationships behind observed words, since they have no concept of a latent variable space. In addition, LW-LMs cannot take into account long-range relationships between latent words. Our idea is to combine the RNN-LM and the LW-LM so that each compensates for the other's disadvantages. LW-RNN-LMs can support latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do, at the same time. From the viewpoint of RNN-LMs, an LW-RNN-LM can be considered a soft class RNN-LM with a vast latent variable space. In contrast, from the viewpoint of LW-LMs, an LW-RNN-LM can be considered an LW-LM that uses the RNN structure for latent variable modeling instead of an n-gram structure. This paper also details a parameter inference method and two kinds of implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
Jinna LV Bin WU Yunlei ZHANG Yunpeng XIAO
Recently, social relation analysis from text and image data has received an increasing amount of attention. However, social relation analysis from video, although an important problem, is lacking in the current literature. Two challenges remain: 1) it is hard to learn a satisfactory mapping function from low-level pixels to the high-level social relation space; 2) it is hard to efficiently select the most relevant information from noisy and unsegmented video. In this paper, we present an Attentive Sequences Recurrent Network model, called ASRN, to deal with these challenges. First, in order to explore multiple clues, we design a Multiple Feature Attention (MFA) mechanism to fuse multiple visual features (i.e., image, motion, body, and face). In this manner, we can generate an appropriate mapping function from low-level video pixels to the high-level social relation space. Second, we design a sequence recurrent network based on a Global and Local Attention (GLA) mechanism. Specifically, an attention mechanism is used in GLA to integrate the global feature with local sequence features in order to select more relevant sequences for the recognition task. Therefore, the GLA module can better deal with noisy and unsegmented video. Finally, extensive experiments on the SRIV dataset demonstrate the performance of our ASRN model.
Wei BAI Yuli ZHANG Meng WANG Jin CHEN Han JIANG Zhan GAO Donglin JIAO
This paper investigates the spectrum allocation problem. Under the current spectrum management mode, a large amount of spectrum resource is wasted due to the uncertainty of user demand. To reduce the impact of this uncertainty, a presale mechanism based on a spectrum pool is designed. In this mechanism, the spectrum manager offers spectrum resources for presale at a favorable price, with the aim of sharing with the user the risk caused by demand uncertainty. Because of the hierarchical characteristic of this interaction, we build a spectrum market Stackelberg game in which the manager acts as the leader and the user as the follower. A proof of the uniqueness and optimality of the Stackelberg equilibrium is then given. Simulation results show that the presale mechanism can increase profits for both sides and reduce temporary scheduling.
A non-photorealistic rendering method has been proposed for generating oil-film-like images from photographic images using a bilateral infra-envelope filter. The conventional method has the disadvantage that it takes a long time to process. We propose a method for generating oil-film-like images that runs faster than the conventional method. The proposed method uses an iterative process with upper and lower smoothing filters. To verify the effectiveness of the proposed method, we conduct experiments using the Lenna image. The experimental results show that the proposed method is faster than the conventional method.
Kyungdeuk KO Jaihyun PARK David K. HAN Hanseok KO
In-class species classification based on animal sounds is a highly challenging task even when the latest deep learning techniques are applied. The difficulty of distinguishing the species is further compounded when the number of species within the same class is large. This paper presents a novel approach for fine categorization of animal species based on their sounds by using pre-trained CNNs and a new self-attention module well suited for acoustic signals. The proposed method is shown to be effective, as it achieves an average species accuracy of 98.37% and a minimum species accuracy of 94.38%, the highest among the competing baselines, which include CNNs without self-attention and CNNs with CBAM, FAM, and CFAM but without pre-training.