This Letter proposes an autoencoder model supervised by semantic similarity for zero-shot learning. With the help of semantic similarity vectors of seen and unseen classes and the classification branch, our experimental results on two datasets are 7.3% and 4% better than the state-of-the-art on conventional zero-shot learning in terms of the averaged top-1 accuracy.
Daisuke SAITO Nobuaki MINEMATSU Keikichi HIROSE
This paper describes a novel approach to flexible control of speaker characteristics using tensor representation of multiple Gaussian mixture models (GMM). In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In the EVC, a speaker space is constructed based on GMM supervectors which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit construction of the speaker space by introducing the tensor factor analysis of training data set. In our approach, each speaker is represented as a matrix of which the row and the column respectively correspond to the dimension of the mean vector and the Gaussian component. The speaker space is derived by the tensor factor analysis of the set of the matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. In addition, in this paper, effects of speaker adaptive training before factorization are also investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
In this paper, an enhanced well-changed GGNMOS (EW-GGNMOS) is proposed and demonstrated. The new device has the same topology as the conventional 3.3V GGNMOS, except that its well has been changed to the 1.2V p-well. Owing to the higher doping concentration of this well, the device achieves a much lower trigger voltage and desirable turn-on uniformity compared to the conventional 3.3V GGNMOS. Therefore, we can use the EW-GGNMOS as a 3.3V ESD protection device without any additional process.
Outlier detection in a data set is very important in performing proper data mining. In this paper, we propose a method for efficiently detecting outliers by performing cluster analysis using the DS algorithm, an improvement on the k-means algorithm. This method detects outliers more simply than traditional methods, and the detected outliers quantitatively indicate “the degree of outlier”. Using this method, we detect abnormal trading days from OHLCs for S&P500 and FTSA, which are typical world-wide stock indexes, from the beginning of 2005 to the end of 2015. These days are defined as non-steady trading days, and the conditions under which markets become non-steady are mined as new knowledge. Applying the mined knowledge to OHLCs from the beginning of 2016 to the end of 2018, we can predict the non-steady trading days during that period. By verifying the predictions, we show that appropriate knowledge has been successfully mined and demonstrate the effectiveness of the outlier detection method proposed in this paper. Furthermore, we cross-reference and comparatively analyze the results of applying this method to multiple stock indexes. This analysis makes it possible to visualize when and where social and economic impacts occur and how they propagate around the world. This is one of the new applications of this method.
This letter studies the physical layer security of an unmanned aerial vehicle (UAV)-enabled multicasting system, where a UAV serves as a mobile transmitter to send a common confidential message to a group of legitimate users in the presence of multiple eavesdroppers. The worst-case situation, in which each eavesdropper can wiretap all legitimate users, is considered. We seek to maximize the average secrecy rate by jointly optimizing the UAV's transmit power and trajectory over a given flight period. The resulting optimization problem is nonconvex and intractable to solve directly. To circumvent the nonconvexity, we propose an iterative algorithm that approximates the solution based on the alternating optimization and successive convex approximation methods. Simulation results validate the convergence and effectiveness of our proposed algorithm.
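As a rough illustration of the objective, a worst-case secrecy rate averaged over time slots can be sketched as follows. This is a minimal sketch assuming the commonly used multicast definition, i.e., the positive part of the gap between the weakest legitimate user's rate and the strongest eavesdropper's rate; the function names and the per-slot SNR inputs are hypothetical and are not taken from the letter, which does not state its exact formulation here.

```python
import math

def worst_case_secrecy_rate(legit_snrs, eve_snrs):
    """Secrecy rate (bits/s/Hz) for one time slot under the assumed definition.

    legit_snrs: linear SNRs at the legitimate users (hypothetical inputs).
    eve_snrs:   linear SNRs at the eavesdroppers (hypothetical inputs).
    """
    # A common message must be decodable by the weakest legitimate user.
    legit_rate = min(math.log2(1.0 + s) for s in legit_snrs)
    # The strongest eavesdropper determines the leakage.
    eve_rate = max(math.log2(1.0 + s) for s in eve_snrs)
    # The secrecy rate cannot be negative.
    return max(legit_rate - eve_rate, 0.0)

def average_secrecy_rate(slots):
    """Average the per-slot secrecy rate over a flight period.

    slots: list of (legit_snrs, eve_snrs) pairs, one per time slot.
    """
    return sum(worst_case_secrecy_rate(l, e) for l, e in slots) / len(slots)
```

In such a formulation the SNRs would depend on the UAV's position and transmit power in each slot, which is what makes the joint optimization nonconvex.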
Yukasa MURAKAMI Masateru TSUNODA Koji TODA
To enhance the prediction accuracy of the number of faults, many studies have proposed various prediction models. The model is built using a dataset collected in past projects, and the number of faults is predicted using the model and the data of the current project. Datasets sometimes have many data points where the dependent variable, i.e., the number of faults, is zero. When a multiple linear regression model is built using such a dataset, the model may not be built properly. To avoid this problem, the Tobit model is considered to be effective when predicting software faults. The model assumes that the range of the dependent variable is limited, and the model is built based on that assumption. Similar to the Tobit model, the Poisson regression model assumes there are many data points whose value is zero on the dependent variable. Also, log-transformation is sometimes applied to enhance the accuracy of the model. Additionally, ensemble methods are effective in enhancing the prediction accuracy of the models. We evaluated the prediction accuracy of the methods separately for the cases in which the number of faults is zero and not zero. In the experiment, our proposed ensemble method showed the highest accuracy: Pred25 was 21% when the number of faults was not zero, and 45% when the number was zero.
Kenta NISHIYUKI Jia-Yau SHIAU Shigenori NAGAE Tomohiro YABUUCHI Koichi KINOSHITA Yuki HASEGAWA Takayoshi YAMASHITA Hironobu FUJIYOSHI
Driver drowsiness estimation is one of the important tasks for preventing car accidents. Most approaches perform binary classification, which determines whether or not a driver is significantly drowsy. Multi-level drowsiness estimation, which detects not only significant drowsiness but also moderate drowsiness, is helpful for building a safer and more comfortable car system. Existing approaches are mostly based on conventional temporal measures that extract temporal information related to eye states, and these measures mainly focus on detecting significant drowsiness for binary classification. For multi-level drowsiness estimation, we propose two temporal measures, average eye closed time (AECT) and soft percentage of eyelid closure (Soft PERCLOS). Existing approaches also use a time-domain convolutional neural network (CNN) as the deep neural network model, in which the layers are linked sequentially. Such a network model extracts features mainly focusing on a single temporal resolution. We found that features focusing on multiple temporal resolutions are effective for multi-level drowsiness estimation, and we propose a parallel linked time-domain CNN to extract the multi-temporal features. We collected our own dataset in a real environment and evaluated the proposed methods with the dataset. Our system outperforms existing temporal measures and network models on the dataset.
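For context, the conventional PERCLOS measure that the proposed Soft PERCLOS builds on can be sketched as follows. This is a minimal sketch assuming the usual reading of PERCLOS as the fraction of frames with the eye closed inside a sliding window; the function name and the per-frame 0/1 input are hypothetical. The abstract does not define AECT or Soft PERCLOS, so neither is reproduced here.

```python
def perclos(eye_closed, window):
    """Conventional PERCLOS over a sliding window (assumed definition).

    eye_closed: per-frame flags, 1 if the eye is judged closed, else 0.
    window:     number of frames in each window.
    Returns one value in [0, 1] per window position.
    """
    if window <= 0 or window > len(eye_closed):
        raise ValueError("window must be between 1 and len(eye_closed)")
    return [
        sum(eye_closed[start:start + window]) / window
        for start in range(len(eye_closed) - window + 1)
    ]
```

Because each frame contributes a hard 0 or 1, a measure of this kind reacts mainly to long closures, which is consistent with the abstract's remark that conventional measures focus on significant drowsiness.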
This paper proposes a method for heatmapping people who are involved in a group activity. Such people grouping is useful for understanding group activities. In prior work, people grouping is performed based on simple inflexible rules and schemes (e.g., based on proximity among people and with models representing only a constant number of people). In addition, several previous grouping methods require the results of action recognition for individual people, which may include erroneous results. On the other hand, our proposed heatmapping method can group any number of people who dynamically change their deployment. Our method can work independently of individual action recognition. A deep network for our proposed method consists of two input streams (i.e., RGB and human bounding-box images). This network outputs a heatmap representing pixelwise confidence values of the people grouping. Extensive exploration of appropriate parameters was conducted in order to optimize the input bounding-box images. As a result, we demonstrate the effectiveness of the proposed method for heatmapping people involved in group activities.
Pei LI Haiyang ZHANG Fan CHU Wei WU Juan ZHAO Baoyun WANG
This paper proposes a sampling strategy for bandlimited graph signals over a perturbed graph, in which we assume that the edge between any pair of nodes may be deleted randomly. Considering the mismatch between the true graph and the presumed graph, we derive the mean square error (MSE) of the reconstructed bandlimited graph signals. To minimize the MSE, we propose a greedy algorithm to obtain the optimal sampling set. Furthermore, we use the Neumann series to avoid computing the pseudo-inverse. An efficient, low-complexity algorithm is thus proposed. Finally, numerical results show the superiority of our proposed algorithms over existing algorithms.
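The general idea behind replacing an inverse with a Neumann series can be sketched as follows. This is a minimal sketch of the textbook identity (I - A)^{-1} = I + A + A^2 + ..., which holds when the spectral radius of A is below 1; how the paper applies the series to its pseudo-inverse is not stated in the abstract, and the helper names below are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def identity(n):
    """Return the n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_inverse(a, terms):
    """Approximate (I - A)^{-1} by the truncated sum I + A + ... + A^(terms-1).

    Only valid when the spectral radius of A is below 1 (assumed here).
    """
    n = len(a)
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = mat_mul(power, a)  # next power of A
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total
```

Each additional term costs one matrix product, so truncating the series early trades accuracy for lower complexity.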
This paper proposes a method to create various training images for instance segmentation in a semi-supervised manner. In our proposed learning scheme, a few 3D CG models of target objects and a large number of images retrieved by keywords from the Internet are employed for initial model training and model update, respectively. Instance segmentation requires pixel-level annotations as well as object class labels in all training images. A possible solution to reduce the huge annotation cost is to use synthesized images as training images. While image synthesis using a 3D CG simulator can generate the annotations automatically, it is difficult to prepare a variety of 3D object models for the simulator. Another possible solution is semi-supervised learning. Semi-supervised learning such as self-training uses a small set of supervised data and a huge amount of unsupervised data. In our method, the supervised images are given by the 3D CG simulator. From the unsupervised images, we have to select only correctly-detected annotations. To select them, we propose to quantify the reliability of each detected annotation based on its silhouette as well as its textures. Experimental results demonstrate that the proposed method can generate a greater variety of images for improving instance segmentation.
Yasuyuki MIYAMOTO Takahiro GOTOW
In this study, simulations are performed to design an optimal device by thinning the GaN channel layer on the semi-insulating layer in a HEMT. When the gate length is 50nm, the undoped channel must be thinner than 300nm to observe the off state. When the GaN channel layer is Fe-doped, an on/off ratio of ~300 can be achieved even with a gate length of 25nm, although the transconductance is slightly reduced.
Takuma HAMAGAMI Shinsuke HARA Hiroyuki YOMO Ryusuke MIYAMOTO Yasutaka KAWAMOTO Takunori SHIMAZAKI Hiroyuki OKUHATA
When we collect vital data from exercisers by attaching wireless sensor nodes to them, the reliability of the wireless data collection depends on the position of the node on the exerciser's body. Therefore, in order to determine the suitable body position, it is essential to evaluate the data collection performance by changing the body positions of the nodes in experiments involving human subjects. However, a fair comparison is problematic because the experiments are not repeatable; that is, we cannot evaluate the performance for multiple body positions in a single experiment at the same time. In this paper, we predict the performance with a software network simulator. Using two main functions, namely a channel state function and a mobility function, the network simulator can repeatedly generate the same channel and mobility conditions for nodes. Numerical results obtained by the network simulator show that, when collecting vital data from twenty-two footballers in a game, the forearm gives the highest data collection rate among three body positions (waist, forearm and calf), and that the predicted data collection rates agree well with those obtained in an experiment involving real subjects.
Masahiro MITTA Minseok KIM Yuki ICHIKAWA
This paper presents a real-time body motion classification system using the radio channel characteristics of a wearable body area network (BAN). We developed a custom wearable BAN radio channel measurement system by modifying an off-the-shelf ZigBee-based sensor network system, where the link quality indicator (LQI) values of the wireless links between the coordinator and four sensor nodes can be measured. After interpolating and standardizing the raw data samples in a pre-processing stage, the time-domain features are calculated, and the body motion is classified by a decision-tree based random forest machine learning algorithm which is most suitable for real-time processing. The features were carefully chosen to exclude those that exhibit the same tendency based on the mean and variance of the features to avoid overfitting. The measurements demonstrated successful real-time body motion classification and revealed the potential for practical use in various daily-life applications.
Jumpei YAMAMOTO Toshihiko NISHIMURA Takeo OHGANE Yasutaka OGAWA Daiki TAKEDA Yoshihisa KISHIYAMA
Massive MIMO is known as a promising technology for multiuser multiplexing in the fifth generation mobile communication system to accommodate the rapidly increasing traffic. It has a large number of antenna elements and thus provides very sharp beams. As seen in hybrid beamforming, there have already been many papers on the concatenation of two precoders (beamformers). The inner precoder, i.e., a multi-beam former, performs a linear transformation between the element space and the beam space. The outer precoder forms nulls in the limited beam space spanned by selected beams to suppress the inter-user interference. In this two-step precoder, the beam shape is expected to determine the system performance. In this paper, we evaluate the achievable throughput performance for different beam-shaping schemes: discrete Fourier transform (DFT) beams, Chebyshev weighted beams, and Taylor weighted beams. Simulations show that the DFT beam provides the best performance except in the case of imperfect precoding and a cell edge SNR of 30dB.
Hitoshi NISHIMURA Naoya MAKIBUCHI Kazuyuki TASAKA Yasutomo KAWANISHI Hiroshi MURASE
Multiple human tracking is widely used in various fields such as marketing and surveillance. The typical approach associates human detection results between consecutive frames using the features and bounding boxes (position+size) of detected humans. Some methods use an omnidirectional camera to cover a wider area, but ID switches often occur when associating detections due to the following two factors: i) The feature is adversely affected because the bounding box includes many background regions when a human is captured from an oblique angle. ii) The position and size change dramatically between consecutive frames because the distance metric is non-uniform in an omnidirectional image. In this paper, we propose a novel method that accurately tracks humans with an association metric for omnidirectional images. The proposed method has two key points: i) For feature extraction, we introduce local rectification, which reduces the effect of background regions in the bounding box. ii) For distance calculation, we describe the positions in a world coordinate system where the distance metric is uniform. In the experiments, we confirmed that the Multiple Object Tracking Accuracy (MOTA) improved by 3.3 on the LargeRoom dataset and by 2.3 on the SmallRoom dataset.
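For readers unfamiliar with the metric, MOTA can be sketched as follows. This is a minimal sketch assuming the standard definition, in which misses, false positives and ID switches are summed over all frames and normalized by the total number of ground-truth objects; the function and parameter names are hypothetical, and the counts in the usage note are illustrative, not results from the paper.

```python
def mota(misses, false_positives, id_switches, ground_truth_objects):
    """Multiple Object Tracking Accuracy under the standard definition.

    All arguments are counts accumulated over every frame of a sequence.
    The result is at most 1.0 and can be negative for very poor trackers.
    """
    if ground_truth_objects <= 0:
        raise ValueError("ground_truth_objects must be positive")
    errors = misses + false_positives + id_switches
    return 1.0 - errors / ground_truth_objects
```

For example, mota(10, 5, 5, 100) evaluates to 0.8, i.e., 80 when reported on a percentage scale. Since ID switches enter the error count directly, reducing them raises MOTA.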
Rongcun WANG Shujuan JIANG Kun ZHANG Qiao YU
Software fault localization, as one of the essential activities in program debugging, helps software developers identify the locations of faults in a program, thus reducing the cost of program debugging. Spectrum-based fault localization (SBFL), as one of the representative localization techniques, has been intensively studied. The technique calculates the probability that each program entity is faulty using a certain suspiciousness formula. The accuracy of SBFL is not always as satisfactory as expected because it neglects the contextual information of statement executions. Therefore, we propose 5 rules, i.e., random, the maximum coverage, the minimum coverage, the maximum distance, and the minimum distance, to further improve the accuracy of SBFL. The 5 rules can effectively use the contextual information of statement executions. Moreover, they can be implemented on top of traditional SBFL techniques using suspiciousness formulas with little effort. We empirically evaluated the impact of the rules on 17 suspiciousness formulas. The results show that all 5 rules can significantly improve the ranking of faulty statements. In particular, for faults that are difficult to locate, the improvement is more remarkable. Generally, the rules can effectively reduce the number of statements examined by an average of more than 19%. Compared with the other rules, the minimum coverage rule generates better results. This indicates that using the test case with the minimum coverage capability is more effective for fault localization.
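To make the notion of a suspiciousness formula concrete, a basic SBFL ranking can be sketched as follows. This is a minimal sketch using the Ochiai formula, one widely used choice; the abstract does not list which 17 formulas were evaluated, and the 5 proposed rules are not reproduced here. The function names and the coverage representation are hypothetical.

```python
import math

def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai suspiciousness of one statement.

    failed_cover: number of failing tests that execute the statement.
    passed_cover: number of passing tests that execute the statement.
    total_failed: total number of failing tests.
    """
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0

def rank_statements(coverage, passed):
    """Rank statement ids from most to least suspicious.

    coverage: one set of executed statement ids per test case.
    passed:   one boolean per test case, True if the test passed.
    """
    total_failed = sum(1 for ok in passed if not ok)
    statements = set().union(*coverage)
    scores = {}
    for stmt in statements:
        ef = sum(1 for cov, ok in zip(coverage, passed) if stmt in cov and not ok)
        ep = sum(1 for cov, ok in zip(coverage, passed) if stmt in cov and ok)
        scores[stmt] = ochiai(ef, ep, total_failed)
    # Ties are broken by statement id to keep the ranking deterministic.
    return sorted(statements, key=lambda s: (-scores[s], s))
```

A ranking of this kind uses only per-statement execution counts, which is the limitation the abstract attributes to conventional SBFL.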
Survivable virtual network embedding (SVNE) is one of the major challenges of network virtualization. In order to improve the utilization rate of the substrate network (SN) resources while guaranteeing virtual network (VN) topology connectivity under link failure in the SN, we first establish an Integer Linear Programming (ILP) model for the case in which the SN supports path splitting. Then we design a novel survivable VN topology protection method based on particle swarm optimization (VNE-PSO), which redefines the parameters and related operations of particles with the embedding overhead as the fitness function. Simulation results show that, compared with existing research results, the solution significantly improves the long-term average revenue of the SN and the acceptance rate of VN requests, and reduces the embedding time.
A pre-trained deep convolutional neural network (DCNN) is adopted as a feature extractor to extract the feature representation of vein images for hand-dorsa vein recognition. Specifically, a novel selective deep convolutional feature is proposed to obtain a more representative and discriminative feature representation. Extensive experiments on the lab-made database achieve state-of-the-art recognition results, which demonstrates the effectiveness of the proposed model.
Danyang LIU Ji XU Pengyuan ZHANG
End-to-end (E2E) multilingual automatic speech recognition (ASR) systems aim to recognize multilingual speech in a unified framework. In the current E2E multilingual ASR framework, the output prediction for a specific language lacks constraints on the output scope of modeling units. In this paper, a language supervision training strategy is proposed with language masks to constrain the neural network output distribution. To simulate the multilingual ASR scenario with unknown language identity information, a language identification (LID) classifier is applied to estimate the language masks. On four Babel corpora, the proposed E2E multilingual ASR system achieved an average absolute word error rate (WER) reduction of 2.6% compared with the multilingual baseline system.
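One way a language mask could constrain an output distribution is sketched below. This is a minimal sketch assuming the mask simply excludes modeling units that do not belong to the target language before normalization; the abstract does not specify how the masks enter training or decoding, and the function name and inputs are hypothetical.

```python
import math

def masked_softmax(logits, mask):
    """Softmax restricted to the units allowed by a language mask.

    logits: raw network scores, one per modeling unit.
    mask:   booleans, True if the unit belongs to the target language.
    Masked-out units receive probability exactly 0.
    """
    if not any(mask):
        raise ValueError("mask must allow at least one unit")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(score for score, allowed in zip(logits, mask) if allowed)
    exps = [
        math.exp(score - peak) if allowed else 0.0
        for score, allowed in zip(logits, mask)
    ]
    total = sum(exps)
    return [value / total for value in exps]
```

When the language identity is unknown, the mask would come from an LID classifier's estimate, as the abstract describes.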