Loan default prediction has been a significant problem in the financial domain because overdue loans may incur significant losses. Machine learning methods have been introduced to solve this problem, but many challenges remain, including feature multicollinearity, imbalanced labels, and small sample sizes. To replicate the success of deep learning in many areas, an effective regularization technique named muddling label regularization is introduced in this letter, and an ensemble of feed-forward neural networks is proposed, which outperforms machine learning and deep learning baselines on a real-world dataset.
Recently, the performance of discriminative correlation filter (CF) trackers in visual tracking has improved steadily. In this paper, we propose spatial-temporal regularization with precise state estimation based on discriminative correlation filter (STPSE) in order to achieve better tracking performance. First, we consider the continuous change of the object state, using the information from the previous two filters for training the correlation filter model. Here, we train the correlation filter model with the hand-crafted features. Second, we introduce update control in which average peak-to-correlation energy (APCE) and the distance between the object locations obtained by HOG features and hand-crafted features are utilized to detect abnormality of the state around the object. APCE and the distance indicate the reliability of the filter response; thus, if abnormality is detected, the proposed method does not update the scale and the object location estimated by the filter response. In the experiment, our tracker (STPSE) achieves strong real-time performance using only a CPU on the challenging benchmark sequences (OTB2013, OTB2015, and TC128).
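As a rough illustration of the update control described above, the following sketch computes APCE using its commonly cited definition, the squared peak-to-minimum range divided by the mean squared deviation from the minimum. The `should_update` rule, its `ratio` threshold, and the toy response maps are hypothetical and not taken from the paper; the location-distance check mentioned in the abstract is not covered.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of every response value from the minimum.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0.0:
        return 0.0  # flat map: no distinguishable peak
    return (f_max - f_min) ** 2 / energy


def should_update(response, apce_history, ratio=0.5):
    """Hypothetical rule: update only while APCE stays above a fraction
    of its running mean over earlier frames."""
    if not apce_history:
        return True
    mean_apce = sum(apce_history) / len(apce_history)
    return apce(response) >= ratio * mean_apce


# Toy response maps (assumed values): a single sharp peak vs. a cluttered map.
sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
cluttered = [[0.4, 0.9, 0.3], [0.8, 1.0, 0.7], [0.2, 0.9, 0.5]]
history = [apce(sharp)]
print(apce(sharp), apce(cluttered), should_update(cluttered, history))
```

A sharp, unimodal response gives a high APCE, while a cluttered response with several comparable peaks gives a low one, which is why the measure can serve as a reliability signal.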
Genki OSADA Budrul AHSAN Revoti PRASAD BORA Takashi NISHIDE
Virtual Adversarial Training (VAT) has shown impressive results among recently developed regularization methods called consistency regularization. VAT utilizes adversarial samples, generated by injecting perturbation in the input space, for training and thereby enhances the generalization ability of a classifier. However, such adversarial samples can be generated only within a very small area around the input data point, which limits the adversarial effectiveness of such samples. To address this problem, we propose LVAT (Latent space VAT), which injects perturbation in the latent space instead of the input space. LVAT can generate adversarial samples flexibly, resulting in a more adverse effect and thus more effective regularization. The latent space is built by a generative model, and in this paper we examine two different types of models: variational auto-encoder and normalizing flow, specifically Glow. We evaluated the performance of our method in both supervised and semi-supervised learning scenarios for an image classification task using the SVHN and CIFAR-10 datasets. In our evaluation, we found that our method outperforms VAT and other state-of-the-art methods.
In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a huge amount of data to train a model, and data collection techniques such as web crawling have been developed. However, there is a risk that these data collection techniques may generate incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, the generalization performance significantly decreases. This problem is called Learning with Noisy Labels (LNL). One recent study on LNL, called DivideMix [1], has successfully divided the dataset into samples with clean labels and ones with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian mixture model (GMM). It then treats the divided dataset as labeled and unlabeled samples and trains the classification model in a semi-supervised manner. Since the selected samples have lower loss values and are easy to classify, the trained models are at risk of overfitting to simple patterns during training. To train the classification model without overfitting to the simple patterns, we propose to introduce consistency regularization on the samples selected by the GMM. The consistency regularization perturbs input images and encourages the model to output the same value for the perturbed images and the original images. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results that are comparable to or better than the state-of-the-art method.
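The consistency term described above can be sketched on a toy problem. Everything concrete here is an assumption for illustration: a linear softmax classifier stands in for the network, additive Gaussian noise stands in for image perturbation, and the squared difference between the two predictions stands in for whatever consistency loss the paper actually uses.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Toy linear classifier (assumed model): one weight row per class."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def consistency_loss(weights, x, noise_std=0.1, rng=random):
    """Squared difference between predictions on x and on a noisy copy of x."""
    x_perturbed = [xi + rng.gauss(0.0, noise_std) for xi in x]
    p_clean = predict(weights, x)
    p_noisy = predict(weights, x_perturbed)
    return sum((a - b) ** 2 for a, b in zip(p_clean, p_noisy))


rng = random.Random(0)
weights = [[1.0, -1.0], [-0.5, 0.5]]   # hypothetical two-class model
sample = [0.3, 0.7]                    # hypothetical "clean" sample
loss = consistency_loss(weights, sample, noise_std=0.1, rng=rng)
loss_zero = consistency_loss(weights, sample, noise_std=0.0, rng=rng)
print(loss, loss_zero)
```

In training, a term of this kind would be added to the supervised loss on the samples selected as clean, so that the model is penalized when its prediction changes under perturbation.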
Jonathan MOJOO Yu ZHAO Muthu Subash KAVITHA Junichi MIYAO Takio KURITA
The task of image annotation is becoming enormously important for efficient image retrieval from the web and other large databases. However, the large amount of semantic information and the complex dependencies among the labels of an image make the task challenging. Hence, determining the semantic similarity between multiple labels on an image is useful for understanding incomplete label assignments in image retrieval. This work proposes a novel method to solve the problem of multi-label image annotation by unifying two different types of Laplacian regularization terms in a deep convolutional neural network (CNN) for robust annotation performance. The unified Laplacian regularization model is implemented to address the missing labels efficiently by generating the contextual similarity between labels both internally and externally through their semantic similarities, which is the main contribution of this study. Specifically, we generate similarity matrices between labels internally by using Hayashi's quantification method-type III and externally by using the word2vec method. The similarity matrices generated by the two different methods are then combined into a Laplacian regularization term, which is used in the new objective function of the deep CNN. The regularization term implemented in this study is able to address the multi-label annotation problem, enabling a more effectively trained neural network. Experimental results on public benchmark datasets reveal that the proposed unified regularization model with deep CNN produces significantly better results than the baseline CNN without regularization and other state-of-the-art methods for predicting missing labels.
A limited number of types of sound event occur in an acoustic scene and some sound events tend to co-occur in the scene; for example, the sound events “dishes” and “glass jingling” are likely to co-occur in the acoustic scene “cooking.” In this paper, we propose a method of sound event detection using graph Laplacian regularization with sound event co-occurrence taken into account. In the proposed method, the occurrences of sound events are expressed as a graph whose nodes indicate the frequencies of event occurrence and whose edges indicate the sound event co-occurrences. This graph representation is then utilized for the model training of sound event detection, which is optimized under an objective function with a regularization term considering the graph structure of sound event occurrence and co-occurrence. Evaluation experiments using the TUT Sound Events 2016 and 2017 datasets and the TUT Acoustic Scenes 2016 dataset show that the proposed method improves the performance of sound event detection by 7.9 percentage points compared with the conventional CNN-BiGRU-based detection method in terms of the segment-based F1 score. In particular, the experimental results indicate that the proposed method enables the detection of co-occurring sound events more accurately than the conventional method.
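A generic graph Laplacian penalty of the kind referred to above can be sketched as follows. It uses the standard identity y^T L y = 0.5 * sum_ij W_ij (y_i - y_j)^2 with L = D - W; the co-occurrence weights, the event names, and the way the penalty would enter the training objective are assumptions, not the paper's exact formulation.

```python
def graph_laplacian(adjacency):
    """L = D - W for a symmetric weighted adjacency matrix W."""
    n = len(adjacency)
    lap = [[-adjacency[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry: node degree minus any self-loop weight.
        lap[i][i] = sum(adjacency[i]) - adjacency[i][i]
    return lap


def laplacian_penalty(adjacency, outputs):
    """Quadratic form y^T L y; small when strongly linked nodes have similar outputs."""
    lap = graph_laplacian(adjacency)
    n = len(outputs)
    return sum(outputs[i] * lap[i][j] * outputs[j]
               for i in range(n) for j in range(n))


# Hypothetical co-occurrence weights for: dishes, glass jingling, car.
cooccurrence = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
consistent = [0.8, 0.8, 0.1]     # co-occurring events predicted together
inconsistent = [0.8, 0.1, 0.1]   # "dishes" active but "glass jingling" not
pen_consistent = laplacian_penalty(cooccurrence, consistent)
pen_inconsistent = laplacian_penalty(cooccurrence, inconsistent)
print(pen_consistent, pen_inconsistent)
```

Predictions that disagree across a strongly weighted edge receive the larger penalty, which is the mechanism by which co-occurrence information can steer training.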
Linear Prediction (LP) analysis is commonly used in speech processing. LP is based on the Auto-Regressive (AR) model and estimates the AR model parameters from signals with l2-norm optimization. Recently, sparse estimation has attracted attention since it can extract significant features from big data. Sparse estimation is realized by l1 or l0-norm optimization or regularization. Sparse LP analysis methods based on l1-norm optimization have been proposed. Since the excitation of speech is not white Gaussian, sparse LP estimation can estimate the parameters more accurately than conventional l2-norm based LP. These are time-invariant and real-valued analyses. We have studied Time-Varying Complex AR (TV-CAR) analysis for an analytic signal and have evaluated its performance on speech processing. The TV-CAR methods are l2-norm methods. In this paper, we propose sparse TV-CAR analysis based on adaptive LASSO (Least Absolute Shrinkage and Selection Operator), which is an l1-norm regularization, and evaluate its performance on F0 estimation of speech using IRAPT (Instantaneous RAPT). The experimental results show that the sparse TV-CAR methods perform better for a high level of additive pink noise.
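To make the adaptive LASSO idea concrete, the sketch below applies weighted soft thresholding, which is the closed-form solution of a weighted l1-penalized problem only in the simplified case of an orthonormal design. It uses real-valued coefficients, whereas TV-CAR analysis is complex-valued, and the function names, the `gamma` exponent, and the sample coefficients are illustrative assumptions rather than the paper's algorithm.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x|: shrink toward zero, clip small values."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def adaptive_weights(initial_estimate, gamma=1.0, eps=1e-8):
    """Adaptive LASSO weights 1 / |b|^gamma from an initial (e.g. l2-norm) estimate."""
    return [1.0 / (abs(b) ** gamma + eps) for b in initial_estimate]


def adaptive_lasso_shrink(coefficients, initial_estimate, lam):
    """Weighted soft thresholding: small initial coefficients are penalized harder."""
    weights = adaptive_weights(initial_estimate)
    return [soft_threshold(c, lam * w) for c, w in zip(coefficients, weights)]


initial = [2.0, 0.05, -1.0]   # hypothetical l2-norm estimate
shrunk = adaptive_lasso_shrink(initial, initial, lam=0.1)
print(shrunk)
```

The large coefficients are barely shrunk while the small one is set exactly to zero, which is the selective sparsity that distinguishes adaptive LASSO from a uniform l1 penalty.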
The affine projection sign algorithm (APSA) is an important adaptive filtering method for combating impulsive noise. However, the performance of APSA is poor if its regularization parameter is not well chosen. We propose a variable regularization APSA (VR-APSA) approach, which adopts a gradient-based method to recursively reduce the norm of the a priori error vector. The resulting VR-APSA leverages the time correlation of both the input signal matrix and error vector to adjust the value of the regularization parameter. Simulation results confirm that our algorithm exhibits both fast convergence and small misadjustment properties.
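For orientation, here is a minimal sketch of the standard APSA update with a fixed regularization parameter `delta`, applied to a toy noiseless system identification problem. The variable-regularization rule that the letter proposes is not reproduced, and the step size, filter length, projection order, and test signal are all assumptions.

```python
import math
import random


def sign(v):
    """Signum function returning -1, 0, or 1."""
    return (v > 0) - (v < 0)


def apsa_step(weights, regressors, desired, mu=0.01, delta=1e-3):
    """One APSA update with fixed regularization delta.

    regressors: the P most recent input vectors (columns of X), each of filter length.
    desired:    the P matching desired samples.
    """
    # A priori errors e_p = d_p - x_p^T w.
    errors = [d - sum(w * x for w, x in zip(weights, col))
              for d, col in zip(desired, regressors)]
    # x_s = X sgn(e): sign-weighted sum of the input vectors.
    x_s = [sum(sign(e) * col[k] for e, col in zip(errors, regressors))
           for k in range(len(weights))]
    norm = math.sqrt(sum(v * v for v in x_s) + delta)
    new_weights = [w + mu * v / norm for w, v in zip(weights, x_s)]
    return new_weights, errors


rng = random.Random(1)
true_system = [0.5, -0.3]   # hypothetical unknown system
signal = [rng.gauss(0.0, 1.0) for _ in range(2002)]
weights = [0.0, 0.0]
for n in range(2, len(signal)):
    regressors = [[signal[n], signal[n - 1]], [signal[n - 1], signal[n - 2]]]
    desired = [sum(h * x for h, x in zip(true_system, r)) for r in regressors]
    weights, _ = apsa_step(weights, regressors, desired)
final_error = math.sqrt(sum((w - h) ** 2 for w, h in zip(weights, true_system)))
print(weights, final_error)
```

Because `delta` sits inside the normalization, it directly scales the effective step, which is why a poorly chosen value hurts convergence or misadjustment and why adapting it over time is attractive.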
Shogo SEKI Tomoki TODA Kazuya TAKEDA
This paper proposes a semi-supervised source separation method for stereophonic music signals containing multiple recorded or processed signals, with a focus on synthesized music. As synthesized music signals are often generated as linear combinations of many individual source signals and their respective mixing gains, phase or phase difference information between inter-channel signals, which represents spatial characteristics of recording environments, cannot be utilized as an acoustic clue for source separation. Non-negative Tensor Factorization (NTF) is an effective technique which can be used to resolve this problem by decomposing amplitude spectrograms of stereo channel music signals into basis vectors and activations of individual music source signals, along with their corresponding mixing gains. However, it is difficult to achieve sufficient separation performance using this method alone, as the acoustic clues available for separation are limited. To address this issue, this paper proposes a Cepstral Distance Regularization (CDR) method for NTF-based stereo channel separation, which involves making the cepstrum of the separated source signals follow Gaussian Mixture Models (GMMs) of the corresponding music source signals. These GMMs are trained in advance using available samples. Experimental evaluations separating three and four sound sources are conducted to investigate the effectiveness of the proposed method in both supervised and semi-supervised separation frameworks, and performance is also compared with that of a conventional NTF method. Experimental results demonstrate that the proposed method yields significant improvements within both separation frameworks, and that cepstral distance regularization provides better separation parameters.
Jun WANG Yuanyun WANG Chengzhi DENG Shengqian WANG Yong QIN
Developing a robust appearance model is a challenging task due to appearance variations of objects such as partial occlusion, illumination variation, rotation and background clutter. Existing tracking algorithms employ linear combinations of target templates to represent target appearances, which are not accurate enough to deal with appearance variations. The underlying relationship between target candidates and the target templates is highly nonlinear because of complicated appearance variations. To address this, this paper presents a regularized kernel representation for visual tracking. Namely, the feature vectors of target appearances are mapped into a higher-dimensional feature space, in which a target candidate is approximately represented by a nonlinear combination of target templates. The kernel-based appearance model takes advantage of the nonlinear relationship and captures the nonlinear similarity between target candidates and target templates. l2-regularization on the coding coefficients makes the approximate solution of target representations more stable. Comprehensive experiments demonstrate superior performance in comparison with state-of-the-art trackers.
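One common way to realize an l2-regularized kernel representation is the closed form c = (K + lambda I)^{-1} k, where K is the Gram matrix of the templates and k holds the kernel values between each template and the candidate. The sketch below assumes this form with a Gaussian RBF kernel; the kernel choice, the parameters `lam` and `gamma`, and the toy feature vectors are assumptions, and the paper's actual model may differ.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def kernel_coefficients(templates, candidate, lam=0.1, gamma=1.0):
    """Coding coefficients c = (K + lam * I)^{-1} k for one candidate."""
    n = len(templates)
    gram = [[rbf(templates[i], templates[j], gamma) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_vec = [rbf(t, candidate, gamma) for t in templates]
    return solve(gram, k_vec)


def reconstruction_error(templates, candidate, coeffs, gamma=1.0):
    """||phi(y) - sum_i c_i phi(t_i)||^2 expanded with the kernel trick."""
    cross = sum(c * rbf(t, candidate, gamma) for c, t in zip(coeffs, templates))
    quad = sum(ci * cj * rbf(ti, tj, gamma)
               for ci, ti in zip(coeffs, templates)
               for cj, tj in zip(coeffs, templates))
    return rbf(candidate, candidate, gamma) - 2.0 * cross + quad


templates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # hypothetical template features
near, far = [0.1, 0.0], [3.0, 3.0]                 # hypothetical candidates
err_near = reconstruction_error(templates, near, kernel_coefficients(templates, near))
err_far = reconstruction_error(templates, far, kernel_coefficients(templates, far))
print(err_near, err_far)
```

A candidate that resembles the templates is reconstructed with a small error in feature space, so the error can be used to score candidates, and the `lam` term keeps the linear system well conditioned.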
A Tikhonov regularized RLS algorithm with an exponential weighting factor, i.e., a leaky RLS (LRLS) algorithm, was proposed by the author. A quadratic version of the LRLS algorithm also exists in the literature on adaptive filters. In this letter, a cubic version of the LRLS filter is proposed that is computationally efficient when the length of the adaptive filter is short. The proposed LRLS filter requires only one division per iteration, although its multiplications and additions increase in number. Simulation results show that, for short filter lengths, the proposed LRLS filter is faster than the existing quadratic version of the LRLS filter.
Recently, a high dimensional classification framework has been proposed to introduce spatial structure information in classical single kernel support vector machine optimization scheme for brain image analysis. However, during the construction of spatial kernel in this framework, a huge adjacency matrix is adopted to determine the adjacency relation between each pair of voxels and thus it leads to very high computational complexity in the spatial kernel calculation. The method is improved in this manuscript by a new construction of tensorial kernel wherein a 3-order tensor is adopted to preserve the adjacency relation so that calculation of the above huge matrix is avoided, and hence the computational complexity is significantly reduced. The improvement is verified by experimental results on classification of Alzheimer patients and cognitively normal controls.
Guohao LYU Hui YIN Xinyan YU Siwei LUO
In this letter, a local characteristic image restoration based on convolutional neural network is proposed. In this method, image restoration is considered as a classification problem and images are divided into several sub-blocks. The convolutional neural network is used to extract and classify the local characteristics of image sub-blocks, and the different forms of the regularization constraints are adopted for the different local characteristics. Experiments show that the image restoration results by the regularization method based on local characteristics are superior to those by the traditional regularization methods and this method also has lower computing cost.
Recently, a high dimensional classification framework has been proposed to introduce spatial and anatomical priors in classical single kernel support vector machine optimization scheme, wherein the sequential minimal optimization (SMO) training algorithm is adopted, for brain image analysis. However, to satisfy the optimization conditions required in the single kernel case, it is unreasonably assumed that the spatial regularization parameter is equal to the anatomical one. In this letter, this approach is improved by combining SMO algorithm with multiple kernel learning to avoid that assumption and optimally estimate two parameters. The improvement is comparably demonstrated by experimental results on classification of Alzheimer patients and elderly controls.
Zhong ZHANG Shuang LIU Zhiwei ZHANG
Sparsity-based methods have been recently applied to abnormal event detection and have achieved impressive results. However, most such methods suffer from the curse of dimensionality; furthermore, they do not consider the relationship among coefficient vectors. In this paper, we propose a novel method called consistent sparse representation (CSR) to overcome these drawbacks. We first reconstruct each feature in the space spanned by the clustering centers of training features so as to reduce the dimensionality of features and preserve the neighboring structure. Then, the consistent regularization is added to the sparse representation model, which explicitly considers the relationship of coefficient vectors. Our method is verified on two challenging databases (UCSD Ped1 database and Subway database), and the experimental results demonstrate that our method obtains better results than previous methods in abnormal event detection.
Yilong ZHANG Yuehua LI Guanhua HE Sheng ZHANG
Aperture synthesis technology represents an effective approach to millimeter-wave radiometers for high-resolution observations. However, the application of synthetic aperture imaging radiometer (SAIR) is limited by its large number of antennas, receivers and correlators, which may increase noise and cause the image distortion. To solve those problems, this letter proposes a compressive regularization imaging algorithm, called CRIA, to reconstruct images accurately via combining the sparsity and the energy functional of target space. With randomly selected visibility samples, CRIA employs l1 norm to reconstruct the target brightness temperature and l2 norm to estimate the energy functional of it simultaneously. Comparisons with other algorithms show that CRIA provides higher quality target brightness temperature images at a lower data level.
This letter studies the problem of cooperative spectrum sensing in wideband cognitive radio networks. Based on the basis expansion model (BEM), the problem of estimation of power spectral density (PSD) is transformed to estimation of BEM coefficients. The sparsity both in frequency domain and space domain is used to construct a sparse estimation structure. The theory of L1/2 regularization is used to solve the compressed sensing problem. Simulation results demonstrate the effectiveness of the proposed method.
In this paper, an image prior based on soft-morphological filters and its application to image recovery are presented. In morphological image processing, a gray-scale image is represented as a subset in a three-dimensional space, which is spanned by spatial and intensity axes. Morphological opening and closing, which are basic operations in morphological image processing, respectively approximate the image subset and its complementary images as the unions of structuring elements that are translated in the three-dimensional space. In this study, the opening and closing filters are applied to an image prior to resolve the regularization problem of image recovery. When the proposed image prior is applied, the image is recovered as an image that has no noise component, which is eliminated by the opening and closing. However, the closing and opening filters are less able to eliminate Gaussian noise. In order to improve the robustness against Gaussian noise, the closing and opening filters are respectively approximated as soft-closing and soft-opening with relaxed max and min functions. In image recovery experiments, image denoising and deblurring using the proposed prior are demonstrated. Comparisons of the proposed prior with the existing priors that impose a penalty on the gradient of the intensity are also shown.
Wittawat JITKRITTUM Hirotaka HACHIYA Masashi SUGIYAMA
Feature selection is a technique to screen out less important features. Many existing supervised feature selection algorithms use redundancy and relevancy as the main criteria to select features. However, feature interaction, potentially a key characteristic in real-world problems, has not received much attention. As an attempt to take feature interaction into account, we propose
Makoto NAKASHIZUKA Yu ASHIHARA Youji IIGUNI
This paper proposes an adaptation method for structuring elements of morphological filters. A structuring element of a morphological filter specifies a shape of local structures that is eliminated or preserved in the output. The adaptation of the structuring element is hence a crucial problem for image denoising using morphological filters. Existing adaptation methods for structuring elements require preliminary training using example images. We propose an adaptation method for structuring elements of morphological opening filters that does not require such training. In our approach, the opening filter is interpreted as an approximation method with the union of the structuring elements. In order to eliminate noise components, a penalty defined from an assumption of image smoothness is imposed on the structuring element. Image denoising is achieved through decreasing the objective function, which is the sum of an approximation error term and the penalty function. In experiments, we use the proposed method to demonstrate positive impulsive noise reduction from images.
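As a minimal illustration of why an opening filter removes positive impulsive noise, the sketch below applies a 1-D flat opening (erosion followed by dilation) with a fixed structuring element. The fixed width, the truncated-window border handling, and the sample signal are assumptions; the adaptation of the structuring element proposed in the paper is not reproduced.

```python
def erode(signal, width):
    """Flat erosion: sliding-window minimum (window truncated at the borders)."""
    half = width // 2
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, width):
    """Flat dilation: sliding-window maximum (window truncated at the borders)."""
    half = width // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def opening(signal, width=3):
    """Opening = erosion followed by dilation with the same structuring element."""
    return dilate(erode(signal, width), width)


# Hypothetical signal: a positive impulse at index 2 and a plateau at indices 5-8.
noisy = [1, 1, 9, 1, 1, 2, 2, 2, 2, 1]
opened = opening(noisy, width=3)
print(opened)
```

The impulse is narrower than the structuring element and is removed, while the plateau is wider and survives, which shows how the shape of the structuring element decides what is eliminated or preserved.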