Many learning machines such as normal mixtures and layered neural networks are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. The conventional statistical asymptotic theory can not be applied to such learning machines because the likelihood function can not be approximated by any normal distribution. Recently, new statistical theory has been established based on algebraic geometry and it was clarified that the generalization and training errors are determined by two birational invariants, the real log canonical threshold and the singular fluctuation. However, their concrete values are left unknown. In the present paper, we propose a new concept, a quasi-regular case in statistical learning theory. A quasi-regular case is not a regular case but a singular case, however, it has the same property as a regular case. In fact, we prove that, in a quasi-regular case, two birational invariants are equal to each other, resulting that the symmetry of the generalization and training errors holds. Moreover, the concrete values of two birational invariants are explicitly obtained, hence the quasi-regular case is useful to study statistical learning theory.
Shunsuke ONO Takamichi MIYATA Isao YAMADA Katsunori YAMAOKA
Solving image recovery problems requires the use of some efficient regularizations based on a priori information with respect to the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique which performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) for approximating its global minimizer efficiently. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.
Jing JIANG Dianhua WU Pingzhi FAN
Optical orthogonal codes (OOCs) were introduced by Salehi, as signature sequences to facilitate multiple access in optical fibre networks. The existence of optimal (υ,3,1)-OOCs had been solved completely. Although there are some partial results, the existence of optimal (υ, 4, 1)-OOCs is far from settled. In this letter, three general constructions for (υ, 4, 1)-OOCs via perfect difference families are presented, new infinite classes of (υ, 4, 1)-OOCs are then obtained.
Yusaku KANETA Shingo YOSHIZAWA Shin-ichi MINATO Hiroki ARIMURA Yoshikazu MIYANAGA
In this paper, we propose a novel architecture for large-scale regular expression matching, called dynamically reconfigurable bit-parallel NFA architecture (Dynamic BP-NFA), which allows dynamic loading of regular expressions on-the-fly as well as efficient pattern matching for fast data streams. This is the first dynamically reconfigurable hardware with guaranteed performance for the class of extended patterns, which is a subclass of regular expressions consisting of union of characters and its repeat. This class allows operators such as character classes, gaps, optional characters, and bounded and unbounded repeats of character classes. The key to our architecture is the use of bit-parallel pattern matching approach, in which the information of an input non-deterministic finite automaton (NFA) is first compactly encoded in bit-masks stored in a collection of registers and block RAMs. Then, the NFA is efficiently simulated by a fixed circuitry using bitwise Boolean and arithmetic operations consuming one input character per clock regardless of the actual contents of an input text. Experimental results showed that our hardwares for both string and extended patterns were comparable to previous dynamically reconfigurable hardwares in their performances.
Adel ZAHEDI Mohammad-Hossein KAHAEI
A flexible and computationally efficient method for spectral analysis of sinusoidal signals using the Basis Pursuit De-Noising (BPDN) is proposed. This method estimates a slotted Auto-Correlation Function (ACF) and computes the spectrum as the sparse representation of the ACF in a dictionary of cosine functions. Simulation results illustrate flexibility and effectiveness of the proposed method.
Hiroki NAKAHARA Tsutomu SASAO Munehiro MATSUURA
This paper shows a design method for a regular expression matching circuit based on a decomposed automaton. To implement a regular expression matching circuit, first, we convert a regular expression into a non-deterministic finite automaton (NFA). Then, to reduce the number of states, we convert the NFA into a merged-states non-deterministic finite automaton with unbounded string transition (MNFAU) using a greedy algorithm. Next, to realize it by a feasible amount of hardware, we decompose the MNFAU into a deterministic finite automaton (DFA) and an NFA. The DFA part is implemented by an off-chip memory and a simple sequencer, while the NFA part is implemented by a cascade of logic cells. Also, in this paper, we show that the MNFAU based implementation has lower area complexity than the DFA and the NFA based ones. Experiments using regular expressions form SNORT shows that, as for the embedded memory size per a character, the MNFAU is 17.17-148.70 times smaller than DFA methods. Also, as for the number of LCs (Logic Cells) per a character, the MNFAU is 1.56-5.12 times smaller than NFA methods. This paper describes detail of the MEMOCODE2010 HW/SW co-design contest for which we won the first place award.
Yanwei WANG Xiaoqing DING Changsong LIU
This letter has retrained an MQDF classifier on the retraining set, which is constructed by samples locating near classification boundary. The method is evaluated on HCL2000 and HCD Chinese handwriting sets. The results show that the retrained MQDF outperforms MQDF and cascade MQDF on all test sets.
Jacob BENESTY Constantin PALEOLOGU Silviu CIOCHIN
Regularization plays a fundamental role in adaptive filtering. There are, very likely, many different ways to regularize an adaptive filter. In this letter, we propose one possible way to do it based on a condition that makes intuitively sense. From this condition, we show how to regularize the recursive least-squares (RLS) algorithm.
Kazuya UEKI Masashi SUGIYAMA Yasuyuki IHARA
Over the recent years, a great deal of effort has been made to estimate age from face images. It has been reported that age can be accurately estimated under controlled environment such as frontal faces, no expression, and static lighting conditions. However, it is not straightforward to achieve the same accuracy level in a real-world environment due to considerable variations in camera settings, facial poses, and illumination conditions. In this paper, we apply a recently proposed machine learning technique called covariate shift adaptation to alleviating lighting condition change between laboratory and practical environment. Through real-world age estimation experiments, we demonstrate the usefulness of our proposed method.
The Laplacian support vector machine (LSVM) is a semi-supervised framework that uses manifold regularization for learning from labeled and unlabeled data. However, the optimal kernel parameters of LSVM are difficult to obtain. In this paper, we propose a multi-kernel LSVM (MK-LSVM) method using multi-kernel learning formulations in combination with the LSVM. Our learning formulations assume that a set of base kernels are grouped, and employ l2 norm regularization for automatically seeking the optimal linear combination of base kernels. Experimental testing reveals that our method achieves better performance than the LSVM alone using synthetic data, the UCI Machine Learning Repository, and the Caltech database of Generic Object Classification.
Owing to the high expressiveness of regular expression, it is frequently used in searching and manipulation of text based data. Regular expression is highly applicable in processing Latin alphabet based text, but the same cannot be said for Hangeul*, the writing system for Korean language. Although Hangeul possesses alphabetic features within the script, expressiveness of regular expression pattern using Hangeul is hindered by the absence of syllable decomposition. Without decomposition support in regular expression, searching through Hangeul text is limited to string literal matching. Literal matching has made enumeration of syllable candidates in regular expression pattern definition indispensable, albeit impractical, especially for a large set of syllable candidates. Although the existing implementation of canonical decomposition in Unicode standard does reduce a pre-composed Hangeul syllable into smaller unit of consonant-vowel or consonant-vowel-consonant letters, it still leaves quite a number of the individual letters in compounded form. We have observed that there is a necessity to further reduce the compounded letters into unit of basic letters to properly represent the Korean script in regular expression. We look at how the new canonical decomposition technique proposed by Kim can help in handling Hangeul in regular expression. In this paper, we examine several of the performance indicators of full decomposition of Hangeul syllable to better understand the overhead that might incur, if a full decomposition were to be implemented in a regular expression engine. For efficiency considerations, we propose a semi decomposition technique alongside with a notation for defining Hangeul syllables. The semi decomposition functions as an enhancement to the existing regular expression syntax by taking in some of the special constructs and features of the Korean language. This proposed technique intends to allow an end user to have a greater freedom to define regular expression syntax for Hangeul.
Shigetoshi NAKATAKE Masahiro KAWAKITA Takao ITO Masahiro KOJIMA Michiko KOJIMA Kenji IZUMI Tadayuki HABASAKI
This paper presents a novel regularity evaluation of placement structure and techniques for handling conditional design rules along with dynamic diffusion sharing and well island generation, which are developed based on Sequence-Pair. The regular structures such as topological rows, arrays and repetitive structures are characterized by the way of forming sub-sequences of a sequence-pair. A placement objective is formulated balancing the regularity and the area efficiency. Furthermore, diffusion sharing and well island can be also identified looking into forming of a sequence-pair. In experiments, we applied our regularity-oriented placement mixed with the constraint-driven technique to real analog designs, and attained the results comparable to manual designs even when imposing symmetry constraints. Besides, the results also revealed the regularity serves to increase row-structures applicable to the diffusion sharing for area saving and wire-length reduction.
Yi TANG Junchen JIANG Xiaofei WANG Chengchen HU Bin LIU Zhijia CHEN
Multi-pattern matching is a key technique for implementing network security applications such as Network Intrusion Detection/Protection Systems (NIDS/NIPSes) where every packet is inspected against tens of thousands of predefined attack signatures written in regular expressions (regexes). To this end, Deterministic Finite Automaton (DFA) is widely used for multi-regex matching, but existing DFA-based researches have claimed high throughput at an expense of extremely high memory cost, so fail to be employed in devices such as high-speed routers and embedded systems where the available memory is quite limited. In this paper, we propose a parallel architecture of DFA called Parallel DFA (PDFA) taking advantage of the large amount of concurrent flows to increase the throughput with nearly no extra memory cost. The basic idea is to selectively store the underlying DFA in memory modules that can be accessed in parallel. To explore its potential parallelism we intensively study DFA-split schemes from both state and transition points in this paper. The performance of our approach in both the average cases and the worst cases is analyzed, optimized and evaluated by numerical results. The evaluation shows that we obtain an average speedup of 100 times compared with traditional DFA-based matching approach.
Kazuya UEKI Masashi SUGIYAMA Yasuyuki IHARA
We address the problem of perceived age estimation from face images, and propose a new semi-supervised approach involving two novel aspects. The first novelty is an efficient active learning strategy for reducing the cost of labeling face samples. Given a large number of unlabeled face samples, we reveal the cluster structure of the data and propose to label cluster-representative samples for covering as many clusters as possible. This simple sampling strategy allows us to boost the performance of a manifold-based semi-supervised learning method only with a relatively small number of labeled samples. The second contribution is to take the heterogeneous characteristics of human age perception into account. It is rare to misjudge the age of a 5-year-old child as 15 years old, but the age of a 35-year-old person is often misjudged as 45 years old. Thus, magnitude of the error is different depending on subjects' age. We carried out a large-scale questionnaire survey for quantifying human age perception characteristics, and propose to utilize the quantified characteristics in the framework of weighted regression. Consequently, our proposed method is expressed in the form of weighted least-squares with a manifold regularizer, which is scalable to massive datasets. Through real-world age estimation experiments, we demonstrate the usefulness of the proposed method.
Let V(φ) be a shift invariant subspace of L2(R) with a Riesz or frame generator φ(t). We take φ(t) suitably so that the regular sampling expansion : f(t) = f(n)S(t-n) holds on V(φ). We then find conditions on the generator φ(t) and various bounds of the perturbation {δ n }n∈Z under which an irregular sampling expansion: f(t) = f(n+ δn)Sn(t) holds on V(φ). Some illustrating examples are also provided.
Many learning machines that have hierarchical structure or hidden variables are now being used in information science, artificial intelligence, and bioinformatics. However, several learning machines used in such fields are not regular but singular statistical models, hence their generalization performance is still left unknown. To overcome these problems, in the previous papers, we proved new equations in statistical learning, by which we can estimate the Bayes generalization loss from the Bayes training loss and the functional variance, on the condition that the true distribution is a singularity contained in a learning machine. In this paper, we prove that the same equations hold even if a true distribution is not contained in a parametric model. Also we prove that, the proposed equations in a regular case are asymptotically equivalent to the Takeuchi information criterion. Therefore, the proposed equations are always applicable without any condition on the unknown true distribution.
Osama AHMED OMER Toshihisa TANAKA
This paper addresses problems appearing in restoration algorithms based on utilizing both Tikhonov and bilateral total variation (BTV) regularization. The former regularization assumes that prior information has Gaussian distribution which indeed fails at edges, while the later regularization highly depends on the selected bilateral filter's parameters. To overcome these problems, we propose a locally adaptive regularization. In the proposed algorithm, we use general directional regularization functions with adaptive weights. The adaptive weights are estimated from local patches based on the property of the partially restored image. Unlike Tikhonov regularization, it can avoid smoothness across edges by using adaptive weights. In addition, unlike BTV regularization, the proposed regularization function doesn't depend on parameters' selection. The convexity conditions as well as the convergence conditions are derived for the proposed algorithm.
Manabu HAGIWARA Marc P.C. FOSSORIER Takashi KITAGAWA Hideki IMAI
In this paper, we investigate the smallest value of p for which a (J,L,p)-QC LDPC code with girth 6 exists for J=3 and J=4. For J=3, we determine the smallest value of p for any L. For J=4, we determine the smallest value of p for L ≤ 301. Furthermore we provide examples of specific constructions meeting these smallest values of p.
Shuzhuang ZHANG Hao LUO Binxing FANG Xiaochun YUN
Scanning packet payload at a high speed has become a crucial task in modern network management due to its wide variety applications on network security and application-specific services. Traditionally, Deterministic finite automatons (DFAs) are used to perform this operation in linear time. However, the memory requirements of DFAs are prohibitively high for patterns used in practical packet scanning, especially when many patterns are compiled into a single DFA. Existing solutions for memory blow-up are making a trade-off between memory requirement and memory access of processing per input character. In this paper we proposed a novel method to drastically reduce the memory requirements of DFAs while still maintain the high matching speed and provide worst-case guarantees. We removed the duplicate transitions between states by dividing all the DFA states into a number of groups and making each group of states share a merged transition table. We also proposed an efficient algorithm for transition sharing between states. The high efficiency in time and space made our approach adapted to frequently updated DFAs. We performed several experiments on real world rule sets. Overall, for all rule sets and approach evaluated, our approach offers the best memory versus run-time trade-offs.
In this paper, we study d-primitive words and D(1)-concatenated words. First we show that neither D(1), the set of all d-primitive words, nor D(1)D(1), the set of all D(1)-concatenated words, is regular. Next we show that for u, v, w ∈Σ+ with |u|=|w|, uvw ∈ D(1) if and only if uv+w ⊆ D(1). It is also shown that every d-primitive word, with the length of two or more, is D(1)-concatenated.