This paper proposes a bit-split string matcher architecture for a memory-efficient hardware-based parallel pattern matching engine. In the proposed bit-split string matcher, multiple finite-state machine (FSM) tiles share match vectors to reduce the required number of stored match vectors. By decreasing the memory size for storing match vectors, the total memory requirement can be minimized.
Seehwan YOO Kuenhwan KWAK Jaehyun JO Chuck YOO
In this paper, we address the latency issue in Xen-ARM virtual machines. Despite the advantages of virtualization in mobile systems, the current Xen-ARM is difficult to apply to mobile devices because its I/O latency is unpredictable. This paper analyzes the latency of incoming packet handling in Xen-ARM and presents in detail how virtualization affects the latency. To make the latency predictable, we first modify the Xen-ARM scheduler so that the driver domain can be promptly scheduled by the hypervisor. Second, we introduce additional paravirtualization of the guest OS that minimizes non-preemptible code paths. With our enhancements, 99% of incoming packets are predictably handled within one millisecond at the destination guest OS, which is a feasible time bound for most soft real-time applications.
Ryo NAGATA Kotaro FUNAKOSHI Tatsuya KITAMURA Mikio NAKANO
To acquire a second language, one must develop an ear and tongue for the correct stress and intonation patterns of that language. In English language teaching, there is an effective method called Jazz Chants for working on the sound system. In this paper, we propose a method for predicting stressed words, which play a crucial role in Jazz Chants. The proposed method is specially designed for stress prediction in Jazz Chants. It exploits several sources of information, including words, POSs, sentence types, and the constraint on the number of stressed words in a chant text. Experiments show that the proposed method achieves an F-measure of 0.939 and outperforms the other methods implemented for comparison. The proposed method is expected to be useful in supporting non-native teachers of English when they teach chants to students and create chant texts with stress marks from arbitrary texts.
Cong LI Yasunori IWANAMI Ryota YAMADA Naoki OKAMOTO
In this paper, we focus on cancelling the interference among Destination Users (DUs) and improving the achievable sum rate of the nonregenerative multiuser Multiple-Input Multiple-Output (MIMO) relay downlink system. A novel transmit weight design method is proposed to successively eliminate the interference among DUs, each of which is equipped with multiple receive antennas. We first investigate the transmit weight design for the Amplify-and-Forward (AF) relay scheme, where the Relay Station (RS) simply retransmits the signals received from the Base Station (BS), and then extend it to a joint design of the transmit weights at both the BS and the RS. In the proposed joint design scheme, an effective DU selection algorithm based on comparing lower bounds of the achievable rate is proposed to generate the transmit weight at the RS and obtain multiuser diversity. The Dirty Paper Coding (DPC) technique is employed to remove the interference among DUs and ensure the achievable rate of the downlink. Theoretical derivation and simulation results demonstrate the effectiveness of the proposed scheme in terms of achievable rate and BER characteristics.
Cheol-Ho HONG Young-Pil KIM Seehwan YOO Chi-Young LEE Chuck YOO
Facing practical limits to increasing processor frequencies, manufacturers have resorted to multi-core designs in their commercial products. In multi-core implementations, cores in a physical package share the last-level caches to improve inter-core communication. To efficiently exploit this facility, operating systems must employ cache-aware schedulers. Unfortunately, virtualization software, which is a foundation technology of cloud computing, is not yet cache-aware or does not fully exploit the locality of the last-level caches. In this paper, we propose a cache-aware virtual machine scheduler for multi-core architectures. The proposed scheduler exploits the locality of the last-level caches to improve the performance of concurrent applications running on virtual machines. For this purpose, we first provide a space-partitioning algorithm that migrates and clusters communicating virtual CPUs (VCPUs) in the same cache domain. Second, we provide a time-partitioning algorithm that co-schedules clustered VCPUs or schedules them in sequence. Finally, we present a theoretical analysis that proves our scheduling algorithm is more efficient in supporting concurrent applications than the default credit scheduler in Xen. We implemented our virtual machine scheduler in a recent Xen hypervisor with para-virtualized Linux-based operating systems. We show that our approach can improve the performance of concurrent virtual machines by up to 19% compared to the credit scheduler.
Xue GAO Jinzhi GUO Lianwen JIN
Linear Discriminant Analysis (LDA) is one of the most popular dimensionality reduction techniques in existing handwritten Chinese character (HCC) recognition systems. However, when used for unconstrained handwritten Chinese character recognition, the traditional LDA algorithm is prone to two problems, namely, the class separation problem and multimodal sample distributions. To deal with these problems, we propose a new locally linear discriminant analysis (LLDA) method for handwritten Chinese character recognition. Our algorithm operates as follows. (1) Using a clustering algorithm, find clusters for the samples of each class. (2) For each cluster of one class, find the nearest neighboring clusters from the remaining classes. Then, use the corresponding cluster means to compute the between-class scatter matrix in LDA while keeping the within-class scatter matrix unchanged. (3) Finally, apply feature vector normalization to further alleviate the class separation problem. A series of experiments on both the HCL2000 and CASIA Chinese character handwriting databases shows that our method can effectively improve recognition performance, with a reduction in error rate of 28.7% (HCL2000) and 16.7% (CASIA) compared with the traditional LDA method. Our algorithm also outperforms DLA (Discriminative Locality Alignment, one of the representative manifold learning-based dimensionality reduction algorithms proposed recently). Large-set handwritten Chinese character recognition experiments also verified the effectiveness of our proposed approach.
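The three steps listed in the abstract above can be sketched in Python with NumPy. This is only an illustrative reading, not the authors' implementation: the tiny k-means routine, the function names (`kmeans`, `llda_projection`, `normalize_rows`), the parameters `clusters_per_class`, `neighbors` and `reg`, and the choice of L2 normalization for step (3) are all assumptions that the abstract does not specify.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Tiny k-means, initialised with the first k samples (assumption)."""
    centers = X[:k].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def llda_projection(X, y, clusters_per_class=2, neighbors=1, dim=1, reg=1e-6):
    """Return a (n_features, dim) projection matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_features = X.shape[1]

    # Step 1: cluster the samples of each class and keep the cluster means.
    means, owners = [], []
    for c in classes:
        for m in kmeans(X[y == c], clusters_per_class):
            means.append(m)
            owners.append(c)
    means, owners = np.array(means), np.array(owners)

    # Step 2: between-class scatter built from each cluster mean and its
    # nearest cluster means that belong to the remaining classes.
    Sb = np.zeros((n_features, n_features))
    for m, c in zip(means, owners):
        others = means[owners != c]
        dist = ((others - m) ** 2).sum(axis=1)
        for o in others[np.argsort(dist)[:neighbors]]:
            diff = (m - o)[:, None]
            Sb += diff @ diff.T

    # Within-class scatter is left as in ordinary LDA.
    Sw = np.zeros((n_features, n_features))
    for c in classes:
        Xc = X[y == c] - X[y == c].mean(axis=0)
        Sw += Xc.T @ Xc

    # Leading eigenvectors of Sw^{-1} Sb (small ridge term added for stability).
    M = np.linalg.solve(Sw + reg * np.eye(n_features), Sb)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:dim]
    return vecs[:, order].real

def normalize_rows(Z):
    """Step 3 (assumed form): scale every feature vector to unit L2 norm."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.maximum(norms, 1e-12)
```

A caller would compute `W = llda_projection(X, y, dim=d)` and then `normalize_rows(X @ W)`. With `dim=1` the normalization reduces every feature to its sign, so it is only meaningful for `dim` greater than one.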
We have previously proposed an indoor monitoring and security system with an array sensor. The array sensor has several advantages, such as low privacy concern, easy and low-cost installation, and a wide detection range. Our study differs from the previously proposed classification method for the array sensor, which uses a threshold to classify only two states for intrusion detection: nothing happening and something happening. This paper describes a novel state classification method based on array signal processing with a machine learning algorithm. The proposed method uses the eigenvector and eigenvalue spanning the signal subspace, obtained from the array sensor, as features, and uses multiclass support vector machines (SVMs) to classify various states of a human being or an object. The experimental results show that our proposed method provides high classification accuracy and robustness, which is very useful for monitoring and surveillance applications.
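As a rough illustration of the feature extraction described above, the following Python sketch computes the leading eigenvalue and eigenvector of the sample correlation matrix of array snapshots. The function name `signal_subspace_features`, the `n_sources` parameter, the phase normalization and the way real and imaginary parts are stacked are assumptions for illustration; the abstract does not give these details, and the multiclass SVM stage is not shown.

```python
import numpy as np

def signal_subspace_features(snapshots, n_sources=1):
    """snapshots: complex array of shape (n_antennas, n_snapshots)."""
    X = np.asarray(snapshots)
    # Sample correlation matrix of the array output.
    R = X @ X.conj().T / X.shape[1]
    # eigh returns eigenvalues in ascending order; put the largest first.
    vals, vecs = np.linalg.eigh(R)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    top_vals = vals[:n_sources]
    top_vecs = vecs[:, :n_sources]
    # Eigenvectors are defined up to a phase; rotate each one so that its
    # first element is real and non-negative (assumed convention).
    top_vecs = top_vecs * np.exp(-1j * np.angle(top_vecs[0, :]))
    # Real-valued feature vector: eigenvalues, then real and imaginary parts.
    return np.concatenate([top_vals,
                           top_vecs.real.ravel(),
                           top_vecs.imag.ravel()])
```

The resulting real-valued vector could then be passed to any multiclass classifier; which SVM formulation and kernel the authors use is not stated in the abstract.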
Chunyan LIANG Lin YANG Qingwei ZHAO Yonghong YAN
In this letter, we adopt a new factor analysis of neighborhood-preserving embedding (NPE) for speaker verification. NPE aims at preserving the local neighborhood structure of the data and defines a low-dimensional speaker space called the neighborhood-preserving embedding space. We compare the proposed method with the state-of-the-art total variability approach on the telephone-telephone core condition of the NIST 2008 Speaker Recognition Evaluation (SRE) dataset. The experimental results indicate that the proposed NPE method outperforms the total variability approach, providing up to 24% relative improvement.
Raouf SENHADJI-NAVARRO Ignacio GARCIA-VARGAS
This letter proposes a new model of state machine called Finite Virtual State Machine (FVSM). A memory-based architecture and a procedure for generating FVSM implementations from Finite State Machines (FSMs) are presented. FVSM implementations provide advantages in speed over conventional RAM-based FSM implementations. The results of experiments prove the feasibility of this approach.
Junghwan KIM Minkyu PARK Sangchul HAN Jinsoo KIM
Prefix caching improves the performance of IP lookup by exploiting the spatial and temporal locality of IP references. However, it cannot cache non-leaf prefixes, so several prefix expansion schemes have been proposed to handle those prefixes. Such schemes have drawbacks: they either require modification of the routing table or incur a severe miss penalty. We propose an efficient prefix expansion scheme that achieves good performance without placing an additional burden on the lookup scheme. In the proposed scheme, a non-leaf prefix is expanded to the length of its longest immediate descendant prefix when it is cached. Evaluation results show that our scheme achieves a very low miss ratio even though it increases neither the size of the routing table nor the cache miss penalty.
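The expansion rule stated above can be illustrated with a small Python sketch. It is a simplified reading of the abstract, not the authors' lookup structure: prefixes and addresses are represented as bit strings, the routing table is a plain list scanned linearly, and the function names are hypothetical.

```python
def longest_match(prefixes, addr_bits):
    """Return the longest table prefix that matches addr_bits, or None."""
    best = None
    for p in prefixes:
        if addr_bits.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best

def immediate_descendants(prefixes, parent):
    """Descendants of parent with no other table prefix in between."""
    desc = [p for p in prefixes if p != parent and p.startswith(parent)]
    return [d for d in desc
            if not any(m != d and d.startswith(m) for m in desc)]

def prefix_to_cache(prefixes, addr_bits):
    """Prefix that would be cached after looking up addr_bits."""
    match = longest_match(prefixes, addr_bits)
    if match is None:
        return None
    children = immediate_descendants(prefixes, match)
    if not children:
        return match  # leaf prefix: cached unchanged
    # Non-leaf prefix: expand along the address bits to the length of the
    # longest immediate descendant prefix.
    expand_len = max(len(c) for c in children)
    return addr_bits[:expand_len]

table = ["10", "1011", "100101"]
print(prefix_to_cache(table, "10001100"))  # non-leaf match "10" -> "100011"
print(prefix_to_cache(table, "10110000"))  # leaf match -> "1011"
```

In this example the expanded entry `100011` covers the looked-up address but none of the more specific table prefixes, so a later cache hit on it returns the same result as a full lookup, and the routing table itself is left unmodified.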
In this letter, distributed source coding with one distortion criterion and correlated messages is considered. This problem can be regarded as the “Berger-Yeung problem with correlated messages”. It corresponds to the source coding part of the graph-based framework for transmission of a pair of correlated sources over the multiple-access channel, where one source is reconstructed losslessly and the other lossily. As a result, the achievable rate-distortion region for this problem is provided. A rigorous proof of both the achievability and converse parts is also given.
Jianping WU Ming LING Yang ZHANG Chen MEI Huan WANG
This paper proposes a novel dynamic Scratch-pad Memory (SPM) allocation strategy to optimize the energy consumption of the memory sub-system. First, the whole program execution is sliced into several time slots along the temporal dimension; thereafter, a Time-Slotted Cache Conflict Graph (TSCCG) is introduced to model the behavior of Data Cache (D-Cache) conflicts within each time slot. Then, Integer Nonlinear Programming (INP), which avoids a time-consuming linearization process, is used to select the most profitable data pages. A Virtual Memory System (VMS) is adopted to remap to SPM those data pages that would cause severe cache conflicts within a time slot. To minimize the swapping overhead of dynamic SPM allocation, a novel SPM controller with a tightly coupled DMA is introduced to issue the swapping operations without CPU intervention. Last but not least, this paper quantitatively discusses how the system energy profit fluctuates with the MMU page size and the time slot duration. According to our design space exploration, the proposed method can optimize all of the data segments, including global data, heap and stack data in general, and reduces the total energy consumption by 27.28% on average and by up to 55.22%, with a marginal performance improvement. Compared to the conventional static Cache Conflict Graph (CCG), our approach obtains a 24.7% energy profit on average and up to 30.5%, with a slight boost in performance.
Hansjorg HOFMANN Sakriani SAKTI Chiori HORI Hideki KASHIOKA Satoshi NAKAMURA Wolfgang MINKER
The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach, where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, the phonemes are first recognized with the present recognition system, and afterwards the pronunciation variation model based on the noisy channel approach maps from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both are applied, and various experiments are conducted using microphone and telephone recordings of spontaneous speech.
Guangchun LUO Jinsheng REN Ke QIN
A new training algorithm for the chaotic Adachi Neural Network (AdNN) is investigated. The classical training algorithm for the AdNN and its variants is usually a “one-shot” learning rule; for example, the Outer Product Rule (OPR) is the most widely used. Although the OPR is effective for conventional neural networks, its effectiveness and adequacy for Chaotic Neural Networks (CNNs) have not been discussed formally. As a complementary and tentative work in this field, we modify the AdNN's weights by enforcing an unsupervised Hebbian rule. Experimental analysis shows that the newly weighted AdNN yields even stronger dynamical associative memory and pattern recognition phenomena for different settings than the primitive AdNN.
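For readers unfamiliar with the "one-shot" Outer Product Rule mentioned above, the following Python sketch shows its generic Hopfield-style form for bipolar (+1/-1) patterns. It is not the modified Hebbian training proposed in the letter and does not model the chaotic dynamics of the AdNN; the normalization by the number of patterns, the zeroed diagonal and the simple sign update in `recall_step` are assumptions chosen for illustration.

```python
import numpy as np

def outer_product_weights(patterns):
    """One-shot OPR: average of the outer products of the stored patterns."""
    X = np.asarray(patterns, dtype=float)   # shape (n_patterns, n_neurons)
    W = X.T @ X / X.shape[0]
    np.fill_diagonal(W, 0.0)                # no self-connections (assumption)
    return W

def recall_step(W, state):
    """One synchronous, non-chaotic sign update of all neurons."""
    return np.where(W @ np.asarray(state, dtype=float) >= 0, 1.0, -1.0)

# Two orthogonal bipolar patterns stored in a single pass.
p1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
p2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
W = outer_product_weights([p1, p2])

noisy = p1.copy()
noisy[0] = -noisy[0]                        # flip one element of p1
print(recall_step(W, noisy))                # recovers p1 in this small example
```

The weights are computed in a single pass over the stored patterns with no iterative error correction, which is what "one-shot" refers to.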
Ning JIANG Jiu XU Satoshi GOTO
In recent years, local-pattern-based features have attracted increasing interest in object detection and recognition systems. The Local Binary Pattern (LBP) feature is widely used in texture classification and face detection, but the original definition of LBP is not suitable for human detection. In this paper, we propose a novel feature named gradient local binary patterns (GLBP) for human detection. In this feature, the original 256 local binary patterns are reduced to 56 patterns. These 56 patterns, named uniform patterns, are used to generate a 56-bin histogram. The gradient value of each pixel is used as its weight when computing the values of the 56 histogram bins, whereas the weight is always the same in LBP-based features. Experiments performed on the INRIA dataset show that the proposed GLBP feature is more discriminative than the histogram of oriented gradients (HOG), Semantic Local Binary Patterns (S-LBP) and the histogram of templates (HOT). In our experiments, the window size is fixed, which means the performance can be further improved by boosting methods. In addition, the computation of the GLBP feature can be parallelized, which makes it easy to accelerate in hardware. These factors make the GLBP feature feasible for real-time pedestrian detection.
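The reduction from 256 to 56 patterns and the gradient-weighted histogram can be illustrated with a short Python sketch. It assumes that the 56 patterns are the 8-bit codes with exactly two circular 0/1 transitions (there are exactly 56 such codes); the abstract does not state this definition. The function names are hypothetical, and the LBP codes and gradient values are taken as given inputs rather than computed from an image.

```python
def circular_transitions(pattern, bits=8):
    """Count 0/1 changes between circularly adjacent bits of an LBP code."""
    count = 0
    for i in range(bits):
        a = (pattern >> i) & 1
        b = (pattern >> ((i + 1) % bits)) & 1
        count += a != b
    return count

# Assumed selection: codes with exactly two circular transitions.
two_transition = [p for p in range(256) if circular_transitions(p) == 2]
bin_index = {p: i for i, p in enumerate(two_transition)}
print(len(two_transition))  # 56

def weighted_histogram(codes, weights):
    """Accumulate per-pixel weights into the bins of the selected patterns.

    codes:   LBP code of each pixel (0..255)
    weights: gradient value of each pixel, used as the voting weight
    Pixels whose code is not among the selected patterns are skipped.
    """
    hist = [0.0] * len(two_transition)
    for code, w in zip(codes, weights):
        if code in bin_index:
            hist[bin_index[code]] += w
    return hist
```

A conventional LBP histogram corresponds to calling `weighted_histogram` with every weight equal to 1; passing per-pixel gradient values instead is the difference the abstract describes.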
An asymmetric classifier based on kernel partial least squares is proposed for software defect prediction. This method improves the prediction performance on imbalanced data sets. The experimental results validate its effectiveness.
In this paper, a dual-band bandpass filter (BPF) with a multilayer suspended stripline (SSL) structure and an SSL diplexer composed of a low-pass filter (LPF) and a high-pass filter (HPF) are proposed. A bandstop structure that creates transmission zeros is adopted in the BPF and the diplexer, enhancing the signal selectivity of the former and increasing the isolation between the diverting ports of the latter. The dual-band BPF possesses two distinct bandpass structures and a bandstop circuit, all laid on different metallic layers. The metallic layers together with the supporting substrates are vertically stacked to reduce the circuit dimensions. The LPF and HPF used in the diplexer structure are designed by a quasi-lumped approach, in which LC lumped-element circuit models are developed to analyze the filters' characteristics and to emulate their frequency responses. Half-wavelength resonating slots are employed in the diplexer's structure to increase the isolation between its two signal diverting ports. Experiments are conducted to verify the multilayer dual-band BPF and the diplexer design. Agreement is observed between the simulation and the measurement.
Young-Sik EOM Jong Wook KWAK Seong Tae JHANG Chu Shik JHON
In future Chip Multi-Processors (CMPs), private L2 caches have potential benefits, e.g., low access latency, performance isolation, a tile-friendly architecture and a simple low-bandwidth on-chip interconnect. However, the major weakness of private caches is the higher cache miss rate caused by their small capacity. To deal with this problem, private caches can share capacity by spilling replaced blocks to other private caches. However, indiscriminate spilling can make the capacity problem worse and degrade performance. This letter proposes throttling capacity sharing (TCS) for effective capacity sharing in private L2 caches. TCS determines whether to spill a replaced block by predicting its reuse possibility based on life time and reuse time. In our performance evaluation, TCS improves weighted speedup by 48.79%, 6.37% and 5.44% compared to non-spilling, Cooperative Caching with best spill probability (CC) and Dynamic Spill-Receive (DSR), respectively.
An active learning method, called Two-stage Active learning algorithm (TAL), is developed for software defect prediction. Combining the clustering and support vector machine techniques, this method improves the performance of the predictor with less labeling effort. Experiments validate its effectiveness.
Yuanqiang HUANG Zhongzhi LUAN Depei QIAN Zhigao DU Ting CHEN Yuebin BAI
For real-time stream processing technology, it is important to develop a high-availability mechanism that keeps stream-based applications from being disrupted by faults arising from potential anomalies. In this paper, we present a novel online prediction technique for predicting anomalies that may occur in the near future. Concretely, we first present a value prediction method that combines the Hidden Markov Model and the Mixture of Experts Model to predict the values of feature metrics in the near future. We then employ a Support Vector Machine for anomaly identification, a procedure that identifies the kind of anomaly for which an alarm is about to be raised. The purpose of our approach is to achieve a tradeoff between fault penalty and resource cost. The experimental results show that our approach achieves high accuracy for common anomaly prediction with low runtime overhead.