IEICE global.ieice.org Site

Keyword Search Result

[Keyword] ACH(1072hit)

781-800hit(1072hit)

Dynamic Code Repositioning for Java
Shinji TANAKA Tetsuyasu YAMADA Satoshi SHIRAISHI

PAPER-Software Support and Optimization Techniques

Vol: E87-D No:7, Page(s): 1737-1742
The sizes of recent Java-based server-side applications, such as J2EE containers, have been increasing continuously. Past techniques for improving the performance of Java applications have targeted relatively small applications, whose invoked methods are usually not distributed over the entire memory space. As a result, these techniques cannot be applied efficiently to improve the performance of today's large applications. We propose a dynamic code repositioning approach to improve the hit rates of instruction caches and translation look-aside buffers. Profiles of method invocations are collected while the application runs under its heaviest processor load, and the code is repositioned based on these profiles. We also discuss a method-splitting technique that significantly reduces the sizes of methods. Our evaluation of a prototype implementing these techniques showed a 5% improvement in application throughput.
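A minimal sketch of profile-driven method placement, assuming the profile is simply an invocation count per compiled method and that the goal is to pack the hottest code contiguously. The `Method` record, the `reposition` function, the base address and the 16-byte alignment are hypothetical illustrations, not the prototype's actual design, and method splitting is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Method:
    name: str
    size: int         # bytes of compiled code (hypothetical field)
    invocations: int  # count collected during the peak-load profiling window


def reposition(methods, base_address=0x10000, alignment=16):
    """Return {method name: new start address}, hottest methods first and contiguous."""
    layout = {}
    addr = base_address
    # Packing frequently executed code together means it occupies
    # as few I-cache lines and TLB pages as possible.
    for m in sorted(methods, key=lambda m: m.invocations, reverse=True):
        addr = (addr + alignment - 1) // alignment * alignment  # round up to alignment
        layout[m.name] = addr
        addr += m.size
    return layout
```

A real JIT would additionally have to patch call sites and return addresses after moving code; the sketch only computes the new layout.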
A Local Learning Framework Based on Multiple Local Classifiers
BaekSop KIM HyeJeong SONG JongDae KIM

LETTER-Pattern Recognition

Vol: E87-D No:7, Page(s): 1971-1973
This paper presents a local learning framework in which the local classifiers can be pre-learned and the support size of each classifier can be selected to minimize the error bound. The proposed algorithm is compared with the conventional support vector machine (SVM). Experimental results show that our scheme using the user-defined parameters C and σ is more accurate and less sensitive than the conventional SVM.
VLaTTe: A Java Just-in-Time Compiler for VLIW with Fast Scheduling and Register Allocation
Suhyun KIM Soo-Mook MOON Kemal EBCIOĞLU Erik ALTMAN

PAPER-Software Support and Optimization Techniques

Vol: E87-D No:7, Page(s): 1712-1720
For network computing on desktop machines, fast execution of Java bytecode programs is essential because these machines are expected to run substantial application programs written in Java. We believe higher Java performance can be achieved by exploiting instruction-level parallelism (ILP) in the context of Java JIT compilation. This paper introduces VLaTTe, a Java JIT compiler for VLIW machines that performs efficient scheduling while doing fast register allocation. It is an extended version of our previous JIT compiler for RISC machines, called LaTTe, whose translation overhead is low (i.e., consistently one or two seconds for the SPECJVM98 benchmarks) thanks to its fast register allocation. VLaTTe adds scheduling capability to the same register allocation framework, with a constraint for precise in-order exception handling that guarantees the same Java exception behavior as the original bytecode program. Our experimental results on the SPECJVM98 benchmarks show that VLaTTe achieves a geometric mean useful IPC of 1.7 (2-ALU), 2.1 (4-ALU), and 2.3 (8-ALU), while its scheduling/allocation overhead is on average 3.6 times that of LaTTe, which appears to be reasonable.
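To illustrate how an in-order exception constraint interacts with VLIW scheduling, here is a minimal greedy list scheduler, assuming one-cycle latencies, three-address instructions given as hypothetical `(dst, srcs, may_raise)` tuples, and a dependence edge from each potentially excepting instruction to the next one. This is a generic textbook scheduler, not VLaTTe's algorithm, and register allocation is not modeled.

```python
def build_deps(instrs):
    """instrs: list of (dst, srcs, may_raise) in original bytecode order.
    Returns, for each instruction, the set of earlier instructions it must follow."""
    deps = [set() for _ in instrs]
    last_raise = None
    for j, (dst_j, srcs_j, raise_j) in enumerate(instrs):
        for i in range(j):
            dst_i, srcs_i, _ = instrs[i]
            if dst_i is not None and (dst_i in srcs_j or dst_i == dst_j):
                deps[j].add(i)  # read-after-write or write-after-write
            if dst_j is not None and dst_j in srcs_i:
                deps[j].add(i)  # write-after-read
        if raise_j:
            if last_raise is not None:
                deps[j].add(last_raise)  # keep potentially excepting ops in program order
            last_raise = j
    return deps


def schedule(instrs, width):
    """Fill VLIW bundles of `width` slots cycle by cycle, oldest ready instruction first."""
    deps = build_deps(instrs)
    cycle_of = {}
    bundles = []
    while len(cycle_of) < len(instrs):
        cycle = len(bundles)
        bundle = []
        for j in range(len(instrs)):
            if j in cycle_of or len(bundle) == width:
                continue
            # Ready only if every predecessor was issued in an earlier cycle.
            if all(cycle_of.get(i, cycle) < cycle for i in deps[j]):
                bundle.append(j)
        for j in bundle:
            cycle_of[j] = cycle
        bundles.append(bundle)
    return bundles
```

With two loads that may throw, the ordering edge forces the second load into a later bundle, so the same four instructions take three cycles instead of two on a 2-wide machine.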
A Proposal of Effective Cooperative Caching System Based on Random Access Assumption
Mitsuru ISHII Shimmi HATTORI

LETTER-Network

Vol: E87-B No:6, Page(s): 1741-1745
In this letter, we propose an effective cooperative caching system under the assumption that each web object is accessed randomly. Under this assumption, the number of accesses per unit time follows a Poisson distribution, from which the probability distribution of future accesses to each web object is derived. Based on this probability distribution, one can obtain a criterion for allocating the web objects expected to receive more accesses to the cache servers closer to clients. It is also shown that there is a tradeoff between the precision of object allocation and the efficiency of caching.
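A small sketch of the Poisson-based allocation idea, assuming the access rate of each object is estimated as observed requests divided by an observation window, and that cache tiers are listed nearest-first with a fixed number of slots each. The function names, the window/horizon parameters and the greedy tier filling are hypothetical; the letter's exact criterion may differ.

```python
import math


def access_probability(count, window, horizon):
    """P(at least one request in the next `horizon` seconds) under a Poisson model
    whose rate is estimated as count / window."""
    rate = count / window
    return 1.0 - math.exp(-rate * horizon)


def allocate(counts, capacities, window, horizon):
    """counts: {object: observed requests}; capacities: slots per tier, nearest tier first.
    Returns {object: tier index}, or None for objects left at the origin server."""
    ranked = sorted(counts,
                    key=lambda o: access_probability(counts[o], window, horizon),
                    reverse=True)
    placement, start = {}, 0
    for tier, cap in enumerate(capacities):
        for obj in ranked[start:start + cap]:
            placement[obj] = tier
        start += cap
    for obj in ranked[start:]:
        placement[obj] = None
    return placement
```

Because 1 - e^(-λt) increases with λ, this particular ranking matches ranking by request count; the probability itself becomes useful when deciding how confident an allocation is, which is where the precision/efficiency tradeoff appears.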
ILP-Based Program Path Analysis for Bounding Worst-Case Inter-Task Cache Conflicts
Hiroyuki TOMIYAMA Nikil DUTT

LETTER-System Programs

Vol: E87-D No:6, Page(s): 1582-1587
The unpredictable behavior of cache memory makes it difficult to statically analyze the worst-case performance of real-time systems. This problem is further exacerbated in preemptive multitask systems by inter-task cache interference, called Cache-Related Preemption Delay (CRPD). This paper proposes an approach to analyzing a tight upper bound on the CRPD that a task might impose on lower-priority tasks. Our method uses an integer linear programming technique to find the program execution path that requires the maximum number of cache blocks. Experimental results show that our approach provides up to 69% tighter bounds on CRPD than a conservative approach.
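The quantity being bounded can be made concrete with exhaustive path enumeration on a tiny acyclic control-flow graph. This swaps in brute-force search for the paper's integer linear program (which is what makes the analysis scale), and the CFG and per-block cache-block sets are hypothetical inputs.

```python
def max_cache_blocks(cfg, blocks, entry, exit_node):
    """cfg: {node: [successors]} (acyclic); blocks: {node: set of cache block ids}.
    Returns (count, path) for the entry-to-exit path touching the most distinct cache blocks."""
    best = (-1, None)

    def dfs(node, path, touched):
        nonlocal best
        path.append(node)
        touched = touched | blocks.get(node, set())  # union: a block counts once per path
        if node == exit_node:
            if len(touched) > best[0]:
                best = (len(touched), list(path))
        else:
            for succ in cfg.get(node, []):
                dfs(succ, path, touched)
        path.pop()

    dfs(entry, [], frozenset())
    return best
```

Taking the union along a path, rather than summing every block on every branch, is what makes such a bound tighter than a conservative estimate.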
Genetic State Reduction Method of Incompletely Specified Machines
Masaki HASHIZUME Teruyoshi MATSUSHIMA Takashi SHIMAMOTO Hiroyuki YOTSUYANAGI Takeomi TAMESADA Akio SAKAMOTO

PAPER-Graphs and Networks

Vol: E87-A No:6, Page(s): 1555-1563
A new state reduction method for incompletely specified sequential machines is proposed in this paper. The method is based on a genetic algorithm implementing a dormant mechanism. To evaluate the method, MCNC benchmark machines are simplified using it. The experimental results show that this method can derive machines with almost the same number of states as the minimum ones.
Total Margin Algorithms in Support Vector Machines
Min YOON Yeboon YUN Hirotaka NAKAYAMA

PAPER-Pattern Recognition

Vol: E87-D No:5, Page(s): 1223-1230
Support vector algorithms try to maximize the shortest distance between the sample points and the discriminating hyperplane. This paper proposes total margin algorithms, which consider the distances between all data points and the separating hyperplane. The method extends and modifies existing algorithms. Experimental studies show that the total margin algorithms perform well compared with existing support vector algorithms.
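The difference between the two objectives can be seen by computing, for a fixed hyperplane w·x + b = 0, the signed distance of every labeled point. This sketch only evaluates the two quantities; it does not solve the optimization problems (with slack variables) that the total margin algorithms actually pose, and the function names are hypothetical.

```python
import math


def signed_distances(w, b, X, y):
    """y_i * (w . x_i + b) / ||w|| for each sample; positive means correctly classified."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x, yi in zip(X, y)]


def shortest_margin(w, b, X, y):
    """What a standard SVM maximizes: the distance of the closest point."""
    return min(signed_distances(w, b, X, y))


def total_margin(w, b, X, y):
    """Takes every point's distance into account, not only the closest one."""
    return sum(signed_distances(w, b, X, y))
```

Two hyperplanes with the same shortest margin can differ greatly in total margin, which is the extra information these algorithms exploit.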
A New Learning Algorithm for the Hierarchical Structure Learning Automata Operating in the General Multiteacher Environment
Norio BABA Yoshio MOGAMI

PAPER-Automata and Formal Language Theory

Vol: E87-D No:5, Page(s): 1208-1213
Learning behaviors of hierarchically structured stochastic automata operating in a general nonstationary multiteacher environment are considered. It is shown that convergence with probability 1 to the optimal path is ensured by a new learning algorithm which is an extended form of the relative reward strength algorithm. Several computer simulation results confirm the effectiveness of the proposed algorithm.
Non-closure Property of One-Pebble Turing Machines with Sublogarithmic Space
Atsuyuki INOUE Akira ITO Katsushi INOUE

LETTER

Vol: E87-A No:5, Page(s): 1185-1188
This paper investigates closure properties of one-pebble Turing machines with sublogarithmic space. It shows that for any function L(n) such that log log n ≤ L(n) = o(log n), neither of the classes of languages accepted by L(n) space-bounded deterministic and self-verifying nondeterministic one-pebble Turing machines is closed under concatenation, Kleene closure, and length-preserving homomorphism.
Sounds of Speech Based Spoken Document Categorization: A Subword Representation Method
Weidong QU Katsuhiko SHIRAI

PAPER

Vol: E87-D No:5, Page(s): 1175-1184
In this paper, we explore a method for spoken document categorization, the task of automatically assigning spoken documents to a set of predetermined categories. To categorize spoken documents, subword unit representations are used as an alternative to word units generated by either keyword spotting or large vocabulary continuous speech recognition (LVCSR). An advantage of using subword acoustic unit representations for spoken document categorization is that they do not require prior knowledge about the contents of the spoken documents and they address the out-of-vocabulary (OOV) problem. Moreover, this method relies on the sounds of speech rather than on exact orthography. Using subword units instead of words allows approximate matching on inaccurate transcriptions and makes "sounds-like" spoken document categorization possible. We also explore the performance of our method when the training set contains both perfect and errorful phonetic transcriptions, in the hope that the classifiers can learn the confusion characteristics of the recognizer and the pronunciation variants of words, improving the robustness of the whole system. Our experiments on both artificial and real corrupted data sets show that the proposed method is more effective and robust than the word-based method.
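A minimal sketch of why subword units tolerate recognition errors, assuming documents are phone sequences, features are phone trigram counts, and categories are represented by trigram profiles compared with cosine similarity. The phone strings, the trigram choice and the nearest-profile classifier are hypothetical stand-ins, not the paper's exact setup.

```python
import math
from collections import Counter


def phone_ngrams(phones, n=3):
    """Count overlapping phone n-grams; no word vocabulary is needed."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def categorize(phones, category_profiles, n=3):
    """Assign the category whose n-gram profile is most similar to the document."""
    doc = phone_ngrams(phones, n)
    return max(category_profiles, key=lambda c: cosine(doc, category_profiles[c]))
```

A single misrecognized phone only destroys the few n-grams that contain it, so the remaining n-grams still match the right category; a word-based representation would instead lose the whole word.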
One-Pass Semi-Dynamic Network Decoding Using a Subnetwork Caching Model for Large Vocabulary Continuous Speech Recognition
Dong-Hoon AHN Minhwa CHUNG

PAPER

Vol: E87-D No:5, Page(s): 1164-1174
This paper presents a new decoding framework for large vocabulary continuous speech recognition that can handle a static search network dynamically. Generally, a static network decoder can use a search space that is globally optimized in advance, and therefore it can run at high speed during decoding. However, its large memory requirement due to the large network size or the spatial complexity of the optimization algorithm often makes it impractical. Our new one-pass semi-dynamic network decoding scheme aims at incorporating such an optimized search network with memory efficiency, but without losing speed. In this framework, a complete search network is organized on the basis of self-structuring subnetworks and is nearly minimized using a modified tail-sharing algorithm. While the decoder runs, it caches subnetworks needed for decoding in memory, whereas static network decoders keep the complete network in memory. The subnetwork caching model is controlled by two levels of caches: local cache obtained by subnetwork caching operations and global cache obtained by subnetwork preloading operations. The model can also be controlled adaptively by using subnetwork profiling operations. Furthermore, it is made simple and fast with compactly designed self-structuring subnetworks. Experimental results on a 25 k-word Korean broadcast news transcription task show that the semi-dynamic decoder can run almost as fast as an equivalent static network decoder under various memory configurations by using the subnetwork caching model.
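The two-level caching model can be sketched as a preloaded global cache that is never evicted plus a bounded least-recently-used local cache filled on demand. The class name, the `load_fn` callback and the LRU eviction policy are assumptions for illustration; the adaptive control by subnetwork profiling is not modeled.

```python
from collections import OrderedDict


class SubnetworkCache:
    """Global part: preloaded subnetworks, never evicted.
    Local part: subnetworks loaded on demand, bounded, evicted in LRU order."""

    def __init__(self, load_fn, local_capacity, preload=()):
        self.load_fn = load_fn  # hypothetical: builds/reads one subnetwork by key
        self.global_cache = {key: load_fn(key) for key in preload}
        self.local = OrderedDict()
        self.capacity = local_capacity
        self.misses = 0

    def get(self, key):
        if key in self.global_cache:
            return self.global_cache[key]
        if key in self.local:
            self.local.move_to_end(key)  # mark as most recently used
            return self.local[key]
        self.misses += 1
        net = self.load_fn(key)
        self.local[key] = net
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used subnetwork
        return net
```

Only the subnetworks the decoder is currently expanding need to be resident, which is how memory use stays bounded while frequently used parts (the preloaded global cache) never have to be rebuilt.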
A Novel Static Prediction Scheme for Filter Cache Structures
Kugan VIVEKANANDARAJAH Thambipillai SRIKANTHAN Christopher T. CLARKE Saurav BHATTACHARYYA

PAPER

Vol: E87-C No:4, Page(s): 543-548
Energy dissipation in cache memories is becoming a major design issue for embedded microprocessors. A predictive filter-cache-based instruction cache hierarchy has been shown to effectively reduce the energy-delay product. In this paper, a simplified pattern prediction algorithm is proposed for the filter cache hierarchy. The prediction scheme relies on the static nature of the hit or miss pattern of instruction access streams. The static patterns are maintained in a small 32×1-bit Static Pattern Table (SPT). Our investigations show that the proposed prediction algorithm is superior to one based on the Next Fetch Prediction Table (NFPT) for all the benchmarks simulated. With the proposed approach, an energy-delay product reduction of up to 6.79% was observed compared with NFPT. Moreover, since the prediction scheme is based on the static assignment of patterns, it lends itself better to area- and power-efficient implementation than a scheme that employs dynamic pattern prediction, although it is marginally inferior (by 0.69%) in terms of energy-delay product.
Selective-Sets Resizable Cache Memory Design for High-Performance and Low-Power CPU Core
Takashi KURAFUJI Yasunobu NAKASE Hidehiro TAKATA Yukinaga IMAMURA Rei AKIYAMA Tadao YAMANAKA Atsushi IWABU Shutarou YASUDA Toshitsugu MIWA Yasuhiro NUNOMURA Niichi ITOH Tetsuya KAGEMOTO Nobuharu YOSHIOKA Takeshi SHIBAGAKI Hiroyuki KONDO Masayuki KOYAMA Takahiko ARAKAWA Shuhei IWADE

PAPER

Vol: E87-C No:4, Page(s): 535-542
We apply a selective-sets resizable cache and a complete hierarchy SRAM to a high-performance, low-power RISC CPU core. The selective-sets resizable cache can change the cache memory size by varying the number of cache sets. It reduces the leakage current by 23% with only a slight degradation of the worst-case operating speed, from 213 MHz to 210 MHz. The complete hierarchy SRAM enables partial-swing operation not only in the bit lines but also in the global signal lines. It reduces the current consumption of the memory by 4.6% and attains a high-speed access time of 1.4 ns in the typical case.
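Varying the number of sets changes how an address is split into index and tag. The sketch below assumes a power-of-two set count selected by masking, which is a common way to build selective-sets caches; the 32-byte line, 4-way associativity and 256/64-set sizes are hypothetical numbers, not the paper's configuration, and the write-back of lines in disabled sets is not modeled.

```python
def set_index(address, line_bytes, active_sets):
    """Set index when only `active_sets` sets (a power of two) are powered on."""
    if active_sets <= 0 or active_sets & (active_sets - 1):
        raise ValueError("active_sets must be a positive power of two")
    return (address // line_bytes) & (active_sets - 1)


def tag(address, line_bytes, active_sets):
    """High-order bits left after the index; fewer sets means a wider tag must be stored."""
    return address // line_bytes // active_sets


def capacity_bytes(active_sets, ways, line_bytes):
    return active_sets * ways * line_bytes
```

Shrinking from 256 to 64 sets quarters the powered-on capacity (and its leakage) while the same address simply maps to a different set with a longer tag, so the tag array must be sized for the smallest configuration.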
A 100 MHz 7.84 mm² 31.7 msec 439 mW 512-Point 2-Dimensional FFT Single-Chip Processor
Naoto MIYAMOTO Leo KARNAN Kazuyuki MARUO Koji KOTANI Tadahiro OHMI

PAPER

Vol: E87-C No:4, Page(s): 502-509
A single-chip 512-point FFT processor is presented. This processor is based on the cached-memory architecture (CMA) with a resource-saving multi-datapath radix-2³ computation element. A 2-stage CMA, including a pair of single-port SRAMs, is also introduced to speed up the execution of 2-dimensional FFTs. Using these techniques, we have designed an FFT processor core that integrates 552,000 transistors within an area of 2.8 × 2.8 mm² in a 0.35 µm triple-layer-metal CMOS process. This processor can execute a 512-point, 36-bit-complex fixed-point data format, 1-dimensional FFT in 23.2 µsec and a 2-dimensional one in only 23.8 msec at 133 MHz operation. The power consumption of this processor is 439.6 mW at 3.3 V and 100 MHz operation.
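The figures in the title can be cross-checked against the abstract. This worked arithmetic assumes that the 2-dimensional FFT is a 512 × 512 transform computed by row-column decomposition (512 row FFTs followed by 512 column FFTs) and that execution time scales inversely with clock frequency; both are assumptions, not statements from the paper.

```latex
\begin{aligned}
A &= 2.8\,\text{mm} \times 2.8\,\text{mm} = 7.84\,\text{mm}^2 \\
t_{\text{2D}}(133\,\text{MHz}) &\approx (512 + 512)\, t_{\text{1D}} = 1024 \times 23.2\,\mu\text{s} \approx 23.8\,\text{ms} \\
t_{\text{2D}}(100\,\text{MHz}) &\approx 23.8\,\text{ms} \times \tfrac{133}{100} \approx 31.7\,\text{ms}
\end{aligned}
```

Under these assumptions the 7.84 mm² and 31.7 msec values in the title correspond to the 2.8 × 2.8 mm² core and to the 133 MHz timing rescaled to 100 MHz operation.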
Design and Evaluation of a High Speed Routing Lookup Architecture
Jun ZHANG JeoungChill SHIM Hiroyuki KURINO Mitsumasa KOYANAGI

PAPER-Implementation and Operation

Vol: E87-B No:3, Page(s): 406-412
The IP routing lookup problem is equivalent to finding the longest prefix in a routing table that matches a packet's destination address. Designing a high-performance IP routing lookup architecture is challenging because of increasing traffic, higher link speeds, frequent updates, and growing routing tables. First, increasing traffic and higher link speeds require that IP routing lookups be executed at wire speed. Second, frequent routing table updates require that insertion and deletion operations be simple and have low delay. Finally, growing routing tables call for less memory in order to reduce cost. Although many schemes achieve fast lookup, less attention has been paid to the latter two factors. This paper proposes a novel pipelined IP routing lookup architecture using selective binary search on hash tables organized by prefix length. The evaluation results show that it can perform IP lookups at a maximum rate of one lookup per cycle. The hash operation ratio per lookup can be reduced to about 1%, fewer than two hash operations are needed per table update, and only 512 kbytes of SRAM is needed for a routing table with about 43,000 prefixes. The architecture is shown to outperform existing schemes.
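The underlying idea of binary search on hash tables organized by prefix length can be sketched in software: one hash table (here a Python dict) per distinct prefix length, plus "marker" entries that tell the search to continue toward longer lengths, each carrying its best-matching real prefix. This is the plain, non-selective, non-pipelined textbook form of the technique, with prefixes and addresses written as hypothetical bit strings; it is not the paper's hardware design.

```python
def build(prefixes):
    """prefixes: {bit string: next hop}. Returns (sorted lengths, {length: hash table})."""
    lengths = sorted({len(p) for p in prefixes})
    tables = {length: {} for length in lengths}

    def best_real_match(bits):
        # Next hop of the longest real prefix of `bits`, or None.
        for n in range(len(bits), -1, -1):
            if bits[:n] in prefixes:
                return prefixes[bits[:n]]
        return None

    for p, hop in prefixes.items():
        tables[len(p)][p] = hop
    # Markers on the binary-search path toward each prefix's length.
    for p in prefixes:
        target = lengths.index(len(p))
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if mid == target:
                break
            if target > mid:
                marker = p[:lengths[mid]]
                tables[lengths[mid]].setdefault(marker, best_real_match(marker))
                lo = mid + 1
            else:
                hi = mid - 1
    return lengths, tables


def lookup(address, lengths, tables):
    """Longest-prefix match with one hash probe per binary-search step."""
    best = None
    lo, hi = 0, len(lengths) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        table = tables[lengths[mid]]
        key = address[:lengths[mid]]
        if key in table:
            if table[key] is not None:
                best = table[key]
            lo = mid + 1  # matched a prefix or marker: try longer lengths
        else:
            hi = mid - 1  # no match: try shorter lengths
    return best
```

Each lookup needs only about log2(number of distinct lengths) hash probes instead of one per length, which is what makes a fixed-depth pipeline possible.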
Anisotropic Bending Machine Using Conducting Polypyrrole
Mitsuyoshi ONODA Kazuya TADA

PAPER-Nano-interface Controlled Electronic Devices

Vol: E87-C No:2, Page(s): 128-135
Recent technologies for electro-mechanical conversion devices have been reviewed. In particular, the electrochemical properties of anisotropic actuators using polypyrrole have been reviewed in detail, and the realization of a bimorph (or bending beam) structure without an artificial adhesive agent is introduced.
Two Step POS Selection for SVM Based Text Categorization
Takeshi MASUYAMA Hiroshi NAKAGAWA

PAPER

Vol: E87-D No:2, Page(s): 373-379
Although many researchers have verified the superiority of the Support Vector Machine (SVM) on text categorization tasks, some recent papers have reported much lower performance of SVM-based text categorization methods when all parts of speech (POS) are used as input words and large numbers of training documents are used. This was caused by overfitting: SVM sometimes selected unsuitable support vectors for each category in the training set. To avoid the overfitting problem, we propose a two-step text categorization method with variable cascaded feature selection (VCFS) using SVM. At each step of the cascade, the VCFS method selects, for each category, the best pair of word count and POS combination. We exploited the differences among the words with the highest mutual information for each category under each POS combination. Through the experiments, we confirmed the validity of the VCFS method compared with other SVM-based text categorization methods: its macro-averaged F1 measure (64.8%) was significantly better than any reported F1 measure, while its micro-averaged F1 measure (85.4%) was similar to them.
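The mutual information used to rank words per category can be computed from a 2×2 table of document counts. This sketch shows only that scoring and a top-k cut; the cascade over POS combinations and the SVM training in VCFS are not modeled, and the helper names and counts are hypothetical.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """MI (bits) between 'word occurs' and 'document is in the category'.
    n11: in category, has word; n10: outside, has word;
    n01: in category, no word;  n00: outside, no word."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    for n_wc, n_w, n_c in (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ):
        if n_wc:  # 0 * log 0 is taken as 0
            mi += n_wc / n * math.log2(n * n_wc / (n_w * n_c))
    return mi


def top_words(stats, k):
    """stats: {word: (n11, n10, n01, n00)} for one category; keep the k most informative."""
    return sorted(stats, key=lambda w: mutual_information(*stats[w]), reverse=True)[:k]
```

A word spread evenly across categories scores 0 bits, while a word that perfectly separates a balanced category scores 1 bit.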
Software Implementation of a Secure Socket Layer (SSL) Accelerator Based on Kernel Thread
Euiseok NAHM Byungjo MIN Jinbae PARK Hagbae KIM

LETTER-Software Engineering

Vol: E87-D No:1, Page(s): 244-245
We implement an efficient Secure Socket Layer (SSL) accelerator that is embedded at the kernel level and uses as many kernel threads as there are CPUs. In a comparison of the conventional Apache web server with and without our SSL accelerator, the accelerator improved web-server performance by up to 200%.
A Cache Replacement Policy for Transcoding Proxy Servers
Kai-Hau YEUNG Chun-Cheong WONG Kin-Yeung WONG Suk-Yu HUI

LETTER-Multimedia Systems

Vol: E87-B No:1, Page(s): 209-211
A cache replacement policy for emerging transcoding proxy servers is proposed that takes transcoding time into account when making replacement decisions. Simulation results show that the proposed policy outperforms conventional LRU in both cache hit rate and average object transcoding time.
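One way to make replacement decisions transcoding-aware is to plug transcoding time in as the miss cost of the GreedyDual-Size algorithm. This is a swapped-in, well-known technique used for illustration, not necessarily the policy proposed in the letter; the class name, byte-based capacity and cost/size priority are assumptions.

```python
class TranscodingAwareCache:
    """GreedyDual-Size with cost = transcoding time: objects that are expensive to
    re-transcode (per byte) survive longer than LRU would keep them."""

    def __init__(self, capacity):
        self.capacity = capacity  # bytes
        self.used = 0
        self.inflation = 0.0      # GreedyDual "L" value, raised on each eviction
        self.entries = {}         # key -> (priority H, size, transcode_time)

    def access(self, key, size, transcode_time):
        """Return True on a hit; on a miss, insert the object, evicting as needed."""
        if key in self.entries:
            _, stored_size, t = self.entries[key]
            self.entries[key] = (self.inflation + t / stored_size, stored_size, t)
            return True
        if size > self.capacity:
            return False  # never cacheable
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            h, victim_size, _ = self.entries.pop(victim)
            self.inflation = h  # ages every remaining object relative to new ones
            self.used -= victim_size
        self.entries[key] = (self.inflation + transcode_time / size, size, transcode_time)
        self.used += size
        return False
```

With equal sizes, an object that took 5 s to transcode outlives a more recently used one that took 0.5 s, which plain LRU would not allow.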
Sequential Fusion of Output Coding Methods and Its Application to Face Recognition
Jaepil KO Hyeran BYUN

PAPER-Face

Vol: E87-D No:1, Page(s): 121-128
In face recognition, simple classifiers are frequently used. For a robust system, it is common to construct a multi-class classifier by combining the outputs of several binary classifiers; this is called the output coding method. The two basic output coding methods for this purpose are known as OnePerClass (OPC) and PairWise Coupling (PWC). The performance of output coding methods depends on the accuracy of the base dichotomizers, and the Support Vector Machine (SVM) is suitable for this purpose. In this paper, we review output coding methods and introduce a new sequential fusion method that uses SVM as the base classifier and combines OPC and PWC according to their properties. In the experiments, we compare our proposed method with others. The experimental results show that our proposed method significantly improves performance on the real dataset.
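The two basic decoding schemes can be sketched independently of the base classifier: OPC takes the class whose one-vs-rest output is highest, and PWC lets every pairwise classifier vote. The function names, the score dictionary and the `pairwise` callback are hypothetical; the paper's sequential fusion of the two is not reproduced here.

```python
from itertools import combinations


def opc_decode(scores):
    """OnePerClass: scores = {class: real-valued one-vs-rest output}; pick the largest."""
    return max(scores, key=scores.get)


def pwc_decode(classes, pairwise):
    """PairWise Coupling by majority vote: pairwise(a, b) > 0 means the a-vs-b
    classifier prefers a."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[a if pairwise(a, b) > 0 else b] += 1
    return max(votes, key=votes.get)
```

OPC needs one classifier per class, while PWC needs one per pair but each classifier solves an easier two-class problem; these different properties are what a fusion of the two can exploit.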
