Keyword Search Result

[Keyword] ACH(1072hit)

781-800hit(1072hit)

  • Dynamic Code Repositioning for Java

    Shinji TANAKA  Tetsuyasu YAMADA  Satoshi SHIRAISHI  

     
    PAPER-Software Support and Optimization Techniques

      Vol:
    E87-D No:7
      Page(s):
    1737-1742

    The sizes of recent Java-based server-side applications, like J2EE containers, have been increasing continuously. Past techniques for improving the performance of Java applications have targeted relatively small applications. Moreover, when the methods of these small target applications are invoked, they are not usually distributed over the entire memory space. As a result, these techniques cannot be applied efficiently to improve the performance of current large applications. We propose a dynamic code repositioning approach to improve the hit rates of instruction caches and translation look-aside buffers. Profiles of method invocations are collected when the application performs with its heaviest processor load, and the code is repositioned based on these profiles. We also discuss a method-splitting technique to significantly reduce the sizes of methods. Our evaluation of a prototype implementing these techniques indicated 5% improvement in the throughput of the application.

  • A Local Learning Framework Based on Multiple Local Classifiers

    BaekSop KIM  HyeJeong SONG  JongDae KIM  

     
    LETTER-Pattern Recognition

      Vol:
    E87-D No:7
      Page(s):
    1971-1973

    This paper presents a local learning framework in which the local classifiers can be pre-learned and the support size of each classifier can be selected to minimize the error bound. The proposed algorithm is compared with the conventional support vector machine (SVM). Experimental results show that our scheme using the user-defined parameters C and σ is more accurate and less sensitive than the conventional SVM.

  • VLaTTe: A Java Just-in-Time Compiler for VLIW with Fast Scheduling and Register Allocation

    Suhyun KIM  Soo-Mook MOON  Kemal EBCIOLU  Erik ALTMAN  

     
    PAPER-Software Support and Optimization Techniques

      Vol:
    E87-D No:7
      Page(s):
    1712-1720

    For network computing on desktop machines, fast execution of Java bytecode programs is essential because these machines are expected to run substantial application programs written in Java. We believe higher Java performance can be achieved by exploiting instruction-level parallelism (ILP) in the context of Java JIT compilation. This paper introduces VLaTTe, a Java JIT compiler for VLIW machines that performs efficient scheduling while doing fast register allocation. It is an extended version of our previous JIT compiler for RISC machines called LaTTe whose translation overhead is low (i.e., consistently taking one or two seconds for SPECJVM98 benchmarks) due to its fast register allocation. VLaTTe adds the scheduling capability onto the same framework of register allocation, with a constraint for precise in-order exception handling which guarantees the same Java exception behavior with the original bytecode program. Our experimental results on the SPECJVM98 benchmarks show that VLaTTe achieves a geometric mean of useful IPC 1.7 (2-ALU), 2.1 (4-ALU), and 2.3 (8-ALU), while the scheduling/allocation overhead is 3.6 times longer than LaTTe's on average, which appears to be reasonable.

  • A Proposal of Effective Cooperative Caching System Based on Random Access Assumption

    Mitsuru ISHII  Shimmi HATTORI  

     
    LETTER-Network

      Vol:
    E87-B No:6
      Page(s):
    1741-1745

    In this letter, we propose an effective cooperative caching system under the assumption that each web object is accessed randomly. Under this assumption, the access frequency per unit time is given by Poisson distribution and the probability distribution of the web object in the future is derived. Based on this probability distribution, one can obtain the criterion to allocate the web objects with more access expected to the cache servers closer to clients. It is also shown that there is a tradeoff between the precision to allocate objects and the efficiency of caching.
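As a rough illustration of the random-access assumption, the sketch below evaluates the Poisson probability of seeing k accesses per unit time and ranks objects so that those with the higher expected access rate would go to caches nearer the clients. The function names and the ranking rule are assumptions made for illustration; the letter's actual allocation criterion is not given in the abstract.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k accesses in one unit of time at the given mean rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def prob_at_least_one(rate):
    """Probability that an object is accessed at least once in one unit of time."""
    return 1.0 - math.exp(-rate)


def rank_for_placement(rates):
    """Order object ids from highest to lowest expected access rate.

    Hypothetical rule: objects earlier in the list would be placed on
    cache servers closer to the clients.
    """
    return sorted(rates, key=rates.get, reverse=True)


# Example access rates per unit time (made-up values).
rates = {"a": 0.5, "b": 3.0, "c": 1.2}
placement_order = rank_for_placement(rates)
```

In practice the rates would be estimated from observed access counts, which is where the tradeoff between allocation precision and caching efficiency mentioned in the abstract would come into play.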

  • ILP-Based Program Path Analysis for Bounding Worst-Case Inter-Task Cache Conflicts

    Hiroyuki TOMIYAMA  Nikil DUTT  

     
    LETTER-System Programs

      Vol:
    E87-D No:6
      Page(s):
    1582-1587

    The unpredictable behavior of cache memory makes it difficult to statically analyze the worst-case performance of real-time systems. This problem is further exacerbated in the case of preemptive multitask systems because of inter-task cache interference, called Cache-Related Preemption Delay (CRPD). This paper proposes an approach to analyzing the tight upper bound on CRPD which a task might impose on lower-priority tasks. Our method finds the program execution path which requires the maximum number of cache blocks using an integer linear programming technique. Experimental results show that our approach provides up to 69% tighter bounds on CRPD than a conservative approach.

  • Genetic State Reduction Method of Incompletely Specified Machines

    Masaki HASHIZUME  Teruyoshi MATSUSHIMA  Takashi SHIMAMOTO  Hiroyuki YOTSUYANAGI  Takeomi TAMESADA  Akio SAKAMOTO  

     
    PAPER-Graphs and Networks

      Vol:
    E87-A No:6
      Page(s):
    1555-1563

    A new state reduction method of incompletely specified sequential machines is proposed in this paper. The method is based on a genetic algorithm implementing a dormant mechanism. MCNC benchmark machines are simplified by using this method to evaluate the method. The experimental results show that machines of almost the same number of states as the minimum ones can be derived by this method.

  • Total Margin Algorithms in Support Vector Machines

    Min YOON  Yeboon YUN  Hirotaka NAKAYAMA  

     
    PAPER-Pattern Recognition

      Vol:
    E87-D No:5
      Page(s):
    1223-1230

    Support vector algorithms try to maximize the shortest distance between the sample points and the discriminating hyperplane. This paper proposes total margin algorithms, which consider the distances between all data points and the separating hyperplane. The method extends and modifies the existing algorithms. Experimental studies show that the total margin algorithms perform well compared with the existing support vector algorithms.

  • A New Learning Algorithm for the Hierarchical Structure Learning Automata Operating in the General Multiteacher Environment

    Norio BABA  Yoshio MOGAMI  

     
    PAPER-Automata and Formal Language Theory

      Vol:
    E87-D No:5
      Page(s):
    1208-1213

    Learning behaviors of hierarchically structured stochastic automata operating in a general nonstationary multiteacher environment are considered. It is shown that convergence with probability 1 to the optimal path is ensured by a new learning algorithm which is an extended form of the relative reward strength algorithm. Several computer simulation results confirm the effectiveness of the proposed algorithm.

  • Non-closure Property of One-Pebble Turing Machines with Sublogarithmic Space

    Atsuyuki INOUE  Akira ITO  Katsushi INOUE  

     
    LETTER

      Vol:
    E87-A No:5
      Page(s):
    1185-1188

    This paper investigates closure properties of one-pebble Turing machines with sublogarithmic space. It shows that for any function log log n ≤ L(n) = o(log n), neither of the classes of languages accepted by L(n) space-bounded deterministic and self-verifying nondeterministic one-pebble Turing machines is closed under concatenation, Kleene closure, and length-preserving homomorphism.

  • Sounds of Speech Based Spoken Document Categorization: A Subword Representation Method

    Weidong QU  Katsuhiko SHIRAI  

     
    PAPER

      Vol:
    E87-D No:5
      Page(s):
    1175-1184

    In this paper, we explore a method for spoken document categorization, which is the task of automatically assigning spoken documents to a set of predetermined categories. To categorize spoken documents, subword unit representations are used as an alternative to word units generated by either keyword spotting or large vocabulary continuous speech recognition (LVCSR). An advantage of using subword acoustic unit representations for spoken document categorization is that it does not require prior knowledge about the contents of the spoken documents and addresses the out-of-vocabulary (OOV) problem. Moreover, this method relies on the sounds of speech rather than exact orthography. The use of subword units instead of words allows approximate matching on inaccurate transcriptions, making "sounds-like" spoken document categorization possible. We also explore the performance of our method when the training set contains both perfect and errorful phonetic transcriptions, in the hope that the classifiers can learn from the confusion characteristics of the recognizer and the pronunciation variants of words to improve the robustness of the whole system. Our experiments based on both artificial and real corrupted data sets show that the proposed method is more effective and robust than the word-based method.

  • One-Pass Semi-Dynamic Network Decoding Using a Subnetwork Caching Model for Large Vocabulary Continuous Speech Recognition

    Dong-Hoon AHN  Minhwa CHUNG  

     
    PAPER

      Vol:
    E87-D No:5
      Page(s):
    1164-1174

    This paper presents a new decoding framework for large vocabulary continuous speech recognition that can handle a static search network dynamically. Generally, a static network decoder can use a search space that is globally optimized in advance, and therefore it can run at high speed during decoding. However, its large memory requirement due to the large network size or the spatial complexity of the optimization algorithm often makes it impractical. Our new one-pass semi-dynamic network decoding scheme aims at incorporating such an optimized search network with memory efficiency, but without losing speed. In this framework, a complete search network is organized on the basis of self-structuring subnetworks and is nearly minimized using a modified tail-sharing algorithm. While the decoder runs, it caches subnetworks needed for decoding in memory, whereas static network decoders keep the complete network in memory. The subnetwork caching model is controlled by two levels of caches: local cache obtained by subnetwork caching operations and global cache obtained by subnetwork preloading operations. The model can also be controlled adaptively by using subnetwork profiling operations. Furthermore, it is made simple and fast with compactly designed self-structuring subnetworks. Experimental results on a 25 k-word Korean broadcast news transcription task show that the semi-dynamic decoder can run almost as fast as an equivalent static network decoder under various memory configurations by using the subnetwork caching model.

  • A Novel Static Prediction Scheme for Filter Cache Structures

    Kugan VIVEKANANDARAJAH  Thambipillai SRIKANTHAN  Christopher T. CLARKE  Saurav BHATTACHARYYA  

     
    PAPER

      Vol:
    E87-C No:4
      Page(s):
    543-548

    Energy dissipation in cache memories is becoming a major design issue for embedded microprocessors. A predictive filter cache based instruction cache hierarchy has been shown to effectively reduce the energy-delay product. In this paper, a simplified pattern prediction algorithm is proposed for the filter cache hierarchy. The prediction scheme relies on the static nature of the hit or miss pattern of the instruction access streams. The static patterns are maintained in a small 32x1-bit wide Static Pattern Table (SPT). Our investigations show that the proposed prediction algorithm is superior to that based on the Next Fetch Prediction Table (NFPT) for all the benchmarks simulated. With the proposed approach, an energy-delay product reduction of up to 6.79% was evident when compared with that using NFPT. Moreover, since the prediction scheme is based on the static assignment of patterns, it lends itself to a more area- and power-efficient implementation than one that employs dynamic pattern prediction, although it is marginally inferior (i.e., by 0.69%) in terms of energy-delay product.

  • Selective-Sets Resizable Cache Memory Design for High-Performance and Low-Power CPU Core

    Takashi KURAFUJI  Yasunobu NAKASE  Hidehiro TAKATA  Yukinaga IMAMURA  Rei AKIYAMA  Tadao YAMANAKA  Atsushi IWABU  Shutarou YASUDA  Toshitsugu MIWA  Yasuhiro NUNOMURA  Niichi ITOH  Tetsuya KAGEMOTO  Nobuharu YOSHIOKA  Takeshi SHIBAGAKI  Hiroyuki KONDO  Masayuki KOYAMA  Takahiko ARAKAWA  Shuhei IWADE  

     
    PAPER

      Vol:
    E87-C No:4
      Page(s):
    535-542

    We apply a selective-sets resizable cache and a complete hierarchy SRAM for the high-performance and low-power RISC CPU core. The selective-sets resizable cache can change the cache memory size by varying the number of cache sets. It reduces the leakage current by 23% with slight degradation of the worst case operating speed from 213 MHz to 210 MHz. The complete hierarchy SRAM enables the partial swing operation not only in the bit lines, but also in the global signal lines. It reduces the current consumption of the memory by 4.6%, and attains the high-speed access of 1.4 ns in the typical case.

  • A 100 MHz 7.84 mm² 31.7 msec 439 mW 512-Point 2-Dimensional FFT Single-Chip Processor

    Naoto MIYAMOTO  Leo KARNAN  Kazuyuki MARUO  Koji KOTANI  Tadahiro OHMI  

     
    PAPER

      Vol:
    E87-C No:4
      Page(s):
    502-509

    A single-chip 512-point FFT processor is presented. This processor is based on the cached-memory architecture (CMA) with the resource-saving multi-datapath radix-2³ computation element. The 2-stage CMA, including a pair of single-port SRAMs, is also introduced to speed up the execution of the 2-dimensional FFTs. Using the above techniques, we have designed an FFT processor core which integrates 552,000 transistors within an area of 2.8 × 2.8 mm² with a CMOS 0.35 µm triple-layer-metal process. This processor can execute a 512-point, 36-bit-complex fixed-point data format, 1-dimensional FFT in 23.2 µsec and a 2-dimensional one in only 23.8 msec at 133 MHz operation. The power consumption of this processor is 439.6 mW at 3.3 V, 100 MHz operation.

  • Design and Evaluation of a High Speed Routing Lookup Architecture

    Jun ZHANG  JeoungChill SHIM  Hiroyuki KURINO  Mitsumasa KOYANAGI  

     
    PAPER-Implementation and Operation

      Vol:
    E87-B No:3
      Page(s):
    406-412

    The IP routing lookup problem is equivalent to finding the longest prefix of a packet's destination address in a routing table. Designing a high-performance IP routing lookup architecture is challenging because of increasing traffic, higher link speeds, frequent updates, and growing routing table sizes. First, increasing traffic and higher link speeds require that IP routing be executed at wire speed. Second, frequent routing table updates require that insertion and deletion operations be simple and have low delay. Finally, the growing routing table size calls for using less memory in order to reduce cost. Although many schemes to achieve fast lookup exist, less attention has been paid to the latter two factors. This paper proposes a novel pipelined IP routing lookup architecture using selective binary search on hash tables organized by prefix length. The evaluation results show that it can perform IP lookup operations at a maximum rate of one lookup per cycle. The hash operation ratio for one lookup can be reduced to about 1%, fewer than two hash operations are needed for one table update, and only 512 kbytes of SRAM are needed for a routing table with about 43000 prefixes. It proves to have higher performance than the existing schemes.
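To make the underlying data structure concrete, here is a minimal software sketch of longest-prefix matching with one hash table per prefix length. It probes the lengths linearly from longest to shortest, which is a simpler stand-in for the selective binary search described in the abstract, not the paper's pipelined architecture. The function names and the IPv4-only restriction are assumptions.

```python
import ipaddress


def build_tables(routes):
    """Group IPv4 routes into one dict per prefix length.

    routes: iterable of (cidr_string, next_hop) pairs.
    Returns {prefix_length: {network_address_as_int: next_hop}}.
    """
    tables = {}
    for cidr, next_hop in routes:
        net = ipaddress.ip_network(cidr)
        tables.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop
    return tables


def lookup(tables, address):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = int(ipaddress.ip_address(address))
    # Linear probe from the longest prefix length down (assumed simplification).
    for length in sorted(tables, reverse=True):
        mask = ((1 << length) - 1) << (32 - length) if length else 0
        hop = tables[length].get(addr & mask)
        if hop is not None:
            return hop
    return None


tables = build_tables([
    ("10.0.0.0/8", "A"),
    ("10.1.0.0/16", "B"),
    ("0.0.0.0/0", "default"),
])
```

With these tables, `lookup(tables, "10.1.2.3")` matches the /16 entry before the /8 entry is ever probed. A binary search over prefix lengths would cut the number of hash probes per lookup, at the cost of extra marker entries in the tables.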

  • Anisotropic Bending Machine Using Conducting Polypyrrole

    Mitsuyoshi ONODA  Kazuya TADA  

     
    PAPER-Nano-interface Controlled Electronic Devices

      Vol:
    E87-C No:2
      Page(s):
    128-135

    Recent technologies for electro-mechanical conversion devices are reviewed. In particular, the electrochemical properties of anisotropic actuators using polypyrrole are reviewed in detail, and the realization of the bimorph (or bending beam) structure without an artificial adhesive agent is introduced.

  • Two Step POS Selection for SVM Based Text Categorization

    Takeshi MASUYAMA  Hiroshi NAKAGAWA  

     
    PAPER

      Vol:
    E87-D No:2
      Page(s):
    373-379

    Although many researchers have verified the superiority of the Support Vector Machine (SVM) on text categorization tasks, some recent papers have reported much lower performance of SVM based text categorization methods when all types of parts of speech (POS) are used as input words and large numbers of training documents are processed. This was caused by the overfitting problem, in which the SVM sometimes selected unsuitable support vectors for each category in the training set. To avoid the overfitting problem, we propose a two step text categorization method with variable cascaded feature selection (VCFS) using SVM. The VCFS method selects a pair consisting of the best number of words and the best POS combination for each category at each step of the cascade. We made use of the difference of words with the highest mutual information for each category on each POS combination. Through the experiments, we confirmed the validity of the VCFS method compared with other SVM based text categorization methods: the macro-averaged F1 measure (64.8%) of the VCFS method was significantly better than any reported F1 measure, though its micro-averaged F1 measure (85.4%) was similar to the reported ones.
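The gap between the two F1 figures is easier to read with the standard definitions in hand: macro-averaging takes the mean of the per-category F1 scores, while micro-averaging pools the counts over all categories first. The sketch below computes both from per-category (true positive, false positive, false negative) counts. The counts are made-up values, not data from the paper.

```python
def f1(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Mean of per-category F1 scores; every category weighs the same."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    """F1 over pooled counts; large categories dominate the result."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical counts: one large, well-classified category and one small, hard one.
counts = {"large": (90, 10, 10), "small": (1, 3, 3)}
macro = macro_f1(counts)  # 0.575
micro = micro_f1(counts)  # 0.875
```

Because small categories count as much as large ones in the macro average, a method that improves rare categories can raise macro-averaged F1 while leaving micro-averaged F1 nearly unchanged.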

  • Software Implementation of a Secure Socket Layer (SSL) Accelerator Based on Kernel Thread

    Euiseok NAHM  Byungjo MIN  Jinbae PARK  Hagbae KIM  

     
    LETTER-Software Engineering

      Vol:
    E87-D No:1
      Page(s):
    244-245

    We implement an efficient Secure Socket Layer (SSL) accelerator that is embedded at the kernel level and uses as many kernel threads as there are CPUs. In a comparison of conventional Apache with and without our SSL accelerator, the accelerator improves web-server performance by up to 200%.

  • A Cache Replacement Policy for Transcoding Proxy Servers

    Kai-Hau YEUNG  Chun-Cheong WONG  Kin-Yeung WONG  Suk-Yu HUI  

     
    LETTER-Multimedia Systems

      Vol:
    E87-B No:1
      Page(s):
    209-211

    A cache replacement policy for emerging transcoding proxy servers is proposed that takes the transcoding time into account when making replacement decisions. Simulation results show that the proposed policy outperforms conventional LRU in both the cache hit rate and the average object transcoding time.

  • Sequential Fusion of Output Coding Methods and Its Application to Face Recognition

    Jaepil KO  Hyeran BYUN  

     
    PAPER-Face

      Vol:
    E87-D No:1
      Page(s):
    121-128

    In face recognition, simple classifiers are frequently used. For a robust system, it is common to construct a multi-class classifier by combining the outputs of several binary classifiers; this is called the output coding method. The two basic output coding methods for this purpose are known as OnePerClass (OPC) and PairWise Coupling (PWC). The performance of output coding methods depends on the accuracy of the base dichotomizers. The Support Vector Machine (SVM) is suitable for this purpose. In this paper, we review output coding methods and introduce a new sequential fusion method that uses SVM as the base classifier and combines OPC and PWC according to their properties. In the experiments, we compare our proposed method with others. The experimental results show that our proposed method can improve the performance significantly on a real dataset.
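As a small illustration of the two basic output coding schemes, the sketch below enumerates the binary problems each one creates for a set of classes: OPC trains one classifier per class against all the others, while PWC trains one per pair of classes. The function names are hypothetical, and the sequential fusion step itself is not shown because the abstract does not specify it.

```python
from itertools import combinations


def opc_tasks(classes):
    """One-per-class: each class is separated from all remaining classes."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def pwc_tasks(classes):
    """Pairwise coupling: one binary problem for every unordered pair of classes."""
    return list(combinations(classes, 2))


# Four hypothetical identities in a face recognition task.
people = ["A", "B", "C", "D"]
opc = opc_tasks(people)  # 4 binary problems
pwc = pwc_tasks(people)  # 6 binary problems
```

For K classes, OPC needs K base classifiers and PWC needs K(K-1)/2, so PWC grows quadratically with the number of identities while each of its problems involves only two classes.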
