The search functionality is under construction.

Keyword Search Result

[Keyword] instruction(79hit)


  • Instruction Encoding for Reducing Power Consumption of I-ROMs Based on Execution Locality

    Koji INOUE  Vasily G. MOSHNYAGA  Kazuaki MURAKAMI  


    E86-A No:4

    In this paper, we propose an instruction encoding scheme to reduce the power consumption of instruction ROMs. The power consumption of an instruction ROM depends strongly on the switching activity of its bit-lines because of their large load capacitance. In our approach, the binary patterns assigned as op-codes are determined by the frequency of instructions in order to reduce the number of bit-line discharges. Simulation results show that our approach reduces bit-line switching by 40% compared with a conventional organization.
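
The idea of frequency-driven op-code assignment can be illustrated with a minimal sketch. It assumes, purely for illustration, that each 1 bit in an op-code costs one bit-line discharge per fetch; the abstract does not describe the actual cost model or assignment procedure, and the function names and frequency figures below are hypothetical.

```python
def assign_opcodes(freq, width):
    """Give the most frequent instructions the op-codes with the fewest 1 bits.

    Assumption (not from the abstract): every 1 bit read from the ROM
    discharges one bit-line, so fewer 1 bits means less switching.
    """
    if len(freq) > 2 ** width:
        raise ValueError("not enough op-codes for the instruction set")
    # Candidate op-codes ordered by number of 1 bits, then by value.
    codes = sorted(range(2 ** width), key=lambda c: (bin(c).count("1"), c))
    # Instructions ordered from most to least frequent (name breaks ties).
    ranked = sorted(freq, key=lambda name: (-freq[name], name))
    return dict(zip(ranked, codes))


def expected_discharges(freq, encoding):
    """Average number of 1 bits read per fetched op-code."""
    total = sum(freq.values())
    return sum(freq[n] * bin(encoding[n]).count("1") for n in freq) / total


# Hypothetical execution counts for a four-instruction set.
freq = {"load": 40, "add": 30, "branch": 20, "mul": 10}
encoding = assign_opcodes(freq, width=2)
naive = {"load": 0b11, "add": 0b10, "branch": 0b01, "mul": 0b00}

print(encoding)
print(expected_discharges(freq, encoding), expected_discharges(freq, naive))
```

Under this toy cost model the frequency-aware assignment reads fewer 1 bits per fetch than the arbitrary one; the 40% figure in the abstract comes from the authors' own simulations, not from this sketch.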

  • Analysis of x86 Instruction Set Usage for DOS/Windows Applications and Its Implication on Superscalar Design

    Ing-Jer HUANG  Tzu-Chin PENG  

    PAPER-VLSI Systems

    E85-D No:6

    Understanding instruction set usage in typical DOS/Windows applications plays a very important role in designing high-performance x86-compatible microprocessors. This paper presents the tools for such analysis, the analysis results, and their implications for the design of a RISC-based superscalar processor for efficient x86 instruction execution. The analysis results are used to optimize the execution of frequently executed instructions and micro-operations.

  • Potential of Constructive Timing-Violation

    Toshinori SATO  Itsujiro ARITA  

    PAPER-High-Performance Technologies

    E85-C No:2

    This paper proposes constructive timing-violation (CTV) and evaluates its potential. It can be utilized both for increasing clock frequency and for reducing energy consumption. Increasing the clock frequency beyond that determined by the critical paths causes timing violations. On the other hand, while supply voltage reduction can result in substantial power savings, it also causes larger gate delays, and thus the clock must be slowed down in order not to violate the timing constraints of the critical paths. However, if a mechanism that tolerates timing violations is provided, it is not necessary to keep the constraints. Rather, the violations become constructive for high clock frequency or for energy savings. From these observations, we propose CTV, which is supported by a tolerance mechanism based on contemporary speculative execution mechanisms. We evaluate CTV using a cycle-by-cycle simulator and present its considerably promising potential.

  • Analytical Models and Performance Analyses of Instruction Fetch on Superscalar Processors

    Sun-Mo KIM  Jung-Woo LEE  Soo-Haeng LEE  Sang-Bang CHOI  


    E84-A No:6

    Cache memories are small, fast memories that temporarily hold the contents of main memory likely to be referenced by the processor, so as to reduce instruction and data access time. In studies of cache performance, most previous work has employed simulation-based methods. However, such research cannot precisely explain the obtained results. Moreover, when a new processor is designed, extensive simulations must be performed again with several different parameters. This research classifies cache structures for superscalar processors into four types and then presents an analytical model of the instruction fetch process for each cache type, considering architectural parameters such as the frequency of branch instructions in the program, cache miss rate, cache miss penalty, branch misprediction frequency, and branch misprediction penalty. To prove the correctness of the proposed models, we performed extensive simulations and compared the results with the analytical models. Simulation results showed that the proposed model can estimate the expected instruction fetch rate within 10% error in most cases. This paper shows that an increase in cache misses reduces the instruction fetch rate more severely than an increase in branch mispredictions does. The model is also able to provide the exact relationship between cache misses and branch mispredictions for instruction fetch analysis. The proposed model can explain causes of performance degradation that cannot be uncovered by simulation alone.

  • Proposal of a Multi-Threaded Processor Architecture for Embedded Systems and Its Evaluation

    Shinsuke KOBAYASHI  Yoshinori TAKEUCHI  Akira KITAJIMA  Masaharu IMAI  


    E84-A No:3

    In this paper, a multi-threaded processor architecture for embedded systems is proposed and evaluated in comparison with other processors for embedded systems. The experimental results show the trade-off between hardware cost and execution time among the processors. When the proposed multi-threaded processor is considered as an embedded processor, the design space of embedded systems is enlarged and a more suitable architecture can be selected under given design constraints.

  • Parallelism-Independent Scheduling Method

    Kirilka NIKOLOVA  Atusi MAEDA  Masahiro SOWA  


    E83-A No:6

    All existing scheduling algorithms order the instructions of a program in such a way that it can be executed in minimal time only for one fixed number of processors. In this paper we propose a new scheduling method, called the Parallelism-Independent Scheduling Method, which enables the scheduled program to be executed on parallel computers with any degree of parallelism in near-optimal time. We propose three Parallelism-Independent algorithms, which have the following phases: obtaining a parallel schedule by using a list scheduling heuristic; optimizing the parallel schedule by rearranging the tasks in each level so that they can be executed efficiently with different degrees of parallelism; serializing the parallel schedule; and inserting markers for the parallel execution limits. The three algorithms differ in their optimization phase. To prove the efficiency of our algorithms, we ran simulations with random directed acyclic graphs of different sizes and degrees of parallelism. We compared the results in terms of schedule length to those obtained using the Critical Path Algorithm separately for each degree of parallelism.
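
The serialization-with-markers phase can be pictured with a rough sketch. It assumes unit-time tasks, groups tasks by dependency level instead of running a real list scheduling heuristic, omits the optimization phase entirely, and places a marker at every level boundary; all of these are simplifying assumptions, and the names `levels`, `serialize`, `run_length` and `MARK` are hypothetical rather than taken from the paper.

```python
from math import ceil

MARK = "|"  # hypothetical marker for a parallel execution limit


def levels(dag):
    """Group tasks by dependency depth; dag maps a task to its prerequisites."""
    level = {}

    def depth(task):
        if task not in level:
            level[task] = 1 + max((depth(d) for d in dag[task]), default=-1)
        return level[task]

    for task in dag:
        depth(task)
    grouped = {}
    for task, lvl in level.items():
        grouped.setdefault(lvl, []).append(task)
    return [sorted(grouped[lvl]) for lvl in sorted(grouped)]


def serialize(dag):
    """Flatten the levelled schedule into one sequence with boundary markers."""
    serial = []
    for group in levels(dag):
        serial.extend(group)
        serial.append(MARK)
    return serial


def run_length(serial, processors):
    """Time steps needed when tasks between two markers may run in parallel."""
    steps, pending = 0, 0
    for item in serial:
        if item == MARK:
            steps += ceil(pending / processors)
            pending = 0
        else:
            pending += 1
    return steps


# Hypothetical six-task graph: d needs a and b, e needs c, f needs d and e.
dag = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["c"], "f": ["d", "e"]}
serial = serialize(dag)
print(serial)
print([run_length(serial, p) for p in (1, 2, 3)])
```

The same serialized program runs on one, two or three processors without rescheduling, which is the property the method aims for; how close each run is to optimal depends on the optimization phase that this sketch leaves out.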

  • CLASSIC: An O(n²)-Heuristic Algorithm for Microcode Bit Optimization Based on Incompleteness Relations

    Young-doo CHOI  In-Cheol PARK  Chong-Min KYUNG  

    PAPER-VLSI Design Technology and CAD

    E83-A No:5

    This paper presents a heuristic algorithm called CLASSIC for the minimization of the control memory width in microprogrammed processors or the instruction memory width of application-specific VLIW (Very Long Instruction Word) processors. CLASSIC results in nearly optimal solutions with a time complexity of O(n²), where n denotes the number of micro-operations. In this paper, we also propose the so-called incompleteness relations, which are exploited for the minimization of the control memory width. Experiments using various examples have shown that CLASSIC always achieves smaller microprogram widths compared to the earlier techniques based on the maximal compatibility class or the minimal AND/OR set. The results show that CLASSIC can reduce the control memory width by 34.2% on average compared with a heuristic compatibility class algorithm.

  • Synthesizable HDL Generation for Pipelined Processors from a Micro-Operation Description

    Makiko ITOH  Yoshinori TAKEUCHI  Masaharu IMAI  Akichika SHIOMI  


    E83-A No:3

    A synthesizable HDL generation method for pipelined processors is proposed. With the proposed method, the data-path and control logic descriptions of a target processor are generated from a clock-based instruction set specification. The experimental results demonstrate the feasibility of the proposed method and show that processor design time is drastically reduced compared with conventional manual RT-level design in HDL.

  • Path-Classified Trace Cache for Improving Hit Ratio in Wide-Issue Processors

    Jin-Hyuk YANG  In-Cheol PARK  Chong-Min KYUNG  

    PAPER-Computer Hardware and Design

    E82-D No:10

    In this paper, an instruction-cache scheme called Multi-Path Tracing is proposed to enhance the trace cache. Paths are classified to improve the trace cache hit ratio by reducing the path conflict and basic blocks are joined to reduce the hardware cost needed to implement the trace cache. Simulation results for various SPEC integer benchmarks show that the proposed scheme increases the hit ratio by more than 25% and the effective fetch size by 10%.

  • System Performance Analyses of Out-of-Order Superscalar Processors Using Analytical Method

    Hak-Jun KIM  Sun-Mo KIM  Sang-Bang CHOI  


    E82-A No:6

    This research presents a novel analytic model to predict the instruction execution rate of superscalar processors using a queuing model with finite buffer size and synchronous operation mode. The proposed model is also able to analyze the performance relationship between cache and pipeline. The proposed model takes into account architectural parameters such as instruction-level parallelism, branch probability, the accuracy of branch prediction, and cache misses. To prove the correctness of the model, we performed extensive simulations and compared the results with the analytic model. Simulation results showed that the proposed model can estimate the average execution rate within 10% error in most cases. The proposed model can explain causes of performance bottlenecks that cannot be uncovered by simulation alone. The model is also able to show the effect of cache misses on the performance of out-of-order issue superscalar processors, which provides valuable information for designing a balanced system.

  • Instruction Scheduling to Reduce Switching Activity of Off-Chip Buses for Low-Power Systems with Caches

    Hiroyuki TOMIYAMA  Tohru ISHIHARA  Akihiko INOUE  Hiroto YASUURA  


    E81-A No:12

    In many embedded systems, a significant amount of power is consumed in off-chip driving because off-chip capacitances are much larger than on-chip capacitances. This paper proposes instruction scheduling techniques to reduce the power consumed in off-chip driving. The techniques minimize the switching activity of the data bus between an on-chip cache and main memory when instruction cache misses occur. The scheduling problem is formulated and two scheduling algorithms are presented. Experimental results demonstrate the effectiveness and the efficiency of the proposed algorithms.
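
A common way to quantify bus switching activity is the number of bit flips (Hamming distance) between consecutive words on the bus. The sketch below uses that metric and an exhaustive search over legal orderings; it is an illustration under those assumptions, not either of the two algorithms from the paper, and the instruction words, the `deps` pairs and the function names are hypothetical.

```python
from itertools import permutations


def switching_activity(words):
    """Total bit flips between consecutive words driven onto the bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def best_order(words, deps=()):
    """Exhaustively find the ordering with the fewest bit flips.

    deps holds (i, j) index pairs meaning words[i] must precede words[j].
    Brute force is only practical for a handful of instructions.
    """
    best = None
    for order in permutations(range(len(words))):
        position = {idx: p for p, idx in enumerate(order)}
        if any(position[i] > position[j] for i, j in deps):
            continue  # ordering violates a dependence
        cost = switching_activity([words[i] for i in order])
        if best is None or cost < best[0]:
            best = (cost, order)
    return best


# Hypothetical 4-bit instruction words in their original program order.
words = [0b0000, 0b1111, 0b0001, 0b1110]
print(switching_activity(words))         # flips in the original order
print(best_order(words))                 # no dependences
print(best_order(words, deps=[(1, 2)]))  # word 1 must stay before word 2
```

For this toy block, reordering cuts the flips from 11 to 5 with or without the dependence. A practical scheduler would replace the brute-force search with a heuristic over the dependence graph.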

  • A Microprocessor Architecture Utilizing Histories of Dynamic Sequences Saved in Distributed Memories

    Toshinori SATO  


    E81-C No:9

    In order to improve microprocessor performance, we propose to utilize histories of dynamic instruction sequences. A number of special-purpose memories integrated in a processor chip hold the histories. In this paper, we describe the usefulness of two special-purpose memories: the Non-Consecutive basic block Buffer (NCB) and the Reference Prediction Table (RPT). The NCB improves instruction fetching efficiency in order to relieve control dependences. The RPT predicts data addresses in order to speculate on data dependences. The simulation study found that the proposed mechanisms improve processor performance by up to 49.2%.

  • The Effect of Instruction Window on the Performance of Superscalar Processors

    Yong-Hyeon PYUN  Choung-Shik PARK  Sang-Bang CHOI  

    PAPER-Systems and Control

    E81-A No:6

    This paper suggests a novel analytical model to predict the average issue rate of both in-order and out-of-order issue policies. Most previous work has employed only simulation methods to measure instruction-level parallelism for performance. However, these methods cannot disclose the cause of a performance bottleneck. In this paper, the proposed model takes into account such factors as issue policy, instruction-level parallelism, branch probability, the accuracy of branch prediction, instruction window size, and the number of pipeline units to estimate the issue rate more accurately. To prove the correctness of the model, extensive simulations were performed with Intel 80386/80387 instruction traces. Simulation results showed that the proposed model can estimate the issue rate to within a 3-10% difference. The analytical model and simulations show that out-of-order issue can improve superscalar performance by 70-206% compared to in-order issue. The model employs parameters to characterize the behavior of programs and the structure of the superscalar processor that cause performance bottlenecks. Thus, it can disclose the cause of the disproportion in performance and reduce the burden of the excess simulations that would otherwise be performed whenever a new processor is designed.

  • Instruction Sequence Based Synthesis for Application Specific Micro-Architecture

    Kyung-Sik JANG  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  


    E80-A No:6

    In this paper, a systematic method which generates the micro-architecture of Application Specific Instruction Processor (ASIP) is proposed. Different from previous works, the data path and control path are generated from the instruction sequence which is generated by translating the compiled assembly code. A graphical representation method called Register Transfer Graph (RTG) is introduced to describe the micro-operations of instruction sequence. To achieve high performance, we perform micro-operation level scheduling which dynamically assigns the micro-operations of instruction sequence to the control steps. By transforming the architecture using synthesis parameters, design space is explored more extensively. Connection cost is minimized by removing the inefficient data transfer paths.

  • Instructional Navigation Technology in a Multimedia System for Learner-Centered Learning

    Masanao KOBAYASHI  Hitoshi SASAKI  Makoto TAKEYA  

    PAPER-Advanced CAI system using media technologies

    E80-D No:2

    For two decades, we and our colleagues have been developing multiple learning environments in mathematical education for upper secondary school learners, and have reported on our learner-centered system at the latest four WCCE Conferences (WCCE/1981/1985/1990/1995). In our latest multimedia learning system, individual learners have to deal with a complex network structure in which objectives are arranged in the form of non-linear links, and have to proceed actively toward their own goals. In order to support their exploratory learning, we developed several instructional navigation tools from an instructional viewpoint. This paper presents our instructional navigation technology and its tools. The feature of our present system is that it provides a supportive environment where individual learners can set up their own goals, create their own paths to their goals through instructional materials, and construct their own instructional structure based on instructional strategies. This feature is remarkably different from a traditional CAI system, in which learners are only directed through the courseware via a linear selection of menus. This feature also differs fundamentally from general navigation technologies by which a user is able to traverse a series of nodes in a non-linear network structure, because our navigation must present individual learners with some easily learnable sequences of objectives based on their object and interest. For this purpose, the system has three characteristic technologies: focusing, sequencing, and clustering. These are very useful for learners in making decisions in order to reach their own goals. This paper consists of (1) ideas of instructional navigation, (2) map technology and (3) navigation technology.

  • Optimization Method for Selecting Problems Using the Learner's Model in Intelligent Adaptive Instruction System

    Tatsunori MATSUI  

    PAPER-Advanced CAI system using media technologies

    E80-D No:2

    The purpose of our study is to develop an intelligent adaptive instruction system that manages intelligently the learner's estimated knowledge structure and optimizes the selection of problems according to his/her knowledge structures. The system adopts the dynamic problems of high school physics as a material of study, and is intended to operate on a UNIX Work Station. For these purposes, the system is composed of three parts, 1) interface part, 2) problem solving expert part, and 3) optimization expert system part for problem selection. The main feature of our system is that both knowledge structures of learner and teacher are represented by structural graph, and the problem selection process is controlled by the relationship between the learner's knowledge structure and the teacher's knowledge structure. In our system the relationship between these two knowledge structures is handled in the optimization expert system part for problem selection. In this paper the theory of the optimization expert system part for problem selection is described, and the effectiveness of this part is clarified through a simulation experiment of the originally defined matching coefficient.

  • An ASIP Instruction Set Optimization Algorithm with Functional Module Sharing Constraint

    Alauddin Y. ALOMARY  Masaharu IMAI  Nobuyuki HIKICHI  


    E76-A No:10

    One of the most interesting and most analyzed aspects of CPU design is instruction set design. How many and which operations should be provided by hardware is one of the most fundamental issues relating to instruction set design. This paper describes a novel method that formulates the instruction set design of an ASIP (Application Specific Integrated Processor) using a combinatorial approach. Starting with the whole set of all possible candidate instructions that represent a given application domain, this approach selects a subset that maximizes the performance under the constraints of chip area, power consumption, and the functional module sharing relation among operations. This leads to the efficient implementation of the selected instructions. A branch-and-bound algorithm is used to solve this combinatorial optimization problem. This approach selects the most important instructions for a given application and also optimizes the hardware resources that implement the selected instructions. This approach also enables designers to predict the performance of their designs before implementing them, which is an important feature for producing a quality design in reasonable time.

  • An Integer Programming Approach to Instruction Set Selection Problem

    Alauddin Y. ALOMARY  Masaharu IMAI  Jun SATO  Nobuyuki HIKICHI  

    PAPER-VLSI Design Technology

    E76-A No:10

    The performance of ASIPs (Application Specific Integrated Processors) is heavily affected by the design of their instruction set architecture. In order to maximize the performance of an ASIP, it is essential to design an architecture that has an optimum instruction set. This paper describes a new method that automates the design of the optimum instruction set of an ASIP. This method solves the Instruction set implementation Method Selection Problem (IMSP), which arises in instruction set architecture design. First, the IMSP is formalized as an integer programming problem, which is to maximize the performance of the CPU under the constraints of chip area and power consumption. Then, a branch-and-bound algorithm to solve the IMSP is described. According to the experimental results, the proposed algorithm is quite effective and efficient in solving the IMSP. The presented method automates a complex part of ASIP chip design and is also a good design tool that enables designers to predict the performance of their design before completion.
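
The shape of such a selection problem can be sketched as a 0/1 choice per candidate with two resource limits, solved by a small branch-and-bound. This is a minimal illustration, assuming one fixed implementation per candidate and additive, non-negative gains; the paper's actual IMSP formulation is not given in the abstract, and the candidate names, numbers and the function `select_instructions` are hypothetical.

```python
def select_instructions(cands, area_max, power_max):
    """Pick the subset with the largest total gain within area and power limits.

    cands is a list of (name, gain, area, power) tuples with gain >= 0.
    Returns (best_gain, list_of_chosen_names).
    """
    # suffix[i] = total gain of candidates i.. ; an optimistic bound for pruning.
    suffix = [0] * (len(cands) + 1)
    for i in range(len(cands) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + cands[i][1]
    best = [0, []]

    def branch(i, gain, area, power, chosen):
        if gain > best[0]:
            best[0], best[1] = gain, list(chosen)
        # Bound: stop if even taking every remaining candidate cannot win.
        if i == len(cands) or gain + suffix[i] <= best[0]:
            return
        name, g, a, p = cands[i]
        if area + a <= area_max and power + p <= power_max:
            chosen.append(name)
            branch(i + 1, gain + g, area + a, power + p, chosen)
            chosen.pop()
        branch(i + 1, gain, area, power, chosen)

    branch(0, 0, 0, 0, [])
    return best[0], best[1]


# Hypothetical candidates: (name, performance gain, chip area, power).
cands = [("mac", 10, 5, 4), ("shift", 4, 2, 1), ("div", 9, 6, 5), ("abs", 3, 1, 1)]
result = select_instructions(cands, area_max=8, power_max=6)
print(result)
```

The bound used here is deliberately loose (the sum of all remaining gains); tighter bounds prune more of the search tree at the cost of extra work per node.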

  • A Concurrent Fault Detection Method for Instruction Level Parallel Processors



    E76-D No:7

    This paper describes a new method for the concurrent detection of faults in instruction level parallel (ILP) processors. The method uses the No OPeration (NOP) instruction slots that, because of branches, resource conflicts and some kinds of data dependencies, fill some of the pipelines (stages) in an ILP processor. NOPs are replaced by a copy of an effective instruction running in another pipeline. This allows the pipelines running the original instruction and its copy (or copies) to be checked by comparing the outputs of their stages during the execution of the replicated instruction. We show some figures obtained by applying this method to a two-pipeline superscalar processor.
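
The NOP-replacement idea can be mimicked in a few lines. The sketch below models a schedule as one tuple of issue slots per cycle, copies a real instruction into any NOP slot of the same cycle, and compares only the final outputs of the copies; comparing per-stage outputs as the abstract describes, and the rules for which NOPs may legally be replaced, are not modelled. All names (`fill_nops`, `detect_fault`, the pipeline callables) are hypothetical.

```python
NOP = "nop"


def fill_nops(schedule):
    """Replace NOP slots with a copy of a real instruction from the same cycle.

    Returns the filled schedule and the list of cycles that now hold a copy.
    """
    filled, duplicated = [], []
    for cycle, slots in enumerate(schedule):
        real = [s for s in slots if s != NOP]
        if real and len(real) < len(slots):
            slots = tuple(s if s != NOP else real[0] for s in slots)
            duplicated.append(cycle)
        filled.append(tuple(slots))
    return filled, duplicated


def detect_fault(filled, duplicated, pipelines):
    """Return the first cycle where copies of one instruction disagree, else None.

    pipelines holds one callable per pipeline that maps an instruction to its output.
    """
    for cycle in duplicated:
        outputs = {}
        for run, instr in zip(pipelines, filled[cycle]):
            outputs.setdefault(instr, set()).add(run(instr))
        if any(len(seen) > 1 for seen in outputs.values()):
            return cycle
    return None


def healthy(instr):
    return instr.upper()


def faulty(instr):
    # Hypothetical fault: this pipeline corrupts the result of "load".
    return "X" if instr == "load" else instr.upper()


schedule = [("add", "sub"), ("mul", NOP), (NOP, NOP), (NOP, "load")]
filled, duplicated = fill_nops(schedule)
print(filled, duplicated)
print(detect_fault(filled, duplicated, [healthy, healthy]))
print(detect_fault(filled, duplicated, [healthy, faulty]))
```

Cycles with no NOP slot, or with only NOPs, offer no checking opportunity in this model, so fault coverage depends on how often the scheduler leaves slots empty.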
