
Keyword Search Result

[Keyword] register file (13 hits)

1-13hit
  • Evaluation of Register Number Abstraction for Enhanced Instruction Register Files

    Naoki FUJIEDA  Kiyohiro SATO  Ryodai IWAMOTO  Shuichi ICHIKAWA  

     
    PAPER-Computer System

      Publicized:
    2018/03/14
      Vol:
    E101-D No:6
      Page(s):
    1521-1531

    Instruction set randomization (ISR) is a cost-effective obfuscation technique that modifies or enhances the relationship between instructions and machine languages. An Instruction Register File (IRF), a list of frequently used instructions, can be used for ISR by providing a way to access those instructions indirectly. This study examines an IRF that integrates a positional register, which was proposed as a supplementary unit of the IRF, for the sake of tamper resistance. According to our evaluation, with a new design for the contents of the positional register, the measure of tamper resistance increased by up to 8.2%, which corresponds to a 32.2% increase in the size of the IRF. Adding the positional register increased the number of logic elements by 3.5% of that of the baseline processor.

  • Skewed Multistaged Multibanked Register File for Area and Energy Efficiency

    Junji YAMADA  Ushio JIMBO  Ryota SHIOYA  Masahiro GOSHIMA  Shuichi SAKAI  

     
    PAPER-Computer System

      Publicized:
    2017/01/11
      Vol:
    E100-D No:4
      Page(s):
    822-837

    The region that includes the register file is a hot spot in high-performance cores that limits the clock frequency. Although multibanking drastically reduces the area and energy consumption of the register files of superscalar processor cores, it suffers from low IPC due to bank conflicts. Our skewed multistaging drastically reduces not the bank conflict probability but the pipeline disturbance probability by the second stage. The evaluation results show that, compared with NORCS, which is the latest research on a register file for area and energy efficiency, the proposed register file with 18 banks achieves a 39.9% and 66.4% reduction in circuit area and energy consumption, respectively, while maintaining a relative IPC of 97.5%.
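To give a feel for why bank conflicts hurt IPC, the following is a minimal sketch of a birthday-style estimate of the bank conflict probability. It is not the model used in the paper: it assumes single-ported banks and that each access picks a bank uniformly and independently at random, and the function names are hypothetical.

```python
from math import perm


def conflict_free_probability(banks: int, accesses: int) -> float:
    """Probability that all accesses in one cycle land in distinct banks.

    Assumption: each access selects one of `banks` single-ported banks
    uniformly and independently at random.
    """
    if accesses > banks:
        return 0.0  # pigeonhole: at least two accesses must share a bank
    return perm(banks, accesses) / banks ** accesses


def conflict_probability(banks: int, accesses: int) -> float:
    """Probability that at least two accesses collide on the same bank."""
    return 1.0 - conflict_free_probability(banks, accesses)


if __name__ == "__main__":
    # Compare a few bank counts for four simultaneous accesses.
    for banks in (4, 8, 18):
        print(banks, conflict_probability(banks, 4))
```

Under these assumptions the conflict probability falls as banks are added but stays well above zero, which is consistent with the abstract's point that the proposal targets the pipeline disturbance probability rather than the conflict probability itself.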

  • Design of a Register Cache System with an Open Source Process Design Kit for 45nm Technology

    Junji YAMADA  Ushio JIMBO  Ryota SHIOYA  Masahiro GOSHIMA  Shuichi SAKAI  

     
    PAPER

      Vol:
    E100-C No:3
      Page(s):
    232-244

    An 8-issue superscalar core generally requires a 24-port RAM for the register file. The area and energy consumption of a multiported RAM increase in proportion to the square of the number of ports. A register cache can reduce the area and energy consumption of the register file. However, earlier register cache systems suffer from lower IPC caused by register cache misses. Thus, we proposed the Non-Latency-Oriented Register Cache System (NORCS) to solve the IPC problem with a modified pipeline. We evaluated NORCS mainly from the viewpoint of microarchitecture in the original article, and showed that NORCS maintains almost the same IPC as conventional register files. Researchers in NVIDIA adopted the same idea for their GPUs. However, the evaluation was not sufficient from the viewpoint of LSI design. In the original article, we used CACTI to evaluate the area and energy consumption. CACTI is a design space exploration tool for cache design, and adopts some rough approximations. Therefore, this paper presents a design of NORCS with FreePDK45, an open source process design kit for 45nm technology. We performed manual layout of the memory cells and arrays of NORCS, and executed SPICE simulation with RC parasitics extracted from the layout. The results show that, compared with a full-port register file, an 8-entry NORCS achieves a 75.2% and 48.2% reduction in area and energy consumption, respectively. The results also include the latency, which we did not present in our original article. The critical-path latencies are 307ps and 318ps for an 8-entry NORCS and a conventional multiported register file, respectively, when the same two cycles are allocated to register file read.
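The port-count arithmetic in the abstract above can be sketched as follows. The quadratic scaling is taken from the abstract; the split of two read ports and one write port per issued instruction is an assumption that happens to reproduce the 24-port figure for an 8-issue core, and the function names are hypothetical.

```python
def ports_for_issue_width(issue_width: int,
                          reads_per_inst: int = 2,
                          writes_per_inst: int = 1) -> int:
    """Register file ports needed if every issued instruction may read
    `reads_per_inst` operands and write `writes_per_inst` results
    (assumed split, not stated in the abstract)."""
    return issue_width * (reads_per_inst + writes_per_inst)


def relative_cost(ports: int, baseline_ports: int) -> float:
    """Area or energy relative to a baseline, using the abstract's rule
    that both grow with the square of the number of ports."""
    return (ports / baseline_ports) ** 2


if __name__ == "__main__":
    wide = ports_for_issue_width(8)    # 8-issue core
    narrow = ports_for_issue_width(4)  # 4-issue core for comparison
    print(wide, narrow, relative_cost(wide, narrow))
```

Doubling the issue width doubles the port count and therefore quadruples the estimated area and energy, which is the pressure that register caches such as NORCS aim to relieve.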

  • Short Term Cell-Flipping Technique for Mitigating SNM Degradation Due to NBTI

    Yuji KUNITAKE  Toshinori SATO  Hiroto YASUURA  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    520-529

    Negative Bias Temperature Instability (NBTI) is one of the major reliability problems in advanced technologies. NBTI causes a threshold voltage shift in a PMOS transistor. When the PMOS transistor is negatively biased, the threshold voltage shifts in the negative direction. On the other hand, the threshold voltage recovers if the PMOS transistor is positively biased. In an SRAM cell, NBTI degrades the threshold voltage of the load PMOS transistors. The degradation has an impact on the Static Noise Margin (SNM), which is a measure of the read stability of a 6-T SRAM cell. In this paper, we discuss the relationship between NBTI degradation in an SRAM cell and the dynamic stress and recovery condition. There are two important characteristics. One is the stress probability, which is defined as the rate at which the PMOS transistor is negatively biased. The other is the stress and recovery cycle, which is defined as the switching interval of an SRAM value. In our observations, in order to mitigate the NBTI degradation, the stress probability should be small and the stress and recovery cycle should be shorter than 10 msec. Based on the observations, we propose a novel cell-flipping technique, which makes the stress probability close to 50%. In addition, we show results of case studies that apply the cell-flipping technique to register files and cache memories.
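The effect of cell flipping on the stress probability can be illustrated with a minimal sketch. This is a simplified model, not the paper's mechanism: it assumes that one particular load PMOS is stressed exactly while the cell physically holds 1, that flipping inverts the physically stored value in every other interval, and that time is measured in abstract steps. All names are hypothetical.

```python
from typing import Optional, Sequence


def stress_probability(logical_bits: Sequence[int],
                       flip_interval: Optional[int] = None) -> float:
    """Fraction of time steps in which the observed load PMOS is stressed.

    logical_bits: logical value held by the cell at each time step.
    flip_interval: if given, the physically stored value is inverted
        during every second interval of this many steps.
    """
    if not logical_bits:
        raise ValueError("logical_bits must not be empty")
    stressed = 0
    for t, bit in enumerate(logical_bits):
        inverted = flip_interval is not None and (t // flip_interval) % 2 == 1
        physical = bit ^ 1 if inverted else bit
        stressed += physical  # assumed: stressed while the cell holds 1
    return stressed / len(logical_bits)


if __name__ == "__main__":
    constant_one = [1] * 1000  # worst case: the value never changes
    print(stress_probability(constant_one))
    print(stress_probability(constant_one, flip_interval=10))
```

In this model a cell that holds a constant value keeps one transistor stressed all the time, whereas periodic flipping splits the stress evenly between the two load transistors, matching the roughly 50% stress probability named in the abstract.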

  • Register File Size Reduction through Instruction Pre-Execution Incorporating Value Prediction

    Yusuke TANAKA  Hideki ANDO  

     
    PAPER-Computer System

      Vol:
    E93-D No:12
      Page(s):
    3294-3305

    Two-step physical register deallocation (TSD) is an architectural scheme that enhances memory-level parallelism (MLP) by pre-executing instructions. Ideally, TSD allows exploitation of MLP under an unlimited number of physical registers, and consequently only a small register file is needed for MLP. In practice, however, the amount of MLP exploitable is limited, because there are cases where either 1) pre-execution is not performed; or 2) the timing of pre-execution is delayed. Both are due to data dependencies among the pre-executed instructions. This paper proposes the use of value prediction to solve these problems. Evaluation results using the SPECfp2000 benchmark confirm that the proposed scheme, with value prediction for predicting addresses, achieves IPC equivalent to that of the previous TSD scheme with a smaller register file. The reduction rate of the register file size is 21%.

  • Power Estimation of Partitioned Register Files in a Clustered Architecture with Performance Evaluation

    Yukinori SATO  Ken-ichi SUZUKI  Tadao NAKAMURA  

     
    PAPER-VLSI Systems

      Vol:
    E90-D No:3
      Page(s):
    627-636

    High power consumption and slow access of enlarged and multiported register files make it difficult to design high performance superscalar processors. The clustered architecture, where the conventional monolithic register file is partitioned into several smaller register files, is expected to overcome the register file issues. In the clustered architecture, the more a monolithic register file is partitioned, the lower the power and the faster the access of the resulting register files. However, the partitioning causes losses of IPC (instructions per clock cycle) due to communication among register files. Therefore, the degree of partitioning has a strong impact on the trade-off between power consumption and performance. In addition, the organization of partitioned register files also affects the trade-off. In this paper, we investigate appropriate degrees of partitioning and organizations of partitioned register files in a clustered architecture to assess the trade-off. From the results of execution-driven simulation, we find that the organization of register files and the degree of partitioning have a strong impact on the IPC, and the configuration with non-consistent register files can make use of the partitioned resources more effectively. From the results of register file access time and energy modeling, we find that the configurations with the highly partitioned non-consistent register file organization can benefit from the partitioning in terms of operating frequency and access energy of register files. Further, we examine the relationship between IPS (instructions per second) and the product of IPC and operating frequency of register files. The results suggest that highly partitioned non-consistent configurations tend to gain more advantage in performance and power.

  • Physical Register Sharing through Value Similarity Detection

    In Pyo HONG  Ha Young JEONG  Yong Surk LEE  

     
    LETTER-Computer Systems

      Vol:
    E89-D No:10
      Page(s):
    2678-2681

    Modern processors have large instruction windows to improve performance. They usually adopt register renaming, where every active instruction with a valid destination needs a physical register. As the instruction windows get larger, however, bigger physical register files are required. To solve this problem, we proposed a physical register sharing technique. It shares a physical register among multiple instructions based on value similarity. As a result, we achieved performance improvement without increasing the size of the physical register file. In addition, the proposed technique can also be used to reduce the timing, complexity, and area overhead of the physical register file.

  • Multi-Ported Register File for Reducing the Impact of PVT Variation

    Yuuichirou IKEDA  Masaya SUMITA  Makoto NAGATA  

     
    PAPER-Signal Integrity and Variability

      Vol:
    E89-C No:3
      Page(s):
    356-363

    We have developed a 32-bit, 32-word register file with 9 read ports and 7 write ports. This register file has several circuits and techniques for reducing the impact of process variation, which is marked in recent process technologies, as well as voltage variation and temperature variation, collectively called PVT variation. We describe these circuits and techniques in detail, and confirm their effects by simulation and measurement of the test chip.

  • A Lower-Power Register File Based on Complementary Pass-Transistor Adiabatic Logic

    Jianping HU  Tiefeng XU  Hong LI  

     
    PAPER-Digital Circuits and Computer Arithmetic

      Vol:
    E88-D No:7
      Page(s):
    1479-1485

    This paper presents a novel low-power register file based on adiabatic logic. The register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers. The storage-cell array is based on the conventional memory cell. All the circuits except the storage-cell array employ CPAL (complementary pass-transistor adiabatic logic) to recover the charge of the large node capacitance on address decoders, bit-lines, and word-lines in a fully adiabatic manner. The minimization of energy consumption was investigated by choosing the optimal size of CPAL circuits for large load capacitance. The power consumption of the proposed adiabatic register file is significantly reduced because the energy transferred to the large capacitance buses is mostly recovered. The energy and functional simulations are performed using the net-list extracted from the layout. HSPICE simulation results indicate that the proposed register file attains energy savings of 65% to 85% compared to the conventional CMOS implementation for clock rates ranging from 25 to 200 MHz.

  • Designs of Building Blocks for High-Speed, Low-Power Processors

    Tadayoshi ENOMOTO  

     
    PAPER-High-Performance Technologies

      Vol:
    E85-C No:2
      Page(s):
    331-338

    A fast, low-power 16-bit adder, 32-word register file and 512-bit cache SRAM have been developed using 0.25-µm GaAs HEMT technology for future multi-GHz processors. The 16-bit adder, which uses a negative logic binary look-ahead carry structure based on NOR gates, operates at a maximum clock frequency of 1.67 GHz and consumes 134.4 mW at a supply voltage of 0.6 V. The active area is 1.6 mm² and there are about 1,230 FETs. A new DC/DC level converter has been developed for use in high-speed, low-power storage circuits such as SRAMs and register files. The level converter can increase the DC voltage, which is supplied to an active-load circuit on request, or supply a minimal DC voltage to a load circuit in the stand-by mode. The power dissipation (P) of the 32-word register file with on-chip DC/DC level converters is 459 mW, a reduction to 25.2% of that of an equivalent conventional register file, while the operating frequency (fc) was 5.17 GHz, which is 74.8% of fc for the conventional register file. P for the 512-bit cache SRAM with the new DC/DC level converters is 34.3 mW, 89.7% of the value for an equivalent conventional cache SRAM, with a read-access time of 455 psec, only 1.1% longer than that of the conventional cache SRAM.

  • A New Hardware/Software Partitioning Algorithm for DSP Processor Cores with Two Types of Register Files

    Nozomu TOGAWA  Takashi SAKURAI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    LETTER-Hardware/Software Codesign

      Vol:
    E84-A No:11
      Page(s):
    2802-2807

    This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint on execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each register file. Moreover, the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing a performance penalty. A generated processor core will have a small area compared with processor cores that have a single register file or those that consider only one type of functional unit for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

  • A Hardware/Software Cosynthesis System for Digital Signal Processor Cores with Two Types of Register Files

    Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E83-A No:3
      Page(s):
    442-451

    In digital signal processing, the bit width of intermediate variables should be longer than that of input and output variables in order to execute intermediate operations with high precision. A processor core for digital signal processing is therefore required to have two types of register files, one of which is used by input and output variables and the other by intermediate variables. This paper proposes a hardware/software cosynthesis system for digital signal processor cores with two types of register files. Given an application program and its data, the system synthesizes a hardware description of a processor core, an object code running on the processor core, and software environments. A synthesized processor core can be composed of a processor kernel, multiple data memory buses, hardware loop units, addressing units, and multiple functional units. Furthermore, it can have two types of register files, RF1 and RF2. The bit width and number of registers in RF1 or RF2 will be determined based on a given application program. Thus a synthesized processor core will have a small area while keeping high precision of intermediate operations compared with a processor core with only one register file. The experimental results demonstrate the effectiveness of the proposed system.

  • Data Bypassing Register File for Low Power Microprocessor

    Makoto IKEDA  Kunihiro ASADA  

     
    LETTER-Integrated Electronics

      Vol:
    E78-C No:10
      Page(s):
    1470-1472

    In this paper, we propose a register file with a data bypassing function. This register file bypasses data using data bypassing units instead of functional units when actual operation in functional units such as the ALU is unnecessary. Applying this method to a general purpose microprocessor with benchmark programs, we demonstrate a 50% reduction in power consumption in functional units. Although the length of bus lines increases slightly due to additional hardware in the register file, buses are not driven when data is bypassed, so power consumption in bus lines is also reduced by 40% compared with the conventional architecture.