Gi-Ho PARK Jung-Wook PARK Gunok JUNG Shin-Dug KIM
This paper presents a wordline gating logic for reducing unnecessary BTB accesses. Partial bits of the branch predictor are recorded simultaneously in the middle of the BTB to prevent further SRAM operation. Experimental results with embedded applications show that the proposed mechanism reduces BTB power consumption by around 38%.
Gi-Ho PARK Kil-Whan LEE Tack-Don HAN Shin-Dug KIM
This paper presents a dual data cache system structure, called a cooperative cache system, that is designed as a low power cache structure for embedded processors. The cooperative cache system consists of two caches, i.e., a direct-mapped temporal oriented cache (TOC) and a four-way set-associative spatial oriented cache (SOC). The cooperative cache system improves performance and reduces power consumption by virtue of the structural characteristics of the two caches, which are inherently designed to help each other. An evaluation chip of an embedded processor with the cooperative cache system was manufactured by Samsung Electronics Co. using a 0.25 µm 4-metal process technology.
Woo-Chan PARK Shi-Wha LEE Oh-Young KWON Tack-Don HAN Shin-Dug KIM
A model for a floating point adder/subtractor that can perform rounding and addition/subtraction operations in parallel is presented. The major requirements and the structure needed to achieve this goal are described and algebraically verified. The processing flow of the conventional floating point addition/subtraction operation consists of alignment, addition/subtraction, normalization, and rounding stages. In general, the rounding stage requires a high speed adder for the increment, which increases the overall execution time and occupies a large amount of chip area. Furthermore, it incurs additional execution time and hardware logic for a renormalization stage, which may be needed when the rounding operation overflows. A floating point adder/subtractor performing addition/subtraction and IEEE rounding in parallel is designed by optimizing the operational flow of the floating point addition/subtraction operation. The floating point adder/subtractor presented requires neither additional execution time nor a high speed adder for the rounding operation. In addition, the renormalization step is not required because the rounding step is performed prior to the normalization operation. Thus, performance improvement and a cost-effective design can be achieved by this approach.
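The conventional flow named in the abstract above (alignment, addition, normalization, rounding, and a possible renormalization) can be illustrated with a small sketch. This is not the design proposed in the paper: it models only the conventional sequential flow, under several assumptions made purely for illustration, namely a toy format with a 4-bit significand (`P`), three extra guard/round/sticky bits (`G`), same-sign addition only, and round-to-nearest-even. All names (`fp_add_conventional`, `shift_right_sticky`) are hypothetical.

```python
P = 4  # assumed toy significand width, including the leading 1
G = 3  # assumed extra low-order bits: guard, round, sticky


def shift_right_sticky(x, n):
    """Shift x right by n bits, ORing any shifted-out 1 bits into the LSB."""
    if n == 0:
        return x
    lost = x & ((1 << n) - 1)
    return (x >> n) | (1 if lost else 0)


def fp_add_conventional(ea, sa, eb, sb):
    """Add two positive toy floats given as (exponent, significand) pairs.

    Each significand is a normalized integer in [2**(P-1), 2**P).
    Returns (exponent, significand, renormalized).
    """
    # 1. Alignment: shift the operand with the smaller exponent to the right.
    if ea < eb:
        ea, sa, eb, sb = eb, sb, ea, sa
    a = sa << G
    b = shift_right_sticky(sb << G, ea - eb)

    # 2. Addition of the aligned significands.
    total = a + b
    exp = ea

    # 3. Normalization: a same-sign sum overflows by at most one bit.
    if total >> (P + G):
        total = shift_right_sticky(total, 1)
        exp += 1

    # 4. Rounding (round to nearest, ties to even).
    sig = total >> G
    rem = total & ((1 << G) - 1)
    half = 1 << (G - 1)
    if rem > half or (rem == half and (sig & 1)):
        sig += 1  # the increment that needs its own adder in this flow

    # 5. Renormalization: the rounding increment may overflow the significand.
    renormalized = False
    if sig >> P:
        sig >>= 1
        exp += 1
        renormalized = True
    return exp, sig, renormalized


# 15 * 2**0 + 8 * 2**-4 = 15.5, a tie that rounds up to 16 and overflows.
print(fp_add_conventional(0, 15, -4, 8))
```

In the last call the rounding increment turns the significand `1111` into `10000`, so the extra renormalization step fires. This is the case that the abstract says is avoided when rounding is performed before normalization.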
Gi-Ho PARK Jung-Wook PARK Hoi-Jin LEE Gunok JUNG Sung-Bae PARK Shin-Dug KIM
This paper presents a cache way enabling mechanism using branch target addresses. This mechanism uses branch prediction information to avoid the power consumed by unnecessary cache way accesses, enabling only the cache way(s) that should be accessed. The proposed cache way enabling mechanism reduces the power consumption of the instruction cache by 63% without any performance degradation of the processor. An ARM1136 processor simulator and Synopsys PrimeTime are used to perform the performance/power simulation and the static timing analysis of the proposed mechanism, respectively.
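The general idea of enabling only the needed way(s) can be sketched as a simple counting model. The abstract does not describe the hardware organization, so everything here is an assumption for illustration: a hypothetical table (`WayHintTable`) that remembers which way served a branch target last time, a 4-way cache (`NUM_WAYS`), and a fallback that enables all ways when the remembered way turns out to be wrong. The count of enabled ways is only a rough proxy for access energy and is not the paper's power model.

```python
NUM_WAYS = 4  # assumed associativity of the instruction cache


class WayHintTable:
    """Hypothetical table: branch target address -> way that last held it."""

    def __init__(self):
        self.hint = {}

    def ways_to_enable(self, target):
        """Return the single remembered way, or all ways if nothing is known."""
        way = self.hint.get(target)
        return [way] if way is not None else list(range(NUM_WAYS))

    def record(self, target, way):
        self.hint[target] = way


def count_way_activations(fetches, table):
    """Count enabled ways over a trace of (target_address, resident_way)."""
    total = 0
    for target, actual_way in fetches:
        enabled = table.ways_to_enable(target)
        total += len(enabled)
        if actual_way not in enabled:
            # Stale hint: assume a second access with every way enabled.
            total += NUM_WAYS
        table.record(target, actual_way)
    return total


trace = [(0x100, 2), (0x200, 1), (0x100, 2), (0x200, 1)]
with_hints = count_way_activations(trace, WayHintTable())
baseline = NUM_WAYS * len(trace)  # every way enabled on every fetch
print(with_hints, baseline)
```

In this toy trace the first visit to each target enables all four ways, and each repeat visit enables one, so the model counts 10 way activations against a baseline of 16.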