1-2hit |
Miseon HAN Yeoul NA Dongha JUNG Hokyoon LEE Seon WOOK KIM Youngsun HAN
A memory controller refreshes DRAM rows periodically in order to prevent DRAM cells from losing data over time. Refreshes consume a large amount of energy, and the problem becomes worse with the future larger DRAM capacity. Previously proposed selective refreshing techniques are either conservative in exploiting the opportunity or expensive in terms of required implementation overhead. In this paper, we propose a novel DRAM selective refresh technique by using page residence in a memory hierarchy of hardware-managed TLB. Our technique maximizes the opportunity to optimize refreshing by activating/deactivating refreshes for DRAM pages when their PTEs are inserted to/evicted from TLB or data caches, while the implementation cost is minimized by slightly extending the existing infrastructure. Our experiment shows that the proposed technique can reduce DRAM refresh power 43.6% on average and EDP 3.5% with small amount of hardware overhead.
Youngsun HAN Seok Joong HWANG Seon Wook KIM
In this paper, we present a reconfigurable processor infrastructure to accelerate Java applications, called Jaguar. The Jaguar infrastructure consists of a compiler framework and a runtime environment support. The compiler framework selects a group of Java methods to be translated into hardware for delivering the best performance under limited resources, and translates the selected Java methods into Verilog synthesizable code modules. The runtime environment support includes the Java virtual machine (JVM) running on a host processor to provide Java execution environment to the generated Java accelerator through communication interface units while preserving Java semantics. Our compiler infrastructure is a tightly integrated and solid compiler-aided solution for Java reconfigurable computing. There is no limitation in generating synthesizable Verilog modules from any Java application while preserving Java semantics. In terms of performance, our infrastructure achieves the speedup by 5.4 times on average and by up to 9.4 times in measured benchmarks with respect to JVM-only execution. Furthermore, two optimization schemes such as an instruction folding and a live buffer removal can reduce 24% on average and up to 39% of the resource consumption.