1-4hit |
Inhwa JUNG Moo-young KIM Dongsuk SHIN Seon Wook KIM Chulwoo KIM
This paper describes the Differential Pass Transistor Pulsed Latch (DPTPL) which enhances D-Q delay and reduce power consumption using NMOS pass transistors and feedback PMOS transistors. The proposed flip-flop uses the characteristic of stronger drivability of NMOS transistor than that of transmission gate if the sum of total transistor width is the same. Positive feedback PMOS transistors enhance the speed of the latch as well as guarantee the full-swing of internal nodes. Also, the power consumption of proposed pulsed latch is reduced significantly due to the reduced clock load and smaller total transistor width compared to conventional differential flip-flops. DPTPL reduces ED by 45.5% over ep-SFF. The simulations were performed in a 0.1 µm CMOS technology at 1.2 V supply voltage with 1.25 GHz clock frequency.
Youngsun HAN Seok Joong HWANG Seon Wook KIM
In this paper, we present a reconfigurable processor infrastructure to accelerate Java applications, called Jaguar. The Jaguar infrastructure consists of a compiler framework and a runtime environment support. The compiler framework selects a group of Java methods to be translated into hardware for delivering the best performance under limited resources, and translates the selected Java methods into Verilog synthesizable code modules. The runtime environment support includes the Java virtual machine (JVM) running on a host processor to provide Java execution environment to the generated Java accelerator through communication interface units while preserving Java semantics. Our compiler infrastructure is a tightly integrated and solid compiler-aided solution for Java reconfigurable computing. There is no limitation in generating synthesizable Verilog modules from any Java application while preserving Java semantics. In terms of performance, our infrastructure achieves the speedup by 5.4 times on average and by up to 9.4 times in measured benchmarks with respect to JVM-only execution. Furthermore, two optimization schemes such as an instruction folding and a live buffer removal can reduce 24% on average and up to 39% of the resource consumption.
Kyongjin JO Seon Wook KIM Jong-Kook KIM
The size and complexity of software in computer systems and even in consumer electronics is drastically and continuously increasing, thus increasing the compilation time. For example, the compilation time for building some of mobile phones' platform software takes several hours. In order to reduce the compilation time, this paper proposes a Distributed Scalable Compilation Tool, called DiSCo where full compilation passes such as preprocessing, compilation, and even linking are performed at remote machines, i.e. in parallel. To the best of our knowledge DiSCo is the first distributed compiler to support complete distributed processing in all the compilation passes. We use an extensive dependency analysis in parsing compilation commands for exploiting higher command-level parallelism, and we apply a file caching method and a network-drive protocol for reducing the remote compilation overhead and simplifying the implementation. Lastly, we minimize load imbalance and remote machine management overhead with our heuristic static scheduling method by predicting compilation time and considering the overheads invoked by the compilation process. Our evaluation using four large mobile applications and eight GNU applications shows that the performance of DiSCo is scalable and the performance is close to a profile scheduling.
Miseon HAN Yeoul NA Dongha JUNG Hokyoon LEE Seon WOOK KIM Youngsun HAN
A memory controller refreshes DRAM rows periodically in order to prevent DRAM cells from losing data over time. Refreshes consume a large amount of energy, and the problem becomes worse with the future larger DRAM capacity. Previously proposed selective refreshing techniques are either conservative in exploiting the opportunity or expensive in terms of required implementation overhead. In this paper, we propose a novel DRAM selective refresh technique by using page residence in a memory hierarchy of hardware-managed TLB. Our technique maximizes the opportunity to optimize refreshing by activating/deactivating refreshes for DRAM pages when their PTEs are inserted to/evicted from TLB or data caches, while the implementation cost is minimized by slightly extending the existing infrastructure. Our experiment shows that the proposed technique can reduce DRAM refresh power 43.6% on average and EDP 3.5% with small amount of hardware overhead.