1-6hit |
Koji INOUE Koji KAI Kazuaki MURAKAMI
This paper proposes a novel cache architecture suitable for merged DRAM/logic LSIs, which is called "dynamically variable line-size cache (D-VLS cache). " The D-VLS cache can optimize its line-size according to the characteristic of programs, and attempts to improve the performance by exploiting the high on-chip memory bandwidth on merged DRAM/logic LSIs appropriately. In our evaluation, it is observed that an average memory-access time improvement achieved by a direct-mapped D-VLS cache is about 20% compared to a conventional direct-mapped cache with fixed 32-byte lines. This performance improvement is better than that of a doubled-size conventional direct-mapped cache.
Koji INOUE Koji KAI Kazuaki MURAKAMI
Merged DRAM/logic LSIs could provide high on-chip memory bandwidth by interconnecting logic portions and DRAM with wider on-chip buses. For merged DRAM/logic LSIs with the memory hierarchy including cache memory, we can exploit such high on-chip memory bandwidth by means of replacing a whole cache line (or cache block) at a time on cache misses. This approach tends to increase the cache-line size if we attempt to improve the attainable memory bandwidth. Larger cache lines, however, might worsen the system performance if programs running on the LSIs do not have enough spatial locality of references and cache misses frequently take place. This paper describes a novel cache architecture suitable for merged DRAM/logic LSIs, called variable line-size cache or VLS cache, for resolving the above-mentioned dilemma. The VLS cache can make good use of the high on-chip memory bandwidth by means of larger cache lines and, at the same time, alleviate the negative effects of larger cache-line size by partitioning each large cache line into multiple sub-lines and allowing every sub-line to work as an independent cache line. The number of sub-lines involved when a cache replacement occurs can be determined depending on the characteristics of programs. This paper also evaluates the cost/performance improvements attainable by the VLS cache and compares it with those of conventional cache architectures. As a result, it is observed that a VLS cache reduces the average memory-access time by 16. 4% while it increases the hardware cost by only 13%, compared to a conventional direct-mapped cache with fixed 32-byte lines.
Koji KAI Akihiko INOUE Taku OHSAWA Kazuaki MURAKAMI
In merged DRAM/logic LSIs, the DRAM portion could suffer from shorter data retention time because of heat and noise caused by the logic portion. In order to reconsider the DRAM data retention characteristics, this paper formulates and evaluates the performance degradation due to conflicts between normal DRAM accesses and refresh operations. Next, this paper proposes a new DRAM refresh architecture which intends to reduce unnecessary refreshes. This architecture exploits multiple refresh periods. Each row is refreshed with the most appropriate period of them. Reducing the number of refreshes improves the accessibility to DRAM. It is shown that the method reduces the number of refreshes and the degree of the performance degradation of the logic portion.
Taku OHSAWA Koji KAI Kazuaki MURAKAMI
In merged DRAM/logic LSIs, it is necessary to reduce the number of DRAM refreshes because of higher heat dissipation caused by the logic portion on the same chip. In order to overcome this problem, we propose several DRAM refresh architectures. The basic is to eliminate unnecessary DRAM refreshes. In addition to this, we propose a method for reducing the number of DRAM refreshes by relocating data. In order to evaluate these architectures and method, we have estimated the DRAM refresh count in executing benchmark programs under several models which simulate each combination of them. As a result, in the most effective combination, we have obtained more than 80% reduction against a conventional DRAM refresh architecture for most of benchmark programs. In addition to it, we have taken normal DRAM access into account, even then we have obtained more than 50% reduction for several benchmarks.
Koji INOUE Koji KAI Kazuaki MURAKAMI
This paper proposes an on-chip memory-path architecture employing the dynamically variable line-size (D-VLS) cache for high performance and low energy consumption. The D-VLS cache exploits the high on-chip memory bandwidth attainable on merged DRAM/logic LSIs by replacing a whole large cache line in one cycle. At the same time, it attempts to avoid frequent evictions by decreasing the cache-line size when programs have poor spatial locality. Activating only on-chip DRAM subarrays corresponding to a replaced cache-line size produces a significant energy reduction. In our simulation, it is observed that our proposed on-chip memory-path architecture, which employs a direct-mapped D-VLS cache, improves the ED (Energy Delay) product by more than 75% over a conventional memory-path model.
Today, practical semiconductor products are an integral part of our lives and the infrastructure of society, and this trend will continue in the future. New areas of application will expand into medical, environmental, and agriculture (food)-related fields in addition to the conventional information and communication technology (ICT)-related field. Low-cost semiconductor devices with advanced functions have thus far been realized by miniaturization. However, we are now approaching the physical limit of miniaturization, and also, the investment required for new semiconductor manufacturing facilities has become huge. Under such circumstances, we propose an approach based on semiconductor devices called microcube chips and ideas of semiconductor development, i.e., agile integration and "inch-fab." Our approach is expected to contribute to expanding the range of companies that can fabricate semiconductor devices to include small-size companies, exploring new applications of semiconductor devices, and providing a wide variety of semiconductor devices at a low cost from the semiconductor industry.