1-2hit |
Hiroshi TSUTSUI Akihiko TOMITA Shigenori SUGIMOTO Kazuhisa SAKAI Tomonori IZUMI Takao ONOYE Yukihiro NAKAMURA
In this paper, a design of Programmable Logic Device (PLD) and a synthesis approach are proposed. Our PLD is derived from traditional Programmable Logic Array (PLA). The key extension is that programmable AND devices in PLA is replaced by Look-Up Tables (LUTs). A series of cascaded LUTs in the array can generate more complex terms, which we call generalized complex terms (GCTs), than product terms. In order to utilize the capability, a synthesis approach to map a given function into the array is also proposed. Our approach generates a expression of the sum of GCTs aiming to minimize the number of terms. A number of experimental results demonstrate that the number of terms for our PLD generated by our approach is 14.9% fewer than that by an existing approach. We design our PLD based on a fundamental unit named nGCT cell which can be used as LUTs in multiple sizes or random access memories. Implementation of the PLD based on a fundamental unit named nGCT cell which can be used as LUTs or random access memories is also described.
Yasuo SUGURE Seiji TAKEUCHI Yuichi ABE Hiromichi YAMADA Kazuya HIRAYANAGI Akihiko TOMITA Kesami HAGIWARA Takeshi KATAOKA Takanori SHIMURA
A 32-bit embedded RISC microcontroller core targeted for automotive, industrial, and PC-peripheral applications has been developed to offer the smaller code size, lower-latency instruction and interrupt processing needed for next-generation microcontrollers. The 360 MIPS/400MFLOPS/200 MHz core--based on the Harvard bus architecture--uses 0.13/0.15-µm CMOS technology and consists of a CPU, FPU, and register banks. To reduce the size of the control programs, new instructions have been added to the instruction set. These new instructions, as well as an enhanced C compiler, produce object files about 25% smaller than those for a previous designed core. A dual-issue superscalar structure consisting of three- or five-stage pipelines provides instruction processing with low latency. The cycle performance is thus an average of 1.8 times faster than the previous designed core. The superscalar structure is used to save 19 CPU registers in parallel when executing interrupt processing. That is, it saves the 19 CPU registers to the resister bank by accessing four registers at a time. This structure significantly improves interrupt response time from 37 cycles to 6 cycles.