1-7hit |
Tetsuya YAMADA Makoto ISHIKAWA Yuji OGATA Takanobu TSUNODA Takahiro IRITA Saneaki TAMAKI Kunihiko NISHIYAMA Tatsuya KAMEI Ken TATEZAWA Fumio ARAKAWA Takuichiro NAKAZAWA Toshihiro HATTORI Kunio UCHIYAMA
A 32-bit embedded RISC microprocessor core integrating a DSP has been developed using a 0.18-µm five-layer-metal CMOS technology. The integrated DSP has a single-MAC and exploits CPU resources to reduce hardware. The DSP occupies only 0.5 mm2. The processor core includes a large on-chip 128 kB SRAM called U-memory. A large capacity on-chip memory decreases the amount of traffic with an external memory. And it is effective for low-power and high-performance operation. To realize low-power dissipation for the U-memory access, the active ratio of U-memory's access is reduced. The critical path is a load path from the U-memory, and we optimized the path through the whole chip. The chip achieves 0.79 mA/MHz executing Dhrystone 1.1 at 108 MHz, which is suitable for mobile applications.
Makoto ISHIKAWA George SAIKALIS Shigeru OHO
We review practical case studies of a developing method of highly reliable real-time embedded control systems using a CPU model-based hardware/software co-simulation. We take an approach that enables us to fully simulate a virtual mechanical control system including a mechatronics plant, microcontroller hardware, and object code level software. This full virtual system approach simulates control system behavior, especially that of the microcontroller hardware and software. It enables design space exploration of microarchitecture, control design validation, robustness evaluation of the system, software optimization before components design. It also avoids potential problems. The advantage of this work is that it comprises all the components in a typical control system, enabling the designers to analyze effects from different domains, for example mechanical analysis of behavior due to differences in controller microarchitecture. To further improve system design, evaluation and analysis, we implemented an integrated behavior analyzer in the development environment. This analyzer can graphically display the processor behavior during the simulation without affecting simulation results such as task level CPU load, interrupt statistics, and the software variable transition chart. It also provides useful information on the system behavior. This virtual system analysis does not require software modification, does not change the control timing, and does not require any processing power from the target microcontroller. Therefore this method is suitable for real-time embedded control system design, in particular automotive control system design that requires a high level of reliability, robustness, quality, and safety. In this study, a Renesas SH-2A microcontroller model was developed on a CoMETTMplatform from VaST Systems Technology. An electronic throttle control (ETC) system and an engine control system were chosen to prove this concept. The electronic throttle body (ETB) model on the Saber® simulator from Synopsys® and the engine model on MATLAB®/Simulink® simulator from MathWorks can be simulated with the SH-2A model using a newly developed co-simulation interface between MATLAB®/Simulink® and CoMETTM. Though the SH-2A chip was being developed as the project was being executed, we were able to complete the OSEK OS development, control software design, and verification of the entire system using the virtual environment. After releasing a working sample chip in a later stage of the project, we found that such software could run on both actual ETC system and engine control system without critical problem. This demonstrates that our models and simulation environment are sufficiently credible and trustworthy.
Makoto ISHIKAWA Tatsuya KAMEI Yuki KONDO Masanao YAMAOKA Yasuhisa SHIMAZAKI Motokazu OZAWA Saneaki TAMAKI Mikio FURUYAMA Tadashi HOSHI Fumio ARAKAWA Osamu NISHII Kenji HIROSE Shinichi YOSHIOKA Toshihiro HATTORI
We have developed an application processor optimized for 3G cellular phones. It provides high energy efficiency by using various low power techniques. For low active power consumption, we use a hierarchical clock gating technique with a static clock gating controlled by software and a two-level dynamic clock gating controlled by hardware. This technique reduces clock power consumption by 35%. And we also apply a pointer-based pipeline to in the CPU core, which reduces the pipeline latch power by 25%. This processor contains 256 kB of on-chip user RAM (URAM) to reduce the external memory access power. The URAM read buffer (URB) enables high-throughput, low latency access to the URAM while keeping the CPU clock frequency high because the URAM read data is transferred to the URB in 256-bit widths at half the frequency of the CPU. The average miss penalty is 3.5 cycles at the CPU clock frequency, hit rate is 89% and the energy used for URAM reads is 8% less that what it would be for URAM without a URB. These techniques reduce the power consumption of the CPU core, and achieve 4500 MIPS/W at 1.0 V power supply (Dhrystone 2.1). For the low leakage requirements, we use internal power switches, and provides resume-standby (R-standby) and ultra-standby (U-standby) modes. Signals across a power boundary are transmitted through µI/O circuits to prevent invalid signal transmission. In the R-standby mode, the power supply to almost all the CPU core area, except for the URAM is cut off and the URAM is set to a retention mode. In the U-standby mode, the power supply to the URAM is also turned off for less leakage current. The leakage currents in the R-standby and in the U-standby modes are respectively only 98 and 12 µA. For quick recovery from the R-standby mode, the boot address register (BAR) and control register contents needed immediately after wake-up are saved by hardware into backup latches. The other contents are saved by software into URAM. It takes 2.8 ms to fully recover from R-standby.
Fumio ARAKAWA Motokazu OZAWA Osamu NISHII Toshihiro HATTORI Takeshi YOSHINAGA Tomoichi HAYASHI Yoshikazu KIYOSHIGE Takashi OKADA Masakazu NISHIBORI Tomoyuki KODAMA Tatsuya KAMEI Makoto ISHIKAWA
A SuperHTM embedded processor core implemented in a 130-nm CMOS process running at 400 MHz achieved 720 MIPS and 2.8 GFLOPS at a power of 250 mW in worst-case conditions. It has a dual-issue seven-stage pipeline architecture but maintains the 1.8 MIPS/MHz of the previous five-stage processor. The processor meets the requirements of a wide range of applications, and is suitable for digital appliances aimed at the consumer market, such as cellular phones, digital still/video cameras, and car navigation systems.
Makoto ISHIKAWA Naotake KAMIURA Yutaka HATA
This paper proposes a thresholding based segmentation method aided by Kleene Algebra. For a given image including some regions of interest (ROIs for short) with the coherent intensity level, assume that we can segment each ROI on applying thresholding technique. Three segmented states are then derived for every ROI: Shortage denoted by logic value 0, Correct denoted by 1 and Excess denoted by 2. The segmented states for every ROI in the image can be then expressed on a ternary logic system. Our goal is then set to find "Correct (1)" state for every ROI. First, unate function, which is a model of Kleene Algebra, based procedure is proposed. However, this method is not complete for some cases, that is, correctly segmented ratio is about 70% for three and four ROI segmentation. For the failed cases, Brzozowski operations, which are defined on De Morgan algebra, can accommodate to completely find all "Correct" states. Finally, we apply these procedures to segmentation problems of a human brain MR image and a foot CT image. As the result, we can find all "1" states for the ROIs, i. e. , we can correctly segment the ROIs.
Osamu NISHII Yoichi YUYAMA Masayuki ITO Yoshikazu KIYOSHIGE Yusuke NITTA Makoto ISHIKAWA Tetsuya YAMADA Junichi MIYAKOSHI Yasutaka WADA Keiji KIMURA Hironori KASAHARA Hideo MAEJIMA
We built a 12.4 mm12.4 mm, 45-nm CMOS, chip that integrates eight 648-MHz general purpose cores, two matrix processor (MX-2) cores, four flexible engine (FE) cores and media IP (VPU5) to establish heterogeneous multi-core chip architecture. The general purpose core had its IPC (instructions per cycle) performance enhanced by adding 32-bit instructions to the existing 16-bit fixed-length instruction set and executing up to two 32-bit instructions per cycle. Considering these five-to-seven years of embedded LSI and increasing trend of access-master within LSI, we predict that the memory usage of single core will not exceed 32-bit physical area (i.e. 4 GB), but chip-total memory usage will exceed 4 GB. Based on this prediction, the physical address was expanded from 32-bit to 40-bit. The fabricated chip was tested and a parallel operation of eight general purpose cores and four FE cores and eight data transfer units (DTU) is obtained on AAC (Advanced Audio Coding) encode processing.
Tetsuya YAMADA Masahide ABE Yusuke NITTA Kenji OGURA Manabu KUSAOKE Makoto ISHIKAWA Motokazu OZAWA Kiwamu TAKADA Fumio ARAKAWA Osamu NISHII Toshihiro HATTORI
A low-power SuperHTM embedded processor core, the SH-X2, has been designed in 90-nm CMOS technology. The power consumption was reduced by using hierarchical fine-grained clock gating to reduce the power consumption of the flip-flops and the clock-tree, synthesis and a layout that supports the implementation of the clock gating, and several-level power evaluations for RTL refinement. With this clock gating and RTL refinement, the power consumption of the clock-tree and flip-flops was reduced by 35% and 59%, including the process shrinking effects, respectively. As a result, the SH-X2 achieved 6,000 MIPS/W using a Renesas low-power process with a lowered voltage. Its performance-power efficiency was 25% better than that of a 130-nm-process SH-X.