1-3hit |
Makoto ISHIKAWA Tatsuya KAMEI Yuki KONDO Masanao YAMAOKA Yasuhisa SHIMAZAKI Motokazu OZAWA Saneaki TAMAKI Mikio FURUYAMA Tadashi HOSHI Fumio ARAKAWA Osamu NISHII Kenji HIROSE Shinichi YOSHIOKA Toshihiro HATTORI
We have developed an application processor optimized for 3G cellular phones. It provides high energy efficiency by using various low power techniques. For low active power consumption, we use a hierarchical clock gating technique with a static clock gating controlled by software and a two-level dynamic clock gating controlled by hardware. This technique reduces clock power consumption by 35%. And we also apply a pointer-based pipeline to in the CPU core, which reduces the pipeline latch power by 25%. This processor contains 256 kB of on-chip user RAM (URAM) to reduce the external memory access power. The URAM read buffer (URB) enables high-throughput, low latency access to the URAM while keeping the CPU clock frequency high because the URAM read data is transferred to the URB in 256-bit widths at half the frequency of the CPU. The average miss penalty is 3.5 cycles at the CPU clock frequency, hit rate is 89% and the energy used for URAM reads is 8% less that what it would be for URAM without a URB. These techniques reduce the power consumption of the CPU core, and achieve 4500 MIPS/W at 1.0 V power supply (Dhrystone 2.1). For the low leakage requirements, we use internal power switches, and provides resume-standby (R-standby) and ultra-standby (U-standby) modes. Signals across a power boundary are transmitted through µI/O circuits to prevent invalid signal transmission. In the R-standby mode, the power supply to almost all the CPU core area, except for the URAM is cut off and the URAM is set to a retention mode. In the U-standby mode, the power supply to the URAM is also turned off for less leakage current. The leakage currents in the R-standby and in the U-standby modes are respectively only 98 and 12 µA. For quick recovery from the R-standby mode, the boot address register (BAR) and control register contents needed immediately after wake-up are saved by hardware into backup latches. The other contents are saved by software into URAM. It takes 2.8 ms to fully recover from R-standby.
A combination of a software and a systolic hardware implementation for the Quasi Arithmetic compression algorithm is presented. The hardware is implemented as a pipeline hardware implementation. The implementation doesn't change the the algorithm. It just split it into two parts. The combination of parallel software and pipeline hardware can give very fast compression without decline of the compression efficiency.
Hongbing ZHU Mamoru SASAKI Takahiro INOUE
In this paper, by making good use of the parallel-transit-evaluation algorithm and sparsity of the connection between neurons, a pipeline structure is successfully introduced to the sequential Boltzmann machine processor. The novel structure speeds up nine times faster than the previous one, with only the 12% rise in hardware resources under 10,000 neurons. The performance is confirmed by designing it using 1.2 µm CMOS process standard cells and analyzing the probability of state-change.