1-7hit |
Tetsuya MATSUMURA Satoshi KUMAKI Hiroshi SEGAWA Kazuya ISHIHARA Atsuo HANAMI Yoshinori MATSUURA Stefan SCOTZNIOVSKY Hidehiro TAKATA Akira YAMADA Shu MURAYAMA Tetsuro WADA Hideo OHIRA Toshiaki SHIMADA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Koji TSUCHIHASHI Yasutaka HORIBA
A single-chip MPEG-2 video, audio, and system encoder LSI has been developed. It performs concurrent real-time processing of MPEG-2 422P@ML video encoding, 2-channel Dolby Digital or MPEG-1 audio encoding, and system encoding that generates a multiplexed transport stream (TS) or a program stream (PS). Advanced hybrid architecture, which combines a high performance VLIW media-processor D30V and hardwired video processing circuits, has been adopted to satisfy the demands of both high flexibility and enormous computational capability. A unified control scheme has been newly proposed that hierarchically manages adaptive task priority control over asynchronous video, audio, and system encoding processes in order to achieve real-time concurrent processing using a single D30V. Dual dedicated motion estimation cores consisting of a coarse ME core (CME) for wide range searches and a fine ME core (FME) for precise searches have been integrated to produce high picture quality while using a small amount of hardware. Adopting these features, a single-chip encoder has been fabricated using 0.25-micron 4-layer metal CMOS technology, and integrated into a 14.2 mm 14.2 mm die with 11 million transistors.
Hiroshi SEGAWA Yoshinori MATSUURA Satoshi KUMAKI Tetsuya MATSUMURA Stefan SCOTZNIOVSKY Shu MURAYAMA Tetsuro WADA Ayako HARADA Eiji OHARA Ken-ichi ASANO Toyohiko YOSHIDA Yasutaka HORIBA
This paper describes an embedded software scheme for a single-chip MPEG-2 encoder that executes concurrent video, audio, and system encoding in real-time. The software features a scalable module structure, which is hierarchically composed and has expandable plug-in modules. For increased applicability, several task-modules are prepared for the respective video, audio, and system processing. In addition, an effective task management scheme that features polling and interrupt-based task switching has been proposed in order to achieve real-time operation. The software having these features and including all task-modules is implemented on a single media-processor D30V on a single chip MPEG-2 video, audio, and system encoder. This encoder realizes real-time MPEG-2 video encoding, Dolby Digital or MPEG-1 audio encoding, and system encoding that generates TS or PS over 50 Mbps for various applications. Assuming a DVD or DTV encoder system, the software is reconstructed with less than 56.6-kbytes of instruction and 145.6 MIPS performance. The single media-processor with 64-kbytes of instruction RAM and 162 MIPS performance, running at a clock rate of 162 MHz, can successfully accomplish a real-time operation with the proposed embedded software.
Akira YAMADA Toyohiko YOSHIDA Tetsuya MATSUMURA Shin-ichi URAMOTO Koji TSUCHIHASHI Edgar HOLMANN
Integrating a 243 MHz dual-issue RISC processor core with a small set of dedicated hardware can create a single chip system for real-time encoding and decoding for MPEG2 MP@ML (main profile at main level). A trade-off between software and dedicated hardware is very important to decide performance of the system. This paper evaluates several MPEG2 encoding and decoding systems, focusing on both chip area and power consumption. For MPEG2 encoding, a newly introduced hybrid approach includes the processor core and the dedicated hardware that performs the discrete cosine transform (DCT), the inverse DCT (IDCT), variable length encoding (VLC) and block loading process. The estimated area for the encoder, 23. 0 mm2 using a 0. 3-micrometer 1-poly 4-metal CMOS process, is 33% smaller than that of the dedicated hardware approach. The estimated power consumption for the encoder is 13% smaller than that of the dedicated hardware approach. The dual-issue RISC processor approach has the advantage of a small chip area, low power consumption and that of being very easy to program for multimedia applications.
Hisakazu SATO Toyohiko YOSHIDA Masahito MATSUO Toru KENGAKU Koji TSUCHIHASHI
This paper presents the architecture of a newly-developed dual-issue RISC processor, D10V, that achieves both high throughput signal processing capability and maintains flexibility for general purpose applications. The RISC processor uses a 2-way VLIW architecture with a 32-bit wide instruction word. Two sub-instructions in a VLIW instruction are executed in two execution units in parallel. It also has several enhancements for signal processing. The processor includes pipelined multiply-and-accumulate instructions allowing a new multiply operation to be initiated every clock cycle and block repeat instructions for zero delay penalty loops. Single-cycle data moves of double-word data elements with modulo addressing are provided to deliver required memory bandwidth for signal processing applications. As a result, the D10V achieves high signal processing capability as 1 clock cycle per tap for FIR filtering. Also, several DSP benchmarks illustrate that the D10V competes favorably and in some instances outperforms conventional 16-bit DSPs. For master controlling application, the processor provides memory operations for signed/unsigned byte and bit wise operations. It shows 49 Dhrystone MIPS at 52 MHz, for general purpose applications.
Morgan Hirosuke MIKI Mamoru SAKAMOTO Shingo MIYAMOTO Yoshinori TAKEUCHI Toyohiko YOSHIDA Isao SHIRAKAWA
This paper evaluates the code efficiency of the ARM, Java, and x86 instruction sets by compiling the SPEC CPU95/CPU2000/JVM98 and CaffeineMark benchmarks, from the aspects of code sizes, basic block sizes, instruction distributions, and average instruction lengths. As a result, mainly because (i) the Java architecture is a stack machine, (ii) there are only four local variables which can be accessed by a 1-byte instruction, and (iii) additional instructions are provided for the network security, the code efficiency of Java turns out to be inferior to that of ARM Thumb. Moreover, through this efficiency analysis it should be stressed that there exists the high potential of constructing a more efficient code architecture by taking minute account of the customization of an instruction set as well as the number of registers.
Ayako HARADA Shin-ichi HATTORI Tadashi KASEZAWA Hidenori SATO Tetsuya MATSUMURA Satoshi KUMAKI Kazuya ISHIHARA Hiroshi SEGAWA Atsuo HANAMI Yoshinori MATSUURA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Tokumichi MURAKAMI
An MPEG-2 422P@HL encoder chip set composed of a preprocessing LSI, an encoding LSI, and a motion estimation LSI is described. This chip set realizes a two-type scalability of picture resolution and quality, and executes a hierarchical coding control in the overall encoder system. Due to its scalable architecture, the chip set realizes a 422P@HL video encoder with multi-chip configuration. This single encoding LSI achieves 422P@ML video, audio, and system encoding in real time. It employs an advanced hybrid architecture with a 162 MHz media processor and dedicated video processing hardware. It also has dual communication ports for parallel processing with multi-chip configuration. Transferring of reconstructed data and macroblock characteristic data between neighboring encoder modules is executed via these ports. The preprocessing LSI is fabricated using 0.25 micron three-layer metal CMOS technology and integrates 560 K gates in an area of 12.0 mm 12.0 mm . The encoding LSI is fabricated using 0.25 micron four-layer metal CMOS technology and integrates 11 million transistors in an area of 14.2 mm 14.2 mm . The motion estimation LSI is fabricated using 0.35 micron three-layer metal CMOS technology. It integrates 1.9 million transistors in an area of 8.5 mm 8.5 mm . This chip set makes various system configurations possible and allows for a compact and cost-effective video encoder with high picture quality.
Toyohiko YOSHIDA Akira YAMADA Edgar HOLMANN Hidehiro TAKATA Atsushi MOHRI Yukihiko SHIMAZU Kiyoshi NAKAKIMURA Keiichi HIGASHITANI
A dual-issue VLIW processor, running at 250 MHz, is enhanced with multimedia instructions for a sustained peak performance of 1000MOPS. The multimedia processor integrates 300 K transistors in an 8 mm2 core area and it is fabricated onto a 6 mm6. 2 mm chip with 32 kB instruction and 32 kB data RAMs in a 0. 3-micrometer, four-layer metal CMOS process. It consumes 1. 2 W at 2. 0 V running at 250 MHz. The VLIW processor achieves a speed-up of more than 4 times over a single-issue RISC for MPEG video block decoding. A decoder implemented on the multimedia processor with a small amount of dedicated hardware, such as the Huffman decoder and a DMA controller will decode the worst case 88 video block data in 754 cycles, leading to a real-time MPEG-2 system, video, and audio decoding system.