IEICE global.ieice.org Site

Keyword Search Result

[Keyword] MpSoC(9hit)

1-9hit

CoDMA: Buffer Avoided Data Exchange in Distributed Memory Systems
Ting CHEN Hengzhu LIU Botao ZHANG

PAPER-Integrated Electronics

Vol:
E97-C No:4
Page(s):
386-391
Data exchange, in which two blocks of data are swapped between cores in distributed memory systems, necessitates additional memory buffer in a multiprocessor system-on-chip. In this paper, we propose a novel bidirectional inter-core communication mechanism called coherent direct memory access (CoDMA). The CoDMA ensures that the writing address is always less than the reading address in coherent read and write mode, so as to avoid read-after-write (RAW) errors. It features an efficient data exchanging scheme without using data buffer in the memory. A four-core single-instruction multiple-data processor is established for the experiments, based on a multi-bus network-on-chip. Experimental results show that the proposed method consumes no additional memory buffer and achieves 39% and 20% average performance improvement compared with traditional Methods 1 and 2, respectively. And a maximal of 43% reduction in memory usage is achieved, at the cost of only 0.22% more area overhead compared with the entire system.
A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer
Hao XIAO Tsuyoshi ISSHIKI Arif Ullah KHAN Dongju LI Hiroaki KUNIEDA Yuko NAKASE Sadahiro KIMURA

PAPER-Computer System

Vol:
E95-D No:8
Page(s):
2027-2038
Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises a lot of facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large amount of computational load, which challenges the traditional uniprocessor architecture based implementation method to provide the required performance. However, the constrained cost and power budget, on the other hand, makes using commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which tackles at once the aspects of system design, software migration and hardware architecture, is presented for the implementation of UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving, but takes 15% less area than the uniprocessor implementation.
An H.264/AVC High422 Profile and MPEG-2 422 Profile Encoder LSI for HDTV Broadcasting Infrastructures
Koyo NITTA Hiroe IWASAKI Takayuki ONISHI Takashi SANO Atsushi SAGATA Yasuyuki NAKAJIMA Minoru INAMORI Ryuichi TANIDA Atsushi SHIMIZU Ken NAKAMURA Mitsuo IKEDA Jiro NAGANUMA

PAPER

Vol:
E95-C No:4
Page(s):
432-440
An H.264/AVC encoder LSI (named “SARA”) that supports High422 profile, as well as 422 profile of MPEG-2, has been developed for HDTV broadcasting infrastructures. It contains three motion estimation and compensation (ME/MC) engines with wide search ranges of -217.75 to +199.75 horizontally, -109.75 to +145.75 vertically, which can utilize almost all H.264/AVC ME/MC coding tools, such as multiple reference frame, variable block size, quarter-pel prediction, macroblock adaptive field/frame prediction (MBAFF), spatial/temporal direct mode, and weighted prediction. Our evaluations show that it can encode fast moving scenes with 1.2 dB to 1.7 dB higher than the JM. It was successfully fabricated in a 90-nm technology, and integrates 140 million transistors.
A Novel Sequential Tree Algorithm Based on Scoreboard for MPI Broadcast Communication
Won-young CHUNG Jae-won PARK Seung-Woo LEE Won Woo RO Yong-surk LEE

LETTER-Computer System

Vol:
E94-D No:12
Page(s):
2523-2527
The message passing interface (MPI) broadcast communication commonly causes a severe performance bottleneck in multicore system that uses distributed memory. Thus, in this paper, we propose a novel algorithm and hardware structure for the MPI broadcast communication to reduce the bottleneck situation. The transmission order is set based on the state of each processing node that comprises the multicore system, so the novel algorithm minimizes the performance degradation caused by conflict. The proposed scoreboard MPI unit is evaluated by modeling it with SystemC and implemented using VerilogHDL. The size of the proposed scoreboard MPI unit occupies less than 1.03% of the whole chip, and it yields a highly improved performance up to 75.48% as its maximum with 16 processing nodes. Hence, with respect to low-cost design and scalability, this scoreboard MPI unit is particularly useful towards increasing overall performance of the embedded MPSoC.
A Low-Cost Standard Mode MPI Hardware Unit for Embedded MPSoC
Won-young CHUNG Ha-young JEONG Won Woo RO Yong-surk LEE

LETTER-Computer System

Vol:
E94-D No:7
Page(s):
1497-1501
In this paper, we propose a novel low-cost Message Passing Interface (MPI) unit between processor nodes, which supports message passing in multiprocessor systems using distributed memory architecture. Our MPI unit operates in the standard mode – using the buffered mode for small amounts of data transaction and the synchronous mode for large amounts of data transaction. This results in increased performance by reducing the control message transmission time for the small amount of data. We verified the performance with a simulator designed based on SystemC. Additionally, we designed the MPI unit using VerilogHDL, and we synthesized it with a synopsys design compiler. The proposed standard mode MPI unit shows a high performance even though the size of the MPI unit occupies less than 1% of the whole chip. Thus, with respect to low-cost design and scalability, this MPI hardware unit is useful to increase overall performance of the embedded Multiprocessor System on a Chip (MPSoC).
Automatic Communication Synthesis with Hardware Sharing for Multi-Processor SoC Design
Yuki ANDO Seiya SHIBATA Shinya HONDA Hiroyuki TOMIYAMA Hiroaki TAKADA

PAPER-High-Level Synthesis and System-Level Design

Vol:
E93-A No:12
Page(s):
2509-2516
We present a hardware sharing method for design space exploration of multi-processor embedded systems. In our prior work, we had developed a system-level design tool named SystemBuilder which automatically synthesizes target implementation of a system from a functional description. In this work, we have extended SystemBuilder so that it can automatically synthesize an area-efficient implementation which shares a hardware module among different applications. With SystemBuilder, designers only need to enable an option in order to share a hardware module. The designers, therefore, can easily explore a design space including hardware sharing in short time. A case study shows the effectiveness of the hardware sharing on design space exploration.
Variation-Aware Task and Communication Scheduling in MPSoCs for Power-Yield Maximization
Mahmoud MOMTAZPOUR Maziar GOUDARZI Esmaeil SANAEI

PAPER-High-Level Synthesis and System-Level Design

Vol:
E93-A No:12
Page(s):
2542-2550
Parameter variations reveal themselves as different frequency and leakage powers per instances of the same MPSoC. By the increasing variation with technology scaling, worst-case-based scheduling algorithms result in either increasingly less optimal schedules or otherwise more lost yield. To address this problem, this paper introduces a variation-aware task and communication scheduling algorithm for multiprocessor system-on-chip (MPSoC). We consider both delay and leakage power variations during the process of finding the best schedule so that leakier processors are less utilized and can be more frequently put in sleep mode to reduce power. Our algorithm takes advantage of event tables to accelerate the statistical timing and power analysis. We use genetic algorithm to find the best schedule that maximizes power-yield under a performance-yield constraint. Experimental results on real world benchmarks show that our proposed algorithm achieves 16.6% power-yield improvement on average over deterministic worst-case-based scheduling.
Pipeline-Based Partition Exploration for Heterogeneous Multiprocessor Synthesis
Kang ZHAO Jinian BIAN Sheqin DONG Yang SONG Satoshi GOTO

PAPER-VLSI Design Technology and CAD

Vol:
E92-A No:9
Page(s):
2283-2294
To achieve an automated implementation for the application-specific heterogeneous multiprocessor systems-on-chip (MPSoC), partitioning and mapping the sequential programs onto multiple parallel processors is one of the most difficult challenges. However, the existing traditional parallelizing techniques cannot solve the MPSoC-related problems effectively, so designers are still required to manually extract the concurrency potentials in the program. To solve this bottleneck, an automated application partition technique is needed. However, completely automatic parallelism is ineffective, so it is promising to explore concurrency for certain practical special structures. To settle those issues, this paper proposes a template-based algorithm to automatically partition a special load-compute-store (LCS) loop structure. Since specific-instruction customization for the application specific instruction-set processors (ASIPs) has interactions with task partitioning, the proposed algorithm integrates the dynamic pipelining and ASIP techniques using an iterative improvement strategy: first, an initial pipelining scheme is generated to obtain the maximum parallelism; second, under the primary partition results specific instructions are customized respectively for each subprogram; third, the program is repartitioned via pipelining under the specific instruction configurations. The proposed method has been implemented in the context of a commercial extensible multiprocessor design flow, using the Xtensa-based XTMP platform from Tensilica Inc. Based on a case study of Fast Fourier Transform (FFT), the experimental results indicate that the partitioned programs by the proposed method demonstrate an average speedup of 10 compared to the original sequential programs which have not been partitioned and run on the uniprocessor system.
Exploring Partitions Based on Search Space Smoothing for Heterogeneous Multiprocessor System
Kang ZHAO Jinian BIAN Sheqin DONG Yang SONG Satoshi GOTO

PAPER-Electronic Circuits and Systems

Vol:
E91-A No:9
Page(s):
2456-2464
Programming the multiprocessor system-on-chip (MPSoC) requires partitioning the sequential reference programs onto multiple processors running in parallel. However, designers still need to partition the code manually due to the lack of automated partition techniques. To settle this issue, this paper proposes a partition exploration algorithm based on the search space smoothing techniques, and implements the proposed method using a commercial extensible processor (Xtensa LX2 processor from Tensilica Inc.). We have verified the feasibility of the algorithm by implementing the MPEG2 benchmark on the Xtensa-based two-processor system. The final experimental results indicate a performance improvement of at least 1.6 compared to the single-processor system.

Keyword Search Result

[Keyword] MpSoC(9hit)

CoDMA: Buffer Avoided Data Exchange in Distributed Memory Systems

A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer

An H.264/AVC High422 Profile and MPEG-2 422 Profile Encoder LSI for HDTV Broadcasting Infrastructures

A Novel Sequential Tree Algorithm Based on Scoreboard for MPI Broadcast Communication

A Low-Cost Standard Mode MPI Hardware Unit for Embedded MPSoC

Automatic Communication Synthesis with Hardware Sharing for Multi-Processor SoC Design

Variation-Aware Task and Communication Scheduling in MPSoCs for Power-Yield Maximization

Pipeline-Based Partition Exploration for Heterogeneous Multiprocessor Synthesis

Exploring Partitions Based on Search Space Smoothing for Heterogeneous Multiprocessor System

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles