IEICE global.ieice.org Site

Keyword Search Result

[Keyword] reconfigurable processor(18hit)

1-18hit

Networking Experiment of Domain-Specific Networking Platform Based on Optically Interconnected Reconfigurable Communication Processors Open Access
Masaki MURAKAMI Takashi KURIMOTO Satoru OKAMOTO Naoaki YAMANAKA Takayuki MURANAKA

PAPER-Network System

Pubricized:
2023/02/15
Vol:
E106-B No:8
Page(s):
660-668
A domain-specific networking platform based on optically interconnected reconfigurable communication processors is proposed. Some application examples of the reconfigurable communication processor and networking experiment results are presented.
Acceleration of Block Matching on a Low-Power Heterogeneous Multi-Core Processor Based on DTU Data-Transfer with Data Re-Allocation
Yoshitaka HIRAMATSU Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Toru NOJIRI Kunio UCHIYAMA Michitaka KAMEYAMA

PAPER-Integrated Electronics

Vol:
E95-C No:12
Page(s):
1872-1882
The large data-transfer time among different cores is a big problem in heterogeneous multi-core processors. This paper presents a method to accelerate the data transfers exploiting data-transfer-units together with complex memory allocation. We used block matching, which is very common in image processing, to evaluate our technique. The proposed method reduces the data-transfer time by more than 42% compared to the earlier works that use CPU-based data transfers. Moreover, the total processing time is only 15 ms for a VGA image with 1616 pixel blocks.
Software Defined Modem for Cognitive Radio with Dynamically Reconfigurable Processor
Ren SAKATA Daisuke TAKEDA Noritaka DEGUCHI Tatsuma HIRANO Takashi YOSHIKAWA

PAPER-Transmission Systems and Transmission Equipment for Communications

Vol:
E95-B No:3
Page(s):
810-818
Software Defined Radio (SDR) techniques are expected to be among the key technologies of heterogeneous cognitive radio networks for realizing efficient and convenient wireless communications by providing multiple radio services to users and decreasing development costs. In this paper, in order to evaluate the feasibility of SDR modems, we study the amount of computing throughput of a recent wireless system and determine a suitable modem architecture. Firstly, the functions for which SDR techniques provide significant benefits are clarified. Secondly, the computing throughputs are measured under the assumption that a dynamically reconfigurable processor, FlexSwordTM, is employed. Finally, based on a consideration of timing charts, we propose the architecture of an SDR-based modem with FlexSword. The possibility of implementing several wireless systems is also considered.
Memory-Access-Driven Context Partitioning for Window-Based Image Processing on Heterogeneous Multicore Processors
Hasitha Muthumala WAIDYASOORIYA Yosuke OHBAYASHI Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Design Methodology

Vol:
E95-D No:2
Page(s):
354-363
Accelerator cores in low-power heterogeneous processors have on-chip local memories to enable parallel data access. The memory capacities of the local memories are very small. Therefore, the data should be transferred from the global memory to the local memories many times. These data transfers greatly increase the total processing time. Memory allocation technique to increase the data sharing is a good solution to this problem. However, when using reconfigurable cores, the data must be shared among multiple contexts. However, conventional context partitioning methods only consider how to reuse limited hardware resources in different time slots. They do not consider the data sharing. This paper proposes a context partitioning method to share both the hardware resources and the local memory data. According to the experimental results, the proposed method reduces the processing time by more than 87% compared to conventional context partitioning techniques.
Iterative Synthesis Methods Estimating Programmable-Wire Congestion in a Dynamically Reconfigurable Processor
Takao TOI Takumi OKAMOTO Toru AWASHIMA Kazutoshi WAKABAYASHI Hideharu AMANO

PAPER-High-Level Synthesis and System-Level Design

Vol:
E94-A No:12
Page(s):
2619-2627
Iterative synthesis methods for making aware of wire congestion are proposed for a multi-context dynamically reconfigurable processor (DRP) with a large number of processing elements (PEs) and programmable-wire connections. Although complex data-paths can be synthesized using the programmable-wire, its delay is long especially when wire connections are congested. We propose two iterative synthesis techniques between a high-level synthesizer (HLS) and the place & route tool to shorten the prolonged wire delay. First, we feed back wire delays for each context to a scheduler in the HLS. The experimental results showed that a critical-path delay was shorten by 21% on average for applications with timing closure problems. Second, we skip the routing and estimate wire delays based on the congestion. The synthesis time was shorten to 1/3 causing delay improvement rate degradation at two points on average.
Resource Minimization Method Satisfying Delay Constraint for Replicating Large Contents
Sho SHIMIZU Hiroyuki ISHIKAWA Yutaka ARAKAWA Naoaki YAMANAKA Kosuke SHIBA

PAPER-Fundamental Theories for Communications

Vol:
E92-B No:10
Page(s):
3102-3110
How to minimize the number of mirroring resources under a QoS constraint (resource minimization problem) is an important issue in content delivery networks. This paper proposes a novel approach that takes advantage of the parallelism of dynamically reconfigurable processors (DRPs) to solve the resource minimization problem, which is NP-hard. Our proposal obtains the optimal solution by running an exhaustive search algorithm suitable for DRP. Greedy algorithms, which have been widely studied for tackling the resource minimization problem, cannot always obtain the optimal solution. The proposed method is implemented on an actual DRP and in experiments reduces the execution time by a factor of 40 compared to the conventional exhaustive search algorithm on a Pentium 4 (2.8 GHz).
A Preemption Algorithm for a Multitasking Environment on Dynamically Reconfigurable Processors
Vu Manh TUAN Hideharu AMANO

PAPER-Computer Systems

Vol:
E91-D No:12
Page(s):
2793-2803
Task preemption is a critical mechanism for building an effective multi-tasking environment on dynamically reconfigurable processors. When a task is preempted, its necessary state information must be correctly preserved in order for the task to be resumed later. Not only do coarse-grained Dynamically Reconfigurable Processing Array (DRPAs) devices have different architectures using a variety of development tools, but the great amount of state data of hardware tasks executing on such devices are usually distributed on many different storage elements. To address these difficulties, this paper aims at studying a general method for capturing the state data of hardware tasks targeting coarse-grained DRPAs. Based on resource usage, algorithms for identifying preemption points and inserting preemption states subject to user-specified preemption latency are proposed. Moreover, a modification to automatically incorporate proposed steps into the system design flow is also discussed. The performance degradation caused by additional preemption states is minimized by allowing preemption only at predefined points where demanded resources are small. The evaluation result using a model based on NEC Electronics' DRP-1 shows that the proposed method can produce preemption points satisfying a given preemption latency with reasonable hardware overhead (from 6% to 15%).
A Mapping Method for Multi-Process Execution on Dynamically Reconfigurable Processors
Vu MANH TUAN Hideharu AMANO

PAPER-Computer Systems

Vol:
E91-D No:9
Page(s):
2312-2322
The multi-process execution in dynamically reconfigurable processors is a technique to enhance throughput by trying to exploit more inherent parallelism of applications. Basically, a total process for an application is divided into small processes, assigned into limited areas of a reconfigurable array, and concurrently executed in a pipelined manner. In order to improve the efficiency of the multi-process execution, a systematic method for mapping processes onto a reconfigurable array consisting of multiple hardware execution units is essential. This paper proposes and investigates a systematic method for mapping an application modeled as a Kahn Process Network onto a dynamically reconfigurable processing array. In order to execute streaming applications in a pipelined manner, the size of Tiles, which is a unit area of dynamically reconfigurable array, and the grouping of processes are adjusted. Using real applications such as DCT, JPEG encoder and Turbo encoder, the impact of different versions mapped onto the NEC Dynamically Reconfigurable Processor on performance is evaluated. Evaluation results show that our proposed mapping algorithm achieves the best performance in terms of the throughput and the execution time.
A Reconfigurable Processor Infrastructure for Accelerating Java Applications
Youngsun HAN Seok Joong HWANG Seon Wook KIM

PAPER-VLSI Design Technology and CAD

Vol:
E91-A No:8
Page(s):
2091-2100
In this paper, we present a reconfigurable processor infrastructure to accelerate Java applications, called Jaguar. The Jaguar infrastructure consists of a compiler framework and a runtime environment support. The compiler framework selects a group of Java methods to be translated into hardware for delivering the best performance under limited resources, and translates the selected Java methods into Verilog synthesizable code modules. The runtime environment support includes the Java virtual machine (JVM) running on a host processor to provide Java execution environment to the generated Java accelerator through communication interface units while preserving Java semantics. Our compiler infrastructure is a tightly integrated and solid compiler-aided solution for Java reconfigurable computing. There is no limitation in generating synthesizable Verilog modules from any Java application while preserving Java semantics. In terms of performance, our infrastructure achieves the speedup by 5.4 times on average and by up to 9.4 times in measured benchmarks with respect to JVM-only execution. Furthermore, two optimization schemes such as an instruction folding and a live buffer removal can reduce 24% on average and up to 39% of the resource consumption.
A Self-Test of Dynamically Reconfigurable Processors with Test Frames
Tomoo INOUE Takashi FUJII Hideyuki ICHIHARA

PAPER-High-Level Testing

Vol:
E91-D No:3
Page(s):
756-762
This paper proposes a self-test method of coarse grain dynamically reconfigurable processors (DRPs) without hardware overhead. In the method, processor elements (PEs) compose a test frame, which consists of test pattern generators (TPGs), processor elements under test (PEUTs) and response analyzers (RAs), while testing themselves one another by changing test frames appropriately. We design several test frames with different structures, and discuss the relationship of the structures to the numbers of contexts and test frames for testing all the functions of PEs. A case study shows that there exists an optimal test frame which minimizes the test application time under a constraint.
Compiler for Architecture with Dynamic Reconfigurable Processing Unit by Use of Automatic Assignment Method of Sub-Programs Based on Their Quantitative Evaluation
Takefumi MIYOSHI Nobuhiko SUGINO

PAPER-Reconfigurable Device and Design Tools

Vol:
E90-D No:12
Page(s):
1967-1976
For a coarse grain dynamic reconfigurable processing unit cooperating with a general purpose processor, a context selection method, which can reduce total execution cycles of a given program, is proposed. The method evaluates context candidates from a given program, in terms of reduction in cycles by exploiting parallel and pipeline execution of the reconfigurable processor. According to this evaluation measure, the method selects appropriate contexts for the dynamic reconfigurable processing unit. The proposed method is implemented on the framework of COINS project. For several example programs, the generated codes are evaluated by a software simulator in terms of execution cycles, and these results prove the effectiveness of the proposed method.
Query-Transaction Acceleration Using a DRP Enabling High-Speed Stateful Packet-by-Packet Self-Reconfiguration
Takashi ISOBE

PAPER-Reconfigurable System and Applications

Vol:
E90-D No:12
Page(s):
1905-1913
Ubiquitous computing and the upcoming broadcast-and-communication convergence require networks that provide very complex services. In particular, networks are needed that can service several users or terminals at various times or places with various application-layer functions that can be changed at a high response speed by adding high-speed processing at the network edge. I present a query-transaction acceleration appliance that uses a dynamic reconfigurable processor (DRP) and enables high-speed stateful packet-by-packet self-reconfiguration to achieve that requirement. This appliance processes at high speeds, has flexible application layer functions that are changeable with a high-speed response, and uses direct packet I/O bypassing memory, hierarchical interconnection of processors, and stateful packet-by-packet self-reconfiguration. In addition, the DRP enables the fabrication of a compact and electric-power-saving appliance. I made a prototype and implemented several transport/application layer functions, such as TCP connection control, auto-caching of server files, uploading cache data for server, and selection/insertion/deletion/update of data for a database. In an experimental evaluation in which four kinds of query-transactions were continually executed in order, I found that the appliance achieved four functions changeable at a high response speed (within 1 ms), and a processing speed (2,273 transactions/sec.) 18 times faster than a PC with a 2-GHz processor.
Improving Performance and Energy Saving in a Reconfigurable Processor via Accelerating Control Data Flow Graphs
Farhad MEHDIPOUR Hamid NOORI Morteza SAHEB ZAMANI Koji INOUE Kazuaki MURAKAMI

PAPER-Reconfigurable Device and Design Tools

Vol:
E90-D No:12
Page(s):
1956-1966
Extracting frequently executed (hot) portions of the application and executing their corresponding data flow graph (DFG) on the hardware accelerator brings about more speedup and energy saving for embedded systems comprising a base processor integrated with a tightly coupled accelerator. Extending DFGs to support control instructions and using Control DFGs (CDFGs) instead of DFGs results in more coverage of application code portion are being accelerated hence, more speedup and energy saving. In this paper, motivations for extending DFGs to CDFGs and handling control instructions are introduced. In addition, basic requirements for an accelerator with conditional execution support are proposed. Then, two algorithms are presented for temporal partitioning of CDFGs considering the target accelerator architectural constraints. To demonstrate effectiveness of the proposed ideas, they are applied to the accelerator of a reconfigurable processor called AMBER. Experimental results approve the remarkable effectiveness of covering control instructions and using CDFGs versus DFGs in the aspects of performance and energy reduction.
A Survey on Dynamically Reconfigurable Processors Open Access
Hideharu AMANO

INVITED PAPER

Vol:
E89-B No:12
Page(s):
3179-3187
Dynamically reconfigurable processors are consisting of an array of processing elements whose functions and interconnections can be dynamically changed. 9 commercial systems are picked up, and their array structures, processing elements and interconnection architectures are classified.
Low-Power Field-Programmable VLSI Using Multiple Supply Voltages
Weisheng CHONG Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Low Power Methodology

Vol:
E88-A No:12
Page(s):
3298-3305
A low-power field-programmable VLSI (FPVLSI) is presented to overcome the problem of large power consumption in field-programmable gate arrays (FPGAs). To reduce power consumption in routing networks, the FPVLSI consists of cells that are based on a bit-serial pipeline architecture which reduces routing block complexity. Moreover, a level-converter-less multiple-supply-voltage scheme using dynamic circuits is proposed, where the cells in non-critical paths use a low supply voltage for low power under a speed constraint. The FPVLSI is evaluated based on a 0.18-µm CMOS design rule. The power consumption of the FPVLSI using multiple supply voltages is reduced to 17% or less compared to that of the static-circuit-based FPVLSI using multiple supply voltages.
Design and Performance Evaluation of IEEE 802.11a SDR Software Implemented on a Reconfigurable Processor
Kazunori AKABANE Hiroyuki SHIBA Munehiro MATSUI Kiyoshi KOBAYASHI Katsuhiko ARAKI

PAPER

Vol:
E88-B No:11
Page(s):
4163-4169
Software defined radio (SDR) mobile terminals that can access multiple wireless communication systems are the trend of the future. An SDR wideband mobile terminal must be capable of high-speed data processing and low power consumption. We focused on reconfigurable processors with these features. To evaluate the performance of reconfigurable processors for SDR wideband mobile terminals, we developed and evaluated software that runs on a reconfigurable processor for the IEEE 802.11a wireless local area network (LAN) baseband part, which requires high-speed data processing. This paper describes the configuration of the SDR IEEE 802.11a software for the reconfigurable processor and its performance evaluation results. Moreover, we showed the requirements for applying the reconfigurable processor to SDR wideband mobile terminals, and confirmed that the reconfigurable processor could be applied to SDR mobile terminals by slight progresses.
Architecture of a Fine-Grain Field-Programmable VLSI Based on Multiple-Valued Source-Coupled Logic
Md.Munirul HAQUE Michitaka KAMEYAMA

PAPER

Vol:
E87-C No:11
Page(s):
1869-1875
A novel Multiple-Valued Field-Programmable VLSI (MV-FPVLSI) architecture using the Multiple-Valued Source-Coupled Logic (MVSCL) is proposed to implement special-purpose processors. An MV-FPVLSI consists of identical cells, which are connected to 8-neighborhood ones. To reduce the complexity of the interconnection block between two cells in an MV-FPVLSI, a bit-serial fine-grain pipeline architecture is introduced which allows single-wire data transmission and as a result, the data-transmission delay becomes very small in comparison with that of a conventional FPGA. To reduce the number of switches in the interconnection block further, a cell, using multiple-valued source-coupled logic circuits, is proposed, where the input currents can be linearly summed just by wiring without using any active devices. Not only the data, but also the control signal can be superposed by linear summation. As a result, no input switch is required which contributes to smaller data transmission delay. Moreover, an arbitrary 2-input logic function can be generated by linear summation of the input currents and threshold operations using these reconfigurable MVSCL circuits. As the MVSCL circuit has high driving capability in comparison with that of an equivalent CMOS circuit, high-speed logic operation is also possible while maintaining low power.
Dynamically Reconfigurable Processor Implemented with IPFlex's DAPDNA Technology
Takayuki SUGAWARA Keisuke IDE Tomoyoshi SATO

INVITED PAPER

Vol:
E87-D No:8
Page(s):
1997-2003
The DAPDNA®-2 is the world's first general purpose dynamically reconfigurable processor for commercial usage. It is a dual-core processor consisting of a custom RISC core called the Digital Application Processor (DAP), and a two dimensional array of dynamically reconfigurable processing elements referred to as the Distributed Network Architecture (DNA). The DAP has a 32 bit instruction set architecture with an 8 KB instruction cache and 8 KB data cache that can be accessed in one clock cycle. It has an interrupt control function to detect data processing completion in the DNA-Matrix. The DNA-Matrix has different types of data processing elements such as ALU, delay, and memory elements to process fully parallel computations. The DNA-Matrix includes 32 independent 16 KB high speed SRAM elements (in total 512 KB). The DNA-Matrix, even with its parallel computational capability, can be synchronized and co-work at the same clock frequency as the DAP. The processor operates at a 166 MHz working frequency and fabricated with a 0.11 µm CMOS process. The DAPDNA-2 device can be connected directly with up to 16 units with linear scalability in processing performance, provided the bandwidth requirement is within the maximum communication speed between DNAs, which is 32 Gbps. The DAPDNA-2 performs at a level that is two orders of magnitude higher than conventional high performance processors.

Keyword Search Result

[Keyword] reconfigurable processor(18hit)

Networking Experiment of Domain-Specific Networking Platform Based on Optically Interconnected Reconfigurable Communication Processors Open Access

Acceleration of Block Matching on a Low-Power Heterogeneous Multi-Core Processor Based on DTU Data-Transfer with Data Re-Allocation

Software Defined Modem for Cognitive Radio with Dynamically Reconfigurable Processor

Memory-Access-Driven Context Partitioning for Window-Based Image Processing on Heterogeneous Multicore Processors

Iterative Synthesis Methods Estimating Programmable-Wire Congestion in a Dynamically Reconfigurable Processor

Resource Minimization Method Satisfying Delay Constraint for Replicating Large Contents

A Preemption Algorithm for a Multitasking Environment on Dynamically Reconfigurable Processors

A Mapping Method for Multi-Process Execution on Dynamically Reconfigurable Processors

A Reconfigurable Processor Infrastructure for Accelerating Java Applications

A Self-Test of Dynamically Reconfigurable Processors with Test Frames

Compiler for Architecture with Dynamic Reconfigurable Processing Unit by Use of Automatic Assignment Method of Sub-Programs Based on Their Quantitative Evaluation

Query-Transaction Acceleration Using a DRP Enabling High-Speed Stateful Packet-by-Packet Self-Reconfiguration

Improving Performance and Energy Saving in a Reconfigurable Processor via Accelerating Control Data Flow Graphs

A Survey on Dynamically Reconfigurable Processors Open Access

Low-Power Field-Programmable VLSI Using Multiple Supply Voltages

Design and Performance Evaluation of IEEE 802.11a SDR Software Implemented on a Reconfigurable Processor

Architecture of a Fine-Grain Field-Programmable VLSI Based on Multiple-Valued Source-Coupled Logic

Dynamically Reconfigurable Processor Implemented with IPFlex's DAPDNA Technology

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles