The shared last level cache (SLLC) in tiled chip multiprocessors (TCMPs) provides a low off-chip miss rate, but it incurs a long on-chip access latency. In a two-level cache hierarchy, data replication stores replicas of L1 victims in the local LLC (L2 cache) so that subsequent accesses to them hit in the local LLC with short latency. Many data replication mechanisms have been proposed, but they do not consider both the reuse behavior of L1 victims and the capability of the LLC to receive replicas. As a result, they either produce many useless replicas or increase LLC pressure, which limits the improvement of system performance. In this paper, we propose a two-level cache aware adaptive data replication mechanism (TCDR), which controls replication based on both prediction of L1 victim reuse behavior and monitoring of the LLC replica reception capability. TCDR not only increases the accuracy of L1 replica selection, but also avoids the pressure that replication puts on the LLC. The results show that TCDR improves system performance with reasonable hardware overhead.
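For intuition only, the following sketch shows one way such a replication decision could be structured: a small reuse predictor votes on whether an L1 victim is likely to be re-referenced, and a pressure counter tracks how well the local LLC is absorbing replicas. The table size, counter policy, and threshold are assumptions for illustration, not the structures actually used by TCDR.

/* Hypothetical sketch of an adaptive replication decision in the spirit of
 * the mechanism above: replicate an L1 victim into the local LLC only when
 * (a) a simple reuse predictor expects the victim to be re-referenced and
 * (b) a pressure monitor indicates the LLC can still absorb replicas. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRED_ENTRIES 1024          /* reuse predictor: 2-bit counters      */
#define PRESSURE_LIMIT 64          /* assumed pressure threshold           */

static uint8_t reuse_counter[PRED_ENTRIES];
static unsigned llc_pressure;      /* e.g., useless replicas per epoch     */

static unsigned pred_index(uint64_t block_addr)
{
    return (unsigned)(block_addr % PRED_ENTRIES);
}

/* Called when an L1 victim is about to be dropped or written back. */
bool should_replicate(uint64_t block_addr)
{
    bool reuse_likely = reuse_counter[pred_index(block_addr)] >= 2;
    bool llc_has_room = llc_pressure < PRESSURE_LIMIT;
    return reuse_likely && llc_has_room;
}

/* Training hooks: strengthen on a replica hit, weaken on a useless replica. */
void on_replica_hit(uint64_t block_addr)
{
    uint8_t *c = &reuse_counter[pred_index(block_addr)];
    if (*c < 3) (*c)++;
    if (llc_pressure > 0) llc_pressure--;
}

void on_replica_evicted_unused(uint64_t block_addr)
{
    uint8_t *c = &reuse_counter[pred_index(block_addr)];
    if (*c > 0) (*c)--;
    llc_pressure++;
}

int main(void)
{
    on_replica_hit(0x40);                       /* pretend block 0x40 was reused twice */
    on_replica_hit(0x40);
    printf("replicate 0x40? %d\n", should_replicate(0x40));
    printf("replicate 0x80? %d\n", should_replicate(0x80));
    return 0;
}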
Jiaheng LIU Ryusuke EGAWA Hiroyuki TAKIZAWA
As the number of cores on a processor increases, cache hierarchies contain more cache levels and a larger last level cache (LLC). Thus, the power and energy consumption of the cache hierarchy becomes non-negligible. Meanwhile, because the cache usage behaviors of individual applications can be different, it is possible to achieve higher energy efficiency of the computing system by determining the appropriate cache configuration for each application. This paper proposes a cache control mechanism that improves energy efficiency by adjusting the cache hierarchy to each application. Our mechanism first bypasses and disables a less-significant cache level, then partially disables the LLC, and finally adjusts the associativity if the application suffers from a large number of conflict misses. The mechanism can achieve significant energy savings at the cost of a small performance degradation. The evaluation results show that our mechanism improves energy efficiency by 23.9% and 7.0% on average over the baseline and the cache-level bypassing mechanism, respectively. In addition, even if LLC resource contention occurs, the proposed mechanism is still effective in improving energy efficiency.
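As a rough illustration of the staged policy described above, the sketch below chooses a configuration from simple per-application statistics: bypass a cache level whose hit rate is negligible, shrink the LLC when the working set is small, and boost associativity when conflict misses dominate. All metrics and thresholds here are invented for illustration and are not the ones used in the paper.

/* Illustrative only: a staged cache reconfiguration decision driven by
 * hypothetical per-application counters (not the paper's actual policy). */
#include <stdio.h>

struct cache_stats {
    double l2_hit_rate;        /* hit rate of the intermediate level       */
    double llc_miss_rate;      /* overall LLC miss rate                    */
    double conflict_fraction;  /* fraction of LLC misses due to conflicts  */
};

struct cache_config {
    int bypass_l2;             /* 1 = bypass and power-gate the L2         */
    int llc_active_ways;       /* number of LLC ways left enabled          */
    int llc_assoc_boost;       /* 1 = trade capacity for associativity     */
};

static struct cache_config decide(const struct cache_stats *s, int llc_ways)
{
    struct cache_config c = { 0, llc_ways, 0 };

    if (s->l2_hit_rate < 0.05)           /* stage 1: L2 contributes little    */
        c.bypass_l2 = 1;
    if (s->llc_miss_rate < 0.02)         /* stage 2: working set fits easily  */
        c.llc_active_ways = llc_ways / 2;
    if (s->conflict_fraction > 0.5)      /* stage 3: conflict misses dominate */
        c.llc_assoc_boost = 1;

    return c;
}

int main(void)
{
    struct cache_stats s = { 0.03, 0.01, 0.6 };
    struct cache_config c = decide(&s, 16);
    printf("bypass L2: %d, active LLC ways: %d, assoc boost: %d\n",
           c.bypass_l2, c.llc_active_ways, c.llc_assoc_boost);
    return 0;
}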
Namyong JUNG Hyeongboo BAEK Donghyouk LIM Jinkyu LEE
As real-time embedded systems are required to accommodate various tasks with different levels of criticality, scheduling algorithms for MC (Mixed-Criticality) systems have been widely studied in the real-time systems community. Most studies have focused on MC uniprocessor systems, whereas there have been only a few studies supporting MC multiprocessor systems. In particular, although the ZL (Zero-Laxity) policy has been known to be an effective technique for improving the schedulability performance of base scheduling algorithms on SC (Single-Criticality) multiprocessor systems, the effectiveness of the ZL policy on MC multiprocessor systems has not been revealed to date. In this paper, we make the first attempt to realize the potential of the ZL policy for MC multiprocessor systems. To this end, we design the ZL policy for MC multiprocessor systems and apply it to EDF (Earliest Deadline First), yielding EDZL (Earliest Deadline first until Zero-Laxity) tailored for MC multiprocessor systems. Then, we develop a schedulability analysis for EDZL (as well as its base algorithm EDF) to support its timing guarantees. Our simulation results show a significant schedulability improvement of EDZL over EDF, demonstrating the effectiveness of the ZL policy for MC multiprocessor systems.
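To make the zero-laxity rule concrete, the sketch below shows the basic priority test that EDZL layers on top of EDF in the single-criticality setting: a job whose laxity (time to deadline minus remaining execution) has dropped to zero is promoted above all others, and jobs are otherwise ordered by EDF. The job fields and the two-task example are illustrative assumptions; the MC-specific design developed in the paper adds criticality-level handling on top of this.

/* Minimal sketch of the zero-laxity rule underlying EDZL (illustrative,
 * single-criticality version without the paper's MC extensions). */
#include <stdio.h>

struct job {
    const char *name;
    int deadline;   /* absolute deadline        */
    int remaining;  /* remaining execution time */
};

static int laxity(const struct job *j, int now)
{
    return j->deadline - now - j->remaining;
}

/* Nonzero if job a should run before job b at time 'now'. */
static int runs_before(const struct job *a, const struct job *b, int now)
{
    int za = laxity(a, now) <= 0;       /* a has zero (or negative) laxity */
    int zb = laxity(b, now) <= 0;
    if (za != zb)
        return za;                      /* zero-laxity jobs are promoted   */
    return a->deadline < b->deadline;   /* otherwise plain EDF             */
}

int main(void)
{
    /* EDF alone would pick A (earlier deadline), but B has zero laxity. */
    struct job a = { "A", 6, 2 }, b = { "B", 9, 8 };
    int now = 1;
    printf("at t=%d, %s runs first\n", now,
           runs_before(&a, &b, now) ? a.name : b.name);
    return 0;
}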
Hyeongboo BAEK Donghyouk LIM Jinkyu LEE
RTA (Response Time Analysis) is a popular technique to guarantee the timing requirements of a real-time system, and therefore the RTA framework has been widely studied for popular scheduling algorithms such as EDF (Earliest Deadline First) and FP (Fixed Priority). While a number of extended RTA techniques have been introduced, some of them cannot be safely used because their correctness has not been proved and their empirical performance has not been evaluated. In this letter, we address the state-of-the-art slack reclamation technique in the existing generic RTA framework for multiprocessors. We present a mathematical proof of its correctness and an empirical performance evaluation, neither of which has been presented to date.
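For context, a common form of the generic multiprocessor RTA computes an upper bound R_k on the response time of task \tau_k on m processors as a fixed point of the iteration

    R_k^{(l+1)} = C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\!\left( W_i\!\left(R_k^{(l)}\right),\; R_k^{(l)} - C_k + 1 \right) \right\rfloor ,

starting from R_k^{(0)} = C_k and declaring \tau_k schedulable if the iteration converges with R_k \le D_k, where C_k and D_k are the worst-case execution time and relative deadline of \tau_k and W_i(\cdot) upper-bounds the workload of task \tau_i in a window of the given length. Slack reclamation tightens W_i by exploiting the slack D_i - R_i of tasks whose response-time bounds are already known. This formulation is given only as background on the generic framework; it is not necessarily the exact analysis treated in the letter.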
Hao XIAO Ning WU Fen GE Guanyu ZHU Lei ZHOU
This paper presents a synchronization mechanism that effectively implements lock and barrier protocols in a decentralized manner through explicit message passing. In the proposed solution, a simple and efficient synchronization control mechanism supports queued synchronization without contention. By using state-of-the-art Application-Specific Instruction-set Processor (ASIP) technology, we embed the synchronization functionality into a baseline processor, so that the proposed mechanism incurs ultra-low overhead. Experimental results show that the proposed synchronization achieves ultra-low latency and almost ideal scalability as the number of processors increases.
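As a software analogue of queued, message-passing synchronization, the sketch below models a lock whose home node keeps a FIFO of requesters, so that waiters receive a grant instead of repeatedly contending on shared state. The message handlers and data layout are illustrative assumptions; in the paper the corresponding functionality is embedded in the processor as ASIP instructions.

/* Software model of a queued, message-passing lock (illustrative only). */
#include <stdio.h>

#define NPROC 4

static int queue[NPROC], head, tail, count;  /* FIFO kept at the home node */
static int granted[NPROC];                   /* per-processor grant flag    */

/* "ACQUIRE" message from processor p arrives at the home node. */
static void msg_acquire(int p)
{
    queue[tail] = p;
    tail = (tail + 1) % NPROC;
    if (count++ == 0)                 /* lock was free: grant immediately */
        granted[p] = 1;
}

/* "RELEASE" message from the current holder arrives at the home node. */
static void msg_release(void)
{
    head = (head + 1) % NPROC;
    if (--count > 0)                  /* hand the lock to the next waiter */
        granted[queue[head]] = 1;
}

int main(void)
{
    msg_acquire(2);                   /* P2 gets the lock immediately */
    msg_acquire(0);                   /* P0 queues behind P2          */
    msg_release();                    /* P2 releases; P0 is granted   */
    printf("P0 granted: %d, P1 granted: %d\n", granted[0], granted[1]);
    return 0;
}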
Masayuki SATO Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI
Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads on integrated multiple cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance improvement of multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In the shared caches of CMPs, data fetched by one thread are frequently evicted by another thread. Such an eviction, called an inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage, which occurs when one cache is shared by threads demanding large cache capacities. If the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete for the limited cache capacity, resulting in capacity shortage. To address inter-thread cache conflicts, we must take into account both ITKOs and capacity shortage. Therefore, this paper proposes a capacity-aware thread scheduling method combined with cache partitioning. In the proposed method, inter-thread cache conflicts due to ITKOs and capacity shortage are decreased by cache partitioning and thread scheduling, respectively. The proposed scheduling method estimates the capacity demand of each thread with the estimation method used in the cache partitioning mechanism. Based on this estimation, the thread scheduler decides which threads share each cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1%, and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable for avoiding ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce inter-thread cache conflicts and hence improve performance.
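As a toy illustration of capacity-aware thread combination, the sketch below pairs the most demanding thread with the least demanding one so that, where possible, the combined demand of the threads sharing a cache stays within its capacity. The greedy pairing rule and the numbers are assumptions for illustration, not the scheduler actually evaluated in the paper.

/* Illustrative capacity-aware pairing of threads onto shared caches. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_desc(const void *a, const void *b)
{
    return *(const int *)b - *(const int *)a;
}

int main(void)
{
    int demand[] = { 768, 512, 256, 128 };   /* estimated demand in KB */
    int nthreads = 4, cache_kb = 1024;

    qsort(demand, nthreads, sizeof demand[0], cmp_desc);

    /* Pair the i-th largest demand with the i-th smallest. */
    for (int i = 0; i < nthreads / 2; i++) {
        int total = demand[i] + demand[nthreads - 1 - i];
        printf("cache %d: demands %d+%d KB -> %s\n", i,
               demand[i], demand[nthreads - 1 - i],
               total <= cache_kb ? "fits" : "capacity shortage");
    }
    return 0;
}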
Hao XIAO Tsuyoshi ISSHIKI Arif Ullah KHAN Dongju LI Hiroaki KUNIEDA Yuko NAKASE Sadahiro KIMURA
Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises many facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large computational load, which makes it difficult for the traditional uniprocessor-based implementation approach to provide the required performance, while the constrained cost and power budget makes commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which tackles at once the aspects of system design, software migration, and hardware architecture, is presented for the implementation of the UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and a shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving while taking 15% less area than the uniprocessor implementation.
Hideki MIWA Ryutaro SUSUKITA Hidetomo SHIBAMURA Tomoya HIRAO Jun MAKI Makoto YOSHIDA Takayuki KANDO Yuichiro AJIMA Ikuo MIYOSHI Toshiyuki SHIMIZU Yuji OINAGA Hisashige ANDO Yuichi INADOMI Koji INOUE Mutsumi AOYAGI Kazuaki MURAKAMI
In the near future, the interconnection networks of massively parallel computer systems will connect more than a hundred thousand computing nodes. Performance evaluation of such interconnection networks can provide real insights that help the development of efficient communication libraries. Hence, to evaluate the performance of such interconnection networks, simulation tools that can model the networks in sufficient detail, support a user-friendly interface for describing communication patterns, provide users with enough performance information, and complete simulations within a reasonable time are a real necessity. This paper introduces a novel interconnection network simulator, NSIM, for evaluating the performance of extreme-scale interconnection networks. The simulator implements a simplified simulation model so as to run faster without loss of accuracy. Unlike existing simulators, NSIM is built on the execution-driven simulation approach. The simulator also provides an MPI-compatible programming interface. Thus, the simulator can emulate parallel program execution and correctly simulate point-to-point and collective communications that are dynamically changed by network congestion. The experimental results in this paper show the sufficient accuracy of the simulator through comparison with a real machine. We also confirmed that the simulator is capable of evaluating ultra-large-scale interconnection networks, consumes less memory, and runs faster than an existing simulator. This paper also introduces a simulation service built on a cloud environment. Without installing NSIM, users can simulate interconnection networks with various configurations by using a web browser.
Utilizing a heterogeneous multiprocessor system has become a popular design paradigm to build an embedded system at a low cost. However, reliability, namely vulnerability to soft errors, has not been taken into account in the conventional IC (integrated circuit) design flow, whereas chip area, performance, and power consumption have been. This paper presents a system design paradigm in which a heterogeneous multiprocessor system is synthesized and its chip area is minimized under real-time and reliability constraints. First, we define an SEU vulnerability factor as a vulnerability measure for computer systems so that task-wise reliability can be evaluated over various processor structures. Next, we build a mixed integer linear programming (MILP) model for minimizing the chip area of a heterogeneous multiprocessor system under real-time and SEU vulnerability constraints. Finally, we show several experimental results on our synthesis approach. The experimental results show that our design paradigm achieves automatic generation of cost-competitive and reliable heterogeneous multiprocessor systems.
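To indicate the general shape of such a formulation, a heavily simplified area-minimization MILP (assuming independent tasks and a single common deadline, which is an illustrative simplification of the paper's more detailed model) can be written with binary variables x_p (processor p is instantiated) and y_{i,p} (task i is assigned to processor p):

    \min \sum_{p} a_p\, x_p
    \text{subject to}\quad \sum_{p} y_{i,p} = 1 \;\;\forall i,
    \qquad y_{i,p} \le x_p \;\;\forall i,p,
    \qquad \sum_{i} c_{i,p}\, y_{i,p} \le D \;\;\forall p,
    \qquad \sum_{i}\sum_{p} v_{i,p}\, y_{i,p} \le V_{\max},

where a_p is the area of processor p, c_{i,p} and v_{i,p} are the execution time and SEU vulnerability factor of task i on processor p, D is the common deadline, and V_{\max} is the vulnerability budget.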
In this paper, we consider the problem of assigning n independent tasks onto m identical processors in such a way that the overall execution time of the tasks is minimized. Unlike the conventional task assignment problem, we assume that the execution time of each task is not fixed in advance, and only upper and lower bounds on the execution time are given at compile time. In the following, we first provide a theoretical analysis of several conventional scheduling policies in terms of their worst-case slowdown compared with the outcome of an optimal off-line scheduling policy. It is shown that the best known algorithm in the literature achieves a worst-case performance ratio of 1 + 1/f(n), where f(n) = O(n^{2/3}) for any fixed m, which approaches one as n goes to infinity. We then propose a new scheme that achieves a better worst-case ratio of 1 + 1/g(n), where g(n) = Θ(n/log n) for any fixed m, which approaches one more quickly than previous schemes.
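Stated in symbols, one way to formalize the measure used above is the worst-case performance ratio of an assignment policy A,

    \rho(A) = \sup_{I} \frac{T_A(I)}{T_{\mathrm{OPT}}(I)},

where the supremum is over problem instances I (task sets with their execution-time bounds and realized execution times), T_A(I) is the makespan produced by A using only the bounds, and T_{\mathrm{OPT}}(I) is the makespan of an optimal off-line schedule that knows the exact execution times. The results above then say that the best previous scheme guarantees \rho = 1 + 1/f(n) with f(n) = O(n^{2/3}), while the proposed scheme guarantees \rho = 1 + 1/g(n) with g(n) = \Theta(n/\log n), which converges to 1 faster as n grows.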
Utilizing a heterogeneous multiprocessor system has become a popular design paradigm to build an embedded system at a low cost within a short development time. A reliability issue for embedded systems, vulnerability to single event upsets (SEUs), has become a matter of concern as technology advances. This paper discusses the reliability inherent in heterogeneous multiprocessors and proposes task scheduling for minimizing their SEU vulnerability. We experimentally show that increasing the performance of a CPU core deteriorates its reliability. Based on this observation, we propose task scheduling for reducing the SEU vulnerability of a heterogeneous multiprocessor system. The experimental results demonstrate that our task scheduling technique can substantially reduce SEU vulnerability under real-time constraints.
Mohammad ZALFANY URFIANTO Tsuyoshi ISSHIKI Arif ULLAH KHAN Dongju LI Hiroaki KUNIEDA
A simple extension used to assist the decomposition of task-level concurrency within C programs is presented in this paper. The concurrency decomposition is meant to serve as the entry point of the design flow for Multiprocessor System-on-Chip (MPSoC) architectures. Our methodology allows the (re)use of readily available reference C programs and enables easy and rapid exploration of alternative task partitioning strategies, a crucial step that greatly influences the overall quality of the designed MPSoC. A test case using a JPEG encoder application has been performed, and the results are presented in this paper.
Mohammad ZALFANY URFIANTO Tsuyoshi ISSHIKI Arif ULLAH KHAN Dongju LI Hiroaki KUNIEDA
This paper presents a Multiprocessor System-on-Chip (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, efficient inter-communication between processing elements with minimum overhead is implemented. A host interface is designed to integrate the existing RISC core with the multiprocessor array. The experimental results show that an efficacious integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as processing elements in MPSoC architectures designed with our framework.
Hiroaki SHIKANO Jun SHIRAKO Yasutaka WADA Keiji KIMURA Hironori KASAHARA
A power-aware, compiler-controllable chip multiprocessor (CMP) is presented, and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change the clock frequency and power supply voltage of functional units, including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using the architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint based on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at a 21.1-fold speed-up over its sequential execution in the fastest execution mode.
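As general background on why lowering clock frequency and supply voltage together is so effective (this is standard CMOS background, not a formula from the paper), the dynamic power of a circuit scales roughly as

    P_{\mathrm{dyn}} \approx \alpha\, C\, V_{dd}^{2}\, f,

where \alpha is the switching activity, C the switched capacitance, V_{dd} the supply voltage, and f the clock frequency. Since the attainable f falls roughly in proportion to V_{dd}, scaling both together reduces power close to cubically while lengthening execution only about linearly, which is why slack under a real-time deadline can be traded for large power savings.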
Wei SUN Chen YU Xavier DEFAGO Yasushi INOGUCHI
The scheduling of real-time tasks with fault-tolerance requirements has been an important problem in multiprocessor systems. The primary-backup (PB) approach is often used as a fault-tolerant technique to guarantee the deadlines of tasks despite the presence of faults. In this paper we propose a dynamic PB-based task scheduling approach, wherein an allocation parameter is used to search the available time slots for a newly arriving task, and previously scheduled tasks can be re-scheduled when there is no available time slot for the newly arriving task. In order to improve schedulability, we also propose an overloading strategy for PB-overloading and backup-backup (BB) overloading. Our proposed task scheduling algorithm is compared with existing scheduling algorithms in the literature through simulation studies. The results show that the task rejection ratio of our real-time task scheduling algorithm is almost 50% lower than that of the compared algorithms.
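To give a flavor of PB-based allocation, the sketch below schedules the primary copy of an arriving task in the earliest feasible window and then places its backup on a different processor, starting no earlier than the primary's finish time so that the backup only needs to run if the primary fails. The per-processor "free from" model, the absence of overloading, and the numbers are deliberate simplifications of the slot search and PB/BB overloading proposed in the paper.

/* Simplified sketch of primary-backup (PB) slot allocation (illustrative). */
#include <stdio.h>

#define NPROC 3

int main(void)
{
    int free_from[NPROC] = { 4, 0, 2 };  /* earliest idle time per processor */
    int arrival = 1, exec = 3, deadline = 12;

    /* Primary: earliest start among all processors. */
    int p_cpu = 0;
    for (int i = 1; i < NPROC; i++)
        if (free_from[i] < free_from[p_cpu]) p_cpu = i;
    int p_start = free_from[p_cpu] > arrival ? free_from[p_cpu] : arrival;
    int p_end = p_start + exec;

    /* Backup: earliest start >= p_end on a different processor. */
    int b_cpu = -1, b_start = 0;
    for (int i = 0; i < NPROC; i++) {
        if (i == p_cpu) continue;
        int s = free_from[i] > p_end ? free_from[i] : p_end;
        if (b_cpu < 0 || s < b_start) { b_cpu = i; b_start = s; }
    }

    if (b_start + exec <= deadline)
        printf("primary on P%d at t=%d, backup on P%d at t=%d\n",
               p_cpu, p_start, b_cpu, b_start);
    else
        printf("task rejected: backup would miss deadline %d\n", deadline);
    return 0;
}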
Sangchul HAN Heeheon KIM Xuefeng PIAO Minkyu PARK Seongje CHO Yookun CHO
This letter proves the finish time predictability of the EDZL (Earliest Deadline Zero Laxity) scheduling algorithm, a variant of EDF, for multiprocessor real-time systems. Based on this result, it also shows that EDZL can successfully schedule any periodic task set whose total utilization is not greater than (m+1)/2, where m is the number of processors.
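Written out symbolically, the utilization condition above says that a periodic task set \tau = \{\tau_1, \ldots, \tau_n\} with worst-case execution times C_i and periods T_i is schedulable by EDZL on m processors if

    U(\tau) = \sum_{i=1}^{n} \frac{C_i}{T_i} \;\le\; \frac{m+1}{2}.

For example, on m = 4 processors the bound is 2.5, i.e., EDZL is guaranteed to meet all deadlines as long as the total utilization does not exceed 2.5.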
Jinseok KONG Pen-Chung YEW Gyungho LEE
Directory-based cache coherence schemes are commonly used in large-scale shared-memory multiprocessors, but most of them rely on heuristics to avoid large hardware requirements. We propose using physical address mapping on directories to significantly reduce the required directory size. This approach allows the directory size to grow as O(cn log2 n), as in optimal pointer-based directory schemes [11], where n is the number of nodes in the system and c is the number of cache lines in each cache memory. Performance aspects of the proposed scheme are studied in detail using simulation.
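A rough accounting explains the cited growth rate (this is a back-of-the-envelope argument, not the derivation in the paper): with n nodes each holding c cache lines, at most cn cached copies exist in the system at any time, and identifying the node that holds a copy requires a pointer of \lceil \log_2 n \rceil bits, so the total pointer storage needed is on the order of

    c\, n\, \lceil \log_2 n \rceil \text{ bits} = O(cn \log_2 n),

which grows far more slowly with n than full bit-vector directories, whose per-line overhead is proportional to n.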
Qi-Wei GE Chen LI Mitsuru NAKATA
This paper investigates the usefulness of a new priority list for two-processor scheduling problem of program nets. Firstly, we discuss the weakness of a previously proposed priority list and then introduce a new priority list. Through simulation experiment we show that the new priority list is better than the previous one and can generate the same length of schedules as GA scheduling, which implies the new priority list can generate approximately optimal schedules.
Jong Wook KWAK Hyong Jin BAN Chu Shik JHON
In this letter, we propose "Torus Ring," a modified version of the 2-level hierarchical ring. The Torus Ring has the same complexity as the hierarchical ring, since the only difference is the way it connects the local rings. It has an advantage over the hierarchical ring when the destination of a packet is an adjacent local ring, especially in the backward direction. Even when the destination of a network packet is assumed to be uniformly distributed across the processing nodes, the average number of hops in the Torus Ring is equal to that of the hierarchical ring. In practice, however, the performance gain of the Torus Ring is expected to increase due to the spatial locality of application programs in real parallel programming environments. In the simulation results, the latencies of the interconnection network are reduced by up to 19% with moderate ring utilization ratios.
Minkyu PARK Sangchul HAN Heeheon KIM Seongje CHO Yookun CHO
Multiprocessor architectures are becoming common in real-time systems as the workload of real-time systems increases. Recently, new deadline-based (EDF-based) multiprocessor scheduling algorithms have been devised, and comparative studies of their performance are necessary. In this paper, we compare EDZL, a hybrid of EDF and LLF, with other deadline-based scheduling algorithms such as EDF, EDF-US[m/(2m-1)], and fpEDF. We show that EDZL schedules all task sets schedulable by EDF. The experimental results show that the number of preemptions of EDZL is comparable to that of EDF, and that the schedulable utilization bound of EDZL is higher than those of the other algorithms we consider.