IEICE global.ieice.org Site

Keyword Search Result

[Keyword] multi-thread (6 hits)

Implementation of a Multi-Word Compare-and-Swap Operation without Garbage Collection
Kento SUGIURA Yoshiharu ISHIKAWA

PAPER

Publicized:
2022/02/03
Vol:
E105-D No:5
Page(s):
946-954
With the rapid increase in the number of CPU cores, software that can utilize these many cores is required. A lock-free algorithm based on compare-and-swap (CAS) operations is one of the concurrency control methods used to implement such multi-threaded software. A multi-word CAS (MwCAS) operation extends a CAS operation to swap multiple words atomically. However, we noticed that the performance of the existing MwCAS implementation is limited by garbage collection even in a low-contention environment. To achieve high performance in low-contention workloads, we propose a new MwCAS algorithm without garbage collection. Experimental results show that our approach is three to five times faster than an implementation with garbage collection in low-contention workloads. Moreover, the proposed method also performs better in a high-contention environment.
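The all-or-nothing semantics that the abstract attributes to MwCAS can be illustrated with a minimal sketch. This is not the authors' algorithm: Python exposes no hardware CAS instruction, so a single lock stands in for atomicity here, and the names `Word`, `cas`, and `mwcas` are hypothetical. The sketch only shows what a caller observes: either every target word is swapped, or none is.

```python
import threading


class Word:
    """One memory word. Hypothetical stand-in for a machine word."""

    def __init__(self, value):
        self.value = value


# Assumption: this lock only emulates atomicity for illustration.
# A real lock-free MwCAS would be built from hardware CAS instead.
_atomic = threading.Lock()


def cas(word, expected, desired):
    """Single-word compare-and-swap: swap only if the word holds `expected`."""
    with _atomic:
        if word.value != expected:
            return False
        word.value = desired
        return True


def mwcas(targets):
    """Multi-word CAS over a list of (word, expected, desired) triples.

    Either all words are updated, or none is.
    """
    with _atomic:
        # Check every expected value before writing anything.
        if any(word.value != expected for word, expected, _ in targets):
            return False
        for word, _, desired in targets:
            word.value = desired
        return True
```

For example, `mwcas([(a, 1, 10), (b, 2, 20)])` succeeds only when `a` holds 1 and `b` holds 2 at the same moment; if either check fails, both words keep their old values.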
Which Metric Is Suitable for Evaluating Your Multi-Threading Processors? In Terms of Throughput, Fairness, and Predictability
Xin JIN Ningmei YU

LETTER-VLSI Design Technology and CAD

Vol:
E103-A No:9
Page(s):
1127-1132
Simultaneous multithreading (SMT) technology can effectively improve overall throughput and fairness by improving the resource usage efficiency of processors. Previous works have proposed several metrics for evaluation in real systems, each of which strikes a trade-off between fairness and throughput. How to choose an appropriate metric to meet a given demand is still controversial. Therefore, we put forward suggestions on how to select appropriate metrics by analyzing and comparing the characteristics of each metric. In addition, in the new application scenario of cloud computing, data centers have a high demand for quality of service for killer applications, which brings new challenges to SMT in terms of performance guarantees. Therefore, we propose a new metric, P-slowdown, to evaluate the quality of performance guarantees. Based on experimental data, we show the feasibility of P-slowdown for performance evaluation. We also demonstrate the benefit of P-slowdown through two use cases, in which we not only improve the performance guarantee level of SMT processors by combining P-slowdown with a resource allocation strategy, but also use P-slowdown to predict the occurrence of abnormal behavior against security attacks.
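The abstract does not name the existing metrics or define P-slowdown, so the following sketch only illustrates the kind of throughput/fairness trade-off it describes. It assumes three metrics commonly seen in the SMT literature (weighted speedup, harmonic mean of speedups, and a min/max fairness ratio); the function names and the per-thread IPC inputs are hypothetical, and P-slowdown itself is not implemented here.

```python
def speedups(ipc_shared, ipc_alone):
    """Per-thread speedup: IPC when co-running divided by IPC when running alone."""
    return [shared / alone for shared, alone in zip(ipc_shared, ipc_alone)]


def weighted_speedup(ipc_shared, ipc_alone):
    """Throughput-oriented: sum of per-thread speedups."""
    return sum(speedups(ipc_shared, ipc_alone))


def harmonic_speedup(ipc_shared, ipc_alone):
    """Balances throughput and fairness: harmonic mean of per-thread speedups."""
    s = speedups(ipc_shared, ipc_alone)
    return len(s) / sum(1.0 / x for x in s)


def fairness(ipc_shared, ipc_alone):
    """Fairness-oriented: 1.0 when all threads slow down equally, lower otherwise."""
    s = speedups(ipc_shared, ipc_alone)
    return min(s) / max(s)
```

With these definitions, a workload in which one thread is starved can still score well on weighted speedup while scoring poorly on the fairness ratio, which is the tension the abstract refers to.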
Efficient Parallel Join Processing Exploiting SIMD in Multi-Thread Environments
Gilseok HONG Seonghyeon KANG Chang soo KIM Jun-Ki MIN

PAPER-Data Engineering, Web Information Systems

Publicized:
2017/12/14
Vol:
E101-D No:3
Page(s):
659-667
In this paper, we study parallel join processing to improve the performance of the merge phase of sort-merge join by integrating all the parallelism provided by mainstream CPUs. Modern CPUs support SIMD instruction sets with wider SIMD registers, which allow multiple data items to be processed per instruction. Thus, we devise an efficient parallel join algorithm, called Parallel Merge Join with SIMD instructions (PMJS). In our proposed algorithm, we utilize data parallelism by exploiting SIMD instructions. We also improve performance by avoiding conditional branch instructions. Furthermore, to take advantage of multiple cores, our proposed algorithm is parallelized for multi-thread environments. To distribute the workload evenly across threads, we devise an efficient workload balancing algorithm based on a kernel density estimator, which allows the workload of each thread to be estimated accurately.
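The idea of avoiding conditional branches in the merge phase can be sketched as follows. This is not PMJS: Python does not expose SIMD instructions, and the sketch assumes both inputs are sorted and contain unique keys. The function name `merge_intersect` is hypothetical. Instead of an if/else on the comparison, the comparison results are added to the cursors as 0 or 1.

```python
def merge_intersect(left, right):
    """Return keys present in both sorted, duplicate-free lists.

    The loop body has no if/else: comparison results (True/False)
    are used directly as 1/0 increments for the cursors.
    """
    out = [None] * min(len(left), len(right))
    i = j = k = 0
    while i < len(left) and j < len(right):
        x, y = left[i], right[j]
        out[k] = x          # speculative write, kept only when the keys match
        k += (x == y)       # advance the output cursor on a match
        i += (x <= y)       # advance left when it is not ahead
        j += (y <= x)       # advance right when it is not ahead
    return out[:k]
```

In a compiled language the same pattern lets the compiler emit arithmetic on comparison flags rather than jumps, which is the effect the abstract describes; in Python it serves only to show the control-flow shape.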
Proposal of a Multi-Threaded Processor Architecture for Embedded Systems and Its Evaluation
Shinsuke KOBAYASHI Yoshinori TAKEUCHI Akira KITAJIMA Masaharu IMAI

PAPER

Vol:
E84-A No:3
Page(s):
748-754
In this paper, a multi-threaded processor architecture for embedded systems is proposed and evaluated in comparison with other processors for embedded systems. The experimental results show the trade-off between hardware cost and execution time among the processors. By considering the proposed multi-threaded processor as an embedded processor, the design space of embedded systems is enlarged, and a more suitable architecture can be selected under given design constraints.
Multi-Threaded Design for a Software Distributed Shared Memory Systems
Jyh-Chang UENG Ce-Kuen SHIEH Su-Cheong MAC An-Chow LAI Tyng-Yue LIANG

PAPER-Software System

Vol:
E82-D No:12
Page(s):
1512-1523
This paper describes the design and implementation of a multi-threaded Distributed Shared Memory (DSM) system, called Cohesion, which provides high programming flexibility and latency masking, and supports load balancing. Cohesion offers a parallel programming environment that is very similar to that of a multiprocessor system. Threads can be created recursively in this environment, and users are not required to manage the locations of the threads. Instead of supporting a shared variable model, Cohesion provides a global shared address space among all nodes in the system. The space is further divided into three regions, i.e., release, conventional, and object-based memory, each of which uses a different consistency protocol. In this paper, the design issues of an ordinary thread system, such as thread management, load balancing, and synchronization, are reconsidered together with the memory management provided by the DSM system. Several real applications have been used to evaluate the performance of the system. The results show that multi-threading usually performs better than single-threading because network latency can be masked by overlapping communication and computation. However, the gain depends on program behavior and the number of threads executed on each node in the system.
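The latency-masking effect mentioned in the abstract can be illustrated with a small sketch. It is not Cohesion's implementation: the remote page fetch is simulated with `time.sleep`, and `LATENCY`, `fetch_remote_page`, `process`, and `run` are hypothetical names. With several threads, one thread's wait on the simulated network overlaps with the work of the others.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.01  # assumed network delay per remote page, in seconds


def fetch_remote_page(page_id):
    """Simulate fetching a 4-word page from a remote node."""
    time.sleep(LATENCY)  # the thread is blocked here, as if waiting on the network
    return list(range(page_id * 4, page_id * 4 + 4))


def process(page_id):
    """Communication (fetch) followed by computation (sum)."""
    return sum(fetch_remote_page(page_id))


def run(page_ids, num_threads):
    """Process all pages with the given number of threads; return results and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(process, page_ids))
    return results, time.perf_counter() - start
```

Calling `run(range(4), 1)` and `run(range(4), 4)` yields the same results, but the second call typically finishes sooner because the simulated fetches overlap. As the abstract notes, the actual gain depends on program behavior and on the number of threads per node.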
Processor Pipeline Design for Fast Network Message Handling in RWC-1 Multiprocessor
Hiroshi MATSUOKA Kazuaki OKAMOTO Hideo HIRONO Mitsuhisa SATO Takashi YOKOTA Shuichi SAKAI

PAPER

Vol:
E81-C No:9
Page(s):
1391-1397
In this paper, we describe the pipeline design and enhanced hardware for fast message handling in a RICA-1 processor, a processing element (PE) in the RWC-1 multiprocessor. The RWC-1 is based on the reduced inter-processor communication architecture (RICA), in which communication is combined with computation in the processor pipeline. The pipeline is enhanced with hardware mechanisms to support fine-grain parallel execution. The data paths of the RICA-1 super-scalar processor are shared between communication and instruction execution to minimize implementation cost. A 128-PE system was built in January 1998 and is currently used for hardware debugging, software development, and performance evaluation.