The search functionality is under construction.

Keyword Search Result

[Keyword] multi-thread(6hit)

1-6hit
  • Implementation of a Multi-Word Compare-and-Swap Operation without Garbage Collection

    Kento SUGIURA  Yoshiharu ISHIKAWA  

     
    PAPER

      Publicized:
    2022/02/03
      Vol:
    E105-D No:5
      Page(s):
    946-954

    With the rapid increase in the number of CPU cores, software that can utilize many cores is required. A lock-free algorithm based on compare-and-swap (CAS) operations is one of the concurrency control methods for implementing such multi-threaded software. A multi-word CAS (MwCAS) operation is an extension of a CAS operation that swaps multiple words atomically. However, we noticed that the performance of the existing MwCAS implementation is limited by garbage collection, even in a low-contention environment. To achieve high performance in low-contention workloads, we propose a new MwCAS algorithm without garbage collection. Experimental results show that our approach is three to five times faster than an implementation with garbage collection in low-contention workloads. Moreover, the performance of the proposed method is also superior in a high-contention environment.

  • Which Metric Is Suitable for Evaluating Your Multi-Threading Processors? In Terms of Throughput, Fairness, and Predictability

    Xin JIN  Ningmei YU  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E103-A No:9
      Page(s):
    1127-1132

    Simultaneous multithreading (SMT) technology can effectively improve overall throughput and fairness by improving the resource usage efficiency of processors. Previous work has proposed several metrics for evaluation in real systems, each of which strikes a trade-off between fairness and throughput. How to choose an appropriate metric for a given demand is still controversial. Therefore, we offer suggestions on how to select appropriate metrics by analyzing and comparing the characteristics of each metric. In addition, in the new application scenario of cloud computing, data centers place high demands on the quality of service for killer applications, which brings new challenges to SMT in terms of performance guarantees. Therefore, we propose a new metric, P-slowdown, to evaluate the quality of performance guarantees. Based on experimental data, we show the feasibility of P-slowdown for performance evaluation. We also demonstrate the benefit of P-slowdown through two use cases, in which we not only improve the performance guarantee level of SMT processors by combining P-slowdown with a resource allocation strategy, but also use P-slowdown to predict the occurrence of abnormal behavior against security attacks.
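
    For orientation, two metrics that are widely used when evaluating SMT processors are weighted speedup (throughput-oriented) and the harmonic mean of per-thread speedups (which also penalizes unfairness). The abstract does not name the metrics it compares and does not define P-slowdown, so the sketch below is only an illustration under that assumption, not the authors' method. The function names and the IPC values are hypothetical.

```python
def speedups(ipc_shared, ipc_alone):
    """Per-thread speedup: IPC while co-running divided by IPC when running alone."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("both lists must have one entry per thread")
    return [shared / alone for shared, alone in zip(ipc_shared, ipc_alone)]


def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-thread speedups; rewards total throughput."""
    return sum(speedups(ipc_shared, ipc_alone))


def harmonic_speedup(ipc_shared, ipc_alone):
    """Harmonic mean of per-thread speedups; drops sharply if one thread starves.

    Assumes every thread makes some progress (no speedup of exactly zero).
    """
    per_thread = speedups(ipc_shared, ipc_alone)
    return len(per_thread) / sum(1.0 / s for s in per_thread)


# Hypothetical measurements for two threads, each reaching IPC 2.0 when alone.
ipc_alone = [2.0, 2.0]
balanced = [1.0, 1.0]  # both threads slowed down equally
skewed = [1.6, 0.4]    # one thread favored, the other nearly starved
```

    With these hypothetical numbers, `weighted_speedup` gives the same value for `balanced` and `skewed`, while `harmonic_speedup` is lower for `skewed`. This is the kind of throughput versus fairness trade-off that the choice of metric has to resolve.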

  • Efficient Parallel Join Processing Exploiting SIMD in Multi-Thread Environments

    Gilseok HONG  Seonghyeon KANG  Chang soo KIM  Jun-Ki MIN  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2017/12/14
      Vol:
    E101-D No:3
      Page(s):
    659-667

    In this paper, we study parallel join processing to improve the performance of the merge phase of sort-merge join by integrating all of the parallelism provided by mainstream CPUs. Modern CPUs support SIMD instruction sets with wider SIMD registers, which allow multiple data items to be processed per instruction. Thus, we devise an efficient parallel join algorithm, called Parallel Merge Join with SIMD instructions (PMJS). In our proposed algorithm, we utilize data parallelism by exploiting SIMD instructions. We also improve performance by avoiding conditional branch instructions. Furthermore, to take advantage of multiple cores, our proposed algorithm is parallelized for multi-thread environments. In our multi-thread algorithm, to distribute the workload evenly among threads, we devise an efficient workload balancing algorithm based on a kernel density estimator, which allows the workload of each thread to be estimated accurately.
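
    The idea of avoiding conditional branches in the merge phase can be illustrated with a scalar sketch. It is not the PMJS algorithm itself, which the abstract does not spell out and which relies on SIMD instructions. It only shows how the cursor advances can be computed from comparison results instead of an if/else chain. The function name `merge_join_count` is hypothetical, and both inputs are assumed to be sorted and to contain unique keys.

```python
def merge_join_count(left, right):
    """Count keys present in both sorted, duplicate-free sequences.

    The loop body has no if/else: each comparison yields a bool
    (0 or 1) that is added directly to a counter or a cursor.
    """
    i = j = matches = 0
    while i < len(left) and j < len(right):
        a, b = left[i], right[j]
        matches += (a == b)  # count a join match
        i += (a <= b)        # advance left cursor when its key is not larger
        j += (a >= b)        # advance right cursor when its key is not larger
    return matches


# Keys 3 and 7 appear on both sides.
print(merge_join_count([1, 3, 5, 7], [3, 4, 7, 9]))  # prints 2
```

    The Python interpreter still branches internally, and the loop condition remains a branch. In a compiled language the same arithmetic can map to compare-and-add instructions, which is where the benefit of avoiding hard-to-predict branches would appear.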

  • Proposal of a Multi-Threaded Processor Architecture for Embedded Systems and Its Evaluation

    Shinsuke KOBAYASHI  Yoshinori TAKEUCHI  Akira KITAJIMA  Masaharu IMAI  

     
    PAPER

      Vol:
    E84-A No:3
      Page(s):
    748-754

    In this paper, a multi-threaded processor architecture for embedded systems is proposed and evaluated in comparison with other processors for embedded systems. The experimental results show the trade-off between hardware cost and execution time among the processors. When the proposed multi-threaded processor is considered as an embedded processor, the design space of embedded systems is enlarged, and a more suitable architecture can be selected under given design constraints.

  • Multi-Threaded Design for a Software Distributed Shared Memory Systems

    Jyh-Chang UENG  Ce-Kuen SHIEH  Su-Cheong MAC  An-Chow LAI  Tyng-Yue LIANG  

     
    PAPER-Software System

      Vol:
    E82-D No:12
      Page(s):
    1512-1523

    This paper describes the design and implementation of a multi-threaded Distributed Shared Memory (DSM) system, called Cohesion, which provides high programming flexibility and latency masking, and supports load balancing. Cohesion offers a parallel programming environment that is very similar to that of a multiprocessor system. Threads can be created recursively in this environment, and users are not required to manage the locations of the threads. Instead of supporting a shared variable model, Cohesion provides a global shared address space among all nodes in the system. The space is further divided into three regions, i.e., release, conventional, and object-based memory, each of which uses a different consistency protocol. In this paper, the design issues of an ordinary thread system, such as thread management, load balancing, and synchronization, are reconsidered in light of the memory management provided by the DSM system. Several real applications have been used to evaluate the performance of the system. The results show that multi-threading usually performs better than single-threading because network latency can be masked by overlapping communication and computation. However, the gain depends on program behavior and the number of threads executed on each node in the system.
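
    The latency-masking effect mentioned in this abstract can be sketched in a few lines: while one block of data is being computed on, the fetch of the next block is already in flight on another thread. This is a generic illustration, not Cohesion's mechanism. The names `process_overlapped`, `fetch`, and `compute` are hypothetical, and `fetch` stands in for a remote memory access.

```python
from concurrent.futures import ThreadPoolExecutor


def process_overlapped(fetch, compute, num_blocks):
    """Apply compute() to blocks 0..num_blocks-1, prefetching one block ahead.

    fetch(i) models a slow remote access; compute(block) models local work.
    The fetch of block i+1 runs on a worker thread while block i is computed.
    """
    results = []
    if num_blocks <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)  # start the first fetch
        for idx in range(num_blocks):
            block = pending.result()     # wait for the block in flight
            if idx + 1 < num_blocks:
                pending = pool.submit(fetch, idx + 1)  # start the next fetch
            results.append(compute(block))  # overlaps with the fetch above
    return results
```

    If `fetch` spends most of its time waiting, for example on a network round trip, the total time approaches the larger of the fetch time and the compute time rather than their sum. As the abstract notes for the real system, the actual gain depends on program behavior.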

  • Processor Pipeline Design for Fast Network Message Handling in RWC-1 Multiprocessor

    Hiroshi MATSUOKA  Kazuaki OKAMOTO  Hideo HIRONO  Mitsuhisa SATO  Takashi YOKOTA  Shuichi SAKAI  

     
    PAPER

      Vol:
    E81-C No:9
      Page(s):
    1391-1397

    In this paper, we describe the pipeline design and enhanced hardware for fast message handling in the RICA-1 processor, a processing element (PE) of the RWC-1 multiprocessor. The RWC-1 is based on the reduced inter-processor communication architecture (RICA), in which communication is combined with computation in the processor pipeline. The pipeline is enhanced with hardware mechanisms to support fine-grain parallel execution. The data paths of the RICA-1 super-scalar processor are shared by communication and instruction execution to minimize implementation cost. A 128-PE system was built in January 1998, and it is currently used for hardware debugging, software development, and performance evaluation.