IEICE global.ieice.org Site

Keyword Search Result

[Keyword] multithread(6hit)


Accelerating Large-Scale Interconnection Network Simulation by Cellular Automata Concept
Takashi YOKOTA Kanemitsu OOTSU Takeshi OHKAWA

PAPER-Computer System

Publicized: 2018/10/05
Vol: E102-D No: 1
Page(s): 52-74
State-of-the-art parallel systems employ a huge number of computing nodes connected by an interconnection network (ICN). The ICN plays an important role in a parallel system, since it is responsible for communication capability. In general, an ICN shows non-linear phenomena in its communication performance, most of which are caused by congestion. Designing a large-scale parallel system therefore requires thorough study through repeated simulation runs, which raises another problem: simulating large-scale systems at a reasonable cost. This paper presents a promising solution based on the cellular automata concept, which originated in our prior work. Assuming 2D-torus topologies to simplify the discussion, this paper discusses the fundamental design of router functions in terms of cellular automata, the data structure of packets, alternative modeling of a router function, and miscellaneous optimizations. The proposed models have a good affinity with GPGPU technology and, as representative speed-up results, the GPU-based simulator accelerates simulation by up to about 1264 times over sequential execution on a single CPU. Furthermore, since the proposed models are applicable to the shared-memory model, a multithreaded implementation of the proposed methods achieves speed-ups of about 162 times at the maximum.
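The cellular-automata view of a 2D-torus network can be illustrated with a minimal sketch. This is not the authors' router model: the one-packet-per-cell state, the dimension-order routing rule, the fixed-order arbitration, and the names `next_hop` and `step` are all assumptions made for illustration. The point is only that every cell's next state is computed from the current state alone, which is what makes such models map well onto GPUs and threads.

```python
def next_hop(pos, dst, w, h):
    """Assumed routing rule: dimension-order (X first, then Y), shortest way round."""
    (x, y), (dx, dy) = pos, dst
    if x != dx:
        step_x = 1 if (dx - x) % w <= w // 2 else -1
        return ((x + step_x) % w, y)
    step_y = 1 if (dy - y) % h <= h // 2 else -1
    return (x, (y + step_y) % h)


def step(grid, w, h):
    """One synchronous update of the whole torus.

    grid maps (x, y) -> destination of the single packet held in that cell.
    All decisions read only the current grid, as in a cellular automaton.
    """
    # Packets already at their destination are delivered and leave the network.
    moving = {pos: dst for pos, dst in grid.items() if pos != dst}
    delivered = len(grid) - len(moving)

    # Each packet claims its next cell if that cell is empty in the current
    # state; visiting cells in sorted order gives deterministic arbitration.
    claims = {}
    for pos in sorted(moving):
        target = next_hop(pos, moving[pos], w, h)
        if target not in grid and target not in claims:
            claims[target] = pos

    winners = set(claims.values())
    new_grid = {pos: dst for pos, dst in moving.items() if pos not in winners}
    for target, src in claims.items():
        new_grid[target] = moving[src]
    return new_grid, delivered


# Two packets on a 4x4 torus; the second is blocked by the first for one step.
grid = {(0, 0): (2, 1), (3, 0): (1, 0)}
steps = total = 0
while grid:
    grid, done = step(grid, 4, 4)
    total += done
    steps += 1
print(steps, total)
```

The sequential arbitration loop is a simplification; in a GPU or multithreaded version each cell would evaluate the same local rule independently over a double-buffered state.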
Equivalent Circuit of Yee's Cells and Its Application to Mixed Electromagnetic and Circuit Simulations
Yuichi TANJI

PAPER-Microwaves, Millimeter-Waves

Vol: E101-C No: 9
Page(s): 703-710
An equivalent circuit of Yee's cells is proposed for mixed electromagnetic and circuit simulations. Using the equivalent circuit, a mixed electromagnetic and circuit simulator can be developed, in which the electromagnetic field and circuit responses are analyzed simultaneously. Because the electromagnetic system is represented as a circuit, the active and passive device models of a circuit simulator can be used for the mixed simulations without any modification. Hence, the proposed method is very useful for designing various electronic systems. To evaluate the mixed simulations with the equivalent circuit, two implementations for shared- or distributed-memory computer systems are presented. In the numerical examples, we evaluate the performance of the prototype simulators to demonstrate their effectiveness.
Issue Mechanism for Embedded Simultaneous Multithreading Processor
Chengjie ZANG Shigeki IMAI Steven FRANK Shinji KIMURA

PAPER

Vol: E91-A No: 4
Page(s): 1092-1100
Simultaneous Multithreading (SMT) technology enhances instruction throughput by issuing multiple instructions from multiple threads within one clock cycle. With an in-order pipeline for each thread, SMT processors can issue a number of instructions close to, or even surpassing, that of an out-of-order pipeline. In this work, we show an efficient issue logic for predicated instruction sequences with a parallel flag in each instruction, where predicate-register-based issue control is adopted and consecutive instructions with a parallel flag of '0' are executed in parallel. The flag is pre-defined by a compiler. Instructions from different threads are issued in round-robin order. We also introduce an Instruction Queue skip mechanism that skips a thread if its queue is empty. Using this kind of issue logic, we designed a 6-thread, 7-stage, in-order pipeline processor. Based on this processor, we compare the round-robin issue policy (RR(T1-Tn)) with other policies: thread one always has the highest priority (PR(T1)), and thread one or thread n has the highest priority in turn (PR(T1-Tn)). The results show that the RR(T1-Tn) policy outperforms the others and that PR(T1-Tn) is almost the same as RR(T1-Tn) in terms of issued instructions per cycle.
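The round-robin issue order with the empty-queue skip can be sketched as follows. This is a simplified software model, not the paper's hardware: it assumes at most one instruction per thread per cycle, ignores the predicate and parallel-flag logic, and the function name `issue_round_robin` and the `width` parameter are hypothetical.

```python
from collections import deque


def issue_round_robin(queues, start, width):
    """Issue up to `width` instructions in one cycle.

    Threads are visited in round-robin order beginning at `start`.
    A thread whose instruction queue is empty is skipped, so its
    issue slot is offered to the next thread instead of being wasted.
    """
    n = len(queues)
    issued = []
    for offset in range(n):
        if len(issued) == width:
            break
        tid = (start + offset) % n
        if queues[tid]:  # skip mechanism: empty queues are passed over
            issued.append((tid, queues[tid].popleft()))
    next_start = (start + 1) % n  # rotate the starting thread each cycle
    return issued, next_start


# Three threads; thread 1 has nothing to issue.
queues = [deque(["a0", "a1"]), deque(), deque(["c0"])]
first, start = issue_round_robin(queues, start=0, width=2)
second, start = issue_round_robin(queues, start=start, width=2)
print(first)
print(second)
```

A fixed-priority policy such as PR(T1) would correspond to always passing `start=0` rather than rotating it.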
Message-Based Efficient Remote Memory Access on a Highly Parallel Computer EM-X
Yuetsu KODAMA Hirohumi SAKANE Mitsuhisa SATO Hayato YAMANA Shuichi SAKAI Yoshinori YAMAGUCHI

PAPER-Architectures

Vol: E79-D No: 8
Page(s): 1065-1071
Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor for tolerating communication latency. The EM-X uses multithreading to overlap computation with communication and thereby tolerate latency. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access is designed to overlap remote memory operations with thread execution. An 80-processor prototype of the EM-X has been developed and has been operational since December 1995. We execute several programs on the machine and evaluate how effectively the EM-X overlaps computation with communication to tolerate communication latency in high-performance parallel computing.
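The idea of extending FIFO-ordered thread invocation with priorities can be sketched in software. The EM-X implements this in hardware, and the abstract does not give the number of priority levels or the packet format, so the class `PacketQueue`, the two-level priority, and the "lower number is served first" convention below are assumptions for illustration only.

```python
import heapq
from itertools import count


class PacketQueue:
    """Priority queue of packets that stays FIFO within each priority level."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival counter, used as the FIFO tie-breaker

    def push(self, packet, priority=1):
        # Assumed convention: a lower priority number is served first.
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PacketQueue()
q.push("invoke thread A")
q.push("invoke thread B")
q.push("remote read reply", priority=0)  # arrives last, served first
order = [q.pop() for _ in range(len(q))]
print(order)
```

With every packet at the same priority the queue degenerates to plain FIFO, which is the baseline policy the abstract describes.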
hMDCE: The Hierarchical Multidimensional Directed Cycles Ensemble Network
Takashi YOKOTA Hiroshi MATSUOKA Kazuaki OKAMOTO Hideo HIRONO Shuichi SAKAI

PAPER-Interconnection Networks

Vol: E79-D No: 8
Page(s): 1099-1106
This paper discusses a massively parallel interconnection scheme for multithreaded architecture and introduces a new class of direct interconnection networks called the hierarchical Multidimensional Directed Cycles Ensemble (hMDCE). Its suitability for massively parallel systems is discussed. The network evolved from the Multidimensional Directed Cycles Ensemble (MDCE) network, with each node replaced by lower-level sub-networks. The new network addresses some serious problems caused by the increasing scale of parallel systems, such as longer latency, limited throughput and high implementation cost. This paper first introduces the MDCE network and then presents and examines in detail the hierarchical MDCE network. The bisection bandwidth of hMDCE is considerably reduced from that of its ancestor MDCE, and the network achieves significantly higher throughput and lower latency under some practical implementation constraints. The gate count and delay time of the compiled circuit for the routing function are insignificant. These results reveal that the hMDCE network is an important candidate for massively parallel systems interconnection.
High-Level Synthesis of a Multithreaded Processor for Image Generation
Takao ONOYE Toshihiro MASAKI Isao SHIRAKAWA Hiroaki HIRATA Kozo KIMURA Shigeo ASAHARA Takayuki SAGISHIMA

PAPER-VLSI Design Technology and CAD

Vol: E78-A No: 3
Page(s): 322-330
The design procedure of a multithreaded processor dedicated to image generation is described, which is carried out by means of the high-level synthesis tool PARTHENON. The processor employs a multithreaded architecture, which is a novel and promising approach to parallel image generation. This paper puts special stress on the high-level synthesis scheme, which simplifies the behavioral description of the structure and control of complex hardware and therefore enables the design of a complicated mechanism for a multithreaded processor. Implementation results of the synthesis are also shown to demonstrate the performance of the designed processor. This processor greatly improves on the image-generation throughput attained so far by the conventional approach.