Mostafa SOLIMAN, Stanislav SEDUKHIN
Within a few years it will be possible to integrate a billion transistors on a single chip operating at a frequency above 10 GHz. At this integration level, we propose using a multi-level ISA to express fine-grain data parallelism to hardware instead of spending a huge transistor budget to extract it dynamically. Since the fundamental data structures for a wide variety of data-parallel applications are scalars, vectors, and matrices, our proposed Trident processor extends a scalar ISA with vector and matrix instruction sets to process matrix-formulated applications effectively. Like vector architectures, the Trident processor consists of a set of parallel lanes (each lane contains a set of vector pipelines and a slice of the register file) combined with a fast scalar core. However, the Trident processor can effectively process not only vector but also matrix data on the parallel lanes. One key point of our architecture is local communication within and across lanes to overcome the limitations of future VLSI technology. Another key point is the effective execution of a mixture of scalar, vector, and matrix operations. This paper describes the architecture of the Trident processor and evaluates its performance on BLAS and on the standard matrix bidiagonalization algorithm. The latter is evaluated as an example of an entire application based on a mixture of scalar, vector, and matrix operations. Our results show that many data-parallel applications, such as scientific, engineering, and multimedia applications, can be sped up on the Trident processor. Moreover, scaling the Trident processor does not require more fetch, decode, or issue bandwidth; it requires only replication of the parallel lanes.
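To make the lane idea concrete, the following is a minimal software sketch of how a vector operation and a matrix operation might be split across parallel lanes, so that adding lanes changes only how work is partitioned and not the result. The round-robin mapping of elements to lanes and the function names (`lane_slices`, `axpy_on_lanes`, `matvec_on_lanes`) are assumptions made for illustration; the abstract does not specify how Trident distributes data across lanes.

```python
def lane_slices(n, num_lanes):
    """Assign element indices 0..n-1 to lanes round-robin (an assumed mapping)."""
    return [list(range(lane, n, num_lanes)) for lane in range(num_lanes)]


def axpy_on_lanes(alpha, x, y, num_lanes):
    """Vector operation alpha*x + y, with each lane handling only its own slice."""
    assert len(x) == len(y)
    result = [0.0] * len(x)
    for indices in lane_slices(len(x), num_lanes):
        # Lanes are independent of each other, so hardware could run
        # these outer iterations in parallel; here they run in sequence.
        for i in indices:
            result[i] = alpha * x[i] + y[i]
    return result


def matvec_on_lanes(a, x, num_lanes):
    """Matrix-vector product with the rows of `a` distributed over lanes."""
    result = [0.0] * len(a)
    for rows in lane_slices(len(a), num_lanes):
        for r in rows:
            result[r] = sum(a_rc * x_c for a_rc, x_c in zip(a[r], x))
    return result


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [10.0, 20.0, 30.0, 40.0, 50.0]
    # The result does not depend on the number of lanes.
    print(axpy_on_lanes(2.0, x, y, num_lanes=1))
    print(axpy_on_lanes(2.0, x, y, num_lanes=4))
    print(matvec_on_lanes([[1, 2], [3, 4], [5, 6]], [1, 1], num_lanes=2))
```

In this sketch a single call covers the whole vector or matrix regardless of `num_lanes`, which loosely mirrors the claim that scaling needs more lanes rather than more fetch, decode, or issue bandwidth. It models no timing, pipelines, or inter-lane communication.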
This paper presents a novel technique for analyzing and designing local communication systems for distributed mobile robotic systems (DMRS). Our goal is to provide an analysis-based guideline for designing local communication systems that efficiently transmit task information to the appropriate robots. We propose a layered methodology, i.e., design from spatial and temporal aspects based on an analysis of information diffusion by local communication between robots. The task environment is classified so that each analysis and design is applied in a systematic way. The spatial design gives the optimal communication area for minimizing transmission time for various cooperative tasks. In the temporal design, we derive the information announcing time that avoids excessive information diffusion. The designed local communication is evaluated in comparison with global communication. Finally, we performed simulations and experiments to demonstrate that the analysis and design technique is effective for constructing an efficient local communication system.
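As a rough illustration of information diffusion by local communication, the sketch below counts how many communication rounds it takes for task information to spread from one robot to all others when each informed robot can reach only neighbors within a fixed radius. This is a deliberately simplified toy model and not the paper's analysis: robots are assumed static, and motion, channel contention, and the announcing time are not modeled. The function name `diffusion_rounds` and its parameters are hypothetical.

```python
import math


def diffusion_rounds(positions, radius, source=0, max_rounds=100):
    """Return the number of rounds until every robot is informed.

    Returns None if diffusion stalls (some robot is unreachable) or
    max_rounds is exceeded. Robots are static in this toy model.
    """
    informed = {source}
    if len(informed) == len(positions):
        return 0
    for round_no in range(1, max_rounds + 1):
        newly = set()
        for i in informed:
            for j, p in enumerate(positions):
                # A robot becomes informed if it lies within the
                # communication radius of any already-informed robot.
                if j not in informed and math.dist(positions[i], p) <= radius:
                    newly.add(j)
        if not newly:
            return None  # no progress: the remaining robots are out of range
        informed |= newly
        if len(informed) == len(positions):
            return round_no
    return None


if __name__ == "__main__":
    robots = [(0, 0), (1, 0), (2, 0), (3, 0)]  # four robots on a line
    for r in (0.5, 1, 2, 3):
        print(r, diffusion_rounds(robots, r))
```

Because this model has no cost for a large radius, a wider communication area never hurts here; it therefore cannot reproduce the optimal communication area that the paper derives, and serves only to show what "diffusion by local communication" means operationally.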