IEICE global.ieice.org Site

Author Search Result

[Author] Takanobu BABA(4hit)

Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster (Open Access)
Takanobu BABA, Shinpei WATANABE, Boaz JESSIE JACKIN, Kanemitsu OOTSU, Takeshi OHKAWA, Takashi YOKOTA, Yoshio HAYASAKI, Toyohiko YATAGAI

PAPER-Human-computer Interaction

Publicized: 2019/03/29
Vol: E102-D No: 7
Page(s): 1310-1320
The 3D holographic display has long been anticipated as a future human interface because it does not require users to wear special devices. However, its heavy computational requirements have prevented such displays from being realized. A recent study indicates that objects and holograms of several giga-pixels must be processed in real time to achieve high resolution and a wide viewing angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment so as to avoid heavy inter-node communication. We then applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations include a change in the method of object decomposition, reduction of data transfer between the CPU and GPU, kernel integration, stream processing, and utilization of multiple GPUs within a node. The multi-node optimizations include methods for distributing object data from the host node to the other nodes. Experimental results show that the intra-node optimizations attain an 11.52-fold speed-up over the original single-node code. Further, the multi-node optimizations, using 8 nodes with 2 GPUs per node, attain an execution time of 4.28 sec for generating a 1.6 giga-pixel hologram from a 3.2 giga-pixel object. This corresponds to a 237.92-fold speed-up over sequential processing on a CPU and a 41.78-fold speed-up over multi-threaded execution on a multicore CPU, both using a conventional FFT-based algorithm.
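The speed-up figures in this abstract imply approximate baseline execution times, which the abstract does not state directly. The short calculation below derives them from the reported 4.28 sec and the two speed-up ratios; the function and variable names are illustrative, and the results are only as precise as the rounded figures they come from.

```python
def baseline_seconds(optimized_seconds: float, speedup: float) -> float:
    """Return the baseline time implied by an optimized time and a speed-up ratio."""
    return optimized_seconds * speedup


# Figures taken from the abstract.
cluster_seconds = 4.28          # 8 nodes, 2 GPUs per node
speedup_vs_sequential = 237.92  # versus sequential CPU processing
speedup_vs_multithread = 41.78  # versus multi-threaded multicore CPU

# Implied baselines (derived here, not reported in the abstract).
sequential_seconds = baseline_seconds(cluster_seconds, speedup_vs_sequential)
multithread_seconds = baseline_seconds(cluster_seconds, speedup_vs_multithread)

print(f"implied sequential CPU time:     about {sequential_seconds:.0f} sec")
print(f"implied multi-threaded CPU time: about {multithread_seconds:.0f} sec")
```

Under these figures, the sequential run would take roughly 17 minutes and the multi-threaded run roughly 3 minutes.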
A Declarative Synchronization Mechanism for Parallel Object-Oriented Computation
Takanobu BABA, Norihito SAITOH, Takahiro FURUTA, Hiroshi TAGUCHI, Tsutomu YOSHINAGA

PAPER-Computer Systems

Vol: E78-D No: 8
Page(s): 969-981
We have designed and implemented a simple yet powerful declarative synchronization mechanism for a parallel object-oriented computation model. The mechanism allows the user to control multiple message reception, specify the order of message reception, lock an invocation, and specify relations as invocation constraints. It has been included in a parallel object-oriented language called A-NETL. The compiler and operating system have been developed on a total architecture, A-NET (Actors NETwork). The experimental results show that (i) the mechanism allows the user to model asynchronous events naturally, without losing the integrity of the described programs; (ii) replacing the mechanism with the user's own code requires tedious descriptions, yields little performance improvement, and sacrifices program readability and integrity; (iii) the mechanism allows the user to shift synchronous programs to asynchronous ones, with a scalable reduction of execution times: an average of 20.6% for 6 to 17 objects and 46.1% for 65 objects. These results demonstrate the effectiveness of the proposed synchronization mechanism.
Some Properties of the Perfect Shuffle Interconnection for Parallel Computations
Takeshi KUMAGAI, Takanobu BABA

PAPER-Computer Networks

Vol: E72-E No: 9
Page(s): 998-1002
The perfect shuffle interconnection is widely used in parallel processing hardware, mostly in multistage configurations. However, it is rarely applied to VLSI arrays except to realize FFT or sorting, because control methods for loading data into cells have not yet been established. VLSI arrays using this interconnection have the potential to realize some kinds of computations more efficiently than arrays using other interconnections. This paper analyzes properties of the perfect shuffle interconnection in order to apply it to parallel computations on VLSI arrays. In particular, it discusses the existence of a cell into which a given pair of inputs is loaded and a control method for forming pairs on cells. The properties presented form a basis for realizing parallel computations on VLSI arrays that use the perfect shuffle interconnection.
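For readers unfamiliar with the interconnection itself, the sketch below shows the standard textbook definition of the perfect shuffle on N = 2^n elements: the element at index i moves to the index obtained by rotating the n-bit binary representation of i left by one bit. It does not reproduce the pairing properties or the control method analyzed in the paper; the function names are illustrative.

```python
def shuffle_index(i: int, n_bits: int) -> int:
    """Rotate the n_bits-bit index i left by one bit (perfect shuffle address map)."""
    mask = (1 << n_bits) - 1
    return ((i << 1) | (i >> (n_bits - 1))) & mask


def perfect_shuffle(items: list) -> list:
    """Apply one perfect shuffle to a list whose length is a power of two (>= 2)."""
    size = len(items)
    n_bits = size.bit_length() - 1
    if size < 2 or size != 1 << n_bits:
        raise ValueError("length must be a power of two and at least 2")
    out = [None] * size
    for i, item in enumerate(items):
        out[shuffle_index(i, n_bits)] = item
    return out


# One shuffle interleaves the two halves of the input.
once = perfect_shuffle(list(range(8)))
print(once)  # [0, 4, 1, 5, 2, 6, 3, 7]

# Rotating a 3-bit index three times restores it, so three shuffles are the identity.
thrice = perfect_shuffle(perfect_shuffle(once))
print(thrice == list(range(8)))  # True
```

Because each shuffle is a one-bit rotation of the address, n successive shuffles return every element of a 2^n-element array to its starting position.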
A Network-Topology-Independent Static Task Allocation Strategy for Massively Parallel Computers
Takanobu BABA, Akehito GUNJI, Yoshifumi IWAMOTO

PAPER-Computer Networks

Vol: E76-D No: 8
Page(s): 870-881
A network-topology-independent static task allocation strategy has been designed and implemented for massively parallel computers. For mapping a task graph to a processor graph, this strategy evaluates several functions that represent some intuitively feasible properties of the graphs. They include the connectivity with the allocated nodes, the distance from the median of a graph, the connectivity with candidate nodes, and the number of candidate nodes within a given distance. Several greedy strategies are defined to guide the mapping process, utilizing the indicated function values. An allocation system has been designed and implemented based on the allocation strategy. In experiments, we defined task graphs of about 1000 nodes with regular and irregular topologies, and processor graphs of the same order with mesh, tree, and hypercube topologies. The results are summarized as follows. 1) The system can yield total communication costs 4.0 times better than those of an arbitrary allocation. 2) It is difficult to select a single strategy capable of providing the best solutions for a wide range of task-processor combinations. 3) Comparison with hypercube-topology-dependent research indicates that our topology-independent allocator produces better results than the topology-dependent ones. 4) The order of computation time of the allocator is shown experimentally to be O(n^2), where n represents the number of tasks.
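To make the idea of greedy, topology-independent mapping concrete, the sketch below implements one simple heuristic in that spirit. It is not the authors' allocator: it uses only one of the listed properties (connectivity with already allocated nodes), and the cost model (sum of processor-graph hop distances over task-graph edges), the tie-breaking rules, and all function names are assumptions made for illustration. It assumes undirected graphs given as adjacency lists with comparable node labels, and a connected processor graph.

```python
from collections import deque


def bfs_distances(graph: dict, source) -> dict:
    """Hop distances from source to every reachable node of an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def greedy_allocate(task_graph: dict, proc_graph: dict) -> dict:
    """Map each task to a distinct processor using a simple greedy rule."""
    if len(task_graph) > len(proc_graph):
        raise ValueError("not enough processors for the tasks")
    dist = {p: bfs_distances(proc_graph, p) for p in proc_graph}
    mapping, free, unplaced = {}, set(proc_graph), set(task_graph)
    while unplaced:
        # Next task: most neighbours already allocated, then highest degree.
        task = max(
            sorted(unplaced),
            key=lambda t: (sum(n in mapping for n in task_graph[t]), len(task_graph[t])),
        )
        placed = [mapping[n] for n in task_graph[task] if n in mapping]
        # Processor: the free one closest (in total hops) to the placed neighbours.
        proc = min(sorted(free), key=lambda p: sum(dist[p][q] for q in placed))
        mapping[task] = proc
        free.remove(proc)
        unplaced.remove(task)
    return mapping


def communication_cost(task_graph: dict, proc_graph: dict, mapping: dict) -> int:
    """Sum of processor hop distances over all task-graph edges (each counted once)."""
    dist = {p: bfs_distances(proc_graph, p) for p in proc_graph}
    return sum(
        dist[mapping[a]][mapping[b]]
        for a in task_graph
        for b in task_graph[a]
        if a < b
    )


# Example: a 4-task ring mapped onto a 2x2 mesh of processors.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
mesh = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
allocation = greedy_allocate(ring, mesh)
cost = communication_cost(ring, mesh, allocation)
print(allocation, cost)
```

The allocator never inspects the processor topology beyond hop distances, which is what lets the same code run unchanged on mesh, tree, or hypercube processor graphs.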
