Author Search Result

[Author] Takanobu BABA(4hit)

  • Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster Open Access

    Takanobu BABA  Shinpei WATANABE  Boaz JESSIE JACKIN  Kanemitsu OOTSU  Takeshi OHKAWA  Takashi YOKOTA  Yoshio HAYASAKI  Toyohiko YATAGAI  

     
    PAPER-Human-computer Interaction

      Publicized:
    2019/03/29
      Vol:
    E102-D No:7
      Page(s):
    1310-1320

    The 3D holographic display has long been anticipated as a future human interface because it does not require users to wear special devices. However, its heavy computational requirements have prevented the realization of such displays. A recent study indicates that objects and holograms of several giga-pixels must be processed in real time to achieve high resolution and a wide viewing angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment in order to avoid heavy inter-node communication. We then applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations include a change in the method of object decomposition, reduction of data transfer between the CPU and GPU, kernel integration, stream processing, and utilization of multiple GPUs within a node. The multi-node optimizations include methods for distributing object data from the host node to the other nodes. Experimental results show that the intra-node optimizations attain an 11.52 times speed-up over the original single-node code. Further, the multi-node optimizations, using 8 nodes with 2 GPUs per node, attain an execution time of 4.28 sec for generating a 1.6 giga-pixel hologram from a 3.2 giga-pixel object. This corresponds to a 237.92 times speed-up over sequential processing on a CPU and a 41.78 times speed-up over multi-threaded execution on a multicore CPU, using a conventional FFT-based algorithm.
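The figures in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that both reported speed-ups are measured against the 4.28 sec cluster run, which is how the abstract reads; the function names are illustrative and do not come from the paper.

```python
def implied_baseline_seconds(parallel_seconds: float, speedup: float) -> float:
    """Baseline run time implied by a speed-up factor: T_base = speedup * T_parallel."""
    return parallel_seconds * speedup


def throughput_gpixel_per_sec(gigapixels: float, seconds: float) -> float:
    """Output pixels produced per second, in giga-pixels."""
    return gigapixels / seconds


# Values quoted in the abstract.
cluster_seconds = 4.28        # 8 nodes, 2 GPUs per node
speedup_vs_sequential = 237.92
speedup_vs_multithread = 41.78
hologram_gigapixels = 1.6

sequential_seconds = implied_baseline_seconds(cluster_seconds, speedup_vs_sequential)
multithread_seconds = implied_baseline_seconds(cluster_seconds, speedup_vs_multithread)
rate = throughput_gpixel_per_sec(hologram_gigapixels, cluster_seconds)

print(f"implied sequential CPU time:    {sequential_seconds:.1f} s")
print(f"implied multi-threaded time:    {multithread_seconds:.1f} s")
print(f"hologram generation throughput: {rate:.3f} giga-pixel/s")
```

Under that assumption, the sequential CPU run would take roughly 1018 sec and the multi-threaded run roughly 179 sec. These are derived values, not measurements reported in the abstract.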

  • A Declarative Synchronization Mechanism for Parallel Object-Oriented Computation

    Takanobu BABA  Norihito SAITOH  Takahiro FURUTA  Hiroshi TAGUCHI  Tsutomu YOSHINAGA  

     
    PAPER-Computer Systems

      Vol:
    E78-D No:8
      Page(s):
    969-981

    We have designed and implemented a simple yet powerful declarative synchronization mechanism for a parallel object-oriented computation model. The mechanism allows the user to control multiple message reception, specify the order of message reception, lock an invocation, and specify relations as invocation constraints. It has been included in a parallel object-oriented language called A-NETL. The compiler and operating system have been developed on a total architecture, A-NET (Actors NETwork). The experimental results show that (i) the mechanism allows the user to model asynchronous events naturally, without losing the integrity of the described programs; (ii) replacing the mechanism with the user's own code requires tedious descriptions, gains little performance, and clearly sacrifices program readability and integrity; (iii) the mechanism allows the user to shift synchronous programs to asynchronous ones, with a scalable reduction of execution times: an average of 20.6% for 6 to 17 objects and 46.1% for 65 objects. These results demonstrate the effectiveness of the proposed synchronization mechanism.

  • Some Properties of the Perfect Shuffle Interconnection for Parallel Computations

    Takeshi KUMAGAI  Takanobu BABA  

     
    PAPER-Computer Networks

      Vol:
    E72-E No:9
      Page(s):
    998-1002

    The perfect shuffle interconnection is widely used in parallel processing hardware, mostly in multistage configurations. However, it is rarely applied to VLSI arrays except for realizing FFT or sorting, because control methods for loading data into cells have not yet been established. VLSI arrays using this interconnection have the potential to realize some kinds of computations more efficiently than arrays using other interconnections. This paper analyzes properties of the perfect shuffle interconnection in order to apply it to parallel computations on VLSI arrays. In particular, it discusses the existence of a cell into which a given pair of inputs is loaded and a control method for forming pairs on cells. The properties presented form a basis for realizing parallel computations on VLSI arrays using the perfect shuffle interconnection.
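For readers unfamiliar with the interconnection, the following is a minimal sketch of the standard perfect shuffle permutation on 2^n items: the item at index i moves to the index obtained by rotating the n-bit binary form of i left by one bit. It illustrates the general definition only, not the cell-loading control method analyzed in the paper, and the function names are illustrative.

```python
def shuffle_index(i: int, n_bits: int) -> int:
    """Destination of index i under the perfect shuffle on 2**n_bits items.

    The destination is a one-bit left rotation of the n_bits-bit
    binary representation of i.
    """
    mask = (1 << n_bits) - 1
    return ((i << 1) | (i >> (n_bits - 1))) & mask


def perfect_shuffle(items: list) -> list:
    """Return a new list with every item moved to its shuffled position."""
    n = len(items)
    n_bits = n.bit_length() - 1
    if n < 2 or (1 << n_bits) != n:
        raise ValueError("length must be a power of two and at least 2")
    out = [None] * n
    for i, item in enumerate(items):
        out[shuffle_index(i, n_bits)] = item
    return out


if __name__ == "__main__":
    data = list(range(8))
    print(perfect_shuffle(data))  # prints [0, 4, 1, 5, 2, 6, 3, 7]
```

The result interleaves the two halves of the input, like a perfect riffle of a deck of cards. Because each shuffle rotates the index by one bit, applying it n times to 2^n items restores the original order.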

  • A Network-Topology-Independent Static Task Allocation Strategy for Massively Parallel Computers

    Takanobu BABA  Akehito GUNJI  Yoshifumi IWAMOTO  

     
    PAPER-Computer Networks

      Vol:
    E76-D No:8
      Page(s):
    870-881

    A network-topology-independent static task allocation strategy has been designed and implemented for massively parallel computers. For mapping a task graph to a processor graph, this strategy evaluates several functions that represent some intuitively feasible properties of the graphs. They include the connectivity with the allocated nodes, distance from the median of a graph, connectivity with candidate nodes, and the number of candidate nodes within a distance. Several greedy strategies are defined to guide the mapping process, utilizing the indicated function values. An allocation system has been designed and implemented based on the allocation strategy. In experiments we have defined about 1000 nodes in task graphs with regular and irregular topologies, and the same order of processors with mesh, tree, and hypercube topologies. The results are summarized as follows. 1) The system can yield total communication costs 4.0 times better than an arbitrary allocation. 2) It is difficult to select a single strategy capable of providing the best solutions for a wide range of task-processor combinations. 3) Comparison with hypercube-topology-dependent research indicates that our topology-independent allocator produces better results than the dependent ones. 4) The order of computation time of the allocator is experimentally shown to be O(n^2), where n represents the number of tasks.
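The general shape of such a greedy mapping can be sketched as follows. This is a hypothetical heuristic that uses only two of the properties named in the abstract (connectivity with already allocated nodes and the median of the processor graph); it is not the authors' allocator, and all function names are assumptions. Both graphs are assumed to be undirected and connected, given as adjacency lists.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node of an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def greedy_allocate(tasks, procs):
    """Map each task to a distinct processor (hypothetical greedy heuristic)."""
    if len(tasks) > len(procs):
        raise ValueError("need at least as many processors as tasks")
    dist = {p: bfs_distances(procs, p) for p in procs}
    # Median of the processor graph: smallest total distance to all nodes.
    median = min(procs, key=lambda p: sum(dist[p].values()))
    # Seed the mapping with the task that has the most neighbours.
    seed = max(tasks, key=lambda t: len(tasks[t]))
    placement = {seed: median}
    free = [p for p in procs if p != median]
    while len(placement) < len(tasks):
        # Next task: strongest connectivity with already allocated tasks.
        task = max(
            (t for t in tasks if t not in placement),
            key=lambda t: sum(n in placement for n in tasks[t]),
        )
        hosts = [placement[n] for n in tasks[task] if n in placement]
        # Best processor: fewest total hops to the neighbours' hosts.
        proc = min(free, key=lambda p: sum(dist[p][q] for q in hosts))
        placement[task] = proc
        free.remove(proc)
    return placement


def communication_cost(tasks, procs, placement):
    """Sum of processor-graph hop counts over all task-graph edges."""
    dist = {p: bfs_distances(procs, p) for p in procs}
    total = sum(dist[placement[t]][placement[n]] for t in tasks for n in tasks[t])
    return total // 2  # every undirected edge was counted twice


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    mapping = greedy_allocate(ring, ring)
    print(mapping, communication_cost(ring, ring, mapping))
```

Each step scans the unallocated tasks and the free processors, so the loop as written is not tuned for the graph sizes discussed in the abstract; it is meant only to show how the function values can drive a greedy choice.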