The search functionality is under construction.

Author Search Result

[Author] Susumu HORIGUCHI(44hit)

21-40hit(44hit)

  • Experimental Evaluation of Parallel Sorting on a Multiprocessor System

    Susumu HORIGUCHI  Takeo NAKADA  Yoshiharu SHIGEI  

     
    LETTER-Algorithm and Computational Complexity

      Vol:
    E71-E No:2
      Page(s):
    127-131

    A parallel sorting is actually executed on the multiprocessor system to estimate the real performance. The sorting algorithm consists of merging and exchanging data between the nearest neighbors. The system performance of parallel sorting is measured experimentally. It is proved that the parallel sorting algorithm is really effective for a multiprocessor system with the limited number of processors.

  • Experimental Evaluation of Parallel FFT on a Clustered Multiprocessor System

    Susumu HORIGUCHI  

     
    LETTER-Computer Systems

      Vol:
    E73-E No:3
      Page(s):
    360-364

    A parallel FFT (Fast Fourier Transform) is actually executed on a clustered multiprocessor system with 32 processors to study real performance of three communication schemes, common memory, linear array and ring array. The speed-up of parallel FFT to a serial computer really depends on the number of data to be transformed and the number of processors. It is also proved that the performance really depends on the communication scheme of multiprocessor system.

  • FOREWORD

    Susumu HORIGUCHI  

     
    FOREWORD

      Vol:
    E79-D No:8
      Page(s):
    1013-1014
  • Multicasting in Multihop Optical WDM Networks with Limited Wavelength Conversion

    Hong SHEN  Yi PAN  John SUM  Susumu HORIGUCHI  

     
    INVITED SURVEY PAPER

      Vol:
    E86-D No:1
      Page(s):
    3-14

    This paper provides an overview on efficient algorithms for multicasting in optical networks supported by Wavelength Division Multiplexing (WDM) with limited wavelength conversion. We classify the multicast problems according to off-line and on-line in both reliable and unreliable networks. In each problem class, we present efficient algorithms for multicast and multiple multicast and show their performance. We also present efficient schemes for dynamic multicast group membership updating. We conclude the paper by showing possible extension of the presented algorithms for QoS provision.

  • TESH: A New Hierarchical Interconnection Network for Massively Parallel Computing

    Vijay K. JAIN  Tadasse GHIRMAI  Susumu HORIGUCHI  

     
    PAPER-Interconnection Networks

      Vol:
    E80-D No:9
      Page(s):
    837-846

    Advanced scientific and engineering problems require massively parallel computing. Critical to the designand ultimately the performanceof such computing systems is the interconnection network binding the computing elements, just as is the cardiovascular network to the human body. This paper develops a new interconnection network, "Tori connected mESHes (TESH)," consisting of k-ary n-cube connection of supernodes that comprise meshes of lower level nodes. Its key features are the following: it is hierarchical, thus allowing exploitation of computation locality as well as easy expansion (up to a million processors), and it appears to be well suited for 3-D VLSI implementation, for it requires far fewer number of vertical wires than almost all known multi-computer networks. Presented in the paper are the architecture of the new network, node addressing and message routing, 3-D VLSI/ULSI considerations, and application of the network to massively parallel computing. Specifically, we discuss the mapping on to the network of stack filtering, a hardware oriented technique for order statistic image filtering.

  • A Parallel Sorting Algorithm for a Linearly Connected Multiprocessor

    Susumu HORIGUCHI  Yoshiharu SHIGEI  

     
    PAPER-Algorithm, Computational Complexity

      Vol:
    E69-E No:9
      Page(s):
    996-1001

    A new parallel sorting algorithm is proposed, assuming a linearly connected multiprocessor system with a limited number of processors. The algorithm based on merging and exchanging data between adjacent processors. The upper bound is analytically investigated, and the throughput is proportional to the number of processors. The time complexity is O(N) for N data with O(log N) processors. Average number of iterations for a multiprocessor with two processing elements is also analytically investigated. Execution time for real situation is numerically simulated on a conventional computer.

  • A Probabilistic Sentence Reduction Using Maximum Entropy Model

    Minh LE NGUYEN  Masaru FUKUSHI  Susumu HORIGUCHI  

     
    PAPER-Natural Language Processing

      Vol:
    E88-D No:2
      Page(s):
    278-288

    This paper describes a new probabilistic sentence reduction method using maximum entropy model. In contrast to previous methods, the proposed method has the ability to produce multiple best results for a given sentence, which is useful in text summarization applications. Experimental results show that the proposed method improves on earlier methods in both accuracy and computation time.

  • Modified Hierarchical 3D-Torus Network

    M.M. Hafizur RAHMAN  Yasushi INOGUCHI  Susumu HORIGUCHI  

     
    PAPER-Computer Systems

      Vol:
    E88-D No:2
      Page(s):
    177-186

    Three-dimensional (3D) wafer stacked implementation (WSI) has been proposed as a promising technology for massively parallel computers. A hierarchical 3D-torus (H3DT) network, which is a 3D-torus network of multiple basic modules in which the basic modules are 3D-mesh networks, has been proposed for efficient 3D-WSI. However, the restricted use of physical links between basic modules in the higher level networks reduces the dynamic communication performance of this network. A torus network has better dynamic communication performance than a mesh network. Therefore, we have modified the H3DT network by replacing the 3D-mesh modules by 3D-tori, calling it a Modified H3DT (MH3DT) network. This paper addresses the architectural details of the MH3DT network and explores aspects such as degree, diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the MH3DT network using two virtual channels and evaluate the network's dynamic communication performance under the uniform traffic pattern, using the proposed routing algorithm. It is shown that the MH3DT network possesses several attractive features including small diameter, small cost, small average distance, better bisection width, and better dynamic communication performance.

  • Crosstalk-Free Permutation in Photonic Rearrangeable Networks Built on a Combination of Horizontal Expansion and Vertical Stacking of Banyan Networks

    Xiaohong JIANG  Hong SHEN  Md. Mamun-ur-Rashid KHANDKER  Susumu HORIGUCHI  

     
    PAPER-Networking and Architectures

      Vol:
    E86-D No:9
      Page(s):
    1525-1533

    Crosstalk in optical switch is an intrinsic drawback of optical networks, and avoiding crosstalk is important for making optical network work properly. Horizontal expansion and vertical stacking are two basic techniques for creating nonblocking multistage interconnection networks (MINs). Rearrangeable (nonblocking) optical MINs are feasible since they have lower complexity than their strictly nonblocking counterparts. In this paper, we study the crosstalk-free permutations in rearrangeable optical MINs built on a combination of horizontal expansion and vertical stacking of banyan networks, and provide a scheme for realizing crosstalk-free permutations in this kind of optical MINs. The basic idea of this scheme is to first decompose a permutation into multiple partial permutations by using Euler Split technique, then route and realize each of these partial permutations crosstalk-free in one plane (stacked copy) of a MIN based on both the Euler Split technique and self-routing property of a banyan network. The tradeoff between the overall time complexity and hardware cost of this class of MINs is also explored in this paper.

  • Effects of Element Radiation Pattern on Characteristics of Turnstile Spherical Array Antenna

    Susumu HORIGUCHI  Takayuki ISHIZONE  Yasuto MUSHIAKE  

     
    LETTER-Antenna and Propagation

      Vol:
    E67-E No:9
      Page(s):
    527-528

    Effects of the element radiation pattern on characteristics of the spherical array antenna composed of turnstile elements is discussed. It is found by numerical calculations that scanning characteristics of the directivity and the axial ratio do not so much depend on element radiation patterns.

  • High-Performance Training of Conditional Random Fields for Large-Scale Applications of Labeling Sequence Data

    Xuan-Hieu PHAN  Le-Minh NGUYEN  Yasushi INOGUCHI  Susumu HORIGUCHI  

     
    PAPER-Parallel Processing System

      Vol:
    E90-D No:1
      Page(s):
    13-21

    Conditional random fields (CRFs) have been successfully applied to various applications of predicting and labeling structured data, such as natural language tagging & parsing, image segmentation & object recognition, and protein secondary structure prediction. The key advantages of CRFs are the ability to encode a variety of overlapping, non-independent features from empirical data as well as the capability of reaching the global normalization and optimization. However, estimating parameters for CRFs is very time-consuming due to an intensive forward-backward computation needed to estimate the likelihood function and its gradient during training. This paper presents a high-performance training of CRFs on massively parallel processing systems that allows us to handle huge datasets with hundreds of thousand data sequences and millions of features. We performed the experiments on an important natural language processing task (text chunking) on large-scale corpora and achieved significant results in terms of both the reduction of computational time and the improvement of prediction accuracy.

  • Routing Algorithms for Packet/Circuit Switching in Optical Multi-log2N Networks

    Yusuke FUKUSHIMA  Xiaohong JIANG  Susumu HORIGUCHI  

     
    PAPER-Switching for Communications

      Vol:
    E91-B No:12
      Page(s):
    3913-3924

    The multi-log2N network architecture is attractive for constructing optical switches, and the related routing algorithms are critical for the operation and efficiency of such switches. Although several routing algorithms have been proposed for multi-log2N networks, a full performance comparison among them has not been published up to now. Thus, we rectify this omission by providing such a comparison in terms of blocking probability, time complexity, hardware cost and load balancing capability. Notice that the load balance is important for reducing the peak power requirement of a switch, so we also propose in this paper a new routing algorithm for optical multi-log2N networks to achieve a better load balance.

  • Robust Node Positioning in Wireless Sensor Networks

    Ayong YE  Jianfeng MA  Xiaohong JIANG  Susumu HORIGUCHI  

     
    PAPER-Network

      Vol:
    E92-B No:6
      Page(s):
    2023-2031

    Secure sensor localization is a prerequisite for many sensor networks to retrieve trustworthy data. However, most of existing node positioning systems were studied in trust environment and are therefore vulnerable to malicious attacks. In this work, we develop a robust node positioning mechanism(ROPM) to protect localization techniques from position attacks. Instead of introducing countermeasures for every possible internal or external attack, our approach aims at making node positioning system attack-tolerant by removing malicious beacons. We defeat internal attackers and external attackers by applying different strategies, which not only achieves robustness to attacks but also dramatically reduces the computation overhead. Finally, we provide detailed theoretical analysis and simulations to evaluate the proposed technique.

  • HTN: A New Hierarchical Interconnection Network for Massively Parallel Computers

    M.M. Hafizur RAHMAN  Susumu HORIGUCHI  

     
    PAPER-Networking and Architectures

      Vol:
    E86-D No:9
      Page(s):
    1479-1486

    Interconnection networks usually suffer from Little's Law: low cost implies low performance and high performance is obtained high cost. However, hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in communication patterns of massively parallel computers. In this paper, we propose a new hierarchical interconnection network, called Hierarchical Torus Network (HTN). This network reduces the number of vertical links in 3D stacked implementation while maintaining good network features. This paper addresses the architectural details of the HTN, and explores aspects such as the network diameter, average distance, bisection width, peak number of vertical links, and VLSI layout area of the HTN as well as for several commonly used networks for parallel computers. It is shown that the HTN possesses several attractive features including small diameter, small average distance, small number of wires, a particularly small number of vertical links, and economic layout area.

  • Self-Routing Nonblocking WDM Switches Based on Arrayed Waveguide Grating

    Yusuke FUKUSHIMA  Xiaohong JIANG  Achille PATTAVINA  Susumu HORIGUCHI  

     
    PAPER-Switching for Communications

      Vol:
    E92-B No:4
      Page(s):
    1173-1182

    Arrayed waveguide grating (AWG) is a promising technology for constructing high-speed large-capacity WDM switches, because it can switch fast, is scalable to large size and consumes little power. To take the full advantage of high-speed AWG, the routing control of a massive AWG-based switch should be as simple as possible. In this paper, we focus on the self-routing design of AWG-based switches with O(1) constant routing complexity and propose a novel construction of self-routing AWG switches that can guarantee the attractive nonblocking property for both the wavelength-to-wavelength and wavelength-to-fiber request models. We also fully analyze the proposed design in terms of its blocking property, hardware cost and crosstalk performance and compare it against traditional designs. It is expected that the proposed construction will be useful for the design and all-optical implementation of future ultra high-speed optical packet/burst switches.

  • Load Balancing Based on Load Coherence between Continuous Images for an Object-Space Parallel Ray-Tracing System

    Hiroaki KOBAYASHI  Hideyuki KUBOTA  Susumu HORIGUCHI  Tadao NAKAMURA  

     
    PAPER-Computer Systems

      Vol:
    E76-D No:12
      Page(s):
    1490-1499

    The ray-tracing algorithm can synthesize very realistic images. However, the ray tracing is very time consuming. To solve this problem, a load balancing strategy using temporal coherence between images in an animation is presented for balancing computational loads among processing elements of a parallel processng system. Our parallel processing model is based on a space subdivision method for the ray-tracing algorithm. A subdivided object space is distributed among processing elements of the parallel system. To clarify the effectiveness of the load balancing strategy, we examine the system performance by computer simulation.

  • A Nonblocking Optical Switching Network for Crosstalk-Free Permutation

    Xiaohong JIANG  Md. Mamun-ur-Rashid KHANDKER  Hong SHEN  Susumu HORIGUCHI  

     
    PAPER-Switching

      Vol:
    E86-B No:12
      Page(s):
    3580-3589

    Vertical stacking is a novel technique for building switching networks, and packing multiple compatible connections together is an effective strategy to reduce network hardware cost. In this paper, we study the crosstalk-free permutation capability of an optical switching network built on the vertical stacking of optical banyan networks to which packing strategy is applied. We first look into the nonblocking condition of this optical switching network. We then study the crosstalk-free permutation in this network by decomposing a permutation evenly into multiple crosstalk-free partial permutations (CFPPs) and realizing each CFPP in a stacked plane of the network such that a crosstalk-free permutation can be performed in a single pass. We present a rigorous proof of CFPP decomposability of a permutation and also a complete algorithm for CFPP decomposition. The possibility of a tradeoff between the number of passes and the number of planes required for realizing a crosstalk-free permutation in this network is also explored in this paper.

  • Parallel Molecular Dynamics in a Parallelizing SML Compiler

    Norman SCAIFE  Ryoko HAYASHI  Susumu HORIGUCHI  

     
    PAPER-Software Systems and Technologies

      Vol:
    E86-D No:9
      Page(s):
    1569-1576

    We have constructed a parallelizing compiler for Standard ML (SML) based upon algorithmic skeletons. We present an implementation of a Parallel Molecular Dynamics (PMD) simulation in order to compare our functional approach with a traditional imperative approach. Although we present performance data, the principal benefits from our approach are in the modularity of the code and the ease of programming. Extant FORTRAN90 code for an O(N 2) algorithm is translated, firstly into imperative SML and then into purely functional SML which is then parallelized. The ease of programming and the performance of the FORTRAN90 and SML code are compared. Modest parallel performance is obtained from the parallel SML but with a much slower sequential execution time compared to the FORTRAN90. We then improve the implementation with a ring topology implementation which gives much closer performance to the FORTRAN90 implementation.

  • Variant X-Tree Clock Distribution Network and Its Performance Evaluations

    Xu ZHANG  Xiaohong JIANG  Susumu HORIGUCHI  

     
    PAPER-Low-Power and High-Performance VLSI Circuit Technology

      Vol:
    E90-C No:10
      Page(s):
    1909-1918

    The evolution of VLSI chips towards larger die size, smaller feature size and faster clock speed makes the clock distribution an increasingly important issue. In this paper, we propose a new clock distribution network (CDN), namely Variant X-Tree, based on the idea of X-Architecture proposed recently for efficient wiring within VLSI chips. The Variant X-Tree CDN keeps the nice properties of equal-clock-path and symmetric structure of the typical H-Tree CDN, but results in both a lower maximal clock delay and a lower clock skew than its H-Tree counterpart, as verified by an extensive simulation study that incorporates simultaneously the effects of process variations and on-chip inductance. We also propose a closed-form statistical models for evaluating the skew and delay of the Variant X-Tree CDN. The comparison between the theoretical results and the simulation results indicates that the proposed statistical models can be used to efficiently and rapidly evaluate the performance of the variant X-Tree CDNs.

  • On the Multiple Bridge Fault Diagnosis of Baseline Multistage Interconnection Networks*

    Fabrizio LOMBARDI  Nohpill PARK  Susumu HORIGUCHI  

     
    PAPER-Fault Diagnosis/Tolerance

      Vol:
    E79-D No:8
      Page(s):
    1168-1179

    This paper proposes new algorithms for diagnosing (detection, identification and location) baseline multistage interconnection networks (MIN) as one of the basic units in a massively parallel system. This is accomplished in the presence of single and multiple faults under a new fault model. This model referred to as the geometric fault model, considers defective crossing connections which are located between adjacent stages, internally to the MIN (therefore, a fault corresponds to a physical bridge fault between two connections). It is shown that this type of fault affects the correct geometry of the network, thus requiring a different testing approach than previous methods. Initially, an algorithm which detects the presence of bridge faults (both in the single and multiple fault cases), is presented. For a single bridge fault, the proposed algorithm locates the fault except in an unique pathological case under which it is logically impossible to differentiate between two equivalent locations of the fault (however, the switching element affected by this fault is uniquely located). The proposed algorithm requires log2 N test vectors to diagnose the MIN as fault free (where N is the number of input lines to the MIN). For fully diagnosing a single bridge fault, this algorithm requires at most 2 log2 N tests and terminates when multiple bridge faults are detected. Subsequently, an algorithm which locates all bridge faults is given. The number of required test vectors is O(N). Fault location of each bridge fault is accomplished in terms of the two lines in the bridge and the numbers of the stages between which it occurs. Illustrative examples are given.

21-40hit(44hit)