Byeonghag SEONG Donggook KIM Yangwoo ROH Kyuho PARK Daeyeon PARK
Shared memory multiprocessors in which each processor has its own TLB must manage consistency among TLBs and a page table. As the large-scale CC-NUMA (cache-coherent non-uniform memory access) shared memory multiprocessors become popular, it is important for TLB consistency management algorithms to be highly scalable. In this paper, we propose a TLB update-hint algorithm as a scalable TLB consistency management solution for CC-NUMA multiprocessors. By using a lazy TLB invalidation approach, we reduced the number of unnecessary processor interruptions and idle-waiting time, and achieved a high level of scalability. Using a shared memory simulator, we evaluated the TLB update-hint algorithm. For performance comparison, we also simulated the TLB shootdown algorithm, one of the most popular TLB consistency algorithms. The simulations demonstrated that the TLB update-hint algorithm scales well in systems with a large number of processors. At 64 node systems, the TLB update-hint algorithm shows 4787% better performance than the TLB shootdown algorithm.
Takaaki ARAKAWA Ken'ichi KAWANISHI Yoshikuni ONOZATO
In this paper, we consider a location management scheme using Limited Pointer forwarding from Commonly visited sites (LPC) strategy for Personal Communication Services (PCS) networks. The Commonly Visited Site (CVS) is defined as a site in which a mobile user is found with high probability. A feature of the strategy is that it skips updating location information of the mobile user, provided that the mobile user moves within its CVSs. Such a strategy is expected to significantly reduce the location update cost. We evaluate the location management cost of the LPC scheme by employing a Continuous-Time Markov Chain (CTMC) model. We show that the LPC scheme can reduce the location management cost of a highly mobile user who is found in its CVS with high probability.
Rodrigo Fernandes de MELLO Erico C. T. de MATTOS Luis Carlos TREVELIN Maria Stela Veludo de PAIVA Laurence T. YANG
The availability of a low cost hardware has increased the development of distributed systems, by making then more and more accessible. In order to optimize the resources allocation on the distributed systems, some load balancing algorithms have been proposed. These algorithms distribute the application loads over the environment computers, make homogeneous the occupation of the whole environment and increase the application performance. This equal distribution prevents certain computers to get overloaded, to the detriment of the idleness of the other ones. This article proposes and analyzes the TLBAGrid, a load balancing algorithm for Grid computing environments.
Hiroshi MATSUURA Hideo IMANAKA Kazumasa TAKAMI
The cost-effective provision of IP services requires multi-layered traffic engineering to obtain dynamic cooperation between IP and photonic layers. The effective control and management of generalized multi-protocol label-switching (GMPLS) networks is an essential part of this. Huge photonic capacities and the number of IP and photonic networks make it likely that enormous amounts of GMPLS network-related data will have to be managed in the near future. At the same time, routing burdens on individual GMPLS routers are critical because of the strong need for per-path quality of service (QoS). To solve these problems, we propose a hierarchically distributed network-management system (NMS) in which we flexibly allocate a GMPLS subnetwork to each sub-NMS and at the same time conduct QoS routing. The distributed nature of our architecture reduces the burden on the NMS as a whole and also lets us remove the routing-burden from GMPLS routers with minimum effect on management processes.
Shinichi FURUSHO Teruaki KITASUKA Tsuneo NAKANISHI Akira FUKUDA
In ad-hoc on-demand routing algorithm, when a route is broken a relay node must perform error transaction and the source node must do rerouting to discover an alternate route. It is important to construct a stable route when route discovery occurs. In this paper, we use relative speeds among nodes as a measure of node mobility. Our routing algorithm chooses nodes of lower relative speed as relay nodes. As a result of our simulation, when there is one session in the network, our proposing algorithm can reduce the number of route breaks: about 3 times smaller than DSR. And our proposing algorithm can deliver more packets than DSR: 18% higher rate. However, in the congested traffic situation our algorithm should be improved.
Integration of the MPLS network and the optical mesh network is a promising approach to realize an efficient backbone network. Because large volumes of traffic incur damage from failure, survivability is important in the backbone network. In the MPLS over optical networks, a pair of primary LSP (Label Switched Path) and secondary LSP needs to be established on two optical link-disjoint routes assuming all single optical link failures. However, two link-disjoint routes in the MPLS layer may not correspond to two link-disjoint routes in the optical layer. Thus, a pair of primary and secondary LSPs should be routed considering link-disjointness in the optical layer. In the MPLS over optical networks, secondary LSPs can mutually share lightpath bandwidth if those secondary LSPs correspond to the primary LSPs that never fail simultaneously. Thus, routing of secondary LSPs should promote sharing of the lightpath bandwidth among the secondary LSPs. The primary and secondary LSPs with variable bandwidths should efficiently be packed into fewer lightpaths with a fixed bandwidth. Moreover, if all the LSPs accommodated in a lightpath can be re-routed to other lightpaths, this lightpath can then be released. By re-routing only secondary LSPs, unnecessary lightpaths may be released without disturbance of the conveyed traffic. This paper proposes an efficient routing scheme to establish primary and secondary LSPs with variable bandwidths through the MPLS over optical network. This routing scheme satisfies the above conditions. The bandwidth of each lightpath is efficiently utilized by this routing scheme, and the loss rate of LSP requests can be reduced. This paper also proposes an efficient re-routing scheme to remove secondary LSPs from selected lightpaths through which the efficiency of channel utilization in the optical links is increased, and the loss rate of LSP requests can be reduced as a result. Both the proposed routing and re-routing schemes are quantitatively evaluated and the effectiveness of those schemes is verified by computer simulation.
In this paper we present a component approach for configurable network processing for active documents. The approach has two key ideas. The first is to enable documents to process themselves on networks. That is, documents can define their own itineraries, like the notion of active packets in active network technology. The second is to enable documents to transmit other documents to their destinations as first-class objects, such as the notion of active nodes in active netwwork technology. The approach also enables buidling and managing active documents as compound documents. The dynamic deployment of network processing for exchanging documents can be defined and achieved by means of GUI-based manipulation of compound documents. Therefore, the approach allows a user to easily and rapidly develop and customize network processing in the same way as if that user had edited the documents. A prototype implementation of the approach and its applications were constructed on a Java-based mobile agent system to evaluate the effectiveness of the approach.
Shin-ichi WAKABAYASHI Asako BABA Hitomi MORIYA Xiaomin WANG Tatsushi HASEGAWA Akira SUZUKI
We have developed the tunable dispersion compensator based on two twin linearly chirped fiber Bragg gratings with various temperature gradients. Controlling the temperature gradient over one of the twin fiber Bragg gratings by Peltier elements, the dispersion and the dispersion slope were changed independently and continuously. The dispersion and dispersion slope compensator has a large bandwidth of 8 nm and low group-delay ripple of < 4 ps in its chirped fiber Bragg gratings. We experimentally demonstrated a precise controllability of the dispersion and the dispersion slope using linear and parabolic temperature gradient. The dispersion and the dispersion slope changes were achieved continuously with -0.67 ps/nm/ and -0.14 ps/nm2/. The transmission characteristics of the dispersion slope compensation were examined using ultra short pulses in the fiber link. When the total dispersion was zero, the distorted pulse was restored back and the tail was significantly suppressed. 160 Gbit/s signals were also demonstrated over 140 km within 1 dB power penalty by using the dispersion slope compensator.
Weijia JIA Bo HAN Pui On AU Yong HE Wanlei ZHOU
Cluster computation has been used in the applications that demand performance, reliability, and availability, such as cluster server groups, large-scale scientific computations, distributed databases, distributed media-on-demand servers and search engines etc. In those applications, multicast can play the vital roles for the information dissemination among groups of servers and users. This paper proposes a set of novel efficient fault-tolerant multicast routing algorithms on hypercube interconnection of cluster computers using multicast shared tree approach. We present some new algorithms for selecting an optimal core (root) and constructing the shared tree so as to minimize the average delay for multicast messages. Simulation results indicate that our algorithms are efficient in the senses of short end-to-end average delay, load balance and less resource utilizations over hypercube cluster interconnection networks.
This paper proposes several novel hierarchical interconnection networks based on the (3, 3)-graphs, namely folded (3, 3)-networks, root-folded (3, 3)-networks, recursively expanded (3, 3)-networks, and flooded (3, 3)-networks. Just as the hypercubes, CCC, Peterson-based networks, and Heawood-based networks, these hierarchical networks have the following nice properties: regular topology, high scalability, and small diameters. Due to these important properties, these hierarchical networks seem to have the potential as alternatives for the future interconnection structures of multicomputer systems, especially massively parallel processors (MPPs). Furthermore, this paper will present the routing and broadcasting algorithms for these proposed networks to demonstrate that these algorithms are as elegant as the algorithms for hypercubes, CCC, and Petersen- or Heawood-based networks.
Nader MOHAMED Jameela AL-JAROODI Hong JIANG
High performance scientific and engineering applications running on clusters have different communication requirements. Current cluster configurations typically provide multiple network interfaces per node and multiple interconnections among nodes. However, transport protocols such as TCP do not utilize existing multiple network interfaces to enhance communication performance. This paper introduces a new configurable communication model utilizing multiple interconnections. The model adds mechanisms to manage and enhance the overall communication performance of clusters. These configurations include the use of parallel message transfers, the separation of the transfer channels between small messages and large messages, and load balancing among the channels. The main advantages of the model are: (1) providing a flexible, enhanced network infrastructure, (2) hiding the technical details of the heterogeneous network resources from the applications, and (3) providing an easy and flexible way to extend the network capacities for specific nodes. To illustrate the advantages and performance enhancements of the model, a prototype was implemented to experimentally evaluate the cluster network performance, which showed considerable gains.
The ATM Forum recommends the use of the Generic Cell Rate Algorithm (GCRA) to perform Usage Parameter Control at the User Network Interface of ATM networks. In order to facilitate the Call Admission Control and resource allocation procedure, it is important to investigate the characteristics of the model in which GCRA-enforced sources are merged together by a multiplexer. Such a multiplexer could be the one arranged in front of a switch to concentrate user traffic and reduce the number of required input ports. It may also represent the logical multiplexer at the output port of a switch that collects cells routed from various input ports. Moreover, it may represent the service function of the edge router situated between the integrated-services (IntServ) networks and the backbone networks that provide differentiated-services (DiffServ). In this paper, the environment under discussion is a multiplexer in which every traffic source is enforced by a dual-stage GCRA enforcer before being merged. The worst traffic pattern that maximizes the average waiting time in the multiplexer is found. The maximum average waiting time is deduced and expressed as a function of the GCRA parameters and the number of multiplexed sources. In particular, the analysis considers the speed-up function, which is widely used for ATM multiplexers and switches. The results can also be applied to a GCRA shaper without any modification.
Jacobo TARRIO Juan TOURIÑO María J. MARTIN Patricia GONZALEZ Ramon DOALLO
Grid computing can help to promote high-performance computing at a low overall cost by encouraging research centers to share their resources. However, research staff usually finds it quite hard to use Grids effectively, due to the need of installing and managing new Grid software. Thus, Grid portals are created, making it easier to take advantage of the full capability of the Grid, favoring in this way its use. The goal of this paper is to describe the process of design and implementation of a Grid Portal with the aim of both supporting distributed high-performance resources and make its use by researchers as transparent as possible. This portal uses standard Grid and Web technologies. We have designed the portal so that it can be adapted to different existing Grid infrastructures, based on the Globus Toolkit, and new functionalities can be easily added. The first prototype of the portal has been tested on an experimental Grid platform, and we present encouraging experiences carried out there.
M. M. Hafizur RAHMAN Susumu HORIGUCHI
Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Hierarchical Torus Network (HTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 3D-torus networks that are hierarchically interconnected for higher level networks. The static network performance of the HTN has already been studied and has been shown to be good. Dynamic communication performance has been evaluated under uniform traffic pattern but not under non-uniform traffic patterns. In this paper, we present a deadlock-free routing algorithm for the HTN using 3 virtual channels and evaluate the network's dynamic communication performance under three non-uniform traffic patterns, using the proposed routing algorithm. We evaluate the dynamic communication performance of HTN, H3D-mesh, H3D-torus, TESH, and mesh networks by computer simulation. We find that the dynamic communication performance of HTN is better than that of the H3D-mesh, H3D-torus, TESH, and mesh networks.
Michael (Shan-Hui) HO Weng-Long CHANG Minyi GUO Laurence T. YANG
This paper shows how to use sticker to construct solution space of DNA for the library sequences in the set-packing problem and the clique problem. Then, with biological operations, we propose DNA-based algorithms to remove illegal solutions and to find legal solutions for the set-packing and clique problems from the solution space of sticker. Any NP-complete problem in Cook's Theorem can be reduced and solved by the proposed DNA-based computing approach if its size is equal to or less than that of the set-packing problem. Otherwise, Cook's Theorem is incorrect on DNA-based computing and a new DNA algorithm should be developed from the characteristics of the NP-complete problem. Finally, the result to DNA simulation is given.
Namyoon WOO Hyungsoo JUNG Heon Young YEOM Taesoon PARK Hyungwoo PARK
Fault-tolerance is an essential feature of the distributed systems where the possibility of a failure increases with the growth of the system. In spite of extensive researches over two decades, fault-tolerance systems have not succeeded in practical use. It is due to the high overhead and the unhandiness of the previous fault-tolerance systems. In this paper, we propose MPICH-GF, a user-transparent checkpointing system for grid-enabled MPICH. Our objectives are to fill the gap between the theory and the practice of fault-tolerance systems, and to provide a checkpointing-recovery system for grids. To build a fault-tolerant MPICH version, we have designed task migration, dynamic process management, and atomic message transfer. MPICH-GF requires no modification of application source codes, and it affects the MPICH communication characteristics as less as possible. The features of MPICH-GF are that it supports the direct message transfer mode and that all of the implementation has been done at the lower layer, that is, the abstract device level. We have evaluated MPICH-GF using NPB applications on Globus middleware.
Biplab KUMER SARKER Anil KUMAR TRIPATHI Deo PRAKASH VIDYARTHI Laurence T. YANG Kuniaki UEHARA
In a Distributed Computing Systems (DCS) tasks submitted to it, are usually partitioned into different modules and these modules may be allocated to different processing nodes so as to achieve minimum turn around time of the tasks utilizing the maximum resources of the existing system such as CPU speed, memory capacities etc. The problem lies on how to obtain the optimal allocation of these multiple tasks by keeping in mind that no processing node is overloaded due to this allocation. This paper proposes an algorithm A*RS, using well-known A*, which aims to reduce the search space and time for task allocation. It aims at minimization of turn around time of tasks in the way so that processing nodes do not become overloaded due to this allocation. Our experimental results justify the claims with necessary supports by comparing it with the earlier algorithm for multiple tasks allocation.
Tran CONG SO Shigeru OYANAGI Katsuhiro YAMAZAKI
We have proposed a speculative selection function for adaptive routing, which uses idle cycles of the network physical links to exchange network information between nodes, thus helps to decide the best selection. Previous study on the mesh network showed that SSR gives message selection flexibility that improves network performance by balancing the network traffic in both global and local scopes. This paper evaluates the speculative selection function on 2D torus network with simulation. The simulation compares the network throughput and latency with various traffic patterns. The visualization graphs show how the speculative selection eliminates hotspots and disperses traffic in the global scope. The simulation results demonstrate that by using speculative selection, the network performance is increased by around 7%. Compared to the mesh network, the torus's version has smaller gain due to the high performance nature of the torus network.
Riaz INAYAT Reiji AIBARA Kouji NISHIMURA Takahiro FUJITA Kaori MAEDA
This paper presents a network architecture with a dual interface IP handoff technique that allows smooth node mobility without using any intermediate proxy. The proposed architecture is suitable for low bit-rate time sensitive real time applications, where payload tends to be short and packet header overhead is particularly significant. Connections are established as per permanent addresses of the nodes but are carried on by the IP layer according to the temporary addresses by address translation within the end hosts. The mapping information is maintained by database servers, which can be placed in the Internet in a distributed manner. We describe the architecture and show its mobile capabilities by prototype implementation and performance evaluation. Furthermore a dual-interface handoff suitable to the proposed architecture is also introduced. Preliminary results show that the proposed architecture has significantly low overheads. It is compatible with the existing infrastructure and works fine in both IPv4 and IPv6 environments. Analysis also shows that with dual-interface handoff it is possible to achieve seamless handoff without any packet loss by exploiting overlapping coverage area and speed of the mobile node. Handoff latency is reduced significantly as compare to MIPv6. We believe that with more powerful network interface card drivers our concept of dual interface handoff can be realized.
Yiyuan GONG Morikazu NAKAMURA Takashi MATSUMURA Kenji ONAGA
In this paper we propose a parallel and distributed computation of genetic local search with irregular topology in distributed environments. The scheme we propose in this paper is implemented with a tree topology established on an irregular network where each computing element carries out genetic local search on its own chromosome set and communicates with its parent when the best solution of each generation is updated. We evaluate the proposed algorithm by a simulation system implemented on a PC-cluster. We test our algorithm on four types topologies: star, line, balanced binary tree and sided binary tree, and investigate the influence of communication topology and delay on the evolution process.