Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN
Yinan YANG
Myung-Hyun KIM Seungkwang LEE
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI
Martin LUKAC Saadat NURSULTAN Georgiy KRYLOV Oliver KESZOCZE Abilmansur RAKHMETTULAYEV Michitaka KAMEYAMA
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG
Liu ZHANG Zilong WANG Yindong CHEN
Wenxia Bao An Lin Hua Huang Xianjun Yang Hemu Chen
Fengshan ZHAO Qin LIU Takeshi IKENAGA
Haruhiko KAIYA Shinpei OGATA Shinpei HAYASHI
Jiakai LI Jianyong DUAN Hao WANG Li HE Qing ZHANG
Yuxin HUANG Yuanlin YANG Enchang ZHU Yin LIANG Yantuan XIAN
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI
Na XING Lu LI Ye ZHANG Shiyi YANG
Zhe Wang Zhe-Ming Lu Hao Luo Yang-Ming Zheng
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO
Tomohiro KOBAYASHI Tomomi MATSUI
Shin-ichi NAKANO
Hongzhi XU Binlian ZHANG
Weizhi WANG Lei XIA Zhuo ZHANG Xiankai MENG
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA
Rinka KAWANO Masaki KAWAMURA
Zhishuo ZHANG Chengxiang TAN Xueyan ZHAO Min YANG
Peng WANG Guifen CHEN Zhiyao SUN
Zeyuan JU Zhipeng LIU Yu GAO Haotian LI Qianhang DU Kota YOSHIKAWA Shangce GAO
Ji WU Ruoxi YU Kazuteru NAMBA
Hao WANG Yao Ma Jianyong Duan Li HE Xin Li
Shijie WANG Xuejiao HU Sheng LIU Ming LI Yang LI Sidan DU
Arata KANEKO Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO
Qi LIU Bo WANG Shihan TAN Shurong ZOU Wenyi GE
HanYu Zhang Tomoji Kishi
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER
Yoon Hak KIM
Takashi HIRAYAMA Rin SUZUKI Katsuhisa YAMANAKA Yasuaki NISHITANI
Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA
Batnasan Luvaanjalba Elaine Yi-Ling Wu
KuanChao CHU Satoshi YAMAZAKI Hideki NAKAYAMA
Shenglei LI Haoran LUO Tengfei SHAO Reiko HISHIYAMA
Yasushi YUMINAKA Kazuharu NAKAJIMA Yosuke IIJIMA
Chunbo Liu Liyin Wang Zhikai Zhang Chunmiao Xiang Zhaojun Gu Zhi Wang Shuang Wang
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG
Yuhao LIU Zhenzhong CHU Lifei WEI
Ken ASANO Masanori NATSUI Takahiro HANYU
Shuto HASEGAWA Koichiro ENOMOTO Taeko MIZUTANI Yuri OKANO Takenori TANAKA Osamu SAKAI
Zhewei XU Mizuho IWAIHARA
Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI
Taisei SAITO Kota ANDO Tetsuya ASAI
Shiyu YANG Tetsuya KANDA Daniel M. GERMAN Yoshiki HIGO
Tsutomu SASAO
Jiyeon LEE
Koichi MORIYAMA Akira OTSUKA
Hongliang FU Qianqian LI Huawei TAO Chunhua ZHU Yue XIE Ruxue GUO
Gao WANG Gaoli WANG Siwei SUN
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG
Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG
Tomoyasu NAKANO Masataka GOTO
Hyebong CHOI Joel SHIN Jeongho KIM Samuel YOON Hyeonmin PARK Hyejin CHO Jiyoung JUNG
Xianglong LI Yuan LI Jieyuan ZHANG Xinhai XU Donghong LIU
Haoran LUO Tengfei SHAO Shenglei LI Reiko HISHIYAMA
Chang SUN Yitong LIU Hongwen YANG
Ji XI Yue XIE Pengxu JIANG Wei JIANG
Ming PAN
Lei PAN Wenhui ZHANG Arthur ASUNCION Ming Kin LAI Michael B. DILLENCOURT Lubomir F. BIC Laurence T. YANG
The Navigational Programming (NavP) methodology is based on the principle of self-migrating computations. It is a truly incremental methodology for developing parallel programs: each step represents a functioning program, and each intermediate program is an improvement over its predecessor. The transformations are mechanical and straightforward to apply. We illustrate our methodology in the context of matrix multiplication, showing how the transformations lead from a sequential program to a fully parallel program. The NavP methodology is conducive to new ways of thinking that lead to ease of programming and high performance. Even though our parallel algorithm was derived using a sequence of mechanical transformations, it displays certain performance advantages over the classical handcrafted Gentleman's Algorithm.
Makoto ISHIHARA Hiroki HONDA Mitsuhisa SATO
iPat/OMP is an interactive parallelization assistance tool for OpenMP. In the present paper, we describe the design concept of iPat/OMP, the parallelization sequence achieved by the tool and its current implementation status. In addition, we present an evaluation of the performance of the implemented functionalities. The experimental results show that iPat/OMP can detect parallelism and create an appropriate OpenMP directive for several for-loops.
Gabriel RODRIGUEZ María J. MARTIN Patricia GONZALEZ Juan TOURIÑO
This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures through the use of portable protocols, portable checkpoint files and portable code. It works at variable level being user-directed, thus generating small checkpoint files. It allows parallel processes to checkpoint independently, without runtime coordination or message-logging. Consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to ease up user's effort. CPPC was designed to work with parallel MPI programs, though it can be used with sequential ones, and easily extended to parallel programs written using different message-passing libraries, due to its highly modular design. Experimental results are shown using CPPC with different test applications.
In parallelizing compilers on distributed memory systems, distributions of irregular sized array blocks are provided for load balancing and irregular problems. The irregular data redistribution is different from the regular block-cyclic redistribution. This paper is devoted to scheduling message for irregular data redistribution that attempt to obtain suboptimal solutions while satisfying the minimal communication costs condition and the minimal step condition. Based on the list scheduling, an efficient algorithm is developed and its experimental results are compared with previous algorithms. The improved list algorithm provides more chance for conflict messages in its relocation phase, since it allocates conflict messages through methods used in a divide-and-conquer algorithm and a relocation algorithm proposed previously. The method of selecting the smallest relocation cost guarantees that the improved list algorithm is more efficient than the other two in average.
This paper presents a newly implemented remote MPI-I/O mechanism using a Parallel Virtual File System (PVFS) to achieve high performance data-intensive I/O operations among computers. MPI-I/O extensions were realized in a flexible intermediate library named Stampi to support seamless MPI-I/O operations among computers. With the help of a flexible underlying communication mechanism of the library, MPI-I/O operations are available both inside a computer and among computers with the same MPI-I/O APIs without awareness of underlying communication and I/O mechanisms. To cope with recent data-intensive applications, PVFS was developed, and it provides a huge amount of parallel file system on a Linux PC cluster. Although it is available inside a PC cluster, it is not accessible from a remote computer. To exploit advantage of the PVFS file system in the remote MPI-I/O operations using Stampi, PVFS I/O functions have been implemented in the remote MPI-I/O mechanism. Through performance measurement of primitive MPI-I/O functions, sufficient performance has been achieved and effectiveness of the implementation has been confirmed.
Krishna KANT Amit SAHOO Nrupal JANI
Given the availability of high-speed Ethernet and HW based protocol offload, clustered systems using a commodity network fabric (e.g., TCP/IP over Ethernet) are expected to become more attractive for a range of e-business and data center applications. In this paper, we describe a comprehensive simulation to study the performance of clustered database systems using such a fabric. The simulation model currently supports both TCP and SCTP as the transport protocol and models an Oracle 9i like clustered DBMS running a TPC-C like workload. The model can be used to study a wide variety of issues regarding the performance of clustered DBMS systems including the impact of enhancements to network layers (transport, IP, MAC), QoS mechanisms or latency improvements, and cluster-wide power control issues.
Hashem Hashemi NAJAF-ABADI Hamid SARBAZI-AZAD
In this paper, routing properties of cube-based optoelectronic OTIS networks are explored. We show emulations of various cubical network topologies on their OTIS augmented variants, including the n-D grid networks, shuffle-exchange, and de Brujin networks. An analytical performance model for OTIS-cube networks is proposed. The model is validated by means of comparison with rigorously obtained simulation results. Using this model, the performance characteristics of the OTIS-hypercube network are evaluated in view of a number of different constraints. Moreover, we compare the performance characteristics of the OTIS-hypercube with that of equivalent fully-electronic networks under various implementation constraints.
JungYul CHOI JinSeek CHOI Minho KANG
In this paper, we develop an analytical model to evaluate the performance of optical burst switching (OBS) node with optical-level buffers for retransmission of blocked bursts. First, currently used burst blocking models and modelling of optical buffers at OBS nodes are shown to be inappropriate as a blocking model for retransmission in OBS nodes. Thus, we propose a new blocking model for the burst transmission mechanism with retransmission technique in an optical-level buffered OBS node. From the numerical analysis, we show the performance enhancement by applying optical buffers for retransmission of blocked bursts in terms of burst blocking probability and link utilization.
Hiroshi YAMAMOTO Kenji KAWAHARA Tetsuya TAKINE Yuji OIE
Recent improvements in the performance of end-computers and networks have made it feasible to construct a grid system over the Internet. A grid environment consists of many computers, each having a set of components and a distinct performance. These computers are shared among many users and managed in a distributed manner. Thus, it is important to focus on a situation in which the computers are used unevenly due to decentralized management by different task schedulers. In this study, which is a preliminary investigation of the performance of task allocation schemes employed in a decentralized environment, the average execution time of a long-lived task is analytically derived using the M/G/1-PS queue. Furthermore, assuming a more realistic condition, we evaluate the performance of some task allocation schemes adopted in the analysis, and clarify which scheme is applicable to a realistic grid environment.
Alex K. JONES Jiang ZHENG Ahmed AMER
The performance of parallel computing applications is highly dependent on the efficiency of the underlying communication operations. While often characterized as dynamic, these communication operations frequently exhibit spatial and temporal locality as well as regularity in structure. These characteristics can be exploited to improve communication performance if the correct prediction model is selected to a suitable communication topology. In this paper we describe an entropy based methodology for quantifying and evaluating the success of different prediction models on actual workloads drawn from representative parallel benchmarks. We evaluate two different prediction criteria and combinations thereof: (1) Messages are partitioned by source node. (2) Use of a first order context model. We also describe the threshold for predication designed to largely avoid incorrect predication overheads. Our results show for simple predication models, even on highly dynamic benchmark applications, predictability can be improved by several orders of magnitude. In fact, using simple prediction techniques, over 75% of the communication volume is accurately predictable.
Yuanyuan ZHANG Yasushi INOGUCHI
Efficient task scheduling is critical for achieving high performance in grid computing systems. Existing task scheduling algorithms for grid environments usually assume that the performance prediction for both tasks and resources is perfectly accurate. In practice, however, it is very difficult to achieve such an accurate prediction in a heterogeneous and dynamic grid environment. Therefore, the performance of a task scheduling algorithm may be significantly influenced by prediction inaccuracy. In this paper, we study the influence of inaccurate predictions on task scheduling in the contexts of task selection and processor selection, which are two critical phases in task scheduling algorithms. We develop formulas for the misprediction degree, which is defined as the probability that the predicted values for the performances of tasks and processors reveal different orders from their real values. Based on these formulas, we also investigate the effect of several key parameters on the misprediction degree. Finally, we conduct extensive simulation for the sensitivities of some existing task scheduling algorithms to the prediction errors.
Kuan-Rong LEE Mong-Fong HORNG Yau-Hwang KUO
A novel distributed dynamic regional location management scheme called MORR (Mobility Oriented Regional Registration) is proposed for Mobile IP to improve the signaling traffic cost of a mobile node. This improvement is achieved by adjusting each mobile node's optimal regional domains according to its mobility behavior. With Mobile IP, the capricious mobility and variable traffic load of a mobile node has an impact on its average signaling traffic cost. In this paper, the mobility of all mobile nodes is classified into two modes: random mobility mode and regular mobility mode. We develop new analytical models to formulate the movement behavior and mathematically evaluate their characteristics when applied to these two modes. MORR has the adaptability to manipulate various mobility modes of each mobile node in dedicated ways to determine an optimal regional domain of this mobile node. Simulation results show that anywhere from 3 to 15 percent of the signaling cost is saved by MORR in comparison with the previous distributed dynamic location management schemes for various scenarios.
Mobile applications require software reconfiguration to improve resource usage and availability. We propose a power-aware reconfiguration scheme that (1) moves energy-demanding applications to proxy servers, and (2) adjusts the fidelity of mobile applications as resources diminish. We formulate a cooperative reconfiguration plan which determines when, where, and which components should be deployed and have their fidelity controlled, so as to minimize the power consumption of mobile devices and to utilize the system resources of servers efficiently. We then construct a graph-theoretic model of the cost of migrating components to one proxy server or to a cluster of servers. In this model, changes to the residual energy of mobile devices, available server resources, and the wireless network bandwidth can all accelerate or decelerate the migration and fidelity control of applications. We suggest an approximation algorithm that achieves a near-optimal solution in terms of energy consumption. Our proposal will support mobile applications which require large amount of computation and need to maintain their services for an extended time such as video conferencing, multimedia e-mail, and real-time navigation. Simulation-based experiments verify that our scheme is an efficient way to extend the battery life of mobile devices and to improve the response time of mobile applications.
Hyung-Min YOON Woo-Shik KANG Oh-Young KWON Seong-Hun JEONG Bum-Seok KANG Tack-Don HAN
New service concepts involving mobile devices with a diverse range of embedded sensors are emerging that share contexts supporting communication on a wireless network infrastructure. To promote these services in mobile devices, we propose a method that can efficiently detect a context provider by partitioning the location, time, speed, and discovery sensitivities.
Ying-Hong WANG Chih-Chieh CHUANG Chih-Feng CHAO
A mobile ad hoc network (MANET) is a self-organizing and adaptive wireless network constructed by the dynamic gathering of mobile nodes (MNs). The communication among MNs in MANETs is carried out without base stations or access points and the transmission of data packets is completed through relays among nodes. Due to the mobility of MNs, the topology of a MANET frequently changes and thus results in the disability of on-the-fly data transmission routes. To cope with the intrinsic properties of MANETs, Dynamic Backup Routes Routing Protocol (DBR2P), a distributed backup routes mechanism for quick reconnection during link failures, is proposed in this paper. DBR2P is an on-demand routing protocol which can set up many backup routes to reach a destination node in a given period of time. The information of backup routes can be saved in a specific on-the-route node and enables backup routes to be found immediately in situation regarding disconnection. When a link fails, routes from the source node to the destination node are analyzed to obtain backup routes and to sustain quick reconnection. As a result, DBR2P could more thoroughly improve the quality of routing than previous routing protocols.
Young-Ching DENG Ching-Chi HSU Ferng-Ching LIN
An ad hoc network is formed by a group of mobile hosts communicating over wireless channels. There is no any fixed network interaction and centralized administration. Because a routing protocol needs an efficient medium access control (MAC) protocol to support, to design an efficient MAC protocol is important and fundamental in ad hoc networks. So far, no other MAC protocol has stable broadcast performance in the dense mobile ad hoc network. In this paper, we address the issue of reliable broadcast and stable performance at the MAC layer. We present a reliable and adaptive broadcast MAC protocol RAMAC which is a TDMA-based distributed MAC protocol for the broadcast reservation in mobile ad hoc networks. We divide the area into many grid cells with the support of GPS. We use the properties of grid cells to design an efficient protocol. RAMAC is characterized by five important features: (i) A dynamic frame size is generated in every contention. This dynamic frame size can let RAMAC adapt to the network load. (ii) Our well-designed reservation protocol can avoid the deadlock problem. (iii) When the network is dense, RAMAC can still work stably; however, no other MAC protocols can work well in the dense network. (iv) We propose a reservation protocol that can efficiently and fast reserve data slots. (v) The well-designed grid architecture makes the senders of unicast in a grid cell transmit concurrently as many as possible, so RAMAC is highly parallel in unicast.
Wireless sensor networks present a promising opportunity for realizing many practical applications. Tracking is one of the important applications of these networks. Many approaches have been proposed in the literature to deal with the tracking problem. Recently, a particular type of tracking problem called on-site tracking has been introduced [15],[16]. On-site tracking has been characterized as the tracking in which the sink is eventually required to be present in the vicinity of the target, possibly to perform further actions. In this paper, first we propose two efficient on-site tracking algorithms. Then, we derive theoretical upper bounds for the tracking time and the number of messages generated by the sensor nodes during the tracking for our algorithms. Finally, we present a simulation study that we conducted to evaluate the performance of our algorithms. The results show that our algorithms are efficient as compared to the other existing methods that can solve the on-site tracking problem. In particular, the path adaptive nature of the sink in our algorithms allows the network to conserve the energy and the sink to reduce the tracking time.
Feng HONG Minglu LI Minyou WU Jiadi YU
Routing efficiency is the critical issue when constructing peer-to-peer overlay. However, Chord has often been criticized on its careless of routing locality. A routing efficiency enhancement protocol on top of Chord is illustrated in this paper, which is called PChord. PChord aims to achieve better routing efficiency than Chord by exploiting proximity of the underlying network topology. The simulation shows that PChord has achieved lower RDP per message routing.
Shigeaki TAGASHIRA Syuhei SHIRAKAWA Satoshi FUJITA
Content-Addressable Network (CAN) provides a mechanism that could retrieve objects in a P2P network by maintaining indices to those objects in a fully decentralized manner. In the CAN system, index caching is a useful technique for reducing the response time of retrieving objects. The key points of effective caching techniques are to improve cache hit ratio by actively sharing caches distributed over the P2P network with every node and to reduce a maintenance and/or routing overhead for locating the cache of a requested index. In this paper, we propose a new caching technique based on the notion of proxy-type caching techniques which have been widely used in WWW systems. It can achieve active cache sharing by incorporating the concept of proxy caching into the index access mechanism and locate a closer proxy cache of a requested index with a little routing overhead. By the result of simulations, we conclude that it can improve the response time of retrieving indices by 30% compared with conventional caching techniques.
Flash bulk files downloading in style of P2P through perpendicular pattern becomes more popular recently. Many peers download different pieces of shared files from the source in parallel. They try to reconstruct complete files by exchanging needed pieces with other downloading peers. The throughput of entire downloading community, as well as the perceived downloading rate of each peer, greatly depends on uploading bandwidth contributed by every individual peer. Unfortunately, without proper built-in incentive mechanism, peers inherently tend to relentlessly download while intentionally limiting their uploading bandwidth. In this paper, we propose a both effective and efficient incentive approach--Reciprocity, which is only based on end-to-end measurement and reaction: a peer caps uploading rate to each of its peers at the rate that is proportional to its downloading rate from that one. It requires no centralized control, or electronic monetary payment, or certification. Preliminary experiments' results reveal that this approach offers favorable performance for cooperative peers, while effectively punishing defective ones.
Giscard WEPIWE Plamen L. SIMEONOV
The paper presents HiPeer, a robust resource distribution and discovery algorithm that can be used for fast and fault-tolerant location of resources in P2P network environments. HiPeer defines a concentric multi-ring overlay networking topology, whereon dynamic network management methods are deployed. In terms of performance, HiPeer delivers of number of lowest bounds. We demonstrate that for any De Bruijn digraph of degree d
Baoliu YE Minyi GUO Jingyang ZHOU Daoxu CHEN
A fundamental problem in a pure Peer-to-Peer (P2P) file sharing system is how to protect the anonymity of peer nodes when providing efficient data access services. Most of existing work mainly focus on how to provide the initiator anonymity, but neglect the anonymity of the responder. In this paper, we propose a multicast-based protocol, called Mapper, for efficient file sharing with mutual anonymity. By seamlessly combining the technologies of multi-proxy and IP multicast together, the proposed protocol guarantees mutual anonymity during the entire session of file retrieval. Furthermore, Mapper replicates requested files inside the multicast group, so file distribution can be adjusted adaptively and the cost for multicast can be further reduced. Results of both simulations and theoretical analyses demonstrate that Mapper possesses the merits of scalability, reliability, and high adaptability.
Genetic algorithms are a general problem-solving technique that has been widely used in computational biology. In this paper, we present a framework to map hierarchical parallel genetic algorithms for protein folding problems onto computational grids. By using this framework, the two level communication parts of hierarchical parallel genetic algorithms are separated. Thus both parts of the algorithm can evolve independently. This permits users to experiment with alternative communication models on different levels conveniently. The underlying programming techniques are based on generic programming, a programming technique suited for the generic representation of abstract concepts. This allows the framework to be built in a generic way at application level and thus provides good extensibility and flexibility. Experiments show that it can lead to significant runtime savings on PC clusters and computational grids.
Chuliang WENG Minglu LI Xinda LU
The computational grid provides a promising platform for the deployment of various high-performance computing applications. Problem in implementing computational grid environments is how to effectively use various resources in the system, such as CPU cycle, memory, communication network, and data storage. There are many effective heuristic algorithms for scheduling in the computational grid, however most scheduling strategies have no theoretical guarantee at all. In this paper, a cost-based online scheduling algorithm is presented for job assignment in the grid environment with theoretical guarantee. Firstly, a scheduling framework is described, where the grid environment is characterized, and the online job model is defined. Secondly, the cost-based online scheduling algorithm is presented where costs of resources are exponential functions of their loads, and the performance of this algorithm is theoretically analyzed against the performance of the optimal offline algorithm. Finally, we implement the algorithm in the grid simulation environment, and compare the performance of the presented algorithm with the other three algorithms, and experimental results indicate that the cost-based online scheduling algorithm can outperform the other three online algorithms.
The concept of grid computing emerged with the appearance of high-speed network. Effective grid worker (i.e., computing resource) selection mechanism is important to achieve reliable grid computing system since each worker participate in grid computing is heterogeneous. In this paper, we suggest a credible worker selection mechanism that maximizes grid computing performance by allocating appropriate tasks to each grid worker. Diverse workers can be used efficiently by grid applications through the ranking process of worker's credibility. Initially, the rank of each grid worker's credibility is decided considering static information only such as CPU speed, RAM size, storage capacity and network bandwidth. And then, the rank is refined by using dynamic information such as failure rate, turn around time provided after the task is completed, and correctness of the return value. In the experiments, we find that the proposed mechanism provides improved grid computing performance with high credibility.
Hai JIN Xuanhua SHI Weizhong QIANG Deqing ZOU
Grid computing presents a new trend to distributed and Internet computing to coordinate large scale resources sharing and problem solving in dynamic, multi-institutional virtual organizations. Due to the diverse failures and error conditions in the grid environments, developing, deploying, and executing applications over the grid is a challenge, thus dependability is a key factor for grid computing. This paper presents a dependable grid computing framework, called DRIC, to provide an adaptive failure detection service and a policy-based failure handling mechanism. The failure detection service in DRIC is adaptive to users' QoS requirements and system conditions, and the failure-handling mechanism can be set optimized based on decision-making method by a policy engine. The performance evaluation results show that this framework is scalable, high efficiency and low overhead.
Changes in recent business and scientific environment have created a necessity for more efficient and effective workflow infrastructure. With increasing emphasis on Service-oriented architecture, service composition becomes a hot topic in workflow research. This paper proposes a novel approach of using ECA rules to realize the workflow modeling and implementation for service composition. First of all, the concept and formalization of ECA rule-based workflow is presented. Special activities and data structures are customized for the purpose of service composition. Second, an automatic event composition and decomposition algorithm is developed to ensure the correctness and validness of service composition at design time. Finally, the proposed ECA rule-based approach for service composition is illustrated through the implementation of a workflow prototype system.
Real-time applications are indispensable for conducting research and business in government, industry, and academic organizations. Recently, real-time applications with security requirements increasingly emerged in large-scale distributed systems such as Grids. However, the complexities and specialties of diverse security mechanisms dissuade users from employing existing security services for their applications. To effectively tackle this problem, in this paper we propose a security middleware (SMW) model from which security-sensitive real-time applications are enabled to exploit a variety of security services to enhance the trustworthy executions of the applications. A quality of security control manager (QSCM), a centerpiece of the SMW model, has been designed and implemented to achieve a flexible trade-off between overheads caused by security services and system performance, especially under situations where available resources are dynamically changing and insufficient. A security-aware scheduling mechanism, which plays an important role in QSCM, is capable of maximizing quality of security for real-time applications running in distributed systems as large-scale as Grids. Our empirical studies based on real world traces from a supercomputing center demonstratively show that the proposed model can significantly improve the performance of Grids in terms of both security and schedulability.
FPGAs (Field-Programmable Gate Arrays) have been widely used as coprocessors to boost the performance of data-intensive applications [1],[2]. However, there are several challenges to further boost FPGA performance: the communication overhead between the host workstation and the FPGAs can be substantial; large-scale applications cannot fit in a single FPGA because of its limited capacity; mapping an application algorithm to FPGAs still remains a daunting job in configurable system design. To circumvent these problems, we propose in this paper the FPGA-based Hierarchical-SIMD (H-SIMD) machine with its codesign of the Pyramidal Instruction Set Architecture (PISA). PISA comprises high-level instructions implemented as FPGA functions of coarse-grain SIMD (Single-Instruction, Multiple-Data) tasks to facilitate ease of program development, code portability across different H-SIMD implementations and high performance. We assume a multi-FPGA board where each FPGA is configured as a separate SIMD machine. Multiple FPGA chips can work in unison at a higher SIMD level, if needed, controlled by the host. Additionally, by using a memory switching scheme and the high-level PISA to partition applications into coarse-grain tasks, host-FPGA communication overheads can be hidden. We enlist the two-dimensional Fast Fourier Transform (2D FFT) to test the effectiveness of H-SIMD. The test results show sustained high performance for this problem. The H-SIMD machine even outperforms a Xeon processor for this problem.
An algorithm is described for solving the node-to-set disjoint paths problem in bi-rotator graphs, which are obtained by making each edge of a rotator graph bi-directional. The algorithm is of polynomial order of n for an n-bi-rotator graph. It is based on recursion and divided into three cases according to the distribution of destination nodes in the classes into which the nodes in a bi-rotator graph are categorized. We estimated that it obtains 2n-3 disjoint paths with a time complexity of O(n5), that the sum of the path lengths is O(n3), and that the length of the maximum path is O(n2). Computer experiment showed that the average execution time was O(n3.9) and, the average sum of the path lengths was O(n3.0).
This paper proposes a set of novel distributed algorithms on m-D mesh overlay configurations for short delay and low resource consumption application layer multicast. In contrast to previous approaches, our application layer multicast adopts two-layer tree architecture and the novelty and contribution are: (1) cluster formation algorithm assigns the closest group members into the same cluster that greatly decreases the multicast delay and resource consumption caused by the message transmission among the members with long distances; (2) optimal core selection algorithm seeks the cluster member who has the minimum sum of static delay distances to other cluster members as the optimal cores (i.e. cluster cores) that guarantees the short multicast delay; (3) weighted shortest path tree generation algorithm constructs a shortest path tree rooted at the optimal core for each cluster. The shortest path tree utilizes the minimum sum of links that are on the shortest paths among the cluster members; and (4) distributed multicast routing algorithm directs the multicast messages to be efficiently distributed along the two-layer multicast architecture in parallel without a global control. The extended simulation results indicate that the application layer multicast constructed by our algorithms is efficient in terms of short multicast delay and low network resource consumption as compared with other well-known existing multicast solutions.
Popular Web sites form their Web servers into Web server clusters. The Web server cluster operates with a load-balancing algorithm to distribute Web requests evenly among Web servers. The load-balancing algorithms founded on conventional periodic load-information update mechanism are not scalable due to the synchronized update of load-information. We propose a load-balancing algorithm that the load-information update is not synchronized by exploiting variant execution times of executing scripts in dynamic Web pages. The load-information of each server is updated 'individually' by a new load-information update mechanism, and the proposed algorithm supports high scalability based on this individual update. Simulation results have proven the improvement in system performance through another aspect of high scalability. Furthermore, the proposed algorithm guarantees some level of QoS for Web clients by fairly distributing requests. A fundamental merit of the proposed algorithm is its simplicity, which supports higher throughput of the Web switch.
Sabin TABIRCA Tatiana TABIRCA Laurence T. YANG
The Feedback-Guided Dynamic Loop Scheduling (FGDLS) algorithm [1] is a recent dynamic approach to the scheduling of a parallel loop within a sequential outer loop. Earlier papers have analysed convergence under the assumption that the workload is a positive, continuous, function of a continuous argument (the iteration number). However, this assumption is unrealistic since it is known that the iteration number is a discrete variable. In this paper we extend the proof of convergence of the algorithm to the case where the iteration number is treated as a discrete variable. We are able to establish convergence of the FGDLS algorithm for the case when the workload is monotonically decreasing.
I-Shyan HWANG I-Feng HUANG Chih-Dar CHIEN David H. SU
This work proposes a distributed fault protection mechanism called the Dynamic-Shared Segment Protection (DSSP) algorithm for WDM (Wavelength Division Multiplexing) mesh networks. The objects are to assure high probability of path protection and efficient use of network resources. The proposed approach exploits the segment protection mode, which accommodates the characteristics of both path-based and link-based protections, for providing finer service granularities, to satisfy the versatile requirements of critical applications in the foreseeable future. To show that DSSP can improve performance efficiency, simulations are conducted using four networks (NSFNET, USANET, Mesh 6
Takeru INOUE Ryosuke KUREBAYASHI
In this paper, we examine the efficiency of tunneling techniques since they will accelerate multicast deployment. Our motivation is that, despite the many proposals focused on tunneling techniques, their impact on multicast efficiency has yet to be assessed sufficiently. First, the structure of multicast delivery trees is examined based on the seminal work of Phillips et al. [26]. We then quantitatively assess the impact of tunneling, such as loads imposed on the tunnel endpoints and redundant traffic. We also formulate a critical size of multicast island, above which the loads are suddenly diminished. Finally, a unique delivery tree model is introduced, which is so simple yet practical, to better understand the performance of the multicast-related protocols. This paper is the first to formulate the impact of tunneling.
Paola FLOCCHINI Antonio Mesa ENRIQUES Linda PAGLI Giuseppe PRENCIPE Nicola SANTORO
We consider the problem of computing the optimal swap edges of a shortest-path tree. This problem arises in designing systems that offer point-of-failure shortest-path rerouting service in presence of a single link failure: if the shortest path is not affected by the failed link, then the message will be delivered through that path; otherwise, the system will guarantee that, when the message reaches the node where the failure has occurred, the message will then be re-routed through the shortest detour to its destination. There exist highly efficient serial solutions for the problem, but unfortunately because of the structures they use, there is no known (nor foreseeable) efficient distributed implementation for them. A distributed protocol exists only for finding swap edges, not necessarily optimal ones. We present two simple and efficient distributed algorithms for computing the optimal swap edges of a shortest-path tree. One algorithm uses messages containing a constant amount of information, while the other is tailored for systems that allow long messages. The amount of data transferred by the protocols is the same and depends on the structure of the shortest-path spanning-tree; it is no more, and sometimes significantly less, than the cost of constructing the shortest-path tree.
This paper studies the problem of light splitter placement (LSP) and wavelength converter placement (WCP) in all-optical WDM networks to enable optimal provisioning of static and dynamic traffic through efficient photonic multicast connections. To solve the LSP-WCP problem under static traffic provisioning, an Integer Linear Programming model is formulated to achieve the optimal solution in the sense that the total number of wavelength channels required by the multicast requests is minimized. To solve the LSP-WCP problem under dynamic traffic provisioning, a complementary-combined LSP-WCP heuristic is proposed to minimize the multicast traffic blocking probability, and is proved through extensive simulations.
Chen YU Xiaohong JIANG Susumu HORIGUCHI
A combination of horizontal expansion and vertical stacking of optical Banyan (HVOB) is the general architecture for building Banyan-based optical cross-connects (OXCs), and the intrinsic crosstalk problem of optical signals is a major constraint in designing OXCs. In this paper, we analyze the blocking behavior of HVOB networks and develop the lower bound on blocking probability of a HVOB network that is free of first-order crosstalk in switching elements. The proposed lower-bound is significant because it provides network designers an effective tool to estimate the minimum blocking probability they can expect from a HVOB architecture regardless what kind of routing strategy to be adopted. Our lower bound can accurately depict the overall blocking behavior in terms of the minimum blocking probability in a HVOB network, as verified by extensive simulation based on a network simulator with both random routing and packing routing strategies. Surprisingly, the simulated and theoretical results show that our lower bound can be used to efficiently estimate the blocking probability of HVOB networks applying packing strategy. Thus, our analytical model can guide network designers to find the tradeoff among the number of planes (stacked copies), the number of SEs, the number of stages and blocking probability in a HVOB network applying packing strategy.
While the survivability becomes more and more important in WDM backbone network design, the signaling strategy corresponding to a protection/restoration scenario upon failures will have significant influence on the performance and then determine the integrity of the total solution. In this paper we will discuss the control mechanisms for several representative protection schemes, analyze their adaptation and application, and propose the corresponding signaling model and control protocol for a novel dynamic group protection strategy. The simulation results show that the control overhead of our proposed method outperforms the segmented protection and has the benefit on resource utilization and failure restoration speed.
Chao-Tung YANG Po-Chi SHIH Sung-Yi CHEN
Grid computing technologies enable large-scale aggregation and sharing of resources via wide-area networks. Grid technologies include elements such as security, job description, information gathering, scheduling, and resource dispatching, among others. In this paper, we address information gathering and focus on providing a domain-based model for network information measurement using Network Weather Service (NWS) on Grid computing environments.
Let G = (V,E) be an edge weighted graph with n vertices and m edges. For a given integer p with 1 < p < n, we call a set X
Toshiya MASHIMA Takanori FUKUOKA Satoshi TAOKA Toshimasa WATANABE
The 2-vertex-connectivity augmentation problem for a specified set of vertices of a graph with degree constraints, 2VCA-SV-DC, is defined as follows: "Given an undirected graph G = (V,E), a specified set of vertices S ⊆V with |S|
A chordal graph is a graph which contains no chordless cycle of at least four edges as an induced subgraph. The class of chordal graphs contains many famous graph classes such as trees, interval graphs, and split graphs, and is also a subclass of perfect graphs. In this paper, we address the problem of enumerating all labeled chordal graphs included in a given graph. We think of some variations of this problem. First we introduce an algorithm to enumerate all connected labeled chordal graphs in a complete graph of n vertices. Next, we extend the algorithm to an algorithm to enumerate all labeled chordal graphs in a n-vertices complete graph. Then, we show that we can use, with small changes, these algorithms to generate all (connected or not necessarily connected) labeled chordal graphs in arbitrary graph. All our algorithms are based on reverse search method, and time complexities to generate a chordal graph are O(1), and also O(1) delay. Additionally, we present an algorithm to generate every clique of a given chordal graph in constant time. Using these algorithms we obtain combinatorial Gray code like sequences for these graph structures in which the differences between two consecutive graphs are bounded by a constant size.
Tomokazu ARITA Kensei TSUCHIDA Takeo YAKU
Vigna and Ghezzi showed that the language of grid graphs could not be constructed by their context-free graph grammars [1]. In this paper, we construct a context-sensitive graph grammar for the two-dimensional grid graphs.
In this paper, we study the layout problem of the supercube by embedding-in-book technique. The supercube, a generalization of the hypercube, can be constructed for an arbitrary system size. Moreover, it has the same diameter and connectivity of the corresponding hypercube. Embedding a graph in book is to place nodes along the spine of the book and to draw the edges such that edges residing in a page do not cross. An n-dimensional hypercube is regular because the degree of each node is n. The corresponding supercube with N nodes, where 2n-1 < N
Jinhee CHUN Kunihiko SADAKANE Takeshi TOKUYAMA
In [5], the following pyramid construction problem was proposed: Given nonnegative valued functions ρ and µ in d variables, we consider the optimal pyramid maximizing the total parametric gain of ρ against µ. The pyramid can be considered as the optimal unimodal approximation of ρ relative to µ, and can be applied to hierarchical data segmentation. In this paper, we give efficient algorithms for a couple of two-dimensional pyramid construction problems.
In many parallel programs, run-time array redistribution is usually required to enhance data locality and reduce remote memory access on the distributed memory multicomputers. In general, array distribution can be classified into regular distribution and irregular distribution according to the distribution fashion. Many methods for performing regular array redistribution have been presented in the literature. However, for the heterogeneous computation environment, irregular array redistributions can be used to adjust data assignment at run-time. In this paper, an Essential Cycle Calculation method for unequal block sizes array redistribution is presented. In the ECC method, a processor first computes the source/destination processor/data sets of array elements in the first essential cycle of the local array it owns. From the source/destination processor/data sets of array elements in the first essential cycle, we can construct packing/unpacking pattern tables. Since each essential cycle has the same communication pattern, based on the packing/unpacking pattern tables, a processor can pack/unpack array elements efficiently. To evaluate the performance of the ECC method, we have implemented this method on an IBM SP2 parallel machine and compare it with the Sequence method. The cost models for these methods are also presented. The experimental results show that the ECC method greatly outperforms the Sequence method for all test samples.
Miki FUKUYAMA Masatoshi SHIMAKAGE Atsuo HAZEYAMA
In everyday life, a situation often occurs wherein two or more persons with different personal schedules must determine a single job schedule. The authors focus on the practical concept of rent and loan and propose a scheduling system. This system generates a schedule that automatically coordinates with a state involving minimum rent and loan. They also propose a method that employs the analytic network process (ANP) for setting individual priorities based on the rent and loan information. Furthermore, the authors implement the proposed system as a simulation system and verify whether it generates a fair schedule by computing the sum of the rent and loan of different individuals. The result shows that in comparison with human scheduling, the proposed method generates a fairer schedule by computing the rent and loan of each individual.
Kazuhiro KONDO Kiyoshi NAKAGAWA
We proposed and evaluated a speech packet loss concealment method which predicts lost segments from speech included in packets either before, or both before and after the lost packet. The lost segments are predicted recursively by using linear prediction both in the forward direction from the packet preceding the loss, and in the backward direction from the packet succeeding the lost segment. Predicted samples in each direction are smoothed by averaging using linear weights to obtain the final interpolated signal. The adjacent segments are also smoothed extensively to significantly reduce the speech quality discontinuity between the interpolated signal and the received speech signal. Subjective quality comparisons between the proposed method and the the packet loss concealment algorithm described in the ITU standard G.711 Appendix I showed similar scores up to about 10% packet loss. However, the proposed method showed higher scores above this loss rate, with Mean Opinion Score rating exceeding 2.4, even at an extremely high packet loss rate of 30%. Packet loss concealment of speech degraded with G.729 coding, and babble noise mixed speech showed similar trends, with the proposed method showing higher qualities at high loss rates. We plan to further improve the performance by using adaptive LPC prediction order depending on the estimated pitch, and adaptive LPC bandwidth expansion depending on the consecutive number of repetitive prediction, among many other improvements. We also plan to investigate complexity reduction using gradient LPC coefficient updates, and processing delay reduction using adaptive forward/bidirectional prediction modes depending on the measured packet loss ratio.
Masato OGATA Hiroyuki WADA Kagenori KAJIHARA Jeroen van BAAR
Multi-projector technology has been under consideration in recent years. This technology allows the generation of wide field of view and high-resolution images in a cost-effective manner. It is expected to be applied extensively to training simulators where vivid immersive sensations and precision are required. However, in many systems the viewing frustums cannot be automatically assigned for distributed rendering, and the required manual setup is complicated and difficult. This is because the camera should be coincide exactly with a desired eye point to avoid perspective distortions. For the actual applications, the camera is seldom able to be set up at the desired eye point because of physical constraints, e.g., a narrow cockpit with many instruments. To resolve this issue, we have developed a "virtual camera method" that yields high-precision calibration regardless of the camera position. This method takes advantage of the quadratic nature of the display surface. We developed a practical real-time multi-projector display system for applications such as training simulators, that require high-accuracy in geometry and rapid response time.
Xuan-Hieu PHAN Le-Minh NGUYEN Susumu HORIGUCHI
Cross-document personal name resolution is the process of identifying whether or not a common personal name mentioned in different documents refers to the same individual. Most previous approaches usually rely on lexical matching such as the occurrence of common words surrounding the entity name to measure the similarity between documents, and then clusters the documents according to their referents. In spite of certain successes, measuring similarity based on lexical comparison sometimes ignores important linguistic phenomena at the semantic level such as synonym or paraphrase. This paper presents a semantics-based approach to the resolution of personal name crossover documents that can make the most of both lexical evidences and semantic clues. In our method, the similarity values between documents are determined by estimating the semantic relatedness between words. Further, the semantic labels attached to sentences allow us to highlight the common personal facts that are potentially available among documents. An evaluation on three web datasets demonstrates that our method achieves the better performance than the previous work.
This paper investigates zero pronouns in Korean, especially focusing on the center transitions of adjacent utterances under the framework of Centering Theory. Four types of nominal entity (Epair, Einter, Eintra, and Enon) from Centering Theory are defined with the concept of inter-, intra-, and pairwise salience. For each entity type, a case study of zero phenomena is performed through analyzing corpus and building a pronominalization model. This study shows that the zero phenomena of entities which have been neglected in previous Centering works are explained via the center transition of the second previous utterance, and provides valuable results for pronominalization of such entities, such as p2-trans rule. We improve the accuracy of pronominalization model by optimal feature selection and show that our accuracy outperforms the accuracy of previous works.
Geometrical properties of the lifting-up technique in support vector machines (SVMs) are discussed here. In many applications, an SVM finds the optimal inhomogeneous separating hyperplane in terms of margins while some of the theoretical analyses on SVMs have treated only homogeneous hyperplanes for simplicity. Although they seem equivalent due to the so-called lifting-up technique, they differ in fact and the solution of the homogeneous SVM with lifting-up strongly depends on the parameter of lifting-up. It is also shown that the solution approaches that of the inhomogeneous SVM in the asymptotic case that the parameter goes to infinity.
Jungbo SON Hae-Wook CHOI Sin-Chong PARK
The increased complexity in system design has brought an explosive growth in functional verification time. Thus, many verification methodologies have been proposed to reduce it. One of them is the co-emulation method in which the hardware accelerator and software simulator work together. This paper presents an effective testbench architecture for accelerated verification and reuse of parts of the testbench in co-emulation. The testbench is divided into a synthesizable part which can be hardware accelerated and a non-synthesizable part which remains on the software simulator. The split blocks of the testbench can be reused in other test environments. Experiments with real world systems show that the proposed verification environment has over 31% higher performance than that of the conventional co-emulation environment.
Toru KUMAGAI Motoyuki AKAMATSU
This paper presents a method of predicting future human driving behavior under the condition that its resultant behavior and past observations are given. The proposed method makes use of a dynamic Bayesian network and the junction tree algorithm for probabilistic inference. The method is applied to behavior prediction for a vehicle assumed to stop at an intersection. Such a predictive system would facilitate warning and assistance to prevent dangerous activities, such as red-light violations, by allowing detection of a deviation from normal behavior.