
Keyword Search Result

[Keyword] failure detection (6 hits)

1-6 of 6 hits
  • Proactive Failure Detection Learning Generation Patterns of Large-Scale Network Logs

    Tatsuaki KIMURA  Akio WATANABE  Tsuyoshi TOYONO  Keisuke ISHIBASHI  

     
    PAPER-Network Management/Operation

      Publicized:
    2018/08/13
      Vol:
    E102-B No:2
      Page(s):
    306-316

    Recent carrier-grade networks use many network elements (switches, routers) and servers for various network-based services (e.g., video on demand, online gaming) that demand higher quality and better reliability. Network log data generated from these elements, such as router syslogs, are rich sources for quickly detecting the signs of critical failures to maintain service quality. However, log data contain a large number of text messages written in an unstructured format and contain various types of network events (e.g., operator's login, link down); thus, genuinely important log messages for network operation are difficult to find automatically. We propose a proactive failure-detection system for large-scale networks. It automatically finds abnormal patterns of log messages from a massive amount of data without requiring previous knowledge of data formats used and can detect critical failures before they occur. To handle unstructured log messages, the system has an online log-template-extraction part for automatically extracting the format of a log message. After template extraction, the system associates critical failures with the log data that appeared before them on the basis of supervised machine learning. By associating each log message with a log template, we can characterize the generation patterns of log messages, such as burstiness, not just the keywords in log messages (e.g. ERROR, FAIL). We used real log data collected from a large production network to validate our system and evaluated the system in detecting signs of actual failures of network equipment through a case study.
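
The idea of mapping each log message to a template and then looking at how often each template fires can be illustrated with a minimal sketch. It is not the paper's online template-extraction algorithm: the masking rule `_VARIABLE`, the helper names `to_template` and `template_counts`, and the 60-second window are all assumptions chosen for illustration. The sketch only replaces numeric tokens with a wildcard and counts messages per template per time window, which is one simple way to expose bursts.

```python
import re
from collections import Counter

# Assumed masking rule (hypothetical): treat runs of digits, optionally joined
# by '.', '/' or ':', as variable fields. The paper learns templates online;
# this regex is only a stand-in for that step.
_VARIABLE = re.compile(r"\b\d+(?:[./:]\d+)*\b")


def to_template(message: str) -> str:
    """Replace variable-looking tokens with '*' to approximate a log template."""
    return _VARIABLE.sub("*", message)


def template_counts(logs, window_seconds=60):
    """Count messages per (time window, template).

    `logs` is an iterable of (timestamp_in_seconds, message) pairs.
    A large count for one key is a crude indicator of burstiness.
    """
    counts = Counter()
    for timestamp, message in logs:
        window = int(timestamp // window_seconds)
        counts[(window, to_template(message))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        (0, "Interface ge-0/0/1 link down"),
        (10, "Interface ge-0/0/2 link down"),
        (70, "login from 10.0.0.5"),
    ]
    for key, n in sorted(template_counts(sample).items()):
        print(key, n)
```

Per-window counts like these could serve as input features for a supervised classifier, which is the role the abstract assigns to the generation patterns of log messages.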

  • A Novel Failure Detection Circuit for SUMPLE Using Variability Index

    Leiou WANG  Donghui WANG  Chengpeng HAO  

     
    BRIEF PAPER-Electronic Circuits

      Vol:
    E101-C No:2
      Page(s):
    139-142

    SUMPLE is one of the important signal-combining approaches, but its combining loss increases when a sensor in an array fails. A novel failure detection circuit for SUMPLE that uses a variability index is proposed. This circuit can effectively judge whether a sensor has failed. Simulation results validate its effectiveness with respect to the existing algorithms.

  • Failure Detection in P2P-Grid System

    Huan WANG  Hidenori NAKAZATO  

     
    PAPER-Grid System

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2123-2131

    Peer-to-peer (P2P)-Grid systems are being investigated as a platform for converging the Grid and P2P networks in the construction of large-scale distributed applications. The highly dynamic nature of P2P-Grid systems greatly affects the execution of distributed programs. Uncertainty caused by arbitrary node failure and departure significantly affects the availability of computing resources and system performance. Checkpoint-and-restart is the most common scheme for fault tolerance because it periodically saves the execution progress onto stable storage. In this paper, we suggest a checkpoint-and-restart mechanism as a fault-tolerant method for applications on P2P-Grid systems. A failure detection mechanism is a necessary prerequisite to fault tolerance and fault recovery in general. Given the highly dynamic nature of nodes within P2P-Grid systems, any failure should be detected to ensure effective task execution. Therefore, we studied the failure detection mechanism as an integral part of P2P-Grid systems. We discuss how the design of various failure detection algorithms affects their performance in terms of the average failure detection time of nodes. Numerical analysis results and an implementation evaluation are also provided to show the different average failure detection times in real systems for various failure detection algorithms. The comparison shows the shortest average failure detection time, by 8.8 s, on the basis of the WP failure detector. Our lowest mean time to recovery (MTTR) also shows a distinct advantage, with a reduction in time consumption of about 5.5 s over its counterparts.
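
For readers unfamiliar with failure detectors, a generic heartbeat-timeout detector can be sketched as follows. This is not the WP failure detector or any specific algorithm evaluated in the paper: the class name `HeartbeatDetector`, its methods, and the fixed timeout are hypothetical choices made for illustration. It shows why the timeout setting drives detection time: a node is suspected only once its last heartbeat is older than the timeout.

```python
class HeartbeatDetector:
    """Minimal heartbeat-timeout failure detector (illustrative sketch only)."""

    def __init__(self, timeout):
        # Seconds of silence after which a node is suspected (assumed fixed).
        self.timeout = timeout
        # Maps node id -> time of the most recent heartbeat.
        self.last_seen = {}

    def heartbeat(self, node, now):
        """Record that `node` sent a heartbeat at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now):
        """Return the sorted list of nodes whose last heartbeat is too old."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    detector = HeartbeatDetector(timeout=3.0)
    detector.heartbeat("node-a", now=0.0)
    detector.heartbeat("node-b", now=0.0)
    detector.heartbeat("node-a", now=2.0)  # node-b stays silent
    print(detector.suspected(now=4.0))
```

With this scheme a shorter timeout detects failures sooner but raises the risk of suspecting a slow, live node, which is the kind of trade-off that different detector designs handle differently.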

  • Bayesian Optimal Release Time Based on Inflection S-Shaped Software Reliability Growth Model

    Hee Soo KIM  Dong Ho PARK  Shigeru YAMADA  

     
    PAPER-Reliability, Maintainability and Safety Analysis

      Vol:
    E92-A No:6
      Page(s):
    1485-1493

    The inflection S-shaped software reliability growth model (SRGM) proposed by Ohba (1984) is one of the well-known SRGMs. This paper deals with the optimal software release problem with regard to the expected software cost under this model based on the Bayesian approach. To reflect the effect of the learning experience for the updated software system, we consider several improvement factors to adjust the values of parameters characterizing the inflection S-shaped SRGM. Appropriate prior distributions are assumed for such factors and the expected total software cost is formulated. The optimal release time is shown to be finite and uniquely determined. Because of the flexibility of prior distributions, the proposed Bayesian methods may be applied in many different situations. Numerical results are presented on the basis of real data.
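
As background, the mean value function of the inflection S-shaped SRGM is commonly written as m(t) = a(1 - e^(-bt)) / (1 + c e^(-bt)), where a is the expected total number of faults, b the fault detection rate, and c the inflection parameter. The sketch below evaluates only this standard form. The function name and the sample parameter values are assumptions, and the paper's improvement factors, prior distributions, and cost function are not reproduced here.

```python
import math


def inflection_s_shaped_mean(t, a, b, c):
    """Expected cumulative number of faults detected by time t.

    a: expected total number of faults (a > 0)
    b: fault detection rate (b > 0)
    c: inflection parameter (c >= 0); c = 0 reduces to the exponential model
    """
    decay = math.exp(-b * t)
    return a * (1.0 - decay) / (1.0 + c * decay)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the S-shaped curve.
    for t in (0, 5, 10, 20, 40):
        print(t, round(inflection_s_shaped_mean(t, a=100.0, b=0.2, c=5.0), 2))
```

The curve starts at zero, rises slowly at first when c is large, and approaches a as t grows, which is why a finite release time has to balance testing cost against the faults still expected to remain.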

  • Object Tracking with Target and Background Samples

    Chunsheng HUA  Haiyuan WU  Qian CHEN  Toshikazu WADA  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E90-D No:4
      Page(s):
    766-774

    In this paper, we present a general object tracking method based on a newly proposed pixel-wise clustering algorithm. Tracking an object in a cluttered environment is challenging because a target object may have a concave shape or apertures (e.g. a hand or a comb). In those cases, it is difficult to separate the target from the background completely by simply modifying the shape of the search area. Our algorithm solves the problem by 1) describing the target object by a set of pixels; 2) using a K-means based algorithm to detect all target pixels. To realize stable and reliable detection of target pixels, we first use a 5D feature vector to describe both the color ("Y, U, V") and the position ("x, y") of each pixel uniformly. This enables simultaneous adaptation to both the color and geometric features during tracking. Second, we use a variable ellipse model to describe the shape of the search area and to model the surrounding background. This guarantees stable object tracking under various geometric transformations. Robust tracking is realized by classifying the pixels within the search area into "target" and "background" groups with a K-means clustering based algorithm that uses the "positive" and "negative" samples. We also propose a method that can detect a tracking failure and recover from it during tracking by making use of both the "positive" and "negative" samples. This feature makes our method a more reliable tracking algorithm because it can rediscover the target after the target has been lost. Through extensive experiments under various environments and conditions, the effectiveness and efficiency of the proposed algorithm are confirmed.
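
The classification step described above can be illustrated with a small nearest-center sketch over 5D features (Y, U, V, x, y). It is a simplification under stated assumptions, not the authors' algorithm: the function names, the use of plain squared Euclidean distance without feature scaling, and the sample centers are all hypothetical, and the variable ellipse model and center updates are omitted.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_pixel(feature, target_centers, background_centers):
    """Label a 5D pixel feature (Y, U, V, x, y) as 'target' or 'background'.

    The pixel takes the label of whichever group owns the nearest center.
    Ties go to 'background' (an arbitrary choice for this sketch).
    """
    to_target = min(squared_distance(feature, c) for c in target_centers)
    to_background = min(squared_distance(feature, c) for c in background_centers)
    return "target" if to_target < to_background else "background"


if __name__ == "__main__":
    # Hypothetical "positive" (target) and "negative" (background) centers.
    targets = [(200, 120, 130, 50, 50)]
    backgrounds = [(30, 128, 128, 10, 10), (30, 128, 128, 90, 90)]
    print(classify_pixel((190, 118, 131, 52, 49), targets, backgrounds))
    print(classify_pixel((35, 127, 129, 12, 11), targets, backgrounds))
```

Because color and position sit in one vector, a pixel far from the target in the image can still be rejected even if its color matches, which is the benefit the abstract attributes to describing both uniformly.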

  • DRIC: Dependable Grid Computing Framework

    Hai JIN  Xuanhua SHI  Weizhong QIANG  Deqing ZOU  

     
    PAPER-Grid Computing

      Vol:
    E89-D No:2
      Page(s):
    612-623

    Grid computing presents a new trend in distributed and Internet computing to coordinate large-scale resource sharing and problem solving in dynamic, multi-institutional virtual organizations. Due to the diverse failures and error conditions in grid environments, developing, deploying, and executing applications over the grid is a challenge; thus, dependability is a key factor for grid computing. This paper presents a dependable grid computing framework, called DRIC, that provides an adaptive failure detection service and a policy-based failure handling mechanism. The failure detection service in DRIC adapts to users' QoS requirements and system conditions, and the failure-handling mechanism can be optimized by a policy engine using a decision-making method. The performance evaluation results show that this framework is scalable and highly efficient, with low overhead.