Author Search Result

[Author] Masashi TOYODA (6 hits)

Results 1-6
  • Finding Neighbor Communities in the Web Using an Inter-Site Graph

    Yasuhito ASANO  Hiroshi IMAI  Masashi TOYODA  Masaru KITSUREGAWA  

    PAPER-Database
    Vol: E87-D  No: 9  Page(s): 2163-2170

    In this paper, we present Neighbor Community Finder (NCF, for short), a tool for finding Web communities related to given URLs. While existing link-based methods of finding communities, such as HITS, trawling, and Companion, use algorithms running on a Web graph whose vertices are pages and whose edges are links, NCF uses an algorithm running on an inter-site graph whose vertices are sites and whose edges are global-links (links between sites). Since the phrase "Web site" is used ambiguously in daily life and has no unique definition, NCF uses directory-based sites, proposed by the authors, as a model of Web sites. NCF receives URLs in which a user is interested and constructs an inter-site graph containing the neighbor sites of those URLs, identifying directory-based sites from URL and link data obtained from the actual Web on demand. Through computational experiments, we show that NCF finds pages related to given URLs with higher quality than Google's "Similar Pages" service, for URLs on various topics selected from the directories of Yahoo! Japan.
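
    As a rough illustration of the inter-site graph idea, the sketch below collapses page-level links into global-links between sites and collects the sites adjacent to a set of seed URLs. It assumes a deliberately crude site model (the host name, plus a leading `~user` directory when present) in place of the authors' directory-based site identification, which the abstract does not detail. `site_of`, `build_inter_site_graph`, and `neighbor_sites` are hypothetical names.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url):
    """Map a URL to a site key. Crude stand-in for directory-based sites (assumption)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    parts = [p for p in parsed.path.split("/") if p]
    # Treat a leading "/~user/" directory as its own site (hypothetical heuristic).
    if parts and parts[0].startswith("~"):
        return f"{host}/{parts[0]}"
    return host


def build_inter_site_graph(page_links):
    """Collapse page-level links (src_url, dst_url) into global-links between sites."""
    graph = defaultdict(set)
    for src, dst in page_links:
        s, d = site_of(src), site_of(dst)
        if s != d:  # intra-site links are not global-links, so drop them
            graph[s].add(d)
    return graph


def neighbor_sites(graph, seed_urls):
    """Sites one global-link away (in either direction) from the seeds' sites."""
    seeds = {site_of(u) for u in seed_urls}
    neighbors = set()
    for site, outs in graph.items():
        if site in seeds:
            neighbors |= outs  # sites the seeds link to
        if outs & seeds:
            neighbors.add(site)  # sites that link to a seed
    return neighbors - seeds


links = [
    ("http://a.example/~alice/p1.html", "http://b.example/index.html"),
    ("http://a.example/~alice/p2.html", "http://a.example/~alice/p1.html"),
    ("http://c.example/x.html", "http://a.example/~alice/"),
    ("http://a.example/~bob/q.html", "http://d.example/"),
]
g = build_inter_site_graph(links)
print(sorted(neighbor_sites(g, ["http://a.example/~alice/p1.html"])))
# ['b.example', 'c.example']
```

    The link between the two pages under `a.example/~alice` is dropped because it is intra-site. `a.example/~bob` is treated as a different site, so it is not reported as a neighbor.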

  • Compact Encoding of the Web Graph Exploiting Various Power Distributions

    Yasuhito ASANO  Tsuyoshi ITO  Hiroshi IMAI  Masashi TOYODA  Masaru KITSUREGAWA  

    LETTER
    Vol: E87-A  No: 5  Page(s): 1183-1184

    Compact encodings of the web graph are required to keep the graph in main memory and to perform operations on it efficiently. In this paper, we propose a new compact encoding of the web graph. It is 10% more compact than Link2, used in the Connectivity Server of AltaVista, and 20% more compact than the encoding proposed by Guillaume et al. in 2002. Its extraction time is comparable to that of Guillaume et al.'s encoding.
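
    The abstract does not describe the encoding itself. For orientation only, the sketch below shows a generic adjacency-list compression of the kind such work builds on. Each page's out-links are sorted, and the first target is stored relative to the source page. That first offset is zigzag-mapped because it may be negative, and the remaining targets are stored as gaps. Every number is written as a variable-length integer. This is not the authors' scheme, which additionally exploits power-law distributions, and the function names are hypothetical.

```python
def encode_varint(n, out):
    """Append non-negative int n as 7-bit groups, high bit set on all but the last byte."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def decode_varint(buf, pos):
    """Read one varint starting at buf[pos]; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_adjacency(src, targets):
    """Encode src's out-links: count, first target relative to src, then gaps."""
    out = bytearray()
    targets = sorted(targets)
    encode_varint(len(targets), out)
    prev = src
    for i, t in enumerate(targets):
        if i == 0:
            d = t - src
            encode_varint(2 * d if d >= 0 else -2 * d - 1, out)  # zigzag: sign in low bit
        else:
            encode_varint(t - prev, out)  # non-negative because targets are sorted
        prev = t
    return bytes(out)


def decode_adjacency(src, buf):
    """Invert encode_adjacency, returning the sorted target IDs."""
    n, pos = decode_varint(buf, 0)
    targets, prev = [], src
    for i in range(n):
        z, pos = decode_varint(buf, pos)
        if i == 0:
            cur = src + (z // 2 if z % 2 == 0 else -(z + 1) // 2)
        else:
            cur = prev + z
        targets.append(cur)
        prev = cur
    return targets


encoded = encode_adjacency(1000, [1003, 998, 1010, 50000])
print(len(encoded), decode_adjacency(1000, encoded))
# 7 [998, 1003, 1010, 50000]
```

    For source page 1000 with out-links 1003, 998, 1010, and 50000, the encoded list takes 7 bytes, compared with 16 bytes for four raw 32-bit IDs. Small gaps dominate when pages are numbered so that linked pages receive nearby IDs (for example, in URL order). This style of encoding relies on that locality.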

  • FOREWORD (Open Access)

    Masashi TOYODA  

    FOREWORD
    Vol: E101-D  No: 4  Page(s): 985

  • Detecting Hijacked Sites by Web Spammer Using Link-Based Algorithms

    Young-joo CHUNG  Masashi TOYODA  Masaru KITSUREGAWA  

    PAPER-Information Retrieval
    Vol: E93-D  No: 6  Page(s): 1414-1421

    In this paper, we propose a method for finding web sites whose links are hijacked by web spammers. A hijacked site is a trustworthy site that points to untrustworthy sites. To detect hijacked sites, we evaluate the trustworthiness of web sites and examine how trustworthy sites are hijacked by untrustworthy sites among their out-neighbors. Trustworthiness is evaluated as the difference between the white and spam scores calculated by two modified versions of PageRank. We define two hijacked scores that measure how likely a trustworthy site is to be hijacked, based on the distribution of trustworthiness among its out-neighbors. The performance of these hijacked scores is compared using our large-scale Japanese Web archive. The results show that the score considering both trustworthy and untrustworthy out-neighbors performs better than the one considering only untrustworthy out-neighbors.
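
    The scoring pipeline can be sketched as follows, under several assumptions. The abstract does not specify the paper's two modifications of PageRank, so both the white and spam scores are taken here to be personalized PageRank on the site graph, biased toward white and spam seed sites respectively. Trustworthiness is their difference. The two hijacked scores are hypothetical stand-ins: one sums the distrust of untrustworthy out-neighbors only, and the other measures the share of untrustworthy mass among all out-neighbors.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power iteration for PageRank whose random jumps go only to `seeds`."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs} | set(seeds)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * teleport[n] for n in nodes}
        for n in nodes:
            outs = graph.get(n, [])
            if outs:
                share = alpha * score[n] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # Dangling site: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += alpha * score[n] / len(seeds)
        score = nxt
    return score


def trustworthiness(graph, white_seeds, spam_seeds):
    """Assumed definition: white score minus spam score."""
    white = personalized_pagerank(graph, white_seeds)
    spam = personalized_pagerank(graph, spam_seeds)
    return {n: white.get(n, 0.0) - spam.get(n, 0.0) for n in white.keys() | spam.keys()}


def hijacked_scores(graph, trust):
    """Score trustworthy sites by their out-neighbors' trust (hypothetical formulas)."""
    result = {}
    for site, outs in graph.items():
        if trust[site] <= 0 or not outs:
            continue  # only trustworthy sites with out-links can be hijacked
        distrust = sum(-trust[v] for v in outs if trust[v] < 0)
        support = sum(trust[v] for v in outs if trust[v] > 0)
        total = distrust + support
        result[site] = {
            "untrusted_only": distrust,  # looks at untrustworthy out-neighbors only
            "both": distrust / total if total > 0 else 0.0,  # uses both kinds
        }
    return result


graph = {
    "w1": ["w2", "h"], "w2": ["w1", "h"],
    "h": ["w1", "spam1"],
    "spam1": ["spam2"], "spam2": ["spam1"],
}
trust = trustworthiness(graph, {"w1"}, {"spam1"})
scores = hijacked_scores(graph, trust)
print(max(scores, key=lambda s: scores[s]["both"]))  # h
```

    In this toy graph, site `h` is linked from the white sites but also links into the spam pair, so it receives the highest "both" score. Its white neighbors `w1` and `w2` score zero.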

  • Web Community Chart: A Tool for Navigating the Web and Observing Its Evolution

    Masashi TOYODA  Masaru KITSUREGAWA  

    PAPER-Databases
    Vol: E86-D  No: 6  Page(s): 1024-1031

    We propose a web community chart, a tool for navigating the Web and for observing its evolution through web communities. A web community is a set of web pages created by individuals or associations with a common interest in a topic. Recent research shows that such communities can be extracted by link analysis. Our web community chart is a graph of all communities, in which relevant communities are connected by edges. Using this chart, we can navigate through related communities. Moreover, we can answer historical queries about topics on the Web and understand the sociology of web community creation by observing when and how communities emerged and evolved. We observe the evolution of communities by comparing three charts built from Japanese web archives crawled in 1999, 2000, and 2001. Several metrics, such as growth rate and novelty, are introduced for measuring the degree of community evolution. Finally, we develop a web community evolution viewer that allows us to extract evolving communities using the relevance between communities and these metrics. Several evolution examples are shown using this viewer.
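
    The abstract names growth rate and novelty but does not define them. The sketch below therefore uses assumed, illustrative definitions. A community in a newer chart is matched to the older community that shares the most member URLs. Growth rate is the relative change in member count, and novelty is the fraction of current members that are absent from the predecessor. The paper's actual metrics may be defined differently. All names are hypothetical, and single letters stand in for URLs.

```python
def match_predecessor(community, previous_chart):
    """Pick the community in the previous chart sharing the most members (assumed rule)."""
    best_id, best_overlap = None, 0
    for cid, members in previous_chart.items():
        overlap = len(community & members)
        if overlap > best_overlap:
            best_id, best_overlap = cid, overlap
    return best_id


def evolution_metrics(old, new):
    """Hypothetical growth rate (relative size change) and novelty (share of new members)."""
    if not old or not new:
        raise ValueError("both community snapshots must be non-empty")
    growth = (len(new) - len(old)) / len(old)
    novelty = len(new - old) / len(new)
    return growth, novelty


chart_1999 = {"c1": {"a", "b", "c", "d"}, "c2": {"x", "y"}}
community_2000 = {"a", "b", "e", "f", "g"}
prev = match_predecessor(community_2000, chart_1999)
print(prev, evolution_metrics(chart_1999[prev], community_2000))  # c1 (0.25, 0.6)
```

    Matching on shared members lets the metrics be computed even when a community splits or merges between crawls. A real viewer would probably also have to handle communities that have no predecessor, for which `match_predecessor` returns `None`.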

  • Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework

    Yasuhito ASANO  Takao NISHIZEKI  Masashi TOYODA  Masaru KITSUREGAWA  

    PAPER-Data Mining
    Vol: E89-D  No: 10  Page(s): 2606-2615

    There are several methods for mining communities on the Web using hyperlinks. One well-known method is the max-flow based method proposed by Flake et al. Like other methods, including HITS and trawling, it adopts a page-oriented framework; that is, it uses a page on the Web as the unit of information. Recently, Asano et al. built a site-oriented framework, which uses a site as the unit of information, and showed experimentally that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it has not been known whether the site-oriented framework is effective for mining communities with the max-flow based method. In this paper, we first point out several problems of the max-flow based method, mainly owing to the page-oriented framework, and then propose solutions to these problems that exploit several advantages of the site-oriented framework. Computational experiments reveal that our max-flow based method on the site-oriented framework is very effective in mining communities related to the topics of given pages, compared with the original max-flow based method on the page-oriented framework.
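
    The max-flow approach of Flake et al., on which this paper builds, can be sketched on a small site graph. A virtual source is attached to the seed vertices with effectively infinite capacity, and every other vertex is attached to a virtual sink with capacity 1. Links are treated as undirected edges of capacity k. After a maximum flow is computed (here with Edmonds-Karp), the community is the set of vertices still reachable from the source in the residual graph. The capacity choices, the function name, and the use of Edmonds-Karp are assumptions made for illustration. This is neither the original implementation nor the authors' site-oriented improvements.

```python
from collections import defaultdict, deque


def min_cut_community(edges, seeds, k=3):
    """Return the source side of a minimum s-t cut, Flake-style (assumed capacities)."""
    SRC, SNK = "__source__", "__sink__"
    cap = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for u, v in edges:
        cap[u][v] += k  # links treated as undirected, capacity k each way
        cap[v][u] += k
        nodes.update((u, v))
    big = 2 * k * len(edges) + len(nodes) + 1  # exceeds any finite cut
    for s in seeds:
        cap[SRC][s] = big
    for n in nodes - set(seeds):
        cap[n][SNK] += 1

    def reachable_from_source():
        """BFS over residual edges; returns a parent map of reached vertices."""
        parent = {SRC: None}
        queue = deque([SRC])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parent = reachable_from_source()
        if SNK not in parent:
            break
        bottleneck, v = big, SNK
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = SNK
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
    return set(reachable_from_source()) - {SRC, SNK}


triangle = [("a1", "a2"), ("a1", "a3"), ("a2", "a3")]
clique = [("b1", "b2"), ("b1", "b3"), ("b1", "b4"), ("b2", "b3"), ("b2", "b4"), ("b3", "b4")]
edges = triangle + clique + [("a3", "b1")]
print(sorted(min_cut_community(edges, {"a1"})))  # ['a1', 'a2', 'a3']
```

    In the toy graph, a triangle of sites a1 to a3 attaches to a four-site clique b1 to b4 through the single link a3-b1. With k = 3, one option is to cut that link (cost 3) along with the unit sink edges of a2 and a3 (cost 2). This cut is cheaper than any other, so the seed a1 yields the triangle as its community.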