Author Search Result

[Author] Masatoshi YOSHIKAWA(19hit)

1-19hit
  • FOREWORD

    Yahiko KAMBAYASHI  Masatoshi YOSHIKAWA  

     
    FOREWORD

      Vol:
    E82-D No:1
      Page(s):
    1-2
  • Adaptive Balanced Allocation for Peer Assessments

    Hideaki OHASHI  Yasuhito ASANO  Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  

     
    PAPER

      Publicized:
    2019/12/26
      Vol:
    E103-D No:5
      Page(s):
    939-948

    Peer assessments, in which people review the works of peers and have their own works reviewed by peers, are useful for assessing homework. In conventional peer assessment systems, works are usually allocated to people before the assessment begins; therefore, if people drop out (abandoning reviews) during an assessment period, an imbalance occurs between the number of works a person reviews and the number of peers who have reviewed that person's work. When the total imbalance increases, some people who diligently complete reviews may suffer from a lack of reviews and be discouraged from participating in future peer assessments. Therefore, in this study, we adopt a new adaptive allocation approach in which people are allocated review works only when requested, and we propose an algorithm for allocating works to people that reduces the total imbalance. To show the effectiveness of the proposed algorithm, we provide an upper bound on the total imbalance that the proposed algorithm yields. In addition, we extend the above algorithm to consider reviewing ability. The extended algorithm avoids the problem that only unskilled (or skilled) reviewers are allocated to a given work. We show the effectiveness of the two proposed algorithms compared to the existing algorithms through experiments using simulation data.

  • Flexible and Fast Similarity Search for Enriched Trajectories

    Hideaki OHASHI  Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2017/05/30
      Vol:
    E100-D No:9
      Page(s):
    2081-2091

    In this study, we focus on a method to search for similar trajectories. In the majority of previous works on searching for similar trajectories, only raw trajectory data were used. However, to obtain deeper insights, additional time-dependent trajectory features should be utilized depending on the search intent. For instance, to identify similar combination plays in soccer games, such additional features include the movements of the team players. In this paper, we develop a framework to flexibly search for similar trajectories associated with time-dependent features, which we call enriched trajectories. In this framework, weights, which represent the relative importance of each feature, can be flexibly given by users. Moreover, to facilitate fast searching, we first propose a lower bounding measure of the DTW distance between enriched trajectories, and then we propose algorithms based on this lower bounding measure. We evaluate the effectiveness of the lower bounding measure and compare the performances of the algorithms under various conditions using soccer data and synthetic data. Our experimental results suggest that the proposed lower bounding measure is superior to the existing measure, and one of the proposed algorithms, which is based on the threshold algorithm, is suitable for practical use.
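
    The filter-and-refine idea in this abstract (a cheap lower bound on the DTW distance prunes candidates before the full computation) can be sketched as follows. This is a minimal illustration, not the paper's measure: the feature names, the weighted per-step cost, and the endpoint-based bound `endpoint_lower_bound` are assumptions chosen for brevity.

```python
import math


def step_cost(p, q, weights):
    """Weighted sum of per-feature Euclidean distances between two time steps."""
    return sum(w * math.dist(p[name], q[name]) for name, w in weights.items())


def dtw(a, b, weights):
    """Classic dynamic-programming DTW distance between two non-empty trajectories."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for p in a:
        cur = [inf]
        for j, q in enumerate(b, start=1):
            best = min(prev[j], prev[j - 1], cur[j - 1])
            cur.append(step_cost(p, q, weights) + best)
        prev = cur
    return prev[-1]


def endpoint_lower_bound(a, b, weights):
    """Assumed bound: every warping path aligns the first steps and the last
    steps, so the larger of those two costs never exceeds the DTW distance
    (given non-negative weights)."""
    return max(step_cost(a[0], b[0], weights), step_cost(a[-1], b[-1], weights))


def similar(query, candidates, weights, threshold):
    """Return names of candidates whose DTW distance is within the threshold."""
    hits = []
    for name, traj in candidates.items():
        if endpoint_lower_bound(query, traj, weights) > threshold:
            continue  # pruned without running the full DTW
        if dtw(query, traj, weights) <= threshold:
            hits.append(name)
    return hits


# Hypothetical enriched trajectories: ball position plus one teammate position.
weights = {"ball": 1.0, "mate": 0.5}
query = [{"ball": (0, 0), "mate": (1, 0)}, {"ball": (1, 1), "mate": (2, 1)}]
candidates = {
    "near": [{"ball": (0, 0), "mate": (1, 0)}, {"ball": (1, 1), "mate": (2, 2)}],
    "far": [{"ball": (5, 5), "mate": (6, 5)}, {"ball": (6, 6), "mate": (7, 6)}],
}
result = similar(query, candidates, weights, threshold=1.0)
```

    Changing the entries of `weights` shifts the relative importance of each feature without touching the search code, which is the kind of flexibility the abstract describes.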

  • XSemantic: An Extension of LCA Based XML Semantic Search

    Umaporn SUPASITTHIMETHEE  Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  Kriengkrai PORKAEW  

     
    PAPER-Contents Technology and Web Information Systems

      Vol:
    E92-D No:5
      Page(s):
    1079-1092

    One of the most convenient ways to query XML data is a keyword search because it does not require any knowledge of XML structure or learning a new user interface. However, keyword search is ambiguous. Users may use different terms to search for the same information. Furthermore, it is difficult for a system to decide which node is likely to be chosen as a return node and how much information should be included in the result. To address these challenges, we propose a keyword-based XML semantic search called XSemantic. On the one hand, we give three definitions that complete the search in terms of semantics. First, through semantic term expansion, our system is robust to ambiguous keywords by using a domain ontology. Second, to return semantically meaningful answers, we automatically infer the return information from the user queries and take advantage of the shortest path to return meaningful connections between keywords. Third, we present a semantic ranking that reflects the degree of similarity as well as the semantic relationship so that search results with higher relevance are presented to users first. On the other hand, we investigated the problem of how much information is included in the search results of the LCA and proximity search approaches. To address it, we introduce the notion of the Lowest Common Element Ancestor (LCEA) and define a simple rule that does not require schema information such as a DTD or XML Schema. The first experiment indicated that XSemantic not only properly infers the return information but also generates compact, meaningful results. Additionally, the benefits of our proposed semantics are demonstrated by the second experiment.

  • A Structural Numbering Scheme for Processing Queries by Structure and Keyword on XML Data

    Dao Dinh KHA  Masatoshi YOSHIKAWA  Shunsuke UEMURA  

     
    PAPER

      Vol:
    E87-D No:2
      Page(s):
    361-372

    Generating the identifiers of XML nodes is a crucial task in XML applications. On the other hand, the structural information of XML data is essential to evaluate the XML queries. Several numbering schemes have been proposed so far to express the structural information using the identifiers of XML nodes. In this paper, we introduce a new numbering scheme called recursive UID (rUID) that has been designed to be robust in structural update and applicable to arbitrarily large XML documents. We investigate the applications of rUID to XML query processing in a system called SKEYRUS, which enables the integrated structure-keyword searches on XML data. Experimental results of the performance of SKEYRUS are also reported.

  • Full-Text and Structural Indexing of XML Documents on B+-Tree

    Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  

     
    PAPER-Contents Technology and Web Information Systems

      Vol:
    E89-D No:1
      Page(s):
    237-247

    XML query processing is one of the most active areas of database research. Although the main focus of past research has been the processing of structural XML queries, there are growing demands for full-text search over XML documents. In this paper, we propose XICS (XML Indices for Content and Structural search), which aims at high-speed processing of both full-text and structural queries in XML documents. An important design principle of our indices is the use of a B+-tree. To represent the structural information of XML trees, each node in the XML tree is labeled with an identifier. The identifier contains an integer number representing the path information from the root node. XICS consists of two types of indices, the COB-tree (COntent B+-tree) and the STB-tree (STructure B+-tree). The search keys of the COB-tree are pairs of a text fragment in the XML document and the identifier of the leaf node that contains the text, whereas the search keys of the STB-tree are the node identifiers. By using a node identifier in the search keys, we can retrieve only the entries that match the path information in the query. The STB-tree can filter nodes using structural conditions in queries, while the COB-tree can filter nodes using text conditions. We have implemented a COB-tree and an STB-tree using GiST and examined index size and query processing time. Our experimental results show the efficiency of XICS in query processing.

  • Mining Knowledge on Relationships between Objects from the Web

    Xinpeng ZHANG  Yasuhito ASANO  Masatoshi YOSHIKAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E97-D No:1
      Page(s):
    77-88

    How do global warming and agriculture influence each other? It is possible to answer the question by searching for knowledge about the relationship between global warming and agriculture. As exemplified by this question, strong demands exist for searching for relationships between objects. Mining knowledge about relationships on Wikipedia has been studied. However, users also want to search for more diverse knowledge about relationships on the Web. By utilizing the objects constituting relationships mined from Wikipedia, we propose a new method to search for images with surrounding text that include knowledge about relationships on the Web. Experimental results show that our method is effective and applicable in searching for knowledge about relationships. We also construct a relationship search system named “Enishi” based on the proposed method. Enishi supplies a wealth of diverse knowledge, including images with surrounding text, to help users understand relationships deeply by complementarily utilizing knowledge from Wikipedia and the Web.

  • Time Graph Pattern Mining for Network Analysis and Information Retrieval (Open Access)

    Yasuhito ASANO  Taihei OSHINO  Masatoshi YOSHIKAWA  

     
    PAPER

      Vol:
    E97-D No:4
      Page(s):
    733-742

    Graph pattern mining has played important roles in network analysis and information retrieval. However, temporal characteristics of networks have not been estimated sufficiently. We propose time graph pattern mining as a new concept of graph mining reflecting the temporal information of a network. We conduct two case studies of time graph pattern mining: extensively discussed topics on blog sites and a book recommendation network. Through examination of case studies, we ascertain that time graph pattern mining has numerous possibilities as a novel means for information retrieval and network analysis reflecting both structural and temporal characteristics.

  • XML Content Update Using Relative Region Coordinates

    Dao DINH KHA  Masatoshi YOSHIKAWA  Shunsuke UEMURA  

     
    PAPER-Databases

      Vol:
    E87-D No:3
      Page(s):
    771-779

    Among several methods of storing XML documents, a straightforward yet efficient method is to store a string representation of the XML document. An XML node is usually represented by a region coordinate, which is a pair of integers expressing the start and end positions of the substring corresponding to the node. This approach, however, has the drawback that a change of a node's region coordinate causes change of the region coordinates of many other elements. This recomputation normally degrades the performance of XML applications, especially when content is updated frequently. In this paper, we propose the Relative Region Coordinate (RRC) technique to effectively reduce the cost of recomputation. The main idea is to express the coordinate of an XML element in the region of its parent element. We present a method to integrate the RRC information into XML systems and provide experimental results that demonstrate the effectiveness of the RRC in the content update.
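
    The main idea stated above, storing each element's coordinate relative to its parent's region, can be illustrated with a small sketch. The `Node` class, the half-open offsets, and the `grow` helper are hypothetical names introduced for illustration; they are not the data structures of the paper.

```python
class Node:
    """An XML element whose region is stored relative to its parent's start."""

    def __init__(self, name, rel_start, rel_end, parent=None):
        self.name = name
        self.rel_start = rel_start  # offset from the parent's start position
        self.rel_end = rel_end      # half-open end, same reference point
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def absolute(self):
        """Recover the absolute region by summing the ancestors' start offsets."""
        base, node = 0, self.parent
        while node is not None:
            base += node.rel_start
            node = node.parent
        return (base + self.rel_start, base + self.rel_end)


def grow(node, delta):
    """Lengthen `node`'s content by `delta` characters.

    Returns the names of nodes whose stored coordinates were rewritten:
    the node, its ancestors, and the following siblings at each level.
    Descendants of shifted siblings keep their stored values.
    """
    rewritten = []
    while node is not None:
        node.rel_end += delta
        rewritten.append(node.name)
        if node.parent is not None:
            for sib in node.parent.children:
                if sib.rel_start > node.rel_start:  # following sibling
                    sib.rel_start += delta
                    sib.rel_end += delta
                    rewritten.append(sib.name)
        node = node.parent
    return rewritten


# Regions for the string <a><b>xy</b><c><d>z</d></c></a> (31 characters).
a = Node("a", 0, 31)
b = Node("b", 3, 12, parent=a)
c = Node("c", 12, 27, parent=a)
d = Node("d", 3, 11, parent=c)  # absolute region (15, 23)

touched = grow(b, 2)  # e.g. the text "xy" inside <b> becomes "xyzz"
```

    After the update, `d` still stores `(3, 11)` yet resolves to the shifted absolute region, whereas with absolute region coordinates its stored pair would also have to be rewritten.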

  • News Bias Analysis Based on Stakeholder Mining

    Tatsuya OGAWA  Qiang MA  Masatoshi YOSHIKAWA  

     
    PAPER

      Vol:
    E94-D No:3
      Page(s):
    578-586

    In this paper, we propose a novel stakeholder mining mechanism for analyzing bias in news articles by comparing descriptions of stakeholders. Our mechanism is based on the presumption that interests often induce bias of news agencies. As we use the term, a "stakeholder" is a participant in an event described in a news article who should have some relationships with other participants in the article. Our approach attempts to elucidate bias of articles from three aspects: stakeholders, interests of stakeholders, and the descriptive polarity of each stakeholder. Mining of stakeholders and their interests is achieved by analysis of sentence structure and the use of RelationshipWordNet, a lexical resource that we developed. For analyzing polarities of stakeholder descriptions, we propose an opinion mining method based on the lexical resource SentiWordNet. As a result of analysis, we construct a relations graph of stakeholders to group stakeholders sharing mutual interests and to represent the interests of stakeholders. We also describe an application system we developed for news comparison based on the mining mechanism. This paper presents some experimental results to validate the proposed methods.

  • Differentially Private Real-Time Data Publishing over Infinite Trajectory Streams

    Yang CAO  Masatoshi YOSHIKAWA  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2015/10/06
      Vol:
    E99-D No:1
      Page(s):
    163-175

    Recently emerging mobile and wearable technologies make it easy to collect personal spatiotemporal data such as activity trajectories in daily life. Publishing real-time statistics over trajectory streams produced by crowds of people is expected to be valuable for both academia and business, answering questions such as “How many people are in Kyoto Station now?” However, analyzing these raw data entails risks of compromising individual privacy. ε-Differential privacy has emerged as a well-known standard for private statistics publishing because its guarantee is rigorous and mathematically provable. However, since user trajectories are generated without end, it is difficult to protect every trajectory under ε-differential privacy. On the other hand, in real life, not all users require the same level of privacy. To this end, we propose a flexible privacy model of l-trajectory privacy that protects every trajectory of a desired length under ε-differential privacy. We also design an algorithmic framework to publish l-trajectory private data in real time. Experiments using four real-life datasets show that our proposed algorithms are effective and efficient.

  • Entity Ranking for Queries with Modifiers Based on Knowledge Bases and Web Search Results

    Wiradee IMRATTANATRAI  Makoto P. KATO  Katsumi TANAKA  Masatoshi YOSHIKAWA  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2018/06/18
      Vol:
    E101-D No:9
      Page(s):
    2279-2290

    This paper proposes methods of finding a ranked list of entities for a given query (e.g. “Kennin-ji”, “Tenryu-ji”, or “Kinkaku-ji” for the query “ancient zen buddhist temples in kyoto”) by leveraging different types of modifiers in the query through identifying corresponding properties (e.g. established date and location for the modifiers “ancient” and “kyoto”, respectively). While most major search engines provide the entity search functionality that returns a list of entities based on users' queries, entities are neither presented for a wide variety of search queries, nor in the order that users expect. To enhance the effectiveness of entity search, we propose two entity ranking methods. Our first proposed method is a Web-based entity ranking that directly finds relevant entities from Web search results returned in response to the query as a whole, and propagates the estimated relevance to the other entities. The second proposed method is a property-based entity ranking that ranks entities based on properties corresponding to modifiers in the query. To this end, we propose a novel property identification method that identifies a set of relevant properties based on a Support Vector Machine (SVM) using our seven criteria that are effective for different types of modifiers. The experimental results showed that our proposed property identification method could predict more relevant properties than using each of the criteria separately. Moreover, we achieved the best performance for returning a ranked list of relevant entities when using the combination of the Web-based and property-based entity ranking methods.

  • Mining and Explaining Relationships in Wikipedia

    Xinpeng ZHANG  Yasuhito ASANO  Masatoshi YOSHIKAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E95-D No:7
      Page(s):
    1918-1931

    Mining and explaining relationships between concepts are challenging tasks in the field of knowledge search. We propose a new approach for the tasks using disjoint paths formed by links in Wikipedia. Disjoint paths are easy to understand and do not contain redundant information. To achieve this approach, we propose a naive method, as well as a generalized flow based method, and a technique for mining more disjoint paths using the generalized flow based method. We also apply the approach to classification of relationships. Our experiments reveal that the generalized flow based method can mine many disjoint paths important for understanding a relationship, and the classification is effective for explaining relationships.

  • Design Framework of a Database for Structured Documents with Object Links

    Masatoshi YOSHIKAWA  Hiroyuki KATO  Hiroko KINUTANI  

     
    PAPER-Web and Document Databases

      Vol:
    E82-D No:1
      Page(s):
    147-155

    Structured documents often contain character strings whose semantics can be naturally stored as database values or have direct correspondence with database values. By building bilateral logical links between character strings in documents and the corresponding database values, semantically rich queries become expressible. We have introduced a new ADT, named "paratext," to model text which has links with database values. Paratexts are logically viewed as consisting of two parallel layers: on the "appearance" layer, ordinary text (i.e., a linear sequence of character strings) is placed, while the "reference" layer holds an array of OIDs and literals. Each OID or literal on the reference layer is associated with a contiguous substring of the appearance layer text, and represents the semantics of the associated substring. We have also designed domain-specific functions for this document model. Using the functions, we can express queries which go back and forth between the two layers. In structured documents, such character strings can appear as the whole content of logical elements, or as phrases inside logical elements. We also present frameworks for the implementation of the paratext ADT, and discuss how traditional full-text indexing techniques can be extended to support paratext.

  • Does Student-Submission Allocation Affect Peer Assessment Accuracy?

    Hideaki OHASHI  Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  

     
    PAPER

      Publicized:
    2022/01/05
      Vol:
    E105-D No:5
      Page(s):
    888-897

    Peer assessment in education has pedagogical benefits and is a promising method for grading a large number of submissions. At the same time, student reliability has been regarded as a problem; consequently, various methods of estimating highly reliable grades from scores given by multiple students have been proposed. Most of the existing methods assume a nonadaptive allocation pattern, which performs allocation in advance. In this study, we analyze the effect of student-submission allocation on score estimation in peer assessment under a nonadaptive allocation setting. We examine three types of nonadaptive allocation methods (random allocation, circular allocation, and group allocation), which are considered the commonly used approaches among the existing nonadaptive peer assessment methods. Through simulation experiments, we show that circular allocation and group allocation tend to yield lower accuracy than random allocation. Then, we utilize this result to improve the existing adaptive allocation method, which performs allocation and assessment in parallel and tends to produce allocation results similar to those of circular allocation. We propose a method that replaces part of the allocation with random allocation, and show through experiments that the method is effective.

  • Geo-Graph-Indistinguishability: Location Privacy on Road Networks with Differential Privacy

    Shun TAKAGI  Yang CAO  Yasuhito ASANO  Masatoshi YOSHIKAWA  

     
    PAPER

      Publicized:
    2023/01/16
      Vol:
    E106-D No:5
      Page(s):
    877-894

    In recent years, concerns about location privacy have been increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. In particular, perturbation methods based on Geo-Indistinguishability (GeoI), which randomly perturb a true location to a pseudolocation, are attracting attention due to their strong privacy guarantee inherited from differential privacy. However, GeoI is based on the Euclidean plane even though many LBSs are based on road networks (e.g., ride-sharing services). This causes unnecessary noise and thus an insufficient tradeoff between utility and privacy for LBSs on road networks. To address this issue, we propose a new privacy notion, Geo-Graph-Indistinguishability (GeoGI), for locations on a road network to achieve a better tradeoff. We propose the Graph-Exponential Mechanism (GEM), which satisfies GeoGI. Moreover, we formalize the optimization problem of finding the optimal GEM in terms of the tradeoff. However, the computational complexity of a naive method to find the optimal solution is prohibitive, so we propose a greedy algorithm to find an approximate solution in an acceptable amount of time. Finally, our experiments show that our proposed mechanism outperforms GeoI mechanisms, including the optimal GeoI mechanism, with respect to the tradeoff.
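
    To make the contrast with Euclidean perturbation concrete, the sketch below applies a generic exponential mechanism over shortest-path distance on a tiny road graph. The abstract does not give the exact definition of GEM, so the scoring rule exp(-ε·d/2), the toy graph, and all function names here are assumptions for illustration only.

```python
import heapq
import math
import random


def shortest_paths(graph, source):
    """Dijkstra over a weighted adjacency dict: node -> {neighbor: length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def output_distribution(graph, true_node, epsilon):
    """Probability of reporting each reachable node, decaying with road distance."""
    dist = shortest_paths(graph, true_node)
    scores = {v: math.exp(-epsilon * d / 2.0) for v, d in dist.items()}
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}


def perturb(graph, true_node, epsilon, rng=random):
    """Sample a pseudolocation (a graph node) for the true location."""
    probs = output_distribution(graph, true_node, epsilon)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[v] for v in nodes], k=1)[0]


# Hypothetical undirected road network with segment lengths.
road = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"B": 2.0},
}
probs = output_distribution(road, "A", epsilon=1.0)
reported = perturb(road, "A", epsilon=1.0, rng=random.Random(0))
```

    Because the output is always a node of the road graph, no probability mass is spent on positions off the network. On an undirected graph, the triangle inequality bounds the ratio of output probabilities for two true locations v and v' by exp(ε·d(v, v')), where d is the shortest-path distance.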

  • Mechanisms to Address Different Privacy Requirements for Users and Locations

    Ryota HIRAISHI  Masatoshi YOSHIKAWA  Yang CAO  Sumio FUJITA  Hidehito GOMI  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2023/09/25
      Vol:
    E106-D No:12
      Page(s):
    2036-2047

    The significance of individuals' location information has been increasing recently, and the utilization of such data has become indispensable for businesses and society. The possible uses of location information include personalized services (maps, restaurant searches and weather forecast services) and business decisions (deciding where to open a store). However, considering that the data could be exploited, users should add random noise using their terminals before providing location data to collectors. In numerous instances, the level of privacy protection a user requires depends on their location. Therefore, in our framework, we assume that users can specify different privacy protection requirements for each location utilizing the adversarial error (AE), and the system computes a mechanism to satisfy these requirements. To guarantee some utility for data analysis, the maximum error in outputting the location should also be output. In most privacy frameworks, the mechanism for adding random noise is public; however, in this problem setting, the privacy protection requirements and the mechanism must be confidential because they include sensitive information. We propose two mechanisms to address privacy personalization. The first mechanism is the individual exponential mechanism, which uses the exponential mechanism in the differential privacy framework. However, in the individual exponential mechanism, the maximum error for each output can be used to narrow down candidates of the actual location by observing outputs from the same location multiple times. The second mechanism improves on this deficiency and is called the donut mechanism, which uniformly outputs a random location near the location where the distance from the user's actual location is at the user-specified AE distance. Considering the potential attacks against the donut mechanism that utilize the maximum error, we extend the mechanism to counter these attacks. We compare these two mechanisms through experiments using maps constructed from artificial and real-world data.

  • An Efficient Schema-Based Technique for Querying XML Data

    Dao Dinh KHA  Masatoshi YOSHIKAWA  

     
    PAPER-Database

      Vol:
    E89-D No:4
      Page(s):
    1480-1489

    As data integration over the Web has become an increasing demand, there is a growing desire to use XML as a standard format for data exchange. For sharing their grammars efficiently, most of the XML documents in use are associated with a document structure description, such as DTD or XML schema. However, the document structure information is not utilized efficiently in previously proposed techniques of XML query processing. In this paper, we present a novel technique that reduces the disk I/O complexity of XML query processing. We design a schema-based numbering scheme called SPAR that incorporates both structure information and tag names extracted from DTD or XML schema. Based on SPAR, we develop a mechanism called VirtualJoin that significantly reduces disk I/O workload for processing XML queries. As shown by experiments, VirtualJoin outperforms many prior techniques.

  • Estimating Knowledge Category Coverage by Courses Based on Centrality in Taxonomy

    Yiling DAI  Masatoshi YOSHIKAWA  Yasuhito ASANO  

     
    PAPER

      Publicized:
    2019/12/26
      Vol:
    E103-D No:5
      Page(s):
    928-938

    The proliferation of Massive Open Online Courses has made it a challenge for the user to select a proper course. We assume a situation in which the user targets knowledge defined by some knowledge categories. In this situation, knowing how much of the knowledge in a category is covered by the courses is helpful in course selection. In this study, we define a concept of knowledge category coverage and aim to estimate it in a semi-automatic manner. We first model the knowledge category and the course as sets of concepts, and then utilize a taxonomy and the idea of centrality to differentiate the importance of concepts. Finally, we obtain the coverage value by calculating how many of the concepts required in a knowledge category are also taught in a course. Compared with treating all concepts as equally important, we found that our proposed method can effectively generate coverage values closer to the ground truth assigned by domain experts.