
Keyword Search Results

[Keyword] information retrieval (61 hits)

Results 1-20 of 61

  • Discriminative Question Answering via Cascade Prompt Learning and Sentence Level Attention Mechanism

    Xiaoguang YUAN  Chaofan DAI  Zongkai TIAN  Xinyu FAN  Yingyi SONG  Zengwen YU  Peng WANG  Wenjun KE  

     
    PAPER-Natural Language Processing
    Publicized: 2023/06/02  Vol: E106-D No:9  Page(s): 1584-1599

    Question answering (QA) systems are designed to answer questions based on given information or with the help of external information. Recent advances in QA systems have been driven overwhelmingly by deep learning techniques, which have been employed in a wide range of fields such as finance, sports, and biomedicine. For generative QA in the open-domain setting, deep learning can leverage massive data to learn meaningful feature representations and generate free text as answers, but controlling the length and content of those answers remains difficult. To alleviate this problem, we focus on YNQA, a variant of generative QA, and propose CasATT, a cascade prompt learning framework with a sentence-level attention mechanism. CasATT excavates semantic information from the document level down to the sentence level, mines evidence accurately from large-scale document collections by retrieval and ranking, and answers questions over the ranked candidates by discriminative question answering. Our experiments on several datasets demonstrate the superior performance of CasATT over state-of-the-art baselines, with an accuracy of 93.1% on the IR&QA Competition dataset and 90.5% on the BoolQ dataset.
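
    As a rough illustration of the retrieve-rank-discriminate flow described above (not the CasATT model itself), the following Python sketch ranks sentence-level candidates for a yes/no question and hands the top evidence to a placeholder decision rule; the overlap scoring and the final classifier are assumed stand-ins.

    ```python
    def tokenize(text):
        return text.lower().split()

    def overlap(query, sentence):
        q, s = set(tokenize(query)), set(tokenize(sentence))
        return len(q & s) / (len(q) or 1)

    def answer_yes_no(question, documents, top_k=3):
        # document level -> sentence level: split documents into candidate sentences
        candidates = [s.strip() for d in documents for s in d.split(".") if s.strip()]
        # retrieval and ranking (placeholder scoring instead of a learned ranker)
        ranked = sorted(candidates, key=lambda s: overlap(question, s), reverse=True)[:top_k]
        # discriminative step: a real system feeds (question, evidence) to a trained classifier
        best = ranked[0] if ranked else ""
        return ("yes" if overlap(question, best) > 0.5 else "no"), ranked
    ```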

  • A Method to Detect Chorus Sections in Lyrics Text

    Kento WATANABE  Masataka GOTO  

     
    PAPER-Music Information Processing
    Publicized: 2023/06/02  Vol: E106-D No:9  Page(s): 1600-1609

    This paper addresses the novel task of detecting chorus sections in English and Japanese lyrics text. Although chorus-section detection using audio signals has been studied, whether chorus sections can be detected from text-only lyrics is an open issue. Another open issue is whether patterns of repeating lyric lines, such as those appearing in chorus sections, depend on language. To investigate these issues, we propose a neural-network-based model for sequence labeling that can learn phrase repetition and linguistic features to detect chorus sections in lyrics text. Training this model is difficult, however, because no dataset of lyrics with chorus-section annotations existed, there being no prior work on this task. We therefore generate a large amount of training data with such annotations by leveraging pairs of musical audio signals and their corresponding manually time-aligned lyrics: we first detect chorus sections automatically from the audio signals and then use their temporal positions to derive line-level chorus-section annotations for the lyrics. Experimental results show that the proposed model trained on the generated data contributes to detecting chorus sections, that a model trained on Japanese lyrics can detect chorus sections in English lyrics surprisingly well, and that patterns of repeating lyric lines are language-independent.
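
    A minimal sketch of the annotation-transfer step described above, assuming chorus segments detected from audio and time-aligned lyric lines are already available; the overlap threshold is illustrative.

    ```python
    def label_chorus_lines(lyric_lines, chorus_segments, min_overlap=0.5):
        """lyric_lines: list of (start_sec, end_sec, text); chorus_segments: list of (start_sec, end_sec)."""
        labels = []
        for start, end, text in lyric_lines:
            duration = max(end - start, 1e-9)
            covered = sum(max(0.0, min(end, ce) - max(start, cs)) for cs, ce in chorus_segments)
            labels.append((text, covered / duration >= min_overlap))  # True = chorus line
        return labels
    ```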

  • Linear Algebraic Approach to Strongly Secure Ramp Secret Sharing for General Access Structures with Application to Symmetric PIR

    Reo ERIGUCHI  Noboru KUNIHIRO  Koji NUIDA  

     
    PAPER
    Publicized: 2022/09/13  Vol: E106-A No:3  Page(s): 263-271

    Ramp secret sharing is a variant of secret sharing that can achieve a better information ratio than perfect schemes by allowing some partial information on a secret to leak out. Strongly secure ramp schemes can control the amount of information leaked about the components of a secret. In this paper, we reduce the construction of strongly secure ramp secret sharing for general access structures to a linear algebraic problem. As a result, we show that previous results on strongly secure network coding imply two linear transformation methods that make a given linear ramp scheme strongly secure. These methods are explicit or come with a deterministic algorithm, whereas the previous methods that work for any linear ramp scheme are non-constructive. In addition, we present a novel application of strongly secure ramp schemes to symmetric PIR in a multi-user setting. Our solution is advantageous over solutions based on non-strongly secure schemes in that it reduces both the amount of communication between users and servers and the amount of correlated randomness that the servers generate in the setup.
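
    For readers unfamiliar with ramp schemes, here is a textbook (k, L, n) ramp scheme over a prime field, with the L secret parts placed at fixed evaluation points of a degree-(k-1) polynomial. It is only an illustrative baseline, not the strongly secure construction or the symmetric-PIR protocol of the paper.

    ```python
    import random

    P = 2**61 - 1  # prime modulus; illustrative choice

    def _lagrange_eval(points, x):
        """Evaluate the unique degree-(len(points)-1) polynomial through `points` at x, mod P."""
        total = 0
        for i, (xi, yi) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * ((x - xj) % P) % P
                    den = den * ((xi - xj) % P) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    def share(secrets, k, n):
        """(k, L, n) ramp sharing: any k of the n shares recover all L secret parts."""
        L = len(secrets)
        anchors = [(P - 1 - j, s) for j, s in enumerate(secrets)]                 # secret points
        anchors += [(P - L - 1 - j, random.randrange(P)) for j in range(k - L)]   # random points
        return [(x, _lagrange_eval(anchors, x)) for x in range(1, n + 1)]

    def reconstruct(k_shares, L):
        """Recover the L secret parts from any k (or more) shares."""
        return [_lagrange_eval(k_shares, P - 1 - j) for j in range(L)]
    ```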

  • MP-BERT4REC: Recommending Multiple Positive Citations for Academic Manuscripts via Content-Dependent BERT and Multi-Positive Triplet

    Yang ZHANG  Qiang MA  

     
    PAPER-Natural Language Processing
    Publicized: 2022/08/08  Vol: E105-D No:11  Page(s): 1957-1968

    Considering the rapidly increasing number of academic papers, searching for and citing appropriate references has become a nontrivial task during manuscript composition. Recommending a handful of candidate papers for a working draft could ease the burden on authors. Conventional approaches to citation recommendation generally recommend one ground-truth citation from an input manuscript for a query context. However, it is common for a given context to be supported by two or more co-citation pairs. Here, we propose a novel scientific paper model for citation recommendation, the Multi-Positive BERT Model for Citation Recommendation (MP-BERT4REC), coupled with a series of Multi-Positive Triplet objectives to recommend multiple positive citations for a query context. The proposed approach has the following advantages. First, the proposed multi-positive objectives are effective in recommending multiple positive candidates. Second, we adopt noise distributions based on historical co-citation frequencies; thus, MP-BERT4REC is not only effective in recommending high-frequency co-citation pairs but also significantly improves performance on low-frequency ones. Third, the proposed dynamic context sampling strategy captures macroscopic citing intents from a manuscript and makes the citation embeddings content-dependent, which further improves performance. Single- and multiple-positive recommendation experiments confirm that MP-BERT4REC delivers significant improvements over current methods. It also retrieves the full list of co-citations, including historically low-frequency pairs, more effectively than prior work.
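
    A minimal NumPy sketch of a multi-positive triplet objective in the spirit described above (a hinge over every positive/negative pair around the query-context anchor); the actual MP-BERT4REC loss, sampling strategy, and BERT encoder are not reproduced here.

    ```python
    import numpy as np

    def multi_positive_triplet_loss(anchor, positives, negatives, margin=0.2):
        """anchor: (d,); positives: (P, d); negatives: (N, d). Embeddings assumed L2-normalized."""
        d_pos = np.linalg.norm(positives - anchor, axis=1)                 # distances to positive citations
        d_neg = np.linalg.norm(negatives - anchor, axis=1)                 # distances to negative citations
        hinge = np.maximum(0.0, margin + d_pos[:, None] - d_neg[None, :])  # all positive/negative pairs
        return float(hinge.mean())
    ```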

  • Competent Triple Identification for Knowledge Graph Completion under the Open-World Assumption

    Esrat FARJANA  Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER-Data Engineering, Web Information Systems
    Publicized: 2021/12/02  Vol: E105-D No:3  Page(s): 646-655

    The usefulness and usability of existing knowledge graphs (KGs) are largely limited by the incompleteness of their knowledge relative to the growing number of facts about the real world. Most existing ontology-based KG completion methods rely on the closed-world assumption, where KGs are fixed: entities and relations are predefined, and new entity information cannot easily be added. Under the open-world assumption, in contrast, entities and relations are not predefined, so there is vast scope for finding new entity information. Even so, knowledge acquisition under the open-world assumption is challenging because most available knowledge is in a noisy, unstructured text format. Open Information Extraction (OpenIE) systems can extract triples, namely (head text; relation text; tail text), from raw text without any prespecified vocabulary, but such triples contain noisy information that is not essential for KGs. Therefore, to use such triples for the KG completion task, it is necessary to identify competent triples, i.e., triples that can contribute new information to the existing KGs, from the extracted triple set. In this paper, we propose the Competent Triple Identification (CTID) model for KGs. We also propose two types of features, namely syntax- and semantics-based features, to identify competent triples from a triple set extracted by a state-of-the-art OpenIE system, and we test their effectiveness. The proposed features perform about 20% better than the ReVerb system in identifying competent triples.
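
    The kind of lightweight cues involved can be sketched as follows; these handcrafted syntax-style and semantics-style features are hypothetical simplifications, not the actual CTID feature set, and `embed` stands for any phrase embedder supplied by the caller.

    ```python
    def triple_features(head, relation, tail, embed=None):
        """Toy features for deciding whether an OpenIE triple is worth adding to a KG."""
        feats = {
            "head_len": len(head.split()),           # syntax-style cues
            "rel_len": len(relation.split()),
            "tail_len": len(tail.split()),
            "head_capitalized": float(head[:1].isupper()),
            "tail_capitalized": float(tail[:1].isupper()),
        }
        if embed is not None:                        # semantics-style cue (assumed embedder)
            feats["head_tail_similarity"] = float(embed(head) @ embed(tail))
        return feats
    ```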

  • Private Information Retrieval from Coded Storage in the Presence of Omniscient and Limited-Knowledge Byzantine Adversaries Open Access

    Jun KURIHARA  Toru NAKAMURA  Ryu WATANABE  

     
    PAPER-Coding Theory
    Publicized: 2021/03/23  Vol: E104-A No:9  Page(s): 1271-1283

    This paper investigates an adversarial model in the scenario of private information retrieval (PIR) from n coded storage servers, called the Byzantine adversary, defined as one that alters b server responses and erases u server responses to a user's query. Two types of Byzantine adversaries are considered: 1) the classic omniscient type, which has full knowledge of all n servers as considered in the existing literature, and 2) the more realistic limited-knowledge type, which has information on only the b+u servers under its control. For these two types, this paper shows that the resistance of a PIR scheme, i.e., the condition on b and u under which the desired message is correctly obtained, can be expressed in terms of a code parameter called the coset distance of the linear codes employed in the scheme. For the omniscient type, the derived condition expressed by the coset distance is tighter and more precise than the estimate of the resistance based on the minimum Hamming weight of the codes considered in existing research. Furthermore, this paper clarifies that if the adversary is limited-knowledge, the resistance of a PIR scheme can exceed that for the omniscient case. In other words, PIR schemes can increase their resistance to Byzantine adversaries when the adversary's knowledge is limited.

  • A Local Feature Aggregation Method for Music Retrieval

    Jin S. SEO  

     
    LETTER
    Publicized: 2017/10/16  Vol: E101-D No:1  Page(s): 64-67

    Song-level feature summarization is an essential building block for browsing, retrieval, and indexing of digital music. This paper proposes a local pooling method that aggregates the feature vectors of a song over a universal background model. Two types of local activation patterns of the feature vectors are derived: one representation takes the form of a histogram, and the other is a binary vector. Experiments on three publicly available music datasets show that the proposed local aggregation of auditory features is promising for music-similarity computation.
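
    A simplified NumPy sketch of the idea: assign each frame-level feature to its nearest component of a background model and summarize the song both as a count histogram and as a binary activation vector. The nearest-centroid assignment is a stand-in for posterior assignment under a trained universal background model.

    ```python
    import numpy as np

    def aggregate_song(frame_feats, ubm_means):
        """frame_feats: (T, d) per-frame features; ubm_means: (K, d) background-model components."""
        dists = ((frame_feats[:, None, :] - ubm_means[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
        assignment = dists.argmin(axis=1)
        hist = np.bincount(assignment, minlength=len(ubm_means)).astype(float)
        hist /= max(hist.sum(), 1.0)                  # histogram representation
        binary = (hist > 0).astype(float)             # binary activation representation
        return hist, binary
    ```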

  • Robust Singing Transcription System Using Local Homogeneity in the Harmonic Structure

    Hoon HEO  Kyogu LEE  

     
    PAPER-Music Information Processing
    Publicized: 2017/02/18  Vol: E100-D No:5  Page(s): 1114-1123

    Automatic music transcription from audio has long been one of the most intriguing and challenging problems in the field of music information retrieval, because it requires a series of low-level tasks such as onset/offset detection and F0 estimation, followed by high-level post-processing for symbolic representation. In this paper, a comprehensive transcription system for monophonic singing voice based on harmonic structure analysis is proposed. Given precise tracking of the fundamental frequency, a novel acoustic feature is derived to characterize the harmonic structure in singing voice signals, regardless of loudness and pitch. It is then used to build a parametric mixture model based on the von Mises-Fisher distribution, so that the model represents the intrinsic harmonic structure within a region of smoothly connected notes. To identify note boundaries, the local homogeneity of the harmonic structure is exploited by two different methods: self-similarity analysis and a hidden Markov model. The proposed system identifies note attributes including onset time, duration, and note pitch. Evaluations are conducted from various aspects to verify the performance improvement of the proposed system and its robustness, using the latest evaluation methodology for singing transcription. The results show that the proposed system significantly outperforms other systems, including the state of the art.
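
    One of the two boundary-detection routes, self-similarity analysis, can be sketched generically as a Foote-style novelty curve over frame-wise features; the actual system applies this idea to its harmonic-structure feature and also uses a hidden Markov model, neither of which is reproduced here.

    ```python
    import numpy as np

    def novelty_curve(features, kernel_size=8):
        """features: (T, d) frame-wise features, assumed L2-normalized; returns a (T,) novelty curve."""
        ssm = features @ features.T                                   # cosine self-similarity matrix
        half = kernel_size // 2
        checker = np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.ones((half, half)))
        novelty = np.zeros(len(features))
        for t in range(half, len(features) - half):
            novelty[t] = (ssm[t - half:t + half, t - half:t + half] * checker).sum()
        return novelty                                                # peaks suggest note boundaries
    ```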

  • Microblog Retrieval Using Ensemble of Feature Sets through Supervised Feature Selection

    Abu Nowshed CHY  Md Zia ULLAH  Masaki AONO  

     
    PAPER
    Publicized: 2017/01/17  Vol: E100-D No:4  Page(s): 793-806

    Microblogs, especially Twitter, have become an integral part of our daily life for searching for the latest news and event information. Because of the short length of tweets and the frequent use of unconventional abbreviations, content-relevance-based search cannot satisfy users' information needs. Recent research has shown that considering temporal and contextual aspects significantly improves retrieval performance. In this paper, we focus on microblog retrieval, emphasizing the alleviation of vocabulary mismatch and the leverage of the temporal (e.g., recency and burstiness) and contextual characteristics of tweets. To address the temporal and contextual aspects of tweets, we propose new features based on query-tweet time, word embeddings, and query-tweet sentiment correlation. We also introduce popularity features to estimate the importance of a tweet. A three-stage query expansion technique is applied to improve the relevance of retrieved tweets, and query type determination techniques are introduced to decide the temporal and sentiment sensitivity of a query. After supervised feature selection, we apply a random forest as a feature ranking method to estimate the importance of the selected features. We then use an ensemble learning-to-rank (L2R) framework to estimate the relevance of each query-tweet pair. We conducted experiments on the TREC Microblog 2011 and 2012 test collections over the TREC Tweets2011 corpus. Experimental results demonstrate the effectiveness of our method over the baseline and known related work in terms of precision at 30 (P@30), mean average precision (MAP), normalized discounted cumulative gain at 30 (NDCG@30), and R-precision (R-Prec).
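
    The feature-ranking step can be sketched with scikit-learn's random forest importances, assuming a feature matrix X, relevance labels y, and feature names are prepared elsewhere; the surrounding L2R ensemble is not shown.

    ```python
    from sklearn.ensemble import RandomForestClassifier

    def rank_features(X, y, feature_names, n_estimators=200, seed=0):
        """Fit a random forest and return (name, importance) pairs sorted by importance."""
        forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        forest.fit(X, y)
        ranking = sorted(zip(feature_names, forest.feature_importances_),
                         key=lambda pair: pair[1], reverse=True)
        return ranking
    ```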

  • Bayesian Exponential Inverse Document Frequency and Region-of-Interest Effect for Enhancing Instance Search Accuracy

    Masaya MURATA  Hidehisa NAGANO  Kaoru HIRAMATSU  Kunio KASHINO  Shin'ichi SATOH  

     
    PAPER-Image Processing and Video Processing
    Publicized: 2016/06/03  Vol: E99-D No:9  Page(s): 2320-2331

    In this paper, we first analyze the discriminative power in the Best Match 25 (BM25) formula and provide a method for calculating it from a Bayesian point of view. The resulting discriminative power is quite similar to the exponential inverse document frequency (EIDF) that we previously proposed [1] but retains more preferable theoretical properties. In our previous paper [1], we proposed the EIDF within the framework of the probabilistic information retrieval (IR) method BM25 to address the instance search task, a specific object search for videos using an image query. Although the effectiveness of our EIDF was experimentally demonstrated, we did not consider its theoretical justification and interpretation, nor did we describe the use of region-of-interest (ROI) information, which is supposed to be input to the instance search system together with the original image query showing the instance. Here, therefore, we justify the EIDF by deriving the discriminative power in BM25 from the Bayesian viewpoint. We also investigate the effect of ROI information on instance search accuracy and propose two search methods that incorporate the ROI effect into the BM25 video ranking function. We validated the proposed methods through a series of experiments using the TREC Video Retrieval Evaluation instance search task dataset.
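
    For reference, the standard Okapi BM25 score that the analysis starts from is sketched below; in the paper's setting, the classical IDF factor is what gets replaced by the Bayesian discriminative-power (EIDF-like) term, which is not reproduced here.

    ```python
    import math

    def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
        """doc_tf: term -> frequency in the document; df: term -> document frequency in the collection."""
        score = 0.0
        for term in query_terms:
            tf = doc_tf.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
            score += idf * tf * (k1 + 1.0) / norm
        return score
    ```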

  • Sentence Similarity Computational Model Based on Information Content

    Hao WU  Heyan HUANG  

     
    PAPER-Natural Language Processing
    Publicized: 2016/03/14  Vol: E99-D No:6  Page(s): 1645-1652

    Sentence similarity computation is an increasingly important task in applications of natural language processing such as information retrieval, machine translation, and text summarization. From the viewpoint of information theory, natural language is essentially a carrier of information, and the amount of information it carries can be measured by information content, which has already been used successfully, in simple ways, for word similarity computation. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is a simple and straightforward model that neither needs empirical parameters nor relies on other NLP tools. The method obtains state-of-the-art experimental results, showing that sentence similarity evaluated by the model is closer to human judgment than multiple competing baselines. The paper also examines the influence of the external corpus, the effect of the size of the semantic net, and the relationship between efficiency and accuracy.
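
    A toy sketch of information-content weighting in this spirit, assuming IC(w) = -log p(w) estimated from corpus frequencies and a simple weighted word-overlap; the paper's actual model is more refined than this illustration.

    ```python
    import math
    from collections import Counter

    def information_content(corpus_tokens):
        counts = Counter(corpus_tokens)
        total = sum(counts.values())
        return {w: -math.log(c / total) for w, c in counts.items()}

    def sentence_similarity(sent1, sent2, ic):
        """Weighted overlap: shared information content divided by total information content."""
        w1, w2 = set(sent1.lower().split()), set(sent2.lower().split())
        shared = sum(ic.get(w, 0.0) for w in w1 & w2)
        union = sum(ic.get(w, 0.0) for w in w1 | w2)
        return shared / union if union else 0.0
    ```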

  • Efficient Algorithm for Math Formula Semantic Search

    Shunsuke OHASHI  Giovanni Yoko KRISTIANTO  Goran TOPIC  Akiko AIZAWA  

     
    PAPER
    Publicized: 2016/01/14  Vol: E99-D No:4  Page(s): 979-988

    Mathematical formulae play an important role in many scientific domains. Despite the importance of mathematical formula search, conventional keyword-based retrieval methods are not sufficient for searching mathematical formulae, which are structured as trees. The increasing number and structural complexity of mathematical formulae in scientific articles lead to the need for large-scale structure-aware formula search techniques. In this paper, we formulate three types of measures that represent distinctive features of the semantic similarity of math formulae, and we develop efficient hash-based algorithms for their approximate calculation. Our experiments using the NTCIR-11 Math-2 Task dataset, a large-scale test collection for math information retrieval with about 60 million formulae, show that the proposed method improves search precision while keeping scalability and runtime efficiency high.
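
    One common way to get hash-based approximate structural similarity, shown purely as an assumed illustration (the paper defines its own three measures and algorithms), is to MinHash the set of root-to-node label paths of a formula tree and estimate Jaccard similarity from the signatures.

    ```python
    import hashlib

    def tree_paths(node, prefix=""):
        """node: (label, [children]); yields every root-to-node label path."""
        label, children = node
        path = prefix + "/" + label
        yield path
        for child in children:
            yield from tree_paths(child, path)

    def minhash_signature(items, num_hashes=64):
        items = list(items)
        return [min(int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) for x in items)
                for i in range(num_hashes)]

    def estimated_similarity(sig_a, sig_b):
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    # e.g. x^2 + y as a tree: ("+", [("^", [("x", []), ("2", [])]), ("y", [])])
    ```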

  • Comparison of Access Pattern Protection Schemes and Proposals for Efficient Implementation Open Access

    Yuto NAKANO  Shinsaku KIYOMOTO  Yutaka MIYAKE  Kouichi SAKURAI  

     
    INVITED PAPER
    Vol: E97-D No:10  Page(s): 2576-2585

    Oblivious RAM (ORAM) schemes, a concept introduced by Goldreich and Ostrovsky, are a very useful technique for protecting users' privacy when storing data on remote untrusted servers and running software on untrusted systems. However, they are usually considered impractical due to their huge overhead, and many improvements have been presented to reduce it. Thanks to these improvements, ORAM schemes can be considered practical in cloud environments, where users can expect huge storage and high computational power. For private information retrieval (PIR) in particular, several studies have demonstrated that they are usable, and dedicated PIR schemes have also been proposed and shown to be practical. Yet they remain impractical for protecting software running on untrusted systems. We first survey recent research on ORAM and PIR. We then present a practical software-based memory protection scheme applicable to several environments. The main feature of our scheme is that it records the history of accesses and uses that history to hide the access pattern. We also address implementation issues of ORAM and propose practical solutions for them.
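
    The core idea of hiding the access pattern can be shown with the trivial (and expensive) linear-scan approach below: every block is touched on every request, so an observer learns nothing about which index was wanted. Real ORAM constructions, including the history-based scheme presented in the paper, achieve much lower overhead; this is only a conceptual baseline.

    ```python
    def oblivious_read(storage, wanted_index):
        """Read block `wanted_index` while accessing every block, hiding the target from an observer."""
        result = None
        for i, block in enumerate(storage):   # the access pattern is identical for every request
            if i == wanted_index:
                result = block
        return result
    ```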

  • A Fair and Efficient Agent Scheduling Method for Content-Based Information Retrieval with Individual Time Constraints and Its Implementation

    Kazuhiko KINOSHITA  Nariyoshi YAMAI  Koso MURAKAMI  

     
    PAPER-Network System
    Vol: E97-B No:5  Page(s): 945-951

    The recent explosive growth of information networks has driven a huge increase in content. For efficient and flexible information retrieval over such large networks, agent technology has received much attention. We previously proposed an agent execution control method for time-constrained information retrieval that finds better results by terminating an agent that has already acquired results of sufficiently high quality or one that is unlikely to improve the quality of its results with continued retrieval. However, that method assumed that all agents have identical time constraints, which leads to a disparity in the obtained scores among users who specify individual time constraints. In this paper, we propose a fair and efficient scheduling method based on the expected improvement of the highest score (EIS). The proposed method allocates all CPU resources to the agent with the highest EIS, decreasing the difference between users' scores and increasing the mean highest score of the requested results.
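
    The allocation rule itself is simple and can be sketched as follows; the EIS estimator is a placeholder supplied by the caller, since the paper's statistical model for the expected improvement is not reproduced here.

    ```python
    def schedule_step(agents, estimate_eis):
        """agents: list of dict-like agent states; estimate_eis(agent) -> expected improvement of the
        highest score. Returns the agent that receives all CPU resources for this step."""
        runnable = [a for a in agents if not a.get("done")]
        if not runnable:
            return None
        return max(runnable, key=estimate_eis)
    ```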

  • Time Graph Pattern Mining for Network Analysis and Information Retrieval Open Access

    Yasuhito ASANO  Taihei OSHINO  Masatoshi YOSHIKAWA  

     
    PAPER
    Vol: E97-D No:4  Page(s): 733-742

    Graph pattern mining has played an important role in network analysis and information retrieval. However, the temporal characteristics of networks have not been examined sufficiently. We propose time graph pattern mining as a new concept of graph mining that reflects the temporal information of a network. We conduct two case studies of time graph pattern mining: extensively discussed topics on blog sites and a book recommendation network. Through these case studies, we ascertain that time graph pattern mining has numerous possibilities as a novel means of information retrieval and network analysis reflecting both structural and temporal characteristics.

  • Creating Chinese-English Comparable Corpora

    Degen HUANG  Shanshan WANG  Fuji REN  

     
    PAPER-Natural Language Processing
    Vol: E96-D No:8  Page(s): 1853-1861

    Comparable corpora are valuable resources for many NLP applications, and extensive research has been done on information mining based on comparable corpora in recent years. Since there are not enough large-scale publicly available comparable corpora at present, this paper presents a bi-directional CLIR-based method for creating comparable corpora from two independent news collections in different languages. The original Chinese and English document collections are crawled from XinHuaNet and formatted in a consistent manner. For each document in the two collections, the best query keywords are extracted to represent the essential content of the document, and the keywords are then translated into the language of the other collection. The translated queries are run against the collection in that language to pick up candidate documents, and candidates are aligned based on their publication dates and similarity scores. Results show that our approach significantly outperforms previous approaches to the construction of Chinese-English comparable corpora.
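
    The final alignment step can be sketched as below, assuming CLIR retrieval has already produced scored candidates and each document carries a publication date; the date window and score threshold are illustrative values, not those used in the paper.

    ```python
    def align(source_doc, scored_candidates, max_days=2, min_score=0.3):
        """source_doc: object with a .date; scored_candidates: list of (candidate_doc, similarity_score)."""
        in_window = [(cand, score) for cand, score in scored_candidates
                     if abs((cand.date - source_doc.date).days) <= max_days and score >= min_score]
        return max(in_window, key=lambda pair: pair[1])[0] if in_window else None
    ```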

  • Detecting Partial and Near Duplication in the Blogosphere

    Yeo-Chan YOON  Myung-Gil JANG  Hyun-Ki KIM  So-Young PARK  

     
    LETTER-Data Engineering, Web Information Systems
    Vol: E95-D No:2  Page(s): 681-685

    In this paper, we propose a duplicate document detection model recognizing both partial duplicates and near duplicates. The proposed model can detect partial duplicates as well as exact duplicates by splitting a large document into many small sentence fingerprints. Furthermore, the proposed model can detect even near duplicates, the result of trivial revisions, by filtering the common words and reordering the word sequence.
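
    A minimal sketch of sentence-level fingerprinting in the spirit described above: common words are filtered and the remaining words are sorted so that trivial edits and reordering still hash to the same fingerprint. The stopword list and the duplicate-ratio measure are illustrative, not the paper's exact design.

    ```python
    import hashlib

    STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}

    def sentence_fingerprints(document):
        prints = set()
        for sentence in document.split("."):
            words = sorted(w for w in sentence.lower().split() if w and w not in STOPWORDS)
            if words:
                prints.add(hashlib.md5(" ".join(words).encode()).hexdigest())
        return prints

    def duplicate_ratio(doc_a, doc_b):
        """Fraction of doc_a's sentence fingerprints found in doc_b (high -> partial/near duplicate)."""
        fp_a, fp_b = sentence_fingerprints(doc_a), sentence_fingerprints(doc_b)
        return len(fp_a & fp_b) / max(len(fp_a), 1)
    ```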

  • Feature Location in Source Code by Trace-Based Impact Analysis and Information Retrieval

    Zhengong CAI  Xiaohu YANG  Xinyu WANG  Aleksander J. KAVS  

     
    PAPER-Software System
    Vol: E95-D No:1  Page(s): 205-214

    Feature location is the task of identifying the source code that implements a given feature, and it is essential for software maintenance and evolution. A large amount of research, including static analysis, dynamic analysis, and hybrid approaches, has been done on the feature location problem. Existing approaches either require many scenarios or rely heavily on domain experts. This paper proposes a new approach to locating a functional feature in source code by combining change impact analysis and information retrieval. In this approach, the source code is instrumented and executed using a single scenario to obtain the execution trace, which is then extended according to the control flow to cover all potentially relevant classes. The classes are ranked by trace-based impact analysis and information retrieval, where the ranking analysis takes advantage of the semantic and structural characteristics of the source code. The identified results are of higher precision than those of the individual approaches. Finally, two open-source case studies verify the effectiveness of the proposed approach.
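
    The combination of the two rankings can be sketched as a weighted sum of normalized scores; the impact-analysis and text-relevance scoring functions are placeholders here, not the paper's concrete formulas.

    ```python
    def rank_classes(classes, impact_score, text_score, alpha=0.5):
        """classes: iterable of class names; impact_score/text_score: class -> float in [0, 1]."""
        scored = [(c, alpha * impact_score(c) + (1.0 - alpha) * text_score(c)) for c in classes]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)
    ```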

  • A Short Introduction to Learning to Rank Open Access

    Hang LI  

     
    INVITED PAPER
    Vol: E94-D No:10  Page(s): 1854-1862

    Learning to rank refers to machine learning techniques for training a model for a ranking task. Learning to rank is useful for many applications in information retrieval, natural language processing, and data mining. Intensive studies have been conducted on the problem, and significant progress has been made [1],[2]. This short paper gives an introduction to learning to rank; specifically, it explains the fundamental problems, existing approaches, and future work of learning to rank. Several learning-to-rank methods using SVM techniques are described in detail.
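
    As one concrete example of the SVM-based methods mentioned above, a pairwise (RankSVM-style) approach can be sketched with scikit-learn: ranking is turned into binary classification over feature differences of document pairs with different relevance labels within the same query. This is a generic illustration rather than the exact formulations in the paper.

    ```python
    import numpy as np
    from sklearn.svm import LinearSVC

    def train_pairwise_ranker(X, y, query_ids):
        """X: (m, d) features; y: (m,) relevance labels; query_ids: (m,) query of each document."""
        diffs, labels = [], []
        for q in set(query_ids):
            idx = [i for i, qid in enumerate(query_ids) if qid == q]
            for i in idx:
                for j in idx:
                    if y[i] > y[j]:                      # document i should rank above document j
                        diffs.append(X[i] - X[j]); labels.append(1)
                        diffs.append(X[j] - X[i]); labels.append(-1)
        model = LinearSVC().fit(np.array(diffs), np.array(labels))
        return model

    def score_documents(model, X):
        return X @ model.coef_.ravel()                   # higher score = ranked higher
    ```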

  • An Efficient Agent Execution Control Method for Content-Based Information Retrieval with Time Constraints

    Kazuhiko KINOSHITA  Atsushi NARISHIGE  Yusuke HARA  Nariyoshi YAMAI  Koso MURAKAMI  

     
    PAPER-Network System
    Vol: E94-B No:7  Page(s): 1892-1900

    Networks have grown rapidly in recent years, making it increasingly difficult for users to find the information they want. The use of mobile agents to help users retrieve information effectively has attracted much attention. In this paper, we propose an agent control method for time-constrained information retrieval. We pay attention to the highest score obtained so far by the agents and control the agents with the expectation of achieving better scores. Using computer simulations, we confirmed that our control method gave the best improvement over the whole network while reducing the overall variance. From these results, we conclude that our control method improves the quality of the information retrieved by the agents.
