IEICE global.ieice.org Site

Keyword Search Result

[Keyword] text search(3hit)

1-3hit

Tag-Annotated Text Search Using Extended Region Algebra
Katsuya MASUDA Jun'ichi TSUJII

PAPER-Information Retrieval

Vol:
E92-D No:12
Page(s):
2369-2377
This paper presents algorithms for searching text regions with specifying annotated information in tag-annotated text by using Region Algebra. The original algebra and its efficient algorithms are extended to handle both nested regions and crossed regions. The extensions are necessary for text search by using rich linguistic annotations. We first assign a depth number to every nested tag region to order these regions and write efficient algorithms using the depth number for the containment operations which can treat nested tag regions. Next, we introduce variables for attribute values of tags into the algebra to treat annotations in which attributes indicate another tag regions, and propose an efficient method of treating re-entrancy by incrementally determining values for variables. Our algorithms have been implemented in a text search engine for MEDLINE, which is a large textbase of abstracts in medical science. Experiments in tag-annotated MEDLINE abstracts demonstrate the effectiveness of specifying annotations and the efficiency of our algorithms. The system is made publicly accessible at http://www-tsujii.is.s.u-tokyo.ac.jp/medie/.
A Query System for Texts with Macros
Keehang KWON Dae-Seong KANG Jinsoo KIM

LETTER-Automata and Formal Language Theory

Vol:
E91-D No:1
Page(s):
145-147
We propose a query language based on extended regular expressions. This language extends texts with text-generating macros. These macros make it possible to define languages in a compressed, elegant way. This paper also extends queries with linear implications and additive (classical) conjunctions. To be precise, it allows goals of the form D —ο G and G1&G2 where D is a text or a macro and G is a query. The first goal is solved by adding D to the current text and then solving G. This goal is flexible in controlling the current text dynamically. The second goal is solved by solving both G1 and G2 from the current text. This goal is particularly useful for internet search.
Full-Text and Structural Indexing of XML Documents on B⁺-Tree
Toshiyuki SHIMIZU Masatoshi YOSHIKAWA

PAPER-Contents Technology and Web Information Systems

Vol:
E89-D No:1
Page(s):
237-247
XML query processing is one of the most active areas of database research. Although the main focus of past research has been the processing of structural XML queries, there are growing demands for a full-text search for XML documents. In this paper, we propose XICS (XML Indices for Content and Structural search), which aims at high-speed processing of both full-text and structural queries in XML documents. An important design principle of our indices is the use of a B+-tree. To represent the structural information of XML trees, each node in the XML tree is labeled with an identifier. The identifier contains an integer number representing the path information from the root node. XICS consist of two types of indices, the COB-tree (COntent B+-tree) and the STB-tree (STructure B+-tree). The search keys of the COB-tree are a pair of text fragments in the XML document and the identifiers of the leaf nodes that contain the text, whereas the search keys of the STB-tree are the node identifiers. By using a node identifier in the search keys, we can retrieve only the entries that match the path information in the query. The STB-tree can filter nodes using structural conditions in queries, while the COB-tree can filter nodes using text conditions. We have implemented a COB-tree and an STB-tree using GiST and examined index size and query processing time. Our experimental results show the efficiency of XICS in query processing.