IEICE global.ieice.org Site

Keyword Search Result

[Keyword] search engine (11 hits)


A 250 Msps, 0.5 W eDRAM-Based Search Engine Dedicated Low Power FIB Application
Hisashi IWAMOTO, Yuji YANO, Yasuto KURODA, Koji YAMAMOTO, Kazunari INOUE, Ikuo OKA

PAPER-Integrated Electronics

Vol: E96-C, No: 8, Page(s): 1076-1082
Ternary content addressable memory (TCAM) is a popular LSI device for high-throughput forwarding engines in routers. However, the unique structure of TCAM consumes a large amount of power, which limits the lookup-table capacity that IP routers can handle. In this paper, we propose a commodity-memory-based hardware architecture for the forwarding information base (FIB) application that addresses these power and density problems. The proposed architecture was evaluated with a test chip fabricated in 40 nm embedded DRAM (eDRAM) technology: its measured power is far lower than that of conventional TCAM-based designs, and its energy metric reaches 0.01 fJ/bit/search. The power consumption is approximately 0.5 W at 250 Msps with 8M entries.
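A FIB lookup, whether done by a TCAM or by the proposed eDRAM engine, must return the next hop of the longest prefix that matches the destination address. The sketch below shows only those lookup semantics with a plain binary trie over IPv4 prefixes; the class name `BinaryTrie` and the next-hop labels are hypothetical, and this is not the paper's architecture.

```python
# Minimal longest-prefix-match (LPM) sketch using a binary trie over IPv4.
# Assumption: illustrates FIB lookup semantics only, not the eDRAM design.
import ipaddress
from typing import Optional


class BinaryTrie:
    def __init__(self) -> None:
        self.root: dict = {}  # child keys are bits 0/1; "hop" stores a next hop

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk one bit per level, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, addr: str) -> Optional[str]:
        bits = int(ipaddress.ip_address(addr))
        node = self.root
        best = node.get("hop")  # default route, if one was inserted
        for i in range(32):
            bit = (bits >> (31 - i)) & 1
            if bit not in node:
                break
            node = node[bit]
            best = node.get("hop", best)  # remember the longest match so far
        return best


fib = BinaryTrie()
fib.insert("0.0.0.0/0", "A")
fib.insert("10.0.0.0/8", "B")
fib.insert("10.1.0.0/16", "C")
print(fib.lookup("10.1.2.3"))   # C
print(fib.lookup("10.2.0.1"))   # B
print(fib.lookup("192.0.2.1"))  # A
```

For scale, the reported figures imply 0.5 W / (250 × 10^6 searches/s) = 2 nJ per search.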
Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix – Pursuit of Enhanced Informational Search on the Web –
Etsuro FUJITA, Keizo OYAMA

PAPER-Advanced Search

Vol: E96-D, No: 5, Page(s): 1016-1028
With the successful adoption of link analysis techniques such as PageRank and web spam filtering, current web search engines support “navigational search” well. However, because they apply a simple conjunctive Boolean filter and because user queries are often poorly formulated, such engines do not necessarily support “informational search” well. Informational search would be better handled by a web search engine that uses an information retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Realizing such an engine, moreover, requires a method for processing the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We augment it with a technique that uses a simple data structure called a “term-document binary matrix,” which makes the evaluation of top-k queries more efficient even when the queries have been expanded. Experimental evaluation on the TREC GOV2 data set, using expanded versions of the evaluation queries attached to it, shows that the proposed method can speed up evaluation considerably compared with existing techniques, especially as the number of query terms grows.
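The abstract does not spell out how the binary matrix is used, so the following is only a hypothetical sketch of one way such a matrix can help exact top-k evaluation: each document's bit row tells which query terms it contains, which gives a cheap score upper bound, and documents are scored exactly only while that bound can still beat the current k-th best score. The function name `top_k` and its parameters are assumptions, not the authors' method.

```python
# Hypothetical sketch: exact top-k with score upper bounds derived from a
# term-document binary matrix (one bitmask row per document).
import heapq
from typing import Dict, List, Tuple


def top_k(
    query: List[str],
    vocab: Dict[str, int],                 # term -> bit position (matrix column)
    rows: Dict[str, int],                  # doc id -> bitmask row of the matrix
    scores: Dict[Tuple[str, str], float],  # (term, doc) -> exact score, assumed precomputed
    max_score: Dict[str, float],           # term -> largest score of that term in any doc
    k: int,
) -> List[Tuple[float, str]]:
    if k <= 0:
        return []
    terms = [t for t in query if t in vocab]
    qmask = 0
    for t in terms:
        qmask |= 1 << vocab[t]
    # Cheap upper bound: sum the per-term maxima of the query terms whose
    # bit is set in the document's row; no posting lists are read.
    bounds = []
    for doc, row in rows.items():
        hit = row & qmask
        if hit:
            ub = sum(max_score[t] for t in terms if (hit >> vocab[t]) & 1)
            bounds.append((ub, doc))
    bounds.sort(reverse=True)
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    for ub, doc in bounds:
        if len(heap) == k and ub <= heap[0][0]:
            break  # no remaining document can beat the current k-th score
        exact = sum(scores.get((t, doc), 0.0) for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, (exact, doc))
        elif exact > heap[0][0]:
            heapq.heapreplace(heap, (exact, doc))
    return sorted(heap, reverse=True)
```

Because every remaining document's bound is no larger than the one that triggered the stop, the early exit never discards a document that could enter the top k. Expanded queries add more terms, and the bitmask test stays a single AND per document.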
Improvements of HITS Algorithms for Spam Links
Yasuhito ASANO, Yu TEZUKA, Takao NISHIZEKI

PAPER-Scoring Algorithms

Vol: E91-D, No: 2, Page(s): 200-208
The HITS algorithm proposed by Kleinberg is one of the representative methods for scoring Web pages using hyperlinks. When the algorithm was proposed, most of the pages it scored highly were genuinely related to a given topic, so it could be used to find related pages. However, because of the increase in spam links, neither the algorithm nor its variants, including the improved HITS of Bharat and Henzinger (abbreviated BHITS), can be used to find related pages on today's Web any more. In this paper, we first propose three methods to find "linkfarms," that is, sets of spam links forming a densely connected subgraph of a Web graph. We then present an algorithm, called a trust-score algorithm, that gives high scores to pages that are, with high probability, not spam pages. Combining the three methods and the trust-score algorithm with BHITS, we obtain several variants of the HITS algorithm. Experiments show that one of them, named TaN+BHITS, which uses the trust-score algorithm and the method of finding linkfarms by employing name servers, is the most suitable for finding related pages on today's Web. Our algorithms require no more time and memory than the original HITS algorithm and can be executed on a PC with a small amount of main memory.
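For reference, the original HITS iteration that the paper's variants build on alternates two updates: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with both vectors normalized each round. The sketch below implements only that baseline; the linkfarm detection and trust scores of TaN+BHITS are not reproduced, and the function name `hits` is an assumption.

```python
# Minimal sketch of Kleinberg's original HITS iteration (not TaN+BHITS).
# The graph is an adjacency mapping: page -> list of pages it links to.
import math
from typing import Dict, List, Tuple


def hits(graph: Dict[str, List[str]], iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub: sum of authority scores of the pages linked to.
        new_hub = {n: sum(new_auth[d] for d in graph.get(n, [])) for n in nodes}
        # Normalize both vectors to unit L2 norm.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth


hub, auth = hits({"a": ["c"], "b": ["c", "d"], "d": ["c"]})
print(max(auth, key=auth.get), max(hub, key=hub.get))  # c b
```

Because scores flow along every link without regard to who created it, a densely interlinked set of spam pages can reinforce itself, which is the weakness the paper's linkfarm detection addresses.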
Improving Search Performance: A Lesson Learned from Evaluating Search Engines Using Thai Queries
Shisanu TONGCHIM, Virach SORNLERTLAMVANICH, Hitoshi ISAHARA

PAPER

Vol: E90-D, No: 10, Page(s): 1557-1564
This study initiates a systematic evaluation of web search engine performance using queries written in Thai. Statistical testing indicates that there are some significant differences in performance among the search engines. In addition to comparing search performance, we analyze the returned results. This analysis shows that the majority of returned results are unique to a particular search engine and that each system provides quite different results. This encourages the use of metasearch techniques that combine the search results to improve performance and reliability in finding relevant documents. We examine several metasearch models based on the Borda count and Condorcet voting schemes. We also propose the use of Evolutionary Programming (EP) to optimize the weight vectors used by the voting algorithms. The results show that the metasearch approaches outperform every individual search engine on Thai queries.
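A weighted Borda count, one of the voting schemes named above, can be sketched as follows: each engine awards points by rank, the points are scaled by that engine's weight, and documents are re-ranked by total points. The weights here are placeholders (the paper tunes such weight vectors with Evolutionary Programming), and giving zero points to documents an engine did not return is one common convention assumed for this sketch.

```python
# Sketch of weighted Borda-count metasearch fusion. Assumptions: weights are
# supplied by the caller, and unreturned documents earn 0 points from an engine.
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_borda(rankings: Sequence[List[str]], weights: Sequence[float]) -> List[str]:
    pool = {doc for ranking in rankings for doc in ranking}
    n = len(pool)
    points: Dict[str, float] = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        for pos, doc in enumerate(ranking):
            # A document at 0-based rank pos earns n - pos points, scaled by the weight.
            points[doc] += w * (n - pos)
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(pool, key=lambda d: (-points[d], d))


engines = [["x", "y", "z"], ["y", "x", "w"], ["y", "z"]]
print(weighted_borda(engines, [1, 1, 1]))  # ['y', 'x', 'z', 'w']
print(weighted_borda(engines, [6, 1, 1]))  # ['x', 'y', 'z', 'w']
```

The second call shows why the weight vector matters: trusting the first engine strongly enough overturns the unweighted consensus.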
Statistical-Based Approach to Non-segmented Language Processing
Virach SORNLERTLAMVANICH, Thatsanee CHAROENPORN, Shisanu TONGCHIM, Canasai KRUENGKRAI, Hitoshi ISAHARA

PAPER

Vol: E90-D, No: 10, Page(s): 1565-1573
Several approaches have been studied to cope with the distinctive features of non-segmented languages. When there is no explicit information about word boundaries, segmenting an input text is a formidable task in language processing. Not only the contemporary word list but also word usages have to be maintained to cover their use in current texts. The accuracy and efficiency of higher-level processing rely heavily on this word boundary identification task. In this paper, we introduce several statistics-based approaches to tackle the ambiguity in word segmentation. Word boundary identification is then defined as one component of a unified language processing framework. To demonstrate this unified language processing, we study three selected tasks: language identification, word extraction, and a dictionary-less search engine.
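As a concrete picture of statistics-based word boundary identification, a common baseline picks the segmentation that maximizes the product of unigram word probabilities, found by dynamic programming over split points. This is a generic technique, not the authors' model; the function name `segment`, the unknown-word penalty, and the toy probabilities (English text with spaces removed standing in for a non-segmented language) are assumptions.

```python
# Generic unigram segmentation by dynamic programming (Viterbi-style).
# Assumptions: toy probabilities; unknown strings pay a per-character penalty.
import math
from typing import Dict, List


def segment(text: str, unigram: Dict[str, float], max_len: int = 10,
            unk_logp: float = -20.0) -> List[str]:
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            logp = math.log(unigram[word]) if word in unigram else unk_logp * len(word)
            score = best[j] + logp
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


probs = {"the": 0.05, "table": 0.01, "down": 0.01, "tab": 0.005, "led": 0.003, "own": 0.004}
print(segment("thetabledown", probs))  # ['the', 'table', 'down']
```

The competing split "the tab led own" is also made of known words, but its probability product is lower, which is how the statistics resolve the boundary ambiguity.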
Proof: A Novel DHT-Based Peer-to-Peer Search Engine
Kai-Hsiang YANG, Jan-Ming HO

PAPER

Vol: E90-B, No: 4, Page(s): 817-825
In this paper we focus on building a large-scale keyword search service over structured Peer-to-Peer (P2P) networks. Current state-of-the-art keyword search approaches for structured P2P systems are based on inverted list intersection. The biggest challenge for these approaches, however, is that when the indices are distributed over peers, even a simple query may cause a large amount of data to be transmitted over the network. We propose a new P2P keyword search scheme, called "Proof," which aims to reduce the network traffic generated during the intersection process. Proof applies three main ideas to reduce network traffic: (1) using a sorted query flow, (2) storing content summaries in the inverted lists, and (3) setting a stop condition for checking content summaries. We also discuss the advantages and limitations of Proof, and we conduct extensive experiments to evaluate search performance and the quality of search results. Our simulation results show that, compared with previous solutions, Proof can dramatically reduce network traffic while providing 100% precision and high recall of search results, at the cost of some additional storage overhead.
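The three ideas can be illustrated with a hypothetical, single-machine sketch. The abstract does not define the content summary, so here it is assumed to be each document's N most frequent terms, copied into every posting; checking query terms against such a summary never produces false matches (consistent with 100% precision) but can miss documents (recall below 100%), and the copies are the extra storage. All names (`build_index`, `search`, `summary_size`, `max_results`) are assumptions.

```python
# Hypothetical sketch of sorted query flow, content summaries in postings,
# and a stop condition. Assumption: summary = a document's top-N terms.
from collections import Counter
from typing import Dict, FrozenSet, List

Index = Dict[str, Dict[str, FrozenSet[str]]]  # term -> {doc id -> summary}


def build_index(docs: Dict[str, List[str]], summary_size: int = 3) -> Index:
    index: Index = {}
    for doc, tokens in docs.items():
        top = frozenset(t for t, _ in Counter(tokens).most_common(summary_size))
        for term in set(tokens):
            index.setdefault(term, {})[doc] = top  # summary copied into every posting
    return index


def search(index: Index, query: List[str], max_results: int) -> List[str]:
    # (1) Sorted query flow: start from the shortest inverted list
    # (ties broken by term for determinism).
    ordered = sorted(set(query), key=lambda t: (len(index.get(t, {})), t))
    if not ordered:
        return []
    results: List[str] = []
    for doc, summary in sorted(index.get(ordered[0], {}).items()):
        # (2) Check the remaining terms against the stored summary instead
        # of fetching and intersecting their inverted lists.
        if all(t in summary for t in ordered[1:]):
            results.append(doc)
            # (3) Stop condition: enough results have been found.
            if len(results) >= max_results:
                break
    return results
```

With `summary_size=2`, a document whose summary omits one of the query terms is skipped even if it contains that term, which is the recall cost this design trades for less traffic.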
Document Genre Classification for User Interface of Web Search Engine
Kong-Joo LEE

LETTER-Natural Language Processing

Vol: E87-D, No: 7, Page(s): 1982-1986
In this letter we suggest sets of features for classifying the genres of web documents. Web documents differ from plain textual documents in that they contain URLs and HTML tags. We introduce features specific to web documents, which are extracted from URLs and HTML tags. Experimental results allow us to evaluate their characteristics and performance. On the basis of these results, we implement a user interface for a web search engine that presents documents grouped by genre.
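To make "features extracted from URLs and HTML tags" concrete, here is an illustrative extractor built on the standard library. The specific features (URL depth, a tilde in the path, a query string, relative tag frequencies) are assumptions chosen for the example, not the letter's actual feature set, and `extract_features` is a hypothetical name.

```python
# Illustrative URL/HTML feature extractor for genre classification.
# Assumption: the feature choices below are examples, not the letter's set.
from collections import Counter
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter = Counter()

    def handle_starttag(self, tag, attrs) -> None:
        self.tags[tag] += 1


def extract_features(url: str, html: str) -> Dict[str, float]:
    parsed = urlparse(url)
    path_parts = [p for p in parsed.path.split("/") if p]
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    n_tags = sum(parser.tags.values())
    total = n_tags or 1  # avoid division by zero for tag-free pages
    features = {
        "url_depth": float(len(path_parts)),
        "url_has_tilde": float("~" in parsed.path),  # often a personal home page
        "url_has_query": float(bool(parsed.query)),  # often a dynamic page
        "n_tags": float(n_tags),
    }
    # Relative frequencies of a few tags that may hint at genre.
    for tag in ("a", "form", "table", "img", "p"):
        features[f"tag_{tag}"] = parser.tags[tag] / total
    return features


print(extract_features("http://example.com/~alice/pubs/index.html?x=1",
                       "<html><body><p>hi</p><a href='x'>l</a><a href='y'>m</a></body></html>"))
```

The resulting dictionary can be fed to any standard classifier; which features actually separate genres is what the letter's experiments evaluate.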
UPRISE: Unified Presentation Slide Retrieval by Impression Search Engine
Haruo YOKOTA, Takashi KOBAYASHI, Taichi MURAKI, Satoshi NAOI

PAPER

Vol: E87-D, No: 2, Page(s): 397-406
A combination of the slides used in a presentation and a video recording of the presentation is quite useful for many applications, such as e-learning. However, creating new content from these materials with current authoring tools requires considerable effort from the author, and the resulting products lack flexibility. In this paper, we propose a unifying function that combines them without creating new content manually. We also propose a new approach to searching unified presentation manuscripts for slides that match given keywords, taking into account features peculiar to presentation slides. We propose impression indicators that express how well a slide matches the given keywords. We also propose a system for retrieving a sequence of desired presentation slides from archives of the combined slides and video, which we named Unified Presentation Slide Retrieval by Impression Search Engine, or UPRISE. We describe the system configuration of UPRISE and the experiments undertaken to evaluate the effect of the proposed indicators and to compare the results with those of the traditional tf-idf retrieval method.
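The traditional tf-idf baseline mentioned above can be sketched as ranking each slide by the summed tf-idf weight of the query keywords. This uses one common variant (raw term frequency times log(N/df)); the exact variant used in the paper is not stated, UPRISE's impression indicators are not reproduced, and `rank_slides` is a hypothetical name.

```python
# Sketch of a tf-idf baseline for ranking slides by keywords.
# Assumption: raw tf times log(N / df); not the paper's impression indicators.
import math
from collections import Counter
from typing import Dict, List, Tuple


def rank_slides(slides: Dict[str, List[str]], keywords: List[str]) -> List[Tuple[str, float]]:
    n = len(slides)
    df: Counter = Counter()
    for tokens in slides.values():
        df.update(set(tokens))  # document frequency: slides containing the term
    ranked = []
    for slide_id, tokens in slides.items():
        tf = Counter(tokens)
        score = sum(tf[k] * math.log(n / df[k]) for k in keywords if df[k])
        ranked.append((slide_id, score))
    ranked.sort(key=lambda item: (-item[1], item[0]))  # best first, ties by id
    return ranked


slides = {"s1": ["dram", "power", "power"], "s2": ["search", "engine", "power"], "s3": ["search", "trie"]}
print([s for s, _ in rank_slides(slides, ["search", "trie"])])  # ['s3', 's2', 's1']
```

A purely term-based score like this ignores slide-specific cues such as layout or emphasis, which is the kind of information impression indicators are meant to add.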
Results Merging with the OASIS System: An Experimental Comparison of Two Techniques
Vitaliy KLUEV

PAPER

Vol: E86-D, No: 9, Page(s): 1773-1780
Mechanisms for results merging are very important for distributed search systems. They must select the most relevant documents retrieved by different servers and place them at the top of the list returned to the end user. There are several approaches to the key problems of this task, such as eliminating duplicates and ranking the combined results, but it is still not clear how best to achieve this. We use a clustering technique to divide the retrieved results into several groups and a metric based on the vector space model to order the items within each group. Preliminary tests were conducted using the OASIS system and several collections of real Internet data. They showed relatively superior results compared with neural network clustering and LSI calculation. The proposed mechanisms can be applied to metasearch systems as well as to distributed search systems, because they require no special information beyond the de facto standard data received from servers.
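The merging pipeline described above (remove duplicates, cluster, then order items inside each group with a vector-space metric) can be sketched as follows. The single-pass "leader" clustering, the similarity threshold, term-count vectors, and ordering groups by their best item are all assumptions for illustration; the paper's actual clustering technique and metric may differ.

```python
# Hypothetical results-merging sketch: dedupe by URL, leader clustering,
# then cosine-to-query ordering inside and across groups.
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def vectorize(text: str) -> Vector:
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def merge(results: List[Tuple[str, str]], query: str, threshold: float = 0.3) -> List[List[str]]:
    """results: (url, snippet) pairs collected from several servers."""
    seen = set()
    unique = []
    for url, text in results:
        if url not in seen:  # eliminate duplicates returned by different servers
            seen.add(url)
            unique.append((url, vectorize(text)))
    clusters: List[List[Tuple[str, Vector]]] = []
    for item in unique:
        for cluster in clusters:
            if cosine(item[1], cluster[0][1]) >= threshold:  # compare with the group leader
                cluster.append(item)
                break
        else:
            clusters.append([item])
    q = vectorize(query)
    groups = []
    for cluster in clusters:
        cluster.sort(key=lambda it: -cosine(it[1], q))  # rank inside the group
        groups.append((cosine(cluster[0][1], q), [url for url, _ in cluster]))
    groups.sort(key=lambda g: -g[0])  # groups most related to the query come first
    return [urls for _, urls in groups]
```

Only URLs and snippet text are used, in line with the abstract's point that no special information beyond what servers already return is required.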
Min-Wise Independence vs. 3-Wise Independence
Toshiya ITOH

PAPER

Vol: E85-A, No: 5, Page(s): 957-966
A family $\mathcal{F}$ of min-wise independent permutations is known to be a useful tool for indexing replicated documents on the Web. We say that a family $\mathcal{F}$ of permutations on $\{0,1,\ldots,n-1\}$ is min-wise independent if for any $X \subseteq \{0,1,\ldots,n-1\}$ and any $x \in X$, $\Pr[\min\{\pi(X)\} = \pi(x)] = \|X\|^{-1}$ when $\pi$ is chosen uniformly at random from $\mathcal{F}$, where $\|A\|$ is the cardinality of a finite set $A$. We also say that a family $\mathcal{F}$ of permutations on $\{0,1,\ldots,n-1\}$ is $d$-wise independent if for any distinct $x_1,x_2,\ldots,x_d \in \{0,1,\ldots,n-1\}$ and any distinct $y_1,y_2,\ldots,y_d \in \{0,1,\ldots,n-1\}$, $\Pr[\bigwedge_{i=1}^{d} \pi(x_i) = y_i] = 1/\{n(n-1)\cdots(n-d+1)\}$ when $\pi$ is chosen uniformly at random from $\mathcal{F}$ (note that nontrivial constructions of $d$-wise independent families of permutations on $\{0,1,\ldots,n-1\}$ are known only for $d=2,3$). Recently, Broder et al. showed that any family $\mathcal{F}$ of pairwise (2-wise) independent permutations behaves close to a family of min-wise independent permutations, i.e., for any $X \subseteq \{0,1,\ldots,n-1\}$ such that $3 \le \|X\| = k \le n-2$ and any $x \in X$, (lower bound) $\Pr[\min\{\pi(X)\}=\pi(x)] \ge 1/\{2(k-1)\}$; (upper bound) $\Pr[\min\{\pi(X)\}=\pi(x)] \le O(1/\sqrt{k})$. In this paper, we extend these bounds to 3-wise independent permutation families and show that any family of 3-wise independent permutations behaves even closer to a family of min-wise independent permutations, i.e., for any $X \subseteq \{0,1,\ldots,n-1\}$ such that $4 \le \|X\| = k \le n-3$ and any $x \in X$, (lower bound) $\Pr[\min\{\pi(X)\}=\pi(x)] \ge 1/\{2(k-2)\} - 1/\{6(k-2)^2\}$; (upper bound) $\Pr[\min\{\pi(X)\}=\pi(x)] \le 2/\sqrt{k} - 2/k + 1/(3k\sqrt{k})$.
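The definitions above can be checked exactly for small $n$ by enumerating a family and counting. The sketch below uses two standard examples that are not taken from the paper: the full symmetric group (which is min-wise independent) and the affine maps $x \mapsto (ax+b) \bmod p$ with $a \ne 0$ and $p$ prime, a classic pairwise independent family of permutations of $\{0,\ldots,p-1\}$. The helper names are hypothetical.

```python
# Exact, enumeration-based check of min-wise and pairwise independence
# for small examples (illustrative; not the paper's constructions).
from fractions import Fraction
from itertools import permutations
from typing import Callable, List, Set

Perm = Callable[[int], int]


def min_prob(family: List[Perm], X: Set[int], x: int) -> Fraction:
    """Pr[min pi(X) = pi(x)] for pi drawn uniformly from the family."""
    hits = sum(1 for pi in family if min(pi(y) for y in X) == pi(x))
    return Fraction(hits, len(family))


def affine_family(p: int) -> List[Perm]:
    # Default arguments freeze a and b inside each lambda.
    return [lambda y, a=a, b=b: (a * y + b) % p for a in range(1, p) for b in range(p)]


def symmetric_family(n: int) -> List[Perm]:
    return [lambda y, s=s: s[y] for s in permutations(range(n))]


fam = affine_family(7)
# Pairwise independence: exactly one (a, b) maps 0 -> 2 and 1 -> 5, out of 7 * 6.
print(Fraction(sum(1 for pi in fam if pi(0) == 2 and pi(1) == 5), len(fam)))  # 1/42
# Min-wise independence of the full symmetric group: probability 1/|X|.
print(min_prob(symmetric_family(5), {0, 1, 3}, 1))  # 1/3
print([min_prob(fam, {0, 1, 3}, x) for x in (0, 1, 3)])
```

The last line prints the affine family's probabilities for each $x \in X$; they always sum to 1 because a permutation attains the minimum at exactly one element, but individually they need not equal $1/k$, which is why bounds like those above are needed.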
Visualized Sound Retrieval and Categorization Using a Feature-Based Image Search Engine
Katsunobu FUSHIKIDA Yoshitsugu HIWATARI Hideyo WAKI

PAPER-Multimedia Pattern Processing

Vol: E83-D, No: 11, Page(s): 1978-1985
In this paper, visualized sound retrieval and categorization methods using a feature-based image search engine were evaluated with the aim of enabling efficient video scene queries. Color-coded patterns of the sound spectrogram are adopted as the visualized sound index. Sound categorization experiments were conducted using visualized sound databases including speech, bird song, musical sounds, insect chirping, and the soundtracks of sports videos. The results of the retrieval experiments show that a simple feature-based image search engine can be used effectively for visualized sound retrieval and categorization. The results of categorization experiments involving human subjects show that, after brief training, humans can perform at least rough categorization. These results suggest that visualized sound can be an effective method for efficient video scene queries.
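The idea of indexing sound as an image can be illustrated with a toy, pure-Python pipeline: compute a magnitude spectrogram, quantize each cell into a few intensity levels (standing in for the color coding), summarize the "image" as a per-band histogram of levels, and compare histograms by intersection. The frame size, hop, quantization, feature, and similarity measure are all assumptions for this sketch, as are the helper names; the paper's image search engine and color mapping are not reproduced.

```python
# Toy "visualized sound" retrieval: spectrogram -> quantized levels ->
# per-band level histogram -> histogram intersection. All parameters assumed.
import cmath
import math
from typing import List


def spectrogram(signal: List[float], frame: int = 32, hop: int = 16) -> List[List[float]]:
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        # Naive DFT magnitudes for the first frame // 2 bins (fine for a toy example).
        frames.append([
            abs(sum(chunk[t] * cmath.exp(-2j * math.pi * f * t / frame) for t in range(frame)))
            for f in range(frame // 2)
        ])
    return frames


def feature(spec: List[List[float]], levels: int = 4) -> List[float]:
    """Normalized histogram of quantized intensity levels per frequency band."""
    peak = max(max(row) for row in spec) or 1.0
    bands = len(spec[0])
    hist = [0.0] * (bands * levels)
    for row in spec:
        for f, mag in enumerate(row):
            level = min(int(mag / peak * levels), levels - 1)  # the "color" of this cell
            hist[f * levels + level] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def similarity(a: List[float], b: List[float]) -> float:
    return sum(min(x, y) for x, y in zip(a, b))  # histogram intersection in [0, 1]


def tone(freq_bin: int, n: int = 256, frame: int = 32) -> List[float]:
    """Hypothetical test signal: a sine aligned with one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / frame) for t in range(n)]


db = {name: feature(spectrogram(tone(b))) for name, b in [("tone3", 3), ("tone4", 4), ("tone12", 12)]}
query = feature(spectrogram(tone(4, n=320)))
print(max(db, key=lambda name: similarity(query, db[name])))  # tone4
```

A longer recording of the same tone still matches best because the histogram is normalized, which loosely mirrors why image-level features can tolerate clips of different lengths.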