
Keyword Search Result

[Keyword] indexing(49hit)

1-20hit(49hit)

  • Multi-Scale Chroma n-Gram Indexing for Cover Song Identification

    Jin S. SEO  

     
    LETTER

      Publicized:
    2019/10/23
      Vol:
    E103-D No:1
      Page(s):
    59-62

    To enhance cover song identification accuracy on a large music archive, a song-level feature summarization method based on multi-scale representation is proposed. The chroma n-grams are extracted at multiple scales to cope with both global and local tempo changes. We derive an index from the extracted n-grams by clustering to reduce storage and computation for database search. Experiments on widely used music datasets confirmed that the proposed method achieves state-of-the-art accuracy while reducing the cost of cover song search.

  • Efficient Supergraph Search Using Graph Coding

    Shun IMAI  Akihiro INOKUCHI  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2019/09/26
      Vol:
    E103-D No:1
      Page(s):
    130-141

    This paper proposes a method for searching for graphs in the database which are contained as subgraphs by a given query. In the proposed method, the search index does not require any knowledge of the query set or the frequent subgraph patterns. In conventional techniques, enumerating and selecting frequent subgraph patterns is computationally expensive, and the distribution of the query set must be known in advance. Subsequent changes to the query set require the frequent patterns to be selected again and the index to be reconstructed. The proposed method overcomes these difficulties through graph coding, using a tree structured index that contains infrequent subgraph patterns in the shallow part of the tree. By traversing this code tree, we are able to rapidly determine whether multiple graphs in the database contain subgraphs that match the query, producing a powerful pruning or filtering effect. Furthermore, the filtering and verification steps of the graph search can be conducted concurrently, rather than requiring separate algorithms. As the proposed method does not require the frequent subgraph patterns and the query set, it is significantly faster than previous techniques; this independence from the query set also means that there is no need to reconstruct the search index when the query set changes. A series of experiments using a real-world dataset demonstrate the efficiency of the proposed method, achieving a search speed several orders of magnitude faster than the previous best.

  • The BINDS-Tree: A Space-Partitioning Based Indexing Scheme for Box Queries in Non-Ordered Discrete Data Spaces

    A. K. M. Tauhidul ISLAM  Sakti PRAMANIK  Qiang ZHU  

     
    PAPER

      Publicized:
    2019/01/16
      Vol:
    E102-D No:4
      Page(s):
    745-758

    In recent years we have witnessed an increasing demand to process queries on large datasets in Non-ordered Discrete Data Spaces (NDDS). In particular, one type of query in an NDDS, called box queries, is used in many emerging applications including error corrections in bioinformatics and network intrusion detection in cybersecurity. Effective indexing methods are necessary for efficiently processing queries on large datasets on disk. However, most existing NDDS indexing methods were not designed for box queries. Several recent indexing methods developed for box queries on a large NDDS dataset on disk are based on the popular data-partitioning approach. Unfortunately, a space-partitioning based indexing scheme, which is more effective for box queries in an NDDS, has not been studied before. In this paper, we propose a novel indexing method based on space-partitioning, called the BINDS-tree, for supporting efficient box queries on a large NDDS dataset on disk. A number of effective strategies such as node split based on minimum span and cross optimal balance, redundancy reduction utilizing a singleton dimension inheritance property, and a space-efficient structure for the split history are incorporated in the construction algorithm for the BINDS-tree. Experimental results demonstrate that the proposed BINDS-tree significantly improves the box query I/O performance compared to that of the state-of-the-art data-partitioning based NDDS indexing method.

  • Visual Indexing of Large Scale Train-Borne Video for Rail Condition Perceiving

    Peng DAI  Shengchun WANG  Yaping HUANG  Hao WANG  Xinyu DU  Qiang HAN  

     
    PAPER

      Publicized:
    2017/06/14
      Vol:
    E100-D No:9
      Page(s):
    2017-2026

    Train-borne video captured from a camera installed at the front or back of the train has been used for railway environment surveillance, including missing communication units and bolts on the track, broken fences, and unpredictable objects falling into the rail area or hanging on wires above the rails. Moreover, the track condition can be perceived visually from the video by observing and analyzing the train-swaying arising from track irregularity. However, frequently examining entire large-scale videos of up to dozens of hours is time-consuming and labor-intensive. In this paper, we propose a simple and effective method to detect the train-swaying quickly and automatically. We first generate the long rail track panorama (RTP) by stitching the stripes cut from the video frames, and then extract the track profile to perform the unevenness detection algorithm on the RTP. The experimental results show that the RTP, a compact video representation, allows fast examination of the visual train-swaying information for track condition perceiving; on it, we detect the irregular spots with 92.86% recall and 82.98% precision in only 2 minutes of computation for a video close to 1 hour long.

  • Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

    Takuya TAKAGI  Shunsuke INENAGA  Kunihiko SADAKANE  Hiroki ARIMURA  

     
    PAPER

      Vol:
    E100-A No:9
      Page(s):
    1785-1793

    We present a new data structure called the packed compact trie (packed c-trie) which stores a set $S$ of $k$ strings of total length $n$ in $n \log \sigma + O(k \log n)$ bits of space and supports fast pattern matching queries and updates, where $\sigma$ is the alphabet size. Assume that $\alpha = \log_\sigma n$ letters are packed in a single machine word on the standard word RAM model, and let $f(k,n)$ denote the query and update times of the dynamic predecessor/successor data structure of our choice which stores $k$ integers from universe $[1,n]$ in $O(k \log n)$ bits of space. Then, given a string of length $m$, our packed c-tries support pattern matching queries and insert/delete operations in $O(\frac{m}{\alpha} f(k,n))$ worst-case time and in $O(\frac{m}{\alpha} + f(k,n))$ expected time. Our experiments show that our packed c-tries are faster than the standard compact tries (a.k.a. Patricia trees) on real data sets. We also discuss applications of our packed c-tries.
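
    As a rough illustration of why packing several letters into one machine word helps, the sketch below compares two strings a whole word at a time and finds their longest common prefix with a single XOR per word. It is not the packed c-trie itself: the 64-bit word size, the four-letter DNA alphabet, and the names `pack` and `lcp` are assumptions chosen for the example.

```python
BITS = 2                  # bits per letter, assuming an alphabet of size 4
WORD = 64                 # assumed machine word size in bits
PER_WORD = WORD // BITS   # letters that fit in one word (32 here)
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical letter encoding


def pack(s):
    """Pack a string into a list of left-aligned, zero-padded 64-bit words."""
    words = []
    for i in range(0, len(s), PER_WORD):
        chunk = s[i:i + PER_WORD]
        w = 0
        for ch in chunk:
            w = (w << BITS) | CODE[ch]
        # Left-align a short final chunk so letter positions line up.
        w <<= BITS * (PER_WORD - len(chunk))
        words.append(w)
    return words


def lcp(a, b):
    """Longest common prefix length, comparing PER_WORD letters per step."""
    limit = min(len(a), len(b))
    for i, (x, y) in enumerate(zip(pack(a), pack(b))):
        diff = x ^ y
        if diff:
            # Number of equal leading bits in this word, then whole letters.
            equal_bits = WORD - diff.bit_length()
            return min(limit, i * PER_WORD + equal_bits // BITS)
    return limit
```

    With these assumptions, each loop iteration handles 32 letters, which mirrors the factor of $\alpha$ saved in the $\frac{m}{\alpha}$ term of the stated bounds.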

  • A Continuous Query Indexing Method for Location Based Services in Broadcast Environments

    Kyoungsoo BOK  Yonghun PARK  Jaesoo YOO  

     
    PAPER-Network System

      Publicized:
    2016/12/01
      Vol:
    E100-B No:5
      Page(s):
    702-710

    Recently, several methods to process continuous queries for mobile objects in broadcast environments have been proposed. We propose a new indexing method for processing continuous queries that uses vector information in broadcast environments. We separate the index structure according to the velocities of the objects to avoid unnecessary accesses. The index structure consists of the index files for the slow moving objects and the fast moving objects. By avoiding unnecessary accesses, we reduce the tuning time to process a query in broadcast environments. To show the superiority of the proposed method, we evaluate its performance from various perspectives.

  • A Loitering Discovery System Using Efficient Similarity Search Based on Similarity Hierarchy

    Jianquan LIU  Shoji NISHIMURA  Takuya ARAKI  Yuichi NAKAMURA  

     
    INVITED PAPER

      Vol:
    E100-A No:2
      Page(s):
    367-375

    Similarity search is an important and fundamental problem, and is thus widely used in various fields of computer science including multimedia, computer vision, databases, and information retrieval. Recently, since loitering behavior often leads to abnormal situations, such as pickpocketing and terrorist attacks, its analysis has attracted increasing attention from research communities. In this paper, we present AntiLoiter, a loitering discovery system adopting efficient similarity search on surveillance videos. To our knowledge, most existing systems for loitering analysis mainly focus on how to detect or identify loiterers by behavior tracking techniques. However, the analysis results of tracking-based methods are known to be heavily influenced by occlusions, overlaps, and shadows. Moreover, tracking-based methods need to track the human appearance continuously. Therefore, existing methods are not readily applied to real-world surveillance cameras due to the appearance discontinuity of criminal loiterers. To solve this problem, we abandon the tracking method and instead propose AntiLoiter to efficiently discover loiterers based on their frequent appearance patterns in multiple long-duration surveillance videos. In AntiLoiter, we propose a novel data structure, Luigi, that indexes data using only the similarity value returned by a corresponding function (e.g., face matching). Luigi is adopted to perform efficient similarity search to realize loitering discovery. We conducted extensive experiments on both synthetic and real surveillance videos to evaluate the efficiency and efficacy of our approach. The experimental results show that our system can correctly find loitering candidates and outperforms the existing method by a factor of 100 in terms of runtime.

  • Spoken Term Detection Using SVM-Based Classifier Trained with Pre-Indexed Keywords

    Kentaro DOMOTO  Takehito UTSURO  Naoki SAWADA  Hiromitsu NISHIZAKI  

     
    PAPER-Spoken term detection

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2528-2538

    This study presents a two-stage spoken term detection (STD) method that uses the same STD engine twice and a support vector machine (SVM)-based classifier to verify detected terms from the STD engine's output. In a front-end process, the STD engine is used to pre-index target spoken documents from a keyword list built from an automatic speech recognition result. The STD result includes a set of keywords and their detection intervals (positions) in the spoken documents. For keywords having competitive intervals, we rank them based on the STD matching cost and select the one having the longest duration among competitive detections. The selected keywords are registered in the pre-index. They are then used to train an SVM-based classifier. In a query term search process, a query term is searched by the same STD engine, and the output candidates are verified by the SVM-based classifier. Our proposed two-stage STD method with pre-indexing was evaluated using the NTCIR-10 SpokenDoc-2 STD task and it drastically outperformed the traditional STD method based on dynamic time warping and a confusion network-based index.

  • A General Framework and Algorithms for Score Level Indexing and Fusion in Biometric Identification

    Takao MURAKAMI  Kenta TAKAHASHI  Kanta MATSUURA  

     
    PAPER-Information Network

      Vol:
    E97-D No:3
      Page(s):
    510-523

    Biometric identification has recently attracted attention because of its convenience: it does not require a user ID nor a smart card. However, both the identification error rate and response time increase as the number of enrollees increases. In this paper, we combine a score level fusion scheme and a metric space indexing scheme to improve the accuracy and response time in biometric identification, using only scores as information sources. We firstly propose a score level indexing and fusion framework which can be constructed from the following three schemes: (I) a pseudo-score based indexing scheme, (II) a multi-biometric search scheme, and (III) a score level fusion scheme which handles missing scores. A multi-biometric search scheme can be newly obtained by applying a pseudo-score based indexing scheme to multi-biometric identification. We secondly propose the NBS (Naive Bayes search) scheme as a multi-biometric search scheme and discuss its optimality with respect to the retrieval error rate. We evaluated our proposal using the datasets of multiple fingerprints and face scores from multiple matchers. The results showed that our proposal significantly improved the accuracy of the unimodal biometrics while reducing the average number of score computations in both the datasets.

  • An Improved Rete Algorithm Based on Double Hash Filter and Node Indexing for Distributed Rule Engine

    Tianyang DONG  Jianwei SHI  Jing FAN  Ling ZHANG  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2635-2644

    Rule engine technologies have been widely used in the development of enterprise information systems. However, these rule-based systems may suffer from low performance when a large amount of fact data must be matched against the rules. Constructing rule engines on a cluster or grid can flexibly expand system processing capability by increasing the cluster scale, and achieve shorter response times. To speed up pattern matching in the rule engine, this paper proposes a double hash filter approach for the alpha network, combined with beta node indexing, to improve the Rete algorithm. By using fact type nodes in the Rete network, a hash map of ‘fact type - fact type node’ is built in the root node, and hash maps of ‘attribute constraint - alpha node’ are constructed in the fact type nodes. This double hash mechanism can speed up the filtration of facts in the alpha network. Meanwhile, hash tables with indexes calculated from fact objects are built in the memories of beta nodes to avoid unnecessary iteration in the join operations of beta nodes. In addition, a rule engine based on this improved Rete algorithm is applied in enterprise information systems. The experimental results show that this method can effectively speed up pattern matching and significantly decrease the response time of the application systems.
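
    The two-level lookup described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: it assumes facts are plain dictionaries with a `"type"` key, supports only equality constraints, and uses hypothetical names (`RootNode`, `FactTypeNode`, `add_alpha`, `match`).

```python
from collections import defaultdict


class FactTypeNode:
    """Second-level hash map: (attribute, value) constraint -> alpha nodes."""

    def __init__(self):
        self.alpha_by_constraint = defaultdict(list)


class RootNode:
    """First-level hash map: fact type -> fact type node."""

    def __init__(self):
        self.type_nodes = {}

    def add_alpha(self, fact_type, attr, value, alpha_name):
        # Register an alpha node under its fact type and equality constraint.
        node = self.type_nodes.setdefault(fact_type, FactTypeNode())
        node.alpha_by_constraint[(attr, value)].append(alpha_name)

    def match(self, fact):
        # First hash lookup: jump straight to the node for this fact type.
        node = self.type_nodes.get(fact["type"])
        if node is None:
            return []
        matched = []
        # Second hash lookup: one probe per attribute instead of scanning
        # every alpha node registered for the type.
        for attr, value in fact.items():
            if attr == "type":
                continue
            matched.extend(node.alpha_by_constraint.get((attr, value), []))
        return matched
```

    In this sketch a fact is filtered with one probe for its type and one probe per attribute, rather than being tested against every alpha node in turn.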

  • Spoken Document Retrieval Leveraging Unsupervised and Supervised Topic Modeling Techniques

    Kuan-Yu CHEN  Hsin-Min WANG  Berlin CHEN  

     
    PAPER-Speech Processing

      Vol:
    E95-D No:5
      Page(s):
    1195-1205

    This paper describes the application of two attractive categories of topic modeling techniques to the problem of spoken document retrieval (SDR), viz. document topic model (DTM) and word topic model (WTM). Apart from using the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, imagining a scenario that user query logs along with click-through information of relevant documents can be utilized to build an SDR system. This attempt has the potential to associate relevant documents with queries even if they do not share any of the query words, thereby improving on retrieval quality over the baseline system. Likewise, we also study a novel use of pseudo-supervised training to associate relevant documents with queries through a pseudo-feedback procedure. Moreover, in order to lessen SDR performance degradation caused by imperfect speech recognition, we investigate leveraging different levels of index features for topic modeling, including words, syllable-level units, and their combination. We provide a series of experiments conducted on the TDT (TDT-2 and TDT-3) Chinese SDR collections. The empirical results show that the methods deduced from our proposed modeling framework are very effective when compared with a few existing retrieval approaches.

  • An Optimal Algorithm for Searching the Optimal Translation of Query Windows in Quadtree Decomposition

    Hao CHEN  Guangcun LUO  

     
    LETTER-Data Engineering, Web Information Systems

      Vol:
    E94-D No:10
      Page(s):
    2043-2047

    One of the efficient methods to build the index of continuous window queries over moving objects is by means of region quadtree index. In this paper, we present an optimal algorithm to search for the optimal position translation of query windows, where the total number of decomposed quadtree blocks for those windows in quadtree representation is minimal. We exploit the branch-and-bound concept to prune the particular paths of recursions in the search space. Evaluation proves that our optimal algorithm reduces search time greatly and the quadtree index based on optimal position translation works efficiently for continuous window queries. To the best of our knowledge, the algorithms and experiments reported in this paper are novel.

  • A Fast Divide-and-Conquer Algorithm for Indexing Human Genome Sequences

    Woong-Kee LOH  Yang-Sae MOON  Wookey LEE  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E94-D No:7
      Page(s):
    1369-1377

    Since the release of human genome sequences, one of the most important research issues has been indexing the genome sequences, and the suffix tree is most widely adopted for that purpose. The traditional suffix tree construction algorithms suffer from severe performance degradation due to the memory bottleneck problem. The recent disk-based algorithms also provide limited performance improvement due to random disk accesses. Moreover, they do not fully utilize recent CPUs with multiple cores. In this paper, we propose a fast algorithm based on a divide-and-conquer strategy for indexing the human genome sequences. Our algorithm nearly eliminates random disk accesses by accessing the disk in units of contiguous chunks. In addition, our algorithm fully utilizes multi-core CPUs by dividing the genome sequences into multiple partitions and then assigning each partition to a different core for parallel processing. Experimental results show that our algorithm outperforms the previous fastest DIGEST algorithm by up to 10.5 times.

  • Margin-Based Pivot Selection for Similarity Search Indexes

    Hisashi KURASAWA  Daiji FUKAGAWA  Atsuhiro TAKASU  Jun ADACHI  

     
    PAPER-Multimedia Databases

      Vol:
    E93-D No:6
      Page(s):
    1422-1432

    When developing an index for a similarity search in metric spaces, how to divide the space for effective search pruning is a fundamental issue. We present Maximal Metric Margin Partitioning (MMMP), a partitioning scheme for similarity search indexes. MMMP divides the data based on its distribution pattern, especially for the boundaries of clusters. A partitioning boundary created by MMMP is likely to be located in a sparse area between clusters. Moreover, the partitioning boundary is at maximum distances from the two cluster edges. We also present an indexing scheme, named the MMMP-Index, which uses MMMP and pivot filtering. The MMMP-Index can prune many objects that are not relevant to a query, and it reduces the query execution cost. Our experimental results show that MMMP effectively indexes clustered data and reduces the search cost. For clustered data in a vector space, the MMMP-Index reduces the computational cost to less than two thirds that of comparable schemes.
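
    The pivot filtering mentioned in this abstract relies on the triangle inequality: for any pivot $p$, $|d(q,p) - d(o,p)| \le d(q,o)$, so an object whose precomputed pivot distance differs from the query's by more than the search radius cannot be a result. The sketch below shows only this generic filtering step under assumed Euclidean distance; it does not implement the MMMP partitioning, and the function names are hypothetical.

```python
import math


def build_pivot_table(objects, pivots):
    """Precompute the distance from every object to every pivot."""
    return [[math.dist(o, p) for p in pivots] for o in objects]


def range_search(query, radius, objects, pivots, table):
    """Return (results, number of object distances actually computed)."""
    q_to_p = [math.dist(query, p) for p in pivots]
    results, computed = [], 0
    for obj, row in zip(objects, table):
        # Triangle inequality lower bound: if it exceeds the radius for
        # any pivot, the object is pruned without computing d(query, obj).
        if any(abs(qp - op) > radius for qp, op in zip(q_to_p, row)):
            continue
        computed += 1
        if math.dist(query, obj) <= radius:
            results.append(obj)
    return results, computed
```

    On clustered data, most objects in far-away clusters fail the lower-bound check, so only a small fraction of full distance computations remain.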

  • Generating Concise Rules for Human Motion Retrieval

    Tomohiko MUKAI  Ken-ichi WAKISAKA  Shigeru KURIYAMA  

     
    PAPER-Computer Graphics

      Vol:
    E93-D No:6
      Page(s):
    1636-1643

    This paper proposes a method for retrieving human motion data with concise retrieval rules based on the spatio-temporal features of motion appearance. Our method first converts a motion clip into a form of clausal language that represents geometrical relations between body parts and their temporal relationship. A retrieval rule is then learned from the set of manually classified examples using inductive logic programming (ILP). ILP automatically discovers the essential rule in the same clausal form with a user-defined hypothesis-testing procedure. All motions are indexed using this clausal language, and the desired clips are retrieved by subsequence matching using the rule. Such rule-based retrieval offers reasonable performance and the rule can be intuitively edited in the same language form. Consequently, our method enables efficient and flexible search of a large dataset with a simple query language.

  • An Improved Speech / Nonspeech Classification Based on Feature Combination for Audio Indexing

    Ji-Soo KEUM  Hyon-Soo LEE  Masafumi HAGIWARA  

     
    LETTER-Speech and Hearing

      Vol:
    E93-A No:4
      Page(s):
    830-832

    In this letter, we propose an improved speech/nonspeech classification method to effectively classify a multimedia source. To improve performance, we introduce a feature based on spectral duration analysis, and combine it with recently proposed features such as high zero crossing rate ratio (HZCRR), low short time energy ratio (LSTER), and pitch ratio (PR). According to the results of our experiments on speech, music, and environmental sounds, the proposed method obtained higher classification accuracy than conventional approaches.

  • Spectral Fluctuation Method: A Texture-Based Method to Extract Text Regions in General Scene Images

    Yoichiro BABA  Akira HIROSE  

     
    PAPER-Pattern Recognition

      Vol:
    E92-D No:9
      Page(s):
    1702-1715

    To obtain text information included in a scene image, we first need to extract text regions from the image before recognizing the text. In this paper, we examine human vision and propose a novel method to extract text regions by evaluating textural variation. Human beings are often attracted by textural variation in scenes, which causes foveation. We frame a hypothesis that texts also have similar property that distinguishes them from the natural background. In our method, we calculate spatial variation of texture to obtain the distribution of the degree of likelihood of text region. Here we evaluate the changes in local spatial spectrum as the textural variation. We investigate two options to evaluate the spectrum, that is, those based on one- and two-dimensional Fourier transforms. In particular, in this paper, we put emphasis on the one-dimensional transform, which functions like the Gabor filter. The proposal can be applied to a wide range of characters mainly because it employs neither templates nor heuristics concerning character size, aspect ratio, specific direction, alignment, and so on. We demonstrate that the method effectively extracts text regions contained in various general scene images. We present quantitative evaluation of the method by using databases open to the public.

  • A New Signature-Based Indexing Scheme for Efficient Trajectory Retrieval in Spatial Networks

    Jae-Woo CHANG  Jung-Ho UM  

     
    PAPER-Database

      Vol:
    E92-D No:6
      Page(s):
    1240-1249

    Even though it is very important to retrieve trajectories similar to a given query trajectory, there has been little research on trajectory retrieval in spatial networks, such as road networks. In this paper, we propose an efficient indexing scheme for retrieving moving object trajectories in spatial networks. For this, we design a signature-based indexing scheme for efficiently dealing with the trajectories of current moving objects as well as for maintaining those of past moving objects. In addition, we provide an insertion algorithm for storing the segment information of a moving object trajectory as well as a retrieval algorithm to find a set of moving objects whose trajectories match the segments of a query trajectory. Finally, we show that our signature-based indexing scheme achieves at least twice the trajectory retrieval performance of the leading trajectory indexing schemes, such as TB-tree, FNR-tree, and MON-tree.

  • Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category

    Izumi SUZUKI  Yoshiki MIKAMI  Ario OHSATO  

     
    PAPER-Knowledge Acquisition

      Vol:
    E91-D No:11
      Page(s):
    2545-2551

    A technique that acquires documents in the same category with a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Setting a monotone increasing property to the similarity as it learns enables the system to 1) detect the correct timing so that no more documents remain to be marked and to 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to what term weights are divided by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity and using English and German documents randomly selected from the Web.

  • Efficient Query-by-Content Audio Retrieval by Locality Sensitive Hashing and Partial Sequence Comparison

    Yi YU  Kazuki JOE  J. Stephen DOWNIE  

     
    PAPER-Contents Technology and Web Information Systems

      Vol:
    E91-D No:6
      Page(s):
    1730-1739

    This paper investigates suitable indexing techniques to enable efficient content-based audio retrieval in large acoustic databases. To make an index-based retrieval mechanism applicable to audio content, we investigate the design of Locality Sensitive Hashing (LSH) and the partial sequence comparison. We propose a fast and efficient audio retrieval framework of query-by-content and develop an audio retrieval system. Based on this framework, four different audio retrieval schemes, LSH-Dynamic Programming (DP), LSH-Sparse DP (SDP), Exact Euclidean LSH (E2LSH)-DP, and E2LSH-SDP, are introduced and evaluated in order to better understand the performance of audio retrieval algorithms. The experimental results indicate that compared with the traditional DP and the other three competitive schemes, E2LSH-SDP exhibits the best tradeoff in terms of response time, retrieval accuracy, and computation cost.
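
    To make the LSH indexing step concrete, the sketch below builds a small Euclidean LSH index using random projections of the form $h(v) = \lfloor (a \cdot v + b) / w \rfloor$ with Gaussian $a$. It is a generic illustration, not the paper's system: the class name `LSHIndex`, the parameter defaults, and the use of plain feature vectors are assumptions, and the DP/SDP sequence comparison that would verify the returned candidates is omitted.

```python
import random
from collections import defaultdict


def make_hash(dim, k, w, rng):
    """Build one hash function made of k quantized random projections."""
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
              for _ in range(k)]

    def h(v):
        # Each component is floor((a . v + b) / w) for one projection.
        return tuple(int((sum(x * y for x, y in zip(a, v)) + b) // w)
                     for a, b in params)

    return h


class LSHIndex:
    """Several hash tables; nearby vectors tend to share a bucket in some."""

    def __init__(self, dim, num_tables=4, k=3, w=4.0, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
        self.hashes = [make_hash(dim, k, w, rng) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def add(self, item_id, vec):
        for h, table in zip(self.hashes, self.tables):
            table[h(vec)].append(item_id)

    def candidates(self, vec):
        """Return ids sharing a bucket with vec in at least one table."""
        found = set()
        for h, table in zip(self.hashes, self.tables):
            found.update(table.get(h(vec), []))
        return found
```

    Only the returned candidates would then need an exact (and more expensive) comparison, which is the role the sequence comparison stage plays in the framework described above.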
