
IEICE TRANSACTIONS on Information

  • Impact Factor

    0.72

  • Eigenfactor

    0.002

  • Article Influence

    0.1

  • CiteScore

    1.4

Volume E86-D No.9 (Publication Date: 2003/09/01)

    Special Issue on Parallel and Distributed Computing, Applications and Technologies
  • FOREWORD

    Susumu HORIGUCHI  

     
    FOREWORD

      Page(s):
    1477-1478
  • HTN: A New Hierarchical Interconnection Network for Massively Parallel Computers

    M.M. Hafizur RAHMAN  Susumu HORIGUCHI  

     
    PAPER-Networking and Architectures

      Page(s):
    1479-1486

    Interconnection networks usually suffer from Little's Law: low cost implies low performance, and high performance is obtained at high cost. However, hierarchical interconnection networks provide high performance at low cost by exploiting the locality that exists in the communication patterns of massively parallel computers. In this paper, we propose a new hierarchical interconnection network, called the Hierarchical Torus Network (HTN). This network reduces the number of vertical links in a 3D stacked implementation while maintaining good network features. This paper addresses the architectural details of the HTN and evaluates the network diameter, average distance, bisection width, peak number of vertical links, and VLSI layout area of the HTN as well as of several networks commonly used in parallel computers. It is shown that the HTN possesses several attractive features, including small diameter, small average distance, a small number of wires, a particularly small number of vertical links, and an economic layout area.
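
    As a rough illustration of how such topology metrics are obtained, the sketch below computes the diameter and average distance of a small 2-D torus by breadth-first search. It is a minimal sketch of the evaluation idea only, not the HTN construction; the torus_neighbors helper is purely illustrative.

      from collections import deque

      def torus_neighbors(x, y, n):
          # 2-D n x n torus: mesh links with wrap-around (illustrative only).
          return [((x + 1) % n, y), ((x - 1) % n, y),
                  (x, (y + 1) % n), (x, (y - 1) % n)]

      def distances_from(src, n):
          # Breadth-first search gives hop distances from src to all nodes.
          dist = {src: 0}
          queue = deque([src])
          while queue:
              u = queue.popleft()
              for v in torus_neighbors(*u, n):
                  if v not in dist:
                      dist[v] = dist[u] + 1
                      queue.append(v)
          return dist

      def diameter_and_average(n):
          nodes = [(x, y) for x in range(n) for y in range(n)]
          d = [dd for s in nodes for dd in distances_from(s, n).values()]
          return max(d), sum(d) / len(d)

      print(diameter_and_average(4))  # (4, 2.0) for a 4 x 4 torus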

  • Adaptive TCP Receiver Window Control on Channel Conditions for Wireless Systems

    Namgi KIM  Hyunsoo YOON  

     
    PAPER-Networking and Architectures

      Page(s):
    1487-1494

    Wireless networks are quickly becoming an integral part of the Internet, but TCP does not work well over them. Considerable research has tried to improve TCP performance in wireless networks, especially in the face of wireless link losses. However, TCP performance is also deeply influenced by channel conditions, and performance under variable channel conditions has not been widely studied. In this paper, we observe the behavior of the traditional standard TCP under variable channel conditions and then propose a new, simple TCP flow control scheme. The standard TCP performs poorly because it does not reflect current channel conditions; our adaptive TCP receiver window control scheme, in contrast, performs well under variable channel conditions. Our scheme efficiently improves TCP performance with minimal modification of the TCP module on the wireless terminal. It adaptively adjusts the TCP receiver window size according to the dynamic channel conditions, and thus maintains network conditions properly and achieves good TCP performance over all wireless channel conditions. In addition, the scheme is feasible because it is simple and only the TCP receiver module on the wireless terminal needs to be changed. Through simulation and analysis, we found that our scheme achieves good TCP throughput and short end-to-end delay over all variable channel conditions.
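
    A minimal sketch of the receiver-window idea, assuming a hypothetical channel-quality estimate in [0, 1] supplied by the link layer; the linear scaling and the one-segment floor are illustrative assumptions, not the authors' exact control law.

      def advertised_window(base_rwnd, channel_quality):
          # Scale the advertised TCP receiver window with channel quality.
          # channel_quality in [0, 1] would come from link-layer measurements
          # (e.g. SNR or frame-error rate); both the linear scaling and the
          # one-segment floor are illustrative assumptions.
          MSS = 1460  # bytes, a typical maximum segment size
          scaled = int(base_rwnd * channel_quality)
          return max(MSS, scaled)  # never advertise less than one segment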

  • Adaptive Backtracking Handover Scheme Using a Best-Fit COS Search Method for Improving Handover Efficiency in Wireless ATM Networks

    Hosang YUN  Kwangwook SHIN  Hyunsoo YOON  

     
    PAPER-Networking and Architectures

      Page(s):
    1495-1503

    The crucial handover elements in wireless ATM networks are handover delay and handover efficiency. Since research on handover in wireless ATM has until now focused mainly on minimizing handover delay, the resulting schemes use network resources inefficiently. In broadband wireless ATM networks, handover efficiency is critical to network capacity. In this paper, we propose a new handover scheme based on partial path rerouting, called the delay-limited best-fit backtracking scheme, which searches for the crossover switch that limits handover delay while maximizing handover efficiency. It uses a new crossover switch searching method, an adaptive backtracking search conducted in a best-fit manner, to find the optimal crossover switch satisfying the given crossover switch condition. We evaluated the performance of the proposed handover scheme and show that it improves handover efficiency more than other handover schemes.

  • Organizing the LDD in Mobile Environments

    JeHyok RYU  MoonBae SONG  Chong-Sun HWANG  

     
    PAPER-Networking and Architectures

      Page(s):
    1504-1512

    In wireless mobile environments, data requests based on the location of mobile clients (MCs) have increased. The requested data, called location-dependent data (LDD), may be uncertain if MCs use general distance terms like "near". Fuzzy theory allows us to represent such uncertainty without sharp boundaries. In this paper we quantify this fuzziness and propose a method for constructing the data region of LDD from the degree of the distance between the LDD's and the MCs' locations. In simulation studies, we evaluate the LDD regions (LDRs) in various situations: extensive and intensive MC query patterns in data regions for two senses of "near", and civilized regions with regional features. Our performance evaluation shows that the number of database accesses in the proposed LDRs can be reduced in each case.

  • Performance Evaluation for a New Quasi-Synchronous CDMA System Employing Generalized Orthogonal Sequences

    Li HAO  Pingzhi FAN  

     
    PAPER-Networking and Architectures

      Page(s):
    1513-1524

    In this paper, a quasi-synchronous code-division multiple-access (QS-CDMA) system is investigated for application in the reverse link of a microcellular or indoor mobile communication environment. In a QS-CDMA system, the relative time delay between the signals of different users is normally restricted to within a few chips. Generalized orthogonal (GO) codes extended with guard chips are employed as the spreading sequences, and the strict timing error restrictions for BPSK and M-QAM modulation schemes are derived from the correlation properties of the GO code set, which determine the multiple access interference (MAI). The results reveal that a system employing a GO code set with a larger GO zone can tolerate larger timing errors, and that higher-order modulation schemes require stricter synchronization. Based on the system model presented, the BER performance for BPSK and M-QAM is evaluated by Gaussian Approximation (GA) and Monte Carlo simulation. The system capacity in terms of achievable total bit rate is also evaluated, revealing that if the system synchronization error is limited to within the GO zone, the M-QAM scheme can be utilized to improve the system capacity.

  • Crosstalk-Free Permutation in Photonic Rearrangeable Networks Built on a Combination of Horizontal Expansion and Vertical Stacking of Banyan Networks

    Xiaohong JIANG  Hong SHEN  Md. Mamun-ur-Rashid KHANDKER  Susumu HORIGUCHI  

     
    PAPER-Networking and Architectures

      Page(s):
    1525-1533

    Crosstalk in optical switches is an intrinsic drawback of optical networks, and avoiding it is important for making optical networks work properly. Horizontal expansion and vertical stacking are two basic techniques for creating nonblocking multistage interconnection networks (MINs). Rearrangeable (nonblocking) optical MINs are attractive because they have lower complexity than their strictly nonblocking counterparts. In this paper, we study crosstalk-free permutations in rearrangeable optical MINs built on a combination of horizontal expansion and vertical stacking of banyan networks, and provide a scheme for realizing crosstalk-free permutations in this kind of optical MIN. The basic idea of this scheme is to first decompose a permutation into multiple partial permutations using the Euler split technique, and then route and realize each of these partial permutations crosstalk-free in one plane (stacked copy) of a MIN, based on both the Euler split technique and the self-routing property of a banyan network. The trade-off between the overall time complexity and the hardware cost of this class of MINs is also explored.

  • Cost Analysis of Optimistic Recovery Model for Forked Checkpointing

    Jiman HONG  Sangsu KIM  Yookun CHO  

     
    PAPER-Networking and Architectures

      Page(s):
    1534-1541

    The forked checkpointing scheme was proposed to achieve low checkpoint overhead. When a process wants to take a checkpoint under this scheme, it creates a child process and continues its normal computation. Two recovery models can be used for forked checkpointing when the parent process fails before the child process establishes the checkpoint: the pessimistic recovery model, in which the recovery process rolls back to the previous checkpoint state, and the optimistic recovery model, in which a recovery process waits for the checkpoint to be established by the child process. In this paper, we analyze these recovery models by deriving the expected execution time of a process with and without checkpointing, and show that the expected recovery time of the optimistic recovery model is smaller than that of the pessimistic recovery model.
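
    A minimal sketch of forked checkpointing on a POSIX system, assuming pickle-serializable state and an illustrative file name: the parent resumes computation immediately while the child persists a copy-on-write snapshot.

      import os
      import pickle

      def forked_checkpoint(state, path="checkpoint.pkl"):
          pid = os.fork()
          if pid == 0:
              # Child: write the snapshot of the state, then exit.
              with open(path, "wb") as f:
                  pickle.dump(state, f)
              os._exit(0)
          # Parent: continue normal computation at once. Under the optimistic
          # recovery model, a recovering process would wait for this child to
          # finish instead of rolling back to the previous checkpoint.
          # (A real system would also reap the child with os.waitpid.)
          return pid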

  • Design of a Low-Power Configurable-Way Cache Applied in Multiprocessor Systems

    Hsin-Chuan CHEN  Jen-Shiun CHIANG  

     
    PAPER-Networking and Architectures

      Page(s):
    1542-1548

    In the design of a set-associative cache, maintaining a low average access time and reducing the average energy dissipation are important issues. In this paper, we propose a set-associative cache whose associativity can be configured according to program behavior: the proposed cache can be configured anywhere from an n-way set-associative cache down to a direct-mapped cache. In addition, the proposed cache can disable all tag subarrays and enable only the desired data subarray when adjacent memory references fall within the same block as the previous access. Power is thus saved in two ways: when the cache is configured with associativity lower than n, only a few subarrays of the tag and data memories are enabled; and for intra-block accesses, tag checking is eliminated because all tag subarrays are disabled. Performance nevertheless remains the same as that of a conventional set-associative or direct-mapped cache.

  • Technology Scalable Matrix Architecture for Data Parallel Applications

    Mostafa SOLIMAN  Stanislav SEDUKHIN  

     
    PAPER-Networking and Architectures

      Page(s):
    1549-1559

    Within a few years it will be possible to integrate a billion transistors on a single chip operating at frequencies above 10 GHz. At this integration level, we propose using a multi-level ISA to express fine-grain data parallelism to hardware, instead of spending a huge transistor budget on extracting it dynamically. Since the fundamental data structures for a wide variety of data parallel applications are scalars, vectors, and matrices, our proposed Trident processor extends a scalar ISA with vector and matrix instruction sets to effectively process matrix-formulated applications. Like vector architectures, the Trident processor consists of a set of parallel lanes (each containing a set of vector pipelines and a slice of the register file) combined with a fast scalar core. However, the Trident processor can effectively process not only vector but also matrix data on its parallel lanes. One key point of our architecture is local communication within and across lanes, to overcome the limitations of future VLSI technology. Another is the effective execution of a mixture of scalar, vector, and matrix operations. This paper describes the architecture of the Trident processor and evaluates its performance on BLAS and on the standard matrix bidiagonalization algorithm, the latter as an example of an entire application based on a mixture of scalar, vector, and matrix operations. Our results show that many data parallel applications, such as scientific, engineering, and multimedia workloads, can be sped up on the Trident processor. Moreover, scaling the Trident processor does not require more fetch, decode, or issue bandwidth, but only the replication of parallel lanes.

  • Resource-Optimal Software Pipelining Using Flow Graphs

    Dirk FIMMEL  Jan MULLER  Renate MERKER  

     
    INVITED PAPER-Software Systems and Technologies

      Page(s):
    1560-1568

    We present a new approach to the loop scheduling problem that surpasses previous solutions in two important aspects: the resource constraints are formulated using flow graphs, and the initiation interval λ is treated as a rational variable. The approach supports heterogeneous processor architectures and pipelined functional units, and the integer linear programming implementation produces an optimal loop schedule in which a minimum λ is achieved. Our flow graph model facilitates the cyclic binding of loop operations to functional units. Compared to previous research results, the solution can provide faster loop schedules and a significant reduction in problem complexity and solution time.

  • Parallel Molecular Dynamics in a Parallelizing SML Compiler

    Norman SCAIFE  Ryoko HAYASHI  Susumu HORIGUCHI  

     
    PAPER-Software Systems and Technologies

      Page(s):
    1569-1576

    We have constructed a parallelizing compiler for Standard ML (SML) based upon algorithmic skeletons. We present an implementation of a Parallel Molecular Dynamics (PMD) simulation in order to compare our functional approach with a traditional imperative approach. Although we present performance data, the principal benefits of our approach are the modularity of the code and the ease of programming. Extant FORTRAN90 code for an O(N^2) algorithm is translated, first into imperative SML and then into purely functional SML, which is then parallelized. The ease of programming and the performance of the FORTRAN90 and SML code are compared. Modest parallel performance is obtained from the parallel SML, but with a much slower sequential execution time than the FORTRAN90. We then improve the implementation using a ring topology, which brings performance much closer to that of the FORTRAN90 implementation.

  • Efficient and Scalable Client Clustering for Web Proxy Cache

    Kyungbaek KIM  Daeyeon PARK  

     
    PAPER-Software Systems and Technologies

      Page(s):
    1577-1585

    Many cooperative web cache systems and protocols have been proposed. These systems, however, require expensive resources, such as external bandwidth, CPU power, or proxy storage, while incurring hefty administrative costs, to achieve adequate client population growth. Moreover, a scalability problem remains in cache server management. This paper suggests peer-to-peer client clustering. The client-cluster provides a proxy cache with backup storage composed of the residual resources of the clients. We use a DHT-based peer-to-peer lookup protocol to manage the client-cluster; with the natural characteristics of this protocol, the client-cluster is self-organizing, fault-tolerant, well-balanced, and scalable. Additionally, we propose Backward ICP, which is used for communication between the proxy cache and the client-cluster, to reduce the overhead of object replication and to use the resources more efficiently. We examine the performance of the client-cluster via a trace-driven simulation and demonstrate effective enhancement of proxy cache performance.

  • Parallel Algorithms for Higher-Dimensional Euclidean Distance Transforms with Applications

    Yuh-Rau WANG  Shi-Jinn HORNG  Yu-Hua LEE  Pei-Zong LEE  

     
    INVITED PAPER-Algorithms and Applications

      Page(s):
    1586-1593

    Based on the dimensionality reduction technique and the solution to the proximate points problem, we achieve optimality in computing the three-dimensional Euclidean distance transform (3D_EDT). For an N × N × N binary image, our algorithms for both the 3D_EDT and its applications can be performed in O(log log N) time using CRCW processors or in O(log N) time using EREW processors. To the best of our knowledge, all of the results described above are the best known. As for the n-dimensional Euclidean distance transform (nD_EDT) of a binary image of size N^n and its applications, all of them can be computed in O(n log log N) time using CRCW processors or in O(n log N) time using EREW processors.
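
    The dimensionality reduction idea, obtaining a higher-dimensional EDT from repeated 1-D transforms along each axis, can be sketched sequentially as below (a Felzenszwalb-Huttenlocher-style 1-D transform; the paper's parallel CRCW/EREW algorithms are not reproduced here).

      import numpy as np

      def edt_1d(f):
          # Exact 1-D squared distance transform via the lower envelope
          # of parabolas (sequential).
          n = len(f)
          d = np.zeros(n)
          v = np.zeros(n, dtype=int)  # parabola locations in the envelope
          z = np.zeros(n + 1)         # boundaries between parabolas
          k = 0
          z[0], z[1] = -np.inf, np.inf
          for q in range(1, n):
              s = ((f[q] + q*q) - (f[v[k]] + v[k]*v[k])) / (2*q - 2*v[k])
              while s <= z[k]:
                  k -= 1
                  s = ((f[q] + q*q) - (f[v[k]] + v[k]*v[k])) / (2*q - 2*v[k])
              k += 1
              v[k], z[k], z[k + 1] = q, s, np.inf
          k = 0
          for q in range(n):
              while z[k + 1] < q:
                  k += 1
              d[q] = (q - v[k]) ** 2 + f[v[k]]
          return d

      def edt_2d(binary):
          # A 1-D transform along every row, then along every column, yields
          # the exact 2-D squared EDT; n-D follows by iterating per axis.
          INF = 1e18  # large finite value avoids inf - inf
          f = np.where(binary, 0.0, INF)  # 0 at feature voxels
          f = np.apply_along_axis(edt_1d, 1, f)
          f = np.apply_along_axis(edt_1d, 0, f)
          return np.sqrt(f)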

  • Concerning the Length of Time Slots for Efficient Gang Scheduling

    Bing Bing ZHOU  Andrzej M. GOSCINSKI  Richard P. BRENT  

     
    INVITED PAPER-Algorithms and Applications

      Page(s):
    1594-1600

    Applying gang scheduling can alleviate the blockade problem caused by exclusively space-sharing scheduling strategies for parallel processing. However, the original form of gang scheduling is not practical, as several fundamental problems are associated with it. Recently many researchers have developed new strategies to alleviate some of these problems. Unfortunately, one important problem has so far not been seriously addressed: how to set the length of time slots to obtain good gang scheduling performance. In this paper we present a strategy to deal with this important issue for efficient gang scheduling.

  • Secure Distributed Configuration Management with Randomised Scheduling of System-Administration Tasks

    Frode EIKA SANDNES  

     
    PAPER-Algorithms and Applications

      Page(s):
    1601-1610

    Distributed configuration management involves maintaining a set of distributed storage and processing resources in such a way that they serve a community of users fairly, promptly, reliably and securely. Security has recently received much attention due to general anxiety about hacking. Parallel computing systems such as clusters of workstations are no exception to this threat. This paper discusses experiments that measure the effect of employing randomisation in the scheduling of interdependent user and management tasks onto a distributed system such as a cluster of workstations. Two attributes are investigated, namely performance and security. Performance is usually the prime objective in task scheduling. In this work the scheduling problem is viewed as a multi-objective optimisation problem in which there is a subtle balance between efficient schedules and security. A schedule is secure if it is not vulnerable to malicious acts or inadvertent human errors; further, the scheduling model should be hidden from unauthorised observers. The results of the study support the use of randomisation in the scheduling of tasks over an insecure network of processing nodes inhabited by malicious users.

  • A Performance Study of Task Allocation Algorithms in a Distributed Computing System (DCS)

    Biplab KUMER SARKER  Anil KUMAR TRIPATHI  Deo PRAKASH VIDYARTHI  Kuniaki UEHARA  

     
    PAPER-Algorithms and Applications

      Page(s):
    1611-1619

    A Distributed Computing System (DCS) benefits from proper partitioning of tasks into modules and from allocating those modules to various nodes, enabling their parallel execution on different processing nodes of the system. The scheduling of modules on particular processing nodes must be preceded by an appropriate allocation of the modules of the different tasks to the processing nodes; only then can the appropriate execution characteristics be obtained. A number of algorithms have been proposed for task allocation in a DCS, most relying on simplifying assumptions: first, considering only a single task with its corresponding modules; second, ignoring the status of processing nodes in terms of previously allocated modules of various tasks; and third, ignoring the capacity and capability of the processing nodes. This work proposes algorithms for the realistic situation in which multiple tasks and their modules compete dynamically for execution on a DCS, taking its architectural capability into consideration. We propose two algorithms for the task allocation models, based on the well-known A* and genetic algorithm (GA) techniques. The paper explains the algorithms in detail with illustrated examples and presents a comparative performance study of our algorithms and the task allocation algorithms proposed in the literature. The results demonstrate that our GA-based task allocation algorithm achieves better performance than the other algorithms.

  • Experimental Evaluation of Task Scheduling Accuracy: Implications for the Scheduling Model

    Oliver SINNEN  Leonel SOUSA  

     
    PAPER-Algorithms and Applications

      Page(s):
    1620-1627

    Most heuristics for the task scheduling problem employ a simple model of the target system, assuming fully connected processors, a dedicated communication subsystem, and no contention for communication resources. A small number of algorithms are contention-aware, using an undirected graph model of the communication network. Although many scheduling algorithms have been compared in the literature, little is known about the accuracy and appropriateness of the employed models. This article evaluates the accuracy of task scheduling algorithms on generic parallel systems. The experiments performed show a significant inaccuracy in the schedules produced. In an extensive analysis, the reasons for these results are identified and the implications for the scheduling model are discussed.

  • Node-to-Set Disjoint Paths Problem in Pancake Graphs

    Keiichi KANEKO  Yasuto SUZUKI  

     
    PAPER-Algorithms and Applications

      Page(s):
    1628-1633

    In this paper, we give an algorithm for the node-to-set disjoint paths problem in pancake graphs, together with its evaluation results. The algorithm is of polynomial order in n for an n-pancake graph. It is based on recursion and is divided into two cases according to the distribution of destination nodes among the classes into which all the nodes of a pancake graph are categorized. The sum of the lengths of the paths obtained and the time complexity of the algorithm are estimated, and the average performance is evaluated by computer simulation.

  • Minimum Feedback Node Sets in Trivalent Cayley Graphs

    Yasuto SUZUKI  Keiichi KANEKO  

     
    LETTER

      Page(s):
    1634-1636

    A minimum feedback node set in a graph is a minimum node subset whose deletion makes the graph acyclic. Its detection in a dependency graph is important for recovering from a deadlock configuration. A livelock configuration is also avoidable if a checkpoint is set at each node in the minimum feedback node set. Hence, its detection is very important for establishing dependable network systems. In this letter, we give a minimum feedback node set in a trivalent Cayley graph. Assuming that each word has n bits, we can judge in constant time whether any given node is included in the set.

  • Special Issue on Text Processing for Information Access
  • FOREWORD

    Hitoshi ISAHARA  Fumito MASUI  Manabu OKUMURA  

     
    FOREWORD

      Page(s):
    1637-1637
  • Use of Dynamic Passage Selection and Lexico-Semantic Patterns for Japanese Natural Language Question Answering

    Seungwoo LEE  Gary Geunbae LEE  

     
    PAPER

      Page(s):
    1638-1647

    This paper describes a practical Japanese natural language Question Answering system adopting effective selection of dynamic passages, Lexico-Semantic Patterns (LSP), and Predictive Answer Indexing. By analyzing previous TREC QA data, we defined a dynamic passage unit and developed a passage selection method suitable for Question Answering. Using LSPs, we identify the answer type of a question and detect answer candidates without any deep linguistic analysis of the texts. To guarantee a short response time, Predictive Answer Indexing is built into our overall system architecture. As a result of these three engineering techniques, our system showed excellent performance when evaluated by mean reciprocal rank (MRR) in NTCIR-3 QAC-1.
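
    Mean reciprocal rank, the measure used in this evaluation, is straightforward to compute; a minimal sketch, assuming one 1-based rank of the first correct answer (or None) per question.

      def mean_reciprocal_rank(first_correct_ranks):
          # first_correct_ranks: 1-based rank of the first correct answer for
          # each question, or None when no correct answer was returned.
          rr = [0.0 if r is None else 1.0 / r for r in first_correct_ranks]
          return sum(rr) / len(rr)

      print(mean_reciprocal_rank([1, 2, None, 5]))  # (1 + 0.5 + 0 + 0.2) / 4 = 0.425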

  • Comparison of Document Retrieval and Passage Retrieval Using a New Measurement in an Initial Narrowing-Down Search for Question Answering

    Toru TAKAKI  

     
    PAPER

      Page(s):
    1648-1657

    In a question-answering (QA) task, "real" information retrieval, rather than document retrieval, is required. Effective QA thus involves complicated and time-consuming processing, such as natural language processing and named-entity processing. To reduce the amount of processing needed, the quantity of documents drawn from a database can be narrowed down during an initial stage of the answering procedure. This paper proposes a new evaluation measurement and compares the retrieval accuracy of initial-stage searching using "overall" document retrieval and "partial" passage retrieval on the TREC QA data set. The accuracy of the initial search and of the final result is evaluated for various cutoff points, defined according to the number of documents or words output. A variety of experiments demonstrate that medium-length passage retrieval is effective for QA, and that short-length passage retrieval can improve the accuracy of the final result for specific question types.

  • An A* Search in Sentential Matching for Question Answering

    Tatsunori MORI  Tomohiro OHTA  Katsuyuki FUJIHATA  Ryutaro KUMON  

     
    PAPER

      Page(s):
    1658-1668

    In this paper, we propose a method that introduces A* search control into the sentential matching mechanism of question-answering systems, in order to reduce the response time while preserving the accuracy of the answers. Question answering is a new technology that retrieves not relevant documents but the answer(s) directly, by combining several methodologies including IR and IE. One of the essential processes is the sentential matching between the user's query and each sentence in the documents. In general, obtaining a precise matching score at higher resolution requires processes with higher computational costs. We therefore introduce an A* search in which both the processing cost and the resolution of the matching score are taken into account simultaneously. According to experiments on the NTCIR3 QAC1 Task 1, the system with the controlled search is 3.4-8.5 times faster than the system without control.

  • Question Answering as Abduction: A Feasibility Study at NTCIR QAC1

    Yutaka SASAKI  

     
    PAPER

      Page(s):
    1669-1676

    This paper presents a Japanese Question Answering (QA) system based on a "Question Answering as Abduction" perspective, which regards QA as the process of abductively explaining why a question is true based on the logical contents of appropriately described textual information. This perspective is strongly inspired by Jerry Hobbs et al.'s "Interpretation as Abduction" and is also a simple conceptualization of Harabagiu et al.'s logic-based QA system. We reify this concept in our QA system, SAIQA-Is, which is designed to output only the most likely answer candidates to a question. The system participated in NTCIR QAC1 and provided very good results in Task 2 and Task 3 of the QAC experiments. These results demonstrate the strong feasibility and high potential of our Question Answering as Abduction approach.

  • Effects of Structural Matching and Paraphrasing in Question Answering

    Tetsuro TAKAHASHI  Kozo NAWATA  Kentaro INUI  Yuji MATSUMOTO  

     
    PAPER

      Page(s):
    1677-1685

    In this paper, we propose an answer seeking algorithm for question answering that integrates structural matching and paraphrasing, and report the results of an empirical evaluation conducted to examine the effects of incorporating these two components. According to the results, the contribution of structural matching and paraphrasing was not as large as expected. Based on error analysis, we conclude that structural-matching-based approaches to answer seeking require technologies for (a) coreference resolution, (b) processing of parse forests instead of parse trees, and (c) large-scale acquisition of paraphrase patterns.

  • Sentence Extraction by Spreading Activation through Sentence Similarity

    Naoaki OKAZAKI  Yutaka MATSUO  Naohiro MATSUMURA  Mitsuru ISHIZUKA  

     
    PAPER

      Page(s):
    1686-1694

    Although there has been a great deal of research on automatic summarization, most methods rely on statistical techniques and disregard the relationships between extracted textual segments. We propose a novel method to extract a set of comprehensible sentences that centers on several key points while ensuring sentence connectivity. It builds a similarity network from the documents using a lexical dictionary, and applies spreading activation to rank sentences. We show evaluation results of a multi-document summarization system based on this method, which participated in the TSC (Text Summarization Challenge) task, a summarization competition organized by the third NTCIR project.
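
    A minimal sketch of ranking sentences by spreading activation over a sentence-similarity matrix; the damping constant, iteration count, and row normalization are illustrative assumptions, not the paper's exact formulation.

      import numpy as np

      def spread_activation(sim, alpha=0.85, iters=50):
          # sim: symmetric sentence-similarity matrix with nonzero row sums.
          n = sim.shape[0]
          w = sim / sim.sum(axis=1, keepdims=True)  # row-normalize similarities
          a = np.ones(n) / n                        # uniform initial activation
          for _ in range(iters):
              a = (1 - alpha) / n + alpha * (w.T @ a)  # propagate and damp
          return np.argsort(-a)                     # sentence indices, best first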

  • Topic Keyword Identification for Text Summarization Using Lexical Clustering

    Youngjoong KO  Kono KIM  Jungyun SEO  

     
    PAPER

      Page(s):
    1695-1701

    Automatic text summarization has the goal of reducing the size of a document while preserving its content. Generally, producing a summary as an extract is achieved by including only the most topic-related sentences. DOCUSUM is our summarization system, based on a new topic keyword identification method. The process of DOCUSUM is as follows. First, DOCUSUM converts the content words of a document into elements of a context vector space. It then constructs lexical clusters from the context vector space and identifies core clusters. Next, it selects topic keywords from the core clusters. Finally, it generates a summary of the document using the topic keywords. In experiments at various compression ratios (30% compression, 10% compression, and extraction of a fixed number of sentences: 4 or 8), DOCUSUM showed better performance than other methods.

  • SVM-Based Multi-Document Summarization Integrating Sentence Extraction with Bunsetsu Elimination

    Tsutomu HIRAO  Kazuhiro TAKEUCHI  Hideki ISOZAKI  Yutaka SASAKI  Eisaku MAEDA  

     
    PAPER

      Page(s):
    1702-1709

    In this paper, we propose a machine-learning-based method of multi-document summarization integrating sentence extraction with bunsetsu elimination. We employ Support Vector Machines (SVMs) for both of the modules used. To evaluate the effect of bunsetsu elimination, we participated in the multi-document summarization task at TSC-2 with two approaches: (1) sentence extraction only, and (2) sentence extraction + bunsetsu elimination. The results of the subjective evaluation at TSC-2 show that both approaches are superior to the Lead-based method from the viewpoint of information coverage. In addition, we made extracts from the given abstracts to quantitatively examine the effectiveness of bunsetsu elimination. The experimental results showed that our bunsetsu elimination makes summaries more informative. Moreover, we found that extraction based on SVMs trained on short extracts is better than the Lead-based method, but that SVMs trained on long extracts are not.

  • A Statistical Method for Acquiring Knowledge about the Abbreviation Possibility of Some of Multiple Adnominal Phrases

    Hiroyuki SAKAI  Shigeru MASUYAMA  

     
    PAPER

      Page(s):
    1710-1718

    This paper proposes a statistical method for acquiring knowledge about whether some of multiple adnominal phrases can be abbreviated. Our method calculates weight values for multiple adnominal phrases by mutual information, based on the strength of the relation between the adnominal phrases and the modified nouns. Among the adnominal phrases, those having relatively low weight values are deleted. The evaluation of our method by experiments shows that precision reaches about 84.1% and recall about 59.2%.
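
    The weighting step can be sketched as pointwise mutual information between an adnominal phrase and its modified noun; the PMI form, the counts, and the deletion threshold below are illustrative assumptions.

      import math

      def pmi(count_pair, count_phrase, count_noun, total):
          # Pointwise mutual information from co-occurrence counts: how much
          # more often the phrase and noun co-occur than chance predicts.
          p_pair = count_pair / total
          p_phrase = count_phrase / total
          p_noun = count_noun / total
          return math.log(p_pair / (p_phrase * p_noun))

      def deletable(weighted_phrases, threshold):
          # Phrases whose weight falls below the threshold are candidates
          # for abbreviation (deletion).
          return [p for p, weight in weighted_phrases if weight < threshold]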

  • Multiple Document Summarization System GOLD

    Tatsumi YOSHIDA  Shigeru MASUYAMA  

     
    PAPER

      Page(s):
    1719-1727

    We developed a multiple document summarization system, GOLD, that generates a single summary from relevant newspaper articles at any summarization rate specified by the user. GOLD incorporates a number of summarization methods; in particular, its sentence reduction methods are useful for shortening each sentence, which increases the number of output sentences that include important information. We participated in task B of NTCIR3 TSC2 to evaluate this system. GOLD exhibits good performance in content-based evaluation, which suggests that the summarization methods employed by GOLD are promising for practical use.

  • Information Extraction and Summarization for Newspaper Articles on Sassho-jiken

    Teiji FURUGORI  Rihua LIN  Takeshi ITO  Dongli HAN  

     
    PAPER

      Page(s):
    1728-1735

    Described here is an automatic text summarization system for Japanese newspaper articles on sassho-jiken (murders and bodily harm). We extract pieces of information from a text, inter-connect them to represent the scenes and participants involved in the sassho-jiken, and finally produce a summary by generating sentences from the extracted information. An experiment and its evaluation show that, although limited to this domain, our method works well in depicting important information from the newspaper articles, and the summaries produced are better in adequacy and readability than those obtained by extracting sentences.

  • Corpus Based Method of Transforming Nominalized Phrases into Clauses for Text Mining Application

    Akira TERADA  Takenobu TOKUNAGA  

     
    PAPER

      Page(s):
    1736-1744

    Nominalization is a linguistic phenomenon in which events usually described by clauses are expressed in the form of noun phrases. Extracting event structures is an important task in text mining applications. To achieve this goal, clauses are parsed and the argument structures of the main verbs are extracted from the parse results; this kind of preprocessing has been common in past research. In order to extract event structures from nominalized phrases as well, we need a technique for transforming nominalized phrases into clauses. In this paper, we propose a corpus-based method for this transformation. The proposed method first enumerates the possible predicate/argument structures for a nominalized phrase (noun phrase) and then ranks them based on the frequency of each argument in the corpus. An algorithm based on this method was evaluated on a corpus of 24,626 aviation safety reports in English and achieved 78% accuracy in transformation. The algorithm was also evaluated by applying it in a text mining application that extracts events and their cause-effect relations from the texts; the transformation improved the application's performance.

  • Hybrid Chinese Term Indexing and the 2-Poisson Model

    Robert W.P. LUK  Kam Fai WONG  

     
    PAPER

      Page(s):
    1745-1752

    Retrieval effectiveness depends on both the retrieval model and how terms are extracted and indexed. In Chinese, Japanese, and Korean text, there are no spaces to delimit words. Indexing using hybrid terms (i.e., words and bigrams) was found to be effective and efficient under the 2-Poisson model in the NTCIR-III open evaluation workshop. Here, we explore another Okapi weight based on the 2-Poisson model, BM25, and compare performance across bigram, word, and hybrid term indexing strategies. Results show that word indexing is the most efficient in terms of indexing time and storage, but hybrid term indexing requires the least retrieval time per query. Without pseudo-relevance feedback (PRF), our BM25 appeared to yield better retrieval effectiveness for short queries. With PRF, our implementation of the BM11 weights, a simplified version of BM25, with hybrid term indexing remains the most effective combination for retrieval in this study.
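
    For reference, the per-term Okapi BM25 weight explored here can be sketched as follows; k1 and b are the usual defaults, not necessarily the values used in the paper.

      import math

      def bm25_term_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
          # Contribution of one term to one document's BM25 score.
          idf = math.log((n_docs - df + 0.5) / (df + 0.5))
          norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
          return idf * norm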

  • Effectiveness of Passage-Based Document Retrieval for Short Queries

    Koichi KISE  Markus JUNKER  Andreas DENGEL  Keinosuke MATSUMOTO  

     
    PAPER

      Page(s):
    1753-1761

    Document retrieval is a fundamental but important task for intelligent access to the huge amount of information stored in documents. Although its research history is long, it remains a hard task, especially when lengthy documents are retrieved with very short queries (a few keywords). For the retrieval of long documents, methods called passage-based document retrieval have proven effective. In this paper, we experimentally show that a passage-based method using window passages is also effective for dealing with short queries, provided the documents are not too short. We employ a method called "density distributions" as a window-passage-based method, and compare it with three conventional methods: the simple vector space model, pseudo-relevance feedback, and latent semantic indexing. We also compare it with a passage-based method using discourse passages.

  • Optimal Local Dimension Analysis of Latent Semantic Indexing on Query Neighbor Space

    Yinghui XU  Kyoji UMEMURA  

     
    PAPER

      Page(s):
    1762-1772

    In this paper, we investigate Latent Semantic Indexing (LSI) on local query regions as a way around the computational restrictions of LSI on the global information space. Experiments with different SVD dimensionalities on the local query regions show that low-dimensional LSI can achieve much better precision than the VSM and precision similar to global LSI. Such small SVD factors indicate that there is an almost linear surface in the local query regions: the largest one or two singular vectors are able to capture this linear surface and benefit the particular query. Although local LSI analysis must perform the Singular Value Decomposition (SVD) computation for each query, the surprisingly small SVD dimension required resolves the computational restrictions of LSI for large-scale IR tasks. Moreover, when several relevant sample documents are available, applying low-dimensional LSI to these documents obtains precision comparable to local relevance feedback, in a different manner.
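
    A minimal sketch of low-dimensional LSI on a local query region, assuming a term-by-document matrix built from the top-ranked documents; keeping k = 1 or 2 singular vectors reflects the observation above, while the cosine re-scoring is an illustrative choice.

      import numpy as np

      def local_lsi_scores(A, q, k=2):
          # A: term-by-document matrix of the local (top-ranked) documents;
          # q: query term vector; k: number of singular vectors kept.
          U, s, Vt = np.linalg.svd(A, full_matrices=False)
          Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :]
          docs = (np.diag(sk) @ Vk).T  # documents in the k-dim latent space
          query = Uk.T @ q             # project the query into the same space
          return docs @ query / (np.linalg.norm(docs, axis=1)
                                 * np.linalg.norm(query) + 1e-12)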

  • Results Merging with the OASIS System: An Experimental Comparison of Two Techniques

    Vitaliy KLUEV  

     
    PAPER

      Page(s):
    1773-1780

    The mechanisms used for results merging are very important for distributed search systems: they must select the most relevant documents retrieved by different servers and place them at the top of the list returned to the end user. There are several approaches to the key problems of this task, such as eliminating duplicates and ranking the combined results, but it is still not clear how best to achieve this. We use a clustering technique to divide the retrieved results into several groups, and a metric based on the vector space model to arrange the items inside each group. Preliminary tests were conducted using the OASIS system and several collections of real Internet data, and showed relatively superior results compared to neural network clustering and LSI calculation. The proposed mechanisms can be applied to metasearch systems as well as distributed search systems, because they do not require any special information beyond the de facto standard data received from servers.

  • Determining Indexing Strings with Statistical Analysis

    Yoshiyuki TAKEDA  Kyoji UMEMURA  Eiko YAMAMOTO  

     
    PAPER

      Page(s):
    1781-1787

    Determining indexing strings is an important factor in information retrieval. Ideally, the strings should be words that represent documents or queries. Although any single word may be the first candidate for an indexing string in an English corpus, single words may not be ideal due to the existence of compound nouns, which are often good indexing strings and which often depend on the genre of the corpus used. The situation is even worse in Japanese or Chinese, where words are not separated by spaces. In this paper, we propose a method of determining indexing strings based on statistical analysis. The novel features of our method are that it makes the most of the statistical measure called "adaptation" and that it uses no language-dependent resources such as dictionaries and stop word lists. Evaluating our method on a Japanese test collection, we found that it actually improves the precision of information retrieval systems.

  • A Collaborative Personal Repository System and Its Information Retrieval Scheme

    Takashi YUKAWA  Sen YOSHIDA  Kazuhiro KUWABARA  

     
    PAPER

      Page(s):
    1788-1795

    A framework is described for a peer-to-peer information exchange system, and a collaborative information retrieval (IR) scheme for the system is proposed. The aims of the system include smooth knowledge and information management to activate organizations or communities. Conventional server-centric systems are weak because they create information-providing bottlenecks. Accordingly, the proposed framework targets the collaborative inter-working of personal repositories that accumulate per-user information and that accept and service requests. Two issues concerning the framework are addressed. The first is the retrieval of information from another user's personal repository: the retrieval criteria of a system are tightly personalized for its user. The system is assumed to employ a vector space model with a concept base as its IR mechanism, and the vector space on one system is very different from that on another. The second issue is the automated control of the information-providing criteria. This paper presents a solution to the first problem. To achieve IR that provides satisfactory results to a user requesting information from another user's personal repository, we need vector space equalization to compensate for the differences between the vector spaces of the personal repositories. The paper presents such a scheme, automated relevance feedback, implements it as a system, and evaluates its performance using documents from the Internet.

  • Factor Controlled Hierarchical SOM Visualization for Large Set of Data

    Junan CHAKMA  Kyoji UMEMURA  

     
    PAPER

      Page(s):
    1796-1803

    The self-organizing map (SOM) is a widely used tool in high-dimensional data visualization. However, despite the benefit of plotting very high-dimensional data on a low-dimensional grid, browsing and understanding the meaning of a trained map turns out to be a difficult task, especially when the number of nodes or the size of the data increases. Although there are some well-known techniques for visualizing SOMs, they mainly deal with cluster boundaries and fail to consider the raw information available in the original data when browsing SOMs. In this paper, we propose a Factor controlled Hierarchical SOM that enables us to select the number of data items used to train and label a particular map, based on a predefined factor, and provides consistent hierarchical SOM browsing.

  • Evaluation Methods for Web Retrieval Tasks Considering Hyperlink Structure

    Koji EGUCHI  Keizo OYAMA  Emi ISHIDA  Noriko KANDO  Kazuko KURIYAMA  

     
    PAPER

      Page(s):
    1804-1813

    This paper proposes evaluation methods for measuring the retrieval effectiveness of Web search engine systems, attempting to make them suitable for the real Web environment. With this objective, we constructed 100-gigabyte and 10-gigabyte document sets gathered mainly from the '.jp' domain, and conducted an evaluation workshop at the third NTCIR Workshop from 2001 to 2002, where we assessed the retrieval effectiveness of a number of Web search engine systems using the common data set. Conventional evaluation workshops assessed the relevance of the retrieved documents submitted by the workshop participants by considering the contents of individual pages. In contrast, we assessed the relevance of the retrieved pages by considering the relationships between pages connected by hyperlinks.

  • Regular Section
  • On 1-Inkdot Alternating Pushdown Automata with Sublogarithmic Space

    Jianliang XU  Yong CHEN  Tsunehiro YOSHINAGA  Katsushi INOUE  

     
    PAPER-Theory of Automata, Formal Language Theory

      Page(s):
    1814-1824

    This paper introduces a 1-inkdot two-way alternating pushdown automaton, which is a two-way alternating pushdown automaton (2apda) with the additional power of marking at most one tape cell on the input (with an inkdot) once. We first investigate the relationship between the accepting powers of sublogarithmically space-bounded 2apda's with and without one inkdot, and show, for example, that sublogarithmically space-bounded 2apda's with one inkdot are more powerful than those without inkdots. We then investigate an alternation hierarchy for sublogarithmically space-bounded 1-inkdot 2apda's, and show that the alternation hierarchy on the first level for 1-inkdot 2apda's holds; we also show that 1-inkdot two-way nondeterministic pushdown automata using sublogarithmic space are incomparable with 1-inkdot two-way alternating pushdown automata with only universal states using the same space.

  • Efficient Loop Partitioning for Parallel Codes of Irregular Scientific Computations

    Minyi GUO  

     
    PAPER-Software Systems

      Page(s):
    1825-1834

    In most distributed memory computations, node programs are executed on processors according to the owner computes rule. However, the owner computes rule is not best suited to irregular application codes, in which the use of indirection in accessing the left-hand-side array makes it difficult to partition the loop iterations, and the use of indirection in accessing right-hand-side elements means total communication may be reduced by using heuristics other than the owner computes rule. In this paper, we propose a communication-cost-reducing computes rule for irregular loop partitioning, called the least communication computes rule: each loop iteration is partitioned to the processor on which executing that iteration incurs minimal communication cost. Then, after all iterations are partitioned among the processors, we give rules for global-to-local data transformation, indirection array remapping, and communication optimization. The experimental results show that, in most cases, our approach achieves better performance than other loop partitioning rules.
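
    A minimal sketch of the least communication computes rule, assuming an owner map for array elements and a simplified cost model of one remote access per non-local element; the paper's actual cost model may differ.

      def assign_iterations(iterations, owner, n_procs):
          # iterations: list of (lhs_index, rhs_indices) accessed through
          # indirection arrays; owner[i] is the processor owning element i.
          placement = {}
          for it, (lhs, rhs) in enumerate(iterations):
              costs = []
              for p in range(n_procs):
                  # Remote accesses incurred if iteration `it` runs on p.
                  remote = sum(1 for i in [lhs, *rhs] if owner[i] != p)
                  costs.append(remote)
              placement[it] = min(range(n_procs), key=costs.__getitem__)
          return placement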

  • Probabilistic Automaton-Based Fuzzy English-Text Retrieval

    Manabu OHTA  Atsuhiro TAKASU  Jun ADACHI  

     
    PAPER-Software Systems

      Page(s):
    1835-1844

    Incorrect recognition by optical character readers (OCR) is a serious problem when searching OCR-scanned documents in databases such as digital libraries. In order to reduce costs, this paper proposes fuzzy retrieval methods for English text containing recognition errors, without correcting the errors manually. The proposed methods generate multiple search terms for each input query term based on probabilistic automata that reflect both error-occurrence probabilities and character-connection probabilities. Experimental results on test-set retrieval indicate that one of the proposed methods improves the recall rate from 95.96% to 98.15%, at the cost of a decrease in precision from 100.00% to 96.01%, with 20 expanded search terms.
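
    A minimal sketch of the term-expansion idea, assuming a hypothetical per-character confusion table in place of the trained probabilistic automata (which also model character-connection probabilities); the beam cutoff is likewise an illustrative assumption.

      def expand_term(term, confusions, max_variants=20):
          # confusions: dict mapping a character to [(variant, prob), ...],
          # e.g. {'l': [('l', 0.9), ('1', 0.1)]}; returns probable OCR
          # variants of the query term, most likely first.
          candidates = [("", 1.0)]
          for ch in term:
              options = confusions.get(ch, [(ch, 1.0)])
              candidates = [(s + v, p * q)
                            for s, p in candidates for v, q in options]
              candidates.sort(key=lambda x: -x[1])
              candidates = candidates[:max_variants]  # keep the most probable
          return candidates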

  • CLOCK: Clustering for Common Knowledge Extraction in a Set of Transactions

    Sang Hyun OH  Won Suk LEE  

     
    PAPER-Databases

      Page(s):
    1845-1855

    Association mining extracts common relationships among a finite number of categorical data objects in a set of transactions. However, if the data objects are not categorical and are potentially unlimited, the association mining approach cannot be employed. Clustering, on the other hand, is suitable for modeling a large number of non-categorical data objects as long as a distance measure exists among them. Although clustering has been used to classify the data objects of a data set into groups of similar objects based on data similarity, it can also be used to extract the properties of similar data objects that commonly appear in a set of transactions. In this paper, a new clustering method, CLOCK, is proposed to find common knowledge, such as the frequent ranges of similar objects, in a set of transactions. The common knowledge of data objects in the transactions can be represented by the occurrence frequency of similar data objects in terms of transactions, as well as by the common repetitive ratio of similar data objects in each transaction. Furthermore, the proposed method also addresses how to maintain the identified common knowledge as a summarized profile. As a result, any difference between a newly collected transaction and the common knowledge of past transactions can be easily identified.

  • Batch-Incremental Nearest Neighbor Search Algorithm and Its Performance Evaluation

    Yaokai FENG  Akifumi MAKINOUCHI  

     
    PAPER-Databases

      Page(s):
    1856-1867

    In light of the increasing number of computer applications that rely heavily on multimedia data, the database community has focused on the management and retrieval of multidimensional data. Nearest neighbor (NN) queries have been widely used to perform content-based retrieval (e.g., similarity search) in multimedia applications. The incremental NN (INN) query is a kind of NN query that can be used when the number of NN objects to be retrieved is not known in advance. This paper points out the weaknesses of the existing INN search algorithms and proposes a new one, called the Batch-Incremental Nearest Neighbor search algorithm (B-INN search algorithm), which processes the INN query efficiently. The B-INN search algorithm differs from the existing INN search algorithms in that it does not employ the priority queue used by them, which is very CPU- and memory-intensive for large databases in high-dimensional spaces, and in that it incrementally reports b (b > 1) objects simultaneously (batch-incremental), whereas the existing INN search algorithms report the neighbors one by one. In order to implement the B-INN search, a new search (called the k-d-NN search) with a new pruning strategy is proposed. Performance tests indicate that the B-INN search algorithm clearly outperforms the existing INN search algorithms in high-dimensional spaces.

  • An Electronic Voting Protocol Preserving Voter's Privacy

    Hiroshi YAMAGUCHI  Atsushi KITAZAWA  Hiroshi DOI  Kaoru KUROSAWA  Shigeo TSUJII  

     
    PAPER-Applications of Information Security Techniques

      Page(s):
    1868-1878

    In this paper we present a new, two-centered electronic voting scheme that preserves privacy, universal verifiability, and robustness. An interesting property of our scheme is the use of double encryption with additive homomorphic encryption functions. In the two-centered scheme, the first center decrypts the ballots, checks the eligibility of the voters, and multiplies each eligible vote, which is still encrypted under the cryptosystem of the second center. After the deadline is reached, the second center obtains the final tally by decrypting the accumulated votes. As such, neither center can learn the content of any individual vote, since each vote is hidden in the accumulated result; the privacy of the voters is therefore preserved. Our protocols, together with some existing protocols, allow everyone to verify that all valid votes are correctly counted. We apply the r-th residue cryptosystem as the homomorphic encryption function. Although decryption in the r-th residue cryptosystem requires an exhaustive search over all possible values, we show experimentally that desirable performance can be achieved for large-scale elections.
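
    The exhaustive-search step of decryption can be sketched generically: the tally is recovered by searching over all possible totals, which is feasible because the tally is bounded by the number of voters. The toy group parameters below are illustrative, not the r-th residue cryptosystem itself.

      def recover_tally(target, g, p, max_votes):
          # Find t with pow(g, t, p) == target by exhaustive search; feasible
          # because t can be at most the number of voters.
          acc = 1
          for t in range(max_votes + 1):
              if acc == target:
                  return t
              acc = (acc * g) % p
          raise ValueError("tally not found within the voter bound")

      # Toy usage: with g = 3, p = 101, an accumulated tally of 7 yields
      # target = pow(3, 7, 101); recover_tally(target, 3, 101, 20) returns 7.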

  • Construction of an Electroencephalogram-Based Brain-Computer Interface Using an Artificial Neural Network

    Xicheng LIU  Shin HIBINO  Taizo HANAI  Toshiaki IMANISHI  Tatsuaki SHIRATAKI  Tetsuo OGAWA  Hiroyuki HONDA  Takeshi KOBAYASHI  

     
    PAPER-Welfare Engineering

      Page(s):
    1879-1886

    A brain-computer interface using an electroencephalogram as input to an artificial neural network is investigated as a potentially general control system applicable to all subjects and time frames. Using the intent and imagination of bending the left or right elbow, the desired left and right movements are successfully distinguished using event-related desynchronization, resolved by fast Fourier transformation of the electroencephalogram and analysis of the power spectrum with the artificial neural network. The influence of age was identified and eliminated through the use of a frequency distribution in the α band, and the recognition rate was further improved by confirmation based on forced excitement of the β band in the case of an error. The proposed system was effectively trained for general use with the combined data of a cross-section of subjects.

  • Effective Multi-Vehicle Tracking in Nighttime Condition Using Imaging Sensors

    Hanseok KO  Ilkwang LEE  Jihyo LEE  David HAN  

     
    PAPER-Image Processing, Image Pattern Recognition

      Page(s):
    1887-1895

    In this paper, we develop an image-based algorithm for tracking multiple vehicles that performs effective detection and tracking of moving objects under adverse environmental conditions. In particular, we employ low-cost commercial off-the-shelf IR or CCD image sensors to generate continuous images of multiple moving vehicles. Motion in the image sequences is first detected by adaptive background estimation and then tracked by Kalman filtering, with the attribute information updated by data association. After applying a modified Retinex procedure as preprocessing to reduce illumination effects, we proceed with a two-step tracking algorithm. The first step performs blob grouping and then judiciously selects the true targets for tracking, using data association through information registration. In the second step, all detected blobs go through validation for screening as well as occlusion reasoning, and those found to pertain to a real object survive to the 'Object' state for stable tracking. The results of representative tests confirm the algorithm's effectiveness in vehicle tracking under both daylight and nighttime conditions while resolving occlusions.
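
    A minimal sketch of the Kalman filtering step for one tracked blob, assuming a 2-D constant-velocity state model with illustrative noise levels; the paper's state layout and data association logic are not reproduced.

      import numpy as np

      def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
          # One predict/update cycle for a constant-velocity model.
          # x: state [px, py, vx, vy]; P: covariance; z: measured [px, py].
          F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                        [0, 0, 1, 0], [0, 0, 0, 1]], float)
          H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
          Q, R = q * np.eye(4), r * np.eye(2)
          x, P = F @ x, F @ P @ F.T + Q           # predict
          S = H @ P @ H.T + R
          K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
          x = x + K @ (z - H @ x)                 # update with measurement
          P = (np.eye(4) - K @ H) @ P
          return x, P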

  • Encoding of Still Pictures by Wavelet Transform with Vector Quantization Using a Rough Fuzzy Neural Network

    Shao-Han LIU  Jzau-Sheng LIN  

     
    PAPER-Image Processing, Image Pattern Recognition

      Page(s):
    1896-1902

    In this paper, a color image compression scheme using a fuzzy Hopfield-model net based on rough-set reasoning is created to generate an optimal codebook for Vector Quantization (VQ) in the Discrete Wavelet Transform (DWT) domain. The main purpose is to embed a rough-set learning scheme in the fuzzy Hopfield network to construct a compression system named the Rough Fuzzy Hopfield Net (RFHN). First, a color image is decomposed into a 3-D pyramid structure with various frequency bands. The RFHN is then used to create different codebooks for the various bands. The energy function of the RFHN is defined in terms of the upper- and lower-bound fuzzy membership grades between training samples and codevectors. Finally, near-global-minimum codebooks in the frequency domain are obtained when the energy function converges to a stable state. Therefore, only 32/N pixels are selected as training samples if a 3N-dimensional color image is used. The simulation results show that the proposed network not only reduces computation time but also preserves compression performance.

  • Measuring Errors on 3D Meshes Using Pixel Based Search

    Kohji INAGAKI  Masahiro OKUDA  Masaaki IKEHARA  Shin-ichi TAKAHASHI  

     
    PAPER-Computer Graphics

      Page(s):
    1903-1908

    Due to the explosive growth of network technologies, 3D models and animations have attracted great interest in various media. In particular, 3D mesh models (3D meshes), which approximate surfaces by polygonal meshes, are widely used to model 3D objects. In 1D and 2D signals such as speech, audio, images, and video, the signal values are located on "grids"; for example, image signals are defined on pixels. The errors of such signals can therefore be explicitly defined as differences between the values on the grids. In 3D meshes, however, vertices are located at arbitrary positions in 3D space and are triangulated in arbitrary ways, so such grids cannot be defined. This makes it difficult to measure errors on 3D meshes. In this paper, we propose a new numerical method to measure the errors between two different 3D meshes.

  • EEG Cortical Potential Imaging of Brain Electrical Activity by means of Parametric Projection Filters

    Junichi HORI  Bin HE  

     
    PAPER-Biocybernetics, Neurocomputing

      Page(s):
    1909-1920

    The objective of this study was to explore suitable spatial filters for the inverse estimation of cortical potentials from the scalp electroencephalogram. The effect of incorporating noise covariance into inverse procedures was examined by computer simulations. The parametric projection filter, which allows inverse estimation in the presence of information on the noise covariance, was applied to an inhomogeneous three-concentric-sphere model under various noise conditions in order to estimate the cortical potentials from the scalp potentials. The present simulation results suggest that incorporating information on the noise covariance allows better estimation of cortical potentials than inverse solutions without knowledge of the noise covariance, when the correlation between the signal and the noise is low. A method for determining the optimum regularization parameter, applicable to parametric inverse techniques, is also discussed.

  • A Deformable Surface Model Based on Boundary and Region Information for Pulmonary Nodule Segmentation from 3-D Thoracic CT Images

    Yoshiki KAWATA  Noboru NIKI  Hironobu OHMATSU  Noriyuki MORIYAMA  

     
    PAPER-Medical Engineering

      Page(s):
    1921-1930

    Accurately segmenting and quantifying pulmonary nodule structure is a key issue in three-dimensional (3-D) computer-aided diagnosis (CAD) schemes. This paper presents a nodule segmentation method for 3-D thoracic CT images based on a deformable surface model. In this method, a statistical analysis of the observed intensity is first performed to measure differences between the nodule and other regions. Based on this analysis, boundary and region information are represented by boundary and region likelihoods, respectively. Second, an initial surface is manually placed inside the nodule. Finally, the deformable surface model moves the initial surface so that it attains high boundary likelihood and high posterior segmentation probability with respect to the nodule. For this purpose, the deformable surface model integrates the boundary and region information, which makes it possible to cope with an inappropriate position or size of the initial surface. Using practical 3-D thoracic CT images, we demonstrate the effectiveness of the proposed method.

  • A Multipurpose Image Watermarking Method for Copyright Notification and Protection

    Zhe-Ming LU  Hao-Tian WU  Dian-Guo XU  Sheng-He SUN  

     
    LETTER-Applications of Information Security Techniques

      Page(s):
    1931-1933

    This paper presents an image watermarking method serving two purposes: notifying the copyright owner with a visible watermark, and protecting the copyright with an invisible watermark. The two watermarks are embedded in different blocks using different methods. Simulation results show that the visible watermark is hard to remove and the invisible watermark is robust.

  • Energy Spectrum-Based Analysis of Musical Sounds Using Self-Organizing Map

    Masao MASUGI  

     
    LETTER-Speech and Hearing

      Page(s):
    1934-1938

    This paper describes a method of analyzing musical sound using a self-organizing map. To take compound factors into account, energy spectra whose frequency ranges were based on psycho-acoustic experiments were used as input data. Results for music compact discs confirmed that our method can effectively display the positioning of and relationships among musical sounds on a map.