
Keyword Search Result

[Keyword] SPE(2504hit)

2441-2460hit(2504hit)

  • Hybrid Photonic-Microwave Systems and Devices

    Peter R. HERCZFELD  

     
    INVITED PAPER

    Vol: E76-C  No: 2  Page(s): 191-197

    Research in optical-microwave interaction was, in its early stages, spurred by the desire to make an optically fed and controlled phased array antenna with monolithic microwave integrated circuit (MMIC) transmit/receive (T/R) modules. In the first part of this paper, experimental results are presented demonstrating an optically fed phased array antenna operating at C-band in the 5.5 to 5.8 GHz frequency range. The present system consists of two optically fed 1×4 subarrays with MMIC-based active T/R modules. Custom-designed fiber optic links have been employed to distribute data and frequency reference signals to the phased array antenna. One of the challenges of the future is the development of better interfaces between electronic (microwave) and optical components, including the chip-level merging of photonic and electronic components on III-V compounds. This aspect of the research is covered in the second half of the paper.

  • Cascaded Co-Channel Interference Cancelling and Diversity Combining for Spread-Spectrum Multi-Access over Multipath Fading Channels

    Young C. YOON  Ryuji KOHNO  Hideki IMAI  

     
    LETTER

    Vol: E76-B  No: 2  Page(s): 163-168

    We propose a direct-sequence spread-spectrum multi-access (DS/SSMA) receiver that incorporates multipath diversity combining and multistage co-channel interference (CCI) cancellation. This receiver structure, which is more resistant to the near/far problem, removes more and more of the CCI with each successive cancellation stage. Assuming that perfect channel estimates have been obtained, we analyze the bit error rate (BER) performance of this system when received powers are unequal. Results show that the BER can approach that of the single-user case as the number of CCI cancellation stages increases.

  • Speaker Weighted Training of HMM Using Multiple Reference Speakers

    Hiroaki HATTORI  Satoshi NAKAMURA  Kiyohiro SHIKANO  Shigeki SAGAYAMA  

     
    PAPER-Speech Processing

    Vol: E76-D  No: 2  Page(s): 219-226

    This paper proposes a new speaker adaptation method using a speaker weighting technique for multiple reference speaker training of a hidden Markov model (HMM). The proposed method considers the similarities between an input speaker and multiple reference speakers, and uses the similarities to control the influence of the reference speakers upon the HMM. The evaluation experiments were carried out on the /b, d, g, m, n, N/ phoneme recognition task using 8 speakers. Average recognition rates were 68.0%, 66.4%, and 65.6%, respectively, for three test sets which have different speech styles. These were 4.8%, 8.8%, and 10.5% higher than the rates of the spectrum mapping method, and also 1.6%, 6.7%, and 8.2% higher than the rates of the multiple reference speaker training, the supplemented HMM. The evaluation experiments clarified the effectiveness of the proposed method.
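
The idea of letting speaker similarity control each reference speaker's influence can be illustrated with a minimal sketch. It is not the authors' training procedure: the exponential weighting, the `temperature` parameter, and the function names `speaker_weights` and `weighted_mean_vector` are all assumptions made for illustration. The sketch only shows how per-speaker distances could be turned into normalized weights and applied to per-speaker mean vectors.

```python
import math

def speaker_weights(distances, temperature=1.0):
    """Turn speaker distances into normalized weights (assumed exponential form).

    A smaller distance to the input speaker yields a larger weight.
    """
    scores = [math.exp(-d / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

def weighted_mean_vector(weights, speaker_means):
    """Combine per-speaker mean vectors using the given weights."""
    dim = len(speaker_means[0])
    return [
        sum(w * mean[i] for w, mean in zip(weights, speaker_means))
        for i in range(dim)
    ]

# Hypothetical distances from the input speaker to three reference speakers.
weights = speaker_weights([1.0, 2.0, 3.0])
# Hypothetical two-dimensional mean vectors, one per reference speaker.
adapted = weighted_mean_vector(weights, [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
```

In this sketch the closest reference speaker dominates the combined mean; a larger `temperature` flattens the weights toward a plain average over the reference speakers.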

  • Design and Creation of Speech and Text Corpora of Dialogue

    Satoru HAYAMIZU  Shuichi ITAHASHI  Tetsunori KOBAYASHI  Toshiyuki TAKEZAWA  

     
    INVITED PAPER

    Vol: E76-D  No: 1  Page(s): 17-22

    This paper describes issues concerning dialogue corpora for speech and natural language research. Speech and text corpora of dialogue have recently become more important for the development and evaluation of speech and text-based dialogue systems. However, the design and construction of dialogue corpora themselves remain research issues, and many problems have not yet been clarified. Many kinds of corpora are necessary to study the various aspects of dialogue. On the other hand, each corpus should contain a certain quantity for each purpose in order to make it statistically meaningful. This paper presents the issues related to the design and creation of dialogue corpora: the selection of a task domain, transcription conventions, situations for the collection, syntactic and semantic ill-formedness, and politeness. Future directions for dialogue corpora creation are also discussed.

  • MASCOTS II: A Dialog Manager in General Interface for Speech Input and Output

    Yoichi YAMASHITA  Hideaki YOSHIDA  Takashi HIRAMATSU  Yasuo NOMURA  Riichiro MIZOGUCHI  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 74-83

    This paper describes a general interface system for speech input and output and a dialog management system, MASCOTS, which is a component of the interface system. The authors designed this interface system with attention to its generality; that is, it is not dependent on the problem-solving system it is connected to. The previous version of MASCOTS dealt with dialog processing only for speech input, based on the SR-plans. We extend MASCOTS to cover speech output to the user. The revised version of MASCOTS, named MASCOTS II, makes use of topic information given by the topic packet network (TPN), which models the topic transitions in dialogs. Input and output messages are described with a concept representation based on the case structure. For speech input, prediction of the user's utterance is focused and enhanced by using the TPN. The TPN compensates for the shortcomings of the SR-plan and improves the accuracy of prediction for stimulus utterances of the user. For dialog processing in speech output, MASCOTS II extracts emphatic words and restores missing words to the output message if necessary, e.g., in order to report the results of speech recognition. The basic mechanisms of the SR-plan and the TPN are shared between the speech input and output processes in MASCOTS II.

  • A Linguistic Procedure for an Extension Number Guidance System

    Naomi INOUE  Izuru NOGAITO  Masahiko TAKAHASHI  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 106-111

    This paper describes the linguistic procedure of our speech dialogue system. The procedure is composed of two processes: syntactic analysis using a finite state network, and discourse analysis using a plan recognition model. The finite state network is compiled from a regular grammar. The regular grammar is written to accept sentences in various styles, for example with ellipsis and inversion, and is automatically generated from the skeleton of the grammar. The discourse analysis module understands the utterance, generates the next question for users and also predicts words which will be in the next utterance. For an extension number guidance task, we obtained correct recognition results for 93% of input sentences without word prediction and for 98% if prediction results include proper words.

  • A Real-Time Speech Dialogue System Using Spontaneous Speech Understanding

    Yoichi TAKEBAYASHI  Hiroyuki TSUBOI  Hiroshi KANAZAWA  Yoichi SADAMOTO  Hideki HASHIMOTO  Hideaki SHINCHI  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 112-120

    This paper describes a task-oriented speech dialogue system based on spontaneous speech understanding and response generation (TOSBURG). The system has been developed for a fast food ordering task using speaker-independent keyword-based spontaneous speech understanding. Its purpose being to understand the user's intention from spontaneous speech, the system consists of a noise-robust keyword-spotter, a semantic keyword lattice parser, a user-initiated dialogue manager and a multimodal response generator. After noise-immune keyword spotting is performed, the spotted keyword candidates are analyzed by a keyword lattice parser to extract the semantic content of the input speech. Then, referring to the dialogue history and context, the dialogue manager interprets the semantic content of the input speech. In cases where the interpretation is ambiguous or uncertain, the dialogue manager invites the user to verbally confirm the system's understanding of the speech input. The system's response to the user throughout the dialogue is multimodal; that is, several modes of communication (synthesized speech, text, animated facial expressions and ordered food items) are used to convey the system's state to the user. The object here is to emulate the multimodal interaction that occurs between humans, and so achieve more natural and efficient human-computer interaction. The real-time dialogue system has been constructed using two general purpose workstations and four DSP accelerators (520 MFLOPS). Experimental results have shown the effectiveness of the newly developed speech dialogue system.

  • LR Parsing with a Category Reachability Test Applied to Speech Recognition

    Kenji KITA  Tsuyoshi MORIMOTO  Shigeki SAGAYAMA  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 23-28

    In this paper, we propose an extended LR parsing algorithm, called LR parsing with a category reachability test (the LR-CRT algorithm). The LR-CRT algorithm enables a parser to efficiently recognize those sentences that belong to a specified grammatical category. The key point of the algorithm is to use an augmented LR parsing table in which each action entry contains a set of reachable categories. When executing a shift or reduce action, the parser checks whether the action can reach a given category using the augmented table. We apply the LR-CRT algorithm to improve a speech recognition system based on two-level LR parsing. This system uses two kinds of grammars, inter- and intra-phrase grammars, to recognize Japanese sentential speech. Two-level LR parsing guides the search of speech recognition through two-level symbol prediction, phrase category prediction and phone prediction, based on these grammars. The LR-CRT algorithm makes efficient phone prediction possible on the basis of the phrase category prediction. The system was evaluated using sentential speech data uttered phrase by phrase, and attained a word accuracy of 97.5% and a sentence accuracy of 91.2%.

  • Predicting the Next Utterance Linguistic Expressions Using Contextual Information

    Hitoshi IIDA  Takayuhi YAMAOKA  Hidekazu ARITA  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 62-73

    A context-sensitive method to predict linguistic expressions in the next utterance in inquiry dialogues is proposed. First, information about the next utterance (the utterance type, the main action and the discourse entities) can be grasped using a dialogue interpretation model. Secondly, focusing in particular on dialogue situations in context, a domain-dependent knowledge base for literal usage of both noun phrases and verb phrases is developed. Finally, a strategy to make a set of linguistic expressions which are derived from semantic concepts consisting of appropriate expressions can be used to select the correct candidate from the speech recognition output. In this paper, some of the processes are examined in particular, in which sets of polite expressions, vocatives, compound nominal phrases, verbal phrases, and intention expressions, which are common in telephone inquiry dialogue, are created.

  • A Spoken Dialog System with Verification and Clarification Queries

    Mikio YAMAMOTO  Satoshi KOBAYASHI  Yuji MORIYA  Seiichi NAKAGAWA  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 84-94

    We studied the manner of clarification and verification in real dialogs and developed a spoken dialog system that can cope with the disambiguation of meanings of user input utterances. We analyzed content, query types and responses of human clarification queries. In human-human communications, ten percent of all sentences are concerned with meaning clarification. Therefore, in human-machine communications, we believe it is important that the machine verifies ambiguities occurring in dialog processing. We propose an architecture for a dialog system with this capability. Also, we have investigated the source of ambiguities in dialog processing and methods of dialog clarification for each part of the dialog system.

  • Task Adaptation in Syllable Trigram Models for Continuous Speech Recognition

    Sho-ichi MATSUNAGA  Tomokazu YAMADA  Kiyohiro SHIKANO  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 38-43

    In speech recognition systems dealing with unlimited vocabulary and based on stochastic language models, when the target recognition task is changed, recognition performance decreases because the language model is no longer appropriate. This paper describes two approaches for adapting a specific/general syllable trigram model to a new task. One uses a small amount of text data similar to the target task, and the other uses supervised learning using the most recent input phrases and similar text. In this paper, these adaptation methods are called "preliminary learning" and "successive learning", respectively. These adaptations are evaluated using syllable perplexity and phrase recognition rates. The perplexity was reduced from 24.5 to 14.3 for the adaptation using 1000 phrases of similar text by preliminary learning, and was reduced to 12.1 using 1000 phrases including the 100 most recent phrases by successive learning. The recognition rates were also improved from 42.3% to 51.3% and 52.9%, respectively. Text similarity for the approaches is also studied in this paper.
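
Syllable perplexity, the measure used above, can be made concrete with a small sketch. The code below is not the adaptation method of the paper: it assumes add-one smoothing and a simple linear interpolation between a general model and a task model, and the names `TrigramModel`, `interpolated_prob`, `perplexity` and the weight `lam` are hypothetical. It only shows how perplexity is computed and why mixing in task-like text can lower it.

```python
import math
from collections import Counter

class TrigramModel:
    """Toy syllable trigram model with add-one smoothing (an assumption)."""

    def __init__(self, syllables, vocab_size):
        # Counts of each trigram and of each two-syllable history.
        self.tri = Counter(zip(syllables, syllables[1:], syllables[2:]))
        self.hist = Counter(zip(syllables, syllables[1:-1]))
        self.vocab_size = vocab_size

    def prob(self, a, b, c):
        """Smoothed estimate of P(c | a, b)."""
        return (self.tri[(a, b, c)] + 1) / (self.hist[(a, b)] + self.vocab_size)

def interpolated_prob(general, task, lam, a, b, c):
    """Linear mix of task and general estimates; lam is the task weight."""
    return lam * task.prob(a, b, c) + (1 - lam) * general.prob(a, b, c)

def perplexity(prob_fn, syllables):
    """exp of the average negative log-probability over all trigrams."""
    trigrams = list(zip(syllables, syllables[1:], syllables[2:]))
    log_sum = sum(math.log(prob_fn(a, b, c)) for a, b, c in trigrams)
    return math.exp(-log_sum / len(trigrams))

# Hypothetical data: a general corpus, a small task-like corpus, and test text.
general = TrigramModel(["a", "b", "a", "b", "a", "b"], vocab_size=3)
task = TrigramModel(["a", "c", "a", "c", "a", "c"], vocab_size=3)
test_text = ["a", "c", "a", "c"]

pp_general = perplexity(general.prob, test_text)
pp_adapted = perplexity(
    lambda a, b, c: interpolated_prob(general, task, 0.5, a, b, c), test_text
)
```

On this toy data the interpolated model assigns higher probability to the task-like test text, so `pp_adapted` is lower than `pp_general`, which mirrors the direction of the perplexity reductions reported in the abstract.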

  • Prospects for Advanced Spoken Dialogue Processing

    Hitoshi IIDA  

     
    INVITED PAPER

    Vol: E76-D  No: 1  Page(s): 2-8

    This paper discusses the problems facing spoken dialogue processing and the prospects for future improvements. Research on elemental topics like speech recognition, speech synthesis and language understanding has led to improvements in the accuracy and sophistication of each area of study. First, in order to handle a spoken dialogue, we show the necessity for information exchanges between each area of processing as seen through the analysis of spoken dialogue characteristics. Second, we discuss how to integrate those processes and show that the memory-based approach to spontaneous speech interpretation offers a solution to the problem of process integration. The key to this is setting up a mental state affected by both speech and linguistic information. Finally, we discuss how those mental states are structured and a method for constructing them.

  • How Might One Comfortably Converse with a Machine?

    Yasuhisa NIIMI  

     
    INVITED PAPER

    Vol: E76-D  No: 1  Page(s): 9-16

    Progress in speech recognition based on the hidden Markov model has made it possible to realize man-machine dialogue systems capable of operating in real time. In spite of considerable effort, however, few systems have been successfully developed because of the lack of appropriate dialogue models. This paper reports on some of the technology necessary to develop a dialogue system with which one can converse comfortably. The emphasis is placed on the following three points: how a human converses with a machine; how errors of speech recognition can be recovered through conversation; and what it means for a machine to be cooperative. We examine the first problem by investigating dialogues between human speakers, and dialogues between a human speaker and a simulated machine. As a consideration in the design of dialogue control, we discuss the relation between efficiency and cooperativeness of dialogue, the method for confirming what the machine has recognized, and dynamic adaptation of the machine. Thirdly, we review the research on the friendliness of a natural language interface, mainly concerning the exchange of initiative, corrective and suggestive answers, and indirect questions. Lastly, we briefly describe the current state of the art in speech recognition and synthesis, and suggest what should be done for acceptance of spontaneous speech and production of a voice suitable to the output of a dialogue system.

  • A Dialogue Processing System for Speech Response with High Adaptability to Dialogue Topics

    Yasuharu ASANO  Keikichi HIROSE  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 95-105

    A system is constructed for the processing of question-answer dialogue as a subsystem of the speech response device. In order to increase the adaptability to dialogue topics, rules for dialogue processing are classified into three groups: universal rules, topic-dependent rules and task-dependent rules, and example-based description is adopted for the second group. The system is designed to operate only with information on the content words of the user input. As for speech synthesis, a function is included in the system to control the focal position. Introduction and guidance of ski areas are adopted as the dialogue domain, and a prototype system is realized on a computer. The dialogue example performed with the prototype indicates the propriety of our method for dialogue processing.

  • A Unification-Based Japanese Parser for Speech-to-Speech Translation

    Masaaki NAGATA  Tsuyoshi MORIMOTO  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 51-61

    A unification-based Japanese parser has been implemented for an experimental Japanese-to-English spoken language translation system (SL-TRANS). The parser consists of a unification-based spoken-style Japanese grammar and an active chart parser. The grammar handles the syntactic, semantic, and pragmatic constraints in an integrated fashion using an HPSG-based framework in order to cope with speech recognition errors. The parser takes multiple sentential candidates from the HMM-LR speech recognizer, and produces a semantic representation associated with the best scoring parse based on acoustic and linguistic plausibility. The unification-based parser has been tested using 12 dialogues in the conference registration domain, which include 261 sentences uttered by one male speaker. The sentence recognition accuracy of the underlying speech recognizer is 73.6% for the top candidate, and 83.5% for the top three candidates, where the test-set perplexity of the CFG grammar is 65. By ruling out erroneous speech recognition results using various linguistic constraints, the parser improves the sentence recognition accuracy up to 81.6% for the top candidate, and 85.8% for the top three candidates. From the experimental results, we found that the combination of syntactic restriction, selectional restriction and coordinate structure restriction can provide a sufficient restriction to rule out the recognition errors between case-marking particles with the same vowel, which are the type of errors most likely to occur. However, we also found that it is necessary to use pragmatic information, such as topic, presupposition, and discourse structure, to rule out the recognition errors involving topicalizing particles and sentence-final particles.

  • System Design, Data Collection and Evaluation of a Speech Dialogue System

    Katunobu ITOU  Satoru HAYAMIZU  Kazuyo TANAKA  Hozumi TANAKA  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 121-127

    This paper describes design issues of a speech dialogue system, the evaluation of the system, and the data collection of spontaneous speech in a transportation guidance domain. As it is difficult to collect spontaneous speech and to use a real system for the collection and evaluation, the phenomena related to dialogues have not yet been quantitatively clarified. The authors constructed a speech dialogue system which operates in almost real time, with acceptable recognition accuracy and flexible dialogue control. The system was used for spontaneous speech collection in a transportation guidance domain. The system's performance in this domain, measured as the understanding rate, is 84.2% for utterances within the predefined grammar and lexicon. Some statistics of the collected spontaneous speech are also given.

  • Methods to Securely Realize Caller-Authenticated and Callee-Specified Telephone Calls

    Tomoyuki ASANO  Tsutomu MATSUMOTO  Hideki IMAI  

     
    PAPER

    Vol: E76-A  No: 1  Page(s): 88-95

    This paper presents two methods for securely realizing caller-authenticated and callee-specified calls over telecommunication networks with terminals that accept IC cards having KPS-based cryptographic functions. In the proposed protocols, users can verify that the partner is the proper owner of a certain ID or a certain pen name. Users' privacy is protected even if they make caller-authenticated and callee-specified calls and do not pay their telephone charges in advance.

  • Three Different LR Parsing Algorithms for Phoneme-Context-Dependent HMM-Based Continuous Speech Recognition

    Akito NAGAI  Shigeki SAGAYAMA  Kenji KITA  Hideaki KIKUCHI  

     
    PAPER

    Vol: E76-D  No: 1  Page(s): 29-37

    This paper discusses three approaches for combining an efficient LR parser and phoneme-context-dependent HMMs and compares them through continuous speech recognition experiments. In continuous speech recognition, phoneme-context-dependent allophonic models are considered very helpful for enhancing the recognition accuracy. They precisely represent allophonic variations caused by the difference in phoneme-contexts. With grammatical constraints based on a context free grammar (CFG), a generalized LR parser is one of the most efficient parsing algorithms for speech recognition. Therefore, the combination of allophonic models and a generalized LR parser is a powerful scheme enabling accurate and efficient speech recognition. In this paper, three phoneme-context-dependent LR parsing algorithms are proposed, which make it possible to drive allophonic HMMs. The algorithms are outlined as follows: (1) Algorithm for predicting the phonemic context dynamically in the LR parser using a phoneme-context-independent LR table. (2) Algorithm for converting an LR table into a phoneme-context-dependent LR table. (3) Algorithm for converting a CFG into a phoneme-context-dependent CFG. This paper also includes discussion of the results of recognition experiments, and a comparison of performance and efficiency of these three algorithms.

  • Transient Analysis of Packet Transmission Rate Control to Release Congestion in High Speed Networks

    Hiroshi INAI  Manabu KATO  Yuji OIE  Masayuki MURATA  Hideo MIYAHARA  

     
    PAPER

    Vol: E75-B  No: 12  Page(s): 1354-1366

    Rate-based control is a promising way to achieve efficient packet transmission, especially in high speed packet switching networks where round trip delay is much larger than packet transmission time. Although inappropriate tuning of the parameters of the rate control function, namely the increasing and decreasing factors, causes performance degradation, most previous works have not studied the effect of the parameters on the performance. In this paper, we investigate the effect of the rate control parameters on the throughput under the condition that the packet loss probability is kept below a specific value, say $10^{-6}$. For this purpose, we build a queueing model and carry out a transient analysis to examine the dynamic behavior of the queue length at an intermediate node in a high speed network suffering from large propagation delay. Numerical examples exploit the optimal value of the parameters when one or two source-destination pairs transmit packets. We also discuss the effect of the propagation delay on the performance. Our model can be applied to investigate the performance of various kinds of rate-based congestion control when the relation between the congestion measure and the rate control mechanism is given explicitly.
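
The interplay between the increasing and decreasing factors and the feedback delay can be illustrated with a toy discrete-time simulation. It is a sketch under stated assumptions, not the queueing model analyzed in the paper: it assumes multiplicative increase and decrease, a fluid queue with a fixed service capacity, and a congestion signal raised when the queue exceeds a threshold and delivered to the source `delay` steps later. All names and default values (`simulate_rate_control`, `threshold`, `buffer_size`, and so on) are hypothetical.

```python
from collections import deque

def simulate_rate_control(increase, decrease, delay, steps,
                          capacity=1.0, threshold=5.0,
                          buffer_size=20.0, initial_rate=0.5):
    """Simulate one source feeding a finite queue with delayed feedback."""
    rate = initial_rate
    queue = 0.0
    sent = 0.0
    lost = 0.0
    max_queue = 0.0
    # Congestion signals still travelling back to the source.
    in_flight = deque([False] * delay)
    for _ in range(steps):
        sent += rate
        # Fluid queue: arrivals minus service, never below zero.
        queue = max(0.0, queue + rate - capacity)
        if queue > buffer_size:
            # Anything above the buffer size is counted as loss.
            lost += queue - buffer_size
            queue = buffer_size
        max_queue = max(max_queue, queue)
        in_flight.append(queue > threshold)
        congested = in_flight.popleft()
        # Multiplicative decrease on congestion, otherwise increase.
        rate *= decrease if congested else increase
    return {
        "loss_fraction": lost / sent,
        "max_queue": max_queue,
        "final_rate": rate,
    }

# Hypothetical parameter choices: short versus long feedback delay.
short_delay = simulate_rate_control(1.1, 0.5, delay=2, steps=500)
long_delay = simulate_rate_control(1.1, 0.5, delay=20, steps=500)
```

Comparing `loss_fraction` across different factor and delay settings gives a rough feel for the trade-off the abstract describes, although the numbers produced by this toy model carry no quantitative meaning for the paper's results.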

  • Spectral Structure of M/G/1 Systems: Asymptotic Behavior and Relaxation Time

    Julian KEILSON  Fumiaki MACHIHARA  Ushio SUMITA  

     
    INVITED PAPER

    Vol: E75-B  No: 12  Page(s): 1245-1254

    Let $T_{BP}$ be the server busy period of an M/G/1 queueing system characterized by arrival intensity $\lambda$ and service time c.d.f. $A(\tau)$. In this paper, we investigate the regularity structure of the Laplace transform $\sigma_{BP}(s) = E[e^{-sT_{BP}}]$ on the complex $s$-plane. It is shown, under certain broad conditions, that finite singular points of $\sigma_{BP}(s)$ are all branch points. Furthermore, the branch point $s_0$ having the greatest real part is always purely negative and is of multiplicity two. The basic branch point $s_0$ and the associated complex structure provide a basis for an asymptotic representation of various descriptive distributions of interest. For a natural relaxation time $|s_0|^{-1}$ of the M/G/1 system, some useful bounds are obtained and the asymptotic behavior as traffic intensity approaches one is also discussed. Detailed results of engineering value are provided for two important classes of service time distributions, the completely monotone class and the Erlang class.
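
As a worked illustration of the branch point and relaxation time, consider the simplest special case of exponential service (M/M/1), which is not singled out in the abstract. The derivation below uses the standard functional equation for the busy-period transform; the symbols $\alpha(s)$ for the Laplace-Stieltjes transform of the service time, $\mu$ for the service rate and $\rho = \lambda/\mu$ are notation assumed here, with $\lambda < \mu$.

```latex
\begin{align*}
% Functional equation for the busy-period transform
\sigma_{BP}(s) &= \alpha\bigl(s + \lambda - \lambda\,\sigma_{BP}(s)\bigr),
  \qquad \alpha(s) = \int_0^\infty e^{-s\tau}\,dA(\tau) \\
% Exponential service: \alpha(s) = \mu / (\mu + s)
0 &= \lambda\,\sigma_{BP}(s)^2 - (\lambda + \mu + s)\,\sigma_{BP}(s) + \mu \\
\sigma_{BP}(s) &= \frac{(\lambda + \mu + s)
  - \sqrt{(\lambda + \mu + s)^2 - 4\lambda\mu}}{2\lambda} \\
% Branch points: the square root vanishes
(\lambda + \mu + s)^2 = 4\lambda\mu
  \;&\Longrightarrow\; s = -\bigl(\sqrt{\mu} \mp \sqrt{\lambda}\bigr)^2 \\
s_0 &= -\bigl(\sqrt{\mu} - \sqrt{\lambda}\bigr)^2, \qquad
|s_0|^{-1} = \frac{1}{\mu\,\bigl(1 - \sqrt{\rho}\bigr)^2}
\end{align*}
```

In this case the branch point with the greatest real part is real and negative, and the relaxation time $|s_0|^{-1}$ grows without bound as $\rho$ approaches one, in line with the asymptotic behavior mentioned in the abstract.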
