Kazuya TAKEDA Hiroshi FUJIMURA Katsunobu ITOU Nobuo KAWAGUCHI Shigeki MATSUBARA Fumitada ITAKURA
In this paper, we discuss the construction of a large in-car spoken dialogue corpus and the results of its analysis. We have developed a system specially built into a Data Collection Vehicle (DCV) that supports the synchronous recording of multichannel audio data from 16 microphones that can be placed in flexible positions, multichannel video data from 3 cameras, and vehicle-related data. Multimedia data has been collected for three sessions of spoken dialogue with different modes of navigation, during an approximately 60-minute drive by each of 800 subjects. We have characterized the collected dialogues across the three sessions. Some characteristics, such as sentence complexity and SNR, are found to differ significantly among the sessions. Linear regression analysis results also clarify the relative importance of various corpus characteristics.
Yohei IWASAKI Nobuo KAWAGUCHI Yasuyoshi INAGAKI
In this paper, we propose an advanced location-based service that we call a direction-based service, which utilizes both the position and direction of a user. The direction-based service enables a user to point to an object of interest for command or investigation. We also describe the design, implementation, and evaluation of a direction-based service system named Azim. With this system, the direction of the user is obtained by a magnetic-based direction sensor. The sensor is also used for azimuth-based position estimation, in which a user's position is estimated by having the user point to and measure the azimuths of several markers or objects whose positions are already known. Because this approach does not require any other accurate position sensors or positive beacons, it can be deployed cost-effectively. Also, because the measurements are naturally associated with some degree of error, the position is calculated as a probability distribution. The calculation considers the error of direction measurement and pre-obtained field information such as obstacles and magnetic field disturbance, which enables robust position measurements even in geomagnetically disturbed environments. For wide-area use, the system also utilizes a wireless LAN to obtain rough position information by identifying base stations. We have implemented a prototype system for the proposed method and some applications for the direction-based services. Furthermore, we have conducted experiments both indoors and outdoors, and demonstrated that the positioning accuracy of the proposed method is sufficient for a direction-based service.
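The idea of estimating a position as a probability distribution from measured azimuths can be illustrated with a minimal sketch. This is not the Azim implementation: the grid discretization, the Gaussian model of direction error, and the names `bearing_deg`, `angle_diff_deg`, `estimate_position`, and `sigma_deg` are all assumptions made for illustration, and the sketch ignores the obstacle and magnetic-disturbance information mentioned in the abstract.

```python
import math

def bearing_deg(from_xy, to_xy):
    """Azimuth in degrees, clockwise from north (+y), from one point to another."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def angle_diff_deg(a, b):
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def estimate_position(markers, measured_azimuths,
                      grid_size=20, cell=1.0, sigma_deg=5.0):
    """Return (posterior, best_cell) over a square grid of candidate positions.

    Assumption: each azimuth measurement has independent Gaussian error
    with standard deviation sigma_deg, and the prior over cells is uniform.
    """
    log_post = {}
    for ix in range(grid_size):
        for iy in range(grid_size):
            p = ((ix + 0.5) * cell, (iy + 0.5) * cell)  # cell center
            ll = 0.0
            for marker, z in zip(markers, measured_azimuths):
                err = angle_diff_deg(z, bearing_deg(p, marker))
                ll += -0.5 * (err / sigma_deg) ** 2
            log_post[p] = ll
    # Normalize in a numerically stable way.
    peak = max(log_post.values())
    weights = {p: math.exp(v - peak) for p, v in log_post.items()}
    total = sum(weights.values())
    posterior = {p: w / total for p, w in weights.items()}
    best_cell = max(posterior, key=posterior.get)
    return posterior, best_cell

if __name__ == "__main__":
    # Hypothetical marker layout and noise-free measurements.
    markers = [(0.0, 20.0), (20.0, 20.0), (20.0, 0.0)]
    true_pos = (7.5, 12.5)
    azimuths = [bearing_deg(true_pos, m) for m in markers]
    posterior, best_cell = estimate_position(markers, azimuths)
    print(best_cell, round(posterior[best_cell], 3))
```

A larger `sigma_deg` spreads the probability mass over more cells, which is one simple way to express lower confidence in the direction sensor.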
Tomohiro OHNO Shigeki MATSUBARA Nobuo KAWAGUCHI Yasuyoshi INAGAKI
Spontaneously spoken Japanese includes many grammatically ill-formed linguistic phenomena, such as fillers, hesitations, and inversions, which do not appear in written language. This paper proposes a novel method of robust dependency parsing using a large-scale spoken language corpus, and evaluates the effectiveness and robustness of the method using spontaneously spoken dialogue sentences. By utilizing stochastic information about the appearance of ill-formed phenomena, the method can robustly parse spoken Japanese including fillers, inversions, or dependencies over utterance units. Experimental results reveal that the parsing accuracy reached 87.0%, and we confirmed that it is effective to utilize the location information of a bunsetsu and the distance information between bunsetsus as stochastic information.
Hiroya MURAO Nobuo KAWAGUCHI Shigeki MATSUBARA Yasuyoshi INAGAKI
This paper proposes a new method of example-based query generation for spontaneous speech. Along with modeling the information flows of human dialogues, the authors have designed a system that allows users to retrieve information while driving a car. The system refers to the dialogue corpus to find an example that is similar to the input speech, and it generates a query from the example. The experimental results for the prototype system show that 1) for transcribed text input, it provides the correct query in about 64% of cases and a partially correct query in about 88% of cases, and 2) it can create correct queries for utterances that do not include keywords, in contrast to the conventional keyword extraction method.
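The example-based idea (find the most similar stored utterance, then reuse its query) can be sketched as follows. The abstract does not specify the similarity measure or the query format, so the word-overlap (Jaccard) similarity, the dictionary-style queries, the toy example corpus, and the names `jaccard` and `generate_query` are all hypothetical stand-ins rather than the authors' method.

```python
def jaccard(a, b):
    """Word-overlap similarity between two utterances (assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def generate_query(utterance, examples):
    """Return the query attached to the most similar example utterance."""
    best_utterance, best_query = max(
        examples, key=lambda ex: jaccard(utterance, ex[0])
    )
    return best_query

# Hypothetical toy corpus of (utterance, query) pairs.
EXAMPLES = [
    ("i am hungry", {"category": "restaurant"}),
    ("where can i park the car", {"category": "parking"}),
    ("i want to buy some medicine", {"category": "drugstore"}),
]

if __name__ == "__main__":
    # The input contains no explicit keyword such as "restaurant".
    print(generate_query("i am getting hungry", EXAMPLES))
```

The usage shows the case the abstract highlights: the input contains no explicit keyword, yet a query is still produced because it is inherited from the nearest example.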
Nobuo KAWAGUCHI Shigeki MATSUBARA Kazuya TAKEDA Fumitada ITAKURA
CIAIR, Nagoya University, has been compiling an in-car speech database since 1999. This paper discusses the basic information contained in this database and an analysis of the effects of driving status based on the database. We have developed a system called the Data Collection Vehicle (DCV), which supports synchronous recording of multi-channel audio data from 12 microphones that can be placed throughout the vehicle, multi-channel video recording from three cameras, and the collection of vehicle-related data. In the compilation process, each subject had conversations with three types of dialog system: a human, a "Wizard of Oz" system, and a spoken dialog system. Vehicle information such as speed, engine RPM, accelerator/brake-pedal pressure, and steering-wheel motion was also recorded. In this paper, we report on the effect that driving status has on phenomena specific to spoken language.