1-4hit |
Xiaohong YANG Mingxing XU Yufang YANG
The research reported in this paper is an attempt to elucidate the predictors of pause duration in read-aloud discourse. Through simple linear regression analysis and stepwise multiple linear regression, we examined how different factors (namely, syntactic structure, discourse hierarchy, topic structure, preboundary length, and postboundary length) influenced pause duration both separately and jointly. Results from simple regression analysis showed that discourse hierarchy, syntactic structure, topic structure, and postboundary length had significant impacts on boundary pause duration. However, when these factors were tested in a stepwise regression analysis, only discourse hierarchy, syntactic structure, and postboundary length were found to have significant impacts on boundary pause duration. The regression model that best predicted boundary pause duration in discourse context was the one that first included syntactic structure, and then included discourse hierarchy and postboundary length. This model could account for about 80% of the variance of pause duration. Tests of mediation models showed that the effects of topic structure and discourse hierarchy were significantly mediated by syntactic structure, which was most closely correlated with pause duration. These results support an integrated model combining the influence of several factors and can be applied to text-to-speech systems.
Seongyong KIM Kong-Joo LEE Key-Sun CHOI
We propose a normalization scheme of syntactic structures using a binary phrase structure grammar with composite labels. The normalization adopts binary rules so that the dependency between two sub-trees can be represented in the label of the tree. The label of a tree is composed of two attributes, each of which is extracted from each sub-tree, so that it can represent the compositional information of the tree. The composite label is generated from part-of-speech tags using an automatic labelling algorithm. Since the proposed normalization scheme is binary and uses only part-of-speech information, it can readily be used to compare the results of different syntactic analyses independently of their syntactic description and can be applied to other languages as well. It can also be used for syntactic analysis, which performs higher than the previous syntactic description for Korean corpus. We implement a tool that transforms a syntactic description into normalized one based on this proposed scheme. It can help construct a unified syntactic corpus and extract syntactic information from various types of syntactic corpus in a uniform way.
Tomoko OHSUGA Yasuo HORIUCHI Akira ICHIKAWA
In this study, we introduce a method for estimating the syntactic structure of Japanese speech from F0 contour and pause duration. We defined a prosodic unit (PU) which is divided by the local minimal point of an F0 contour or pause. Combining PUs repeatedly (a pair of PUs is combined into one PU), a tree structure is gradually generated. Which pair of PUs in a sequence of three PUs should be combined is decided by a discriminant function based on the discriminant analysis of a corpus of speech data. We applied the method to the ATR Phonetically Balanced Sentences read by four Japanese speakers. We found that with this method, the correct rate of judgement for each sequence of three PUs is 79% and the estimation accuracy of the entire syntactic structure for each sentence is 26%. We consider this result to demonstrate a good degree of accuracy for the difficult task of estimating syntactic structure only from prosody.
Hiroya FUJISAKI Keikichi HIROSE Noboru TAKAHASHI
Prosodic features of the spoken Japanese play an important role in the transmission of linguistic information concerning the lexical word accent, the sentence structure and the discourse structure. In order to construct prosodic rules for synthesizing high-quality speech, therefore, prosodic features of speech should be quantitatively analyzed with respect to the linguistic information. With a special focus on the fundamental frequency contour, we first define four prosodic units for the spoken Japanese, viz., prosodic word, prosodic phrase, prosodic clause and prosodic sentence, based on a decomposition of the fundamental frequency contour using a functional model for the generation process. Syntactic units are also introduced which have rough correspondence to these prosodic units. The relationships between the linguistic information and the characteristics of the components of the fundamental frequency contour are then described on the basis of results obtained by the analysis of two sets of speech material. Analysis of weathercast and newscast sentences showed that prosodic boundaries given by the manner of continuation/termination of phrase components fall into three categories, and are primarily related to the syntactic boundaries. On the other hand, analysis of noun phrases with various combinations of word accent types, syntactic structures, and focal conditions, indicated that the magnitude and the shape of the accent components, which of course reflect the information concerning the lexical accent types of constituent words, are largely influenced by the focal structure. The results also indicated that there are cases where prosody fails to meet all the requirements presented by word accent, syntax and discourse.