This paper addresses the novel task of detecting chorus sections in English and Japanese lyrics text. Although chorus-section detection using audio signals has been studied, whether chorus sections can be detected from text-only lyrics is an open issue. Another open issue is whether patterns of repeating lyric lines such as those appearing in chorus sections depend on language. To investigate these issues, we propose a neural-network-based model for sequence labeling. It can learn phrase repetition and linguistic features to detect chorus sections in lyrics text. It is, however, difficult to train this model because, with no prior work on this task, no dataset of lyrics with chorus-section annotations existed. We therefore generate a large amount of training data with such annotations by leveraging pairs of musical audio signals and their corresponding manually time-aligned lyrics; we first automatically detect chorus sections from the audio signals and then use their temporal positions to transfer them to line-level chorus-section annotations for the lyrics. Experimental results show that the proposed model with the generated data contributes to detecting the chorus sections, that the model trained on Japanese lyrics can detect chorus sections surprisingly well in English lyrics, and that patterns of repeating lyric lines are language-independent.
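The annotation-transfer step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `label_lines`, the `(start, end)` tuple format in seconds, and the 50% overlap threshold are all assumptions made for illustration, and the chorus intervals are assumed not to overlap one another.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec); hypothetical format


def label_lines(line_spans: List[Span],
                chorus_spans: List[Span],
                min_overlap: float = 0.5) -> List[bool]:
    """Mark a lyric line as chorus when enough of its duration falls
    inside audio-detected chorus intervals.

    min_overlap is an assumed threshold, not a value from the paper.
    chorus_spans are assumed to be mutually non-overlapping.
    """
    labels = []
    for start, end in line_spans:
        duration = end - start
        # Total time this line shares with any chorus interval.
        overlap = sum(
            max(0.0, min(end, c_end) - max(start, c_start))
            for c_start, c_end in chorus_spans
        )
        labels.append(duration > 0 and overlap / duration >= min_overlap)
    return labels


# Four time-aligned lines and one detected chorus interval.
lines = [(0.0, 4.0), (4.0, 8.0), (8.0, 12.0), (12.0, 16.0)]
chorus = [(7.5, 16.0)]
print(label_lines(lines, chorus))
```

The resulting per-line boolean labels are the kind of line-level annotation a sequence-labeling model could be trained on; the threshold controls how lines that straddle a section boundary are treated.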
Go FUJII Masahiro UKIBE Shigetomo SHIKI Masataka OHKUBO
Superconducting tunnel junction (STJ) array detectors can exhibit excellent performance with respect to energy resolution, detection efficiency, and counting rate in the soft X-ray energy range, which makes them well suited for detecting X-rays at synchrotron radiation facilities. However, to achieve high-throughput analysis of trace impurity elements such as dopants in structural or functional materials, the sensitive area of STJ array detectors should be enlarged to more than 10 times its current size by increasing the number of pixels in the array. In this work, to accommodate up to 1000 STJ pixels within a compact 10 mm square chip, we have introduced a three-dimensional (3D) structure by embedding a wiring layer in a SiO$_{2}$ isolation layer underneath the base electrode layer of the STJs. The 3D structure is necessary for a close-packed STJ arrangement, avoiding the overlay of lead wiring, which is common in conventional two-dimensional layouts. The fabricated STJs showed excellent current-voltage characteristics with low subgap currents of less than 2 nA, the same as those of conventional STJs. An STJ pixel has an energy resolution of 31 eV (FWHM) for C-K$\alpha$ (277 eV).
Ryohei SASANO Daisuke KAWAHARA Sadao KUROHASHI
This paper reports the effect of corpus size on case frame acquisition for predicate-argument structure analysis in Japanese. For this study, we collect a Japanese corpus consisting of up to 100 billion words, and construct case frames from corpora of six different sizes. Then, we apply these case frames to syntactic and case structure analysis, and zero anaphora resolution, in order to investigate the relationship between the corpus size for case frame acquisition and the performance of predicate-argument structure analysis. We obtained better analyses by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.
A new method for logical structure analysis of document images is proposed in this paper as the basis for a document reader which can extract logical information from various printed documents. The proposed system consists of five basic modules: text line classification, object recognition, object segmentation, object grouping, and object modification. Emergent computation, which is a key concept of artificial life, is adopted for the cooperative interaction among modules in the system in order to achieve effective and flexible behavior of the whole system. It has three principal advantages over other methods: adaptive system configuration for various and complex logical structures, robust document analysis tolerant of erroneous feature detection, and feedback of high-level logical information to the low-level physical process for accurate analysis. Experimental results obtained for 150 documents show that the method is adaptable, robust, and effective for various document structures.
Won Seug CHOI Harksoo KIM Jungyun SEO
Analysis of speech acts and discourse structures is essential to a dialogue understanding system because speech acts and discourse structures are closely tied to the speaker's intention. However, it has been difficult to infer a speech act and a discourse structure from a surface utterance because they depend heavily on the context of the utterance. We propose a statistical dialogue analysis model that determines discourse structures as well as speech acts using a maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from an annotated dialogue corpus. Moreover, the model can analyze speech acts and discourse structures in one framework. In the experiment, the model outperformed previous approaches.
Kanad KEENI Hiroshi SHIMODAIRA Tetsuro NISHINO Yasuo TAN
Devanagari is the most widely used script in India. Here, a method is introduced for recognizing Devanagari characters using a neural network. The proposed method reduces the number of output units required compared with a conventional neural network, in which classification is performed on a winner-take-all basis. An automatic coding procedure for representing the output layer of the network and a different method for the final classification are also proposed. Along with the automatic coding procedure, a heuristic method for representing the output units by exploiting the structural information of Devanagari characters is also demonstrated. In addition, random representations of the output layer show that the representation affects the generalization performance of the network. The proposed automatic representation gave a recognition rate of 98.09% for 44 categories.
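To make the idea of a reduced output layer concrete, the following sketch contrasts it with one-unit-per-class (winner-take-all) output. It does not reproduce the paper's automatic coding procedure or its classification method, which the abstract does not specify. The plain binary index code and the nearest-codeword decoding used here are stand-in assumptions, and `make_codebook` and `decode` are hypothetical names.

```python
from typing import List, Sequence


def make_codebook(num_classes: int) -> List[List[int]]:
    """Assign each class a distinct binary codeword.

    Uses the plain binary index of the class (an assumption), so 44
    classes need 6 output units instead of 44.
    """
    bits = max(1, (num_classes - 1).bit_length())
    return [[(k >> b) & 1 for b in reversed(range(bits))]
            for k in range(num_classes)]


def decode(outputs: Sequence[float], codebook: List[List[int]]) -> int:
    """Return the class whose codeword is closest (squared Euclidean
    distance) to the network's real-valued output vector."""
    def dist(code: List[int]) -> float:
        return sum((o - c) ** 2 for o, c in zip(outputs, code))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


codebook = make_codebook(44)
print(len(codebook[0]))  # number of output units used by this coding
print(decode([0.1, 0.0, 0.2, 0.9, 0.1, 0.8], codebook))
```

Because different codebooks place classes at different distances from one another, the choice of representation can change which errors the network makes, which is consistent with the abstract's observation that the output representation affects performance.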
Table-form document structure analysis is an important problem in the document processing domain. This paper presents a new method called Box-Driven Reasoning (BDR) to robustly analyze the structure of table-form documents that include touching characters and broken lines. Real documents are copied repeatedly and overlaid with printed data, resulting in characters that touch cells and lines that are broken. Most previous methods employ a line-oriented approach, but touching characters and broken lines make the procedure fail at an early stage. In contrast to these methods, BDR deals with regions directly, and a reduced-resolution image is introduced to supplement information deteriorated by noise. Experimental tests show that BDR reliably recognizes cells and strings in document images with touching characters and broken lines.