Yukiko I. NAKANO Toshiyasu MURAYAMA Toyoaki NISHIDA
In story-based communication, where a message is conveyed in story form, it is important to embody the story with expressive materials. However, it is quite difficult for users to create rich multimedia content using multimedia editing tools. This paper proposes a web-based multimedia environment, SPOC (Stream-oriented Public Opinion Channel), which aims to help unskilled users convert their stories into TV-like programs easily. The system can produce digital camera work for graphics and video clips, and can automatically generate an agent animation according to a narration text. Evaluation experiments showed that SPOC is easy to use and easy to learn for novice users. Given a short instruction, the subjects not only mastered the operations of the software but also succeeded in creating highly original programs. In the subjective evaluation, the subjects answered that they enjoyed using the software and did not find it difficult. These results suggest that the system reduces the user's cost of making a program and encourages communication in a network community.
Yasuyuki NAKAJIMA Masaru SUGANO
Scalability of bit rate and coding format in coded multimedia content has become very important for the efficient use of network bandwidth and storage capacity, given the recent availability of a wide variety of bandwidths and storage media. However, the conventional approach realizes this scalability through decompression and recompression, which require very expensive computation. In addition, a very large cache space is required for storing the decoded audio-video data. To address these problems, this paper describes three fast scalability methods for MPEG audio and video data: MPEG audio bit rate conversion, MPEG video bit rate conversion, and MPEG format conversion. For MPEG audio bit rate conversion, we describe subband domain conversion using bandwidth limitation, requantization, and requantization that reflects a psychoacoustic model. Four types of MPEG video bit rate conversion are described, which use bandwidth limitation, out-loop requantization, in-loop requantization, and hybrid requantization. For format conversion, fast baseband domain conversion is performed using coding information, such as motion vectors and coding types, extracted from the input coded video. Experimental results comparing these methods with conventional transcoding methods are also shown.
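As a rough illustration of the requantization idea named in the abstract above, the following Python sketch maps already-quantized coefficient levels from one quantizer step size to a coarser one without returning to the sample domain. It is a minimal sketch under stated assumptions, not the authors' method: the uniform scalar quantizer, the function name `requantize`, and the parameters `q_in` and `q_out` are all hypothetical, and real MPEG streams use quantization matrices and scale codes that are omitted here.

```python
def requantize(levels, q_in, q_out):
    """Re-map quantized levels from step size q_in to a coarser step q_out.

    Assumption: a plain uniform quantizer (value = level * step), which is a
    simplification of what MPEG codecs actually do.
    """
    if q_in <= 0 or q_out <= 0:
        raise ValueError("step sizes must be positive integers")
    out = []
    for level in levels:
        value = level * q_in                          # reconstruct the coefficient
        magnitude = (abs(value) + q_out // 2) // q_out  # round to nearest new level
        out.append(magnitude if value >= 0 else -magnitude)
    return out


if __name__ == "__main__":
    coarse = requantize([10, -4, 3, 0, 1], q_in=2, q_out=6)
    print(coarse)
```

With a coarser step, small coefficients collapse to zero, which is where the bit rate saving in this kind of conversion would come from.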
Yoshiyuki HARA Tsuneo NITTA Hiroyoshi SAITO Ken'ichiro KOBAYASHI
Text-to-speech synthesis (TTS) is currently one of the most important media conversion techniques. In this paper, we describe a Japanese TTS card developed for constructing a personal-computer-based multimedia platform, and a TTS software package developed for a workstation-based multimedia platform. Some applications of this hardware and software are also discussed. The TTS system consists of a linguistic processing stage, which converts text into phonetic and prosodic information, and a speech processing stage, which produces speech from the phonetic and prosodic symbols. The linguistic processing stage uses morphological analysis, rewriting rules for accent movement and pause insertion, and other techniques to impart correct accentuation and natural-sounding intonation to the synthesized speech. The speech processing stage employs the cepstrum method with consonant-vowel (CV) syllables as the synthesis unit to achieve clear and smooth synthesized speech. All of the processing for converting Japanese text (consisting of mixed Kanji and Kana characters) to synthesized speech is done internally on the TTS card. This allows the card to be used widely in various applications, including electronic mail and telephone service systems, without placing any processing burden on the personal computer. The TTS software was used in an e-mail reading tool on a workstation.
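To make the notion of CV syllables as a synthesis unit more concrete, the sketch below splits a romanized Japanese word into consonant-vowel units. It is only an illustration under assumptions that are not in the abstract: the input is already romanized (the described system works from Kanji and Kana text via morphological analysis), the function name `split_cv` is hypothetical, and accent, pause, and cepstrum processing are not modeled.

```python
VOWELS = "aiueo"


def split_cv(romaji):
    """Split a romanized word into consonant-vowel (CV) units.

    Assumption: lowercase romanized input; a syllabic "n" is emitted as its
    own unit when it is not followed by a vowel or "y".
    """
    units = []
    consonants = ""
    for i, ch in enumerate(romaji):
        if ch in VOWELS:
            units.append(consonants + ch)   # close the current CV unit
            consonants = ""
        elif (ch == "n" and consonants == ""
              and (i + 1 == len(romaji) or romaji[i + 1] not in VOWELS + "y")):
            units.append("n")               # syllabic n stands alone
        else:
            consonants += ch                # keep collecting the consonant onset
    if consonants:
        units.append(consonants)            # trailing consonants, if any
    return units


if __name__ == "__main__":
    print(split_cv("konnichiwa"))
```

In a pipeline like the one described, units of this kind would be the symbols handed from the linguistic processing stage to the speech processing stage.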