Munekazu DATE Shinya SHIMIZU Hideaki KIMATA Dan MIKAMI Yoshinori KUSACHI
3D video content depends on the shooting conditions, namely the camera positions. Controlling the depth range in the post-processing stage is difficult but essential, because video from arbitrary camera positions must be generated. If light field information can be obtained, video from any viewpoint can be generated exactly, and post-processing becomes possible. However, a light field contains a huge amount of data and is difficult to capture. To reduce the data quantity, we proposed the visually equivalent light field (VELF), which exploits the characteristics of human vision. Although a number of cameras are needed, VELF can be captured with a camera array. Because interpolation between cameras is performed by linear blending, the calculation is simple enough that the ray distribution field of VELF can be constructed by optical interpolation in the VELF3D display. The display produces high image quality owing to its high pixel usage efficiency. In this paper, we summarize the relationship among the characteristics of human vision, VELF, and the VELF3D display. We then propose a method to control the depth range of the image observed on the VELF3D display and discuss the effectiveness and limitations of displaying the processed image on the VELF3D display. Our method can also be applied to other 3D displays. Since the calculation is simply weighted averaging, it is suitable for real-time applications.
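The abstract states only that camera interpolation uses linear blending (a weighted average). The following is a minimal sketch of that idea, assuming a hypothetical 1-D camera array with known, strictly increasing camera positions and grayscale images stored as nested lists; the function names and the viewpoint parameterization are illustrative and are not the authors' implementation.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def blend_views(left: Image, right: Image, t: float) -> Image:
    """Linearly blend two adjacent camera images.

    t is the normalized position of the virtual viewpoint between the
    left camera (t = 0) and the right camera (t = 1). This parameterization
    is an assumption made for illustration.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("images must have the same shape")
    # Each output pixel is a weighted average of the two source pixels.
    return [
        [(1.0 - t) * a + t * b for a, b in zip(row_l, row_r)]
        for row_l, row_r in zip(left, right)
    ]


def virtual_view(cameras: List[Image], positions: List[float], x: float) -> Image:
    """Interpolate a view at position x from a hypothetical 1-D camera array.

    positions must be strictly increasing; x outside the array span is
    clamped to the nearest end camera.
    """
    if x <= positions[0]:
        return [row[:] for row in cameras[0]]
    if x >= positions[-1]:
        return [row[:] for row in cameras[-1]]
    for i in range(len(positions) - 1):
        x0, x1 = positions[i], positions[i + 1]
        if x0 <= x <= x1:
            # Only the two cameras bracketing x contribute to the output.
            t = (x - x0) / (x1 - x0)
            return blend_views(cameras[i], cameras[i + 1], t)
    raise ValueError("positions must be strictly increasing")
```

For example, with three one-row cameras at positions 0, 1, and 2, `virtual_view(cams, [0.0, 1.0, 2.0], 0.25)` weights the first camera by 0.75 and the second by 0.25. Because every output pixel is a fixed weighted sum of input pixels, the cost is linear in the number of pixels, which is consistent with the abstract's remark that weighted averaging suits real-time use.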
Masayuki SUZUKI Ryo KUROIWA Keisuke INNAMI Shumpei KOBAYASHI Shinya SHIMIZU Nobuaki MINEMATSU Keikichi HIROSE
When synthesizing speech from Japanese text, correct assignment of accent nuclei for input text with arbitrary content is indispensable for obtaining natural-sounding synthetic speech. A phenomenon called accent sandhi occurs in Japanese utterances: when a word is uttered in a sentence, its accent nucleus may change depending on the context of the preceding and succeeding words. This paper describes a statistical method for automatically predicting the accent nucleus changes caused by accent sandhi. First, as the basis of the research, a database of Japanese text was constructed with labels for accent phrase boundaries and accent nucleus positions as uttered in sentences. A single native speaker of Tokyo-dialect Japanese annotated all the labels for 6,344 Japanese sentences. Then, using this database, a method based on conditional random fields (CRFs) was developed to predict accent phrase boundaries and accent nuclei. The proposed method predicted accent nucleus positions for accent phrases with 94.66% accuracy, clearly surpassing the 87.48% accuracy obtained with our rule-based method. A listening experiment was also conducted comparing synthetic speech produced by the proposed method with that produced by the rule-based method. The results show that our method significantly improved the naturalness of synthetic speech.
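The abstract names only the model family (CRFs). As an illustration of how a linear-chain CRF produces a label sequence once trained, the sketch below performs Viterbi decoding over hand-set scores. The label set ("B" = a new accent phrase starts at this word, "I" = the word continues the current phrase) and all scores are hypothetical; in a real system the scores would be weighted sums of learned features, and the authors' actual features and labels are not described here.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[i][y] is the (log-domain) score of label y at position i;
    transitions[(y_prev, y)] scores moving from y_prev to y. Both tables
    must cover every label in `labels`.
    """
    if not emissions:
        return []
    # best[y]: best total score of any path ending in label y so far.
    best = {y: emissions[0][y] for y in labels}
    backptrs: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in labels:
            # Choose the predecessor label that maximizes path + transition score.
            prev, val = max(
                ((p, best[p] + transitions[(p, y)]) for p in labels),
                key=lambda pv: pv[1],
            )
            new_best[y] = val + scores[y]
            ptr[y] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final label to recover the full path.
    path = [max(labels, key=lambda y: best[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With three words, emissions `[{"B": 2.0, "I": 0.0}, {"B": 0.5, "I": 1.0}, {"B": 1.0, "I": 0.8}]`, and transitions that penalize two consecutive boundaries (`("B", "B"): -1.0`) while favoring a boundary followed by a continuation (`("B", "I"): 0.5`), the decoder returns `["B", "I", "B"]`, the sequence a brute-force search over all eight labelings also selects (total score 4.5). The point of the transition scores is exactly the contextual dependence the abstract describes: a word's label depends on its neighbors, not only on the word itself.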
Shoichiro TAKEDA Megumi ISOGAI Shinya SHIMIZU Hideaki KIMATA
Phase-based video magnification methods can magnify and reveal subtle motion changes that are invisible to the naked eye. In these methods, each frame of a video is decomposed into an image pyramid, and subtle motion changes are then detected as local phase changes with arbitrary orientations at each pixel and each pyramid level. One problem with this process is the long computational time required to calculate the local phase changes, which makes high-speed video magnification difficult. Recently, a decomposition technique called the Riesz pyramid has been proposed that detects local phase changes only in the dominant orientation. This technique removes the arbitrariness of orientation and reduces over-completeness, thus achieving high-speed processing. However, as the resolution of the input video increases, a large amount of data must be processed, which again requires a long computational time. In this paper, we focus on the correlation of local phase changes between adjacent pyramid levels and present a novel decomposition technique called the local Riesz pyramid, which enables faster phase-based video magnification by automatically processing only the minimum sufficient set of local image areas at several pyramid levels. Through this minimal pyramid processing, our proposed phase-based video magnification method using the local Riesz pyramid achieves good magnification results within a short computational time.
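To make "local phase in the dominant orientation" concrete, the sketch below computes, for one bandpassed pyramid level, an approximate Riesz transform and then the per-pixel local amplitude A, phase φ, and orientation θ from the standard monogenic-signal relations I = A cos φ, R1 = A sin φ cos θ, R2 = A sin φ sin θ. The three-tap central-difference approximation, its sign convention, and the edge-replication boundary handling are assumptions for illustration; the sketch does not implement the local Riesz pyramid's region selection, temporal filtering, or amplification.

```python
import math
from typing import List, Tuple

Band = List[List[float]]  # one bandpassed pyramid level, stored as rows of values


def riesz_components(band: Band) -> Tuple[Band, Band]:
    """Approximate the Riesz transform of a band with central differences.

    R1 (horizontal) and R2 (vertical) use the weights 0.5 * (previous - next);
    this sign convention and the edge replication at borders are assumptions.
    """
    h, w = len(band), len(band[0])

    def at(y: int, x: int) -> float:
        # Replicate edge pixels for out-of-range coordinates.
        return band[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    r1 = [[0.5 * (at(y, x - 1) - at(y, x + 1)) for x in range(w)] for y in range(h)]
    r2 = [[0.5 * (at(y - 1, x) - at(y + 1, x)) for x in range(w)] for y in range(h)]
    return r1, r2


def local_amplitude_phase_orientation(
    i: float, r1: float, r2: float
) -> Tuple[float, float, float]:
    """Invert I = A cos(phi), R1 = A sin(phi) cos(theta), R2 = A sin(phi) sin(theta).

    phi is returned in [0, pi] and theta in (-pi, pi].
    """
    amplitude = math.sqrt(i * i + r1 * r1 + r2 * r2)
    phase = math.atan2(math.hypot(r1, r2), i)
    orientation = math.atan2(r2, r1)
    return amplitude, phase, orientation
```

Only one orientation θ per pixel is produced, which is why this style of decomposition avoids evaluating phase at many arbitrary orientations. A magnification method would then track how φ changes over time at each pixel and level; the local Riesz pyramid described above further limits which image areas are processed at all.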