Hiroaki AKUTSU Ko ARAI
Lanxi LIU Pengpeng YANG Suwen DU Sani M. ABDULLAHI
Xiaoguang TU Zhi HE Gui FU Jianhua LIU Mian ZHONG Chao ZHOU Xia LEI Juhang YIN Yi HUANG Yu WANG
Yingying LU Cheng LU Yuan ZONG Feng ZHOU Chuangao TANG
Jialong LI Takuto YAMAUCHI Takanori HIRANO Jinyu CAI Kenji TEI
Wei LEI Yue ZHANG Hanfeng XIE Zebin CHEN Zengping CHEN Weixing LI
David CLARINO Naoya ASADA Atsushi MATSUO Shigeru YAMASHITA
Takashi YOKOTA Kanemitsu OOTSU
Xiaokang Jin Benben Huang Hao Sheng Yao Wu
Tomoki MIYAMOTO
Ken WATANABE Katsuhide FUJITA
Masashi UNOKI Kai LI Anuwat CHAIWONGYEN Quoc-Huy NGUYEN Khalid ZAMAN
Takaharu TSUBOYAMA Ryota TAKAHASHI Motoi IWATA Koichi KISE
Chi ZHANG Li TAO Toshihiko YAMASAKI
Ann Jelyn TIEMPO Yong-Jin JEONG
Haruhisa KATO Yoshitaka KIDANI Kei KAWAMURA
Jiakun LI Jiajian LI Yanjun SHI Hui LIAN Haifan WU
Gyuyeong KIM
Hyun KWON Jun LEE
Fan LI Enze YANG Chao LI Shuoyan LIU Haodong WANG
Guangjin Ouyang Yong Guo Yu Lu Fang He
Yuyao LIU Qingyong LI Shi BAO Wen WANG
Cong PANG Ye NI Jia Ming CHENG Lin ZHOU Li ZHAO
Nikolay FEDOROV Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Yukasa MURAKAMI Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Kazuya KAKIZAKI Kazuto FUKUCHI Jun SAKUMA
Yitong WANG Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO
Waqas NAWAZ Muhammad UZAIR Kifayat ULLAH KHAN Iram FATIMA
Haeyoung Lee
Ji XI Pengxu JIANG Yue XIE Wei JIANG Hao DING
Weiwei JING Zhonghua LI
Sena LEE Chaeyoung KIM Hoorin PARK
Akira ITO Yoshiaki TAKAHASHI
Rindo NAKANISHI Yoshiaki TAKATA Hiroyuki SEKI
Chuzo IWAMOTO Ryo TAKAISHI
Chih-Ping Wang Duen-Ren Liu
Yuya TAKADA Rikuto MOCHIDA Miya NAKAJIMA Syun-suke KADOYA Daisuke SANO Tsuyoshi KATO
Yi Huo Yun Ge
Rikuto MOCHIDA Miya NAKAJIMA Haruki ONO Takahiro ANDO Tsuyoshi KATO
Koichi FUJII Tomomi MATSUI
Yaotong SONG Zhipeng LIU Zhiming ZHANG Jun TANG Zhenyu LEI Shangce GAO
Souhei TAKAGI Takuya KOJIMA Hideharu AMANO Morihiro KUGA Masahiro IIDA
Jun ZHOU Masaaki KONDO
Tetsuya MANABE Wataru UNUMA
Kazuyuki AMANO
Takumi SHIOTA Tonan KAMATA Ryuhei UEHARA
Hitoshi MURAKAMI Yutaro YAMAGUCHI
Jingjing Liu Chuanyang Liu Yiquan Wu Zuo Sun
Zhenglong YANG Weihao DENG Guozhong WANG Tao FAN Yixi LUO
Yoshiaki TAKATA Akira ONISHI Ryoma SENDA Hiroyuki SEKI
Dinesh DAULTANI Masayuki TANAKA Masatoshi OKUTOMI Kazuki ENDO
Kento KIMURA Tomohiro HARAMIISHI Kazuyuki AMANO Shin-ichi NAKANO
Ryotaro MITSUBOSHI Kohei HATANO Eiji TAKIMOTO
Genta INOUE Daiki OKONOGI Satoru JIMBO Thiem Van CHU Masato MOTOMURA Kazushi KAWAMURA
Hikaru USAMI Yusuke KAMEDA
Yinan YANG
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI
Fengshan ZHAO Qin LIU Takeshi IKENAGA
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI
Tomohiro KOBAYASHI Tomomi MATSUI
Shin-ichi NAKANO
Ming PAN
Isao YAGI Yoshiaki TAKATA Hiroyuki SEKI
This paper proposes an event-based transition system called A-LTS. An A-LTS is a simple system consisting of two agents, a basic program and a monitor. The monitor observes the behavior of the basic program and if the behavior matches some pre-defined pattern, then the monitor interrupts the execution of the basic program and possibly triggers the execution of another specific program. An A-LTS models a common feature found in recent software technologies such as Aspect-Oriented Programming (AOP), history-based access control and active database. We investigate the expressive power of A-LTS and show that it is strictly stronger than finite state machines and strictly weaker than pushdown automata (PDA). This implies that the model checking problem for A-LTS is decidable. It is also shown that the expressive power of A-LTS, linear context-free grammar and deterministic PDA are mutually incomparable. We also discuss the relationship between A-LTS and pointcut/advice in AOP.
Existing vision substitute systems have insufficient spatial resolution to provide environmental information. To present detailed spatial information, we propose two stimulation methods to enhance transfer information using a 2-D tactile stimulator array. First, stimulators are divided into several groups. Since each stimulator group is activated alternately, the interval of stimulations can be shortened to less than the two-point discrimination threshold. In the case that stimulators are divided into two and four groups, the number of stimulators increases to twice and four times, respectively, that in the case of the two-point discrimination threshold. Further, a user selects the measurement range and the system presents targets within the range. The user acquires spatial information of the entire measurement area by changing the measurement range. This method can accurately present a range of targets. We examine and confirm these methods experimentally.
This paper describes a novel parameter generation algorithm for an HMM-based speech synthesis technique. The conventional algorithm generates a parameter trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of the static and dynamic features under an explicit constraint between those two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed speech parameters usually causes muffled sounds. In order to alleviate the over-smoothing effect, we propose a generation algorithm considering not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for a global variance (GV) of the generated trajectory. The latter likelihood works as a penalty for the over-smoothing, i.e., a reduction of the GV of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed algorithm causes considerably large improvements in the naturalness of synthetic speech.
Heiga ZEN Keiichi TOKUDA Takashi MASUKO Takao KOBAYASIH Tadashi KITAMURA
A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can solve the above inconsistency because we can incorporate the state duration PDFs explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that use of HSMMs improves the reported naturalness of synthesized speech.
Akio KOBAYASHI Kazuo ONOE Shinichi HOMMA Shoei SATO Toru IMAI
This paper describes a new criterion for speech recognition using an integrated confidence measure to minimize the word error rate (WER). The conventional criteria for WER minimization obtain the expected WER of a sentence hypothesis merely by comparing it with other hypotheses in an n-best list. The proposed criterion estimates the expected WER by using an integrated confidence measure with word posterior probabilities for a given acoustic input. The integrated confidence measure, which is implemented as a classifier based on maximum entropy (ME) modeling or support vector machines (SVMs), is used to acquire probabilities reflecting whether the word hypotheses are correct. The classifier is comprised of a variety of confidence measures and can deal with a temporal sequence of them to attain a more reliable confidence. Our proposed criterion for minimizing WER achieved a WER of 9.8% and a 3.9% reduction, relative to conventional n-best rescoring methods in transcribing Japanese broadcast news in various environments such as under noisy field and spontaneous speech conditions.
This paper presents a method for lossy compression of digital video data by parametric line and Natural cubic spline approximation. The method estimates the variation of pixel values in the temporal dimension by taking group of pixels together as keyblocks and interpolating them in Euclidean space. Break and fit criterion is used to minimize the number of keyblocks required for encoding and decoding of approximated data. Each group of pixels at fixed spatial location is encoded/decoded independently. The proposed method can easily be incorporated in the existing video data compression techniques based on Discrete Cosine Transform or Wavelet Transform.
Kenichi KANATANI Yasuyuki SUGAYA Hanno ACKERMANN
In order to reconstruct 3-D Euclidean shape by the Tomasi-Kanade factorization, one needs to specify an affine camera model such as orthographic, weak perspective, and paraperspective. We present a new method that does not require any such specific models. We show that a minimal requirement for an affine camera to mimic perspective projection leads to a unique camera model, called symmetric affine camera, which has two free functions. We determine their values from input images by linear computation and demonstrate by experiments that an appropriate camera model is automatically selected.
Despite the importance of domain-specific resource construction for domain ontology development, few studies have sought to develop a method for automatically identifying domain ontology-relevant web pages. To address this situation, here we propose a web page filtering scheme for domain ontology that identifies domain-relevant web pages from the web based on the context of concepts. Testing of the proposed filtering scheme with a business domain ontology on YahooPicks web pages yielded promising filtering results that were superior to those obtained using the baseline system.
M. Shahidur RAHMAN Tetsuya SHIMAMURA
A two-stage least square identification method is proposed for estimating ARMA (autoregressive moving average) coefficients from speech signals. A pulse-train like input sequence is often employed to account for the source effects in estimating vocal tract parameters of voiced speech. Due to glottal and radiation effects, the pulse train, however, does not represent the effective voice source. The authors have already proposed a simple but effective model of voice source for estimating AR (autoregressive) coefficients. This letter extends our approach to ARMA analysis to wider varieties of speech sounds including nasal vowels and consonants. Analysis results on both synthetic and natural nasal speech are presented to demonstrate the analysis ability of the method.
Sangbae JEONG Hoirin KIM Minsoo HAHN
In this paper, we propose a useful algorithm that can be applied to reduce the response time of speech recognizers based on HMM's. In our algorithm, to reduce the response time, promising HMM states are selected by single Gaussians. In speech recognition, HMM state likelihoods are evaluated by the corresponding single Gaussians first, and then likelihoods by original full Gaussians are computed and replaced only for the HMM states having relatively large likelihoods. By doing so, we can reduce the pattern-matching time for speech recognition significantly without any noticeable loss of the recognition rate. In addition, we cluster the single Gaussians into groups by measuring the distance between Gaussians. Therefore, we can reduce the extra memory much more. In our 10,000 word Korean POI (point-of-interest) recognition task, our proposed algorithm shows 35.57% reduction of the response time in comparison with that of the baseline system at the cost of 10% degradation of the WER.
Yutao DONG Xiangzhong FANG Jing YANG
The frame-level R-D optimization in H.264 is very important in video storage scenarios. Among all of the sub-optimal algorithms, a greedy iteration algorithm (GIA) can best lower the computational complexity of frame-level R-D optimization. In order to further lower the computational complexity, a ρ-domain frame-level R-D optimization algorithm is proposed in this letter. Different from GIA, every frame's rate and distortion can be estimated accurately without actual encoding in our proposed algorithm. Simulation results show that our proposed algorithm can lower the computational complexity greatly with negligible variation in peak signal-to-noise ratio (PSNR) compared with GIA.
Chih-Cheng LO Pao-Tung WANG Jeng-Shyang PAN Bin-Yih LIAO
In this letter, we propose a novel subsampling based image watermark sequentially embedding scheme to reduce the risk of common permutation attack. The image is still perceptual after watermarking, and experimental results also show its effectiveness and robustness.