Hirofumi TAKANO Naoyuki AWANO Kenji SUGIYAMA
High dynamic range (HDR) images, which include large differences in brightness levels, are studied to address the lack of established quality estimation methods for real HDR images. We earlier proposed a new metric, the independent signal-to-noise ratio (ISNR), which uses the independent pixel value as the signal instead of the peak value used in PSNR. We then proposed an improved version, the local peak signal-to-noise ratio (LPSNR), which uses the maximum value of neighboring pixels. However, these methods did not sufficiently consider human perception. To address this issue, we propose here an objective estimation method that considers spatial frequency characteristics based on the actual brightness. In this method, a function approximating human visual characteristics is calculated and applied as a 2D filter in the FFT domain for spatial frequency weighting. To confirm the usefulness of this objective estimation method, we compared its results with a subjective assessment. For the subjective assessment, we used an organic EL display, which has a perfect contrast ratio. The experimental results showed that perceptual weighting improves the correlation between the SNR and the MOS of the subjective assessment. The weighted LPSNR gives the best correlation.
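The spatial frequency weighting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their approximated function, so the Mannos-Sakrison contrast sensitivity function is swapped in as a stand-in. The function names, the `cycles_per_degree_at_nyquist` viewing-condition parameter, and the `signal_level` argument (peak, independent, or local peak value, depending on the metric) are all assumptions made for illustration.

```python
import numpy as np

def csf_weight(shape, cycles_per_degree_at_nyquist=30.0):
    """Radial weight on the FFT grid, normalised so its maximum is 1.

    Stand-in only: Mannos-Sakrison CSF, not the function from the paper.
    """
    fy = np.fft.fftfreq(shape[0])          # cycles/pixel along rows
    fx = np.fft.fftfreq(shape[1])          # cycles/pixel along columns
    radial = np.hypot(fy[:, None], fx[None, :])
    # Map cycles/pixel to cycles/degree (assumed viewing condition).
    f = radial / 0.5 * cycles_per_degree_at_nyquist
    w = 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)
    return w / w.max()

def weighted_snr_db(reference, test, signal_level):
    """SNR in dB computed from the frequency-weighted error image."""
    error = np.asarray(reference, dtype=float) - np.asarray(test, dtype=float)
    spectrum = np.fft.fft2(error)
    # Apply the weight as a 2D filter in the FFT domain, then go back.
    weighted = np.fft.ifft2(spectrum * csf_weight(error.shape)).real
    mse = float(np.mean(weighted ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(signal_level ** 2 / mse))
```

Because the weight never exceeds 1, the weighted error energy cannot exceed the unweighted one, so this sketch always reports an SNR at least as high as the unweighted value for the same `signal_level`; what matters for the comparison with MOS is how the ranking of test images changes, not the absolute level.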
We attempted to estimate subjective scores of the Japanese Diagnostic Rhyme Test (DRT), a two-to-one forced selection speech intelligibility test. We used automatic speech recognizers with language models that force the output to be one of the two words in the word pair, mimicking the human recognition process in the DRT. Initial testing used speaker-independent models, which yielded scores significantly lower than the subjective scores. The acoustic models were then adapted to each speaker in the corpus, and then to noise at a specified SNR. Three types of noise were tested: white noise, multi-talker (babble) noise, and pseudo-speech noise. When the adapted noise level and the tested level matched, the noise-adapted models matched the subjective scores significantly better than the speaker-independent and speaker-adapted models did. However, when the SNR conditions did not match, the recognition scores degraded, especially when the tested SNR was higher than the adapted noise level. Accordingly, we adapted the models to mixed levels of noise, i.e., multi-condition training. The resulting models yielded intelligibility estimates that matched subjective intelligibility relatively well at all noise levels. The correlation between subjective and estimated intelligibility scores increased to 0.94 with multi-talker noise, 0.93 with white noise, and 0.89 with pseudo-speech noise, while the root mean square error (RMSE) decreased from more than 40 to 13.10, 13.05, and 16.06, respectively.
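The two agreement figures reported in this abstract, the correlation and the RMSE between subjective and estimated intelligibility scores, can be computed as in the following minimal sketch. The abstract does not state which correlation coefficient was used; Pearson's coefficient is assumed here. The function names and the score lists in the usage lines are hypothetical and are not data from the paper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical per-condition scores (percent correct), for illustration only.
subjective = [95.0, 88.0, 70.0, 52.0, 30.0]
estimated = [90.0, 85.0, 75.0, 40.0, 35.0]
print(pearson_r(subjective, estimated), rmse(subjective, estimated))
```

Reporting both figures is useful because they capture different things: a high correlation only shows that the estimated scores rise and fall with the subjective ones, while the RMSE also penalises a constant offset, such as the uniformly low scores of the speaker-independent models.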
Takeshi YAMADA Masakazu KUMAKURA Nobuhiko KITAWAKI
It is essential to ensure a satisfactory QoS (Quality of Service) when offering a speech communication system with a noise reduction algorithm. In this paper, we propose a new objective test methodology for noise-reduced speech that estimates word intelligibility by using a distortion measure. Experimental results confirmed that the proposed methodology gives an accurate estimate independent of the noise reduction algorithm and the noise type.