IEICE global.ieice.org Site

Keyword Search Result

[Keyword] perceptual(42hit)

1-20hit(42hit)

A Luminance Expansion Method for Displaying HDR Video in SDR Display
Takashi YAMAZOE Jinyu TANG Gin INOUE Kenji SUGIYAMA

LETTER-Vision

Publicized: 2023/06/27
Vol: E106-A No:9
Page(s): 1220-1223
HDR video can display up to 1200% luminance, but this is limited on an SDR display. In this study, we expand the high-luminance area in consideration of perceptual performance to improve the presentation performance of HDR video on an SDR display. Objective experiments show that the proposed method improves the presentation performance by up to 0.8dB in WPSNR.
Image and Model Transformation with Secret Key for Vision Transformer
Hitoshi KIYA Ryota IIJIMA Aprilpyone MAUNGMAUNG Yuma KINOSHITA

INVITED PAPER

Publicized: 2022/11/02
Vol: E106-D No:1
Page(s): 2-11
In this paper, we propose a combined use of transformed images and vision transformer (ViT) models transformed with a secret key. We show for the first time that models trained with plain images can be directly transformed to models trained with encrypted images on the basis of the ViT architecture, and the performance of the transformed models is the same as models trained with plain images when using test images encrypted with the key. In addition, the proposed scheme does not require any specially prepared data for training models or network modification, so it also allows us to easily update the secret key. In an experiment, the effectiveness of the proposed scheme is evaluated in terms of performance degradation and model protection performance in an image classification task on the CIFAR-10 dataset.
Research on the Algorithm of License Plate Recognition Based on MPGAN Haze Weather
Weiguo ZHANG Jiaqi LU Jing ZHANG Xuewen LI Qi ZHAO

PAPER-Image Recognition, Computer Vision

Publicized: 2022/02/21
Vol: E105-D No:5
Page(s): 1085-1093
Haze seriously degrades the quality of license plate recognition and reduces the performance of visual processing algorithms. To improve the quality of hazy images, this paper proposes a license plate recognition algorithm for haze weather. The algorithm consists of two parts. The first part is MPGAN image dehazing, which uses a generative adversarial network to dehaze the image and combines multi-scale convolution with a perceptual loss. Multi-scale convolution is conducive to better feature extraction, and the perceptual loss makes up for the shortcoming that the mean square error (MSE) is greatly affected by outliers. The second part recognizes the license plate: YOLOv3 first locates the license plate, an STN network corrects it, and the corrected plate is finally fed into the improved LPRNet network to obtain the license plate information. Experimental results show that the proposed dehazing model achieves good results, and its PSNR and SSIM evaluation indicators are better than those of other representative algorithms. After comparing the license plate recognition algorithm with the LPRNet algorithm, the average accuracy rate can reach 93.9%.
A Scheme of Reversible Data Hiding for the Encryption-Then-Compression System
Masaaki FUJIYOSHI Ruifeng LI Hitoshi KIYA

PAPER

Publicized: 2020/10/21
Vol: E104-D No:1
Page(s): 43-50
This paper proposes an encryption-then-compression (EtC) system-friendly data hiding scheme for images, where an EtC system compresses images after they are encrypted. The EtC system divides an image into non-overlapping blocks and applies four block-based processes independently and randomly to the image for visual encryption. The proposed scheme hides data in a plain, i.e., unencrypted, image and can extract the hidden data from the image encrypted by the EtC system. Furthermore, the scheme provides reversible data hiding: although it temporarily distorts the unmarked image to hide data in it, it can perfectly recover the unmarked image from the marked image. The proposed scheme copes with three of the four processes in the EtC system, namely, block permutation, rotation/flipping of blocks, and inverting brightness in blocks, whereas the conventional schemes for the system do not cope with the last one. In addition, these conventional schemes have to identify the encrypted image so that image-dependent side information can be used to extract embedded data and to restore the unmarked image, but the proposed scheme does not need such identification. Moreover, whereas the data hiding process must know the block size of encryption in conventional schemes, the proposed scheme needs no prior knowledge of the block size for encryption. Experimental results show the effectiveness of the proposed scheme.
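As a rough illustration of the block-based visual encryption that an EtC system applies, the sketch below implements only two of the processes named in the abstract (block permutation and brightness inversion in blocks) on a small grayscale image. It is not the authors' scheme: the function names `toy_encrypt` and `toy_decrypt`, the use of Python's `random.Random` as the keyed generator, and the assumption that the image dimensions are multiples of the block size are all illustrative choices.

```python
import random

def split_blocks(img, bs):
    """Split a 2-D list of 8-bit pixels into non-overlapping bs x bs blocks (row-major)."""
    h, w = len(img), len(img[0])
    return [[row[x:x + bs] for row in img[y:y + bs]]
            for y in range(0, h, bs) for x in range(0, w, bs)]

def merge_blocks(blocks, h, w, bs):
    """Reassemble row-major blocks into an h x w image."""
    img = [[0] * w for _ in range(h)]
    per_row = w // bs
    for i, blk in enumerate(blocks):
        by, bx = divmod(i, per_row)
        for dy in range(bs):
            img[by * bs + dy][bx * bs:bx * bs + bs] = blk[dy]
    return img

def _key_schedule(n_blocks, key):
    """Derive the block order and per-block inversion flags from the key (assumed design)."""
    rng = random.Random(key)
    order = list(range(n_blocks))
    rng.shuffle(order)
    flips = [rng.random() < 0.5 for _ in range(n_blocks)]
    return order, flips

def _invert(blk):
    """Negative-positive transform of an 8-bit block."""
    return [[255 - p for p in row] for row in blk]

def toy_encrypt(img, bs, key):
    h, w = len(img), len(img[0])
    blocks = split_blocks(img, bs)
    order, flips = _key_schedule(len(blocks), key)
    # Position dst of the encrypted image receives source block order[dst].
    out = [_invert(blocks[src]) if flips[dst] else blocks[src]
           for dst, src in enumerate(order)]
    return merge_blocks(out, h, w, bs)

def toy_decrypt(enc, bs, key):
    h, w = len(enc), len(enc[0])
    blocks = split_blocks(enc, bs)
    order, flips = _key_schedule(len(blocks), key)
    out = [None] * len(blocks)
    for dst, src in enumerate(order):
        # Undo the inversion, then move the block back to its source position.
        out[src] = _invert(blocks[dst]) if flips[dst] else blocks[dst]
    return merge_blocks(out, h, w, bs)
```

Calling `toy_decrypt(toy_encrypt(img, 2, key), 2, key)` with the same key returns the original image. Because each step operates on whole blocks, a block-based codec applied afterwards still sees intact blocks, which is the general motivation for this family of schemes.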
Mimicking Lombard Effect: An Analysis and Reconstruction
Thuan Van NGO Rieko KUBO Masato AKAGI

PAPER-Speech and Hearing

Publicized: 2020/02/13
Vol: E103-D No:5
Page(s): 1108-1117
Lombard speech is produced in noisy environments due to the Lombard effect and is intelligible in adverse environments. To adaptively control the intelligibility of transmitted speech for public announcement systems, in this study, we focus on perceptually mimicking Lombard speech under backgrounds with varying noise levels. Other approaches map corresponding neutral speech features to Lombard speech features, but as this can only be applied to one noise level at a time, it is unsuitable for varying noise levels because the characteristics of Lombard speech are varied according to noise level. Instead, we utilize a rule-based method that automatically generates rules and flexibly controls features with any change of noise level. Specifically, we conduct a feature tendency analysis and propose a continuous rule generation model to estimate the effect of varying noise levels on features. The proposed techniques, which are based on a coarticulation model, MRTD, and spectral-GMM, can easily modify neutral speech features by following the generated rules. Voices having these features are then synthesized by STRAIGHT to obtain Lombard speech fitting to noises with varying levels. To validate our proposed method, the quality of mimicking speech is evaluated in subjective listening experiments on similarity, intelligibility, and naturalness. In varying noise levels, the results show equal similarity with Lombard speech between the proposed method and a state-of-the-art method. Intelligibility and naturalness are comparable with some feature modifications.
A Rate Perceptual-Distortion Optimized Video Coding HEVC
Bumshik LEE Jae Young CHOI

PAPER-Image Processing and Video Processing

Publicized: 2018/08/24
Vol: E101-D No:12
Page(s): 3158-3169
In this paper, a perceptual distortion based rate-distortion optimized video coding scheme for High Efficiency Video Coding (HEVC) is proposed. The Structural Similarity Index (SSIM) in the transform domain, which is known as a distortion metric that better reflects human perception, is derived for the perceptual distortion model to be applied to the hierarchical coding block structure of HEVC. An SSIM-quantization model is proposed using the properties of the DCT and the high-resolution quantization assumption. The SSIM model is obtained as the sum of SSIM in each Coding Unit (CU) depth of HEVC, which precisely predicts SSIM values for the hierarchical quadtree structure of CUs in HEVC. The rate model is derived from the entropy, based on Laplacian distributions of transform residual coefficients, and is jointly combined with the SSIM-based distortion model for rate-distortion optimization in an HEVC video codec, and it can be compliantly applied to HEVC. The experimental results demonstrate that the proposed method achieves 8.1% and 4.0% average bit rate reductions in rate-SSIM performance for the low-delay and random access configurations, respectively, outperforming other existing methods. The proposed method provides better visual quality than the conventional mean square error (MSE)-based RDO coding scheme.
Cube-Based Encryption-then-Compression System for Video Sequences
Kosuke SHIMIZU Taizo SUZUKI Keisuke KAMEYAMA

PAPER-Image

Vol: E101-A No:11
Page(s): 1815-1822
We propose cube-based perceptual encryption (C-PE), which consists of cube scrambling, cube rotation, cube negative/positive transformation, and cube color component shuffling, and describe its application to the encryption-then-compression (ETC) system of Motion JPEG (MJPEG). In particular, cube rotation replaces the blocks in the original frames with ones in not only the other frames but also the depth-wise cube sides (spatiotemporal sides), unlike conventional block-based perceptual encryption (B-PE). Since it makes intra-block observation more difficult and prevents unauthorized decryption from only a single frame, it is more robust than B-PE against attack methods without any decryption key. However, because the encrypted frames including the blocks from the spatiotemporal sides affect the MJPEG compression performance slightly, we also devise a version of C-PE with no spatiotemporal sides (NSS-C-PE) that hardly affects compression performance. C-PE makes the encrypted video sequence robust against the single-frame-based algorithmic brute force (ABF) attack with only 21 cubes. The experimental results show the compression efficiency and encryption robustness of the C-PE/NSS-C-PE-based ETC system. The C-PE-based ETC system shows mixed results depending on the video, whereas the NSS-C-PE-based ETC system shows that the BD-PSNR can be suppressed to about -0.03dB regardless of the video.
Research on Analytical Solution Tensor Voting
Hongbin LIN Zheng WU Dong LEI Wei WANG Xiuping PENG

LETTER-Pattern Recognition

Publicized: 2017/12/01
Vol: E101-D No:3
Page(s): 817-820
This letter presents a novel tensor voting mechanism, analytic tensor voting (ATV), to overcome the difficulties of original tensor voting, especially its efficiency. One of its main advantages is its explicit voting formulations, which benefit both the completion of tensor voting theory and computational efficiency. Firstly, a new decaying function is designed following the basic spirit of the decaying function in original tensor voting (OTV). Secondly, analytic stick tensor voting (ASTV) is formulated using the new decaying function. Thirdly, analytic plate and ball tensor voting (APTV, ABTV) are formulated through controllable stick tensor construction and tensorial integration. As a result, each tensor vote can be computed by several non-iterative matrix operations, which improves the efficiency of tensor voting remarkably. Experimental results validate the effectiveness of the proposed method.
Perceptual Distributed Compressive Video Sensing via Reweighted Sampling and Rate-Distortion Optimized Measurements Allocation
Jin XU Yan ZHANG Zhizhong FU Ning ZHOU

LETTER-Image Processing and Video Processing

Publicized: 2017/01/06
Vol: E100-D No:4
Page(s): 918-922
Distributed compressive video sensing (DCVS) is a new paradigm for low-complexity video compression. To achieve the highest possible perceptual coding performance under the measurements budget constraint, we propose a perceptually optimized DCVS codec by jointly exploiting the reweighted sampling and rate-distortion optimized measurements allocation technologies. A visual saliency modulated just-noticeable distortion (VS-JND) profile is first developed based on the side information (SI) at the decoder side. Then the estimated correlation noise (CN) between each non-key frame and its SI is suppressed by the VS-JND. Subsequently, the suppressed CN is utilized to determine the weighting matrix for the reweighted sampling as well as to design a perceptual rate-distortion optimization model to calculate the optimal measurements allocation for each non-key frame. Experimental results indicate that the proposed DCVS codec outperforms the other existing DCVS codecs in terms of both objective and subjective performance.
An Improved Supervised Speech Separation Method Based on Perceptual Weighted Deep Recurrent Neural Networks
Wei HAN Xiongwei ZHANG Meng SUN Li LI Wenhua SHI

LETTER-Speech and Hearing

Vol: E100-A No:2
Page(s): 718-721
In this letter, we propose a novel speech separation method based on a perceptually weighted deep recurrent neural network (DRNN) which incorporates the masking properties of the human auditory system. In the supervised training stage, we first utilize the clean label speech of two different speakers to calculate two perceptual weighting matrices. Then, the obtained perceptual weighting matrices are utilized to adjust the mean squared error between the network outputs and the reference features of the two clean speech signals so that the two different speech signals can mask each other. Experimental results on the TSP speech corpus demonstrate that the proposed speech separation approach can achieve significant improvements over the state-of-the-art methods when tested with different mixing cases.
Joint Optimization of Perceptual Gain Function and Deep Neural Networks for Single-Channel Speech Enhancement
Wei HAN Xiongwei ZHANG Gang MIN Xingyu ZHOU Meng SUN

LETTER-Noise and Vibration

Vol: E100-A No:2
Page(s): 714-717
In this letter, we explore joint optimization of perceptual gain function and deep neural networks (DNNs) for a single-channel speech enhancement task. A DNN architecture is proposed which incorporates the masking properties of the human auditory system to make the residual noise inaudible. This new DNN architecture directly trains a perceptual gain function which is used to estimate the magnitude spectrum of clean speech from noisy speech features. Experimental results demonstrate that the proposed speech enhancement approach can achieve significant improvements over the baselines when tested with TIMIT sentences corrupted by various types of noise, no matter whether the noise conditions are included in the training set or not.
JND-Based Power Consumption Reduction for OLED Displays
Ji-Hoon CHOI Oh-Young LEE Myong-Young LEE Kyung-Jin KANG Jong-Ok KIM

PAPER-Image

Vol: E99-A No:9
Page(s): 1691-1699
With the appearance of large OLED panels, the OLED TV industry has experienced significant growth. However, this technology is still in the early stages of commercialization, and some technical challenges remain to be overcome. During the development phase of a product, power consumption is one of the most important considerations. To reduce power consumption in OLED displays, we propose a method based on just-noticeable difference (JND). JND refers to the minimum visibility threshold when visual content is altered and results from physiological and psychophysical phenomena in the human visual system (HVS). A JND model suitable for OLED displays is derived from numerous experiments with OLED displays. With the use of JND, it is possible to reduce power consumption while minimizing perceptual image quality degradation.
Adaptive Perceptual Block Compressive Sensing for Image Compression
Jin XU Yuansong QIAO Zhizhong FU

LETTER-Image Processing and Video Processing

Publicized: 2016/03/09
Vol: E99-D No:6
Page(s): 1702-1706
Because the perceptual compressive sensing framework can achieve a much better performance than the legacy compressive sensing framework, it is very promising for the compressive sensing based image compression system. In this paper, we propose an innovative adaptive perceptual block compressive sensing scheme. Firstly, a new block-based statistical metric which can more appropriately measure each block's sparsity and perceptual sensibility is devised. Then, the approximated theoretical minimum measurement number for each block is derived from the new block-based metric and used as weight for adaptive measurements allocation. The obtained experimental results show that our scheme can significantly enhance both objective and subjective performance of a perceptual compressive sensing framework.
A Perceptually Motivated Approach for Speech Enhancement Based on Deep Neural Network
Wei HAN Xiongwei ZHANG Gang MIN Meng SUN

LETTER-Speech and Hearing

Vol: E99-A No:4
Page(s): 835-838
In this letter, a novel perceptually motivated single channel speech enhancement approach based on Deep Neural Network (DNN) is presented. Taking into account the good masking properties of the human auditory system, a new DNN architecture is proposed to reduce the perceptual effect of the residual noise. This new DNN architecture is directly trained to learn a gain function which is used to estimate the power spectrum of clean speech and shape the spectrum of the residual noise at the same time. Experimental results demonstrate that the proposed perceptually motivated speech enhancement approach could achieve better objective speech quality when tested with TIMIT sentences corrupted by various types of noise, no matter whether the noise conditions are included in the training set or not.
QP Selection Optimization for Intra-Frame Encoding Based on Constant Perceptual Quality
Chao WANG Xuanqin MOU Lei ZHANG

PAPER-Image Processing and Video Processing

Publicized: 2015/11/17
Vol: E99-D No:2
Page(s): 443-453
In lossy image/video encoding, there is a compromise between the number of bits and the extent of distortion. Optimizing the allocation of bits to different sources, such as frames or blocks, can improve the encoding performance. In intra-frame encoding, due to the dependency among macro blocks (MBs) introduced by intra prediction, the optimization of bit allocation to the MBs usually has high complexity. So far, no practical optimal bit allocation methods for intra-frame encoding exist, and the commonly used method for intra-frame encoding is the fixed-QP method. We suggest that the QP selection inside an image or frame can be optimized by aiming at constant perceptual quality (CPQ). We propose an iteration-based bit allocation scheme for H.264/AVC intra-frame encoding, in which all the local areas in the frame (each defined as a group of MBs (GOMB) in this paper) are encoded to have approximately the same perceptual quality. The SSIM index is used to measure the perceptual quality of the GOMBs. The experimental results show that the encoding performance on intra-frames can be improved greatly by the proposed method compared with the fixed-QP method. Furthermore, we show that the optimization on the intra-frame can bring benefits to the whole sequence encoding, since a better reference frame can improve the encoding of the subsequent frames. The proposed method has acceptable encoding complexity for offline applications.
An Encryption-then-Compression System for JPEG/Motion JPEG Standard
Kenta KURIHARA Masanori KIKUCHI Shoko IMAIZUMI Sayaka SHIOTA Hitoshi KIYA

PAPER

Vol: E98-A No:11
Page(s): 2238-2245
In many multimedia applications, image encryption has to be conducted prior to image compression. This paper proposes a JPEG-friendly perceptual encryption method, which can be conducted prior to JPEG and Motion JPEG compression. The proposed encryption scheme provides approximately the same compression performance as that of JPEG compression without any encryption, where both grayscale images and color ones are considered. It is also shown that the proposed scheme, which consists of four block-based encryption steps, provides a reasonably high level of security. Most conventional perceptual encryption schemes have not been designed for international compression standards, but this paper focuses on applying the JPEG and Motion JPEG standards, which are among the most widely used image compression standards. In addition, this paper considers an efficient key management scheme, which makes the keys easy to manage when encryption uses multiple keys.
A Study on Consistency between MINAVE and MINMAX in SSIM Based Independent Perceptual Video Coding
Chao WANG Xuanqin MOU Lei ZHANG

LETTER-Image Processing and Video Processing

Publicized: 2015/04/13
Vol: E98-D No:7
Page(s): 1417-1421
In this letter, we study the R-D properties of independent sources based on MSE and SSIM, and compare the bit allocation performance under the MINAVE and MINMAX criteria in video encoding. The results show that, when SSIM is used, MINMAX yields average distortion similar to that of MINAVE, which illustrates the consistency between these two criteria in independent perceptual video coding. Furthermore, MINMAX results in lower quality fluctuation, which shows its advantage for perceptual video coding.
Similar Speaker Selection Technique Based on Distance Metric Learning Using Highly Correlated Acoustic Features with Perceptual Voice Quality Similarity
Yusuke IJIMA Hideyuki MIZUNO

PAPER-Speech and Hearing

Publicized: 2014/10/15
Vol: E98-D No:1
Page(s): 157-165
This paper analyzes the correlation between various acoustic features and perceptual voice quality similarity, and proposes a perceptually similar speaker selection technique based on distance metric learning. To analyze the relationship between acoustic features and voice quality similarity, we first conduct a large-scale subjective experiment using the voices of 62 female speakers and perceptual voice quality similarity scores between all pairs of speakers are acquired. Next, multiple linear regression analysis is carried out; it shows that four acoustic features are highly correlated to voice quality similarity. The proposed speaker selection technique first trains a transform matrix based on distance metric learning using the perceptual voice quality similarity acquired in the subjective experiment. Given an input speech, acoustic features of the input speech are transformed using the trained transform matrix, after which speaker selection is performed based on the Euclidean distance on the transformed acoustic feature space. We perform speaker selection experiments and evaluate the performance of the proposed technique by comparing it to speaker selection without feature space transformation. The results indicate that transformation based on distance metric learning reduces the error rate by 53.9%.
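The selection step described above (transform the acoustic features with a trained matrix, then pick the nearest speaker by Euclidean distance) can be sketched as follows. This is only a minimal illustration under stated assumptions: the matrix is taken as given rather than trained, and the names `transform` and `select_similar_speaker` as well as the dictionary layout of the candidates are hypothetical, not taken from the paper.

```python
import math

def transform(matrix, features):
    """Apply a linear transform (list of rows) to a feature vector."""
    return [sum(m * f for m, f in zip(row, features)) for row in matrix]

def select_similar_speaker(matrix, query, candidates):
    """Return the candidate whose transformed features are closest to the query.

    `candidates` maps a speaker name to its acoustic feature vector.
    Distance is the Euclidean distance in the transformed feature space.
    """
    q = transform(matrix, query)

    def dist(name):
        return math.dist(q, transform(matrix, candidates[name]))

    return min(candidates, key=dist)
```

With an identity matrix this reduces to selection in the original feature space, which corresponds to the baseline without feature space transformation mentioned in the abstract. A learned matrix changes which feature dimensions dominate the distance.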
Hand Gesture Recognition Based on Perceptual Shape Decomposition with a Kinect Camera
Chun WANG Zhongyuan LAI Hongyuan WANG

LETTER-Pattern Recognition

Vol: E96-D No:9
Page(s): 2147-2151
In this paper, we propose the Perceptual Shape Decomposition (PSD) to detect fingers for a Kinect-based hand gesture recognition system. The PSD is formulated as a discrete optimization problem by removing all negative minima with minimum cost. Experiments show that our PSD is perceptually relevant and robust against distortion and hand variations, and thus improves the recognition system performance.
Detection of Tongue Protrusion Gestures from Video
Luis Ricardo SAPAICO Hamid LAGA Masayuki NAKAJIMA

PAPER-Image Recognition, Computer Vision

Vol: E94-D No:8
Page(s): 1671-1682
We propose a system that, using video information, segments the mouth region from a face image and then detects the protrusion of the tongue from inside the oral cavity. Initially, under the assumption that the mouth is closed, we detect both mouth corners. We use a set of specifically oriented Gabor filters for enhancing horizontal features corresponding to the shadow existing between the upper and lower lips. After applying the Hough line detector, the extremes of the line that was found are regarded as the mouth corners. The detection rate for mouth corner localization is 85.33%. These points are then input to a mouth appearance model which fits a mouth contour to the image. By segmenting its bounding box we obtain a mouth template. Next, considering the symmetric nature of the mouth, we divide the template into right and left halves. Thus, our system makes use of three templates. We track the mouth in the following frames using normalized correlation for mouth template matching. Changes happening in the mouth region are directly described by the correlation value, i.e., the appearance of the tongue on the surface of the mouth will cause a decrease in the correlation coefficient through time. These coefficients are used for detecting the tongue protrusion. The right and left tongue protrusion positions are detected by analyzing similarity changes between the right and left half-mouth templates and the currently tracked ones. Detection rates under the default parameters of our system are 90.20% for the tongue protrusion regardless of the position, and 84.78% for the right and left tongue protrusion positions. Our results demonstrate the feasibility of real-time tongue protrusion detection in vision-based systems and motivate further investigation of the usage of this new modality in human-computer communication.
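The idea that a drop in the correlation coefficient signals a change in the mouth region can be illustrated with a small sketch. It assumes the zero-mean form of normalized correlation and a fixed threshold; the abstract specifies neither, and the names `normalized_correlation`, `frames_with_change`, and the default threshold of 0.8 are hypothetical.

```python
import math

def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-sized grayscale patches.

    Returns a value in [-1, 1]; 0.0 is returned if either patch is flat.
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def frames_with_change(template, patches, threshold=0.8):
    """Indices of tracked patches whose correlation with the template falls below the threshold."""
    return [i for i, patch in enumerate(patches)
            if normalized_correlation(template, patch) < threshold]
```

Running the same comparison separately on the left and right halves of the template would follow the three-template arrangement described in the abstract, with the half whose correlation drops more indicating the side of the change.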
