Yo UMEKI Taichi YOSHIDA Masahiro IWAHASHI
In this paper, we propose a method of salient object detection based on distributed seeds and a co-propagation of seed information. Salient object detection is a technique which estimates important objects for human by calculating saliency values of pixels. Previous salient object detection methods often produce incorrect saliency values near salient objects in the case of images which have some objects, called the leakage of saliencies. Therefore, a method based on a co-propagation, the scale invariant feature transform, the high dimensional color transform, and machine learning is proposed to reduce the leakage. Firstly, the proposed method estimates regions clearly located in salient objects and the background, which are called as seeds and resultant seeds, are distributed over images. Next, the saliency information of seeds is simultaneously propagated, which is then referred as a co-propagation. The proposed method can reduce the leakage caused because of the above methods when the co-propagation of each information collide with each other near the boundary. Experiments show that the proposed method significantly outperforms the state-of-the-art methods in mean absolute error and F-measure, which perceptually reduces the leakage.
Jianbin ZHOU Dajiang ZHOU Takeshi YOSHIMURA Satoshi GOTO
Compressed Sensing based CMOS image sensor (CS-CIS) is a new generation of CMOS image sensor that significantly reduces the power consumption. For CS-CIS, the image quality and data volume of output are two important issues to concern. In this paper, we first proposed an algorithm to generate a series of deterministic and ternary matrices, which improves the image quality, reduces the data volume and are compatible with CS-CIS. Proposed matrices are derived from the approximate DCT and trimmed in 2D-zigzag order, thus preserving the energy compaction property as DCT does. Moreover, we proposed matrix row operations adaptive to the proposed matrix to further compress data (measurements) without any image quality loss. At last, a low-cost VLSI architecture of measurements compression with proposed matrix row operations is implemented. Experiment results show our proposed matrix significantly improve the coding efficiency by BD-PSNR increase of 4.2 dB, comparing with the random binary matrix used in the-state-of-art CS-CIS. The proposed matrix row operations for measurement compression further increases the coding efficiency by 0.24 dB BD-PSNR (4.8% BD-rate reduction). The VLSI architecture is only 4.3 K gates in area and 0.3 mW in power consumption.
Kengo TSUDA Takanori FUJISAWA Masaaki IKEHARA
In this paper, we introduce a new method to remove random-valued impulse noise in an image. Random-valued impulse noise replaces the pixel value at a random position by a random value. Due to the randomness of the noisy pixel values, it is difficult to detect them by comparison with neighboring pixels, which is used in many conventional methods. Then we improve the recent noise detector which uses a non-local search of similar structure. Next we propose a new noise removal algorithm by sparse representation using DCT basis. Furthermore, the sparse representation can remove impulse noise by using the neighboring similar image patch. This method has much more superior noise removal performance than conventional methods at images. We confirm the effectiveness of the proposed method quantitatively and qualitatively.
Yibo FAN Leilei HUANG Zheng XIE Xiaoyang ZENG
In the newly finalized video coding standard, namely high efficiency video coding (HEVC), new notations like coding unit (CU), prediction unit (PU) and transformation unit (TU) are introduced to improve the coding performance. As a result, the reconstruction loop in intra encoding is heavily burdened to choose the best partitions or modes for them. In order to solve the bottleneck problems in cycle and hardware cost, this paper proposed a high-throughput and compact implementation for such a reconstruction loop. By “high-throughput”, it refers to that it has a fixed throughput of 32 pixel/cycle independent of the TU/PU size (except for 4×4 TUs). By “compact”, it refers to that it fully explores the reusability between discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) as well as that between quantization (Q) and de-quantization (IQ). Besides the contributions made in designing related hardware, this paper also provides a universal formula to analyze the cycle cost of the reconstruction loop and proposed a parallel-process scheme to further reduce the cycle cost. This design is verified on the Stratix IV FPGA. The basic structure achieved a maximum frequency of 150MHz and a hardware cost of 64K ALUTs, which could support the real time TU/PU partition decision for 4K×2K@20fps videos.
Yusuke MIYAGI Keita TAKAHASHI Toshiaki FUJII
Light field data, which is composed of multi-view images, have various 3D applications. However, the cost of acquiring many images from slightly different viewpoints sometimes makes the use of light fields impractical. Here, compressive sensing is a new way to obtain the entire light field data from only a few camera shots instead of taking all the images individually. In paticular, the coded aperture/mask technique enables us to capture light field data in a compressive way through a single camera. A pixel value recorded by such a camera is a sum of the light rays that pass though different positions on the coded aperture/mask. The target light field can be reconstructed from the recorded pixel values by using prior information on the light field signal. As prior information, the current state of the art uses a dictionary (light field atoms) learned from training datasets. Meanwhile, it was reported that general bases such as those of the discrete cosine transform (DCT) are not suitable for efficiently representing prior information. In this study, however, we demonstrate that a 4D-DCT basis works surprisingly well when it is combined with a weighting scheme that considers the amplitude differences between DCT coefficients. Simulations using 18 light field datasets show the superiority of the weighted 4D-DCT basis to the learned dictionary. Furthermore, we analyzed a disparity-dependent property of the reconstructed data that is unique to light fields.
Meng ZHANG Tinghuan CHEN Xuchao SHI Peng CAO
The development of image acquisition technology and display technology provide the base for popularization of high-resolution images. On the other hand, the available bandwidth is not always enough to data stream such high-resolution images. Down- and up-sampling, which decreases the data volume of images and increases back to high-resolution images, is a solution for the transmission of high-resolution images. In this paper, motivated by the observation that the high-frequency DCT components are sparse in the spatial domain, we propose a scheme combined with Discrete Cosine Transform (DCT) and Compressed Sensing (CS) to achieve arbitrary-ratio down-sampling. Our proposed scheme makes use of two properties: First, the energy of a image concentrates on the low-frequency DCT components. Second, the high-frequency DCT components are sparse in the spatial domain. The scheme is able to preserve the most information and avoid absolutely blindly estimating the high-frequency components. Experimental results show that the proposed down- and up-sampling scheme produces better performance compared with some state-of-the-art schemes in terms of peak signal to noise ratio (PSNR), structural similarity index measurement (SSIM) and processing time.
Yazhong ZHANG Jinjian WU Guangming SHI Xuemei XIE Yi NIU Chunxiao FAN
Reduced-reference (RR) image quality assessment (IQA) algorithm aims to automatically evaluate the distorted image quality with partial reference data. The goal of RR IQA metric is to achieve higher quality prediction accuracy using less reference information. In this paper, we introduce a new RR IQA metric by quantifying the difference of discrete cosine transform (DCT) entropy features between the reference and distorted images. Neurophysiological evidences indicate that the human visual system presents different sensitivities to different frequency bands. Moreover, distortions on different bands result in individual quality degradations. Therefore, we suggest to calculate the information degradation on each band separately for quality assessment. The information degradations are firstly measured by the entropy difference of reorganized DCT coefficients. Then, the entropy differences on all bands are pooled to obtain the quality score. Experimental results on LIVE, CSIQ, TID2008, Toyama and IVC databases show that the proposed method performs highly consistent with human perception with limited reference data (8 values).
Heming SUN Dajiang ZHOU Peilin LIU Satoshi GOTO
In this paper, we present an area-efficient 4/8/16/32-point inverse discrete cosine transform (IDCT) architecture for a HEVC decoder. Compared with previous work, this work reduces the hardware cost from two aspects. First, we reduce the logical costs of 1D IDCT by proposing a reordered parallel-in serial-out (RPISO) scheme. By using the RPISO scheme, we can reduce the required calculations for butterfly inputs in each cycle. Secondly, we reduce the area of transpose architecture by proposing a cyclic data mapping scheme that can achieve 100% I/O utilization of each SRAM. To design a fully pipelined 2D IDCT architecture, we propose a pipelining schedule for row and column transform. The results show that the normalized area by maximum throughput for the logical IDCT part can be reduced by 25%, and the memory area can be reduced by 62%. The maximum throughput reaches 1248 Mpixels/s, which can support real-time decoding of a 4K × 2K 60fps video sequence.
Keunseok CHO Sangbae JEONG Minsoo HAHN
This paper proposes a new algorithm to encode the spectral envelope for G.729.1 more accurately. It applies the normalized least-mean- square (NLMS) algorithm to each subband energy of the modified discrete cosine transform (MDCT) in the time-domain alias cancellation (TDAC) of G.729.1. By utilizing the estimation error of subband energies by means of NLMS, allocated bit reduction for spectral envelope coding is achieved. The saved bits are then reused to improve the spectral envelope estimation and thus enhance the sound quality. Experimental results confirm that the proposed algorithm improves the sound quality under both clean and packet loss conditions.
Sha SHEN Weiwei SHEN Yibo FAN Xiaoyang ZENG
This paper describes a unified VLSI architecture which can be applied to various types of transforms used in MPEG-2/4, H.264, VC-1, AVS and the emerging new video coding standard named HEVC (High Efficiency Video Coding). A novel design named configurable butterfly array (CBA) is also proposed to support both the forward transform and the inverse transform in this unified architecture. Hadamard transform or 4/8-point DCT/IDCT are used in traditional video coding standards while 16/32-point DCT/IDCT are newly introduced in HEVC. The proposed architecture can support all these transform types in a unified architecture. Two levels (architecture level and block level) of hardware sharing are adopted in this design. In the architecture level, the forward transform can share the hardware resource with the inverse transform. In the block level, the hardware for smaller size transform can be recursively reused by larger size transform. The multiplications of 4 or 8-point transform are implemented with Multiplierless MCM (Multiple Constant Multiplication). In order to reduce the hardware overhead, the multiplications of 16/32 point DCT are implemented with ICM (input-muxed constant multipliers) instead of MCM or regular multipliers. The proposed design is 51% more area efficient than previous work. To the author's knowledge, this is the first published work to support both forward and inverse 4/8/16/32-point integer transform for HEVC standard in a unified architecture.
Byeong-No KIM Chan-Ho HAN Kyu-Ik SOHNG
We propose a composite DCT basis line test signal to evaluate the video quality of a DTV encoder. The proposed composite test signal contains a frame index, a calibration square wave, and 7-field basis signals. The results show that the proposed method may be useful for an in-service video quality verifier, using an ordinary oscilloscope instead of special equipment.
This paper presents an M-channel (M=2n (n ∈ N)) integer discrete cosine transforms (IntDCTs) based on fast Hartley transform (FHT) for lossy-to-lossless image coding which has image quality scalability from lossy data to lossless data. Many IntDCTs with lifting structures have already been presented to achieve lossy-to-lossless image coding. Recently, an IntDCT based on direct-lifting of DCT/IDCT, which means direct use of DCT and inverse DCT (IDCT) to lifting blocks, has been proposed. Although the IntDCT shows more efficient coding performance than any conventional IntDCT, it entails many computational costs due to an extra information that is a key point to realize its direct-lifting structure. On the other hand, the almost conventional IntDCTs without an extra information cannot be easily expanded to a larger size than the standard size M=8, or the conventional IntDCT should be improved for efficient coding performance even if it realizes an arbitrary size. The proposed IntDCT does not need any extra information, can be applied to size M=2n for arbitrary n, and shows better coding performance than the conventional IntDCTs without any extra information by applying the direct-lifting to the pre- and post-processing block of DCT. Moreover, the proposed IntDCT is implemented with a half of the computational cost of the IntDCT based on direct-lifting of DCT/IDCT even though it shows the best coding performance.
Wyllian B. da SILVA Keiko V. O. FONSECA Alexandre de A. P. POHL
A simple and efficient reduced-reference video quality assessment method based on the activity-difference of DCT coefficients is proposed. The method provides better accuracy, monotonicity, and consistent predictions than the PSNR full-reference metric and comparable results with the full-reference SSIM. It also shows an improved performance to a similar VQ technique based on the calculation of the pixel luminance differences performed in the spatial-domain.
Xue ZHANG Anhong WANG Bing ZENG Lei LIU Zhuo LIU
Numerous examples in image processing have demonstrated that human visual perception can be exploited to improve processing performance. This paper presents another showcase in which some visual information is employed to guide adaptive block-wise compressive sensing (ABCS) for image data, i.e., a varying CS-sampling rate is applied on different blocks according to the visual contents in each block. To this end, we propose a visual analysis based on the discrete cosine transform (DCT) coefficients of each block reconstructed at the decoder side. The analysis result is sent back to the CS encoder, stage-by-stage via a feedback channel, so that we can decide which blocks should be further CS-sampled and what is the extra sampling rate. In this way, we can perform multiple passes of reconstruction to improve the quality progressively. Simulation results show that our scheme leads to a significant improvement over the existing ones with a fixed sampling rate.
A new peak-to-average power ratio (PAPR) reduction scheme for orthogonal frequency division multiplexing (OFDM) is proposed based on the discrete cosine transform (DCT) and discrete sine transform (DST) precodings. Since the DCT and DST precodings concentrate energy into a few components, the hybrid application of the precodings to the real and imaginary parts of mapping symbols can significantly reduce the PAPR. Simulations show that the proposed scheme achieves significantly low PAPR without degrading the bit error rate (BER) compared to existing precoding schemes.
System-on-display panel design methodologies are proposed with the purpose of integrating DCT and IDCT on display panels for image codec and peripheral systems so as to reduce the bus data rate, memory size and power consumption. Unified constant geometry algorithms and architectures including recursive additions are proposed for DCT and IDCT butterfly computation, recursive additions and interconnections between stages. These schemes facilitate VLSI implementation and improve fault tolerance, suitable for low-yield SOP processing technologies through duplicate use of a PE as all the butterfly and recursive addition stages are composed and interconnected in a regular fashion. Efficient redundancy replacement methodologies optimizing the computation speed and the amount of hardware in various application areas are also described with testability and reliability issues. Finally, a performance analysis of speed, hardware and interconnection complexity is described with the proposed work's advantages.
Naoya SAGARA Takayuki SUZUKI Kenji SUGIYAMA
The non-reference method is widely useful to estimation picture quality on the decoder side. In this paper, we discuss the estimation method for spatial blur that divides the frequency zones by the absolute value of 64 coefficients with an 8-by-8 DCT and compares them. It is recognized that absolute blur estimation is possible with the decoded picture only.
Yuki IKEGAKI Toshiaki MIYAZAKI Stanislav G. SEDUKHIN
Conventional array processors randomly access input/coefficient data stored in memory many times during three-dimensional discrete cosine transform (3D-DCT) calculations. This causes a calculation bottleneck. In this paper, a 3D array processor dedicated to 3D-DCT is proposed. The array processor drastically reduces data swapping or replacement during the calculation and thus improves performance. The time complexity of the proposed NNN array processor is O(N) for an N3-size input data cube, and that of the 3D-DCT sequential calculation is O(N4). A specific I/O architecture, throughput-improved architectures, and more scalable architecture are also discussed in terms of practical implementation. Experimental results of implementation on FPGA (field-programmable gate array) suggest that our architecture provides good performance for real-time 3D-DCT calculations.
Jae-seong LEE Young-cheol PARK Dae-hee YOUN Kyung-ok KANG
Although the AMR-WB+ coder provides excellent quality for speech signal, its coding model for music signals is not as optimal as the HE-AAC v2. The main causes of the poor quality of the AMR-WB+ TCX are the non-critical sampling and block artifacts. The new TCX windowing scheme proposed in this paper uses an MDCT with a 50% frame overlap, so that the problems of non-critical sampling and blocking artifacts are significantly mitigated. Due to long overlaps, the proposed scheme involves an additional codec delay. It is, however, moderate for audio services. The results of objective and subjective tests indicate that the proposed scheme achieves noticeable quality improvements for music signals over the previous TCX schemes.
Naoya SAGARA Yousuke KASHIMURA Kenji SUGIYAMA
DCT encoding of images leads to block artifact and mosquito noise degradations in the decoded pictures. We propose an estimation to determine the mosquito noise block and level; however, this technique lacks sufficient linearity. To improve its performance, we use the sub-divided block for edge effect suppression. The subsequent results are mostly linear with the quantization.