Daisuke KOBAYASHI Ken NAKAMURA Masaki KITAHARA Tatsuya OSAWA Yuya OMORI Takayuki ONISHI Hiroe IWASAKI
This paper describes a novel low-latency 4K 60 fps HEVC (high efficiency video coding)/H.265 multi-channel encoding system with content-aware bitrate control for live streaming. Adaptive bitrate (ABR) streaming techniques, such as MPEG-DASH (dynamic adaptive streaming over HTTP) and HLS (HTTP live streaming), are widely used for Internet video streaming. Live content has increased with the expansion of streaming services, leading to demands for traffic reduction and low latency. To reduce network traffic, we propose content-aware, dynamic, and seamless bitrate control that supports multi-channel real-time encoding for ABR, including 4K 60 fps video. Our method further supports chunked packaging and transfer to provide low-latency streaming. We adopt a hybrid architecture consisting of hardware and software processing. The system consists of multiple 4K HEVC encoder LSIs, each of which can efficiently encode one 4K 60 fps video or up to four high-definition (HD) videos with the proposed bitrate control method. The software handles packaging according to the various streaming protocols. Experimental results indicate that our method reduces the encoding bitrate obtained with constant bitrate encoding by as much as 56.7%, and that the streaming latency over MPEG-DASH is 1.77 seconds.
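The content-aware control the abstract describes can be pictured as scaling a per-chunk target bitrate by scene complexity while keeping it inside a valid ABR range. The following is a toy sketch of that idea only; the function name, the linear complexity-to-bitrate mapping, and all numbers are illustrative assumptions, not the authors' actual control law.

```python
# Toy sketch of content-aware bitrate selection for one ABR rung.
# The 0.4 + 0.6*complexity mapping is an illustrative assumption.

def select_bitrate(complexity: float, base_kbps: int,
                   min_kbps: int, max_kbps: int) -> int:
    """Scale the constant target bitrate by a scene-complexity score in
    [0, 1], clamping so the ladder stays valid for seamless switching."""
    target = int(base_kbps * (0.4 + 0.6 * complexity))
    return max(min_kbps, min(max_kbps, target))
```

A static scene (low complexity) then receives well below the CBR target, which is where the reported traffic reduction would come from, while a busy scene keeps the full rate.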
Ryuta SAKAMOTO Takahiro SHOBUDANI Ryosuke HOTCHI Ryogo KUBO
In video distribution services such as video streaming, providers must satisfy the various quality demands of users. One human-centric index used to assess video quality is the quality of experience (QoE). In video streaming, the video bitrate, video freezing time, and video bitrate switching are significant determiners of QoE. To provide high-quality video streaming services, adaptive streaming using the Moving Picture Experts Group dynamic adaptive streaming over Hypertext Transfer Protocol (MPEG-DASH) is widely utilized. One conventional bitrate selection method for MPEG-DASH selects the bitrate such that the amount of buffered data in the playback buffer, i.e., the playback buffer level, is maintained at a constant value. This method can avoid buffer overflow and video freezing based on feedback control; however, it induces high-frequency video bitrate switching, which can degrade QoE. To overcome this issue, this paper proposes a bitrate selection method for adaptive video streaming over MPEG-DASH that improves QoE by minimizing bitrate fluctuation. To this end, the proposed method does not change the bitrate while the playback buffer level is away from its upper and lower limits, which correspond to the full and empty states of the playback buffer, respectively. In particular, to avoid buffer overflow and video freezing, the proposed method selects the bitrate based on proportional-derivative (PD) control to maintain the playback buffer level at a target level, which corresponds to an upper or lower threshold of the playback buffer level. Simulations confirm that the proposed method offers better QoE than the conventional method for users with various preferences.
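A minimal sketch of the PD-controlled selection step described above: the buffer-level error and its derivative shape a bitrate budget, and the highest feasible ladder rate is chosen. The gains, base rate, and ladder are invented for illustration and are not the paper's tuned values.

```python
# Sketch of buffer-level PD control for bitrate selection.
# Gains, thresholds, and the bitrate ladder are illustrative assumptions.

LADDER = [1000, 2500, 5000, 8000]  # available bitrates [kbps]

def pd_select(level: float, prev_level: float, target: float,
              kp: float = 2000.0, kd: float = 1000.0) -> int:
    """Pick the highest ladder bitrate not exceeding a PD-shaped budget."""
    error = level - target        # buffer level error [s of media]
    derror = level - prev_level   # change over one control step
    budget = 2500 + kp * error + kd * derror
    feasible = [r for r in LADDER if r <= budget]
    return feasible[-1] if feasible else LADDER[0]
```

A buffer above target (and rising) permits a higher rate, while a draining buffer pulls the budget down toward the lowest rung, which is how overflow and freezing are both steered away from.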
Hyun-Ho KIM Sung-Gyun LIM Gwangsoon LEE Jun Young JEONG Jae-Gon KIM
The emerging three degrees of freedom plus (3DoF+) video provides a more interactive and deeply immersive visual experience. 3DoF+ video adds motion parallax to 360-degree video, providing an omnidirectional view with limited changes of the viewing position. A large set of views is required to support such a 3DoF+ visual experience; hence, it is essential to compress the tremendous amount of 3DoF+ video data. Recently, MPEG has been developing a standard for efficient coding of 3DoF+ video consisting of multiple videos, along with its test model, named the Test Model for Immersive Video (TMIV). In the TMIV, the redundancy between the input source views is removed as much as possible by selecting one or several basic views and predicting the remaining views from them. Each unpredicted region is cropped to a bounding box called a patch, and a large number of patches are then packed into atlases together with the selected basic views. As a result, multiple source views are converted into one or more atlas sequences to be compressed. In this letter, we present an improved clustering method using patch merging in the atlas construction of the TMIV. The proposed method achieves significant BD-rate reductions in terms of various end-to-end evaluation metrics and was adopted into TMIV 6.0.
Yasuhiro MOCHIDA Takayuki NAKACHI Takahiro YAMAGUCHI
High frame rate (HFR) video is attracting strong interest since it is considered a next step toward Ultra-High Definition video services. For instance, the Association of Radio Industries and Businesses (ARIB) standard, the latest broadcasting standard in Japan, defines a 120 fps broadcasting format. The standard stipulates temporally scalable coding and hierarchical transmission by MPEG Media Transport (MMT), in which the base layer and the enhancement layer are transmitted over different paths for flexible distribution. We have developed the first MMT transmitter/receiver module for 4K/120 fps temporally scalable video. The module is equipped with a newly proposed method for encapsulating temporally scalable bitstreams with correct boundaries. It is also designed to tolerate severe network constraints, including packet loss, arrival timing offset, and delay jitter. We conducted a hierarchical transmission experiment with 4K/120 fps temporally scalable video, which demonstrated that the MMT module was successfully fabricated and is capable of dealing with severe network constraints. Consequently, the module has excellent potential as a means of supporting HFR video distribution in various network situations.
Kenji KANAI Bo WEI Zhengxue CHENG Masaru TAKEUCHI Jiro KATTO
This paper introduces recent trends in video streaming and four methods proposed by the authors. As current trends show, video traffic dominates the Internet, and new visual content such as UHD and 360-degree movies is being delivered. MPEG-DASH has become popular for adaptive video streaming, and machine learning techniques are being introduced into several parts of the streaming pipeline. In line with these research trends, the authors have investigated four methods: route navigation, throughput prediction, image quality assessment, and perceptual video streaming. These methods contribute to improving QoS/QoE performance and to reducing power consumption and storage size.
Illo YOON Saehanseul YI Chanyoung OH Hyeonjin JUNG Youngmin YI
Video analytics is usually time-consuming, as it not only requires video decoding as a first step but also typically applies complex computer vision and machine learning algorithms to the decoded frames. To achieve high efficiency in video analytics with ever-increasing frame sizes, much research has been conducted on distributed video processing using Hadoop. However, most approaches have focused on processing multiple video files on multiple nodes. Such approaches require a large number of video files to achieve any speedup and can easily suffer from load imbalance when the video files are long, since each file is processed sequentially. In contrast, we propose a distributed video decoding method, built on an extended FFmpeg and a VideoRecordReader, by which a single large video file can be processed in parallel across multiple nodes in Hadoop. The experimental results show that case studies of a face detection system and a SURF system achieve speedups of 40.6× and 29.1×, respectively, on a four-node cluster with 12 mappers per node, demonstrating good scalability.
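The core enabler of processing one file in parallel is carving it into splits that each mapper can decode independently, which in practice means snapping split boundaries to keyframe (GOP) offsets. The sketch below illustrates only that boundary logic; the function name and the idea of byte-range splits are assumptions about how a VideoRecordReader-style extension could work, not the paper's implementation.

```python
# Sketch of carving one large video file into per-mapper byte ranges,
# snapping each boundary to a following keyframe so every mapper can
# decode its range independently. Offsets here are made up.

def make_splits(file_size: int, num_mappers: int, keyframes: list) -> list:
    """Return (start, end) byte ranges, one per mapper where possible."""
    raw = file_size // num_mappers
    starts = [0]
    for i in range(1, num_mappers):
        ideal = i * raw
        # Snap to the nearest keyframe at or after the ideal boundary.
        snapped = min((k for k in keyframes if k >= ideal), default=file_size)
        starts.append(snapped)
    ends = starts[1:] + [file_size]
    return [(s, e) for s, e in zip(starts, ends) if s < e]
```

Because every range begins at a keyframe, no mapper depends on frames held by another, which is what removes the sequential bottleneck of per-file processing.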
Kugjin YUN Won-sik CHEONG Kyuheon KIM
Recently, standards organizations such as ATSC, DVB, and TTA have been working to design various immersive media broadcasting services, including hybrid network-based 3D video, UHD video, and multiple views. This letter focuses on providing a new synchronization and transport system target decoder (T-STD) model for 3D video distribution based on heterogeneous transmission protocols in a hybrid network environment, where a broadcasting network and a broadband (IP) network are combined. The experimental results show that the proposed technology can successfully serve as a core element of synchronization and the T-STD model in hybrid network-based 3D broadcasting. It was also found that the technology could serve as a basis for various IP-associated hybrid broadcasting services.
Wyllian B. da SILVA Keiko V. O. FONSECA Alexandre de A. P. POHL
Digital video signals are subject to several distortions due to compression, transmission over noisy channels, or video processing. Video quality evaluation has therefore become a necessity for broadcasters and content providers interested in offering high video quality to their customers. Thus, an objective no-reference video quality assessment metric is proposed based on a sigmoid model using spatial-temporal features, weighted by parameters obtained by solving a nonlinear least squares problem with the Levenberg-Marquardt algorithm. Experimental results show that, when applied to MPEG-2 streams, our method presents better linearity than full-reference metrics, and that its performance is close to that of full-reference metrics for H.264 streams.
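The shape of such a sigmoid-model metric can be sketched as a weighted spatio-temporal feature pushed through a logistic curve onto a MOS-like scale. The weights and curve parameters below stand in for the values the paper obtains via Levenberg-Marquardt fitting; the numbers and the exact functional form are illustrative assumptions.

```python
# Sketch of a sigmoid-model no-reference quality score.
# Curve parameters a, b, c, d are placeholders for fitted values.
import math

def nr_quality(features: list, weights: list,
               a: float = 1.0, b: float = 4.0,
               c: float = 1.5, d: float = 0.0) -> float:
    """Map x = w.f through the logistic curve a + b / (1 + exp(-c(x-d))),
    giving a score bounded to the MOS-like range (a, a + b)."""
    x = sum(w * f for w, f in zip(weights, features))
    return a + b / (1.0 + math.exp(-c * (x - d)))
```

The bounded, monotone shape is what gives such metrics their linearity against subjective scores: extreme feature values saturate instead of running off scale.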
Kwangki KIM Minsoo HAHN Jinsul KIM
MPEG spatial audio object coding (SAOC) is a new audio coding standard that efficiently represents various audio objects as a down-mix signal and spatial parameters. MPEG SAOC is backward compatible with existing playback systems through the down-mix signal. If a mastering signal is used instead of the down-mix signal to provide CD-like sound quality, the output signal decoded from the mastering signal may be degraded by the difference between the down-mix and mastering signals. To successfully use the mastering signal in MPEG SAOC, this difference should be eliminated. As a simple approach, we propose mastering signal processing using a mastering down-mix gain (MDG), which is similar to the arbitrary down-mix gain of MPEG Surround. We also propose an enhanced mastering signal processing scheme using an MDG bias to reduce the quantization errors of the MDG. Experimental results show that the proposed schemes improve the sound quality of the output signal decoded from the mastering signal. In particular, the enhanced method outperforms the simple method in terms of both quantization error and sound quality.
Our research examines a video quality assessment model based on the MPEG-7 descriptor. Video quality is estimated using several features derived from the predicted frame quality, such as the average, worst, and best values and the standard deviation, together with the predicted frame rate obtained from the descriptor information. As a result, video quality can be assessed with high prediction accuracy: correlation coefficient = 0.94, standard deviation of error = 0.24, maximum error = 0.68, and outlier ratio = 0.23.
Our research examines a quality assessment model for stereoscopic images with disparate quality in the left and right images, targeting glasses-free stereo vision. In this paper, we examine an objective assessment model for 3-D images that considers the difference in image quality between the viewpoints generated by disparity-compensated coding. The overall stereoscopic image quality can be estimated using only the predicted left and right 2-D image qualities, based on the MPEG-7 descriptor information and without any disparity information. As a result, stereoscopic still image quality is assessed with high prediction accuracy: correlation coefficient = 0.98 and average error = 0.17.
Sungyong YOON Hee-Suk PANG Koeng-Mo SUNG
We propose a new coding scheme for lossless bitrate reduction of the MPEG Surround module in unified speech and audio coding (USAC). The proposed scheme is based on context-adaptive arithmetic coding for efficient bitstream composition of the spatial parameters. Experiments show that it achieves significant lossless bit reductions of 9.93% to 12.14% for the spatial parameters and 8.64% to 8.96% for the overall MPEG Surround bitstreams compared with the original scheme. The proposed scheme, which is not currently included in USAC, can be used to improve the coding efficiency of MPEG Surround in USAC, where the saved bits can be utilized by the other USAC modules.
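Why context adaptation yields lossless gains can be seen by estimating the ideal arithmetic-code length of a symbol stream with and without conditioning on the previous symbol: correlated data becomes much cheaper under a context model. This is a generic illustration with a 1-bit context and Laplace-smoothed counts, not the USAC/MPEG Surround context tables.

```python
# Estimate the ideal arithmetic-code length (sum of -log2 p) of a bit
# string, with and without a 1-bit context. Generic illustration only.
import math

def code_length(bits: str, use_context: bool) -> float:
    """Adaptive (Laplace-smoothed) model; returns bits of ideal code."""
    counts = {}
    total = 0.0
    prev = "0"
    for b in bits:
        ctx = prev if use_context else ""
        c0 = counts.get((ctx, "0"), 1)
        c1 = counts.get((ctx, "1"), 1)
        p = (c0 if b == "0" else c1) / (c0 + c1)
        total += -math.log2(p)                       # ideal code cost
        counts[(ctx, b)] = counts.get((ctx, b), 1) + 1  # adapt the model
        prev = b
    return total
```

On a strongly alternating string the contextual model quickly learns p(1|0) and p(0|1) are near 1, driving the per-symbol cost toward zero, while the context-free model stays near one bit per symbol.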
Yibo FAN Jialiang LIU Dexue ZHANG Xiaoyang ZENG Xinhua CHEN
Fidelity Range Extension (FRExt) (i.e., High Profile) was added to the H.264/AVC recommendation in its second version. One of the features included in FRExt is the Adaptive Block-size Transform (ABT). To conform to FRExt, a Fractional Motion Estimation (FME) architecture is proposed that supports the 8×8/4×4 adaptive Hadamard Transform (8×8/4×4 AHT). The 8×8/4×4 AHT circuit contributes to higher throughput and encoding performance. To increase the utilization of the SATD (Sum of Absolute Transformed Differences) Generator (SG) per unit time, the proposed architecture employs two 8-pel interpolators (IPs) that time-share one SG. The two IPs work in turn to feed data continuously to the SG, which increases the data throughput and significantly reduces the number of cycles needed to process one macroblock. Furthermore, the architecture exploits the linearity of the Hadamard Transform to generate the quarter-pel SATD. This method shortens the long datapath in the second step of the two-iteration FME algorithm. Finally, experimental results show that this architecture can serve applications with different performance requirements by adjusting the supported modes and operating frequency. It supports real-time encoding of seven-mode 4K×2K@24fps or six-mode 4K×2K@30fps video sequences.
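The SATD cost that the SG hardware computes has a compact textbook definition: transform the residual block with the Hadamard matrix on both sides and sum the absolute coefficients. The sketch below shows the 4×4 case as reference arithmetic; it illustrates the quantity being computed, not the paper's 8×8/4×4 time-shared circuit.

```python
# Reference arithmetic for the 4x4 SATD cost used in fractional ME:
# SATD = sum of |H . D . H^T| over all 16 transformed coefficients.

H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4x4(residual):
    """Hadamard-transform the residual block and sum absolute values."""
    ht = [list(r) for r in zip(*H4)]          # H transposed
    t = matmul(matmul(H4, residual), ht)
    return sum(abs(v) for row in t for v in row)
```

A constant residual block concentrates all its energy in the DC coefficient (16× the constant), which is why SATD approximates the post-transform coding cost better than a plain SAD.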
Our research examines an image quality assessment model based on the MPEG-7 descriptor and the no-reference model. The model retrieves a reference image using image search and evaluates its subjective score, acting as a pseudo reduced-reference model. The MPEG-7 descriptor was originally designed for content retrieval, but we found that it can also be used for image quality assessment. We examined the performance of the proposed model, and the results revealed that it outperforms SSIM.
Bing-Fei WU Hao-Yu HUANG Yen-Lin CHEN Hsin-Yuan PENG Jia-Hsiung HUANG
This study presents several optimization approaches for the MPEG-2/4 Advanced Audio Coding (AAC) Low Complexity (LC) encoding and decoding processes. Considering the power consumption and peripherals required for consumer electronics, this study adopts the TI OMAP5912 platform for portable devices. An important issue in implementing an AAC codec on embedded and mobile devices is reducing computational complexity and memory consumption. For power-saving reasons, most embedded and mobile systems can provide only very limited computational power and memory resources for the coding process. As a result, modifying and simplifying only one or two blocks is insufficient to optimize the AAC encoder for embedded systems; the computational efficiency of the other important modules in the encoding algorithm must also be enhanced. This study focuses on optimizing the Temporal Noise Shaping (TNS), Mid/Side (M/S) Stereo, Modified Discrete Cosine Transform (MDCT), and Inverse Quantization (IQ) modules in the encoder and decoder. Furthermore, we propose an efficient memory reduction approach that strikes a satisfactory balance between reducing memory usage and expanding the encoded files. In the proposed design, both the AAC encoder and decoder are built with fixed-point arithmetic and implemented on a DSP processor combined with an ARM core for peripheral control. Experimental results demonstrate that the proposed AAC codec is computationally efficient, has low memory consumption, and is suitable for low-cost embedded and mobile applications.
Jae-Seong LEE Chang-Joon LEE Young-Cheol PARK Dae-Hee YOUN
This paper proposes an efficient FFT algorithm for the psychoacoustic model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients from MDCT and MDST coefficients through circular convolution, at approximately half the complexity of computing the original FFT. We also design a new PAM based on the proposed FFT algorithm that has 15% lower computational complexity than the original PAM without degrading sound quality. Both subjective and objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.
Sumek WISAYATAKSIN Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA
An entropy decoding engine plays an important role in modern multimedia decoders. Previous research on decoding performance paid considerable attention to only one parameter, such as data parsing speed, without considering the table configuration time and memory size. In this paper, we develop a novel entropy decoding method based on a two-step group matching scheme. Our approach achieves high performance in both data parsing speed and configuration time while requiring little memory. We also use the proposed decoding scheme to implement an entropy decoding processor, which operates with normal processor instructions plus VLD instructions for decoding variable-length codes. Several extended VLD instructions are provided to accelerate bitstream parsing in modern multimedia applications. The processor thus combines software flexibility with the speed of a stand-alone hardware entropy decoding engine. The VLSI hardware is designed with the Language for Instruction Set Architecture (LISA), requiring 23 Kgates and reaching a 110 MHz maximum clock frequency in TSMC 0.18 µm technology. Experimental simulations reveal that the proposed processor achieves high performance and is suitable for many practical applications such as MPEG-2, MPEG-4, H.264/AVC, and AAC.
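A two-step group-matching decode can be illustrated with canonical variable-length codes: step one checks whether the accumulated leading bits fall into the code-length group for the current length, and step two indexes the symbol within that group. The toy codebook below is illustrative; the paper's actual tables and hardware scheme are not reproduced here.

```python
# Sketch of two-step VLD: (1) match the code-length group from leading
# bits, (2) index the symbol inside the group. Canonical toy codebook.

def build_tables(lengths):
    """lengths[i] = code length of symbol i (canonical assignment)."""
    first, symbols, code = {}, {}, 0
    for l in range(1, max(lengths) + 1):
        first[l] = code                      # first code value of group l
        symbols[l] = [s for s, sl in enumerate(lengths) if sl == l]
        code = (code + len(symbols[l])) << 1
    return first, symbols

def decode(bits, first, symbols):
    out, value, l = [], 0, 0
    for ch in bits:
        value = (value << 1) | int(ch)
        l += 1
        group = symbols.get(l, [])
        if group and 0 <= value - first[l] < len(group):  # step 1: group
            out.append(group[value - first[l]])           # step 2: index
            value, l = 0, 0
    return out
```

With lengths [1, 2, 3, 3] the canonical codes are 0, 10, 110, 111; the group test replaces a full per-codeword table, which is the memory saving the two-step idea buys.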
Tae-Kyoung KIM Jeong-Hwan BOO Sang Ju PARK
Scalable video coding (SVC) was standardized as an extension of H.264/AVC by the Joint Video Team (JVT) in November 2007. The biggest feature of SVC is multi-layered coding, in which two or more video sequences are compressed into a single bitstream. This letter proposes a fast block mode decision algorithm for the spatial enhancement layer of SVC. The proposed algorithm achieves an early decision by limiting the number of candidate modes for blocks with a certain characteristic, called same motion vector blocks (SMVBs). Our method reduces complexity, in terms of encoding time, by up to 66.17%, while degrading PSNR by only up to 0.16 dB and increasing the bitrate by only up to 0.64%.
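The gist of such an early decision can be sketched as follows: when the relevant neighboring blocks all share one motion vector, only a reduced candidate set is evaluated instead of the full mode list. The mode names, the neighbor set, and the reduction rule below are illustrative assumptions, not the letter's exact SMVB criterion.

```python
# Sketch of SMVB-style early mode decision: uniform neighbor motion
# restricts the candidate modes. Rule and mode names are illustrative.

FULL_MODES = ["SKIP", "16x16", "16x8", "8x16", "8x8", "INTRA"]

def candidate_modes(neighbor_mvs):
    """Return the mode list to evaluate for the current block."""
    if len(set(neighbor_mvs)) == 1:     # all neighbors share one MV
        return ["SKIP", "16x16"]        # early decision: large modes only
    return FULL_MODES
```

Skipping the small-partition and intra checks on such uniform-motion blocks is where the reported encoding-time reduction comes from, at a small rate-distortion cost.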
Xianghui WEI Takeshi IKENAGA Satoshi GOTO
Motion estimation (ME) is a computation- and data-intensive module in video coding systems. Search window reuse methods play a critical role in bandwidth reduction by exploiting the data locality of video coding. In this paper, a search window reuse method (Level C+) is proposed for MPEG-2 to H.264/AVC transcoding. The proposed method is designed for ultra-low-bandwidth applications in which on-chip memory is not the main constraint. By loading the search window for a motion estimation unit (MEU) and applying motion vector clipping, each MB in the MEU can exploit both horizontal and vertical search window reuse. A very low bandwidth level (Ra < 2) can be achieved with an acceptable amount of on-chip memory.
Min-Woo PARK Jong-Tae PARK Gwang-Hoon PARK Doug-Young SUH
This letter introduces a cost-effective chrominance compensation scheme. The proposed method is applied to both 'INTER 16×16' and 'SKIP' modes in anchor P-pictures only. In tests under the JVT common test conditions, simulation results show that the proposed method obtains average BD-PSNR gains for U and V of 0.14 dB and 0.13 dB, respectively, while maintaining almost the same BD-PSNR for Y. For the low bitrate range, the average BD-PSNR gains for Y, U, and V are 0.14 dB, 0.49 dB, and 0.53 dB, respectively. The added computational complexity is marginal because the number of anchor P-pictures is very small compared with the whole coded video sequence. Nevertheless, the proposed method can significantly improve the coding efficiency of the color components.