Keyword Search Result

[Keyword] MPEG(158hit)

1-20hit(158hit)

  • A Low-Latency 4K HEVC Multi-Channel Encoding System with Content-Aware Bitrate Control for Live Streaming

    Daisuke KOBAYASHI  Ken NAKAMURA  Masaki KITAHARA  Tatsuya OSAWA  Yuya OMORI  Takayuki ONISHI  Hiroe IWASAKI  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2022/09/30
      Vol:
    E106-D No:1
      Page(s):
    46-57

    This paper describes a novel low-latency 4K 60 fps HEVC (high efficiency video coding)/H.265 multi-channel encoding system with content-aware bitrate control for live streaming. Adaptive bitrate (ABR) streaming techniques, such as MPEG-DASH (dynamic adaptive streaming over HTTP) and HLS (HTTP live streaming), are widely used for Internet video streaming. Live content has increased with the expansion of streaming services, which has led to demands for traffic reduction and low latency. To reduce network traffic, we propose content-aware dynamic and seamless bitrate control that supports multi-channel real-time encoding for ABR, including 4K 60 fps video. Our method also supports chunked packaging transfer to provide low-latency streaming. We adopt a hybrid architecture consisting of hardware and software processing. The system consists of multiple 4K HEVC encoder LSIs, each of which can efficiently encode one 4K 60 fps video or up to four high-definition (HD) videos with the proposed bitrate control method. The software handles the packaging process according to the various streaming protocols. Experimental results indicate that our method reduces encoding bitrates obtained with constant bitrate encoding by as much as 56.7%, and that the streaming latency over MPEG-DASH is 1.77 seconds.

  • QoE-Aware Stable Adaptive Video Streaming Using Proportional-Derivative Controller for MPEG-DASH Open Access

    Ryuta SAKAMOTO  Takahiro SHOBUDANI  Ryosuke HOTCHI  Ryogo KUBO  

     
    PAPER-Network

      Publicized:
    2020/09/24
      Vol:
    E104-B No:3
      Page(s):
    286-294

    In video distribution services such as video streaming, providers must satisfy the various quality demands of users. One of the human-centric indexes used to assess video quality is the quality of experience (QoE). In video streaming, the video bitrate, video freezing time, and video bitrate switching are significant determiners of QoE. To provide high-quality video streaming services, adaptive streaming using the Moving Picture Experts Group dynamic adaptive streaming over Hypertext Transfer Protocol (MPEG-DASH) is widely utilized. One of the conventional bitrate selection methods for MPEG-DASH selects the bitrate such that the amount of buffered data in the playback buffer, i.e., the playback buffer level, is maintained at a constant value. This method avoids buffer overflow and video freezing through feedback control; however, it induces high-frequency video bitrate switching, which can degrade QoE. To overcome this issue, this paper proposes a bitrate selection method in adaptive video streaming for MPEG-DASH that improves QoE by minimizing bitrate fluctuation. To this end, the proposed method does not change the bitrate unless the playback buffer level is near its upper or lower limit, corresponding to the full or empty state of the playback buffer, respectively. In particular, to avoid buffer overflow and video freezing, the proposed method selects the bitrate based on proportional-derivative (PD) control to maintain the playback buffer level at a target level, which corresponds to an upper or lower threshold of the playback buffer level. Simulations confirm that the proposed method offers better QoE than the conventional method for users with various preferences.
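    The dead-zone-plus-PD idea can be sketched as follows. This is a hedged sketch, not the paper's controller: the gains, thresholds, and bitrate ladder are made-up values, and the mapping from control signal to ladder step is a simplification of how a real client would act.

```python
# Toy PD-controlled bitrate selection for MPEG-DASH. While the playback
# buffer level (seconds) sits inside the dead zone [low, high], the
# bitrate is held to avoid switching; outside it, a PD control signal
# steers the selection toward the violated threshold.

def pd_bitrate(buf, prev_buf, current, ladder,
               low=5.0, high=25.0, kp=0.8, kd=0.4, dt=1.0):
    """Return the next bitrate (kbit/s) from the available ladder."""
    if low < buf < high:
        return current                      # dead zone: no switching
    target = low if buf <= low else high    # violated threshold
    error = buf - target
    derivative = (buf - prev_buf) / dt
    u = kp * error + kd * derivative        # PD control signal
    idx = ladder.index(current)
    if u < 0 and idx > 0:                   # buffer draining: step down
        idx -= 1
    elif u > 0 and idx < len(ladder) - 1:   # buffer filling: step up
        idx += 1
    return ladder[idx]

ladder = [1000, 2500, 5000, 8000]  # available representations, kbit/s
```

    The dead zone is what suppresses high-frequency switching; the PD term only acts near overflow or underflow, which matches the paper's stated design goal.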

  • Efficient Patch Merging for Atlas Construction in 3DoF+ Video Coding

    Hyun-Ho KIM  Sung-Gyun LIM  Gwangsoon LEE  Jun Young JEONG  Jae-Gon KIM  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2020/12/14
      Vol:
    E104-D No:3
      Page(s):
    477-480

    The emerging three degrees of freedom plus (3DoF+) video provides a more interactive and deeply immersive visual experience. 3DoF+ video adds motion parallax to 360-degree video, providing an omnidirectional view with limited changes of the view position. A large set of views is required to support such a 3DoF+ visual experience, so it is essential to compress the tremendous amount of 3DoF+ video data. Recently, MPEG has been developing a standard for efficient coding of 3DoF+ video consisting of multiple videos, together with its test model, the Test Model for Immersive Video (TMIV). In the TMIV, the redundancy between the input source views is removed as much as possible by selecting one or several basic views and predicting the remaining views from them. Each unpredicted region is cropped to a bounding box called a patch, and a large number of patches are then packed into atlases together with the selected basic views. As a result, multiple source views are converted into one or more atlas sequences to be compressed. In this letter, we present an improved clustering method using patch merging in the atlas construction of the TMIV. The proposed method achieves significant BD-rate reduction in terms of various end-to-end evaluation metrics in the experiment, and was adopted in TMIV6.0.
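    The patch-merging step can be illustrated with a toy routine in the spirit of, but not identical to, the letter's clustering improvement: overlapping patch bounding boxes are greedily merged so fewer, larger patches go into the atlas. The rectangle representation and the greedy fixed-point loop are assumptions for illustration.

```python
# Merge overlapping patch bounding boxes before atlas packing.
# A patch is an axis-aligned box (x, y, w, h).

def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def union(a, b):
    """Smallest box covering both patches."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x, y = min(ax, bx), min(ay, by)
    return (x, y, max(ax + aw, bx + bw) - x, max(ay + ah, by + bh) - y)

def merge_patches(patches):
    """Greedily merge overlapping patches until no pair overlaps."""
    patches = list(patches)
    merged = True
    while merged:
        merged = False
        for i in range(len(patches)):
            for j in range(i + 1, len(patches)):
                if overlaps(patches[i], patches[j]):
                    patches[i] = union(patches[i], patches[j])
                    del patches[j]
                    merged = True
                    break
            if merged:
                break
    return patches
```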

  • An MMT-Based Hierarchical Transmission Module for 4K/120fps Temporally Scalable Video

    Yasuhiro MOCHIDA  Takayuki NAKACHI  Takahiro YAMAGUCHI  

     
    PAPER

      Publicized:
    2020/06/22
      Vol:
    E103-D No:10
      Page(s):
    2059-2066

    High frame rate (HFR) video is attracting strong interest since it is considered a next step toward providing ultra-high-definition video service. For instance, the Association of Radio Industries and Businesses (ARIB) standard, the latest broadcasting standard in Japan, defines a 120 fps broadcasting format. The standard stipulates temporally scalable coding and hierarchical transmission by MPEG Media Transport (MMT), in which the base layer and the enhancement layer are transmitted over different paths for flexible distribution. We have developed the first ever MMT transmitter/receiver module for 4K/120fps temporally scalable video. The module is equipped with a newly proposed encapsulation method that preserves the correct boundaries of temporally scalable bitstreams. It is also designed to be tolerant of severe network constraints, including packet loss, arrival timing offset, and delay jitter. We conducted a hierarchical transmission experiment for 4K/120fps temporally scalable video. The experiment demonstrated that the MMT module was successfully fabricated and capable of dealing with severe network constraints. Consequently, the module has excellent potential as a means to support HFR video distribution in various network situations.
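    The temporal-layering idea behind the two-path transmission can be shown with a minimal sketch (not the module's actual MMT encapsulation format): a 120 fps stream is split into a 60 fps base layer and an enhancement layer for separate paths, then re-interleaved at the receiver.

```python
# Split a 120 fps temporally scalable stream into two layers by picture
# order: even frames form the 60 fps base layer, odd frames form the
# enhancement layer. Frames are modeled as plain order numbers.

def split_layers(frames):
    """Base layer gets even picture orders, enhancement gets odd ones."""
    base = [f for f in frames if f % 2 == 0]
    enh = [f for f in frames if f % 2 == 1]
    return base, enh

def merge_layers(base, enh):
    """Re-interleave the two layers back into 120 fps display order."""
    return sorted(base + enh)
```

    A receiver that gets only the base layer still has a valid 60 fps sequence, which is what makes the hierarchical transmission robust to loss of the enhancement path.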

  • Methods for Adaptive Video Streaming and Picture Quality Assessment to Improve QoS/QoE Performances Open Access

    Kenji KANAI  Bo WEI  Zhengxue CHENG  Masaru TAKEUCHI  Jiro KATTO  

     
    INVITED PAPER

      Publicized:
    2019/01/22
      Vol:
    E102-B No:7
      Page(s):
    1240-1247

    This paper introduces recent trends in video streaming and four methods proposed by the authors. As current trends show, video traffic dominates the Internet, and new visual content such as UHD and 360-degree movies is being delivered. MPEG-DASH has become popular for adaptive video streaming, and machine learning techniques are being introduced in several parts of the streaming pipeline. Along with these research trends, the authors have explored four methods: route navigation, throughput prediction, image quality assessment, and perceptual video streaming. These methods contribute to improving QoS/QoE performance and to reducing power consumption and storage size.

  • Distributed Video Decoding on Hadoop

    Illo YOON  Saehanseul YI  Chanyoung OH  Hyeonjin JUNG  Youngmin YI  

     
    PAPER-Cluster Computing

      Publicized:
    2018/09/18
      Vol:
    E101-D No:12
      Page(s):
    2933-2941

    Video analytics is usually time-consuming, as it not only requires video decoding as a first step but also typically applies complex computer vision and machine learning algorithms to the decoded frames. To achieve high efficiency in video analytics with ever-increasing frame sizes, much research has been conducted on distributed video processing using Hadoop. However, most approaches have focused on processing multiple video files on multiple nodes. Such approaches require a large number of video files to achieve any speedup, and can easily result in load imbalance when video files are long, since each video file itself is processed sequentially. In contrast, we propose a distributed video decoding method, with an extended FFmpeg and VideoRecordReader, by which a single large video file can be processed in parallel across multiple nodes in Hadoop. The experimental results show that case studies of face detection and a SURF system achieve speedups of 40.6 times and 29.1 times, respectively, on a four-node cluster with 12 mappers per node, demonstrating good scalability.
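    The key enabling idea, splitting one file only at independently decodable points, can be sketched as follows. This is a hedged illustration, not the paper's RecordReader: in a real container the keyframe index would come from the file's metadata, and splits would be byte ranges rather than frame numbers.

```python
# Partition a single large video for parallel decoding by cutting only
# at keyframe (GOP) boundaries, so every split is independently
# decodable by one mapper.

def gop_splits(keyframes, total_frames, num_workers):
    """Assign contiguous runs of GOPs to workers, returning one
    (start_frame, end_frame) range per split."""
    bounds = list(keyframes) + [total_frames]
    gops = [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
    per = max(1, len(gops) // num_workers)  # GOPs per worker
    splits = []
    for w in range(0, len(gops), per):
        chunk = gops[w:w + per]
        splits.append((chunk[0][0], chunk[-1][1]))
    return splits
```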

  • A Synchronization and T-STD Model for 3D Video Distribution and Consumption over Hybrid Network

    Kugjin YUN  Won-sik CHEONG  Kyuheon KIM  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2015/07/13
      Vol:
    E98-D No:10
      Page(s):
    1884-1887

    Recently, standards organizations such as ATSC, DVB, and TTA have been working to design various immersive media broadcasting services such as hybrid network-based 3D video, UHD video, and multiple views. This letter focuses on providing a new synchronization and transport system target decoder (T-STD) model for 3D video distribution based on heterogeneous transmission protocols in a hybrid network environment, where a broadcasting network and a broadband (IP) network are combined. On the basis of the experimental results, the proposed technology has been proved to serve as a core element for synchronization and the T-STD model in hybrid network-based 3D broadcasting. It has also been found that it could serve as a base technique for various IP-associated hybrid broadcasting services.

  • Objective No-Reference Video Quality Assessment Method Based on Spatio-Temporal Pixel Analysis

    Wyllian B. da SILVA  Keiko V. O. FONSECA  Alexandre de A. P. POHL  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2015/04/03
      Vol:
    E98-D No:7
      Page(s):
    1325-1332

    Digital video signals are subject to several distortions due to compression, transmission over noisy channels, or video processing. Therefore, video quality evaluation has become a necessity for broadcasters and content providers interested in offering high video quality to their customers. Thus, an objective no-reference video quality assessment metric is proposed based on a sigmoid model using spatio-temporal features weighted by parameters obtained through the solution of a nonlinear least squares problem using the Levenberg-Marquardt algorithm. Experimental results show that, when applied to MPEG-2 streams, our method presents better linearity than full-reference metrics, and its performance is close to that achieved with full-reference metrics for H.264 streams.
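    The general shape of such a model can be sketched as a weighted feature sum pushed through a sigmoid. The weights here are placeholders: the paper fits them with Levenberg-Marquardt nonlinear least squares, while this sketch only shows the forward model and the squared-error objective such a solver would minimize.

```python
import math

# Sketch of a sigmoid no-reference quality model: spatio-temporal
# features are combined with learned weights and mapped to (0, 1).

def predict_quality(features, weights, a=1.0, b=0.0):
    """Weighted feature sum mapped through a sigmoid."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-(a * z + b)))

def sse(dataset, weights):
    """Sum of squared errors: the nonlinear least-squares objective a
    solver such as Levenberg-Marquardt would minimize over the weights.
    dataset is a list of (feature_vector, subjective_score) pairs."""
    return sum((predict_quality(x, weights) - y) ** 2 for x, y in dataset)
```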

  • Mastering Signal Processing in MPEG SAOC

    Kwangki KIM  Minsoo HAHN  Jinsul KIM  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:12
      Page(s):
    3053-3059

    MPEG spatial audio object coding (SAOC) is a new audio coding standard that efficiently represents various audio objects as a down-mix signal and spatial parameters. MPEG SAOC is backward compatible with existing playback systems through the down-mix signal. If a mastering signal is used instead of the down-mix signal to provide CD-like sound quality, an output signal decoded with the mastering signal may easily be degraded due to the difference between the down-mix and mastering signals. To successfully use the mastering signal in MPEG SAOC, the difference between the two signals should be eliminated. As a simple mastering signal processing scheme, we propose one using the mastering down-mix gain (MDG), which is similar to the arbitrary down-mix gain of MPEG Surround. We also propose an enhanced mastering signal processing scheme using an MDG bias in order to reduce quantization errors of the MDG. Experimental results show that the proposed schemes can improve the sound quality of the output signal decoded with the mastering signal. In particular, the enhanced method outperforms the simple method in terms of quantization errors and sound quality.
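    The bias idea can be illustrated with a toy quantizer. This is an assumption-laden sketch, not the standard's actual MDG syntax: the dB domain, the step size, and sending the bias unquantized once per frame are all illustrative choices, but they show why removing a common offset before quantization shrinks the residual error.

```python
import math

# Toy mastering down-mix gain (MDG) quantization. Per band, the gain
# between the mastering signal and the plain down-mix is quantized; a
# bias term absorbs the common offset so the per-band residual error
# shrinks.

def mdg_db(mastering, downmix):
    """Per-band gain of the mastering signal over the down-mix, in dB."""
    return [20 * math.log10(m / d) for m, d in zip(mastering, downmix)]

def quantize(vals, step):
    """Plain uniform quantization of the gains."""
    return [step * round(v / step) for v in vals]

def quantize_with_bias(vals, step):
    """Remove the mean (the 'bias') first, quantize only the residual,
    and add the bias back at the decoder."""
    bias = sum(vals) / len(vals)
    resid = quantize([v - bias for v in vals], step)
    return [bias + r for r in resid]
```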

  • Reduced-Reference Objective Quality Assessment Model of Coded Video Sequences Based on the MPEG-7 Descriptor

    Masaharu SATO  Yuukou HORITA  

     
    LETTER-Quality Metrics

      Vol:
    E95-A No:8
      Page(s):
    1259-1263

    Our research is focused on examining a video quality assessment model based on the MPEG-7 descriptor. Video quality is estimated by using several features based on the predicted frame quality, such as the average value, worst value, best value, and standard deviation, together with the predicted frame rate obtained from descriptor information. As a result, video quality can be assessed with high prediction accuracy: correlation coefficient = 0.94, standard deviation of error = 0.24, maximum error = 0.68, and outlier ratio = 0.23.

  • A Study of Stereoscopic Image Quality Assessment Model Corresponding to Disparate Quality of Left/Right Image for JPEG Coding

    Masaharu SATO  Yuukou HORITA  

     
    LETTER-Quality Metrics

      Vol:
    E95-A No:8
      Page(s):
    1264-1269

    Our research is focused on examining a stereoscopic quality assessment model for stereoscopic images with disparate quality in the left and right images for glasses-free stereo vision. In this paper, we examine an objective assessment model of 3-D images, considering the difference in image quality between the viewpoints generated by disparity-compensated coding. The overall stereoscopic image quality can be estimated using only the predicted 2-D image qualities of the left and right views, based on the MPEG-7 descriptor information, without using any disparity information. As a result, stereoscopic still image quality is assessed with high prediction accuracy: correlation coefficient = 0.98 and average error = 0.17.

  • Context-Adaptive Arithmetic Coding Scheme for Lossless Bit Rate Reduction of MPEG Surround in USAC

    Sungyong YOON  Hee-Suk PANG  Koeng-Mo SUNG  

     
    LETTER-Speech and Hearing

      Vol:
    E95-D No:7
      Page(s):
    2013-2016

    We propose a new coding scheme for lossless bit rate reduction of the MPEG Surround module in unified speech and audio coding (USAC). The proposed scheme is based on context-adaptive arithmetic coding for efficient bit stream composition of the spatial parameters. Experiments show that it achieves significant lossless bit reductions of 9.93% to 12.14% for the spatial parameters and 8.64% to 8.96% for the overall MPEG Surround bit streams compared to the original scheme. The proposed scheme, which is not currently included in USAC, can improve the coding efficiency of MPEG Surround in USAC, where the saved bits can be utilized by the other USAC modules.
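    Why context adaptation saves bits can be shown with a toy cost model. This is an illustration only, not the USAC bitstream syntax: symbols are costed at their ideal arithmetic-code length, -log2(p), using per-context adaptive counts with Laplace smoothing.

```python
import math

# Ideal arithmetic-coding cost of a symbol stream under an adaptive,
# per-context frequency model. Predictable symbols under a good context
# become cheap as their counts grow.

def adaptive_cost_bits(symbols, contexts, alphabet):
    counts = {}                     # (context, symbol) -> adaptive count
    cost = 0.0
    for sym, ctx in zip(symbols, contexts):
        total = sum(counts.get((ctx, a), 1) for a in alphabet)
        p = counts.get((ctx, sym), 1) / total
        cost += -math.log2(p)       # ideal code length for this symbol
        counts[(ctx, sym)] = counts.get((ctx, sym), 1) + 1
    return cost
```

    With a strictly alternating parameter sequence, conditioning on the previous symbol makes each symbol nearly deterministic, so the context-adaptive cost falls well below the context-free cost of the same stream.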

  • An 8×8/4×4 Adaptive Hadamard Transform Based FME VLSI Architecture for 4K×2K H.264/AVC Encoder

    Yibo FAN  Jialiang LIU  Dexue ZHANG  Xiaoyang ZENG  Xinhua CHEN  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    447-455

    Fidelity Range Extension (FRExt) (i.e., High Profile) was added to the H.264/AVC recommendation in its second version. One of the features included in FRExt is the Adaptive Block-size Transform (ABT). In order to conform to FRExt, a Fractional Motion Estimation (FME) architecture is proposed that supports the 8×8/4×4 adaptive Hadamard Transform (8×8/4×4 AHT). The 8×8/4×4 AHT circuit contributes to higher throughput and encoding performance. In order to increase the utilization of the SATD (Sum of Absolute Transformed Differences) Generator (SG) per unit time, the proposed architecture employs two 8-pel interpolators (IPs) to time-share one SG. These two IPs work in turn to provide available data continuously to the SG, which increases the data throughput and significantly reduces the cycles needed to process one macroblock. Furthermore, this architecture exploits the linearity of the Hadamard Transform to generate the quarter-pel SATD. This method helps to shorten the long datapath in the second step of the two-iteration FME algorithm. Finally, experimental results show that this architecture can be used in applications with different performance requirements by adjusting the supported modes and operating frequency. It supports real-time encoding of seven-mode 4K×2K@24fps or six-mode 4K×2K@30fps video sequences.

  • An Image Quality Assessment Model Based on the MPEG-7 Descriptor

    Masaharu SATO  Yuukou HORITA  

     
    PAPER-Evaluation

      Vol:
    E94-A No:2
      Page(s):
    509-518

    Our research is focused on examining an image quality assessment model based on the MPEG-7 descriptor and the no-reference model. The model retrieves a reference image using image search and evaluates its subjective score as a pseudo reduced-reference model. The MPEG-7 descriptor was originally designed for content retrieval, but we discovered that it can also be used for image quality assessment. We examined the performance of the proposed model, and the results revealed that it achieves a higher performance rating than SSIM.

  • MPEG-2/4 Low-Complexity Advanced Audio Coding Optimization and Implementation on DSP

    Bing-Fei WU  Hao-Yu HUANG  Yen-Lin CHEN  Hsin-Yuan PENG  Jia-Hsiung HUANG  

     
    PAPER-Speech and Hearing

      Vol:
    E93-D No:5
      Page(s):
    1225-1237

    This study presents several optimization approaches for the MPEG-2/4 Advanced Audio Coding (AAC) Low Complexity (LC) encoding and decoding processes. Considering the power consumption and the peripherals required for consumer electronics, this study adopts the TI OMAP5912 platform for portable devices. An important issue in implementing an AAC codec on embedded and mobile devices is reducing computational complexity and memory consumption. Due to power-saving concerns, most embedded and mobile systems can provide only very limited computational power and memory resources for the coding process. As a result, modifying and simplifying only one or two blocks is insufficient for optimizing the AAC encoder and enabling it to work well on embedded systems. It is therefore necessary to enhance the computational efficiency of the other important modules in the encoding algorithm. This study focuses on optimizing the Temporal Noise Shaping (TNS), Mid/Side (M/S) Stereo, Modified Discrete Cosine Transform (MDCT), and Inverse Quantization (IQ) modules in the encoder and decoder. Furthermore, we also propose an efficient memory reduction approach that provides a satisfactory balance between the reduction of memory usage and the expansion of the encoded files. In the proposed design, both the AAC encoder and decoder are built with fixed-point arithmetic operations and implemented on a DSP processor combined with an ARM core for peripheral control. Experimental results demonstrate that the proposed AAC codec is computationally effective, has low memory consumption, and is suitable for low-cost embedded and mobile applications.

  • Efficient FFT Algorithm for Psychoacoustic Model of the MPEG-4 AAC

    Jae-Seong LEE  Chang-Joon LEE  Young-Cheol PARK  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

      Vol:
    E92-D No:12
      Page(s):
    2535-2539

    This paper proposes an efficient FFT algorithm for the psychoacoustic model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients from MDCT and MDST coefficients through circular convolution, and computing the MDCT and MDST coefficients requires approximately half the complexity of the original FFT. We also design a new PAM based on the proposed FFT algorithm, which has 15% lower computational complexity than the original PAM without degradation of sound quality. Subjective as well as objective test results confirm the efficiency of the proposed FFT computation algorithm and the PAM.

  • Entropy Decoding Processor for Modern Multimedia Applications

    Sumek WISAYATAKSIN  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

      Vol:
    E92-A No:12
      Page(s):
    3248-3257

    An entropy decoding engine plays an important role in modern multimedia decoders. Previous research on decoding performance paid considerable attention to a single parameter, such as data parsing speed, but did not consider the performance impact of table configuration time and memory size. In this paper, we develop a novel entropy decoding method based on a two-step group matching scheme. Our approach achieves high performance in both data parsing speed and configuration time with a small memory requirement. We also deploy our decoding scheme in an entropy decoding processor, which performs operations based on normal processor instructions and VLD instructions for decoding variable-length codes. Several extended VLD instructions are provided to accelerate bitstream parsing in modern multimedia applications. This processor offers a solution with software flexibility and hardware speed for stand-alone entropy decoding engines. The VLSI hardware is designed with the Language for Instruction Set Architecture (LISA), with 23 Kgates and a 110 MHz maximum clock frequency under TSMC 0.18 µm technology. Experimental simulations reveal that the proposed processor achieves high performance and is suitable for many practical applications such as MPEG-2, MPEG-4, H.264/AVC, and AAC.
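    A two-step group matching decode can be sketched for a canonical variable-length code: step 1 finds the code-length group the next bits fall into, step 2 indexes within that group, avoiding a bit-by-bit tree walk. The code table below is a made-up canonical Huffman table, not one from an MPEG or AAC standard.

```python
# Canonical VLC table as length groups: (length, first_code, symbols).
# Codes: 'a' = 0, 'b' = 10, 'c' = 110, 'd' = 111.
GROUPS = [
    (1, 0b0,   ['a']),
    (2, 0b10,  ['b']),
    (3, 0b110, ['c', 'd']),
]

def decode_one(bits, pos):
    """Decode one symbol from a '0'/'1' string starting at pos."""
    for length, first, symbols in GROUPS:
        if pos + length > len(bits):
            continue
        code = int(bits[pos:pos + length], 2)
        if first <= code < first + len(symbols):   # step 1: group match
            return symbols[code - first], pos + length  # step 2: index
    raise ValueError("invalid code at position %d" % pos)

def decode(bits):
    out, pos = [], 0
    while pos < len(bits):
        sym, pos = decode_one(bits, pos)
        out.append(sym)
    return ''.join(out)
```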

  • Fast Mode Decision on the Enhancement Layer in H.264 Scalable Extension

    Tae-Kyoung KIM  Jeong-Hwan BOO  Sang Ju PARK  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E92-D No:12
      Page(s):
    2545-2547

    Scalable video coding (SVC) was standardized as an extension of H.264/AVC by the JVT (Joint Video Team) in November 2007. The biggest feature of SVC is multi-layered coding, in which two or more video sequences are compressed into a single bit-stream. This letter proposes a fast block mode decision algorithm for the spatial enhancement layer of SVC. The proposed algorithm achieves early decision by limiting the number of candidate modes for blocks with a certain characteristic, called same motion vector blocks (SMVB). Our proposed method reduces complexity, in terms of encoding time, by up to 66.17%, with negligible PSNR degradation of only up to 0.16 dB and a bit-rate increase of only up to 0.64%.

  • An Ultra-Low Bandwidth Design Method for MPEG-2 to H.264/AVC Transcoding

    Xianghui WEI  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E92-A No:4
      Page(s):
    1072-1079

    Motion estimation (ME) is a computation- and data-intensive module in video coding systems. Search window reuse methods play a critical role in bandwidth reduction by exploiting the data locality in video coding systems. In this paper, a search window reuse method (Level C+) is proposed for MPEG-2 to H.264/AVC transcoding. The proposed method is designed for ultra-low-bandwidth applications in which on-chip memory is not the main constraining factor. By loading the search window for a motion estimation unit (MEU) and applying motion vector clipping, each MB in the MEU can utilize both horizontal and vertical search window reuse. A very low bandwidth level (Rα<2) can be achieved with acceptable on-chip memory.
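    The motion vector clipping step can be shown in a few lines. This is a minimal sketch under assumptions: the search range value is illustrative, and a real design clips in hardware against the preloaded window, not in Python.

```python
# Clip an incoming MPEG-2 motion vector into the H.264 search range so
# that every macroblock in the MEU reads only the preloaded search
# window (enabling both horizontal and vertical window reuse).

def clip_mv(mv, search_range=16):
    """Clamp one (x, y) motion vector to [-search_range, +search_range]."""
    x, y = mv
    clamp = lambda v: max(-search_range, min(search_range, v))
    return (clamp(x), clamp(y))
```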

  • Chrominance Compensation for Multi-View Video Coding

    Min-Woo PARK  Jong-Tae PARK  Gwang-Hoon PARK  Doug-Young SUH  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E92-D No:2
      Page(s):
    353-356

    This letter introduces a cost-effective chrominance compensation scheme. The proposed method is applied to both the 'INTER 16×16' and 'SKIP' modes in anchor P-pictures only. In tests under the JVT common test conditions, simulation results show that the proposed method obtains average BD-PSNR gains for U and V of 0.14 dB and 0.13 dB, respectively, while maintaining almost the same BD-PSNR for Y. For the low bit-rate range, the average BD-PSNR gains for Y, U, and V are 0.14 dB, 0.49 dB, and 0.53 dB, respectively. The additional computational complexity is marginal because the number of anchor P-pictures is very small compared with the whole coded video sequence. Nevertheless, the proposed method can significantly improve the coding efficiency of the color components.
