In this letter, we propose a Hadoop-based Video Transcoding System (HVTS), which is designed to run on all major cloud computing services. HVTS is highly adapted to the structure and policies of Hadoop, thus it has additional capacities for transcoding, task distribution, load balancing, and content replication and distribution. To evaluate, our proposed system, we carry out two performance tests on our local testbed, transcoding and robustness to data node and task failures. The results confirmed that our system delivers satisfactory performance in facilitating seamless streaming services in cloud computing environments.
With the proliferation of hand-held devices in recent years, mobile video streaming has become an extremely popular application. However, Internet video streaming to mobile devices faces several problems, such as unstable connections, long latency, high jitter, etc. We present a system, OptVid, which enhances the user's experiences of video streaming service on cellular networks. OptVid takes the user's profile and provides seamless adaptive bitrate streaming by leveraging the video transcoding solution. It provides very agile bitrate adaptation, especially in the mobile scenario where the wireless channel is not stable. We prototype video transcoding on a WiMAX testbed to bridge the gap between the wireless channel capacity and the video quality. Our evaluations reveal that OptVid provides better user experience than conventional schemes in terms of PSNR, video stalls, and buffering time. OptVid does not require any additional storage since it transcodes videos on-the-fly upon receiving requests and delivers them directly to the client.
Soongi HONG Yoonsik CHOE Yong-Goo KIM
In transcoding, it is well known that refinement of the motion vectors is critical to enhance the quality of transcoded video while significantly reducing transcoding complexity. This paper proposes a novel cost model to estimate the rate-distortion cost of motion vector composition in order to develop a reliable motion vector re-estimation method that has reasonable computation cost. Based on a statistical analysis of motion compensated prediction errors, we design a basic form of the proposed cost model as a function of distance from the optimal motion vector. Simulations with a transcoder employing the proposed cost model demonstrate a significant quality gain over representative video transcoding schemes with no complexity increase.
Lei SUN Zhenyu LIU Takeshi IKENAGA
Scalable Video Coding (SVC) is an extension of H.264/AVC, aiming to provide the ability to adapt to heterogeneous networks or requirements. It offers great flexibility for bitstream adaptation in multi-point applications such as videoconferencing. However, transcoding between SVC and AVC is necessary due to the existence of legacy AVC-based systems. The straightforward re-encoding method requires great computational cost, and delay-sensitive applications like videoconferencing require much faster transcoding scheme. This paper proposes a 3-stage fast SVC-to-AVC transcoder with medium-grain quality scalability (MGS) for videoconferencing applications. Hierarchical-P structured SVC bitstream is transcoded into IPPP structured AVC bitstream with multiple reference frames. In the first stage, mode decision is accelerated by proposed SVC-to-AVC mode mapping scheme. In the second stage, INTER motion estimation is accelerated by an optimized motion vector (MV) conjunction method to predict the MV with a reduced search range. In the last stage, hadamard-based all zero block (AZB) detection is utilized for early termination. Simulation results show that proposed transcoder achieves very similar coding efficiency to the optimal result, but with averagely 89.6% computational time saving.
Lei SUN Zhenyu LIU Takeshi IKENAGA
As an extension of H.264/AVC, Scalable Video Coding (SVC) provides the ability to adapt to heterogeneous networks and user-end requirements, which offers great scalability in multi-point applications such as videoconferencing. However, transcoding between SVC and AVC becomes necessary due to the existence of legacy AVC-based systems. The straightforward full re-encoding method requires great computational cost, and the fast SVC-to-AVC spatial transcoding techniques have not been thoroughly investigated yet. This paper proposes a low-complexity hybrid-domain SVC-to-AVC spatial transcoder with drift compensation, which provides even better coding efficiency than the full re-encoding method. The macroblocks (MBs) of input SVC bitstream are divided into two types, and each type is suitable for pixel- or transform-domain processing respectively. In the pixel-domain transcoding, a fast re-encoding method is proposed based on mode mapping and motion vector (MV) refinement. In the transform-domain transcoding, the quantized transform coefficients together with other motion data are reused directly to avoid re-quantization loss. The drift problem caused by proposed transcoder is solved by compensation techniques for I frame and P frame respectively. Simulation results show that proposed transcoder achieves averagely 96.4% time reduction compared with the full re-encoding method, and outperforms the reference methods in coding efficiency.
Lei SUN Zhenyu LIU Takeshi IKENAGA
Scalable Video Coding (SVC) is an extension of H.264/AVC, aiming to provide the ability to adapt to heterogeneous networks or requirements. It offers great flexibility for bitstream adaptation in multi-point applications such as videoconferencing. However, transcoding between SVC and AVC is necessary due to the existence of legacy AVC-based systems. The straightforward re-encoding method requires great computational cost, and delay-sensitive applications like videoconferencing require much faster transcoding scheme. This paper proposes an ultra-low-delay SVC-to-AVC MGS (Medium-Grain quality Scalability) transcoder for videoconferencing applications. Transcoding is performed in pure frequency domain with partial decoding/encoding in order to achieve significant speed-up. Three fast transcoding methods in frequency domain are proposed for macroblocks with different coding modes in non-KEY pictures. KEY pictures are transcoded by reusing the base layer motion data, and error propagation is constrained between KEY pictures. Simulation results show that proposed transcoder achieves averagely 38.5 times speed-up compared with the re-encoding method, while introducing merely 0.71 dB BDPSNR coding quality loss for videoconferencing sequences as compared with the re-encoding algorithm.
Bin SONG Haixiao LIU Hao QIN Jie QIN
A direct inter-mode selection algorithm for P-frames in fast homogeneous H.264/AVC bit-rate reduction transcoding is proposed in this paper. To achieve the direct inter-mode selection, we firstly develop a low-complexity distortion estimation method for fast transcoding, in which the distortion is directly calculated from the decoded residual together with the reference frames. We also present a linear estimation method to approximate the coding rate. With the estimated distortion and rate, the rate-distortion cost can be easily computed in the transcoder. In our algorithm, a method based on the normalized rate difference of P-frames (RP) is used to detect the high motion scene. To achieve fast transcoding, only for the P-frames with RP larger than a threshold, the rate-distortion optimized (RDO) mode decision is performed; meanwhile, the average cost of each inter-mode (ACM) is calculated. Then for the subsequent frames transcoding, the optimal coding mode can be directly selected using the estimated cost and the ACM threshold. Experiments show that the proposed method can significantly simplify the complex RDO mode decision, and achieve transcoding time reductions of up to 62% with small loss of rate-distortion performance.
Lei SUN Jie LENG Jia SU Yiqing HUANG Hiroomi MOTOHASHI Takeshi IKENAGA
Scalable Video Coding (SVC) was standardized as an extension of H.264/AVC with the intention to provide flexible adaptation to heterogeneous networks and different end-user requirements, which provides great scalability in multi-point applications such as video conferencing. However, due to the existence of H.264/AVC-based systems, transcoding between AVC and SVC becomes necessary. Most existing works focus on temporal transcoding, quality transcoding or SVC-to-AVC spatial transcoding while the straightforward re-encoding method requires high computational cost. This paper proposes a low-complexity AVC-to-SVC spatial transcoder based on coarse-level mode mapping for video conferencing scenes. First, to omit unnecessary motion estimations (ME) for layers with reduced resolution, an ME skipping scheme based on AVC mode distribution is proposed with an adaptive search range. Then a probability-profile based scheme is proposed for further mode skipping. After that 3 coarse-level mode-mapping methods are presented for fast mode decision and the adaptive usage of the 3 methods is discussed. Finally, motion vector (MV) refinement is introduced for further lower-layer time reduction. As for the top layer, direct encapsulation is proposed to preserve better quality and another scheme involving inter-layer predictions is also provided for bandwidth-crucial applications. Simulation results show that proposed transcoder achieves up to 92.6% time reduction without significant coding efficiency loss compared to re-encoding method.
Xianghui WEI Takeshi IKENAGA Satoshi GOTO
Motion estimation (ME) is a computation and data intensive module in video coding system. The search window reuse methods play a critical role in bandwidth reduction by exploiting the data locality in video coding system. In this paper, a search window reuse method (Level C+) is proposed for MPEG-2 to H.264/AVC transcoding. The proposed method is designed for ultra-low bandwidth application, while the on-chip memory is not a main constraining factor. By loading search window for the motion estimation unit (MEU) and applying motion vector clipping processing, each MB in MEU can utilize both horizontal and vertical search reuse. A very low bandwidth level (Rα<2) can be achieved with an acceptable on-chip memory.
I Gusti Bagus Baskara NUGRAHA Hiroyoshi MORITA
Delivering video streaming service over the Internet encounters some challenges. Two of them are heterogeneity of networks capacity and variability of video data rate. The capacity of network segments are constrained. Meanwhile, the rate of video data to be transmitted is highly variable in order to get near-constant images quality. Therefore, to send variable bit rate (VBR) video data over capacity-constrained network, its bit rate should be adjusted. In this paper a system to adjust the transmission bit rate of VBR MPEG video data called Transcoding-after-Smoothing (TaS), which is a combination of bit rate transcoding and bit rate smoothing algorithm, is proposed. The system smoothes out transmission rate of video data while at the same time also performs transcoding on some video frames when necessary in order to keep the transmission bit rate below a certain threshold value. Two kinds of TaS methods are proposed. One method does not have transcoding preference, while the other method uses frame type preference where an intra-coded frame is the last one that will be transcoded. These methods are implemented in our video server where a VBR video data is accessed by a client. Our experiment results show that the first TaS method yields significant reduction in the number of transcoded frames compared with the second TaS method and conventional frame-level transcoding. However, the second method performs better in minimizing the quality distortion.
Xiang-Hui WEI Shen LI Yang SONG Satoshi GOTO
Motion estimation (ME) is a computation-intensive module in video coding system. In MPEG-2 to H.264 transcoding, motion vector (MV) from MPEG-2 reused as search center in H.264 encoder is a simple but effective technique to simplify ME processing. However, directly applying MPEG-2 MV as search center will bring difficulties on application of data reuse method in hardware design, because the irregular overlapping of search windows between successive macro block (MB). In this paper, we propose a search window reuse scheme for transcoding, especially for HDTV application. By utilizing the similarity between neighboring MV, overlapping area of search windows can be regularized. Experiment results show that our method achieves average 93.1% search window reuse-rate in HDTV720p sequence with almost no video quality degradation. Compared to transcoding method without any data reuse scheme, bandwidth of the proposed method can be reduced to 40.6% of that.
A rate control algorithm for logo insertion which does not require full decoding and encoding in compressed video is proposed. A perceptual approach is adopted in order to reduce the distortion introduced by the rate control. The start position of rate control is randomly varied for each frame so that the perceptual distortion is evenly dispersed across the whole picture. The number of rate-controlled slices is changed instead of the quantization scale in order to maintain original bit rate. Simulations show that the original bit rate can be maintained by the rate control without noticeable distortion. The proposed rate control algorithm can be easily extended to other transcoding applications.
Min-Cheol HWANG Seung-Kyun KIM Sung-Jea KO
Existing methods for inverse motion compensation (IMC) in the DCT domain have not considered the unrestricted motion vector (UMV). In the existing methods, IMC is performed to deal with the UMV in the spatial domain after the inverse DCT (IDCT). We propose an IMC method which can deal with the UMV directly in the DCT domain without the use of the IDCT/DCT required by the existing methods. The computational complexity of the proposed method can be reduced by about half of that of the brute-force method operating in the spatial domain. Experimental results show that the proposed method can efficiently reduce the processing time with similar visual quality.
Shen LI Lingfeng LI Takeshi IKENAGA Shunichi ISHIWATA Masataka MATSUI Satoshi GOTO
The coexistence of MPEG-2 and its powerful successor H.264/AVC has created a huge need for MPEG-2/H.264 video transcoding. However, a traditional transcoder where an MPEG-2 decoder is simply cascaded to an H.264 encoder requires huge computational power due to the adoption of a complicated rate-distortion based mode decision process in H.264. This paper proposes a 2-D Sobel filter based motion vector domain method and a DCT domain method to measure macroblock complexity and realize content-based H.264 candidate mode decision. A new local edge based fast INTRA prediction mode decision method is also adopted to boost the encoding efficiency. Simulation results confirm that with the proposed methods the computational burden of a traditional transcoder can be reduced by 20%30% with only a negligible bit-rate increase for a wide range of video sequences.
Jae-Won KIM Goo-Rak KWON June-Sok LEE Nam-Hyeong KIM Sung-Jea KO
Video transcoding technique is an efficient mechanism to deliver visual contents to a variety of users who have different network conditions or terminal devices with different display capabilities. In this paper, we propose two types of transcoding methods for adapting the bitrate of streaming video to the bandwidth of the transmission channel; spatial resolution reduction (SRR) transcoding and temporal resolution reduction (TRR) transcoding. The two transcoding methods are alternatively operated according to the requirements of users. Experimental results show that the proposed transcoding methods can preserve image quality while transcoding to the low bitrate.
Yusuke HIWASAKI Hitoshi OHMURO Takeshi MORI Sachiko KURIHARA Akitoshi KATAOKA
This paper proposes a wideband speech coder in which a G.711 bitstream is embedded. This coder has an advantage over conventional coders in that it has a high interoperability with existing terminals so costly transcoding involving decoding and re-encoding can be avoided. We also propose a partial mixing method that effectively reduces the mixing complexity in multiple-point remote conferences. To reduce the complexity, we take advantage of the scalable structure of the bitstream and mix only the lower band of the signal. For the higher band, the main speaker location is selected among remote locations and is redistributed with the mixed lower-band signal. By subjective evaluations, we show that the speech quality can be maintained even when the speech signals are partially mixed.
When the frame size is downscaled for video transcoding, the new motion vector (MV) must be computed. This paper presents an algorithm to utilize the activity measurement by DC value and the number of non-zero quantized DCT coefficients in the residual macroblock to compose the motion vector. It can reduce the complexity for motion estimation and improve the performance of the spatial domain video transcoder.
Yasuo SAMBE Shintaro WATANABE Dong YU Taichi NAKAMURA Naoki WAKAMIYA
This paper describes a distributed video transcoding system that can simultaneously transcode an MPEG-2 video file into various video coding formats with different rates. The transcoder divides the MPEG-2 file into small segments along the time axis and transcodes them in parallel. Efficient video segment handling methods are proposed that minimize the inter-processor communication overhead and eliminate temporal discontinuities from the re-encoded video. We investigate how segment transcoding should be distributed to obtain the shortest total transcoding time. Experimental results show that implementing distributed transcoding on 10 PCs can decrease the total transcoding time by a factor of about 7 for single transcoding and by a factor of 9.5 for simultaneous three kinds of transcoding rates.
When video data are transmitted via the network, the quality of video data must be carefully chosen to be best under the condition that the transmission is not influenced by other internet services. They often use the simulcast type, which uses independent streams that are stored and transmitted for the quality, considering implementation, when they select the video quality. On the other hand, we had already proposed the scalable structure, which consists of base and enhancement data, but when they require the high quality video, these data are combined using the transcoding methods. In this paper, we propose the video contents delivery methods with scalable transcoding, in which users can update the quality of video data even after the transmission by base data and differential data. In order to reduce the total time of not only users' access time, but also watching time, we compare simulcast method with proposed methods in the total content utilization time using a video contents access model, and evaluate required transcoding time to reduce the waiting time of users.
As audio and video applications have proliferated on the Internet, transcoding proxy caching has been considered as an important technique for improving network performance, especially for mobile networks. Due to several new emerging factors in the transcoding proxy, existing methods for proxy placement for web caching cannot be simply applied to solve the problem of proxy placement for transcoding proxy caching. This paper addresses the problem of proxy placement for coordinated en-route transcoding proxy caching for tree networks. We propose a model for this problem by including the new emerging factors in the transcoding proxy and present optimal solutions for this problem with/without constraints on the number of transcoding proxies using dynamic programming. Finally, we implement our algorithm and evaluate our model on various performance metrics through extensive simulation experiments. The implementation results show that our model outperforms the existing model for transcoding proxy placement for linear topology, as well as the random proxy placement model. The average improvements of our model over the other models are about 7.2 percent and 21.4 percent in terms of all the performance metrics considered.