1-9hit |
Tingting HU Ryuji FUCHIKAMI Takeshi IKENAGA
High frame rate and ultra-low delay vision system, which can finish reading and processing of 1000fps sequence within 1ms/frame, draws increasing attention in the field of robotics that requires immediate feedback from image process core. Meanwhile, tracking task plays an important role in many computer vision applications. Among various tracking algorithms, Lucas Kanade (LK)-based template tracking, which tracks targets with high accuracy over the sub-pixel level, is one of the keys for robotic applications, such as factory automation (FA). However, the substantial spatial iterative processing and complex computation in the LK algorithm, make it difficult to achieve a high frame rate and ultra-low delay tracking with limited resources. Aiming at an LK-based template tracking system that reads and processes 1000fps sequences within 1ms/frame with small resource costs, this paper proposes: 1) High temporal resolution-based temporal iterative tracking, which maps the spatial iterations into the temporal domain, efficiently reduces resource cost and delay caused by spatial iterative processing. 2) Label scanner-based multi-stream spatial processing, which maps the local spatial processing into the labeled input pixel stream and aggregates them with a label scanner, makes the local spatial processing in the LK algorithm possible be implemented with a small resource cost. Algorithm evaluation shows that the proposed temporal iterative tracking performs dynamic tracking, which tracks object with coarse accuracy when it's moving fast and achieves higher accuracy when it slows down. Hardware evaluation shows that the proposed label scanner-based multi-stream architecture makes the system implemented on FPGA (zcu102) with resource cost less than 20%, and the designed tracking system supports to read and process 1000fps sequence within 1ms/frame.
Yuya OMORI Ken NAKAMURA Takayuki ONISHI Daisuke KOBAYASHI Tatsuya OSAWA Hiroe IWASAKI
This paper describes a novel 4K 120fps (frames per second) real-time HEVC (High Efficiency Video Coding) encoder for high-frame-rate video encoding and transmission. Motion portrayal problems such as motion blur and jerkiness may occur in video scenes containing fast-moving objects or quick camera panning. A high-frame-rate solves such problems and provides a more immersive viewing experience that can express even the fast-moving scenes without discomfort. It can also be used in remote operation for scenes with high motion, such as VAR (Video Assistant Referee) systems in sports. Real-time encoding of high-frame-rate videos with low latency and temporal scalability is required for providing such high-frame-rate video services. The proposed encoder achieves full 4K/120fps real-time encoding, which is twice the current 4K service frame rate of 60fps, by multichip configuration with two encoder LSI. Exchange of reference picture data near a spatially divided slice boundary provides cross-chip motion estimation, and maintains the coding efficiency. The encoder supports temporal-scalable coding mode, in which it output stream with temporal scalability transmitted over one or two transmission paths. The encoder also supports the other mode, low-delay coding mode, in which it achieves 21.8msec low-latency processing through motion vector restriction. Evaluation of the proposed encoder's multichip configuration shows that the BD-bitrate (the average rate of bitrate increase), compared to simple slice division without inter-chip transfer, is -2.86% at minimum and -2.41% on average in temporal-scalable coding mode. The proposed encoder system will open the door to the next generation of high-frame-rate UHDTV (ultra-high-definition television) services.
Songlin DU Yuan LI Takeshi IKENAGA
High frame rate and ultra-low delay are the most essential requirements for building excellent human-machine-interaction systems. As a state-of-the-art local keypoint detection and feature extraction algorithm, A-KAZE shows high accuracy and robustness. Nonlinear scale space is one of the most important modules in A-KAZE, but it not only has at least one frame delay and but also is not hardware friendly. This paper proposes a hardware oriented nonlinear scale space for high frame rate and ultra-low delay A-KAZE matching system. In the proposed matching system, one part of nonlinear scale space is temporally forward and calculated in the previous frame (proposal #1), so that the processing delay is reduced to be less than 1 ms. To improve the matching accuracy affected by proposal #1, pre-adjustment of nonlinear scale (proposal #2) is proposed. Previous two frames are used to do motion estimation to predict the motion vector between previous frame and current frame. For further improvement of matching accuracy, pixel-level pre-adjustment (proposal #3) is proposed. The pre-adjustment changes from block-level to pixel-level, each pixel is assigned an unique motion vector. Experimental results prove that the proposed matching system shows average matching accuracy higher than 95% which is 5.88% higher than the existing high frame rate and ultra-low delay matching system. As for hardware performance, the proposed matching system processes VGA videos (640×480 pixels/frame) at the speed of 784 frame/second (fps) with a delay of 0.978 ms/frame.
Songlin DU Zhe WANG Takeshi IKENAGA
High frame rate and ultra-low delay matching system plays an increasingly important role in human-machine interactions, because it guarantees high-quality experiences for users. Existing image matching algorithms always generate mismatches which heavily weaken the performance the human-machine-interactive systems. Although many mismatch removal algorithms have been proposed, few of them achieve real-time speed with high frame rate and low delay, because of complicated arithmetic operations and iterations. This paper proposes a temporal constraints and block weighting judgement based high frame rate and ultra-low delay mismatch removal system. The proposed method is based on two temporal constraints (proposal #1 and proposal #2) to firstly find some true matches, and uses these true matches to generate block weighting (proposal #3). Proposal #1 finds out some correct matches through checking a triangle route formed by three adjacent frames. Proposal #2 further reduces mismatch risk by adding one more time of matching with opposite matching direction. Finally, proposal #3 distinguishes the unverified matches to be correct or incorrect through weighting of each block. Software experiments show that the proposed mismatch removal system achieves state-of-the-art accuracy in mismatch removal. Hardware experiments indicate that the designed image processing core successfully achieves real-time processing of 784fps VGA (640×480 pixels/frame) video on field programmable gate array (FPGA), with a delay of 0.858 ms/frame.
Songlin DU Yuhao XU Tingting HU Takeshi IKENAGA
High frame rate and ultra-low delay matching system plays an important role in various human-machine interactive applications, which demands better performance in matching deformable and out-of-plane rotating objects. Although many algorithms have been proposed for deformation tracking and matching, few of them are suitable for hardware implementation due to complicated operations and large time consumption. This paper proposes a hardware-oriented template update and recovery method for high frame rate and ultra-low delay deformation matching system. In the proposed method, the new template is generated in real time by partially updating the template descriptor and adding new keypoints simultaneously with the matching process in pixels (proposal #1), which avoids the large inter-frame delay. The size and shape of region of interest (ROI) are made flexible and the Hamming threshold used for brute-force matching is adjusted according to pixel position and the flexible ROI (proposal #2), which solves the problem of template drift. The template is recovered by the previous one with a relative center-shifting vector when it is judged as lost via region-wise difference check (proposal #3). Evaluation results indicate that the proposed method successfully achieves the real-time processing of 784fps at the resolution of 640×480 on field-programmable gate array (FPGA), with a delay of 0.808ms/frame, as well as achieves satisfactory deformation matching results in comparison with other general methods.
High frame rate and ultra-low delay matching system plays an increasingly important role in human-machine interactive applications which call for higher frame rate and lower delay for a better experience. The large amount of processing data and the complex computation in a local feature based matching system, make it difficult to achieve a high process speed and ultra-low delay matching with limited resource. Aiming at a matching system with the process speed of more than 1000 fps and with the delay of less than 1 ms/frame, this paper puts forward a local binary feature based matching system with field-programmable gate array (FPGA). Pixel selection based 4-1-4 parallel matching and intensity directed symmetry are proposed for the implementation of this system. To design a basic framework with the high process speed and ultra-low delay using limited resource, pixel selection based 4-1-4 parallel matching is proposed, which makes it possible to use only one-thread resource consumption to achieve a four-thread processing. Assumes that the orientation of the keypoint will bisect the patch best and will point to the region with high intensity, intensity directed symmetry is proposed to calculate the keypoint orientation in a hardware friendly way, which is an important part for a rotation-robust matching system. Software experiment result shows that the proposed keypoint orientation calculation method achieves almost the same performance with the state-of-art intensity centroid orientation calculation method in a matching system. Hardware experiment result shows that the designed image process core supports to process VGA (640×480) videos at a process speed of 1306 fps and with a delay of 0.8083 ms/frame.
Lei SUN Zhenyu LIU Takeshi IKENAGA
Scalable Video Coding (SVC) is an extension of H.264/AVC, aiming to provide the ability to adapt to heterogeneous networks or requirements. It offers great flexibility for bitstream adaptation in multi-point applications such as videoconferencing. However, transcoding between SVC and AVC is necessary due to the existence of legacy AVC-based systems. The straightforward re-encoding method requires great computational cost, and delay-sensitive applications like videoconferencing require much faster transcoding scheme. This paper proposes an ultra-low-delay SVC-to-AVC MGS (Medium-Grain quality Scalability) transcoder for videoconferencing applications. Transcoding is performed in pure frequency domain with partial decoding/encoding in order to achieve significant speed-up. Three fast transcoding methods in frequency domain are proposed for macroblocks with different coding modes in non-KEY pictures. KEY pictures are transcoded by reusing the base layer motion data, and error propagation is constrained between KEY pictures. Simulation results show that proposed transcoder achieves averagely 38.5 times speed-up compared with the re-encoding method, while introducing merely 0.71 dB BDPSNR coding quality loss for videoconferencing sequences as compared with the re-encoding algorithm.
Masayuki MIYAMA Yuusuke INOIE Takafumi KASUGA Ryouichi INADA Masashi NAKAO Yoshio MATSUDA
This paper describes a 158 MS/s JPEG 2000 codec with an embedded block coder (EBC) based on bit-plane and pass-parallel architecture. The EBC contains bit-plane coders (BPCs) corresponding to each bit-plane in a code-block. The upper and the lower bit-plane coding overlap in time with a 1-stripe and 1-column gap. The bit-modeling passes in the bit-plane coding also overlap in time with the same gap. These methods increase throughput by 30 times in comparison with the conventional. In addition, the methods support not only vertically causal mode, but also regular mode, which enhances the image quality. Furthermore, speculative decoding is adopted to increase throughput. This codec LSI was designed using 0.18 µm process. The core area is 4.74.7 mm2 and the frequency is 160 MHz. A system including the codec enables image transmission of PC desktop with 8 ms delay.
In this letter, a numerical design approach for FIR diamond-shaped filter banks (DFB) with perfect reconstruction (PR) and low delay for tree-structured HDTV coding is presented. The system delay of the designed DFB can be controlled below the category of the linear-phase. Moreover, the necessary conditions for the system delay of the designed DFB are derived. The considered problem is formulated as the minimization of the real and imaginary parts of weighted peak ripple errors of the designed analysis filters subject to PR constraints. Simulation example is provided to show the efficacy of this proposed design technique.