
Author Search Result

[Author] Satoshi GOTO (92 hits)

Showing results 1-20 of 92

  • An Integrated Hole-Filling Algorithm for View Synthesis

    Wenxin YU  Weichen WANG  Minghui WANG  Satoshi GOTO  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1306-1314

    Multi-view video can provide users with three-dimensional (3-D) and virtual reality perception through multiple viewing angles. In recent years, depth image-based rendering (DIBR) has been generally used to synthesize virtual view images in free viewpoint television (FTV) and 3-D video. To conceal the zero-region more accurately and improve the quality of the synthesized virtual view frame, an integrated hole-filling algorithm for view synthesis is proposed in this paper. The proposed algorithm consists of five parts: an algorithm for distinguishing different regions, foreground and background boundary detection, texture image isophote detection, a textural and structural isophote prediction algorithm, and an in-painting algorithm with a gradient priority order. Based on the geometrically principled texture isophote prediction and the gradient-priority in-painting, the boundary information of the foreground is considerably clearer and the texture information in the zero-region is concealed much more accurately than in previous works. Since the perceived quality mainly depends on the distortion of the structural information, experimental results indicate that the proposed algorithm considerably improves not only the objective quality of the virtual image but also its subjective quality. In particular, the algorithm preserves the boundary contours of the foreground objects as well as their textural and structural information.
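
    The gradient-priority in-painting idea above can be illustrated with a short sketch: hole pixels on the zero-region boundary are filled first where the known neighbourhood has the strongest gradient, so structural edges (isophotes) propagate into the hole before flat regions. This is a minimal NumPy sketch under simplifying assumptions (grayscale input, a simple averaging fill, a gradient map computed once); it is not the authors' isophote prediction algorithm.

```python
import numpy as np

def fill_holes_by_gradient_priority(img, hole):
    """img: 2-D grayscale array; hole: boolean mask of the zero-region."""
    img = img.astype(np.float64).copy()
    hole = hole.copy()
    # Gradient map of the known region (holes treated as zero); computed once.
    gy, gx = np.gradient(np.where(hole, 0.0, img))
    grad_mag = np.hypot(gx, gy)
    offs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    h, w = img.shape
    while hole.any():
        candidates = []
        for y, x in zip(*np.nonzero(hole)):
            known = [(y + dy, x + dx) for dy, dx in offs
                     if 0 <= y + dy < h and 0 <= x + dx < w
                     and not hole[y + dy, x + dx]]
            if known:
                # Priority: strongest gradient among known neighbours, so
                # structural (isophote) pixels are filled before flat ones.
                prio = max(grad_mag[p] for p in known)
                candidates.append((prio, y, x, known))
        if not candidates:
            break
        _, y, x, known = max(candidates)
        img[y, x] = np.mean([img[p] for p in known])   # simple averaging fill
        hole[y, x] = False
    return img
```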

  • Content Adaptive Hierarchical Decision of Variable Coding Block Sizes in High Efficiency Video Coding for High Resolution Videos

    Guifen TIAN  Xin JIN  Satoshi GOTO  

     
    PAPER-Digital Signal Processing

      Vol:
    E96-A No:4
      Page(s):
    780-789

    The quadtree-based variable block size prediction makes the biggest contribution to the dramatically improved coding efficiency of the new video coding standard HEVC. However, this technique accounts for about 75–80% of the computational complexity of an HEVC encoder. This paper brings forward an adaptive scheme that exploits temporal, spatial and transform-domain features to speed up the original quadtree-based prediction, targeting high-resolution videos. Before encoding starts, the utilization ratio of each coding depth is analyzed so that rarely adopted coding depths can be skipped at the frame level. Then, a texture complexity (TC) measurement is applied to filter out non-contributing coding blocks for each largest coding unit (LCU); in this step, a dynamic threshold setting approach is proposed to make the filtering adaptive to videos and coding parameters. Thirdly, during the encoding process, the sum of absolute quantized residual coefficients (SAQC) is used as the criterion to prune useless coding blocks for both LCUs and 32×32 blocks. With the proposed scheme, motion estimation is performed for prediction blocks within a narrowed range. Experiments show that the proposed scheme outperforms existing works and speeds up the original HEVC encoder by up to 61.89% and by 33.65% on average for 4k×2k video sequences, while the peak signal-to-noise ratio (PSNR) degradation and bitrate increment are negligible.
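
    As a rough illustration of the frame-level depth skipping and the LCU-level texture-complexity filter described above, the following Python sketch shows one plausible shape of these two decisions. The utilization-ratio cutoff, the variance-based TC measure and the QP-scaled dynamic threshold are assumptions made for the sketch, not the paper's exact formulas.

```python
import numpy as np

def active_depths(depth_usage_hist, min_ratio=0.05):
    """Frame-level filter: keep only coding depths whose utilization ratio
    in previously coded frames is at least `min_ratio`."""
    total = float(sum(depth_usage_hist.values()))
    return {d for d, n in depth_usage_hist.items() if n / total >= min_ratio}

def texture_complexity(lcu_pixels):
    """A simple texture-complexity (TC) proxy: luma variance of the LCU."""
    return float(np.var(lcu_pixels))

def lcu_passes_filter(lcu_pixels, tc_history, qp, alpha=0.5):
    """LCU-level filter with a dynamic threshold: scale the running mean TC
    by a QP-dependent factor so the cutoff adapts to content and settings."""
    if not tc_history:
        return True                      # nothing to compare against yet
    threshold = alpha * np.mean(tc_history) * (1.0 + qp / 51.0)
    return texture_complexity(lcu_pixels) >= threshold

# Example: depths 2 and 3 were almost never chosen, so they are skipped.
print(active_depths({0: 500, 1: 460, 2: 30, 3: 10}))   # {0, 1}
```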

  • Pedestrian Detection Using Gradient Local Binary Patterns

    Ning JIANG  Jiu XU  Satoshi GOTO  

     
    PAPER-Coding & Processing

      Vol:
    E95-A No:8
      Page(s):
    1280-1287

    In recent years, local pattern based features have attracted increasing interest in object detection and recognition systems. The Local Binary Pattern (LBP) feature is widely used in texture classification and face detection, but its original definition is not well suited to human detection. In this paper, we propose a novel feature named gradient local binary patterns (GLBP) for human detection. In this feature, the original 256 local binary patterns are reduced to 56 patterns, named uniform patterns, which are used to generate a 56-bin histogram. The gradient value of each pixel, which is treated as a constant weight in conventional LBP-based features, is used as the weight of that pixel when accumulating the 56 histogram bins. Experiments performed on the INRIA dataset show that the proposed GLBP feature is more discriminative than the histogram of oriented gradients (HOG), Semantic Local Binary Patterns (S-LBP) and histogram of template (HOT) features. In our experiments the window size is fixed, which means the performance could be further improved by boosting methods. Moreover, the computation of the GLBP feature is parallel, which makes it easy to accelerate in hardware. These factors make the GLBP feature suitable for real-time pedestrian detection.
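
    A minimal sketch of the GLBP computation follows: keeping the standard uniform LBP codes (at most two circular 0/1 transitions) while excluding the all-zero and all-one codes yields 56 patterns, which is assumed here to match the 56 patterns mentioned above, and each pixel votes into its pattern's bin with its gradient magnitude as the weight. The simple central-difference gradient is also an assumption for illustration.

```python
import numpy as np

def transitions(code):
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# Uniform codes (at most 2 circular 0/1 transitions) minus all-0 and all-1.
UNIFORM = [c for c in range(256) if transitions(c) <= 2 and c not in (0, 255)]
INDEX = {c: i for i, c in enumerate(UNIFORM)}          # 56 retained patterns

def glbp_histogram(gray):
    gray = gray.astype(np.float64)
    h, w = gray.shape
    gx = np.zeros_like(gray); gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]            # central differences
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    weight = np.hypot(gx, gy)                           # gradient magnitude
    offs = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    hist = np.zeros(len(UNIFORM))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for b, (dy, dx) in enumerate(offs):
                if gray[y + dy, x + dx] >= gray[y, x]:
                    code |= 1 << b
            if code in INDEX:              # non-uniform patterns are dropped
                hist[INDEX[code]] += weight[y, x]       # gradient as weight
    return hist
```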

  • A 48 Cycles/MB H.264/AVC Deblocking Filter Architecture for Ultra High Definition Applications

    Dajiang ZHOU  Jinjia ZHOU  Jiayi ZHU  Satoshi GOTO  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

      Vol:
    E92-A No:12
      Page(s):
    3203-3210

    In this paper, a highly parallel deblocking filter architecture for H.264/AVC is proposed, which processes one macroblock in 48 clock cycles and gives real-time support to QFHD@60 fps sequences at less than 100 MHz. Four edge filters organized in two groups, for simultaneously processing vertical and horizontal edges, are applied in this architecture to enhance its throughput. As parallelism increases, pipeline hazards arise owing to the latency of the edge filters and the data dependency of the deblocking algorithm. To solve this problem, a zig-zag processing schedule is proposed to eliminate the pipeline bubbles. The data path of the architecture is then derived according to the processing schedule and optimized through data flow merging, so as to minimize the cost of logic and internal buffers. Meanwhile, the architecture's data input rate is designed to be identical to its throughput, and the transmission order of input data matches the zig-zag processing schedule, so no intercommunication buffer is required between the deblocking filter and its preceding component for speed matching or data reordering. As a result, only one 24×64 two-port SRAM is required as internal buffer in this design. When synthesized with the SMIC 130 nm process, the architecture costs a gate count of 30.2 k, which is competitive considering its high performance.

  • Permutation Network for Reconfigurable LDPC Decoder Based on Banyan Network

    Xiao PENG  Zhixiang CHEN  Xiongxin ZHAO  Fumiaki MAEHARA  Satoshi GOTO  

     
    PAPER

      Vol:
    E93-C No:3
      Page(s):
    270-278

    Since the structured quasi-cyclic low-density parity-check (QC-LDPC) codes in most modern wireless communication systems include multiple code rates, various block lengths, and correspondingly different sizes of submatrices in the parity check matrix (PCM), a reconfigurable LDPC decoder is desirable, and a permutation network is needed to accommodate any input number (IN) and shift number (SN) for cyclic shifting. In this paper, we propose a novel permutation network architecture for reconfigurable QC-LDPC decoders based on the Banyan network. We prove that the Banyan network has the nonblocking property for cyclic shifts when the IN is a power of 2, and give the control signal generating algorithm. By introducing a bypass network, we put forward a nonblocking scheme for any IN and SN. In addition, we present the hardware design of the control signal generator, which greatly reduces the hardware complexity and latency. Synthesis results using the TSMC 0.18 µm library demonstrate that the proposed permutation network can be implemented with an area of 0.546 mm² and a frequency of 292 MHz.
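
    The cyclic-shift behaviour such a permutation network must realize can be modelled compactly: with IN = 2^n inputs, a network of n stages rotates the message vector by 2^k in stage k whenever bit k of the shift number is set. The sketch below models this behaviour only; it is not the authors' Banyan control-signal generation algorithm.

```python
def cyclic_shift(messages, shift):
    n = len(messages)
    assert n & (n - 1) == 0, "input number (IN) must be a power of 2"
    out = list(messages)
    stage, k = 1, 0
    while stage < n:
        if (shift >> k) & 1:                 # stage k enabled by bit k of SN
            out = out[stage:] + out[:stage]  # rotate left by 2**k positions
        stage <<= 1
        k += 1
    return out

# Rotating 8 messages by SN = 5 = 0b101 enables stages 0 and 2 (1 + 4 = 5).
print(cyclic_shift(list(range(8)), 5))       # [5, 6, 7, 0, 1, 2, 3, 4]
```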

  • Fast SAO Estimation Algorithm and Its Implementation for 8K×4K @ 120 FPS HEVC Encoding

    Jiayi ZHU  Dajiang ZHOU  Shinji KIMURA  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E97-A No:12
      Page(s):
    2488-2497

    High efficiency video coding (HEVC) is the new-generation video compression standard. Sample adaptive offset (SAO) is a new compression tool adopted in HEVC that reduces the distortion between original samples and reconstructed samples. SAO estimation is the process of determining the SAO parameters during video encoding, and it is divided into two phases: statistic collection and parameter determination. There are two difficulties for the VLSI implementation of SAO estimation. The first is the huge number of samples to deal with in the statistic collection phase; the other is the very high complexity of the rate-distortion optimization (RDO) in the parameter determination phase. In this article, a fast SAO estimation algorithm and its corresponding VLSI architecture are proposed. For the first difficulty, we use bitmaps to collect the statistics of all 16 samples in one 4×4 block simultaneously. For the second difficulty, we simplify a series of complicated procedures in HM to balance the algorithm's complexity and BD-rate performance. Experimental results show that the proposed algorithm maintains the picture quality improvement. The VLSI design based on this algorithm can be implemented with 156.32 K gates and 8,832 bits of single-port RAM for the 8-bit depth case. It can be synthesized to 400 MHz in 65 nm technology and is capable of 8K×4K @ 120 fps encoding.
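
    For reference, the two phases can be sketched in a few lines for the edge-offset case: statistic collection accumulates, per edge category, the sum of original-minus-reconstructed differences and the sample count, and parameter determination picks the offset that minimizes distortion. The sketch processes samples one by one and omits the bitmap trick and the RD-cost terms; the function names are assumptions.

```python
import numpy as np

def eo_category(left, cur, right):
    """HEVC edge-offset category for one sample and one direction:
    1 local minimum, 2/3 concave/convex corner, 4 local maximum, 0 none."""
    sign = (cur > left) - (cur < left) + (cur > right) - (cur < right)
    return {-2: 1, -1: 2, 1: 3, 2: 4}.get(sign, 0)

def sao_eo_statistics(org, rec):
    """Phase 1 (statistic collection): per-category difference sums/counts."""
    diff_sum, count = np.zeros(5), np.zeros(5)
    h, w = rec.shape
    for y in range(h):
        for x in range(1, w - 1):                      # horizontal class
            c = eo_category(int(rec[y, x - 1]), int(rec[y, x]), int(rec[y, x + 1]))
            diff_sum[c] += int(org[y, x]) - int(rec[y, x])
            count[c] += 1
    return diff_sum, count

def best_offsets(diff_sum, count, max_offset=7):
    """Phase 2 (parameter determination): the distortion-minimizing offset per
    category is the rounded mean difference (RD-cost terms omitted)."""
    offsets = np.zeros(5, dtype=int)
    for c in range(1, 5):
        if count[c]:
            offsets[c] = int(np.clip(round(diff_sum[c] / count[c]),
                                     -max_offset, max_offset))
    return offsets
```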

  • Real-Time UHD Background Modelling with Mixed Selection Block Updates

    Axel BEAUGENDRE  Satoshi GOTO  Takeshi YOSHIMURA  

     
    PAPER-Image Processing

      Vol:
    E100-A No:2
      Page(s):
    581-591

    The vast majority of foreground detection methods require heavy hardware optimization to process even standard-definition videos in real time. Indeed, those methods process the whole frame both for the detection and for the background modelling part, which makes them resource guzzlers (time, memory, etc.) that cannot be applied to Ultra High Definition (UHD) videos. This paper presents a real-time background modelling method called Mixed Block Background Modelling (MBBM). It is a spatio-temporal approach which updates the background model by carefully selecting blocks in a linear and a pseudo-random order and updating the corresponding parts of the model; the two block selection orders ensure that every block is eventually updated. For foreground detection purposes, the method is combined with a foreground detection method designed for UHD videos, such as the Adaptive Block-Propagative Background Subtraction method. Experimental results show that the proposed MBBM can process 50 min. of 4K UHD video in less than 6 hours, while other methods are estimated to take from 8 days to more than 21 years. Compared to 10 state-of-the-art foreground detection methods, the proposed MBBM shows the best quality results, with an average global quality score of 0.597 (1 being the maximum) on a dataset of 4K UHDTV sequences containing various situations such as illumination variation. Finally, the processing time per pixel of the MBBM is the lowest of all compared methods, with an average of 3.18×10⁻⁸ s.
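
    A minimal sketch of the mixed block-update schedule is given below: half of a per-frame block budget follows a linear scan (guaranteeing full coverage) and the other half is drawn pseudo-randomly, and only the selected blocks of the model are updated. The block size, per-frame budget and running-average update rule are assumptions for illustration, not the paper's parameters.

```python
import random
import numpy as np

class MixedBlockBackground:
    def __init__(self, frame_shape, block=16, per_frame=64, alpha=0.05, seed=0):
        h, w = frame_shape[:2]
        self.block, self.alpha = block, alpha
        self.blocks = [(y, x) for y in range(0, h, block)
                               for x in range(0, w, block)]
        self.per_half = min(per_frame // 2, len(self.blocks))
        self.linear_pos = 0
        self.rng = random.Random(seed)
        self.model = None

    def update(self, frame):
        frame = frame.astype(np.float64)
        if self.model is None:
            self.model = frame.copy()                  # bootstrap the model
            return
        # Half the budget follows a linear scan (guarantees full coverage),
        # the other half is pseudo-random (reacts quickly anywhere).
        n = self.per_half
        chosen = self.blocks[self.linear_pos:self.linear_pos + n]
        self.linear_pos = (self.linear_pos + n) % len(self.blocks)
        chosen = chosen + self.rng.sample(self.blocks, n)
        b = self.block
        for y, x in chosen:
            sl = np.s_[y:y + b, x:x + b]
            self.model[sl] += self.alpha * (frame[sl] - self.model[sl])
```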

  • Greedy Algorithm for the On-Chip Decoupling Capacitance Optimization to Satisfy the Voltage Drop Constraint

    Mikiko SODE TANAKA  Nozomu TOGAWA  Masao YANAGISAWA  Satoshi GOTO  

     
    PAPER-Physical Level Design

      Vol:
    E94-A No:12
      Page(s):
    2482-2489

    With the progress of process technology in recent years, low-voltage power supplies have become predominant. As a result, the voltage margin has decreased, and on-chip decoupling capacitance optimization that satisfies the voltage drop constraint has become more important. In addition, reducing the on-chip decoupling capacitance area reduces the chip area and, therefore, manufacturing cost. Hence, we propose an algorithm that satisfies the voltage drop constraint while minimizing the total on-chip decoupling capacitance area. The proposed algorithm uses a network-algorithm idea in which the path that has the most influence on the voltage drop is found, and the voltage drop is improved by adding on-chip capacitance to a node on that path. The proposed algorithm is efficient and effectively adds the on-chip capacitance where it has the greatest influence on the voltage drop. Experimental results demonstrate that, with the proposed algorithm, a real-size power/ground network can be optimized in just a few minutes, which is quite practical. Compared with the conventional algorithm, we confirmed that the total on-chip decoupling capacitance area of the power/ground network was reduced by about 40-50%.
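
    The greedy loop can be illustrated on a deliberately crude toy model: a 1-D power rail where decoupling capacitance at a node is modelled as damping that node's transient current demand. Only the "find the worst-drop node, then add capacitance to the most influential node on its path" structure mirrors the algorithm described above; the electrical model itself is an assumption for the sketch.

```python
def ir_drops(R, I, decap, damping=0.5):
    """Voltage drop at each node of a 1-D rail fed from a pad at index 0.
    Decap at a node is crudely modelled as damping its current demand."""
    i_eff = [i / (1.0 + damping * c) for i, c in zip(I, decap)]
    drops, acc = [], 0.0
    for j, r in enumerate(R):
        acc += r * sum(i_eff[j:])          # all current drawn past segment j
        drops.append(acc)
    return drops

def greedy_decap(R, I, v_limit, cap_step=1.0, damping=0.5, max_iters=200):
    decap = [0.0] * len(I)
    for _ in range(max_iters):
        drops = ir_drops(R, I, decap, damping)
        worst = max(range(len(drops)), key=drops.__getitem__)
        if drops[worst] <= v_limit:
            break                          # voltage-drop constraint satisfied
        # Add capacitance at the node on the worst path that still draws the
        # largest effective current, i.e. the most influential node.
        path = range(worst + 1)
        target = max(path, key=lambda j: I[j] / (1.0 + damping * decap[j]))
        decap[target] += cap_step
    return decap

# Example: 5-node rail, uniform segment resistance and current demand.
print(greedy_decap(R=[0.1] * 5, I=[1.0] * 5, v_limit=0.9))
```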

  • Correlated Noise Reduction for Electromagnetic Analysis

    Hongying LIU  Xin JIN  Yukiyasu TSUNOO  Satoshi GOTO  

     
    PAPER-Implementation

      Vol:
    E96-A No:1
      Page(s):
    185-195

    Electromagnetic emissions leak confidential data of cryptographic devices, and Electromagnetic Analysis (EMA) exploits such emissions for cryptanalysis. The performance of EMA decreases dramatically when correlated noise, which is caused by the interference of the clock network and exhibits strong correlation with the encryption signal, is present in the acquired EM signal. In this paper, three techniques are proposed to reduce the correlated noise. Based on the observation that the clock signal has a high variance at the signal edges, the first technique, single-sample Singular Value Decomposition (SVD), extracts the clock signal with only one EM sample. The second technique, multi-sample SVD, is capable of suppressing the clock signal with a short sampling length. The third, averaged subtraction, is suitable for estimating the correlated noise when background samplings are included. Experiments on the EM signal during AES encryption on FPGA and ASIC implementations demonstrate that the proposed techniques increase the SNR by as much as 22.94 dB, and the success rates of EMA show that the data-dependent information is retained and the performance of EMA is improved.
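
    The multi-sample SVD idea can be sketched directly with NumPy: stack the aligned EM traces into a matrix, take the strongest singular component as the common clock interference, and subtract it. How many components to strip, and the synthetic example below, are assumptions for illustration rather than the paper's procedure.

```python
import numpy as np

def remove_correlated_noise(traces, n_components=1):
    """traces: (num_traces, num_samples) array of aligned EM measurements.
    Subtracts the strongest common component(s), assumed to be the clock."""
    X = np.asarray(traces, dtype=np.float64)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    clock = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    return X - clock

# Synthetic check: a strong common 'clock' sinusoid plus weak per-trace noise.
t = np.linspace(0.0, 1.0, 1000)
clock = 5.0 * np.sin(2 * np.pi * 50 * t)
traces = np.stack([clock + 0.2 * np.random.randn(1000) for _ in range(64)])
cleaned = remove_correlated_noise(traces)
print(traces.std(), cleaned.std())        # the large clock component is gone
```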

  • A Contour-Based Robust Algorithm for Text Detection in Color Images

    Yangxing LIU  Satoshi GOTO  Takeshi IKENAGA  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E89-D No:3
      Page(s):
    1221-1230

    Text detection in color images has become an active research area in the past few decades. In this paper, we present a novel approach to accurately detect text in color images, possibly with a complex background. The proposed algorithm is based on the combination of connected component analysis and texture feature analysis of unknown text region contours. First, we utilize an elaborate color image edge detection algorithm to extract all possible text edge pixels. Connected component analysis is performed on these edge pixels to detect the external contour and possible internal contours of potential text regions. The gradient and geometrical characteristics of each region contour are carefully examined to construct candidate text regions and to filter out some non-text regions. Then each candidate text region is verified with texture features derived from the wavelet domain. Finally, the expectation-maximization algorithm is introduced to binarize each text region to prepare the data for recognition. In contrast to previous approaches, our algorithm combines both the efficiency of connected-component-based methods and the robustness of texture-based analysis. Experimental results show that our proposed algorithm is robust in text detection with respect to different character sizes, orientations, colors and languages, and can provide reliable text binarization results.

  • Adaptive Block-Propagative Background Subtraction Method for UHDTV Foreground Detection

    Axel BEAUGENDRE  Satoshi GOTO  

     
    PAPER-Image

      Vol:
    E98-A No:11
      Page(s):
    2307-2314

    This paper presents an Adaptive Block-Propagative Background Subtraction (ABPBGS) method designed for Ultra High Definition Television (UHDTV) foreground detection. The main idea is to detect block after block along the objects in order to skip all areas of the image in which there is no moving object. This is particularly interesting for UHDTV, where the objects of interest may represent less than 0.1% of the total area. From a seed block determined in a previous iteration, the detection spreads along an object as long as it detects a part of that object. A block history map guarantees that each block is processed only once. Moreover, only small blocks are loaded and processed, which saves computational time and memory usage, and the processing of each block is independent enough to be easily parallelized. Compared to 9 state-of-the-art works, the ABPBGS achieved the best results, with an average global quality score of 0.57 (1 being the maximum) on a dataset of 4K and 8K UHDTV sequences developed for this work. None of the state-of-the-art methods could process 4K videos in reasonable time, while the ABPBGS showed an average speed of 5.18 fps; in comparison, 5 of the 9 state-of-the-art methods performed slower on a 270p down-scaled version of the same videos. The experiments also showed that, to process an 8K UHDTV video, the ABPBGS divides the required memory by about 24, down to a total of 450 MB.
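
    The seed-and-spread behaviour can be sketched as a breadth-first propagation over blocks with a visited (history) map, as below. The per-block foreground test used here, a plain thresholded difference against a background frame, is a simplification standing in for the paper's background subtraction, and the block size and threshold are illustrative assumptions.

```python
from collections import deque
import numpy as np

def block_propagative_detection(frame, background, seeds, block=16, thr=25):
    """frame, background: 2-D grayscale arrays; seeds: list of (row, col)
    block indices found in the previous iteration."""
    h, w = frame.shape
    by, bx = h // block, w // block
    visited = np.zeros((by, bx), dtype=bool)          # block history map
    mask = np.zeros((h, w), dtype=np.uint8)
    queue = deque(seeds)
    while queue:
        j, i = queue.popleft()
        if not (0 <= j < by and 0 <= i < bx) or visited[j, i]:
            continue                                  # each block done once
        visited[j, i] = True
        sl = np.s_[j * block:(j + 1) * block, i * block:(i + 1) * block]
        diff = np.abs(frame[sl].astype(int) - background[sl].astype(int))
        fg = diff > thr
        if fg.any():                                  # object found: spread
            mask[sl][fg] = 255
            queue.extend([(j - 1, i), (j + 1, i), (j, i - 1), (j, i + 1)])
    return mask
```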

  • Partially-Parallel LDPC Decoder Achieving High-Efficiency Message-Passing Schedule

    Kazunori SHIMIZU  Tatsuyuki ISHIKAWA  Nozomu TOGAWA  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E89-A No:4
      Page(s):
    969-978

    In this paper, we propose a partially-parallel LDPC decoder which achieves a high-efficiency message-passing schedule. The proposed LDPC decoder is characterized as follows: (i) The column operations follow the row operations in a pipelined architecture to ensure that the row and column operations are performed concurrently. (ii) The proposed parallel pipelined bit functional unit enables the column operation module to compute every message in each bit node which is updated by the row operations. These column operations can be performed without extending the single iterative decoding delay when the row and column operations are performed concurrently. Therefore, the proposed decoder performs the column operations more frequently in a single iterative decoding, and achieves a high-efficiency message-passing schedule within the limited decoding delay time. Hardware implementation on an FPGA and simulation results show that the proposed partially-parallel LDPC decoder improves the decoding throughput and bit error performance with a small hardware overhead.

  • An Unequal Secure Encryption Scheme for H.264/AVC Video Compression Standard

    Yibo FAN  Jidong WANG  Takeshi IKENAGA  Yukiyasu TSUNOO  Satoshi GOTO  

     
    PAPER-Symmetric Cryptography

      Vol:
    E91-A No:1
      Page(s):
    12-21

    H.264/AVC is the newest video coding standard, and it offers many new features that can easily be exploited for video encryption. In this paper, we propose a new scheme for encrypting H.264/AVC compressed video. We define Unequal Secure Encryption (USE) as an approach that applies different encryption schemes, with different security strengths, to different parts of the compressed video data. The USE scheme includes two parts: video data classification and unequal secure video data encryption. First, we classify the video data into two partitions: an important data partition and an unimportant data partition. The important data partition is small in size and receives strong protection, while the unimportant data partition is large in size and receives lighter protection. Second, we use AES as a block cipher to encrypt the important data partition and LEX as a stream cipher to encrypt the unimportant data partition. AES is the most widely used symmetric cipher and ensures high security; LEX is a new stream cipher based on AES whose computational cost is much lower. In this way, our scheme achieves both high security and low computational cost. Besides the USE scheme, we propose a low-cost design of a hybrid AES/LEX encryption module. Our experimental results show that the computational cost of the USE scheme is low (about 25% of naive encryption at Level 0 with VEA used). The hardware cost of the hybrid AES/LEX module is 4,678 gates and the AES encryption throughput is about 50 Mbps.
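
    A sketch of the unequal split is shown below using the Python `cryptography` package: the small important partition gets the block cipher, the bulky unimportant partition gets a stream cipher. Since LEX has no common library implementation, AES-CTR stands in for the stream cipher here, and the byte-level partitioning is a placeholder rather than real H.264 syntax parsing.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives import padding

def encrypt_unequal(important: bytes, unimportant: bytes, key: bytes):
    # Important partition: AES in CBC mode (block cipher, strong protection).
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(important) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    important_ct = enc.update(padded) + enc.finalize()
    # Unimportant partition: stream cipher (AES-CTR standing in for LEX).
    nonce = os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    unimportant_ct = enc.update(unimportant) + enc.finalize()
    return {"iv": iv, "important": important_ct,
            "nonce": nonce, "unimportant": unimportant_ct}

# Example: 64 bytes of 'important' data vs. 4 kB of 'unimportant' residuals.
key = os.urandom(16)
out = encrypt_unequal(os.urandom(64), os.urandom(4096), key)
print(len(out["important"]), len(out["unimportant"]))
```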

  • A VLSI Architecture for Variable Block Size Motion Estimation in H.264/AVC with Low Cost Memory Organization

    Yang SONG  Zhenyu LIU  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-VLSI Architecture

      Vol:
    E89-A No:12
      Page(s):
    3594-3601

    A one-dimensional (1-D) full-search variable block size motion estimation (VBSME) architecture is presented in this paper. By properly choosing the partial sum of absolute differences (SAD) registers and scheduling the addition operations, the architecture can be implemented with simple control logic and a regular workflow. Moreover, only one single-port SRAM is used to store the search area data. The design is realized in TSMC 0.18 µm 1P6M technology with a hardware cost of 67.6 K gates. Under typical working conditions (1.8 V, 25°C), a clock frequency of 266 MHz can be achieved.
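
    The SAD-reuse principle underlying variable block size motion estimation can be illustrated briefly: the sixteen 4×4 SADs of a candidate are computed once and merged into the SADs of the larger partitions. The sketch below shows the merging for the 8×8 and larger partitions; the array and index conventions are illustrative, not the paper's datapath.

```python
import numpy as np

def sad4x4_grid(cur_mb, ref_blk):
    """16x16 current macroblock vs. one 16x16 reference candidate
    -> a 4x4 array holding the sixteen 4x4 SADs."""
    d = np.abs(cur_mb.astype(int) - ref_blk.astype(int))
    return d.reshape(4, 4, 4, 4).sum(axis=(1, 3))

def merge_partition_sads(s44):
    """Merge the 4x4 SADs into 8x8, 16x8, 8x16 and 16x16 partition SADs."""
    s88 = s44.reshape(2, 2, 2, 2).sum(axis=(1, 3))    # four 8x8 SADs
    return {
        "8x8": s88,
        "16x8": s88.sum(axis=1),                      # top / bottom halves
        "8x16": s88.sum(axis=0),                      # left / right halves
        "16x16": int(s88.sum()),
    }

# Example with random 8-bit data.
cur = np.random.randint(0, 256, (16, 16))
ref = np.random.randint(0, 256, (16, 16))
print(merge_partition_sads(sad4x4_grid(cur, ref))["16x16"])
```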

  • Histogram of Template for Pedestrian Detection

    Shaopeng TANG  Satoshi GOTO  

     
    PAPER

      Vol:
    E93-D No:7
      Page(s):
    1737-1744

    In this paper, we propose a novel feature named histogram of template (HOT) for human detection in still images. For every pixel of an image, various templates are defined, each of which contains the pixel itself and two of its neighboring pixels. If the texture and gradient values of the three pixels satisfy a pre-defined formula, the central pixel is regarded as meeting the corresponding template for this formula. Histograms of pixels meeting the various templates are calculated for a set of formulas and combined to form the feature for detection. Compared to other features, the proposed feature takes texture as well as gradient information into consideration. Besides, it reflects the relationship between three pixels instead of focusing on only one. Experiments for human detection are performed on the INRIA dataset, which shows that the proposed HOT feature is more discriminative than the histogram of oriented gradients (HOG) feature under the same training method.
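
    A minimal sketch of the histogram-of-template idea follows: each template pairs the centre pixel with two of its neighbours, and the pixel votes for a template whenever a condition over the three pixels holds. The two conditions used here (the centre is the intensity maximum, or the gradient maximum, of the triple) and the choice of four opposite-neighbour templates are illustrative stand-ins for the paper's formulas.

```python
import numpy as np

NEIGH = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
# Four templates: the centre pixel plus a pair of opposite neighbours.
TEMPLATES = [(NEIGH[k], NEIGH[k + 4]) for k in range(4)]

def hot_histogram(gray):
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)
    grad = np.hypot(gx, gy)
    hist = np.zeros(2 * len(TEMPLATES))   # one bin per (formula, template)
    h, w = gray.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            for t, ((dy1, dx1), (dy2, dx2)) in enumerate(TEMPLATES):
                p, a, b = gray[y, x], gray[y+dy1, x+dx1], gray[y+dy2, x+dx2]
                if p >= a and p >= b:               # formula 1: intensity max
                    hist[t] += 1
                g, ga, gb = grad[y, x], grad[y+dy1, x+dx1], grad[y+dy2, x+dx2]
                if g >= ga and g >= gb:             # formula 2: gradient max
                    hist[len(TEMPLATES) + t] += 1
    return hist
```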

  • All-Zero Block-Based Optimization for Quadtree-Structured Prediction and Residual Encoding in High Efficiency Video Coding

    Guifen TIAN  Xin JIN  Satoshi GOTO  

     
    PAPER-Digital Signal Processing

      Vol:
    E96-A No:4
      Page(s):
    769-779

    High Efficiency Video Coding (HEVC) outperforms H.264 High Profile with a bitrate saving of about 43%, mostly because the block sizes for hybrid prediction and residual encoding are recursively chosen using a quadtree structure. Nevertheless, the exhaustive quadtree-based partitioning is not always necessary. This paper takes advantage of all-zero residual blocks at every quadtree depth to accelerate the prediction and residual encoding processes. First, we derive a near-sufficient condition to detect variable-sized all-zero blocks (AZBs); for these blocks, the discrete cosine transform (DCT) and quantization can be skipped. Next, using the derived condition, we propose an early termination technique to reduce the complexity of motion estimation (ME). More significantly, we present a two-dimensional pruning technique based on AZBs to constrain the prediction units (PUs) that contribute negligibly to rate-distortion (RD) performance. Experiments on a wide range of videos, with resolutions ranging from 416×240 to 4k×2k, show that the proposed scheme reduces the computational complexity of the HEVC encoder by up to 70.46% (50.34% on average), with only a slight loss in peak signal-to-noise ratio (PSNR) and bitrate. The proposal also outperforms other state-of-the-art methods by achieving greater complexity reduction and improved bitrate performance.
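
    The AZB early-termination idea can be sketched as a pre-transform test: if the residual energy, approximated by its SAD, falls below a threshold tied to the quantization step, every transform coefficient would quantize to zero, so the DCT and quantization are skipped. The threshold form below is an assumption for illustration and is not the near-sufficient condition derived in the paper.

```python
import numpy as np

def q_step(qp):
    """H.264/HEVC-style quantization step: doubles every 6 QP values."""
    base = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]
    return base[qp % 6] * (2 ** (qp // 6))

def is_all_zero_block(residual, qp, scale=2.0):
    """Treat the block as all-zero when its SAD is below a Qstep-scaled
    threshold (the exact condition in the paper is different)."""
    sad = np.abs(residual).sum()
    return sad < scale * q_step(qp) * np.sqrt(residual.size)

def encode_block(residual, qp):
    if is_all_zero_block(residual, qp):
        return None                       # skip DCT and quantization entirely
    return np.round(residual)             # placeholder for the DCT + Q path
```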

  • An Ultra-Low Bandwidth Design Method for MPEG-2 to H.264/AVC Transcoding

    Xianghui WEI  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E92-A No:4
      Page(s):
    1072-1079

    Motion estimation (ME) is a computation- and data-intensive module in video coding systems. Search window reuse methods play a critical role in bandwidth reduction by exploiting data locality. In this paper, a search window reuse method (Level C+) is proposed for MPEG-2 to H.264/AVC transcoding. The proposed method is designed for ultra-low-bandwidth applications in which on-chip memory is not the main constraint. By loading the search window for a motion estimation unit (MEU) and applying motion vector clipping, each MB in the MEU can exploit both horizontal and vertical search window reuse. A very low bandwidth level (Rα < 2) can be achieved with an acceptable amount of on-chip memory.

  • Voltage and Level-Shifter Assignment Driven Floorplanning

    Bei YU  Sheqin DONG  Song CHEN  Satoshi GOTO  

     
    PAPER-Physical Level Design

      Vol:
    E92-A No:12
      Page(s):
    2990-2997

    Low-power design has become a significant requirement as CMOS technology has entered the nanometer era. Multiple-Supply Voltage (MSV) is a popular and effective method for both dynamic and static power reduction while maintaining performance. Level shifters may cause area and Interconnect Length Overhead (ILO), and should be considered at both the floorplanning and post-floorplanning stages. In this paper, we propose a two-phase algorithmic framework, called VLSAF, to solve the voltage and level-shifter assignment problem. In the floorplanning phase, we use a convex-cost network flow algorithm to assign voltages and a minimum-cost flow algorithm to handle level-shifter assignment. In the post-floorplanning phase, a heuristic method is adopted to redistribute white space and calculate the positions and shapes of the level shifters. The experimental results show that VLSAF is effective.

  • A 7-Die 3D Stacked 3840×2160@120 fps Motion Estimation Processor

    Shuping ZHANG  Jinjia ZHOU  Dajiang ZHOU  Shinji KIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E100-C No:3
      Page(s):
    223-231

    In this paper, a hamburger architecture with 3D stacked reconfigurable memory is proposed for a 4K motion estimation (ME) processor. By positioning memory dies on both the top and bottom sides of the processor die, the proposed hamburger architecture reduces the usage of signal through-silicon vias (TSVs) and balances the power delivery network and the clock tree of the entire system, resulting in a 1/3 reduction in signal TSV usage. Moreover, a stacked reconfigurable memory architecture is proposed to reduce the fabrication complexity and to further reduce the number of signal TSVs by more than half; the overall reduction of signal TSVs in the entire design is 71.24%. Finally, we address unique issues that arise in electronic design automation (EDA) tools during 3D large-scale integration (LSI) design. As a result, a 4K ME processor is implemented as a 7-die stacked 3D system-on-chip design. The proposed design supports real-time 3840×2160 @ 120 fps encoding at 130 MHz with a power consumption of less than 540 mW.

  • A 5.83pJ/bit/iteration High-Parallel Performance-Aware LDPC Decoder IP Core Design for WiMAX in 65nm CMOS

    Xiongxin ZHAO  Zhixiang CHEN  Xiao PENG  Dajiang ZHOU  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2623-2632

    In this paper, we propose a synthesizable LDPC decoder IP core for the WiMAX system with high parallelism and enhanced error-correcting performance. By taking advantage of both layered scheduling and a fully parallel architecture, the decoder fully supports the multi-mode decoding specified in WiMAX with much higher parallelism than the commonly used partial-parallel layered LDPC decoder architectures. The 6-bit quantized messages are split in a bit-serial style, and 2-bit-wide serial processing lines work concurrently, so that only 3 cycles are required to decode one layer. As a result, 12∼24 cycles are enough to process one iteration for all the code rates specified in WiMAX. Compared to our previous bit-serial decoder, it doubles the parallelism and solves the message saturation problem of bit-serial arithmetic with a minor gate-count increase. Power synthesis results show that the proposed decoder achieves an energy efficiency of 5.83 pJ/bit/iteration, a 46.8% improvement over the state-of-the-art work. Furthermore, an advanced dynamic quantization (ADQ) technique is proposed to enhance the error-correcting performance in the layered decoder architecture. With about 2% area overhead, 6-bit ADQ achieves error-correcting performance close to that of 7-bit fixed quantization, with an improved error floor.
