The search functionality is under construction.

Author Search Result

[Author] Satoshi GOTO(92hit)

41-60hit(92hit)

  • Reconfigurable Adaptive FEC System Based on Reed-Solomon Code with Interleaving

    Kazunori SHIMIZU  Nozomu TOGAWA  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-Adaptive Signal Processing

      Vol:
    E88-D No:7
      Page(s):
    1526-1537

    This paper proposes a reconfigurable adaptive FEC system based on Reed-Solomon (RS) code with interleaving. In adaptive FEC schemes, error correction capability t is changed dynamically according to the communication channel condition. For given error correction capability t, we can implement an optimal RS decoder composed of minimum hardware units for each t. If the hardware units of the RS decoder can be reduced for any given error correction capability t, we can embed as large deinterleaver as possible into the RS decoder for each t. Reconfiguring the RS decoder embedded with the expanded deinterleaver dynamically for each error correction capability t allows us to decode larger interleaved codes which are more robust error correction codes to burst errors. In a reliable transport protocol, experimental results show that our system achieves up to 65% lower packet error rate and 5.9% higher data transmission throughput compared to the adaptive FEC scheme on a conventional fixed hardware system. In an unreliable transport protocol, our system achieves up to 76% better bit error performance with higher code rate compared to the adaptive FEC scheme on a conventional fixed hardware system.

  • Lossless VLSI Oriented Full Computation Reusing Algorithm for H.264/AVC Fractional Motion Estimation

    Ming SHAO  Zhenyu LIU  Satoshi GOTO  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E90-A No:4
      Page(s):
    756-763

    Fractional Motion Estimation (FME) is an advanced feature adopted in H.264/AVC video compression standard with quarter-pixel accuracy. Although FME could gain considerably higher encoding efficiency, sub-pixel interpolation and sum of absolute transformed difference (SATD) computation, as main parts of FME, increase the computation complexity a lot. To reduce the complexity of FME, this paper proposes a full computation reusable VLSI oriented algorithm. Through exploiting the similarity among motion vectors (MVs) of partitions in the same macroblock (MB), temporary computation results can be fully reused. Furthermore, a simple and effective searching method is adopted to make the proposed method more suitable for VLSI implementation. Experiment results show that up to 80% add operations and 85% internal reference frame memory access operations are saved without any degradation in the coding quality.

  • A Hardware Implementation of a Content-Based Motion Estimation Algorithm for Real-Time MPEG-4 Video Coding

    Shen LI  Takeshi IKENAGA  Hideki TAKEDA  Masataka MATSUI  Satoshi GOTO  

     
    PAPER

      Vol:
    E89-A No:4
      Page(s):
    932-940

    Power efficiency and real-time processing capability are two major issues in today's mobile video applications. We proposed a novel Motion Estimation (ME) engine for power-efficient real-time MPEG-4 video coding based on our previously proposed content-based ME algorithm [8],[13]. By adopting Full Search (FS) and Three Step Search (TSS) alternatively according to the nature of video contents, this algorithm keeps the visual quality very close to that of FS with only 3% of its computational power. We designed a flexible Block Matching (BM) Unit with 16-PE SIMD data path so that the adaptive ME can be performed at a much lower clock frequency and hardware cost as compared with previous FS based work. To reduce the energy cost caused by excessive external memory access, on-chip SRAM is also utilized and optimized for parallel processing in the BM Unit. The ME engine is fabricated with TSMC 0.18 µm technology. When processing QCIF (15 fps) video, the estimated power is 2.88 mW@4.16 MHz (supply voltage: 1.62 V). It is believed to be a favorable contribution to the video encoder LSI design for mobile applications.

  • Framework and VLSI Architecture of Measurement-Domain Intra Prediction for Compressively Sensed Visual Contents

    Jianbin ZHOU  Dajiang ZHOU  Li GUO  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E100-A No:12
      Page(s):
    2869-2877

    This paper presents a measurement-domain intra prediction coding framework that is compatible with compressive sensing (CS)-based image sensors. In this framework, we propose a low-complexity intra prediction algorithm that can be directly applied to measurements captured by the image sensor. We proposed a structural random 0/1 measurement matrix, embedding the block boundary information that can be extracted from the measurements for intra prediction. Furthermore, a low-cost Very Large Scale Integration (VLSI) architecture is implemented for the proposed framework, by substituting the matrix multiplication with shared adders and shifters. The experimental results show that our proposed framework can compress the measurements and increase coding efficiency, with 34.9% BD-rate reduction compared to the direct output of CS-based sensors. The VLSI architecture of the proposed framework is 9.1 Kin area, and achieves the 83% reduction in size of memory bandwidth and storage for the line buffer. This could significantly reduce both the energy consumption and bandwidth in communication of wireless camera systems, which are expected to be massively deployed in the Internet of Things (IoT) era.

  • Approximate-DCT-Derived Measurement Matrices with Row-Operation-Based Measurement Compression and its VLSI Architecture for Compressed Sensing

    Jianbin ZHOU  Dajiang ZHOU  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E101-C No:4
      Page(s):
    263-272

    Compressed Sensing based CMOS image sensor (CS-CIS) is a new generation of CMOS image sensor that significantly reduces the power consumption. For CS-CIS, the image quality and data volume of output are two important issues to concern. In this paper, we first proposed an algorithm to generate a series of deterministic and ternary matrices, which improves the image quality, reduces the data volume and are compatible with CS-CIS. Proposed matrices are derived from the approximate DCT and trimmed in 2D-zigzag order, thus preserving the energy compaction property as DCT does. Moreover, we proposed matrix row operations adaptive to the proposed matrix to further compress data (measurements) without any image quality loss. At last, a low-cost VLSI architecture of measurements compression with proposed matrix row operations is implemented. Experiment results show our proposed matrix significantly improve the coding efficiency by BD-PSNR increase of 4.2 dB, comparing with the random binary matrix used in the-state-of-art CS-CIS. The proposed matrix row operations for measurement compression further increases the coding efficiency by 0.24 dB BD-PSNR (4.8% BD-rate reduction). The VLSI architecture is only 4.3 K gates in area and 0.3 mW in power consumption.

  • Hybrid Message-Passing Algorithm and Architecture for Decoding Cyclic Non-binary LDPC Codes

    Yichao LU  Gang HE  Guifen TIAN  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2652-2659

    Recently, non-binary low-density parity-check (NB-LDPC) codes starts to show their superiority in achieving significant coding gains when moderate codeword lengths are adopted. However, the overwhelming decoding complexity keeps NB-LDPC codes from being widely employed in modern communication devices. This paper proposes a hybrid message-passing decoding algorithm which consumes very low computational complexity. It achieves competitive error performance compared with conventional Min-max algorithm. Simulation result on a (255,174) cyclic code shows that this algorithm obtains at least 0.5dB coding gain over other state-of-the-art low-complexity NB-LDPC decoding algorithms. A partial-parallel NB-LDPC decoder architecture for cyclic NB-LDPC codes is also developed based on this algorithm. Optimization schemes are employed to cut off hard decision symbols in RAMs and also to store only part of the reliability messages. In addition, the variable node units are redesigned especially for the proposed algorithm. Synthesis results demonstrate that about 24.3% gates and 12% memories can be saved over previous works.

  • Power-Efficient LDPC Decoder Architecture Based on Accelerated Message-Passing Schedule

    Kazunori SHIMIZU  Tatsuyuki ISHIKAWA  Nozomu TOGAWA  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-VLSI Architecture

      Vol:
    E89-A No:12
      Page(s):
    3602-3612

    In this paper, we propose a power-efficient LDPC decoder architecture based on an accelerated message-passing schedule. The proposed decoder architecture is characterized as follows: (i) Partitioning a pipelined operation not to read and write intermediate messages simultaneously enables the accelerated message-passing schedule to be implemented with single-port SRAMs. (ii) FIFO-based buffering reduces the number of SRAM banks and words of the LDPC decoder based on the accelerated message-passing schedule. The proposed LDPC decoder keeps a single message for each non-zero bit in a parity check matrix as well as a classical schedule while achieving the accelerated message-passing schedule. Implementation results in 0.18 [µm] CMOS technology show that the proposed decoder architecture reduces an area of the LDPC decoder by 43% and a power dissipation by 29% compared to the conventional architecture based on the accelerated message-passing schedule.

  • A Fine-Grain Scalable and Low Memory Cost Variable Block Size Motion Estimation Architecture for H.264/AVC

    Zhenyu LIU  Yang SONG  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-Integrated Electronics

      Vol:
    E89-C No:12
      Page(s):
    1928-1936

    One full search variable block size motion estimation (VBSME) architecture with integer pixel accuracy is proposed in this paper. This proposed architecture has following features: (1) Through widening data path from the search area memories, m processing element groups (PEG) could be scheduled to work in parallel and fully utilized, where m is a factor of sixteen. Each PEG has sixteen processing elements (PE) and just costs 8.5K gates. This feature provides users more flexibility to make tradeoff between the hardware cost and the performance. (2) Based on pipelining and multi-cycle data path techniques, this architecture can work at high clock frequency. (3) The memory partition number is greatly reduced. When sixteen PEGs are adopted, only two memory partitions are required for the search area data storage. Therefore, both the system hardware cost and power consumption can be saved. A 16-PEG design with 4832 search range has been implemented with TSMC 0.18 µm CMOS technology. In typical work conditions, its maximum clock frequency is 261 MHz. Compared with the previous 2-D architecture [9], about 13.4% hardware cost and 5.7% power consumption can be saved.

  • Geometrical, Physical and Text/Symbol Analysis Based Approach of Traffic Sign Detection System

    Yangxing LIU  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E90-D No:1
      Page(s):
    208-216

    Traffic sign detection is a valuable part of future driver support system. In this paper, we present a novel framework to accurately detect traffic signs from a single color image by analyzing geometrical, physical and text/symbol features of traffic signs. First, we utilize an elaborate edge detection algorithm to extract edge map and accurate edge pixel gradient information. Then, we extract 2-D geometric primitives (circles, ellipses, rectangles and triangles) efficiently from image edge map. Third, the candidate traffic sign regions are selected by analyzing the intrinsic color features, which are invariant to different illumination conditions, of each region circumvented by geometric primitives. Finally, a text and symbol detection algorithm is introduced to classify true traffic signs. Experimental results demonstrated the capabilities of our algorithm to detect traffic signs with respect to different size, shape, color and illumination conditions.

  • Fast Methods to Estimate Clock Jitter due to Power Supply Noise

    Koutaro HACHIYA  Takayuki OHSHIMA  Hidenari NAKASHIMA  Masaaki SODA  Satoshi GOTO  

     
    PAPER

      Vol:
    E90-A No:4
      Page(s):
    741-747

    In this paper, we propose two methods to estimate clock jitter caused by power supply noise in a LSI (Large-Scale Integrated circuit). One of the methods enables estimation of clock jitter at the initial design stage before floor-planning. The other method reduces simulation time of clock distribution network to analyze clock jitter at the design verification stage after place-and-route of the chip. For an example design, the relative difference between clock jitter estimated at the initial design stage and that of the design verification stage is 23%. The example result also shows that the proposed method for the verification stage is about 24 times faster than the conventional one to analyze clock jitter.

  • Lossy Strict Multilevel Successive Elimination Algorithm for Fast Motion Estimation

    Yang SONG  Zhenyu LIU  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E90-A No:4
      Page(s):
    764-770

    This paper presents a simple and effective method to further reduce the search points in multilevel successive elimination algorithm (MSEA). Because the calculated sea values of those best matching search points are much smaller than the current minimum SAD, we can simply increase the calculated sea values to increase the elimination ratio without much affecting the coding quality. Compared with the original MSEA algorithm, the proposed strict MSEA algorithm (SMSEA) can provide average 6.52 times speedup. Compared with other lossy fast ME algorithms such as TSS and DS, the proposed SMSEA can maintain more stable image quality. In practice, the proposed technique can also be used in the fine granularity SEA (FGSEA) algorithm and the calculation process is almost the same.

  • Periodic Spectrum Transmission for Single-Carrier Transmission Frequency-Domain Equalization

    Fumiaki MAEHARA  Satoshi GOTO  Fumio TAKAHATA  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E90-B No:6
      Page(s):
    1407-1414

    This paper proposes a frequency diversity scheme using only even-numbered samples for single-carrier transmission with frequency-domain equalization (SC-FDE). In the proposed scheme, a periodical frequency spectrum generated by using only even-numbered samples in the time domain provides the frequency redundancy, which is utilized for frequency diversity. Moreover, in order to avoid the data rate reduction due to the decrease in the samples within one block, the high-level modulation is applied to each sample and the transmitting power of each sample can be doubled for the equivalent power transmission instead. Computer simulation results show that the proposed scheme provides a steeper BER curve than the typical SC-FDE over frequency selective fading channels, while the typical SC-FDE is more favorable than the proposed scheme over flat fading channels. Moreover, the proposed scheme still retains its characteristic even when channel estimation and channel coding are additionally taken into account.

  • Low-Power Partial Distortion Sorting Fast Motion Estimation Algorithms and VLSI Implementations

    Yang SONG  Zhenyu LIU  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E90-D No:1
      Page(s):
    108-117

    This paper presents two hardware-friendly low-power oriented fast motion estimation (ME) algorithms and their VLSI implementations. The basic idea of the proposed partial distortion sorting (PDS) algorithm is to disable the search points which have larger partial distortions during the ME process, and only keep those search points with smaller ones. To further reduce the computation overhead, a simplified local PDS (LPDS) algorithm is also presented. Experiments show that the PDS and LPDS algorithms can provide almost the same image quality as full search only with 36.7% computation complexity. The proposed two algorithms can be integrated into different FSBMA architectures to save power consumption. In this paper, the 1-D inter ME architecture [12] is used as an detailed example. Under the worst working conditions (1.62 V, 125) and 166 MHz clock frequency, the PDS algorithm can reduce 33.3% power consumption with 4.05 K gates extra hardware cost, and the LPDS can reduce 37.8% power consumption with 1.73 K gates overhead.

  • An Integrated Hole-Filling Algorithm for View Synthesis

    Wenxin YU  Weichen WANG  Minghui WANG  Satoshi GOTO  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1306-1314

    Multi-view video can provide users with three-dimensional (3-D) and virtual reality perception through multiple viewing angles. In recent years, depth image-based rendering (DIBR) has been generally used to synthesize virtual view images in free viewpoint television (FTV) and 3-D video. To conceal the zero-region more accurately and improve the quality of a virtual view synthesized frame, an integrated hole-filling algorithm for view synthesis is proposed in this paper. The proposed algorithm contains five parts: an algorithm for distinguishing different regions, foreground and background boundary detection, texture image isophotes detection, a textural and structural isophote prediction algorithm, and an in-painting algorithm with gradient priority order. Based on the texture isophote prediction with a geometrical principle and the in-painting algorithm with a gradient priority order, the boundary information of the foreground is considerably clearer and the texture information in the zero-region can be concealed much more accurately than in previous works. The vision quality mainly depends on the distortion of the structural information. Experimental results indicate that the proposed algorithm improves not only the objective quality of the virtual image, but also its subjective quality considerably; human vision is also clearly improved based on the subjective results. In particular, the algorithm ensures the boundary contours of the foreground objects and the textural and structural information.

  • Content Adaptive Hierarchical Decision of Variable Coding Block Sizes in High Efficiency Video Coding for High Resolution Videos

    Guifen TIAN  Xin JIN  Satoshi GOTO  

     
    PAPER-Digital Signal Processing

      Vol:
    E96-A No:4
      Page(s):
    780-789

    The quadtree-based variable block sized prediction makes the biggest contribution for dramatically improved coding efficiency in the new video coding standard named HEVC. However, this technique takes about 75–80% computational complexity of an HEVC encoder. This paper brings forward an adaptive scheme that exploits temporal, spatial and transform-domain features to speed up the original quadtree-based prediction, targeting at high resolution videos. Before encoding starts, analysis on utilization ratio of each coding depth is performed to skip rarely adopted coding depths at frame level. Then, texture complexity (TC) measurement is applied to filter out none-contributable coding blocks for each largest coding unit (LCU). In this step, a dynamic threshold setting approach is proposed to make filtering adaptable to videos and coding parameters. Thirdly, during encoding process, sum of absolute quantized residual coefficient (SAQC) is used as criterion to prune useless coding blocks for both LCUs and 3232 blocks. By using proposed scheme, motion estimation is performed for prediction blocks within a narrowed range. Experiments show that proposed scheme outperforms existing works and speeds up original HEVC by a factor of up to 61.89% and by an average of 33.65% for 4kx2k video sequences. Meanwhile, the peak signal-to-noise ratio (PSNR) degradation and bit increment are trivial.

  • Pedestrian Detection Using Gradient Local Binary Patterns

    Ning JIANG  Jiu XU  Satoshi GOTO  

     
    PAPER-Coding & Processing

      Vol:
    E95-A No:8
      Page(s):
    1280-1287

    In recent years, local pattern based features have attracted increasing interest in object detection and recognition systems. Local Binary Pattern (LBP) feature is widely used in texture classification and face detection. But the original definition of LBP is not suitable for human detection. In this paper, we propose a novel feature named gradient local binary patterns (GLBP) for human detection. In this feature, original 256 local binary patterns are reduced to 56 patterns. These 56 patterns named uniform patterns are used for generating a 56-bin histogram. And gradient value of each pixel is set as the weight which is always same in LBP based features in histogram calculation to computing the values in 56 bins for histogram. Experiments are performed on INRIA dataset, which shows the proposal GLBP feature is discriminative than histogram of orientated gradient (HOG), Semantic Local Binary Patterns (S-LBP) and histogram of template (HOT). In our experiments, the window size is fixed. That means the performance can be improved by boosting methods. And the computation of GLBP feature is parallel, which make it easy for hardware acceleration. These factors make GLBP feature possible for real-time pedestrian detection.

  • A 48 Cycles/MB H.264/AVC Deblocking Filter Architecture for Ultra High Definition Applications

    Dajiang ZHOU  Jinjia ZHOU  Jiayi ZHU  Satoshi GOTO  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

      Vol:
    E92-A No:12
      Page(s):
    3203-3210

    In this paper, a highly parallel deblocking filter architecture for H.264/AVC is proposed to process one macroblock in 48 clock cycles and give real-time support to QFHD@60 fps sequences at less than 100 MHz. 4 edge filters organized in 2 groups for simultaneously processing vertical and horizontal edges are applied in this architecture to enhance its throughput. While parallelism increases, pipeline hazards arise owing to the latency of edge filters and data dependency of deblocking algorithm. To solve this problem, a zig-zag processing schedule is proposed to eliminate the pipeline bubbles. Data path of the architecture is then derived according to the processing schedule and optimized through data flow merging, so as to minimize the cost of logic and internal buffer. Meanwhile, the architecture's data input rate is designed to be identical to its throughput, while the transmission order of input data can also match the zig-zag processing schedule. Therefore no intercommunication buffer is required between the deblocking filter and its previous component for speed matching or data reordering. As a result, only one 2464 two-port SRAM as internal buffer is required in this design. When synthesized with SMIC 130 nm process, the architecture costs a gate count of 30.2 k, which is competitive considering its high performance.

  • Permutation Network for Reconfigurable LDPC Decoder Based on Banyan Network

    Xiao PENG  Zhixiang CHEN  Xiongxin ZHAO  Fumiaki MAEHARA  Satoshi GOTO  

     
    PAPER

      Vol:
    E93-C No:3
      Page(s):
    270-278

    Since the structured quasi-cyclic low-density parity-check (QC-LDPC) codes for most modern wireless communication systems include multiple code rates, various block lengths, and the corresponding different sizes of submatrices in parity check matrix (PCM), the reconfigurable LDPC decoder is desirable and the permutation network is needed to accommodate any input number (IN) and shift number (SN) for cyclic shift. In this paper, we propose a novel permutation network architecture for the reconfigurable QC-LDPC decoders based on Banyan network. We prove that Banyan network has the nonblocking property for cyclic shift when the IN is power of 2, and give the control signal generating algorithm. Through introducing the bypass network, we put forward the nonblocking scheme for any IN and SN. In addition, we present the hardware design of the control signal generator, which can greatly reduce the hardware complexity and latency. The synthesis results using the TSMC 0.18 µm library demonstrate that the proposed permutation network can be implemented with the area of 0.546 mm2 and the frequency of 292 MHz.

  • Fast SAO Estimation Algorithm and Its Implementation for 8K×4K @ 120 FPS HEVC Encoding

    Jiayi ZHU  Dajiang ZHOU  Shinji KIMURA  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E97-A No:12
      Page(s):
    2488-2497

    High efficiency video coding (HEVC) is the new generation video compression standard. Sample adaptive offset (SAO) is a new compression tool adopted in HEVC which reduces the distortion between original samples and reconstructed samples. SAO estimation is the process of determining SAO parameters in video encoding. It is divided into two phases: statistic collection and parameters determination. There are two difficulties for VLSI implementation of SAO estimation. The first is that there are huge amount of samples to deal with in statistic collection phase. The other is that the complexity of Rate Distortion Optimization (RDO) in parameters determination phase is very high. In this article, a fast SAO estimation algorithm and its corresponding VLSI architecture are proposed. For the first difficulty, we use bitmaps to collect statistics of all the 16 samples in one 4×4 block simultaneously. For the second difficulty, we simplify a series of complicated procedures in HM to balance the algorithms complexity and BD-rate performance. Experimental results show that the proposed algorithm maintains the picture quality improvement. The VLSI design based on this algorithm can be implemented using 156.32K gates, 8,832bits single port RAM for 8bits depth case. It can be synthesized to 400MHz @ 65nm technology and is capable of 8K×4K @ 120fps encoding.

  • Real-Time UHD Background Modelling with Mixed Selection Block Updates

    Axel BEAUGENDRE  Satoshi GOTO  Takeshi YOSHIMURA  

     
    PAPER-IMAGE PROCESSING

      Vol:
    E100-A No:2
      Page(s):
    581-591

    The vast majority of foreground detection methods require heavy hardware optimization to process in real-time standard definition videos. Indeed, those methods process the whole frame for the detection but also for the background modelling part which makes them resource-guzzlers (time, memory, etc.) unable to be applied to Ultra High Definition (UHD) videos. This paper presents a real-time background modelling method called Mixed Block Background Modelling (MBBM). It is a spatio-temporal approach which updates the background model by carefully selecting block by a linear and pseudo-random orders and update the corresponding model's block parts. The two block selection orders make sure that every block will be updated. For foreground detection purposes, the method is combined with a foreground detection designed for UHD videos such as the Adaptive Block-Propagative Background Subtraction method. Experimental results show that the proposed MBBM can process 50min. of 4K UHD videos in less than 6 hours. while other methods are estimated to take from 8 days to more than 21 years. Compared to 10 state-of-the-art foreground detection methods, the proposed MBBM shows the best quality results with an average global quality score of 0.597 (1 being the maximum) on a dataset of 4K UHDTV sequences containing various situation like illumination variation. Finally, the processing time per pixel of the MBBM is the lowest of all compared methods with an average of 3.18×10-8s.

41-60hit(92hit)