Author Search Result

[Author] Yu LIU(46hit)

1-20hit(46hit)

  • Synchronization Mechanism for Timed/Untimed Mixed-Signal System Level Design Environment

    Yu LIU  Satoshi KOMATSU  Masahiro FUJITA  

     
    PAPER

      Vol:
    E89-A No:4
      Page(s):
    1018-1026

    Recently, system level design languages (SLDL), which can describe both hardware and software aspects of the design, are receiving attention. Mixed-signal extensions of SLDL enable current discrete-oriented SLDLs to describe and simulate not only digital systems but also digital-analog mixed-signal systems. The synchronization between discrete and continuous behaviors is widely regarded as a critical part in the extensions. In this paper, we present an event-driven synchronization mechanism for both timed and untimed system level designs through which discrete and continuous behaviors are synchronized via AD events and DA events. We also demonstrate how the synchronization mechanism can be incorporated into the kernel of SLDL, such as SpecC. In the extended kernel, a new simulation cycle, the AMS cycle, is introduced. Three case studies show that the extended SpecC-based system level design environment using our synchronization mechanism works well with timed/untimed mixed-signal system level description.

  • Hardware Oriented Enhanced Category Determination Based on CTU Boundary Deblocking Strength Prediction for SAO in HEVC Encoder

    Gaoxing CHEN  Zhenyu PEI  Zhenyu LIU  Takeshi IKENAGA  

     
    PAPER-Digital Signal Processing

      Vol:
    E99-A No:4
      Page(s):
    788-797

    High efficiency video coding (HEVC) is a video compression standard that outperforms its predecessor H.264/AVC by doubling the compression efficiency. To enhance the coding accuracy, HEVC adopts sample adaptive offset (SAO), which reduces the distortion of reconstructed pixels using classification-based non-linear filtering. In the traditional coding tree unit (CTU) grain based VLSI encoder implementation, during the pixel classification stage, SAO cannot use the raw samples on the boundary of the current CTU because these pixels have not yet been processed by the deblocking filter (DF). This paper proposes a hardware-oriented category determination algorithm based on estimating the deblocking strengths on CTU boundaries and selectively adopting the promising samples in these areas during SAO classification. Compared with the HEVC test model (HM11.0), experimental results indicate that the proposed method achieves average BD-bitrate reductions of 0.13%, 0.14%, and 0.12% (equivalent to PSNR increases of 0.0055 dB, 0.0058 dB, and 0.0097 dB) for CTU sizes of 64 × 64, 32 × 32, and 16 × 16, respectively.

  • Low-Power Partial Distortion Sorting Fast Motion Estimation Algorithms and VLSI Implementations

    Yang SONG  Zhenyu LIU  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E90-D No:1
      Page(s):
    108-117

    This paper presents two hardware-friendly, low-power oriented fast motion estimation (ME) algorithms and their VLSI implementations. The basic idea of the proposed partial distortion sorting (PDS) algorithm is to disable the search points that have larger partial distortions during the ME process and keep only those with smaller ones. To further reduce the computation overhead, a simplified local PDS (LPDS) algorithm is also presented. Experiments show that the PDS and LPDS algorithms provide almost the same image quality as full search with only 36.7% of the computational complexity. The two proposed algorithms can be integrated into different FSBMA architectures to save power. In this paper, the 1-D inter ME architecture [12] is used as a detailed example. Under the worst working conditions (1.62 V, 125 °C) and a 166 MHz clock frequency, the PDS algorithm reduces power consumption by 33.3% at an extra hardware cost of 4.05 K gates, and the LPDS algorithm reduces power consumption by 37.8% with an overhead of 1.73 K gates.

  • A Novel Cache Replacement Policy via Dynamic Adaptive Insertion and Re-Reference Prediction

    Xi ZHANG  Chongmin LI  Zhenyu LIU  Haixia WANG  Dongsheng WANG  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    468-476

    Previous research shows that the LRU replacement policy is inefficient when applications exhibit a distant re-reference interval. Recently, the RRIP policy was proposed to improve performance for such workloads. However, the lack of access recency information in RRIP prevents the replacement policy from making accurate predictions. To enhance the robustness of RRIP for recency-friendly workloads, we propose a Dynamic Adaptive Insertion and Re-reference Prediction (DAI-RRP) policy, which evicts data based on both the re-reference prediction value and the access recency information. DAI-RRP adaptively adjusts the insertion position and prediction value for different access patterns, which makes the policy robust across different workloads and different phases. Simulation results show that DAI-RRP outperforms LRU and RRIP. For a single-core processor with a 1 MB 16-way set last-level cache (LLC), DAI-RRP reduces CPI over LRU and Dynamic RRIP by an average of 8.1% and 2.7% respectively. Evaluations on a quad-core CMP with a 4 MB shared LLC show that DAI-RRP outperforms LRU and Dynamic RRIP (DRRIP) on the weighted speedup metric by an average of 8.1% and 15.7% respectively. Furthermore, compared to LRU, DAI-RRP consumes similar hardware for a 16-way cache, or even less hardware for a high-associativity cache. In summary, the proposed policy is practical and can be easily integrated into existing hardware approximations of LRU.

  • Texture Mapping Polygons Using Scanline Mapping Geometry

    Chung-Yu LIU  Tsorng-Lin CHIA  Yibin LU  

     
    PAPER-Computer Graphics

      Vol:
    E84-D No:9
      Page(s):
    1257-1265

    This work presents a novel, scanline-based geometric description of texture mapping for polygons, together with a simplified mapping function that improves performance. Conventional perspective-correct mapping requires costly division operations. In this work, two concepts from perspective geometry, the cross-ratio and the vanishing point, are exploited to simplify the mapping function. We substitute the point at infinity on the scanline into the cross-ratio equation and obtain a simple description of perspective mapping in polygons. Our mapping function maps a pixel from a scanline on the screen plane to the texture plane using only one division, one multiplication and three additions. The proposed algorithm speeds up the mapping process without losing any correctness. Experimental results indicate that the performance of the proposed method is superior to that of other correct mapping methods.

  • Bayesian Learning-Assisted Joint Frequency Tracking and Channel Estimation for OFDM Systems

    Hong-Yu LIU  

     
    PAPER-Communication Theory and Signals

      Publicized:
    2023/03/30
      Vol:
    E106-A No:10
      Page(s):
    1336-1342

    Orthogonal frequency division multiplexing (OFDM) is very sensitive to the carrier frequency offset (CFO), and the precision of CFO estimation heavily affects OFDM performance. In this paper, a new Bayesian learning-assisted joint CFO tracking and channel impulse response estimation algorithm is proposed. The proposed algorithm is modified from a Bayesian learning-assisted estimation (BLAE) algorithm in the literature. The BLAE is expectation-maximization (EM)-based and achieves an estimator mean square error (MSE) lower than the Cramer-Rao bound (CRB) when the CFO value is near zero. However, its MSE may increase quickly as the CFO value moves away from zero. Hence, the CFO estimator of the BLAE is replaced to solve this problem. Originally, the design criterion of the single-time-sample (STS) CFO estimator in the literature is maximum likelihood (ML)-based. Its MSE performance can reach the CRB, and its CFO estimation range covers the widest range required for a CFO tracking estimator. For a CFO normalized by the sub-carrier spacing, the widest tracking range required is from -0.5 to +0.5. Here, we apply the STS CFO estimator design method to the EM-based Bayesian learning framework. The resultant Bayesian learning-assisted STS algorithm achieves an MSE lower than the CRB, and its CFO estimation range is ±0.5. Compared with the ML-based design criterion, this Bayesian learning design criterion additionally requires the channel noise power and the power delay profile to be estimated. With this additional channel statistical information, the derived algorithm achieves an MSE better than the CRB. Two frequency-selective channels are adopted for computer simulations: one has fixed tap weights, and the other is Rayleigh fading. Comparisons with the most closely related algorithms are also provided.

  • Find the 'Best' Solution from Multiple Analog Topologies via Pareto-Optimality

    Yu LIU  Masato YOSHIOKA  Katsumi HOMMA  Toshiyuki SHIBUYA  

     
    PAPER-Device and Circuit Modeling and Analysis

      Vol:
    E92-A No:12
      Page(s):
    3035-3043

    This paper presents a novel method that uses a multi-objective optimization algorithm to automatically find the best solution from a topology library of analog circuits. First, the method extracts the Pareto-front of each topology in the library by SPICE simulation. Then, the Pareto-front of the topology library is extracted from the individual Pareto-fronts of the topologies in the library, following a theorem we proved. The best solution, defined as the point on the Pareto-front of the topology library nearest to the specification, is then calculated by equations derived from the collinearity theorem. After a local search using the Nelder-Mead method maps the calculated best solution back to the design variable space, the non-dominated best solution is obtained. Compared with traditional optimization methods that use single-objective optimization algorithms, this work can efficiently find the best non-dominated solution from multiple topologies for different specifications without additional time-consuming optimization iterations. The experiments demonstrate that this method is feasible and practical in actual analog designs, especially for uncertain or variant multi-dimensional specifications.
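    The step of combining per-topology Pareto-fronts into one library front can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes every objective is minimized, uses a brute-force dominance check rather than the theorem mentioned in the abstract, and picks the nearest point by plain Euclidean distance instead of the collinearity-based equations. The function names and the sample data are hypothetical.

```python
import math

def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one.
    Assumption: all objectives are minimized."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def library_front(fronts):
    """Merge per-topology fronts and keep only globally non-dominated points.
    `fronts` maps a topology name to a list of objective tuples."""
    candidates = [(name, pt) for name, pts in fronts.items() for pt in pts]
    return [(name, pt) for name, pt in candidates
            if not any(dominates(other, pt) for _, other in candidates)]

def nearest_to_spec(front, spec):
    """Pick the front point closest to the specification (Euclidean distance,
    an assumption standing in for the paper's collinearity-based equations)."""
    return min(front, key=lambda item: math.dist(item[1], spec))

# Hypothetical two-objective fronts for two topologies.
fronts = {"A": [(1, 5), (3, 3)], "B": [(2, 4), (4, 4), (5, 1)]}
merged = library_front(fronts)
best = nearest_to_spec(merged, (3, 3.5))
```

    In this toy data, point (4, 4) of topology B is dominated by (3, 3) of topology A, so it drops out of the library front even though it lies on the front of its own topology.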

  • A Mode Mapping and Optimized MV Conjunction Based H.264/SVC to H.264/AVC Transcoder with Medium-Grain Quality Scalability for Videoconferencing

    Lei SUN  Zhenyu LIU  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E97-A No:2
      Page(s):
    501-509

    Scalable Video Coding (SVC) is an extension of H.264/AVC that aims to provide the ability to adapt to heterogeneous networks or requirements. It offers great flexibility for bitstream adaptation in multi-point applications such as videoconferencing. However, transcoding between SVC and AVC is necessary due to the existence of legacy AVC-based systems. The straightforward re-encoding method incurs a high computational cost, and delay-sensitive applications like videoconferencing require a much faster transcoding scheme. This paper proposes a 3-stage fast SVC-to-AVC transcoder with medium-grain quality scalability (MGS) for videoconferencing applications. A hierarchical-P structured SVC bitstream is transcoded into an IPPP structured AVC bitstream with multiple reference frames. In the first stage, mode decision is accelerated by the proposed SVC-to-AVC mode mapping scheme. In the second stage, INTER motion estimation is accelerated by an optimized motion vector (MV) conjunction method that predicts the MV with a reduced search range. In the last stage, Hadamard-based all zero block (AZB) detection is utilized for early termination. Simulation results show that the proposed transcoder achieves coding efficiency very similar to the optimal result while saving 89.6% of the computational time on average.

  • A Power Analysis Attack Countermeasure Based on Random Data Path Execution For CGRA

    Wei GE  Shenghua CHEN  Benyu LIU  Min ZHU  Bo LIU  

     
    PAPER-Computer System

      Publicized:
    2020/02/10
      Vol:
    E103-D No:5
      Page(s):
    1013-1022

    Side-channel attacks, such as simple power analysis and differential power analysis (DPA), are an efficient way to recover the key and thus challenge the security of crypto chips. A side-channel attack logs the power trace of the crypto chip and infers the key by statistical analysis. To reduce the threat of power analysis attacks, an innovative method based on random execution and register randomization is proposed in this paper. To strengthen resistance against DPA, the method disrupts the correspondence between power trace and operands by scrambling the data execution sequence randomly and dynamically, and randomizes the data operation path so that the registers storing intermediate data are also randomized. Experiments and verification are done on the Sakura-G FPGA platform. The results show that, with the proposed method, the key is not revealed even after 2 million power traces, at a cost of only 7.23% slice overhead and 3.4% throughput. Compared with an unprotected chip, the measurements to disclosure increase by more than 4000×.

  • Improved Trajectory Estimation of Reentry Vehicles from Radar Measurements Using On-Line Adaptive Input Estimator

    Sou-Chen LEE  Cheng-Yu LIU  

     
    PAPER-Control and Adaptive Systems

      Vol:
    E81-A No:9
      Page(s):
    1867-1876

    Modeling error is the major concern in trajectory estimation. This paper formulates the dynamic model of a reentry vehicle in the reentry phase for identification, with an unmodeled acceleration input covering possible model errors. Moreover, this work presents a novel on-line estimation approach, the adaptive filter, to identify the trajectory of a reentry vehicle from the measured data of a single radar. The proposed approach combines the extended Kalman filter and a recursive least-squares estimator of the input with a hypothesis testing scheme. The recursive least-squares estimator is provided not only to extract the magnitude of the unmodeled input but also to offer a testing criterion for detecting the onset and presence of the input. Numerical simulation demonstrates the superior accuracy and robustness of the proposed method. In real flight analysis, the adaptive filter also delivers excellent estimation and prediction performance. The recommended trajectory estimation method can support defense and tactical operations for anti-tactical ballistic missile warfare.

  • Segmentation of the Speaker's Face Region with Audiovisual Correlation

    Yuyu LIU  Yoichi SATO  

     
    PAPER-Multimedia Pattern Processing

      Vol:
    E93-D No:7
      Page(s):
    1965-1975

    The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows, which is robust against the changes of view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing quadratic mutual information between our audiovisual features. The computation of quadratic mutual information is based on the probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of this audiovisual correlation analysis are incorporated into graph cut-based video segmentation to resolve a globally optimum extraction of the speaker's face region. The setting of any heuristic threshold in this segmentation is avoided by learning the correlation distributions of speaker and background by expectation maximization. Experimental results demonstrate that our method can detect the speaker's face region accurately and robustly for different views, scales, and backgrounds.

  • lcyanalysis: An R Package for Technical Analysis in Stock Markets

    Chun-Yu LIU  Shu-Nung YAO  Ying-Jen CHEN  

     
    PAPER-Office Information Systems, e-Business Modeling

      Publicized:
    2019/03/26
      Vol:
    E102-D No:7
      Page(s):
    1332-1341

    With advances in information technology and the development of big data, manual operation is unlikely to be a smart choice for stock market investing. Instead, a computer-based investment model is expected to bring investors more accurate strategic analysis and more effective investment decisions than human beings can. This paper aims to improve investor profits by mining critical information from stock data, thereby helping big data analysis. We used the R language to find technical indicators in the stock market and then applied these indicators to prediction. The proposed R package includes several analysis toolkits, such as trend line indicators, W type reversal patterns, V type reversal patterns, and bull or bear market detection. The simulation results suggest that the developed R package can accurately present the tendency of the price and enhance the return on investment.

  • A VLSI Architecture for Variable Block Size Motion Estimation in H.264/AVC with Low Cost Memory Organization

    Yang SONG  Zhenyu LIU  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-VLSI Architecture

      Vol:
    E89-A No:12
      Page(s):
    3594-3601

    A one-dimensional (1-D) full search variable block size motion estimation (VBSME) architecture is presented in this paper. By properly choosing the partial sum of absolute differences (SAD) registers and scheduling the addition operations, the architecture can be implemented with simple control logic and regular workflow. Moreover, only one single-port SRAM is used to store the search area data. The design is realized in TSMC 0.18 µm 1P6M technology with a hardware cost of 67.6K gates. In typical working conditions (1.8 V, 25 °C), a clock frequency of 266 MHz can be achieved.

  • Content-Aware Write Reduction Mechanism of 3D Stacked Phase-Change RAM Based Frame Store in H.264 Video Codec System

    Sanchuan GUO  Zhenyu LIU  Guohong LI  Takeshi IKENAGA  Dongsheng WANG  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1273-1282

    An H.264 video codec system requires a large-capacity, high-bandwidth Frame Store (FS) for buffering reference frames. The up-to-date three dimensional (3D) stacked Phase change Random Access Memory (PRAM) is a promising approach for on-chip caching of the reference signals, as 3D stacking offers high memory bandwidth, while PRAM offers high density and low leakage power. However, the write endurance problem, that is, a PRAM cell can tolerate only a limited number of write operations, is the main barrier in practical applications. This paper studies wear reduction techniques for a PRAM based FS in an H.264 codec system. On the basis of rate-distortion theory, content oriented selective writing mechanisms are proposed to reduce bit updates in the reference frame buffers. With the proposed control parameter a, our methods make a quantitative trade-off between quality degradation and PRAM lifetime prolongation. Specifically, taking a in the range of [0.2,2], experimental results demonstrate that our methods save 29.9–35.5% of bit-wise write operations on average and reduce power by 52–57%, at the cost of a 12.95–20.57% BDBR bit-rate increase accordingly.

  • A Drift-Constrained Frequency-Domain Ultra-Low-Delay H.264/SVC to H.264/AVC Transcoder with Medium-Grain Quality Scalability for Videoconferencing

    Lei SUN  Zhenyu LIU  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1253-1263

    Scalable Video Coding (SVC) is an extension of H.264/AVC that aims to provide the ability to adapt to heterogeneous networks or requirements. It offers great flexibility for bitstream adaptation in multi-point applications such as videoconferencing. However, transcoding between SVC and AVC is necessary due to the existence of legacy AVC-based systems. The straightforward re-encoding method incurs a high computational cost, and delay-sensitive applications like videoconferencing require a much faster transcoding scheme. This paper proposes an ultra-low-delay SVC-to-AVC MGS (Medium-Grain quality Scalability) transcoder for videoconferencing applications. Transcoding is performed purely in the frequency domain with partial decoding/encoding in order to achieve significant speed-up. Three fast frequency-domain transcoding methods are proposed for macroblocks with different coding modes in non-KEY pictures. KEY pictures are transcoded by reusing the base layer motion data, and error propagation is constrained between KEY pictures. Simulation results show that the proposed transcoder achieves an average speed-up of 38.5 times over the re-encoding method, while introducing merely 0.71 dB BDPSNR coding quality loss for videoconferencing sequences.

  • A High-Efficiency FIR Filter Design Combining Cyclic-Shift Synthesis with Evolutionary Optimization

    Xiangdong HUANG  Jingwen XU  Jiexiao YU  Yu LIU  

    This paper has been cancelled due to violation of duplicate submission policy on IEICE Transactions on Communications
     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2018/08/13
      Vol:
    E102-B No:2
      Page(s):
    266-276

    To optimize the performance of FIR filters that have low computational complexity, this paper proposes a hybrid design consisting of two optimization levels. The first optimization level is based on cyclic-shift synthesis, in which all possible sub filters (or windowed sub filters) with distinct cyclic shifts are averaged to generate a synthesized filter. Because the ripples of these sub filters' transfer curves can be individually compensated, this synthesized filter attains improved performance (except for two uprushes that occur on the edges of a transition band), and thus this synthesis actually plays the role of ‘natural optimization’. Furthermore, this synthesis process can be equivalently summarized as a 3-step closed-form procedure, which converts the multi-variable optimization into a single-variable optimization. Hence, to suppress the uprushes, the second optimization level (using the Differential Evolution (DE) algorithm) needs only to search for the optimum transition point, which incurs minimal complexity. Owing to the combination of cyclic-shift synthesis and the DE algorithm, our hybrid design, unlike regular evolutionary computing schemes, benefits from a narrowed search space and higher convergence speed. Numerical results also show that the proposed design is superior to the conventional DE design in both filter performance and design efficiency, and that it is comparable to the Remez design.

  • Perceptual Distortion Measure for Polygon-Based Shape Coding

    Zhongyuan LAI  Wenyu LIU  Fan ZHANG  Guang CHENG  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E96-D No:3
      Page(s):
    750-753

    In this paper, we present a perceptual distortion measure (PDM) for polygon-based shape coding. We model the PDM as the salience of relevance triangle, and express the PDM by using three properties derived from the salience of visual part. Performance analysis and experimental results show that our proposal can improve the quality of the shape reconstruction when the object contour has sharp protrusions.

  • Region-Based Prediction Coding for Compression of Noisy Synthetic Images

    Yu LIU  Masayuki NAKAJIMA  

     
    PAPER-Image Processing,Computer Graphics and Pattern Recognition

      Vol:
    E82-D No:2
      Page(s):
    461-467

    Noise greatly degrades image quality and the performance of image compression algorithms. This paper presents an approach for the representation and compression of noisy synthetic images. A new concept, the region-based prediction (RBP) model, is first introduced and then applied to noisy images. In conventional predictive coding techniques, the context for prediction is always composed of individual pixels surrounding the pixel to be processed. The RBP model uses regions instead of individual pixels as the context for prediction. An algorithm for the implementation of RBP is proposed and applied to noisy synthetic images in our experiments. Using RBP to find the residual data and encoding them, we achieve a bit rate of 1.10 bits/pixel for the noisy synthetic image. The decompressed image achieves a peak SNR of 42.59 dB. Compared with a peak SNR of 41.01 dB for the noisy synthetic image, the quality of the decompressed synthetic image is improved by 1.58 dB in the MSE sense. In contrast to our proposed compression algorithm with its improvement in image quality, conventional coding methods can compress image data only at the expense of lower image quality. At the same bit rate, the image compression standard JPEG provides a peak SNR of 33.17 dB for the noisy synthetic image, and the conventional median filter with a 3×3 window provides a peak SNR of 25.89 dB.

  • Scalable VLSI Architecture for Variable Block Size Integer Motion Estimation in H.264/AVC

    Yang SONG  Zhenyu LIU  Satoshi GOTO  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E89-A No:4
      Page(s):
    979-988

    Because of the data correlation in the motion estimation (ME) algorithm of the H.264/AVC reference software, it is difficult to implement an efficient ME hardware architecture. To make parallel processing feasible, four modified hardware-friendly ME workflows are proposed in this paper. Based on these workflows, a scalable full search ME architecture is presented, which has the following characteristics: (1) The sum of absolute differences (SAD) results of 4×4 sub-blocks are accumulated and reused to calculate the SADs of bigger sub-blocks. (2) The number of PE groups is configurable. For a search range of M×N pixels, where M is the width and N is the height, up to M PE groups can be configured to work in parallel with a peak processing speed of N×16 clock cycles to complete a full search variable block size ME (VBSME). (3) Only conventional single port SRAM is required, which makes this architecture suitable for standard-cell-based implementation. A design with 8 PE groups has been realized in TSMC 0.18 µm CMOS technology. The core area is 2.13 mm × 1.60 mm and the clock frequency is 228 MHz in typical conditions (1.8 V, 25 °C).
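    Characteristic (1), reusing 4×4 SADs to obtain the SADs of larger sub-blocks, can be illustrated with a minimal software sketch. It only shows the arithmetic for one 16×16 macroblock at one search position; it does not model the PE groups, scheduling, or memory organization of the architecture, and the function names are hypothetical.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock.
    `cur` and `ref` are 16x16 lists of pixel values."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def merge_sads(grid):
    """Build SADs of larger partitions from the 4x4 SADs using additions only."""
    sad_8x8 = [[grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
                + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
                for c in range(2)] for r in range(2)]
    # 16x8 partitions are 16 wide and 8 tall: top half, bottom half.
    sad_16x8 = [sad_8x8[r][0] + sad_8x8[r][1] for r in range(2)]
    # 8x16 partitions are 8 wide and 16 tall: left half, right half.
    sad_8x16 = [sad_8x8[0][c] + sad_8x8[1][c] for c in range(2)]
    sad_16x16 = sad_16x8[0] + sad_16x8[1]
    return {"8x8": sad_8x8, "16x8": sad_16x8, "8x16": sad_8x16, "16x16": sad_16x16}

# Hypothetical test blocks: arbitrary deterministic pixel patterns.
cur = [[(y * 16 + x) % 7 for x in range(16)] for y in range(16)]
ref = [[(x * 3 + y) % 5 for x in range(16)] for y in range(16)]
sads = merge_sads(sad_4x4_grid(cur, ref))
```

    The pixel differences are computed once, when the 4×4 grid is filled; every larger block size then costs only a few additions, which is the reuse the abstract describes.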

  • Content-Aware Fast Motion Estimation for H.264/AVC

    Zhenyu LIU  Satoshi GOTO  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E91-A No:8
      Page(s):
    1944-1952

    The key to high performance in video coding lies in efficiently reducing temporal redundancies. For this purpose, the H.264/AVC coding standard has adopted variable block size motion estimation on multiple reference frames to improve the coding gain. However, the computational complexity of motion estimation also increases in proportion to the product of the number of reference frames and the number of intermodes. The mathematical analysis in this paper reveals that the prediction errors mainly depend on the image edge gradient amplitude and the quantization parameter. Consequently, this paper proposes an image content based early termination algorithm, which outperforms the original method adopted by the JVT reference software, especially at high and moderate bit rates. In light of rate-distortion theory, this paper also relates the homogeneity of the image to the quantization parameter. For a homogeneous block, the search computation for futile reference frames and intermodes can be efficiently discarded. Therefore, the computation saving increases with the value of the quantization parameter. These content based fast algorithms were integrated with the Unsymmetrical-cross Multihexagon-grid Search (UMHexagonS) algorithm to demonstrate their performance. Compared to the original UMHexagonS fast matching algorithm, 26.14-54.97% of the search time can be saved with an average coding quality degradation of 0.0369 dB.
