Author Search Result

[Author] Bo LIU(46hit)

1-20hit(46hit)

  • Noise-Analysis Based Threshold-Choosing Algorithm in Motion Estimation

    Xiaoying GAN  Shiying SUN  Wentao SONG  Bo LIU  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E88-B No:4
      Page(s):
    1753-1755

    A novel threshold-choosing method for the threshold-based skip mechanism is presented, in which the threshold is obtained by analyzing the noise variance induced by the video device. Simulation results show that the proposed method substantially reduces computation time with only a marginal performance penalty.

  • A Cycle-Accurate Simulator for a Reconfigurable Multi-Media System

    Min ZHU  Leibo LIU  Shouyi YIN  Chongyong YIN  Shaojun WEI  

     
    PAPER

      Vol:
    E93-D No:12
      Page(s):
    3202-3210

    This paper introduces a cycle-accurate Simulator for a dynamically REconfigurable MUlti-media System, called SimREMUS. SimREMUS can be used either at transaction level, which allows the modeling and simulation of higher-level hardware and embedded software, or at register transfer level, if the dynamic system behavior is to be observed at signal level. Trade-offs among a set of criteria that are frequently used to characterize the design of a reconfigurable computing system, such as granularity, programmability, configurability, and the architecture of processing elements and route modules, can be quickly evaluated. Moreover, a complete tool chain for SimREMUS, including a compiler and a debugger, is developed. SimREMUS could simulate 270 k cycles per second for a million-gate SoC (System-on-a-Chip) and produced one H.264 1080p frame in 15 minutes, which might take days on VCS (platform: CPU: E5200@ 2.5 GHz, RAM: 2.0 GB). Simulation showed that 1080p@30 fps of H.264 High Profile@ Level 4 can be achieved at a 200 MHz working frequency on the VLSI architecture of REMUS.

  • Quadratic Compressed Sensing Based SAR Imaging Algorithm for Phase Noise Mitigation

    Xunchao CONG  Guan GUI  Keyu LONG  Jiangbo LIU  Longfei TAN  Xiao LI  Qun WAN  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:6
      Page(s):
    1233-1237

    Synthetic aperture radar (SAR) imagery is significantly degraded by random phase noise generated by frequency jitter of the transmit signal and by atmospheric turbulence. In this paper, we recast the SAR imaging problem with phase-corrupted data as a special case of quadratic compressed sensing (QCS). Although the quadratic measurement model has the potential to mitigate the effects of phase noise, it also leads to a nonconvex, quartic optimization problem. To overcome these challenges and increase reconstruction robustness to phase noise, we propose a QCS-based SAR imaging algorithm that uses greedy local search to exploit the spatial sparsity of scatterers. The proposed algorithm not only avoids precise estimation of the random phase noise but also acquires a sparse representation of the SAR target with high accuracy from the phase-corrupted data. Experiments are conducted on a synthetic scene and on the moving and stationary target recognition Sandia laboratories implementation of cylinders (MSTAR SLICY) target. Simulation results demonstrate the effectiveness and robustness of the proposed SAR imaging algorithm.

  • Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture

    Dajiang LIU  Shouyi YIN  Chongyong YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-Computer Architecture

      Vol:
    E95-D No:12
      Page(s):
    2898-2907

    A reconfigurable computing system is a class of parallel architecture that computes in hardware to increase performance while retaining much of the flexibility of a software solution. This architecture is particularly suitable for regular, compute-intensive tasks, and most compute-intensive tasks spend most of their running time in nested loops. The polyhedron model is a powerful tool for transforming such nested loops. In this paper, a number of issues are addressed towards the goal of optimizing affine loop nests for a reconfigurable cell array (RCA): making the most use of processing elements (PEs) while minimizing the communication volume through loop transformation in the polyhedron model, determining the tiling form by intra-statement dependence analysis, and determining the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the mapping optimization approaches developed. Compared with a DFG-based optimization approach, the execution performance of 1-D Jacobi and matrix multiplication is improved by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.

  • Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture

    Shouyi YIN  Chongyong YIN  Leibo LIU  Min ZHU  Shaojun WEI  

     
    PAPER-Design Methodology

      Vol:
    E95-D No:2
      Page(s):
    335-344

    Coarse-grained reconfigurable architecture (CGRA) combines the performance of application-specific integrated circuits (ASICs) with the flexibility of general-purpose processors (GPPs), making it a promising solution for embedded systems. With the increasing complexity of reconfigurable resources (processing elements, routing cells, I/O blocks, etc.), the reconfiguration cost is becoming the performance bottleneck. The major reconfiguration cost comes from the frequent memory read/write operations that transfer the configuration context from main memory to the context buffer. To improve overall performance, it is critical to reduce the amount of configuration context. In this paper, we propose a configuration context reduction method for CGRA. The proposed method exploits the structural correlation of computation tasks that are mapped onto the CGRA and reduces the redundancies in the configuration context. Experimental results show that the proposed method can, on average, reduce the configuration context size by up to 71% and speed up execution by up to 68%. The proposed method does not depend on any architectural feature and can be applied to a CGRA with an arbitrary architecture.

  • Comparative Study of Head-Disk Spacing Measurement Techniques between Optical Method and Various In-Situ Methods

    Sheng-Bin HU  Zhi-Min YUAN  Wei ZHANG  Bo LIU  Lei WAN  Rui XIAN  

     
    PAPER

      Vol:
    E85-C No:10
      Page(s):
    1784-1788

    The interaction between slider, lubricant and disk surface is becoming the most crucial robustness concern of advanced data storage systems. This paper reports comparative studies of various techniques for measuring head-disk spacing. It is observed that, in the on-track center case, the triple harmonic method gives a reading much closer to the optically obtained head-disk spacing than the PW50 method does. Specially prepared disks with different carbon overcoat thicknesses (6.5 nm, 11 nm, 16 nm and 22 nm) were also used to study the reliability and repeatability of the triple harmonic method.

  • Concurrent Detection and Recognition of Individual Object Based on Colour and p-SIFT Features

    Jienan ZHANG  Shouyi YIN  Peng OUYANG  Leibo LIU  Shaojun WEI  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1357-1365

    In this paper we propose a method that uses the features of an individual object to locate and recognize that object concurrently in a static image, using multi-feature fusion based on a sample library of multiple objects. The method is motivated by the observation that many previous works focus on category recognition and take advantage of the common characteristics of a specific category to detect its presence. However, these algorithms cease to be effective when searching for individual objects rather than categories against a complex background. To solve this problem, we abandon the concept of category and propose an effective way to use the features of an individual object directly as clues for detection and recognition. Our system adopts a multi-feature fusion method based on the colour histogram and the prominent SIFT (p-SIFT) feature to improve detection and recognition accuracy. The p-SIFT feature, which we propose, is an improved SIFT feature acquired by further extracting correlation information based on a Feature Matrix, aiming at low computational complexity with a good matching rate. When detecting an object, we abandon conventional methods and instead make full use of multiple features, starting with a simple but effective step: using the colour feature to reduce the number of patches of interest (POI). Our method is evaluated on several publicly available datasets, including the Pascal VOC 2005 dataset, Objects101 and datasets provided by Achanta et al.

  • CropNET: A Wireless Multimedia Sensor Network for Agricultural Monitoring

    Shouyi YIN  Zhongfu SUN  Leibo LIU  Shaojun WEI  

     
    LETTER

      Vol:
    E93-B No:8
      Page(s):
    2073-2076

    Motivated by the needs of modern agriculture, in this paper we present CropNET, a wireless multimedia sensor network system for agriculture monitoring. Both hardware and software designs of CropNET are tailored for sensing in wide farmland without human supervision. We have carried out multiple rounds of deployments. The evaluation results show that CropNET performs well and facilitates modern agriculture.

  • Multi-Battery Scheduling for Battery-Powered DVS Systems

    Peng OUYANG  Shouyi YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-Energy in Electronics Communications

      Vol:
    E95-B No:7
      Page(s):
    2278-2285

    More and more mobile devices adopt multiple batteries and a dynamic voltage scaling (DVS) policy to reduce energy consumption and extend battery runtime. However, because the nonlinear characteristics of the multi-battery are not considered, the practical efficiency is not good enough. To reduce energy consumption and extend battery runtime, this paper proposes an approach based on the battery characteristics that co-optimizes multi-battery scheduling and dynamic voltage scaling on multi-battery powered systems. Considering the nonlinear discharging characteristics of existing batteries, we use a Markov process to describe the multi-battery discharging behavior, build a multi-objective optimization model to represent the energy consumption and battery states, and then propose a binary-tree-based algorithm to solve this model. With this method, we obtain an optimal and applicable scheme for multi-battery scheduling and dynamic voltage scaling. Experimental results show that this approach achieves an average improvement in battery runtime of 17.5% over current methods in physical implementation.

  • Structure and Mechanics Study of Slider Design for 5-15 nm Head-Disk Spacing

    Gang SHENG  Bo LIU  Wei HUA  

     
    PAPER

      Vol:
    E82-C No:12
      Page(s):
    2125-2131

    An integrated slider-suspension system was designed and prototyped. The structure of this system has a full flying air-bearing surface in the leading part with a contamination-resistant feature, and it accommodates a slider with a 5-15 nm head-disk spacing at the trailing part. Performance analysis and simulation were conducted to validate the high performance of the design. Two key issues, the rigid motions (vibrations) and the elastic motions of the slider, were investigated systematically. For the rigid motions, it was found that the natural frequencies of the slider system depend on the disk contact stiffness and that the slider vibrations under excitation exhibit various nonlinear resonances. For the elastic motions, the average elastic response of the slider body under the random interaction of the interface was derived and characterized.

  • Probability Model and Its Application on the Interaction of Nano-Spaced Slider/Disk Interface

    Wei HUA  Bo LIU  Gang SHENG  

     
    PAPER

      Vol:
    E82-C No:12
      Page(s):
    2139-2147

    The effect of surface roughness is crucial for contact recording and proximity recording. In this paper a probability model is developed to investigate the influence of surface roughness on the flying performance and the contact force of the slider. Simulations are conducted for both the contact recording slider and the proximity recording slider, and the results agree well with both reported experimental results and our own experimental results. Studies are further extended to the characterization of the roughness of the air bearing surface and the disk surface that may support head/disk spacing between 5 nm and 15 nm.

  • Experimental Study of Slider-Disk Interaction in a Nanometer Spaced Head-Disk Interface

    Bo LIU  Yao-Long ZHU  Ying-Hui LI  

     
    PAPER

      Vol:
    E82-C No:12
      Page(s):
    2148-2154

    A head-disk spacing tester that includes the effect of lubricant will be necessary if the slider-disk interaction is to be considered. The interaction and the interaction-induced spacing variation can be quantitatively characterized by an optical method in which the functional disk media is replaced with a glass disk covered with a carbon layer and a lubricant layer of the same materials and the same layer thickness as the functional disk media. This paper reports a tester configuration based on that concept. Experimental investigations into the nanometer-spaced head-disk interface with such a setup are also presented. Results indicate that the lubricant plays an important role in slider-disk interaction and in the vibration of the slider-disk interface. Two types of interface vibration were observed: contact vibration and bouncing vibration. In the bouncing case, the natural frequency of the air bearing and its fold frequencies are excited, and the air bearing plays a more important role in determining the slider vibration than in the contact-vibration case.

  • ABS Designs for Load/Unload and Shock Resistance

    Wei HUA  Ni SHENG  Bo LIU  

     
    PAPER

      Vol:
    E85-C No:10
      Page(s):
    1789-1794

    Load/unload techniques are widely used in mobile hard disk drives, which frequently have to endure external shocks. ABS designs must consider both load/unload performance and shock resistance. Three ABS designs with different positions of the suction force center are studied in simulation. It is observed that when the suction force center moves frontward, the anti-shock performance improves but the unload performance degrades, and vice versa. A slider need not be designed with its suction force center significantly behind its geometric center, as traditional load/unload sliders are. Instead, the suction force center can be placed near the geometric center if a hook limiter is used.

  • Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System

    Tongsheng GENG  Leibo LIU  Shouyi YIN  Min ZHU  Shaojun WEI  

     
    PAPER

      Vol:
    E93-D No:12
      Page(s):
    3223-3231

    This paper proposes approaches to perform HW/SW (Hardware/Software) partition and parallelization of computing-intensive tasks of the H.264 HiP (High Profile) decoding algorithm on an embedded coarse-grained reconfigurable multimedia system, called REMUS (REconfigurable MUltimedia System). Several techniques, such as MB (Macro-Block) based parallelization and unfixed sub-block operation, are utilized to speed up the decoding process, satisfying the requirements of real-time and high-quality H.264 applications. Tests show that the execution performance of MC (Motion Compensation), deblocking, and IDCT-IQ (Inverse Discrete Cosine Transform-Inverse Quantization) on REMUS is improved by 60%, 73%, and 88.5% in the typical case and by 60%, 69%, and 88.5% in the worst case, respectively, compared with that on XPP PACT (a commercial reconfigurable processor). Compared with ASIC solutions, the performance of MC is improved by 70% and 74% in the typical and worst cases, respectively, while that of deblocking remains the same. For IDCT-IQ, the performance is improved by 17% in both the typical and worst cases. Relying on the proposed techniques, 1080p@30 fps of H.264 HiP@ Level 4 decoding could be achieved on REMUS at a 200 MHz working frequency.

  • Battery-Aware Task Mapping for Coarse-Grained Reconfigurable Architecture

    Shouyi YIN  Rui SHI  Leibo LIU  Shaojun WEI  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2524-2535

    Coarse-grained Reconfigurable Architecture (CGRA) is a parallel computing platform that provides both the high performance of hardware and the high flexibility of software. It is becoming a promising platform for embedded and mobile applications. Since embedded and mobile devices are usually battery-powered, improving battery lifetime becomes one of the primary design issues in using CGRAs. In this paper, we propose a battery-aware task-mapping method to optimize energy consumption and improve battery lifetime. The proposed method mainly addresses two problems that arise when mapping applications onto a CGRA: task partitioning and task scheduling. Task partitioning and scheduling are formulated as a joint optimization problem of minimizing the energy consumption. The nonlinear effects of real batteries are taken into account in the problem formulation. Using the insights from the problem formulation, we design the task-mapping algorithm. We have used several real-world benchmarks to test the effectiveness of the proposed method. Experimental results show that our method can dramatically lower the energy consumption and prolong the battery life.

  • An Inductive-Coupling Interconnected Application-Specific 3D NoC Design

    Zhen ZHANG  Shouyi YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2633-2644

    TSV-interconnected 3D chips face problems such as high cost, low yield and large power dissipation. We propose a wireless 3D on-chip-network architecture for application-specific SoC design, using inductive-coupling interconnect instead of TSV for inter-layer communication. The primary design challenge of an inductive-coupling 3D SoC is allocating wireless links in the 3D on-chip network effectively. We develop a design flow that fully exploits the design space opened up by wireless links while offering the user flexible trade-offs. Experimental results show that our design brings great improvement over a uniform design and the Sunfloor algorithm in latency (5% to 20%) and power consumption (10% to 45%).

  • An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform

    Leibo LIU  Dong WANG  Yingjie CHEN  Min ZHU  Shouyi YIN  Shaojun WEI  

     
    PAPER-Computer System

      Publicized:
    2016/02/02
      Vol:
    E99-D No:5
      Page(s):
    1285-1295

    This paper presents the design of a multiple-standard 1080 high-definition (HD) video decoder on a mixed-grained reconfigurable computing platform integrating coarse-grained reconfigurable processing units (RPUs) and FPGAs. The proposed RPU, including 16×16 multi-functional processing elements (PEs), is used to accelerate compute-intensive tasks in video decoding. A soft-core-based microprocessor array is implemented on the FPGA and adopted to speed up the dynamic reconfiguration of the RPU. Furthermore, a mailbox-based communication scheme is utilized to improve the communication efficiency between RPUs and FPGAs. By exploiting dynamic reconfiguration of the RPUs and static reconfiguration of the FPGAs, the proposed platform achieves scalable performance and cost trade-offs to support a variety of video coding standards, including MPEG-2, AVS, H.264, and HEVC. The measured results show that the proposed platform can support H.264 1080 HD video streams at up to 57 frames per second (fps) and HEVC 1080 HD video streams at up to 52 fps at 250 MHz. It also achieves a 3.6× performance gain over an industrial coarse-grained reconfigurable processor for H.264 decoding and a 6.43× performance boost over a general-purpose-processor-based implementation for HEVC decoding.

  • On Finding Maximum Disjoint Paths for Many-to-One Routing in Wireless Multi-Hop Network

    Bo LIU  Junzhou LUO  Feng SHAN  Wei LI  Jiahui JIN  Xiaojun SHEN  

     
    PAPER

      Vol:
    E97-D No:10
      Page(s):
    2632-2640

    Provisioning multiple paths can improve fault tolerance and transport capability of multi-routing in wireless networks. Disjoint paths can improve the diversity of paths and further reduce the risk of simultaneous link failure and network congestion. In this paper we first address a many-to-one disjoint-path problem (MOND) for multi-path routing in a multi-hop wireless network. The objective of this problem is to maximize the minimum number of disjoint paths of every source to the destination. We prove that it is NP-hard to obtain k disjoint paths for every source when k ≥ 3. To solve this problem efficiently, we propose a heuristic algorithm called TOMAN based on network flow theory. Experimental results demonstrate that it outperforms three related algorithms.

  • Battery-Aware Loop Nests Mapping for CGRAs

    Yu PENG  Shouyi YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-Architecture

      Vol:
    E98-D No:2
      Page(s):
    230-242

    Coarse-grained Reconfigurable Architecture (CGRA) is a promising mobile computing platform that provides both high performance and high energy efficiency. In an application, loop nests are usually mapped onto the CGRA for further acceleration, so optimizing the mapping is an important goal in the design of CGRAs. Moreover, since almost all mobile devices are powered by batteries, reducing energy consumption also becomes one of the primary concerns in using CGRAs. This paper makes three contributions: a) proposing an energy consumption model for CGRA; b) formulating the loop nests mapping problem to minimize the battery charge loss; c) extracting an efficient heuristic algorithm called BPMap. Experimental results on most kernels of the benchmarks and on real-life applications show that our methods can improve the performance of the kernels and lower the energy consumption.

  • Parallelization of Computing-Intensive Tasks of SIFT Algorithm on a Reconfigurable Architecture System

    Peng OUYANG  Shouyi YIN  Hui GAO  Leibo LIU  Shaojun WEI  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1393-1402

    The Scale Invariant Feature Transform (SIFT) algorithm is an excellent approach to feature detection. It is characterized by data-intensive computation. Current studies on accelerating the SIFT algorithm fall mainly into three categories: optimizing the parallel parts of the algorithm on general-purpose multi-core processors, designing customized multi-core processors dedicated to SIFT, and implementing it on an FPGA platform. The real-time performance of SIFT has been greatly improved. However, factors such as the input image size, the number of octaves and the scale factors in the SIFT algorithm are restricted in some solutions, and the flexibility needed to ensure high execution performance under variable factors should be improved. This paper proposes a reconfigurable solution to this problem. We fully exploit the algorithm and adopt several techniques, such as full parallel execution, block computation and CORDIC transformation, to improve the execution efficiency on a REconfigurable MUltimedia System called REMUS. Experimental results show that the execution performance of SIFT is improved by 33%, 50% and 8 times compared with execution on the multi-core platform, FPGA and ASIC, respectively. The dynamic reconfiguration scheme in this work can configure the circuits to meet the computation requirements under different input image sizes, numbers of octaves and scale factors during computation.
