Author Search Result

[Author] Ming-Der SHIEH (10 hits)

  • Blind Channel Estimation for SIMO-OFDM Systems without Cyclic Prefix

    Shih-Hao FANG  Ju-Ya CHEN  Ming-Der SHIEH  Jing-Shiun LIN  

     
    LETTER-Communication Theory and Signals

    Vol: E93-A  No: 1  Page(s): 339-343

    A blind channel estimation algorithm based on the subspace method for single-input multiple-output (SIMO) orthogonal frequency division multiplexing (OFDM) systems is proposed in this letter. With the aid of a repetition index, the conventional algorithm is a special case of our algorithm. Compared with related studies, the proposed algorithm reduces the computational complexity of the SVD operation and is suitable for cyclic-prefix-free systems. In particular, the necessary condition of the proposed signal matrix to be full rank can be satisfied with fewer OFDM blocks. Simulation results demonstrate that the proposed algorithm outperforms conventional methods in normalized mean-square error.

  • Novel Algorithms and VLSI Design for Division over GF(2^m)

    Chien-Hsing WU  Chien-Ming WU  Ming-Der SHIEH  Yin-Tsung HWANG  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E85-A  No: 5  Page(s): 1129-1139

    In this paper, we present the division algorithm (DA) for the computation of b = c/a over GF(2^m) in two aspects. First, we derive a new formulation for the discrete-time Wiener-Hopf equation (DTWHE) Ab = c in GF(2) over any basis. Symmetry of the matrix A is observed on some special bases, and a three-step procedure is developed to solve the symmetric DTWHE. Second, we extend a variant of Stein's binary algorithm and propose a novel iterative division algorithm, EB*. Owing to its structural simplicity, this algorithm can be mapped onto a systolic array with high speed and low area complexity.
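
As a point of reference for the operation b = c/a over GF(2^m), the following is a minimal Python sketch that assumes a polynomial basis and obtains the inverse by Fermat exponentiation. It is not the DTWHE-based procedure or the EB* algorithm of the paper. The function names are hypothetical, and the GF(2^8) field polynomial 0x11B used in the usage note is an illustrative choice.

```python
def gf_mul(a: int, b: int, poly: int, m: int) -> int:
    """Multiply two GF(2^m) elements stored as bit-vector integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> m:               # degree reached m: reduce by the field polynomial
            a ^= poly
    return result


def gf_inv(a: int, poly: int, m: int) -> int:
    """Invert a nonzero element as a^(2^m - 2) by square-and-multiply."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    result, base, exp = 1, a, (1 << m) - 2
    while exp:
        if exp & 1:
            result = gf_mul(result, base, poly, m)
        base = gf_mul(base, base, poly, m)
        exp >>= 1
    return result


def gf_div(c: int, a: int, poly: int, m: int) -> int:
    """Compute b = c / a over GF(2^m) as c times the inverse of a."""
    return gf_mul(c, gf_inv(a, poly, m), poly, m)
```

For example, with `poly=0x11B` and `m=8`, `gf_div(c, a, 0x11B, 8)` returns the element b satisfying `gf_mul(a, b, 0x11B, 8) == c`. Exponentiation-based inversion needs on the order of 2m field multiplications, which is the kind of cost that dedicated iterative division algorithms aim to avoid in hardware.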

  • Low-Complexity Memory Access Architectures for Quasi-Cyclic LDPC Decoders

    Ming-Der SHIEH  Shih-Hao FANG  Shing-Chung TANG  Der-Wei YANG  

     
    PAPER-Computer System

    Vol: E95-D  No: 2  Page(s): 549-557

    Partially parallel decoding architectures are widely used in the design of low-density parity-check (LDPC) decoders, especially for quasi-cyclic (QC) LDPC codes. To comply with the code structure of parity-check matrices of QC-LDPC codes, many small memory blocks are conventionally employed in this architecture. The total memory area usually dominates the area requirement of LDPC decoders. This paper proposes a low-complexity memory access architecture that merges small memory blocks into memory groups to reduce the overhead of peripheral circuits in small memory blocks. A simple but efficient algorithm is also presented to handle the additional delay elements introduced in the memory merging method. Experimental results on a rate-1/2 parity-check matrix defined in the IEEE 802.16e standard show that the LDPC decoder designed using the proposed memory access architecture has the lowest area complexity among related studies. Compared to a design with the same specifications, the decoder implemented using the proposed architecture requires 33% fewer gates and is more power-efficient. The proposed memory access architecture is thus suitable for the design of low-complexity LDPC decoders.

  • Exploring General Memory Structures in Turbo Decoders Using Sliding-Window MAP Algorithm

    Chien-Ming WU  Ming-Der SHIEH  Chien-Hsing WU  

     
    PAPER-Communication Devices/Circuits

    Vol: E86-B  No: 11  Page(s): 3163-3173

    Turbo coding is a powerful coding technique that can provide highly reliable data transmission at extremely low signal-to-noise ratios. Owing to the computational complexity of the employed decoding algorithm, the realization of turbo decoders usually requires a large amount of memory and potentially a long decoding delay. Therefore, an efficient memory management strategy becomes one of the key factors in successfully implementing turbo decoders. This paper focuses on the development of general structures for efficient memory management of turbo decoders employing the sliding-window (Log-) MAP algorithm. Three different structures and the associated mathematical representations are derived to evaluate the required memory size, average decoding rate, and latency based on the speed and the number of the adopted processors. Comparative results show how the resulting performance depends on a set of parameters, thus providing useful and general information for practical implementations of turbo decoders.

  • Reducing Interconnect Complexity for Efficient Path Metric Memory Management in Viterbi Decoders

    Ming-Der SHIEH  Tai-Ping WANG  Chien-Ming WU  

     
    PAPER-VLSI Systems

    Vol: E91-D  No: 9  Page(s): 2300-2311

    We present a systematic and efficient way of managing the path metric memory and simplifying its connection network to the add-compare-select unit (ACSU) for Viterbi decoder (VD) design. Using the derived equations for memory partition and add-compare-select (ACS) arrangement together with the extended in-place scheduling scheme proposed in this work, we can increase the memory bandwidth for conflict-free path metric accesses with hardwired interconnection between the path metric memory and the ACSU. Compared with existing work, the developed architecture possesses the following advantages: (1) Each partitioned memory bank can be treated as a local memory of a specific processing element inside the ACSU, with hardwired interconnection, so that the interconnect complexity is reduced significantly. (2) The partitioned memory banks can be merged into only two pseudo-banks regardless of the number of adopted ACS processing elements. This not only greatly simplifies the design of the address generation unit, but also reduces the physical size of the required memory. (3) The implementation can be accomplished in a systematic way with regular and simple control circuitry. Experimental results demonstrate the effectiveness of the developed architecture, and the benefit becomes more apparent for convolutional codes with large memory order.
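
For readers unfamiliar with the add-compare-select step whose memory traffic this entry is concerned with, the following minimal Python sketch shows one ACS operation for a single trellis state. It illustrates only the generic operation and assumes that smaller metrics are better. The function name and argument names are hypothetical, and the paper's memory partitioning and in-place scheduling are not modeled.

```python
def acs(pm_a: int, bm_a: int, pm_b: int, bm_b: int) -> tuple[int, int]:
    """One add-compare-select step for a state with two incoming branches.

    pm_a, pm_b: path metrics of the two predecessor states (read from memory).
    bm_a, bm_b: branch metrics of the transitions into the current state.
    Returns the surviving path metric and a decision bit (0 = branch a, 1 = branch b).
    """
    cand_a = pm_a + bm_a         # add
    cand_b = pm_b + bm_b
    if cand_a <= cand_b:         # compare
        return cand_a, 0         # select branch a
    return cand_b, 1             # select branch b
```

Each call reads two path metrics and writes one back, so a decoder with several ACS processing elements must fetch many path metrics per cycle without bank conflicts, which is the access pattern that a path metric memory organization has to serve.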

  • Design and Implementation of a Low-Complexity Reed-Solomon Decoder for Optical Communication Systems

    Ming-Der SHIEH  Yung-Kuei LU  

     
    PAPER-Computer System

    Vol: E94-D  No: 8  Page(s): 1557-1564

    A low-complexity Reed-Solomon (RS) decoder design based on the modified Euclidean (ME) algorithm proposed by Truong is presented in this paper. Low complexity is achieved by reformulating Truong's ME algorithm using the proposed polynomial manipulation scheme so that a more compact polynomial representation can be derived. Together with the developed folding scheme and simplified boundary cell, the resulting design effectively reduces the hardware complexity while meeting the throughput requirements of optical communication systems. Experimental results demonstrate that the developed RS(255, 239) decoder, implemented in the TSMC 0.18 µm process, can operate at up to 425 MHz and achieve a throughput rate of 3.4 Gbps with a total gate count of 11,759. Compared to related works, the proposed decoder has the lowest area requirement and the smallest area-time complexity.
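
The reported clock frequency and throughput can be related by a short calculation. The Python sketch below assumes that the decoder consumes one 8-bit symbol per clock cycle (RS(255, 239) is defined over GF(2^8)). That rate is an assumption chosen because it is consistent with the quoted figures, not a detail stated in the abstract, and the function name is hypothetical.

```python
def throughput_gbps(clock_mhz: float, bits_per_symbol: int = 8,
                    symbols_per_cycle: int = 1) -> float:
    """Throughput in Gbps for a decoder that processes a fixed number of
    symbols per clock cycle (assumed: one 8-bit symbol per cycle)."""
    bits_per_second = clock_mhz * 1e6 * bits_per_symbol * symbols_per_cycle
    return bits_per_second / 1e9
```

Under this assumption, `throughput_gbps(425)` gives about 3.4, matching the 3.4 Gbps reported at 425 MHz.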

  • Design of a High-Throughput CABAC Encoder

    Chia-Cheng LO  Ying-Jhong ZENG  Ming-Der SHIEH  

     
    PAPER-Image Processing and Video Processing

    Vol: E92-D  No: 4  Page(s): 681-688

    Context-based Adaptive Binary Arithmetic Coding (CABAC) is one of the algorithmic improvements that the H.264/AVC standard provides to enhance the compression ratio of video sequences. Compared with context-based adaptive variable length coding (CAVLC), CABAC can obtain a better compression ratio at the price of higher computational complexity. In particular, the inherent data dependency and various types of syntax elements in CABAC result in a dramatically increased complexity if two bins obtained from binarized syntax elements are handled at a time. By analyzing the distribution of binarized bins in different video sequences, this work shows how to effectively improve the encoding rate with limited hardware overhead by allowing only a certain type of syntax element to be processed two bins at a time. Together with the proposed context memory management scheme and range renovation method, experimental results reveal that an encoding rate of up to 410 M-bin/s can be obtained with a limited increase in hardware requirement. Compared with related works that do not support multi-symbol encoding, our development can achieve nearly twice their throughput rates with less than 25% hardware overhead.

  • High-Speed Low-Complexity Architecture for Reed-Solomon Decoders

    Yung-Kuei LU  Ming-Der SHIEH  

     
    PAPER-Computer System

    Vol: E93-D  No: 7  Page(s): 1824-1831

    This paper presents a high-speed, low-complexity VLSI architecture based on the modified Euclidean (ME) algorithm for Reed-Solomon decoders. The low-complexity feature of the proposed architecture is obtained by reformulating the error locator and error evaluator polynomials to remove redundant information in the ME algorithm proposed by Truong. This increases the hardware utilization of the processing elements used to solve the key equation and reduces hardware by 30.4%. The proposed architecture retains the high-speed feature of Truong's ME algorithm with a reduced latency, achieved by changing the initial settings of the design. Analytical results show that the proposed architecture has the smallest critical path delay, latency, and area-time complexity in comparison with similar studies. An example RS(255,239) decoder design, implemented using the TSMC 0.18 µm process, can reach a throughput rate of 3 Gbps at an operating frequency of 375 MHz and with a total gate count of 27,271.

  • Reconfigurable Homogenous Multi-Core FFT Processor Architectures for Hybrid SISO/MIMO OFDM Wireless Communications

    Chin-Long WEY  Shin-Yo LIN  Pei-Yun TSAI  Ming-Der SHIEH  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E94-A  No: 7  Page(s): 1530-1539

    Multi-core processors have been attracting a great deal of attention. In the domain of signal processing for communications, the current trends toward rapidly evolving standards and formats, and toward algorithms adaptive to dynamic factors in the environment, require programmable solutions that possess both algorithm flexibility and low implementation complexity. Reconfigurable architectures have demonstrated better tradeoffs between algorithm flexibility, implementation complexity, and energy efficiency. This paper presents a reconfigurable homogeneous memory-based FFT processor (MBFFT) architecture integrated in a single chip to support hybrid SISO/MIMO OFDM wireless communication systems. For example, a reconfigurable MBFFT processor with eight processing elements (PEs) can be configured for one DVB-T/H with N=8192 and two 802.11n with N=128. The reconfigurable processors are well suited to Software Defined Radio (SDR) applications, which require more hardware flexibility.

  • High-Speed Design of Montgomery Inverse Algorithm over GF(2^m)

    Ming-Der SHIEH  Jun-Hong CHEN  Chien-Ming WU  

     
    PAPER-Information Security

    Vol: E89-A  No: 2  Page(s): 559-565

    The Montgomery algorithm has demonstrated its effectiveness in applications such as cryptosystems. Most existing work on finding the Montgomery inverse of an element over the Galois field is based on software implementations, which are then extended to derive scalable hardware architectures. In this work, we consider a fundamental change at the algorithmic level and eliminate the potential problems in hardware implementation, which makes the resulting modified Montgomery inverse algorithm over GF(2^m) well suited to hardware realization. Owing to its structural simplicity, the modified algorithm can be easily mapped onto a high-speed and possibly low-complexity circuit. Experimental results show that our development achieves both area and speed advantages over previous work for inversion over GF(2^m), and that the improvement becomes more significant as m increases, as in cryptographic applications. Our development sustains high-speed operation and low hardware complexity over a wide range of m for commercial cryptographic applications, which makes it suitable for both scalable architectures and direct hardware implementation.