Chin-Chin CHEN Chiou-Yng LEE Erl-Huei LU
This work presents a novel scalable and systolic Montgomery multiplication algorithm in GF(2^m). The proposed algorithm is based on the Toeplitz matrix-vector representation, which yields a scalable and systolic Montgomery multiplier in a flexible manner that can be adapted to the required precision. Analytical results indicate that the proposed multiplier over the generic field GF(2^m) has a latency of d+n(2n+1) clock cycles, where n = ⌈m/d⌉ and d denotes the selected digit size. The latency is reduced to d+n(n+1) clock cycles when the field is constructed from generalized equally-spaced polynomials. When the selected digit size is d ≥ 5 bits, the proposed architectures have lower time-space complexity than traditional digit-serial multipliers. Moreover, the proposed architectures possess regularity, modularity and local interconnection, making them highly suitable for VLSI implementation.
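As a point of reference for the operation being accelerated, the following is a minimal bit-serial sketch of Montgomery multiplication over GF(2^m), i.e. computing A(x)·B(x)·x^{-m} mod f(x). It illustrates only the underlying arithmetic, not the digit-serial, Toeplitz matrix-vector systolic architecture proposed in the paper; the function names and the small GF(2^4) field used in the check are illustrative choices, not taken from the paper.

    def mont_mul_gf2m(a: int, b: int, f: int, m: int) -> int:
        """Compute a*b*x^{-m} mod f over GF(2), polynomials packed into ints.

        Assumes f has a nonzero constant term (true for any irreducible f),
        so adding f always clears bit 0 before the division by x.
        """
        c = 0
        for i in range(m):
            if (a >> i) & 1:        # coefficient a_i
                c ^= b              # C <- C + a_i * B
            if c & 1:               # make C divisible by x
                c ^= f              # C <- C + c_0 * f   (f_0 = 1)
            c >>= 1                 # C <- C / x
        return c

    def gf2m_mul(a: int, b: int, f: int, m: int) -> int:
        """Ordinary polynomial multiplication mod f, used only to check results."""
        c = 0
        for i in range(m):
            if (b >> i) & 1:
                c ^= a << i
        for i in range(2 * m - 2, m - 1, -1):   # reduce degree down to < m
            if (c >> i) & 1:
                c ^= f << (i - m)
        return c

    if __name__ == "__main__":
        m, f = 4, 0b10011                       # GF(2^4), f(x) = x^4 + x + 1
        a, b = 0b1011, 0b0110                   # A = x^3+x+1, B = x^2+x
        r = mont_mul_gf2m(a, b, f, m)
        # Check: MonPro(A, B) * x^m mod f should equal A*B mod f.
        xm = gf2m_mul(1 << (m - 1), 0b10, f, m) # x^m mod f, via x^{m-1} * x
        assert gf2m_mul(r, xm, f, m) == gf2m_mul(a, b, f, m)
        print(f"MonPro(A,B) = {r:0{m}b}")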
Tso-Cho CHEN Erl-Huei LU Chia-Jung LI Kuo-Tsang HUANG
In this paper, a weighted multiple bit-flipping (WMBF) algorithm for decoding low-density parity-check (LDPC) codes is proposed first. Then an improved WMBF algorithm, which we call the efficient weighted bit-flipping (EWBF) algorithm, is developed. The EWBF algorithm can dynamically choose either multiple-bit flipping or single-bit flipping in each iteration according to the log-likelihood ratios of the error probabilities of the received bits. Thus, it can efficiently increase the convergence speed of decoding and prevent the decoding process from falling into loop traps. Compared with the parallel weighted bit-flipping (PWBF) algorithm, the EWBF algorithm achieves significantly lower computational complexity without performance degradation when Euclidean geometry (EG)-LDPC codes are decoded. Furthermore, the flipping criterion does not require any parameter adjustment.
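For context, the following is a small sketch of a generic weighted bit-flipping decoder with a single flip per iteration, the family of decoders that WMBF/EWBF refine. The check weighting and flipping rule shown are the classical ones, not the paper's EWBF criterion, and all names and the toy parity-check matrix are illustrative.

    import numpy as np

    def wbf_decode(H, y, max_iter=50):
        """Generic weighted bit-flipping decoding.

        H is the (M x N) parity-check matrix with 0/1 entries; y holds the
        received soft values (BPSK mapping: bit 0 -> +1, bit 1 -> -1).
        """
        hard = (y < 0).astype(int)              # initial hard decisions
        # Reliability of each check = smallest |y_n| among its participating bits.
        w = np.array([np.min(np.abs(y[H[m] == 1])) for m in range(H.shape[0])])
        for _ in range(max_iter):
            s = (H @ hard) % 2                  # syndrome
            if not s.any():
                break                           # all checks satisfied
            # Flipping metric: unsatisfied checks vote +w_m, satisfied ones -w_m.
            E = (H * ((2 * s - 1) * w)[:, None]).sum(axis=0)
            hard[np.argmax(E)] ^= 1             # flip the least reliable bit; a
                                                # multiple-bit variant would flip
                                                # every bit whose metric exceeds
                                                # a threshold instead
        return hard

    if __name__ == "__main__":
        # (7,4) Hamming code used purely as a toy example.
        H = np.array([[1, 1, 0, 1, 1, 0, 0],
                      [1, 0, 1, 1, 0, 1, 0],
                      [0, 1, 1, 1, 0, 0, 1]])
        y = np.ones(7)
        y[2] = -0.3                             # all-zero codeword, one weak error
        print(wbf_decode(H, y))                 # -> [0 0 0 0 0 0 0]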
In order to reduce the iterative decoding delay of convolutional turbo codes, this paper presents a concurrent decoding algorithm for the hardware implementation of turbo convolutional decoders. Unlike a conventional turbo decoder, a hardware decoder based on the proposed algorithm can update the a priori information for each component code bit by bit, as soon as it is generated by the other component code. The two component codes of a turbo code can thus be decoded concurrently using a single MAP decoder, reducing the decoding latency by approximately half while maintaining the bit error rate performance and a hardware complexity comparable to that of a conventional turbo decoder.
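A rough sketch of the scheduling idea only, under the assumption that the per-bit extrinsic computation can be treated as a black box: extrinsic_bit below is a hypothetical stand-in for a MAP (BCJR) computation, and the interleaver handling is simplified. The sketch merely contrasts the conventional half-iteration exchange of a priori information with a bit-by-bit exchange between the two component codes; it is not the paper's hardware decoder.

    import numpy as np

    def extrinsic_bit(llr_sys_k, llr_par_k, apriori_k):
        # Hypothetical stand-in for the per-bit extrinsic output of a MAP decoder.
        return 0.5 * (llr_sys_k + llr_par_k) + 0.25 * apriori_k

    def serial_schedule(llr_sys, llr_p1, llr_p2, perm, iters=4):
        # Conventional schedule: component code 1 finishes the whole block before
        # component code 2 starts, so the a priori vectors are exchanged only
        # once per half-iteration.
        n = len(llr_sys)
        sys_i = llr_sys[perm]                   # systematic LLRs as seen by code 2
        apr1 = np.zeros(n)
        for _ in range(iters):
            ext1 = np.array([extrinsic_bit(llr_sys[k], llr_p1[k], apr1[k]) for k in range(n)])
            apr2 = ext1[perm]                   # interleave
            ext2 = np.array([extrinsic_bit(sys_i[k], llr_p2[k], apr2[k]) for k in range(n)])
            apr1 = np.zeros(n)
            apr1[perm] = ext2                   # deinterleave
        return llr_sys + apr1                   # rough a posteriori combination

    def concurrent_schedule(llr_sys, llr_p1, llr_p2, perm, iters=4):
        # Concurrent schedule: one decoding unit alternates between the two codes
        # bit by bit, so each extrinsic value is handed to the other code as soon
        # as it is produced instead of waiting for the end of the half-iteration.
        n = len(llr_sys)
        sys_i = llr_sys[perm]
        inv = np.argsort(perm)                  # inverse interleaver
        apr1, apr2 = np.zeros(n), np.zeros(n)
        for _ in range(iters):
            for k in range(n):
                apr2[inv[k]] = extrinsic_bit(llr_sys[k], llr_p1[k], apr1[k])
                apr1[perm[k]] = extrinsic_bit(sys_i[k], llr_p2[k], apr2[k])
        return llr_sys + apr1

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 8
        perm = rng.permutation(n)
        llr_sys, llr_p1, llr_p2 = rng.normal(size=(3, n))
        print(serial_schedule(llr_sys, llr_p1, llr_p2, perm))
        print(concurrent_schedule(llr_sys, llr_p1, llr_p2, perm))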