The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] radix(24hit)

1-20hit(24hit)

  • Optimization Algorithm with Automatic Adjustment of the Number of Switches in the Order/Radix Problem

    Masaki TSUKAMOTO  Yoshiko HANADA  Masahiro NAKAO  Keiji YAMAMOTO  

     
    PAPER

      Pubricized:
    2023/06/12
      Vol:
    E106-D No:12
      Page(s):
    1979-1987

    The Order/Radix Problem (ORP) is an optimization problem that can be solved to find an optimal network topology in distributed memory systems. It is important to find the optimum number of switches in the ORP. In the case of a regular graph, a good estimation of the preferred number of switches has been proposed, and it has been shown that simulated annealing (SA) finds a good solution given a fixed number of switches. However, generally the optimal graph does not necessarily satisfy the regular condition, which greatly increases the computational costs required to find a good solution with a suitable number of switches for each case. This study improved the new method based on SA to find a suitable number of switches. By introducing neighborhood searches in which the number of switches is increased or decreased, our method can optimize a graph by changing the number of switches adaptively during the search. In numerical experiments, we verified that our method shows a good approximation for the best setting for the number of switches, and can simultaneously generate a graph with a small host-to-host average shortest path length, using instances presented by Graph Golf, an international ORP competition.

  • A 12-bit 1.25MS/s Area-Efficient Radix-Value Self-Estimated Non-Binary Cyclic ADC with Relaxed Requirements on Analog Components

    Hao SAN  Rompei SUGAWARA  Masao HOTTA  Tatsuji MATSUURA  Kazuyuki AIHARA  

     
    PAPER

      Vol:
    E100-A No:2
      Page(s):
    534-540

    A 12-bit 1.25MS/s cyclic analog-to-digital converter (ADC) is designed and fabricated in 90nm CMOS technology, and only occupies an active area as small as 0.037mm2. The proposed ADC is composed of a non-binary AD convertion stage, and a on-chip non-binary-to-binary digital block includes a built-in radix-value self-estimation scheme. Therefore, althouh a non-binary convertion architechture is adopted, the proposed ADC is the same as other stand-alone binary ADCs. The redundancy of non-binary 1-bit/step architecture relaxes the accuracy requirement on analog components of ADC. As a result, the implementation of analog circuits such as amplifier and comparator becomes simple, and high-density Metal-Oxide-Metal (MOM) capacitors can be used to achieve a small chip area. Furthermore, the novel radix-value self-estimation technique can be realized by only simple logic circuits without any extra analog input, so that the total active area of ADC is dramatically reduced. The prototype ADC achieves a measured peak signal-to-noise-and-distortion-ratio (SNDR) of 62.3dB using a poor DC gain amplifier as low as 45dB and MOM capacitors without any careful layout techniques to improve the capacitor matching. The proposed ADC dissipated 490µW in analog circuits at 1.4V power supply and 1.25Msps (20MHz clocking). The measured DNL is +0.94/-0.71LSB and INL is +1.9/-1.2LSB at 30kHz sinusoidal input.

  • Experimental Implementation of Non-binary Cyclic ADCs with Radix Value Estimation Algorithm

    Rompei SUGAWARA  Hao SAN  Kazuyuki AIHARA  Masao HOTTA  

     
    PAPER

      Vol:
    E97-C No:4
      Page(s):
    308-315

    Proof-of-concept cyclic analog-to-digital converters (ADCs) have been designed and fabricated in 90-nm CMOS technology. The measurement results of an experimental prototype demonstrate the effectiveness of the proposed switched-capacitor (SC) architecture to realize a non-binary ADC based on β expansion. Different from the conventional binary ADC, a simple 1-bit/step structure for an SC multiplying digital-to-analog converter (MDAC) is proposed to present residue amplification by β (1 < β < 2). The redundancy of non-binary ADCs with radix β tolerates the non-linear conversion errors caused by the offsets of comparators, the mismatches of capacitors, and the finite DC gains of amplifiers, which are used in the MDAC. We also employed a radix value estimation algorithm to obtain an effective value of β for non-binary encoding; it can be realized by merely adding a simple conversion sequence and digital circuits. As a result, the power penalty of a high-gain wideband amplifier and the required accuracy of the circuit elements for a high-resolution ADC were largely relaxed so that the circuit design was greatly simplified. The implemented ADC achieves a measured peak signal-to-noise-and-distortion-ratio (SNDR) of 60.44dB, even with an op-amp with a poor DC gain (< 50dB) while dissipating 780µW in analog circuits at 1.4V and occupying an active area of 0.25 × 0.26mm2.

  • Robust Cyclic ADC Architecture Based on β-Expansion

    Rie SUZUKI  Tsubasa MARUYAMA  Hao SAN  Kazuyuki AIHARA  Masao HOTTA  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    553-559

    In this paper, a robust cyclic ADC architecture with β-encoder is proposed and circuit scheme using switched-capacitor (SC) circuit is introduced. Different from the conventional binary ADC, the redundancy of proposed cyclic ADC outputs β-expansion code and has an advantage of error correction. This feature makes ADC robust against the offset of comparator capacitor mismatch and finite DC gain of amplifier in multiplying-DAC (MDAC). Because the power penalty of high-gain wideband amplifier and the required accuracy of circuit elements for high resolution ADC can be relaxed, the proposed architecture is suitable for deep submicron CMOS technologies beyond 90 nm. We also propose a β-value estimation algorithm to realize high accuracy ADC based on β-expansion. The simulation results show the effectiveness of proposed architecture and robustness of β-encoder.

  • Low Cost CORDIC-Based Configurable FFT/IFFT Processor for OFDM Systems

    Dongpei LIU  Hengzhu LIU  Botao ZHANG  Jianfeng ZHANG  Shixian WANG  Zhengfa LIANG  

     
    PAPER-OFDM

      Vol:
    E95-A No:10
      Page(s):
    1683-1691

    High-performance FFT processor is indispensable for real-time OFDM communication systems. This paper presents a CORDIC based design of variable-length FFT processor which can perform various FFT lengths of 64/128/256/512/1024/2048/4096/8192-point. The proposed FFT processor employs memory based architecture in which mixed radix 4/2 algorithm, pipelined CORDIC, and conflict-free parallel memory access scheme are exploited. Besides, the CORDIC rotation angles are generated internally based on the transform of butterfly counter, which eliminates the need of ROM making it memory-efficient. The proposed architecture has a lower hardware complexity because it is ROM-free and with no dedicated complex multiplier. We implemented the proposed FFT processor and verified it on FPGA development platform. Additionally, the processor is also synthesized in 0.18 µm technology, the core area of the processor is 3.47 mm2 and the maximum operating frequency can be up to 500 MHz. The proposed FFT processor is better trade off performance and hardware overhead, and it can meet the speed requirement of most modern OFDM system, such as IEEE 802.11n, WiMax, 3GPP-LTE and DVB-T/H.

  • Design and Implementation of the Parameterized Multi-Standard High-Throughput Radix-4 Viterbi Decoder on FPGA

    Rongchun LI  Yong DOU  Yuanwu LEI  Shice NI  Song GUO  

     
    PAPER-Fundamental Theories for Communications

      Vol:
    E95-B No:5
      Page(s):
    1602-1611

    This paper presents a parameterized multi-standard adaptive radix-4 Viterbi decoder with high throughput and low complexity. The proposed Viterbi decoder supports constraint lengths ranging from 3-9, code rates in the range of 1/2-1/3, and arbitrary truncation lengths. We present a novel fabric of Add-Compare-Select Unit (ACSU) and methods of unsigned quantization and efficient normalization that shorten the critical path. The decoder achieves a low bit error ratio in multiple standards, such as GPRS, WiMax, LTE, CDMA, and 3G. The proposed decoder is implemented on Xilinx XC5VLX330 device and the frequency achieved is 181.7 MHz. The throughput of the proposed decoder can reach 363 Mbps, which is superior to the other current multi-standard Viterbi decoders or radix-4 Viterbi decoders on the FPGA platform.

  • A New DA Implementation Technique for Digital Filters Using Radix-16 Modified Booth Algorithm

    Ji-Hye SHIN  Young-Beom JANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E94-A No:4
      Page(s):
    1136-1139

    In this paper, a new DA (Distributed Arithmetic) filter implementation technique is presented. Contrary to the conventional DA technique using ROM, the proposed implementation technique avoids the use of ROM since it does not need the combinations of filter coefficients. Furthermore, by using the Radix-16 modified Booth algorithm, implementation complexity of the proposed structure can be reduced. Through the HDL coding and synthesis, it was shown that 41.6% of implementation area can be reduced.

  • Unified Dual-Radix Architecture for Scalable Montgomery Multiplications in GF(P) and GF(2n)

    Kazuyuki TANIMURA  Ryuta NARA  Shunitsu KOHARA  Youhua SHI  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E92-A No:9
      Page(s):
    2304-2317

    Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), that is a type of public-key cryptography. Montgomery multiplier is commonly used to compute the modular multiplications and requires scalability because the bit length of operands varies depending on its security level. In addition, ECC is performed in GF(P) or GF(2n), and unified architecture for multipliers in GF(P) and GF(2n) is required. However, in previous works, changing frequency is necessary to deal with delay-time difference between GF(P) and GF(2n) multipliers because the critical path of the GF(P) multiplier is longer. This paper proposes unified dual-radix architecture for scalable Montgomery multiplications in GF(P) and GF(2n). This proposed architecture unifies four parallel radix-216 multipliers in GF(P) and a radix-264 multiplier in GF(2n) into a single unit. Applying lower radix to GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency so that clock dividers to deal with the delay-time difference are not required. Moreover, parallel architecture in GF(P) reduces the clock cycles increased by dual-radix approach. Consequently, the proposed architecture achieves to compute a GF(P) 256-bit Montgomery multiplication in 0.28 µs. The implementation result shows that the area of the proposal is almost the same as that of previous works: 39 kgates.

  • Efficient MRC-Based Residue to Binary Converters for the New Moduli Sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1}

    Amir Sabbagh MOLAHOSSEINI  Chitra DADKHAH  Keivan NAVI  Mohammad ESHGHI  

     
    PAPER-Computer Systems

      Vol:
    E92-D No:9
      Page(s):
    1628-1638

    In this paper, the new residue number system (RNS) moduli sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1} are introduced. These moduli sets have 4n-bit dynamic range and well-formed moduli which can result in high-performance residue to binary converters as well as efficient RNS arithmetic unit. Next, efficient residue to binary converters for the proposed moduli sets based on mixed-radix conversion (MRC) algorithm are presented. The converters are ROM-free and they are realized using carry-save adders and modulo adders. Comparison with the other residue to binary converters for 4n-bit dynamic range moduli sets shown that the presented designs based on new moduli sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1} are improved the conversion delay and result in hardware savings. Also, the proposed moduli sets can lead to efficient binary to residue converters, and they can speed-up internal RNS arithmetic processing, compared with the other 4n-bit dynamic range moduli sets.

  • Bucket Sieving

    Kazumaro AOKI  Hiroki UEDA  

     
    PAPER-Theory

      Vol:
    E92-A No:8
      Page(s):
    1845-1850

    This paper proposes a new sieving algorithm that employs a bucket sort as a part of a factoring algorithm such as the number field sieve. The sieving step requires an enormous number of memory updates; however, these updates usually cause cache hit misses. The proposed algorithm significantly reduces the number of cache hit misses when the size of the sieving region is roughly less than the square of the cache size, and the memory updates are several times faster than the straightforward implementation according to the PC experiments.

  • A High-Speed Two-Parallel Radix-24 FFT/IFFT Processor for MB-OFDM UWB Systems

    Jeesung LEE  Hanho LEE  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:4
      Page(s):
    1206-1211

    This paper presents a novel high-speed, low-complexity two-parallel 128-point radix-24 FFT/IFFT processor for MB-OFDM ultrawideband (UWB) systems. The proposed high-speed, low-complexity FFT architecture can provide a higher throughput rate and low hardware complexity by using a two-parallel data-path scheme and a single-path delay-feedback (SDF) structure. The radix-24 FFT algorithm is also realized in our processor to reduce the number of complex multiplications. The proposed FFT/IFFT processor has been designed and implemented with 0.18-µm CMOS technology in a supply voltage of 1.8 V. The proposed two-parallel FFT/IFFT processor has a throughput rate of up to 900 Msample/s at 450 MHz while requiring much smaller hardware complexity and low power consumption.

  • A High-Speed Design of Montgomery Multiplier

    Yibo FAN  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E91-A No:4
      Page(s):
    971-977

    With the increase of key length used in public cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high speed design of Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation and one clock cycle delay in dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost and a parallel radix-16 design of data path is proposed to accelerate the speed. By using HHNEC 0.25 µm standard cell library, the implementation results show that the total cost of Montgomery multiplier is 130 KGates, the clock frequency is 180 MHz and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable to be used in high speed RSA or ECC encryption/decryption. As a scalable design, it supports any key-length encryption/decryption up to the size of on-chip memory.

  • Design Methods of Radix Converters Using Arithmetic Decompositions

    Yukihiro IGUCHI  Tsutomu SASAO  Munehiro MATSUURA  

     
    PAPER-Computer Components

      Vol:
    E90-D No:6
      Page(s):
    905-914

    In arithmetic circuits for digital signal processing, radixes other than two are often used to make circuits faster. In such cases, radix converters are necessary. However, in general, radix converters tend to be complex. This paper considers design methods for p-nary to binary converters. First, it considers Look-Up Table (LUT) cascade realizations. Then, it introduces a new design technique called arithmetic decomposition by using LUTs and adders. Finally, it compares the amount of hardware and performance of radix converters implemented by FPGAs. 12-digit ternary to binary converters on Cyclone II FPGAs designed by the proposed method are faster than ones by conventional methods.

  • Cache Efficient Radix Sort for String Sorting

    Waihong NG  Katsuhiko KAKEHI  

     
    PAPER-Algorithms and Data Structures

      Vol:
    E90-A No:2
      Page(s):
    457-466

    In this paper, we propose CRadix sort, a new string sorting algorithm based on MSD radix sort. CRadix sort causes fewer cache misses than MSD radix sort by uniquely associating a small block of main memory called the key buffer to each key and temporarily storing a portion of each key into its corresponding key buffer. Experimental results in running time comparisons with other string sorting algorithms are provided for showing the effectiveness of CRadix sort.

  • Fast Fourier Transform Algorithm for Low-Power and Area-Efficient Implementation

    Jung-Yeol OH  Myoung-Seob LIM  

     
    LETTER-Devices/Circuits for Communications

      Vol:
    E89-B No:4
      Page(s):
    1425-1429

    This paper proposes the new radix-24 FFT algorithm and an efficient pipeline FFT architecture based on the algorithm for wideband OFDM systems. The proposed pipeline architecture has the same number of multipliers as that of the radix-22 algorithm. However, the multiplication complexity is reduced more than 30% by using the newly proposed CSD constant multipliers instead of the programmable multipliers. From the synthesis simulations of a standard 0.35 µm CMOS SAMSUNG process, the proposed CSD constant complex multiplier achieved a reduction of more than 60% of the power consumption/area when compared with the conventional programmable complex multiplier.

  • Radix-r Non-Adjacent Form and Its Application to Pairing-Based Cryptosystem

    Tsuyoshi TAKAGI  David REIS, Jr.  Sung-Ming YEN  Bo-Ching WU  

     
    PAPER-Elliptic Curve Cryptography

      Vol:
    E89-A No:1
      Page(s):
    115-123

    Recently, the radix-3 representation of integers is used for the efficient implementation of pairing based cryptosystems. In this paper, we propose non-adjacent form of radix-r representation (rNAF) and efficient algorithms for generating rNAF. The number of non-trivial digits is (r-2)(r+1)/2 and its average density of non-zero digit is asymptotically (r-1)/(2r-1). For r=3, the non-trivial digits are {2, 4} and the non-zero density is 0.4. We then investigate the width-w version of rNAF for the general radix-r representation, which is a natural extension of the width-w NAF. Finally we compare the proposed algorithms with the generalized NAF (gNAF) discussed by Joye and Yen. The proposed scheme requires a larger table but its non-zero density is smaller even for large radix. We explain that gNAF is a simple degeneration of rNAF--we can consider that rNAF is a canonical form for the radix-r representation. Therefore, rNAF is a good alternative to gNAF.

  • New Radix-2 to the 4th Power Pipeline FFT Processor

    Jung-Yeol OH  Myoung-Seob LIM  

     
    PAPER

      Vol:
    E88-C No:8
      Page(s):
    1740-1746

    This paper proposes a new modified radix-24 FFT algorithm and an efficient pipeline FFT architecture based on this algorithm for OFDM systems. This pipeline FFT architecture has the same number of multipliers as that of the radix-22 algorithm. However, the multiplication complexity could be reduced by more than 30% by replacing one half of the programmable multipliers by the newly proposed CSD constant multipliers. From the synthesis simulations of a standard 0.35 µm CMOS SAMSUNG process, a proposed CSD constant complex multiplier achieved more than 60% area efficiency when compared to the conventional programmable complex multiplier. This promoted efficiency could be used to the design of a long length FFT processor in wireless OFDM applications, which needs more power and area efficiency.

  • An Enhanced Memory Assignment Scheme for Memory-Based FFT Processor

    Youn-Seog CHANG  Sin-Chong PARK  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E87-A No:11
      Page(s):
    3020-3024

    In this study, we analyze the memory-based architecture of the FFT processor using the radix-4, and propose a novel mechanism that improves the throughput while simultaneously decreasing the area using single-port memories with several banks.

  • Interval Arithmetic Operations in Residue Number System

    Ki Ja LEE  

     
    PAPER-Algorithms

      Vol:
    E85-D No:9
      Page(s):
    1361-1371

    Algorithms are presented for the four elementary arithmetic operations, to perform reliable floating-point arithmetic operations. These arithmetic operations can be achieved by applying residue techniques to the weighted number systems and performed with no accuracy lost in the process of the computing. The arithmetic operations presented can be used as elementary tools (on many existing architectures) to ensure the reliability of numerical computations. Simulation results especially for the solutions of ill-conditioned problems are given with emphasis on the practical usability of the tools.

  • A Compact Radix-64 54 54 CMOS Redundant Binary Parallel Multiplier

    Sang-Hoon LEE  Seung-Jun BAE  Hong-June PARK  

     
    PAPER-Electronic Circuits

      Vol:
    E85-C No:6
      Page(s):
    1342-1350

    The radix-64 encoding scheme was used to reduce the number of partial products in the 5454 CMOS parallel multiplier. The transistor counts, the chip area and the power-delay product were reduced by 28%, 22%, and 17%, respectively, compared to any of the published 5454 CMOS parallel multipliers. A redundant binary (RB) number system was used to represent any of the 65 multiplying coefficients as a RB number which consists of two of 9 fundamental multiplying coefficients and their complements. The resultant RB partial products were added by using optimized RB adders. The total transistor count of the proposed multiplier was 43,579. The chip area in 0.25 µm CMOS process with 5 metal layers was 0.99 mm2. The power consumption and the multiplication time were 111 mW and 6.9 ns, respectively.

1-20hit(24hit)