Keyword Search Result

[Keyword] pipeline (141 hits)

1-20 of 141 hits

  • Improvement of Channel Capacity of MIMO Communication Using Yagi-Uda Planar Antennas with a Propagation Path through a PVC Pipe Wall

    Akihiko HIRATA  Keisuke AKIYAMA  Shunsuke KABE  Hiroshi MURATA  Masato MIZUKAMI

    PAPER-Antennas and Propagation

      Publicized:
    2023/10/13
      Vol:
    E107-B No:1
      Page(s):
    197-205

    This study investigates the improvement of the channel capacity of 5-GHz-band multiple-input multiple-output (MIMO) communication using microwave-guided modes propagating along a polyvinyl chloride (PVC) pipe wall for a buried pipe inspection robot. We design a planar Yagi-Uda antenna to reduce transmission losses in communication that uses PVC pipe walls as propagation paths. Coupling efficiency between the antenna and a PVC pipe is improved by attaching to the Yagi-Uda antenna a PVC adapter with the same curvature as the pipe's inner wall, which eliminates any gap between the antenna and the inner wall. The use of a planar Yagi-Uda antenna with a PVC adapter decreases the transmission loss of a 5-GHz-band microwave signal propagating along a 1-m-long straight PVC pipe wall by 7 dB compared with a dipole antenna. The channel capacity of a 2×2 MIMO system using planar Yagi-Uda antennas is more than twice that of a system using dipole antennas.
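
    For reference, MIMO channel capacity is commonly computed with the Shannon formula C = log2 det(I + (SNR/Nt)·H·H^H), assuming equal power on each of the Nt transmit antennas and no channel knowledge at the transmitter. The sketch below evaluates it for a 2×2 channel and shows how an extra 7 dB of path loss lowers capacity; the channel matrix and SNR are hypothetical values chosen for illustration, not measurements from the paper.

```python
import math


def capacity_2x2(h, snr):
    """Capacity in bit/s/Hz of a 2x2 MIMO channel h (rows of complex gains).

    Uses C = log2 det(I + (snr / Nt) * H H^H) with Nt = 2 and equal power per
    transmit antenna (an assumption; no water-filling).
    """
    nt = 2
    a = snr / nt
    # G = H H^H, a 2x2 Hermitian matrix
    g = [[sum(complex(h[i][k]) * complex(h[j][k]).conjugate() for k in range(nt))
          for j in range(2)] for i in range(2)]
    # M = I + a * G
    m = [[(1.0 if i == j else 0.0) + a * g[i][j] for j in range(2)] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.log2(det.real)  # det is real and positive because M is Hermitian PD


def scale(h, loss_db):
    """Apply an extra path loss in dB (amplitude scaled by 10^(-dB/20))."""
    f = 10 ** (-loss_db / 20)
    return [[f * x for x in row] for row in h]


# Hypothetical channel and SNR, not measured data
H = [[1.0, 0.3 + 0.2j], [0.2 - 0.1j, 0.9]]
SNR = 100.0  # 20 dB, assumed
low_loss = capacity_2x2(H, SNR)
high_loss = capacity_2x2(scale(H, 7.0), SNR)
```

    A 7 dB reduction in path loss corresponds to about a fivefold (10^0.7 ≈ 5.0) increase in received power, which enters the determinant through the SNR term.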

  • A Brief History of Nyquist Analog-to-Digital Converters Open Access

    Akira MATSUZAWA

    INVITED PAPER

      Publicized:
    2023/04/21
      Vol:
    E106-C No:10
      Page(s):
    493-505

    This paper reviews a brief history of Nyquist ADCs. It covers bipolar flash ADCs for the early development of HDTV and digital oscilloscopes, a Bi-CMOS two-step flash ADC using resistive interpolation for home HDTV receivers, a CMOS two-step flash ADC using capacitive interpolation for handheld camcorders, pipelined ADCs using CMOS operational amplifiers, CMOS flash ADCs using dynamic comparators and digital offset compensation, SAR ADCs using low-noise dynamic comparators and MOM capacitors, and hybrid ADCs.

  • A Fully Analog Deep Neural Network Inference Accelerator with Pipeline Registers Based on Master-Slave Switched Capacitors

    Yaxin MEI  Takashi OHSAWA

    PAPER-Integrated Electronics

      Publicized:
    2023/03/08
      Vol:
    E106-C No:9
      Page(s):
    477-485

    A fully analog pipelined deep neural network (DNN) accelerator is proposed, constructed with pipeline registers based on master-slave switched capacitors. A master-slave switched-capacitor stage is the analog equivalent of the delay flip-flop (D-FF) used as a digital pipeline register. To evaluate the performance of the pipeline register, it is applied to a conventional DNN that operates without pipelining. Compared with the conventional DNN, the cycle time is reduced by 61.5% and the data rate is increased by 160%. The accuracy reaches 99.6% in an MNIST classification test. The energy consumption per classification is reduced by 88.2% to 0.128 µJ, achieving an energy efficiency of 1.05 TOPS/W and a throughput of 0.538 TOPS in a 180-nm technology node.
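
    The cycle-time and data-rate figures are two views of the same change, since data rate scales as the inverse of cycle time. A quick consistency check, assuming one result per cycle in both designs (our assumption):

```python
def rate_gain_from_cycle_reduction(reduction):
    """Fractional data-rate increase when cycle time shrinks by `reduction`.

    Assumes one output per cycle, so data rate = 1 / cycle time.
    """
    return 1.0 / (1.0 - reduction) - 1.0


def baseline_from_reduced(value, reduction):
    """Recover the original quantity from its reduced value and fractional reduction."""
    return value / (1.0 - reduction)


gain = rate_gain_from_cycle_reduction(0.615)       # about 1.597, i.e. roughly +160%
baseline_uj = baseline_from_reduced(0.128, 0.882)  # about 1.085 uJ per classification
```

    A 61.5% shorter cycle gives 1/0.385 ≈ 2.6 times the data rate, consistent with the reported 160% increase; the same arithmetic places the non-pipelined baseline at roughly 1.08 µJ per classification.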

  • Dual Cuckoo Filter with a Low False Positive Rate for Deep Packet Inspection

    Yixuan ZHANG  Meiting XUE  Huan ZHANG  Shubiao LIU  Bei ZHAO

    PAPER-Algorithms and Data Structures

      Publicized:
    2023/01/26
      Vol:
    E106-A No:8
      Page(s):
    1037-1042

    Network traffic control and classification have become increasingly dependent on deep packet inspection (DPI) approaches, which are the most precise techniques for intrusion detection and prevention. However, growing traffic volumes and link speeds put considerable pressure on DPI techniques to process packets at high performance within limited memory. To overcome this problem, we propose the dual cuckoo filter (DCF), a data structure based on the cuckoo filter (CF). The CF can be extended to a parallel mode called the parallel cuckoo filter (PCF). The proposed data structure employs an extra hash function to obtain two candidate entry indices. The DCF amplifies the advantages of the CF without additional memory. Moreover, it can be extended to a parallel mode, yielding a data structure referred to as the parallel dual cuckoo filter (PDCF). Implementation results show that using the DCF and PDCF as identification tools in a DPI system improves processing time by up to 2% and 30% over the CF and PCF, respectively.
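
    The DCF and PDCF are built on the standard cuckoo filter. The sketch below shows only that baseline CF (fingerprints stored in buckets, with the alternate bucket derived by XOR-ing the index with a hash of the fingerprint); the DCF's extra hash function is not described here in enough detail to reproduce, so it is not modeled. The bucket count, bucket size, 8-bit fingerprints, SHA-256-based hashing and kick limit are all assumptions.

```python
import hashlib
import random


class CuckooFilter:
    """Baseline cuckoo filter (partial-key cuckoo hashing), not the DCF itself."""

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=8,
                 max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-derived alternate index in range.
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: bytes) -> int:
        fp = self._hash(b"fp:" + item) & self.fp_mask
        return fp or 1  # avoid a zero fingerprint

    def _alt_index(self, index: int, fp: int) -> int:
        # i2 = i1 XOR hash(fp); applying it twice returns the original bucket
        return (index ^ self._hash(fp.to_bytes(4, "big"))) & self.mask

    def insert(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item) & self.mask
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint and move it to its other bucket
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter too full: the last evicted fingerprint is dropped here
        # (a production filter would keep it in a small victim stash).
        return False

    def contains(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item) & self.mask
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item) & self.mask
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

    Lookups touch at most two buckets, which is what makes cuckoo filters attractive for line-rate packet inspection; false positives arise only when another item's fingerprint collides in one of those two buckets.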

  • Siamese Visual Tracking with Dual-Pipeline Correlated Fusion Network

    Ying KANG  Cong LIU  Ning WANG  Dianxi SHI  Ning ZHOU  Mengmeng LI  Yunlong WU

    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/07/09
      Vol:
    E104-D No:10
      Page(s):
    1702-1711

    Siamese visual tracking, viewed as a problem of max-similarity matching to the target template, has attracted increasing attention in computer vision. However, current Siamese trackers find it hard to balance accuracy in real-time tracking with robustness in long-term tracking. This work proposes a new Siamese-based tracker with a dual-pipeline correlated fusion network (named ADF-SiamRPN), which uses two templates: an initial template for robust correlation and a transient template, capable of adaptive optimal feature selection, for accurate correlation. A learnable correlation-response fusion network then combines the results to improve overall tracking performance. To compare ADF-SiamRPN with state-of-the-art trackers, we conduct extensive experiments on benchmarks including OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT and TrackingNet. The results demonstrate that ADF-SiamRPN outperforms all the compared trackers and achieves the best balance between accuracy and robustness.
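
    Max-similarity matching in Siamese trackers is a cross-correlation of a template against a larger search region, with the target placed at the peak response. The sketch below shows that operation on raw 2D arrays, plus a fixed weighted sum of two response maps as a stand-in for the learnable fusion network; real trackers correlate CNN feature maps, and the weight of 0.5 is an assumption.

```python
def cross_correlate(search, template):
    """Valid-mode 2D cross-correlation: each output is the sum of elementwise
    products of the template and the search-region patch it covers."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    scores = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            row.append(sum(search[y + i][x + j] * template[i][j]
                           for i in range(th) for j in range(tw)))
        scores.append(row)
    return scores


def fuse(resp_a, resp_b, weight=0.5):
    """Fixed weighted sum of two response maps (hypothetical stand-in for a learned fusion)."""
    return [[weight * a + (1 - weight) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(resp_a, resp_b)]


def best_match(scores):
    """(row, col) of the highest response, i.e. the max-similarity position."""
    return max((v, (y, x)) for y, row in enumerate(scores) for x, v in enumerate(row))[1]
```

    In a dual-template tracker, one response map would come from the initial template and one from the transient template, and the fused map decides the target position.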

  • Sorting Matrix Architecture for Continuous Data Sequences

    Meiting XUE  Huan ZHANG  Weijun LI  Feng YU

    LETTER-Algorithms and Data Structures

      Vol:
    E103-A No:2
      Page(s):
    542-546

    Sorting is one of the most fundamental problems in mathematics and computer science. Because high-throughput, flexible sorting is a key requirement in modern databases, this paper presents efficient techniques for designing a high-throughput sorting matrix that supports continuous data sequences. There have been numerous studies on optimizing sorting circuits on FPGA (field-programmable gate array) platforms, focusing on high throughput for a single command with a fixed data width. However, these architectures do not meet the need to support the diverse data types found in databases. A sorting matrix architecture is therefore proposed to overcome this problem. Our design consists of a matrix of identical basic sorting cells. The sorting cells work both in a pipeline and in parallel, and the matrix can process multiple data streams simultaneously; these can be combined into a wide single-channel data stream or split into narrow multiple-channel data streams. The design handles continuous sequences and can sort variable-length data sequences. Its maximum throughput on our platform is approximately 1.4 GB/s for 32-bit sequences and approximately 2.5 GB/s for 64-bit sequences.
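
    The sorting matrix itself is not specified here in enough detail to reproduce. As a stand-in, the sketch below illustrates the general idea of building a sorter from identical compare-exchange cells, using odd-even transposition sort, a well-known sorting network; it is not the authors' architecture.

```python
def compare_exchange(a, b):
    """Basic sorting cell: output the smaller value first."""
    return (a, b) if a <= b else (b, a)


def odd_even_transposition_sort(values):
    """Sort with n rounds of compare-exchange cells on alternating even/odd pairs.

    Every cell in a round works on a disjoint pair, so a round maps to one
    parallel hardware step and successive rounds can be pipelined.
    """
    v = list(values)
    n = len(v)
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            v[i], v[i + 1] = compare_exchange(v[i], v[i + 1])
    return v


def sort_channels(channels):
    """Sort several independent, possibly different-length, streams."""
    return [odd_even_transposition_sort(ch) for ch in channels]
```

    Because every cell is identical and only talks to its neighbor, the same array can be partitioned into several narrow channels or used as one wide channel, which is the flexibility the abstract describes.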

  • A Low-Latency Parallel Pipeline CORDIC

    Hong-Thu NGUYEN  Xuan-Thuan NGUYEN  Cong-Kha PHAM

    PAPER

      Vol:
    E100-C No:4
      Page(s):
    391-398

    COordinate Rotation DIgital Computer (CORDIC) is an efficient algorithm for computing elementary functions such as trigonometric, exponential, and logarithmic functions. However, the main drawback of the conventional CORDIC is that the number of iterations equals the number of angle constants. Among the many approaches proposed to overcome this disadvantage, the angle recoding method is effective because it can reduce the number of iterations by 50%. Nevertheless, the hardware architecture of this algorithm is difficult to pipeline. Therefore, a low-latency parallel pipeline hybrid adaptive CORDIC (PP-CORDIC) architecture is proposed in this paper. The design combines a hybrid architecture with pipelining and parallelism to achieve low latency. It operates at 122.6 MHz with a latency of 8, 12, and 15 clock cycles in the best, average, and worst cases, respectively. More significantly, the worst-case latency of PP-CORDIC is 1.1× lower than that of Altera's commercial floating-point sine and cosine IP cores.
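
    The drawback noted above is visible in the conventional algorithm, which performs one iteration per angle constant atan(2^-i). A minimal floating-point sketch of conventional rotation-mode CORDIC follows; the choice of 16 iterations is arbitrary, and the sketch shows the baseline only, not angle recoding or PP-CORDIC.

```python
import math

N_ITER = 16
# One angle constant per iteration: atan(2^-i)
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# Each micro-rotation scales the vector by sqrt(1 + 2^-2i); precompute the total gain
GAIN = 1.0
for _i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_sin_cos(theta):
    """Conventional rotation-mode CORDIC, valid for |theta| <= sum(ANGLES) (~1.74 rad).

    Floating-point model: in hardware the 2^-i factors are bit shifts.
    """
    x, y, z = 1.0 / GAIN, 0.0, theta  # start pre-scaled to cancel the gain
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0   # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x  # (sin(theta), cos(theta))
```

    Each extra bit of accuracy costs one more iteration (and one more pipeline stage in hardware), which is why iteration-reduction schemes matter for latency.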

  • A Hardware Efficient Multiple-Stream Pipeline FFT Processor for MIMO-OFDM Systems

    Kai-Feng XIA  Bin WU  Tao XIONG  Tian-Chun YE  Cheng-Ying CHEN

    PAPER-Digital Signal Processing

      Vol:
    E100-A No:2
      Page(s):
    592-601

    In this paper, a hardware-efficient design methodology for a configurable-point multiple-stream pipeline FFT processor is presented. We first compared the memory and arithmetic components of different pipeline FFT architectures and concluded that the MDF architecture is more hardware efficient than MDC for the overall processor. Then, to reduce the computational complexity, a binary-tree representation was adopted to analyze the decomposition algorithm, so that the coefficient multiplications are minimized among all possible decompositions. In addition, an efficient output reorder circuit was designed for the multiple-stream architecture. A 128- to 2048-point 4-stream FFT processor for the LTE system was designed in SMIC 55-nm technology for evaluation. It has a core area of 1.09 mm² and consumes 82.6 mW at a 122.88-MHz clock frequency.
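
    Pipeline FFT architectures such as MDF and MDC are hardware realizations of the FFT's butterfly and twiddle-factor structure. As a software reference for that structure only, here is a recursive radix-2 decimation-in-time FFT; each recursion level corresponds roughly to one pipeline stage, and the paper's decomposition, multiple streams, and output reordering are not modeled.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle-factor multiply
        out[k] = even[k] + t                            # radix-2 butterfly
        out[k + n // 2] = even[k] - t
    return out
```

    The twiddle multiplications inside the loop are the "coefficient multiplications" that a better decomposition tries to minimize; trivial twiddles (k = 0) need no real multiplier in hardware.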

  • Low Complexity Reed-Solomon Decoder Design with Pipelined Recursive Euclidean Algorithm

    Kazuhito ITO

    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2453-2462

    A Reed-Solomon (RS) decoder is designed based on the pipelined recursive Euclidean algorithm for solving the key equation. Although the Euclidean algorithm uses fewer Galois multipliers than the modified Euclidean (ME) and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, it requires division between two Galois field elements. By implementing the division with a multi-cycle Galois inverter and a serial Galois multiplier, the proposed key equation solver architecture achieves lower complexity than conventional ME- and RiBM-based architectures. The proposed RS(255,239) decoder reduces the hardware complexity by 25.9% with a 6.5% increase in decoding latency.
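
    The division that the Euclidean algorithm needs can be built from a multiplier alone, because in GF(2^8) the inverse of a is a^254. The sketch below pairs a bit-serial shift-and-add multiplier with a square-and-multiply inverter; it illustrates the arithmetic only, not the paper's multi-cycle circuit, and the field polynomial 0x11D is an assumption.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)


def gf_mul(a, b):
    """Bit-serial GF(2^8) multiply: one shift-and-add step per bit of b."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # reduce modulo the field polynomial
            a ^= PRIM_POLY
    return result


def gf_inv(a):
    """Inverse via Fermat: a^(2^8 - 2) = a^254, computed with square-and-multiply."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def gf_div(a, b):
    """a / b = a * b^-1."""
    return gf_mul(a, gf_inv(b))
```

    Computing the inverse this way takes many sequential multiplications, which fits the trade-off in the abstract: lower hardware cost in exchange for a modest increase in latency.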

  • Montgomery Multiplier Design for ECDSA Signature Generation Processor

    Masato TAMURA  Makoto IKEDA

    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2444-2452

    This paper presents optimal implementation methods for 256-bit elliptic curve digital signature algorithm (ECDSA) signature generation processors with high-speed Montgomery multipliers. We explored the radix of the Montgomery multiplier data path from 2-bit to 256-bit operation and proposed the use of pipelined Montgomery multipliers to optimize signature generation speed, area, and energy. The key factor in the design optimization is how modular multiplication is performed. The high-radix Montgomery multiplier is known to be an efficient implementation for high-speed modular multiplication. We implemented ECDSA signature generation processors with high-radix Montgomery multipliers in 65-nm SOTB CMOS technology. Post-layout results show the fastest signature generation time of 63.5 µs with a radix-256-bit, two-module, four-stream pipeline architecture; the smallest area of 0.365 mm² with a radix-16-bit zero-pipeline architecture; and the smallest signature generation energy of 9.51 µJ with a radix-256-bit zero-pipeline architecture.
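
    The radix exploration above trades the number of loop iterations against the width of each step. A digit-serial Montgomery multiplication with radix 2^w shows that knob: w = 1 is bit-serial, and w = 256 processes a 256-bit operand in a single step. This is the textbook algorithm, not the authors' pipelined hardware; reading "radix-w-bit" as w bits per step is our assumption, and the NIST P-256 prime is used only as an example 256-bit odd modulus.

```python
def montgomery_mul(a, b, n, n_bits, w):
    """Digit-serial Montgomery multiplication with radix 2^w.

    Returns a * b * R^-1 mod n with R = 2^(w * d), d = ceil(n_bits / w).
    Requires n odd and 0 <= a, b < n. One iteration handles one w-bit digit of a.
    """
    base = 1 << w
    d = -(-n_bits // w)                    # number of w-bit digits
    n_prime = (-pow(n, -1, base)) % base   # -n^-1 mod 2^w (Python 3.8+)
    t = 0
    for i in range(d):
        a_i = (a >> (w * i)) & (base - 1)
        t += a_i * b
        m = (t * n_prime) % base           # makes t + m*n divisible by 2^w
        t = (t + m * n) >> w               # exact division by the radix
    return t - n if t >= n else t          # invariant t < 2n, so one subtraction suffices


def modmul(a, b, n, n_bits, w):
    """Ordinary a*b mod n via the Montgomery domain (conversions use Python ints)."""
    r = 1 << (w * -(-n_bits // w))
    am, bm = a * r % n, b * r % n
    cm = montgomery_mul(am, bm, n, n_bits, w)   # = a*b*R mod n
    return montgomery_mul(cm, 1, n, n_bits, w)  # strip the R factor


P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1  # example 256-bit odd modulus
```

    A larger w means fewer iterations but a wider multiplier per iteration, which is the speed/area/energy trade-off the paper explores in hardware.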

  • Hybrid MIC/CPU Parallel Implementation of MoM on MIC Cluster for Electromagnetic Problems Open Access

    Yan CHEN  Yu ZHANG  Guanghui ZHANG  Xunwang ZHAO  ShaoHua WU  Qing ZHANG  XiaoPeng YANG

    INVITED PAPER

      Vol:
    E99-C No:7
      Page(s):
    735-743

    In this paper, a Many Integrated Core Architecture (MIC) accelerated parallel method of moments (MoM) algorithm is proposed to solve electromagnetic problems in practical applications. Here, MIC refers to a coprocessor, or accelerator, that speeds up computations performed by the Central Processing Unit (CPU). Three critical points are described in detail. The first is the design of the parallel framework, which ensures that the algorithm can run on distributed-memory platforms with multiple nodes; a hybrid Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming model is designed for this purpose. The second is the out-of-core algorithm, which removes the limitation imposed by the MIC memory capacity. The third is the pipeline algorithm, which overlaps data movement with MIC computation; it successfully hides the communication and thus greatly enhances the performance of the hybrid MIC/CPU MoM. Numerical results indicate that the proposed algorithm has good parallel efficiency and scalability, and runs twice as fast as the corresponding CPU algorithm.
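
    Why overlapping data movement with computation helps can be seen from a simple two-stage pipeline timing model. The sketch assumes equal-sized chunks, transfers that can run concurrently with computation (for example, with double buffering), and hypothetical per-chunk times; it is not a model of the authors' implementation.

```python
def sequential_time(chunks, t_transfer, t_compute):
    """Move a chunk to the coprocessor, then compute on it, one chunk at a time."""
    return chunks * (t_transfer + t_compute)


def pipelined_time(chunks, t_transfer, t_compute):
    """Two-stage pipeline: chunk k+1 is transferred while chunk k is computed,
    so after the first transfer the slower stage sets the pace."""
    if chunks == 0:
        return 0.0
    return t_transfer + (chunks - 1) * max(t_transfer, t_compute) + t_compute
```

    With 10 chunks, 1 time unit per transfer and 2 per computation, overlapping cuts the total from 30 to 21 units; when computation dominates, all transfers except the first are hidden.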

  • A SoC Integrating ADC and 2DDWT for Video/Image Processing

    Chin-Fa HSIEH  Tsung-Han TSAI  Shu-Chung YI

    PAPER-Electronic Circuits

      Vol:
    E99-C No:3
      Page(s):
    415-426

    Memory plays a very important role in the performance of a 2-Dimensional Discrete Wavelet Transform (2DDWT) design. A traditional 2DDWT architecture generally needs DRAM to store the input pixels and additional memory to store temporary results between the row and column processors. In this article, we present a system on a chip (SoC) for video/image processing. The chip integrates an analog-to-digital converter (ADC) with a highly memory-efficient 2DDWT, which contains only two main components: a row processor and a column processor. With this integrated chip and the use of feedback shift registers (FSR) in the column processor, the proposed architecture eliminates the DRAM and reduces memory. Pipelining is also used in the proposed 2DDWT to shorten the critical path to one adder delay. Our architecture outperforms existing architectures in that it uses less memory and has low control complexity: for a one-level 2DDWT of the 5/3 Lifting-based Discrete Wavelet Transform (LDWT) on an N×N image, its register requirement is only 2N, versus 3.5N for traditional architectures. The 2DDWT architecture is coded in Verilog HDL and synthesized with Synopsys Design Compiler using the TSMC 0.18-µm standard-cell library for verification. The ADC is designed with a full-custom methodology and serves as an IP block of the SoC. Based on a mixed-mode design flow, the integrated SoC requires no external memory, which reduces the power consumed by memory access, saves 20 I/O pads, and reduces the printed circuit board (PCB) size. Moreover, the proposed SoC supports 10-bit resolution and can easily be integrated further with a CMOS image sensor (CIS) or other IPs, completing a single-chip solution ready for real-time wavelet-based video coding.
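
    The 5/3 LDWT referred to above is computed with two lifting steps: a predict step that produces high-pass (detail) coefficients and an update step that produces low-pass coefficients. A minimal sketch of one 1D level of the reversible integer 5/3 lifting transform with symmetric boundary extension follows; a 2DDWT applies it along rows and then columns, and the row/column processors and FSR design of the paper are not modeled.

```python
def lifting_53_forward(x):
    """One level of the reversible LeGall 5/3 lifting DWT on an even-length 1D signal."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: detail = odd sample minus the average of its even neighbors
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric extension
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update step: approximation = even sample plus a rounded quarter of nearby details
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]                    # symmetric extension
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d


def lifting_53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact for integers."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x
```

    Each output depends only on a few neighboring samples, which is why lifting-based 2DDWT hardware can work with a small amount of line storage between the row and column passes.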

  • A Load-Balanced Deterministic Runtime for Pipeline Parallelism

    Chen CHEN  Kai LU  Xiaoping WANG  Xu ZHOU  Zhendong WU

    LETTER-Software System

      Publicized:
    2014/10/21
      Vol:
    E98-D No:2
      Page(s):
    433-436

    Most existing deterministic multithreading systems incur high overhead on pipeline-parallel programs because of load imbalance. In this letter, we propose a Load-Balanced Deterministic Runtime (LBDR) for pipeline parallelism. LBDR deterministically moves some tokens from non-synchronization-intensive threads to synchronization-intensive threads. Experimental results show that LBDR outperforms the state-of-the-art design by an average of 22.5%.

  • Reference-Free Deterministic Calibration of Pipelined ADC

    Takashi OSHIMA  Taizo YAMAWAKI

    PAPER-Analog Signal Processing

      Vol:
    E98-A No:2
      Page(s):
    665-675

    A novel deterministic digital calibration technique for pipelined ADCs is proposed and analyzed theoretically. During calibration, each MDAC is dithered by exploiting its inherent redundancy. The dither enables fast, accurate convergence of the calibration without any accurate reference signal, and hence with minimal area and power overhead. The proposed calibration can be applied to both 1.5-bit/stage and multi-bit/stage MDACs. Because of its simple structure and algorithm, it can easily be adapted to background calibration. Its effectiveness has been confirmed by extensive simulations and by measurements of a prototype 0.13-µm CMOS 50-MS/s pipelined ADC that uses op-amps with only 37-dB gain. As expected, the proposed calibration improved SNDR from 35.5 dB to 58.1 dB and SFDR from 37.4 dB to 70.4 dB.
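
    To show what a digital calibration of a pipelined ADC has to estimate, here is a behavioral model of a 1.5-bit/stage pipelined ADC. It includes only an interstage (MDAC) gain error and a comparator offset; it does not implement the authors' dither-based estimation, and all parameter values are hypothetical. Reconstruction weights each stage decision by vref/gain^i, so calibration amounts to making the digital gain match the actual analog gain.

```python
def pipelined_adc(vin, stages=10, vref=1.0, gain=2.0, digital_gain=2.0, comp_offset=0.0):
    """Behavioral model of a 1.5-bit/stage pipelined ADC.

    gain: actual interstage (MDAC) gain; digital_gain: gain assumed when the
    stage decisions are weighted digitally. Returns the reconstructed input.
    """
    v = vin
    estimate = 0.0
    for i in range(1, stages + 1):
        # Two comparators at +/- vref/4 (plus an optional common offset) give D in {-1, 0, +1}
        if v > vref / 4 + comp_offset:
            d = 1
        elif v < -vref / 4 + comp_offset:
            d = -1
        else:
            d = 0
        estimate += d * vref / digital_gain ** i  # digital weight of this stage's decision
        v = gain * v - d * vref                    # MDAC residue passed to the next stage
    return estimate
```

    With gain = 1.9 but digital_gain = 2.0, the stage weights are wrong and the reconstruction error grows well beyond one LSB; once digital_gain equals the true gain, the error returns to the vref/gain^stages bound. A comparator offset of this size, by contrast, is absorbed by the 1.5-bit redundancy without any calibration.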

  • Acceleration of the Fast Multipole Method on FPGA Devices

    Hitoshi UKAWA  Tetsu NARUMI

    LETTER-Application

      Publicized:
    2014/11/19
      Vol:
    E98-D No:2
      Page(s):
    309-312

    The fast multipole method (FMM) for N-body simulations is attracting much attention because it requires minimal communication between computing nodes. We implemented hardware pipelines specialized for the FMM on an FPGA device, the GRAPE-9. An N-body simulation with 1.6×10⁷ particles ran 16 times faster than on a CPU. Moreover, the particle-to-particle stage of the FMM on the GRAPE-9 executed 2.5 times faster than on a GPU in a limited case.
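
    The FMM's particle-to-particle (P2P) stage is a direct pairwise sum over nearby particles. A minimal sketch of that kernel follows, assuming gravitational interactions in units where G = 1 and optional Plummer softening; the tree, the multipole expansions, and the GRAPE-9 specifics are not modeled.

```python
import math


def p2p_accelerations(positions, masses, eps=0.0, g=1.0):
    """Direct-summation gravitational accelerations (the particle-to-particle kernel).

    positions: list of (x, y, z); masses: list of floats; eps: softening length.
    Cost is O(N^2); in an FMM this kernel is applied only to near-field neighbors.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            f = g * masses[j] / (r2 * math.sqrt(r2))  # G m_j / r^3
            acc[i][0] += f * dx
            acc[i][1] += f * dy
            acc[i][2] += f * dz
    return acc
```

    The inner loop body is a fixed sequence of multiply-add and one reciprocal square root, which is why this kernel maps well onto a deep hardware pipeline.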

  • Digital Background Calibration for a 14-bit 100-MS/s Pipelined ADC Using Signal-Dependent Dithering

    Zhao-xin XIONG  Min CAI  Xiao-Yong HE  Yun YANG

    PAPER-Electronic Circuits

      Vol:
    E97-C No:3
      Page(s):
    207-214

    A digital background calibration technique using signal-dependent dithering is proposed to correct the nonlinear errors that result from capacitor mismatches and finite op-amp gain in pipelined analog-to-digital converters (ADCs). Large-magnitude dithers are used to measure and correct both errors simultaneously in the background. In the proposed calibration system, the 2.5-bit capacitor-flip-over multiplying digital-to-analog converter (MDAC) stage is modified to inject large-magnitude dithers by adding six comparators, so that only three correction parameters per corrected stage need to be measured and extracted, using a simple calibration algorithm with a multibit first stage. Behavioral simulations of a 14-bit 100-MS/s pipelined ADC with σ=0.2% capacitor mismatch and 60-dB nonideal op-amp gain show that, after the first two stages are calibrated with the proposed technique, the signal-to-noise-and-distortion ratio improves from 63.3 to 79.3 dB and the spurious-free dynamic range increases from 63.9 to 96.4 dB. Calibrating the first two stages takes about 1.34 seconds for the modeled ADC.
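
    SNDR figures are often easier to compare as effective number of bits (ENOB), using the standard full-scale sine-wave relation SNDR = 6.02·N + 1.76 dB. A small sketch applying it to the simulated results above:

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB (full-scale sine-wave input)."""
    return (sndr_db - 1.76) / 6.02


def ideal_sndr(bits):
    """Quantization-limited SNDR in dB of an ideal N-bit converter."""
    return 6.02 * bits + 1.76
```

    By this measure, the calibration raises the simulated converter from about 10.2 to about 12.9 effective bits, against roughly 86 dB of SNDR for an ideal 14-bit quantizer.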

  • A High Performance HEVC De-Blocking Filter and SAO Architecture for UHDTV Decoder

    Jiayi ZHU  Dajiang ZHOU  Satoshi GOTO

    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2612-2622

    High Efficiency Video Coding (HEVC) is the next-generation video compression standard. The in-loop filter is an important component of HEVC and is composed of two parts: the deblocking filter (DBF) and sample adaptive offset (SAO). In this article, we propose a high-performance in-loop filter architecture for HEVC that integrates both the deblocking filter and SAO. Several ideas are adopted to achieve this. First, SAO is processed on drifted blocks, which suits the output pattern of the deblocking filter and eases the coupling between the deblocking filter and SAO. Second, the luma and chroma samples of each 4×4 block are organized in the same memory storage unit and processed simultaneously to raise parallelism. Third, in both the deblocking filter and SAO, the calculation core is implemented in combinational logic and data storage in register groups; the calculation cores process data continuously, which greatly raises the utilization of the DBF and SAO cores. Fourth, a task-level pipeline operating on 8×8 blocks is employed between the deblocking filter and SAO. By these means, a high-performance in-loop filter including both the deblocking filter and SAO is achieved without any intermediate storage or circuit. It takes only four cycles to finish the deblocking filter and SAO of one 8×8 block. The implementation results show that the proposed solution can be synthesized at 240 MHz in 65-nm technology, so it can process up to 3.84 Gpixels/s. UHDTV 4320p (7680×4320) decoding at 60 fps can be realized with a working frequency of 124.4 MHz using the proposed architecture.
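
    The throughput figures follow directly from the cycle count per block: one 8×8 block (64 pixels) every 4 cycles is 16 pixels per cycle. A short sketch of that arithmetic, counting each pixel once (our reading, since luma and chroma of a block are processed together):

```python
def throughput_pixels_per_s(clock_hz, block_pixels=64, cycles_per_block=4):
    """Pixels per second when one 8x8 block finishes every cycles_per_block cycles."""
    return clock_hz * block_pixels / cycles_per_block


def required_clock_hz(width, height, fps, block_pixels=64, cycles_per_block=4):
    """Clock frequency needed to filter width x height frames at fps."""
    return width * height * fps * cycles_per_block / block_pixels
```

    240 MHz × 64/4 gives 3.84 Gpixels/s, and 7680 × 4320 × 60 ≈ 1.99 Gpixels/s requires 124.416 MHz, matching the reported 124.4 MHz.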

  • A 12-bit Interpolated Pipeline ADC Using Body Voltage Controlled Amplifier

    Hyunui LEE  Masaya MIYAHARA  Akira MATSUZAWA

    PAPER-Circuit Design

      Vol:
    E96-A No:12
      Page(s):
    2508-2515

    This paper presents a 12-bit interpolated pipeline analog-to-digital converter (ADC) using a body-voltage-controlled amplifier for current biasing and common-mode feedback (CMFB). The proposed body voltage control method allows the amplifier to achieve low power consumption and a large output swing. The proposed amplifier consumes less than 15.6 mW, almost half the power of a folded cascode amplifier satisfying 12-bit, 400-MS/s ADC operation. Moreover, the proposed amplifier secures a 600-mV output swing, which is one drain-source voltage (VDS) wider than that of a telescopic amplifier. The 12-bit interpolated pipeline ADC using the proposed amplifier is fabricated in a 1P9M 90-nm CMOS technology with a 1.2-V supply voltage. The ADC achieves an effective number of bits (ENOB) of about 10 bits at 300 MS/s and a figure of merit (FoM) of 0.2 pJ/conv. when the input signal frequency is sufficiently low.

  • Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

    Shuai MU  Dongdong LI  Yubei CHEN  Yangdong DENG  Zhihua WANG

    PAPER-Computer System

      Vol:
    E96-D No:10
      Page(s):
    2194-2207

    By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general-purpose computing platform. However, many real-world applications, especially those following a stream processing pattern, feature interleaved task-pipelined and data parallelism. Current GPUs are ill-equipped for such applications because of insufficient use of computing resources and/or excessive off-chip memory traffic. In this paper, we focus on microarchitectural enhancements that enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate both task-pipeline and data parallelism in a unified manner. Results from a cycle-accurate simulator running real-world applications show that the proposed GPU microarchitecture improves computing throughput by 18% and reduces overall accesses to off-chip GPU memory by 13%.

  • Design of Interpolated Pipeline ADC Using Low-Gain Open-Loop Amplifiers

    Hyunui LEE  Masaya MIYAHARA  Akira MATSUZAWA

    PAPER

      Vol:
    E96-C No:6
      Page(s):
    838-849

    This paper describes the design of an interpolated pipeline analog-to-digital converter (ADC). Introducing the interpolation technique into the conventional pipeline topology makes it possible to realize an ADC with more than 10-bit resolution at several hundred MS/s using low-gain open-loop amplifiers, without any multiplying digital-to-analog converter (MDAC) calibration. In this paper, the linearity requirement of the amplifier is first analyzed in relation to the reference range and stage resolution. The noise characteristic is also discussed in terms of the amplifier's noise bandwidth and load capacitance. After that, the sampling speed and SNR characteristics are examined for various amplifier currents. Next, the resolution of the pipeline stages is optimized based on power consumption. Through this analysis, reasonable amplifier parameters, such as transconductance, source degeneration resistance and load capacitance, can be defined. The optimized operating speed and stage resolution for the interpolated pipelined ADC are also shown. The analysis is valuable both for the design of interpolated pipeline ADCs and for other circuits that incorporate interpolation and amplifiers.
