Keyword Search Result

[Keyword] high throughput(11hit)

1-11hit
  • Energy-Efficient KBP: Kernel Enhancements for Low-Latency and Energy-Efficient Networking Open Access

    Kei FUJIMOTO  Ko NATORI  Masashi KANEKO  Akinori SHIRAGA  

     
    PAPER-Network

      Publicized:
    2022/03/14
      Vol:
    E105-B No:9
      Page(s):
    1039-1052

    Real-time applications are becoming more and more popular, and due to the demand for more compact and portable user devices, offloading terminal processes to edge servers is being considered. Moreover, it is necessary to process packets with low latency on edge servers, which are often virtualized for operability. When trying to achieve low-latency networking, the increase in server power consumption due to performance tuning and busy polling for fast packet receiving becomes a problem. Thus, we design and implement a low-latency and energy-efficient networking system, energy-efficient kernel busy poll (EE-KBP), which meets four requirements: (A) low latency in the order of microseconds for packet forwarding in a virtual server, (B) lower power consumption than existing solutions, (C) no need for application modification, and (D) no need for software redevelopment with each kernel security update. EE-KBP sets a polling thread in a Linux kernel that receives packets with low latency in polling mode while packets are arriving, and when no packets are arriving, it sleeps and lowers the CPU operating frequency. Evaluations indicate that EE-KBP achieves microsecond-order low-latency networking under most traffic conditions, and 1.4× to 3.1× higher throughput with lower power consumption than NAPI used in a Linux kernel.
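
    The poll-while-busy, sleep-when-idle behavior described in this abstract can be illustrated with a small user-space simulation. This is only a sketch and not the EE-KBP kernel code: the tick-based traffic model, the function name `simulate_poll_sleep`, and the `idle_limit` threshold are all assumptions introduced here for illustration.

```python
def simulate_poll_sleep(arrivals, idle_limit=2):
    """Toy model of a receive thread that polls while traffic flows and sleeps when idle.

    arrivals[t] is the number of packets arriving in tick t.
    idle_limit (hypothetical) is how many consecutive empty ticks the thread
    tolerates in polling mode before it goes back to sleep.
    Returns the thread state ("poll" or "sleep") for every tick.
    """
    state = "sleep"
    idle_ticks = 0
    trace = []
    for packets in arrivals:
        if packets > 0:
            # Traffic present: wake up (or stay awake) and poll.
            state = "poll"
            idle_ticks = 0
        elif state == "poll":
            # No traffic while polling: count idle ticks, then sleep.
            idle_ticks += 1
            if idle_ticks >= idle_limit:
                state = "sleep"
        trace.append(state)
    return trace


if __name__ == "__main__":
    print(simulate_poll_sleep([0, 3, 1, 0, 0, 0, 2]))
```

    In a real system the sleeping state is also where the CPU operating frequency would be lowered, which this toy model does not attempt to represent.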

  • KBP: Kernel Enhancements for Low-Latency Networking for Virtual Machine and Container without Application Customization Open Access

    Kei FUJIMOTO  Masashi KANEKO  Kenichi MATSUI  Masayuki AKUTSU  

     
    PAPER-Network

      Publicized:
    2021/10/26
      Vol:
    E105-B No:5
      Page(s):
    522-532

    Packet processing on commodity hardware is a cost-efficient and flexible alternative to specialized networking hardware. However, virtualizing dedicated networking hardware as a virtual machine (VM) or a container on a commodity server results in performance problems, such as longer latency and lower throughput. This paper focuses on obtaining a low-latency networking system in a VM and a container. We reveal mechanisms that cause millisecond-scale networking delays in a VM through a series of experiments. To eliminate such delays, we design and implement a low-latency networking system, kernel busy poll (KBP), which achieves three goals: (1) microsecond-scale tail delays and higher throughput than conventional solutions are achieved in a VM and a container; (2) application customization is not required, so applications can use the POSIX sockets application program interface; and (3) KBP software does not need to be developed for every Linux kernel security update. KBP can be applied to both a VM configuration and a container configuration. Evaluation results indicate that KBP achieves microsecond-scale tail delays in both a VM and a container. In the VM configuration, KBP reduces maximum round-trip latency by more than 98% and increases the throughput by up to three times compared with existing NAPI and Open vSwitch with the Data Plane Development Kit (OvS-DPDK). In the container configuration, KBP reduces maximum round-trip latency by 21% to 96% and increases the throughput by up to 1.28 times compared with NAPI.

  • A Filter Design Method of Direct RF Undersampling On-Board Receiver for Ka-Band HTS

    Tomoyuki FURUICHI  Yang GUI  Mizuki MOTOYOSHI  Suguru KAMEDA  Takashi SHIBA  Noriharu SUEMATSU  

     
    PAPER

      Publicized:
    2020/03/27
      Vol:
    E103-B No:10
      Page(s):
    1078-1085

    In this paper, we propose a radio frequency (RF) anti-aliasing filter design method that considers the effect of the roll-off characteristic on the noise figure (NF) in a direct RF undersampling receiver. The proposed method is useful for broadband reception in which the system bandwidth (BW) approaches half of the sampling frequency (1/2 fs). When the system BW is extended to nearly 1/2 fs, the roll-off band falls outside the desired Nyquist zone and additionally affects the NF. The proposed method offers a design target for the roll-off characteristic, not only for the rejection ratio. The target is helpful as a design guide for meeting the allowed NF. We design a filter based on the proposed method and apply it to the direct RF undersampling on-board receiver for a Ka-band high throughput satellite (HTS). The measured NF of the implemented receiver almost matched the designed value. Moreover, the receiver achieved a reception bandwidth that is 90% of 1/2 fs.
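
    The Nyquist-zone arithmetic behind direct RF undersampling can be sketched as follows. The helper names and the example frequencies are assumptions chosen for illustration; they are not values from the paper, and the sketch does not model the filter roll-off or the noise figure.

```python
import math


def nyquist_zone(f_rf_hz, fs_hz):
    """Return the 1-based Nyquist zone that contains f_rf_hz when sampling at fs_hz.

    Zone 1 spans 0 to fs/2, zone 2 spans fs/2 to fs, and so on.
    """
    return int(math.floor(f_rf_hz / (fs_hz / 2.0))) + 1


def aliased_frequency(f_rf_hz, fs_hz):
    """Return the first-zone frequency (0 to fs/2) that f_rf_hz folds down to."""
    f = math.fmod(f_rf_hz, fs_hz)
    return f if f <= fs_hz / 2.0 else fs_hz - f


if __name__ == "__main__":
    fs = 1_000_000_000  # hypothetical 1 GHz sampling rate
    for f_rf in (2_300_000_000, 2_700_000_000):  # hypothetical RF inputs
        print(f_rf, nyquist_zone(f_rf, fs), aliased_frequency(f_rf, fs))
```

    Signals in any other zone fold onto the same first-zone frequencies, which is why the anti-aliasing filter has to reject everything outside the desired zone.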

  • Dynamic Power Allocation Based on Rain Attenuation Prediction for High Throughput Broadband Satellite Systems

    Shengchao SHI  Guangxia LI  Zhiqiang LI  Bin GAO  Zhangkai LUO  

     
    LETTER-Numerical Analysis and Optimization

      Vol:
    E100-A No:9
      Page(s):
    2038-2043

    Broadband satellites operating at Ka band and above are playing increasingly important roles in future satellite networks. Meanwhile, rain attenuation is the dominant impairment in these bands. In this context, a dynamic power allocation scheme based on rain attenuation prediction is proposed. With this scheme, the system can dynamically adjust the allocated power according to the time-varying predicted rain attenuation. Extensive simulation results demonstrate the improvement of the dynamic scheme over static allocation. It can be concluded that introducing such a dynamic power allocation scheme makes the allocated capacities match the traffic demands better and also avoids wasting power resources.

  • The ASIC Implementation of SM3 Hash Algorithm for High Throughput

    Xiaojing DU  Shuguo LI  

     
    LETTER-Cryptography and Information Security

      Vol:
    E99-A No:7
      Page(s):
    1481-1487

    SM3 is a hash function standard defined by China. Unlike SHA-1 and SHA-2, SM3 is hard to speed up because it has a more complicated compression function than other hash algorithms. In this paper, we propose a 4-round-in-1 structure to reduce the number of rounds, and a logic simplification that moves 3 adders and 3 XOR gates from the critical path to the non-critical path. Based on SMIC 65 nm CMOS technology, the throughput of SM3 reaches 6.54 Gbps, which is higher than that of the reported designs.

  • Hardware Efficient and Low Latency Implementations of Look-Ahead ACS Computation for Viterbi Decoders

    Kazuhito ITO  Ryoto SHIRASAKA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2680-2688

    The throughput rate of Viterbi decoding (VD) is not limited by the speed of functional units when look-ahead computation techniques are used. The disadvantages of the look-ahead computation in VD are the hardware complexity and the decode latency. In this paper, implementation methods of the look-ahead ACS computation are proposed to improve the hardware efficiency and reduce the latency where the hardware efficiency and the latency can be balanced with a single parameter.
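
    The idea behind look-ahead ACS can be illustrated with the textbook min-plus formulation of the Viterbi recursion. This is a generic sketch, not the implementation methods proposed in the paper: the function names, the dense-matrix representation, and the 2-state example values are assumptions. The point it shows is that the branch metrics of two trellis steps can be combined ahead of time, so that a single add-compare-select (ACS) step yields the same path metrics as two sequential steps.

```python
INF = float("inf")  # marks a transition that does not exist in the trellis


def acs(metrics, branch):
    """One add-compare-select step.

    metrics[p] is the survivor path metric of state p.
    branch[p][s] is the branch metric of transition p -> s (INF if absent).
    Returns the updated path metric for every state s.
    """
    n = len(metrics)
    return [min(metrics[p] + branch[p][s] for p in range(n)) for s in range(n)]


def look_ahead(branch_a, branch_b):
    """Combine two consecutive steps into one set of two-step branch metrics."""
    n = len(branch_a)
    return [
        [min(branch_a[p][q] + branch_b[q][s] for q in range(n)) for s in range(n)]
        for p in range(n)
    ]


if __name__ == "__main__":
    metrics = [0, 3]              # hypothetical starting path metrics
    step_a = [[1, 2], [INF, 0]]   # hypothetical branch metrics, step 1
    step_b = [[0, 4], [1, INF]]   # hypothetical branch metrics, step 2
    sequential = acs(acs(metrics, step_a), step_b)
    combined = acs(metrics, look_ahead(step_a, step_b))
    print(sequential, combined)
```

    The combined branch metrics do not depend on the path metrics, so they can be computed outside the recursive loop. The price is the extra hardware and decode latency that the abstract mentions.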

  • A Proposition of 600 Mbps WLAN-Like System with Low-Complexity MIMO Decoder for FPGA Implementation

    Wahyul Amien SYAFEI  Yuhei NAGAO  Ryuta IMASHIOYA  Masayuki KUROSAKI  Baiko SAI  Hiroshi OCHI  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E94-B No:2
      Page(s):
    491-498

    This paper describes our work on developing a high-throughput wireless LAN using a group layered space-time (GLST) system with a low-complexity MIMO decoder. It achieves a throughput of 600 Mbps over a 30 meter propagation distance by utilizing 80 MHz of bandwidth in the 5 GHz frequency band. A run test under channel model B of IEEE802.11TGn demonstrates its excellent performance. The register transfer level results show that the developed system is synthesized successfully, and prototyping on the target FPGA chips, Stratix II EP2S180F1508C4, gives the expected results.

  • A Low Power and High Throughput Self Synchronous FPGA Using 65 nm CMOS with Throughput Optimization by Pipeline Alignment

    Benjamin STEFAN DEVLIN  Toru NAKURA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E93-A No:7
      Page(s):
    1319-1328

    We detail a self synchronous field programmable gate array (SSFPGA) with dual-pipeline (DP) architecture to conceal pre-charge time for dynamic logic, and its throughput optimization by using pipeline alignment implemented on benchmark circuits. A self synchronous LUT (SSLUT) consists of a three input tree-type structure with 8 bits of SRAM for programming. A self synchronous switch box (SSSB) consists of both pass transistors and buffers to route signals, with 12 bits of SRAM. One common block with one SSLUT and one SSSB occupies 2.2 Mλ² area with 35 bits of SRAM, and the prototype SSFPGA with 34×30 (1020) blocks is designed and fabricated using 65 nm CMOS. Measured results at 1.2 V show 430 MHz and 647 MHz operation for a 3 bit ripple carry adder, without and with throughput optimization, respectively. We find that using the proposed pipeline alignment techniques we can operate at the maximum throughput of 647 MHz in various benchmarks on the SSFPGA. We demonstrate up to 56.1 times throughput improvement with our pipeline alignment techniques. The pipeline alignment is carried out within the number of logic elements in the array and pipeline buffers in the switching matrix.

  • A High Throughput On-Demand Routing Protocol for Multirate Ad Hoc Wireless Networks

    Md. Mustafizur RAHMAN  Choong Seon HONG  Sungwon LEE  

     
    PAPER-Network

      Vol:
    E93-B No:1
      Page(s):
    29-39

    Routing in wireless ad hoc networks is a challenging issue because it dynamically controls the network topology and determines the network performance. Most of the available protocols are based on single-rate radio networks and they use hop-count as the routing metric. There have been some efforts for multirate radios as well that use transmission-time of a packet as the routing metric. However, neither the hop-count nor the transmission-time may be a sufficient criterion for discovering a high-throughput path in a multirate wireless ad hoc network. Hop-count based routing metrics usually select a low-rate bound path whereas the transmission-time based metrics may select a path with a comparatively large number of hops. The trade-off between transmission time and effective transmission range of a data rate can be another key criterion for finding a high-throughput path in such environments. In this paper, we introduce a novel routing metric based on the efficiency of a data rate that balances the required time and covering distance by a transmission and results in increased throughput. Using the new metric, we propose an on-demand routing protocol for multirate wireless environment, dubbed MR-AODV, to discover high-throughput paths in the network. A key feature of MR-AODV is that it controls the data rate in transmitting both the data and control packets. Rate control during the route discovery phase minimizes the route request (RREQ) avalanche. We use simulations to evaluate the performance of the proposed MR-AODV protocol and results reveal significant improvements in end-to-end throughput and minimization of routing overhead.

  • VLSI Architecture for the Low-Computation Cycle and Power-Efficient Recursive DFT/IDFT Design

    Lan-Da VAN  Chin-Teng LIN  Yuan-Chu YU  

     
    PAPER-Digital Signal Processing

      Vol:
    E90-A No:8
      Page(s):
    1644-1652

    In this paper, we propose a low-computation cycle and power-efficient recursive discrete Fourier transform (DFT)/inverse DFT (IDFT) architecture adopting a hybrid of input strength reduction, the Chebyshev polynomial, and register-splitting schemes. Compared with the existing recursive DFT/IDFT architectures, the proposed recursive architecture halves the number of computation cycles. Applying this novel low-computation cycle architecture, we could double the throughput rate and the channel density without increasing the operating frequency for the dual tone multi-frequency (DTMF) detector in the high channel density voice over packet (VoP) application. From the chip implementation results, the proposed architecture is capable of processing over 128 channels and each channel consumes 9.77 µW under 1.2 V@20 MHz in TSMC 0.13 1P8M CMOS process. The proposed VLSI implementation demonstrates the power-efficiency advantage of the low-computation cycle architecture.
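
    For readers unfamiliar with recursive DFT computation, the classic Goertzel recursion is a common baseline for single-bin detection such as DTMF. The sketch below shows that textbook algorithm only. It is not the architecture proposed in the paper, which uses input strength reduction, the Chebyshev polynomial, and register splitting. The function name and the sample values are assumptions.

```python
import cmath
import math


def goertzel(samples, k):
    """Return DFT bin k of `samples` using the Goertzel second-order recursion.

    The loop uses one real multiplication per input sample; the complex
    correction is applied once, after the last sample.
    """
    n = len(samples)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2   # s[n] = x[n] + 2cos(w) s[n-1] - s[n-2]
        s_prev2, s_prev = s_prev, s
    # X[k] = e^{jw} * s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s_prev - s_prev2


if __name__ == "__main__":
    print(goertzel([1.0, 2.0, 0.0, -1.0], 1))
```

    Because only the bins of interest are evaluated, the cost grows with the number of detected tones rather than with the full transform size.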

  • A Study on Rate-Based Multi-Path Transmission Control Protocol (R-M/TCP) Using Packet Scheduling Algorithm

    Kultida ROJVIBOONCHAI  Toru OSUGA  Hitoshi AIDA  

     
    PAPER-TCP Protocol

      Vol:
    E89-D No:1
      Page(s):
    124-131

    We have proposed Rate-based Multi-path Transmission Control Protocol (R-M/TCP) for improving the reliability and performance of data transfer over the Internet by using multiple paths. Congestion control in R-M/TCP is performed in a rate-based and loss-avoidance manner. It attempts to estimate the available bandwidth and the queue length of the used routes in order to fully utilize the bandwidth resources. However, it has been reported that when the characteristics of the used routes, i.e. available bandwidth and delay, differ greatly, R-M/TCP cannot achieve the desired throughput from the routes. This is because R-M/TCP originally transmits data packets in a round-robin manner through the routes. In this paper, therefore, we propose R-M/TCP using Packet Scheduling Algorithm (PSA). Instead of using the round-robin manner, R-M/TCP utilizes PSA, which accounts for the time-varying bandwidth and delay of each path, so that the number of data packets arriving out of order at the receiver can be minimized and the desired throughput can be achieved. Quantitative simulations are conducted to show the effectiveness of R-M/TCP using PSA.
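
    One simple way to keep packets in order across paths with different bandwidth and delay is to send each packet on the path with the earliest estimated arrival time. The sketch below illustrates that generic rule; it is an assumption made for illustration and is not claimed to be the paper's PSA. The function name, the fixed packet size, and the path parameters are hypothetical, and the bandwidth and delay are treated as constant rather than time-varying.

```python
def schedule_earliest_arrival(num_packets, paths, packet_bits=10_000):
    """Assign packets to paths so that each goes where it is expected to arrive first.

    paths is a list of (bandwidth_bps, one_way_delay_s) tuples.
    Returns a list of (path_index, estimated_arrival_time_s), one per packet,
    in sending order.
    """
    free_at = [0.0] * len(paths)  # time at which each path can start a new packet
    plan = []
    for _ in range(num_packets):
        best_index, best_arrival = None, None
        for i, (bandwidth, delay) in enumerate(paths):
            arrival = free_at[i] + packet_bits / bandwidth + delay
            if best_arrival is None or arrival < best_arrival:
                best_index, best_arrival = i, arrival
        # The chosen path is busy for one transmission time.
        free_at[best_index] += packet_bits / paths[best_index][0]
        plan.append((best_index, best_arrival))
    return plan


if __name__ == "__main__":
    # Hypothetical paths: a fast, short path and a slow, long one.
    for entry in schedule_earliest_arrival(8, [(1_000_000, 0.010), (500_000, 0.050)]):
        print(entry)
```

    With this rule the estimated arrival times never decrease from one packet to the next, whereas a round-robin sender would alternate paths regardless of their delay.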