IEICE global.ieice.org Site

Keyword Search Result

[Keyword] low latency(17hit)

1-17hit

Traffic Reduction for Speculative Video Transmission in Cloud Gaming Systems Open Access
Takumasa ISHIOKA Tatsuya FUKUI Toshihito FUJIWARA Satoshi NARIKAWA Takuya FUJIHASHI Shunsuke SARUWATARI Takashi WATANABE

PAPER-Network

Vol: E107-B No:5
Page(s): 408-418
Cloud gaming systems allow users to play games that require high-performance computational capability on their mobile devices at any location. However, playing games through cloud gaming systems increases the Round-Trip Time (RTT) due to increased network delay. To simulate a local gaming experience for cloud users, we must minimize RTTs, which include network delays. Speculative video transmission pre-generates and encodes video frames corresponding to all possible user inputs and sends them to the user before the user’s input. Speculative video transmission mitigates network delay, whereas a simple implementation of it significantly increases video traffic. This paper proposes tile-wise delta detection for traffic reduction of speculative video transmission. More specifically, the proposed method determines a reference video frame from the generated video frames and divides the reference video frame into multiple tiles. We calculate the similarity between each tile of the reference video frame and other video frames based on a hash function. Based on the calculated similarity, we determine redundant tiles and do not transmit them, reducing traffic volume in minimal processing time without implementing a video compression technique with a high compression ratio. Evaluations using commercial games showed that the proposed method reduced traffic volume by 40-50% when the SSIM index was around 0.98 in certain genres, compared with the speculative video transmission method. Furthermore, to evaluate the feasibility of the proposed method, we investigated the effectiveness of network delay reduction with existing computational capability and the requirements in the future. As a result, we found that the proposed scheme may mitigate network delay by one to two frames, even with existing computational capability under limited conditions.
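The tile-wise comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which hash function or similarity measure is used, so this sketch assumes exact-match hashing with BLAKE2b over row-major grayscale frames, and the names `tile_hashes` and `tiles_to_send` are hypothetical.

```python
import hashlib


def tile_hashes(frame, width, height, tile):
    """Hash every tile of a row-major grayscale frame (one byte per pixel).

    Returns a dict mapping the tile's top-left corner (x, y) to its digest.
    Assumption: exact-match hashing; the paper's hash may tolerate small changes.
    """
    hashes = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            h = hashlib.blake2b(digest_size=8)
            # Feed the tile row by row; edge tiles may be narrower or shorter.
            for y in range(ty, min(ty + tile, height)):
                start = y * width + tx
                h.update(frame[start:start + min(tile, width - tx)])
            hashes[(tx, ty)] = h.digest()
    return hashes


def tiles_to_send(reference, candidate, width, height, tile):
    """Return the tiles of `candidate` whose hash differs from the reference frame."""
    ref = tile_hashes(reference, width, height, tile)
    cand = tile_hashes(candidate, width, height, tile)
    return sorted(pos for pos, digest in cand.items() if ref[pos] != digest)
```

In this sketch, a speculative frame that differs from the reference only in one corner yields a single tile to transmit; the receiver would reuse the reference tiles for every other position.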
Chunk Grouping Method to Estimate Available Bandwidth for Adaptive Bitrate Live Streaming
Daichi HATTORI Masaki BANDAI

PAPER-Network

Publicized: 2023/07/24
Vol: E106-B No:11
Page(s): 1133-1142
The Common Media Application Format (CMAF) is a standard for adaptive bitrate live streaming. The CMAF adapts chunk encoding and enables low-latency live streaming. However, conventional bandwidth estimation for adaptive bitrate streaming underestimates bandwidth because download time is affected not only by network bandwidth but also by the idle times between chunks in the same segment. Inaccurate bandwidth estimation decreases the quality of experience of the streaming client. In this paper, we propose a chunk-grouping method to estimate the available bandwidth for adaptive bitrate live streaming. In the proposed method, by delaying HTTP request transmission and bandwidth estimation using grouped chunks, the client estimates the available bandwidth accurately due to there being no idle times in the grouped chunks. In addition, we extend the proposed method to dynamically change the number of grouping chunks according to buffer length during downloading of the previous segment. We evaluate the proposed methods under various network conditions in order to confirm the effectiveness of the proposed methods.
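The underestimation described in this abstract can be shown with a small numeric sketch. It assumes a hypothetical 10 Mbit/s link and chunks of 125,000 bytes, and it models each chunk as a `(size_bytes, start_s, end_s)` tuple; these values and the function name `estimate_bandwidth` are illustrative and do not come from the paper.

```python
def estimate_bandwidth(chunks):
    """Estimate bandwidth in bit/s as total bits over wall-clock download time.

    `chunks` is a list of (size_bytes, start_s, end_s) tuples in arrival order.
    Any idle time between chunks is counted as download time, which is what
    makes the estimate too low when chunks arrive with gaps.
    """
    total_bits = 8 * sum(size for size, _, _ in chunks)
    elapsed = chunks[-1][2] - chunks[0][1]
    return total_bits / elapsed


# Request sent early: each chunk arrives as soon as it is encoded, with idle gaps.
spaced = [(125_000, 0.0, 0.1), (125_000, 0.5, 0.6), (125_000, 1.0, 1.1)]

# Request delayed: the same chunks are already available and arrive back-to-back.
grouped = [(125_000, 1.0, 1.1), (125_000, 1.1, 1.2), (125_000, 1.2, 1.3)]

spaced_estimate = estimate_bandwidth(spaced)    # about 2.7 Mbit/s
grouped_estimate = estimate_bandwidth(grouped)  # about 10 Mbit/s
```

With the idle gaps included, the estimate falls to roughly a quarter of the assumed link rate; measuring over the grouped chunks recovers it.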
EMRNet: Efficient Modulation Recognition Networks for Continuous-Wave Radar Signals
Kuiyu CHEN Jingyi ZHANG Shuning ZHANG Si CHEN Yue MA

BRIEF PAPER-Electronic Instrumentation and Control

Publicized: 2023/03/24
Vol: E106-C No:8
Page(s): 450-453
Automatic modulation recognition (AMR) of radar signals is a currently active area, especially in electronic reconnaissance, where systems need to quickly identify the intercepted signal and formulate corresponding interference measures on computationally limited platforms. However, previous methods generally have high computational complexity and a large number of network parameters, making the system unable to detect the signal in a timely manner in resource-constrained environments. This letter first proposes an efficient modulation recognition network (EMRNet) with tiny, low-latency models to match the requirements of mobile reconnaissance equipment. A one-dimensional residual depthwise separable convolution block (1D-RDSB) with an adaptive receptive field size is developed in EMRNet to replace the traditional convolution block. With 1D-RDSB, EMRNet achieves high classification accuracy and dramatically reduces computational cost and the number of network parameters. The experimental results show that EMRNet can achieve higher precision than existing 2D-CNN methods, while the computational cost and parameter count of EMRNet are reduced by about 13.93× and 80.88×, respectively.
Protection of Latency-Strict Stations on WLAN Systems Using CTS-to-STA Frames
Kenichi KAWAMURA Shouta NAKAYAMA Keisuke WAKAO Takatsune MORIYAMA Yasushi TAKATORI

PAPER-Wireless Communication Technologies

Publicized: 2022/11/28
Vol: E106-B No:6
Page(s): 518-527
Low-latency and highly reliable communication on wireless LAN (WLAN) is difficult due to interference from the surroundings. To overcome this problem, we have developed a scheme called Clear to Send-to-Station (CTS-STA) frame transmission control that enables stable latency communication in environments with strong interference from surrounding WLAN systems. This scheme uses the basic functions of WLAN standards and is effective for both the latest and legacy standard devices. It operates when latency-strict transmission is required for an STA and there is interference from surrounding WLAN devices while minimizing the control signal overhead. Experimental evaluations with prototype systems demonstrate the effectiveness of the proposed scheme.
Energy-Efficient KBP: Kernel Enhancements for Low-Latency and Energy-Efficient Networking Open Access
Kei FUJIMOTO Ko NATORI Masashi KANEKO Akinori SHIRAGA

PAPER-Network

Publicized: 2022/03/14
Vol: E105-B No:9
Page(s): 1039-1052
Real-time applications are becoming more and more popular, and due to the demand for more compact and portable user devices, offloading terminal processes to edge servers is being considered. Moreover, it is necessary to process packets with low latency on edge servers, which are often virtualized for operability. When trying to achieve low-latency networking, the increase in server power consumption due to performance tuning and busy polling for fast packet receiving becomes a problem. Thus, we design and implement a low-latency and energy-efficient networking system, energy-efficient kernel busy poll (EE-KBP), which meets four requirements: (A) low latency in the order of microseconds for packet forwarding in a virtual server, (B) lower power consumption than existing solutions, (C) no need for application modification, and (D) no need for software redevelopment with each kernel security update. EE-KBP sets a polling thread in a Linux kernel that receives packets with low latency in polling mode while packets are arriving, and when no packets are arriving, it sleeps and lowers the CPU operating frequency. Evaluations indicate that EE-KBP achieves microsecond-order low-latency networking under most traffic conditions, and 1.4× to 3.1× higher throughput with lower power consumption than NAPI used in a Linux kernel.
KBP: Kernel Enhancements for Low-Latency Networking for Virtual Machine and Container without Application Customization Open Access
Kei FUJIMOTO Masashi KANEKO Kenichi MATSUI Masayuki AKUTSU

PAPER-Network

Publicized: 2021/10/26
Vol: E105-B No:5
Page(s): 522-532
Packet processing on commodity hardware is a cost-efficient and flexible alternative to specialized networking hardware. However, virtualizing dedicated networking hardware as a virtual machine (VM) or a container on a commodity server results in performance problems, such as longer latency and lower throughput. This paper focuses on obtaining a low-latency networking system in a VM and a container. We reveal mechanisms that cause millisecond-scale networking delays in a VM through a series of experiments. To eliminate such delays, we design and implement a low-latency networking system, kernel busy poll (KBP), which achieves three goals: (1) microsecond-scale tail delays and higher throughput than conventional solutions are achieved in a VM and a container; (2) application customization is not required, so applications can use the POSIX sockets application program interface; and (3) KBP software does not need to be developed for every Linux kernel security update. KBP can be applied to both a VM configuration and a container configuration. Evaluation results indicate that KBP achieves microsecond-scale tail delays in both a VM and a container. In the VM configuration, KBP reduces maximum round-trip latency by more than 98% and increases the throughput by up to three times compared with existing NAPI and Open vSwitch with the Data Plane Development Kit (OvS-DPDK). In the container configuration, KBP reduces maximum round-trip latency by 21% to 96% and increases the throughput by up to 1.28 times compared with NAPI.
Field Trial on 5G Low Latency Radio Communication System Towards Application to Truck Platooning Open Access
Manabu MIKAMI Hitoshi YOSHINO

PAPER

Publicized: 2019/02/20
Vol: E102-B No:8
Page(s): 1447-1457
The fifth generation mobile communication system (5G) is designed to have new radio capabilities to support not only conventional enhanced Mobile Broadband (eMBB) communications but also new machine type communications such as Ultra-Reliable Low-Latency communications (URLLC) and massive Machine Type communications (m-MTC). In such new areas of URLLC and m-MTC, mobile operators need to explore new use cases and/or applications together with vertical industries, the industries which are potential users of 5G, in order to fully exploit the new 5G capabilities. Intelligent Transport System (ITS), including automated driving, is one of the most promising application areas of 5G since it requires both ultra-reliable and low-latency communications. We are actively working on the research and development of truck platooning as a new 5G application. We have developed a field trial system for vehicular-to-network (V2N) communications using 5G prototype equipment and actual large-size trucks in order to assess 5G capabilities, including ultra-low-latency, in automotive test courses in the field. This paper discusses the fundamental performance evaluation required for vehicular communications between platooning trucks, such as low-latency message communication for vehicle control and low-latency video monitoring of following platooning truck vehicles. The paper also addresses the field evaluation results of 5G V2N communications in a rural area. It clarifies the fundamental radio propagation issues at the leading and the following vehicles in truck platooning for V2N communications, and discusses the impact of the radio propagation over a road to the over-the-air transmission performance of 5G V2N communications.
A 197mW 70ms-Latency Full-HD 12-Channel Video-Processing SoC in 16nm CMOS for In-Vehicle Information Systems
Seiji MOCHIZUKI Katsushige MATSUBARA Keisuke MATSUMOTO Chi Lan Phuong NGUYEN Tetsuya SHIBAYAMA Kenichi IWATA Katsuya MIZUMOTO Takahiro IRITA Hirotaka HARA Toshihiro HATTORI

PAPER

Vol: E100-A No:12
Page(s): 2878-2887
A 197mW 70ms-latency Full-HD 12-channel video-processing SoC for in-vehicle information systems has been implemented in 16nm CMOS. The SoC integrates 17 video processors of 6 types to operate video processing independently of other processing in CPU/GPU. The synchronous scheme between the video processors achieves 70ms low-latency for driver assistance. The optimized implementation of lossy and lossless video-data compression reduces memory access data by half and power consumption by 20%.
Simulation Study of Low Latency Network Architecture Using Mobile Edge Computing
Krittin INTHARAWIJITR Katsuyoshi IIDA Hiroyuki KOGA

PAPER

Publicized: 2017/02/08
Vol: E100-D No:5
Page(s): 963-972
Attaining extremely low latency service in 5G cellular networks is an important challenge in the communication research field. A higher QoS in the next-generation network could enable several unprecedented services, such as Tactile Internet, Augmented Reality, and Virtual Reality. However, these services will all need support from powerful computational resources provided through cloud computing. Unfortunately, the geolocation of cloud data centers could be insufficient to satisfy the latency aimed for in 5G networks. The physical distance between servers and users will sometimes be too great to enable quick reaction within the service time boundary. The problem of long latency resulting from long communication distances can be solved by Mobile Edge Computing (MEC), though, which places many servers along the edges of networks. MEC can provide shorter communication latency, but total latency consists of both the transmission and the processing times. Always selecting the closest edge server will lead to a longer computing latency in many cases, especially when there is a mass of users around particular edge servers. Therefore, the research studies the effects of both latencies. The communication latency is represented by hop count, and the computation latency is modeled by processor sharing (PS). An optimization model and selection policies are also proposed. Quantitative evaluations using simulations show that selecting a server according to the lowest total latency leads to the best performance, and permitting an over-latency barrier would further improve results.
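The trade-off between the closest server and the least-loaded server can be sketched as follows. This is an illustrative model under stated assumptions, not the paper's optimization model: communication latency is taken as round-trip hop count times a fixed per-hop delay, and the processor-sharing (PS) computation latency is approximated by giving a new job an equal share of capacity alongside the jobs already running. `EdgeServer`, `total_latency`, `select_server`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    hops: int          # hop count between the user and this server
    active_jobs: int   # jobs already sharing the processor
    capacity: float    # work units processed per second


def total_latency(server, work, per_hop_delay):
    """Communication plus computation latency, in seconds.

    Assumptions: a fixed delay per hop in each direction, and a simplified
    PS model in which the new job gets 1 / (active_jobs + 1) of the capacity
    for its whole service time.
    """
    communication = 2 * server.hops * per_hop_delay
    computation = work * (server.active_jobs + 1) / server.capacity
    return communication + computation


def select_server(servers, work, per_hop_delay):
    """Pick the server with the lowest total latency, not simply the closest."""
    return min(servers, key=lambda s: total_latency(s, work, per_hop_delay))


servers = [
    EdgeServer("near-but-busy", hops=1, active_jobs=9, capacity=100.0),
    EdgeServer("far-but-idle", hops=4, active_jobs=0, capacity=100.0),
]
best = select_server(servers, work=1.0, per_hop_delay=0.002)
```

In this example the closest server costs about 0.104 s in total because ten jobs share its processor, while the server four hops away costs about 0.026 s, so the lowest-total-latency policy chooses the farther one.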
Power Analysis on Unrolled Architecture with Points-of-Interest Search and Its Application to PRINCE Block Cipher
Ville YLI-MÄYRY Naofumi HOMMA Takafumi AOKI

PAPER

Vol: E100-A No:1
Page(s): 149-157
This paper explores the feasibility of power analysis attacks against low-latency block ciphers implemented with unrolled architectures capable of encryption/decryption in a single clock cycle. Unrolled architectures have been expected to be somewhat resistant against side-channel attacks compared to typical loop architectures because of no memory (i.e. register) element storing intermediate results in a synchronous manner. In this paper, we present a systematic method for selecting Points-of-Interest for power analysis on unrolled architectures as well as calculating dynamic power consumption at a target function. Then, we apply the proposed method to PRINCE, which is known as one of the most efficient low-latency ciphers, and evaluate its validity with an experiment using a set of unrolled PRINCE processors implemented on an FPGA. Finally, a countermeasure against such analysis is discussed.
Extended Dual Virtual Paths Algorithm Considering the Timing Requirements of IEC61850 Substation Message Types
Seokjoon HONG Ducsun LIM Inwhee JOE

PAPER-Information Network

Publicized: 2016/03/07
Vol: E99-D No:6
Page(s): 1563-1575
The high-availability seamless redundancy (HSR) protocol is a representative protocol that fulfills the reliability requirements of the IEC61850-based substation automation system (SAS). However, it has the drawback of creating unnecessary traffic in a network. To solve this problem, a dual virtual path (DVP) algorithm based on HSR was recently presented. Although this algorithm dramatically reduces network traffic, it does not consider the substation timing requirements of messages in an SAS. To reduce unnecessary network traffic in an HSR ring network, we introduced a novel packet transmission (NPT) algorithm in a previous work that considers IEC61850 message types. To further reduce unnecessary network traffic, we propose an extended dual virtual paths (EDVP) algorithm in this paper that considers the timing requirements of IEC61850 message types. We also include sending delay (SD), delay queue (DQ), and traffic flow latency (TFL) features in our proposal. The source node sends data frames without SDs on the primary paths, and it transmits the duplicate data frames with SDs on the secondary paths. Since the EDVP algorithm discards all of the delayed data frames in DQs when there is no link or node failure, unnecessary network traffic can be reduced. We demonstrate the principle of the EDVP algorithm and its performance in terms of network traffic compared to the standard HSR, NPT, and DVP algorithm using the OPNET network simulator. Throughout the simulation results, the EDVP algorithm shows better traffic performance than the other algorithms, while guaranteeing the timing requirements of IEC61850 message types. Most importantly, when the source node transmits heavy data traffic, the EDVP algorithm shows greater than 80% and 40% network traffic reduction compared to the HSR and DVP approaches, respectively.
Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration
Stewart DENHOLM Hiroaki INOUE Takashi TAKENAKA Tobias BECKER Wayne LUK

PAPER-Application

Publicized: 2014/11/19
Vol: E98-D No:2
Page(s): 288-297
Financial exchanges provide market data feeds to update their members about changes in the market. Feed messages are often used in time-critical automated trading applications, and two identical feeds (A and B feeds) are provided in order to reduce message loss. A key challenge is to support A/B line arbitration efficiently to compensate for missing packets, while offering flexibility for various operational modes such as prioritising for low latency or for high data reliability. This paper presents a reconfigurable acceleration approach for A/B arbitration operating at the network level, capable of supporting any messaging protocol. Two modes of operation are provided simultaneously: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. We also present a model for message feed processing latencies that is useful for evaluating scalability in future applications. We outline a new low latency, high throughput architecture and demonstrate a cycle-accurate testing framework to measure the actual latency of packets within the FPGA. We implement and compare the performance of the NASDAQ TotalView-ITCH, OPRA and ARCA market data feed protocols using a Xilinx Virtex-6 FPGA. For high reliability messages we achieve latencies of 42ns for TotalView-ITCH and 36.75ns for OPRA and ARCA. 6ns and 5.25ns are obtained for low latency messages. The most resource intensive protocol, TotalView-ITCH, is also implemented in a Xilinx Virtex-5 FPGA within a network interface card; it is used to validate our approach with real market data. We offer latencies 10 times lower than an FPGA-based commercial design and 4.1 times lower than the hardware-accelerated IBM PowerEN processor, with throughputs more than double the required 10Gbps line rate.
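The low-latency mode of A/B arbitration can be illustrated in software with a minimal sketch: forward whichever copy of a message arrives first and drop the later duplicate. This is only a conceptual model, not the FPGA design from the paper. It assumes each message carries a sequence number, it does not reorder or window messages as the high-reliability mode would, and its `seen` set grows without bound. The function name `arbitrate_low_latency` is hypothetical.

```python
def arbitrate_low_latency(packets):
    """Yield the first-arriving copy of each message from the A and B feeds.

    `packets` is an iterable of (feed, seq, payload) tuples in arrival order.
    Duplicates of an already forwarded sequence number are dropped. Output
    follows arrival order, so a message recovered late from the other feed
    may come out after higher sequence numbers.
    """
    seen = set()
    for feed, seq, payload in packets:
        if seq in seen:
            continue  # the other feed already delivered this message
        seen.add(seq)
        yield seq, payload


arrivals = [
    ("A", 1, "msg1"),
    ("B", 1, "msg1"),   # duplicate, dropped
    ("B", 2, "msg2"),   # feed A lost this packet; feed B supplies it
    ("A", 3, "msg3"),
    ("B", 3, "msg3"),   # duplicate, dropped
]
forwarded = list(arbitrate_low_latency(arrivals))
```

Here message 2 never arrives on feed A, and the copy from feed B fills the gap without waiting.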
Network Interface Architecture with Scalable Low-Latency Message Receiving Mechanism
Noboru TANABE Atsushi OHTA

PAPER

Vol: E96-D No:12
Page(s): 2536-2544
Most scientists other than computer scientists do not want to spend effort on performance tuning by rewriting their MPI applications. In addition, the number of processing elements available to them is increasing year by year. On large-scale parallel systems, the number of messages accumulated in a message buffer tends to increase in some of their applications. Since searching the message queue in MPI is time-consuming, system-side scalable acceleration is needed for those systems. In this paper, a support function named LHS (Limited-length Head Separation) is proposed. Its performance in searching the message buffer and its hardware cost are evaluated. LHS accelerates message buffer searching by switching the location in which limited-length heads of messages are stored. It exploits effects such as an increased cache hit rate on the host, with partial off-loading to hardware. The searching speed of the message buffer when the order of message reception differs from the receiver's expectation is accelerated 14.3 times with LHS on an FPGA-based network interface card (NIC) named DIMMnet-2. This absolute performance is 38.5 times higher than that of IBM BlueGene/P, although its frequency is 8.5 times lower than that of BlueGene/P. LHS has higher scalability than ALPU in performance per frequency. Since these results are obtained with partially on-loaded linear searching on an old Pentium® 4, the performance gap will increase with a state-of-the-art CPU. Therefore, LHS is more suitable for larger parallel systems. Considerations for applying the proposed method to state-of-the-art processors and systems are also presented.
A Novel and Very Fast 4-2 Compressor for High Speed Arithmetic Operations
Amir FATHI Sarkis AZIZIAN Khayrollah HADIDI Abdollah KHOEI

BRIEF PAPER

Vol: E95-C No:4
Page(s): 710-712
A novel high-speed 4-2 compressor using static and pass-transistor logic has been designed in a 0.35 µm CMOS technology. In order to reduce gate-level delay and increase the speed, some changes are made to the truth table of the conventional 4-2 compressor, which led to the simplification of the logic functions for all parameters. Therefore, power dissipation is decreased. In addition, because the paths from all inputs to the outputs are similar, the delays are the same. So there will be no need for extra buffers in low-latency paths to equalize the delays.
A Retransmission-Enhanced Duty-Cycle MAC Protocol Based on the Channel Quality for Wireless Sensor Networks
Kisuk KWEON Hanjin LEE Hyunsoo YOON

LETTER-Network

Vol: E93-B No:11
Page(s): 3156-3160
Duty-cycle MAC protocols have been proposed for wireless sensor networks (WSNs) to reduce the energy consumed by idle listening, but they introduce significant end-to-end delivery latency. Several works have attempted to mitigate this latency, but they still have problems handling packet loss. The quality of the wireless channel in WSNs is often poor, so packets are frequently lost. In this letter, we present a novel duty-cycle MAC protocol, called REMAC (Retransmission-Enhanced duty-cycle MAC), which exploits both network layer and physical layer information. REMAC estimates the quality of the wireless channel and reserves the wireless channel accordingly to handle packet loss. It can reduce the end-to-end packet delivery latency caused by packet loss without sacrificing energy efficiency. Simulation results show that REMAC outperforms RMAC in terms of end-to-end packet delivery latency.
Concurrent Algorithm and Hardware Implementation for Low-Latency Turbo Decoder Using a Single MAP Decoder
Ya-Cheng LU Erl-Huei LU

PAPER-Fundamental Theories for Communications

Vol: E93-B No:1
Page(s): 1-8
In order to reduce the iterative decoding delay of convolutional turbo codes, this paper presents a concurrent decoding algorithm for the hardware implementation of turbo convolutional decoders. Unlike a general turbo decoder, the hardware turbo decoder based on the proposed algorithm can update the a priori information of the message for each component code in a bit-by-bit manner as soon as it is generated by the other component code. The two component codes in a turbo code can thus be decoded concurrently by using a single MAP decoder, reducing the decoding latency by approximately half while maintaining the bit error rate performance and a hardware complexity comparable to that of a general turbo decoder.
Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News
Toru IMAI Shoei SATO Shinichi HOMMA Kazuo ONOE Akio KOBAYASHI

PAPER-Speech and Hearing

Vol: E90-D No:8
Page(s): 1286-1291
This paper describes a new method to detect speech segments online while identifying gender attributes for efficient dual gender-dependent speech recognition and broadcast news captioning. The proposed online speech detection performs dual-gender phoneme recognition and detects a start-point and an end-point based on the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood with a very small delay from the audio input. While obtaining the speech segments, the phoneme recognizer also identifies gender attributes with high discrimination in order to guide the subsequent dual-gender continuous speech recognizer efficiently. As soon as the start-point is detected, the continuous speech recognizer with parallel gender-dependent acoustic models starts a search and allows search transitions between male and female in a speech segment based on the gender attributes. Speech recognition experiments on conversational commentaries and field reporting from Japanese broadcast news showed that the proposed speech detection method was effective in reducing the false rejection rate from 4.6% to 0.53% and also in reducing recognition errors in comparison with a conventional method using adaptive energy thresholds. It was also effective in identifying the gender attributes, with a correct rate of 99.7% of words. With the new speech detection and the gender identification, the proposed dual-gender speech recognition significantly reduced the word error rate by 11.2% relative to a conventional gender-independent system, while keeping the computational cost feasible for real-time operation.
