Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration

Stewart DENHOLM; Hiroaki INOUE; Takashi TAKENAKA; Tobias BECKER; Wayne LUK

doi:10.1587/transinf.2014RCP0011

IEICE TRANSACTIONS on Information

Summary:

Financial exchanges provide market data feeds to update their members about changes in the market. Feed messages are often used in time-critical automated trading applications, and two identical feeds (A and B feeds) are provided in order to reduce message loss. A key challenge is to support A/B line arbitration efficiently to compensate for missing packets, while offering flexibility for various operational modes such as prioritising for low latency or for high data reliability. This paper presents a reconfigurable acceleration approach for A/B arbitration operating at the network level, capable of supporting any messaging protocol. Two modes of operation are provided simultaneously: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. We also present a model for message feed processing latencies that is useful for evaluating scalability in future applications. We outline a new low latency, high throughput architecture and demonstrate a cycle-accurate testing framework to measure the actual latency of packets within the FPGA. We implement and compare the performance of the NASDAQ TotalView-ITCH, OPRA and ARCA market data feed protocols using a Xilinx Virtex-6 FPGA. For high reliability messages we achieve latencies of 42ns for TotalView-ITCH and 36.75ns for OPRA and ARCA. For low latency messages we obtain latencies of 6ns and 5.25ns. The most resource intensive protocol, TotalView-ITCH, is also implemented in a Xilinx Virtex-5 FPGA within a network interface card; it is used to validate our approach with real market data. We offer latencies 10 times lower than an FPGA-based commercial design and 4.1 times lower than the hardware-accelerated IBM PowerEN processor, with throughputs more than double the required 10Gbps line rate.
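
As a rough illustration of what A/B line arbitration means, the following is a minimal software sketch, not the FPGA architecture of the paper. It assumes each packet carries a sequence number that starts at 1 and increases by one, and it forwards whichever copy of a packet arrives first. The `Packet` tuple, the function name `arbitrate_low_latency` and the sample data are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical packet representation: (feed name, sequence number, payload).
Packet = Tuple[str, int, bytes]


def arbitrate_low_latency(arrivals: Iterable[Packet]) -> Iterator[Packet]:
    """Forward the first copy of each new sequence number, drop the rest.

    Assumes sequence numbers start at 1 and increase by one per packet.
    """
    last_seq = 0
    for feed, seq, payload in arrivals:
        if seq > last_seq:
            last_seq = seq
            yield feed, seq, payload
        # Otherwise the packet is a duplicate (or arrived too late) and is dropped.


# Packets in arrival order: feed A loses packet 2, feed B supplies it.
arrivals = [
    ("A", 1, b"m1"),
    ("B", 1, b"m1"),
    ("B", 2, b"m2"),
    ("A", 3, b"m3"),
    ("B", 3, b"m3"),
]

forwarded = [(feed, seq) for feed, seq, _ in arbitrate_low_latency(arrivals)]
print(forwarded)  # [('A', 1), ('B', 2), ('A', 3)]
```

This sketch never waits for a missing packet, so a gap on both feeds is passed downstream and a late copy is discarded, which is the trade-off a low latency mode accepts. A high reliability mode would instead hold packets back for some window before giving up on a gap; the three windowing methods mentioned in the summary are not specified here, so they are not modelled.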

Publication: IEICE TRANSACTIONS on Information Vol.E98-D No.2 pp.288-297

Publication Date: 2015/02/01

Publicized: 2014/11/19

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2014RCP0011

Type of Manuscript: Special Section PAPER (Special Section on Reconfigurable Systems)

Category: Application

Authors

Stewart DENHOLM
  Imperial College London
Hiroaki INOUE
  NEC Corporation
Takashi TAKENAKA
  NEC Corporation
Tobias BECKER
  Imperial College London
Wayne LUK
  Imperial College London

Keyword

data feed arbitration, acceleration, FPGA, low latency, finance

Cite this

Stewart DENHOLM, Hiroaki INOUE, Takashi TAKENAKA, Tobias BECKER, Wayne LUK, "Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 2, pp. 288-297, February 2015, doi: 10.1587/transinf.2014RCP0011.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2014RCP0011/_p

@ARTICLE{e98-d_2_288,
author={Stewart DENHOLM and Hiroaki INOUE and Takashi TAKENAKA and Tobias BECKER and Wayne LUK},
journal={IEICE TRANSACTIONS on Information},
title={Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration},
year={2015},
volume={E98-D},
number={2},
pages={288-297},
keywords={data feed arbitration, acceleration, FPGA, low latency, finance},
doi={10.1587/transinf.2014RCP0011},
ISSN={1745-1361},
month={February},
}

TY - JOUR
TI - Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration
T2 - IEICE TRANSACTIONS on Information
SP - 288
EP - 297
AU - Stewart DENHOLM
AU - Hiroaki INOUE
AU - Takashi TAKENAKA
AU - Tobias BECKER
AU - Wayne LUK
PY - 2015
DO - 10.1587/transinf.2014RCP0011
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2015
ER -
