This presentation shows how far one can go in tuning a system to measure latency accurately. It shares the lessons learned while measuring latency with the DPDK skeleton application and the i40e PMD.
Various kernel boot options, kernel system settings, and a little-known i40e PMD setting will be explained, along with how they affect latency.
These lessons can be leveraged by the ecosystem to measure the latency of other DPDK applications.
Reshma Pattan is a network software engineer at Intel, Ireland. She has 5+ years of DPDK experience and has mainly contributed to the Power Management, Pdump, Telemetry, Reorder, and Latency libraries and the SoftNIC PMD.
DPDK is the go-to off-the-shelf, stable, and reliable solution for data plane and switching applications globally. It is widely used to accelerate packet processing in various verticals, focusing more on throughput while providing decent latency.
In this presentation, we look at how to use DPDK to provide a network stack solution for ultra-low latency (ULL) applications in the world of algorithmic trading. We examine the out-of-the-box latency performance of DPDK. Next, we show how, through systematic tuning and benchmarking, we were able to reduce round-trip time (RTT) latency. This involved configuring DPDK in scalar mode, pre-allocating mbufs by enabling RX bulk allocation, and using optimized versions of functions by enabling intrinsics. We used an open source FreeBSD network stack on top of DPDK and modified it in a way that favors low latency (burst_size=1, timeout=0). Low latency use cases require that there be no context switches and no data shared between cores, so we used rte_flow to direct packets to specific cores. These optimizations enabled us to process packets at wire speed and reduce latency fivefold compared with the pre-tuning results. For benchmarking at these aggressively low latency levels, we built a testbed from commodity hardware that provides 7-nanosecond timestamp granularity. We replicated the STAC-T1 test, which is a widely accepted latency benchmark in the electronic trading industry.
We also compare the results we achieved with DPDK against those we achieved with OpenOnload TCPDirect, the kernel bypass solution from Solarflare. We conclude with some thoughts on upstream contributions for enabling ULL use cases.
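Latency benchmarks of this kind are commonly summarized as percentiles over many round-trip samples rather than as a single average, because rare slow packets matter in trading workloads. The following is a minimal sketch of such a summary, written in Python for brevity. The function names `percentile` and `summarize` are hypothetical, and the sample values are invented for illustration; they are not results from the talk or from the STAC-T1 test.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(rtt_ns):
    """Summarize round-trip times (in nanoseconds) as min, median, p99, and max."""
    return {
        "min": min(rtt_ns),
        "p50": percentile(rtt_ns, 50),
        "p99": percentile(rtt_ns, 99),
        "max": max(rtt_ns),
    }


# Invented samples (multiples of 7 ns, mimicking a 7 ns timestamp granularity).
rtt_ns = [2100, 2107, 2114, 2100, 2121, 2107, 2100, 2940, 2107, 2114]
summary = summarize(rtt_ns)
print(summary)
```

In this sketch a single slow sample leaves the median almost untouched but dominates the 99th percentile, which is why tail percentiles are worth reporting alongside the median.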
DPDK is a project known for its performance, but are the APIs really the best they could possibly be? In this talk we review the best practices in DPDK datapath APIs (e.g. Ethdev, Rings, Eventdev) and understand how these contribute to the performance of DPDK: there will be lots of diagrams to help visualize things!
Next we explore the hazards in writing high performance code, with a focus on SIMD implementations. This leads to some observations about specific APIs, where DPDK does not enable the highest performing PMDs.
Finally we make suggestions as to how the DPDK APIs could be improved to give a PMD the context of the calling code, and by doing so achieve even higher performance!
Harry van Haaren is a network software engineer optimizing dataplane applications with DPDK and OVS. Interests range from high-performance API design to making every last instruction-per-cycle count towards your computing requirements. Of course functionality without security is nothing...
We introduce a new application that aims to provide easy-to-use and accurate measurement of rte_flow performance and footprint. The application supports most of the matching items and a subset of the actions supported today in DPDK, and can be extended as needed. In the session I’ll demonstrate its usage and discuss its features, such as: 1. Calculating the rte_flow insertion rate. 2. Calculating the rte_flow deletion rate. 3. Calculating the memory consumption of rte_flow. 4. Reporting packet forwarding performance in packets per second.
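The insertion and deletion rates listed above come down to counting operations and dividing by elapsed time. The following is a minimal sketch of that arithmetic, written in Python for brevity. `measure_rate`, `insert_rule`, `delete_rule`, and the `rules` dictionary are hypothetical stand-ins and are not part of DPDK or of the application described here, which would call the rte_flow API where this sketch touches a dictionary.

```python
import time


def measure_rate(operation, count):
    """Run operation(i) for i in range(count) and return operations per second."""
    start = time.perf_counter()
    for i in range(count):
        operation(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")


# Hypothetical rule table standing in for hardware flow rules.
rules = {}


def insert_rule(i):
    """Stand-in for creating one flow rule."""
    rules[i] = ("match", i)


def delete_rule(i):
    """Stand-in for destroying one flow rule."""
    del rules[i]


RULE_COUNT = 100_000  # assumed batch size, chosen only for illustration
insertion_rate = measure_rate(insert_rule, RULE_COUNT)
deletion_rate = measure_rate(delete_rule, RULE_COUNT)
print(f"insertion: {insertion_rate:.0f} rules/s, deletion: {deletion_rate:.0f} rules/s")
```

Timing a whole batch rather than each call keeps the clock overhead small relative to the work being measured.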
Debugging issues in DPDK applications running in production can be troublesome. Core dumps and sufficient logging can provide some insight, but finding the root causes of application issues can be hard. Attaching a debugger to a running application is sometimes unacceptable because of the possible application downtime. rr is a recording debugger, developed by Mozilla Foundation, which allows developers to record a trace of a running application and debug it offline. This talk explores the possibility of using rr to troubleshoot issues with DPDK applications, the steps required to use it in the DPDK ecosystem, and the possible performance impact.
End-user applications are often built with DPDK and other libraries. It becomes cumbersome to maintain well-placed debug and counter logic without affecting performance.
We would like to share an approach that uses eBPF to accommodate debug, counters, and metadata matching in various packet processing stages.
Currently working with Intel India for the Network Custom Silicon Group as a Network Software Engineer. Have a huge passion to share and contribute to DPDK, VPP, OVS, and GitHub projects.
GNU Make is being phased out of the DPDK build system in favor of meson. But there are many open source and custom applications which rely on GNU Make. We would like to discuss our learnings from using the meson build: a. Passing DPDK libraries built with meson to existing libraries built with GNU Make. b. Applications (OVS) making use of the meson build. c. Things to take care of when cross-building applications with DPDK meson libraries.
Currently working with Intel India for the Network Custom Silicon Group as a Network Software Engineer. Have a huge passion to share and contribute to DPDK, VPP, OVS, and GitHub projects.
As more and more packet processing applications need to maintain the connection state, we propose to introduce the SFT DPDK lib and to provide a framework for connection tracking, both for offloaded and lookaside processing.
Examples of such applications: • Security (Suricata) • Virtual switches (OVS) • GTP
QEMU, often used as the hypervisor for virtual machines running in the cloud, can be susceptible to security attacks because it is a large monolithic program. Disaggregated QEMU, which involves separating QEMU services into separate host processes, reduces the attack surface. Disaggregating I/O services is a good place to begin disaggregating QEMU.
VFIO-over-Socket, also known as vfio-user, is a protocol that allows a device to be virtualized in a separate process outside QEMU. It can be the main transport mechanism for multi-process QEMU, and it can be used by other applications offering device virtualization. DPDK will gain vfio-user support by introducing and implementing a vfio-user bus driver. That provides the framework for DPDK applications to offer device virtualization and accommodates QEMU out-of-tree emulated devices in DPDK.
This presentation will cover the following items: 1. Why and how to allow a device to be virtualized outside QEMU 2. Introducing a framework for accommodating emulated/virtualized devices in DPDK 3. Introducing a specific emulated/virtualized device in DPDK 4. Other potential emulated devices in DPDK (optional)
Chenbo is a network virtualization engineer at Intel PRC and a member of the Intel DPDK team. He mainly contributes to the DPDK project, for which he is co-maintainer of the Vhost and Virtio subsystems. He has expertise in I/O virtualization and networking. He had two talks on previous DPDK summit...
vDPA, which stands for Virtio Datapath Acceleration, aims at providing wire-speed and wire-latency L2 open and standard interfaces. The fundamental idea of vDPA is to push the specification-based virtio interface from software to physical NICs for VMs and containers to consume.
After a short introduction to vDPA technology and a high-level presentation of both the DPDK and kernel alternatives, the presenters will provide an update on DPDK's vDPA framework, which was introduced two years ago, and introduce the upcoming vDPA daemon, which aims at managing DPDK vDPA VFs.
Then, they will give an update on the Virtio-user PMD, which is being used in containers to consume both DPDK and kernel vDPA interfaces.
Finally, the presenters will give an overview of the higher-level picture, presenting the work being done with the Kubernetes community to provide vDPA interfaces to containers as Multus secondary interfaces.
Maxime is a principal engineer at Red Hat and a member of its networking team. He mainly contributes to the DPDK project, for which he is co-maintainer of the Vhost and Virtio subsystems, maintainer of the BBDEV subsystem, and a member of the DPDK technical board.
For next-generation firewalls to inspect content, a high-performance QUIC proxy is a must. This led to exploring an alternative to kernel QUIC (~300 Mbps): user-space QUIC based on DPDK (~2 Gbps) per core.
Currently working with Intel India for the Network Custom Silicon Group as a Network Software Engineer. Have a huge passion to share and contribute to DPDK, VPP, OVS, and GitHub projects.
An Open Radio Access Network (O-RAN) is a totally disaggregated approach to deploying mobile fronthaul and mid-haul networks built entirely on cloud-native principles. Under the O-RAN architecture, NICs along with accelerators (such as GPUs and FPGAs) will be placed at the network edge to handle the 5G MAC layer. DPDK is a good framework for implementing such functionality, enabling reception of raw 5G packets for MAC layer processing. In this talk, we will show how we enabled full softwarization of the telco edge (not only 5G) using the different offloads in DPDK that can accelerate 5G packet processing. Specifically, we cover the ability to zero-copy between NIC and accelerator, the use of PTP, advanced flow steering to dispatch control and data packets in hardware, and the use of NIC scheduling mechanisms to transmit a packet at a specific time that fits the radio unit's receive window.