Revisiting network support for RDMA

Title: Revisiting network support for RDMA
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Mittal, R., Shpiner, A., Panda, A., Zahavi, E., Krishnamurthy, A., Ratnasamy, S., & Shenker, S.
Published in: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '18)

The advent of RoCE (RDMA over Converged Ethernet) has led to a significant increase in the use of RDMA in datacenter networks. To achieve good performance, RoCE requires a lossless network, which is in turn achieved by enabling Priority Flow Control (PFC) within the network. However, PFC brings with it a host of problems such as head-of-the-line blocking, congestion spreading, and occasional deadlocks. Rather than seek to fix these issues, we instead ask: is PFC fundamentally required to support RDMA over Ethernet?

We show that the need for PFC is an artifact of current RoCE NIC designs rather than a fundamental requirement. We propose an improved RoCE NIC (IRN) design that makes a few simple changes to the RoCE NIC for better handling of packet losses. We show that IRN (without PFC) outperforms RoCE (with PFC) by 6-83% for typical network scenarios. Thus not only does IRN eliminate the need for PFC, it improves performance in the process! We further show that the changes that IRN introduces can be implemented with modest overheads of about 3-10% to NIC resources. Based on our results, we argue that research and industry should rethink the current trajectory of network support for RDMA.


We would like to thank Amin Tootoonchian, Anirudh Sivaraman, Emmanuel Amaro and Ming Liu for the helpful discussions on some of the implementation-specific aspects of this work, and Brian Hausauer for his detailed feedback on an earlier version of this paper. We are also thankful to Nandita Dukkipati and Amin Vahdat for the useful discussions in the early stages of this work. Finally, we would like to thank our anonymous reviewers for their feedback, which helped us improve the paper, and our shepherd Srinivasan Seshan, who helped shape the final version of this paper. This work was supported in part by a Google PhD Fellowship and by Mellanox, Intel and the National Science Foundation under Grant Nos. 1704941, 1619377 and 1714508.

ICSI Research Group

Networking and Security