The scale and expense of modern data centers motivate running them as efficiently as possible. This paper explores how virtualized data center performance can be improved when network traffic and topology data inform VM placement. Our practical heuristics, tested on network-heavy, scale-out workloads in an 80-server cluster, improve overall performance by up to 70% compared to random placement in a multi-tenant configuration.
@article{mckeown2008openflow,
title={OpenFlow: Enabling Innovation in Campus Networks},
author={McKeown, Nick and Anderson, Tom and Balakrishnan, Hari and Parulkar, Guru and Peterson, Larry and Rexford, Jennifer and Shenker, Scott and Turner, Jonathan},
journal={ACM SIGCOMM Computer Communication Review},
volume={38},
number={2},
pages={69--74},
year={2008}
}
Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, David Walker
ACM SIGCOMM Computer Communication Review, Vol. 44, Issue 3, July 2014
P4 is a high-level language for programming protocol-independent packet processors. P4 works in conjunction with SDN control protocols like OpenFlow. In its current form, OpenFlow explicitly specifies protocol headers on which it operates. This set has grown from 12 to 41 fields in a few years, increasing the complexity of the specification while still not providing the flexibility to add new headers. In this paper we propose P4 as a strawman proposal for how OpenFlow should evolve in the future. We have three goals: (1) Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed. (2) Protocol independence: Switches should not be tied to any specific network protocols. (3) Target independence: Programmers should be able to describe packet-processing functionality independently of the specifics of the underlying hardware. As an example, we describe how to use P4 to configure a switch to add a new hierarchical label.
@article{bosshart2014p4,
title={P4: Programming Protocol-Independent Packet Processors},
author={Bosshart, Pat and Daly, Dan and Gibb, Glen and Izzard, Martin and McKeown, Nick and Rexford, Jennifer and Schlesinger, Cole and Talayco, Dan and Vahdat, Amin and Varghese, George and Walker, David},
journal={ACM SIGCOMM Computer Communication Review},
volume={44},
number={3},
pages={87--95},
year={2014}
}
Mininet is a system for rapidly prototyping large networks on the limited resources of a single computer. It creates virtual networks using lightweight OS-level virtualization, providing hosts, switches, controllers, and links that behave like their hardware counterparts. Mininet enables interactive development, testing, and debugging of Software-Defined Networks, and its networks can be easily transferred to hardware for line-rate operation. We describe Mininet's design and evaluate its fidelity and performance.
@inproceedings{lantz2010network,
title={A Network in a Laptop: Rapid Prototyping for Software-Defined Networks},
author={Lantz, Bob and Heller, Brandon and McKeown, Nick},
booktitle={ACM SIGCOMM Workshop on Hot Topics in Networks (HotNets)},
year={2010}
}
We describe and analyze iSLIP, a low-complexity scheduling algorithm for input-queued switches. iSLIP uses multiple iterations of a parallel round-robin matching algorithm to achieve high throughput and low latency. We prove that for uniform i.i.d. Bernoulli arrivals, iSLIP achieves 100% throughput. Under non-uniform traffic, iSLIP performs well in practice, converging quickly to a stable scheduling pattern. The algorithm is simple enough to implement in hardware at high line rates and has been used in several commercial switch designs.
@article{mckeown1999islip,
title={The iSLIP Scheduling Algorithm for Input-Queued Switches},
author={McKeown, Nick},
journal={IEEE/ACM Transactions on Networking},
volume={7},
number={2},
pages={188--201},
year={1999}
}
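The request-grant-accept round at the heart of iSLIP can be sketched in a few lines of Python. This is an illustrative single-iteration model with hypothetical data structures, not the hardware algorithm; real iSLIP runs multiple iterations per cell time and updates pointers only for grants accepted in the first iteration.

```python
# Sketch of one iSLIP request-grant-accept round (illustrative model).

def islip_iteration(requests, grant_ptr, accept_ptr):
    """requests[i] = set of outputs that input i has cells for.
    grant_ptr[j], accept_ptr[i] are round-robin pointers.
    Returns a matching as a dict {input: output}, updating pointers
    only for accepted grants, as in the paper."""
    n = len(requests)
    # Grant phase: each output grants the requesting input at or after
    # its round-robin pointer, wrapping around.
    grants = {}  # output -> granted input
    for j in range(n):
        requesters = [i for i in range(n) if j in requests[i]]
        if requesters:
            grants[j] = min(requesters, key=lambda i: (i - grant_ptr[j]) % n)
    # Accept phase: each input accepts the granting output at or after
    # its pointer; pointers advance one past the match. This conditional
    # pointer update is what desynchronizes the round-robin schedules.
    matching = {}
    for i in range(n):
        granting = [j for j, g in grants.items() if g == i]
        if granting:
            j = min(granting, key=lambda j: (j - accept_ptr[i]) % n)
            matching[i] = j
            grant_ptr[j] = (i + 1) % n
            accept_ptr[i] = (j + 1) % n
    return matching
```

With two inputs both requesting output 0 and pointers at zero, input 0 wins the first round and the pointers then rotate past it, giving input 1 its turn next.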
In Software Defined Networking (SDN) the control plane is physically separate from the forwarding plane. Control software programs the forwarding plane (e.g., switches and routers) using an open interface, such as OpenFlow. This paper aims to overcome two limitations in current switching chips and the OpenFlow protocol: (i) current hardware switches are quite rigid, allowing "Match-Action" processing on only a fixed set of fields, and (ii) the OpenFlow specification only defines a limited repertoire of packet processing actions. We propose the RMT (Reconfigurable Match Tables) model, a new RISC-inspired pipelined architecture for switching chips, and we identify the essential minimal set of action primitives to specify how headers are processed in hardware. RMT allows the forwarding plane to be changed in the field without modifying hardware. As in OpenFlow, the programmer can specify multiple match tables of arbitrary width and depth, subject only to an overall resource limit, with each table configurable for matching on arbitrary fields. However, RMT allows the programmer to modify all header fields much more comprehensively than in OpenFlow. Our paper describes the design of a 64 port by 10 Gb/s switch chip implementing the RMT model.
@inproceedings{bosshart2013forwarding,
title={Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN},
author={Bosshart, Pat and Gibb, Glen and Kim, Hun-Seok and Varghese, George and McKeown, Nick and Izzard, Martin and Mujica, Fernando and Horowitz, Mark},
booktitle={ACM SIGCOMM},
year={2013}
}
Energy proportionality is a key goal for data center networks. Networks are a shared resource connecting critical IT infrastructure, and the general practice is to keep them always on, even during off-peak hours when they are lightly utilized. ElasticTree is a network-wide energy optimizer that dynamically adjusts the set of active network elements to satisfy changing data center traffic loads. It combines a formal model of the network with traffic information to find the minimum power network subset and to turn off unneeded links and switches. We demonstrate that ElasticTree can save 25-40% of network energy in a data center.
@inproceedings{heller2010elastictree,
title={ElasticTree: Saving Energy in Data Center Networks},
author={Heller, Brandon and Seetharaman, Srini and Mahadevan, Priya and Yiakoumis, Yiannis and Sharma, Puneet and Banerjee, Sujata and McKeown, Nick},
booktitle={USENIX NSDI},
year={2010}
}
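ElasticTree's "minimum power network subset" idea can be illustrated with a toy calculation: keep just enough uplinks active to carry current demand plus headroom, and power down the rest. The function, numbers, and headroom parameter below are hypothetical; the paper uses a formal optimizer over the entire topology and measured traffic matrix.

```python
import math

# Toy sketch of ElasticTree's idea (hypothetical parameters): keep the
# minimum number of uplinks active for the current demand, with headroom.

def active_uplinks(demand_gbps, link_capacity_gbps, total_links, headroom=0.2):
    """Return how many uplinks must stay on to carry `demand_gbps`,
    reserving `headroom` spare capacity; clamped to [1, total_links]."""
    needed = math.ceil(demand_gbps * (1 + headroom) / link_capacity_gbps)
    return min(max(needed, 1), total_links)

# Off-peak, 3 Gb/s of demand over 10 Gb/s links needs only 1 of 4
# uplinks, so the other 3 can be powered down.
```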
This paper presents Ethane, a new network architecture for the enterprise. Ethane allows managers to define a single network-wide fine-grain policy, and then enforces it directly. Ethane couples extremely simple flow-based Ethernet switches with a centralized controller that manages the admittance and routing of flows. While radical, this design is backwards-compatible with existing hosts and switches. We have built and deployed Ethane in our university department, and we describe our experience over a seven-month period.
@inproceedings{casado2007ethane,
title={Ethane: Taking Control of the Enterprise},
author={Casado, Martin and Freedman, Michael J. and Pettit, Justin and Luo, Jianying and McKeown, Nick and Shenker, Scott},
booktitle={ACM SIGCOMM},
year={2007}
}
It is well known that head-of-line (HOL) blocking limits the throughput of an input-queued switch with FIFO queues. Under certain conditions, the throughput can be shown to be limited to approximately 58%. It is also known that if non-FIFO queueing policies are used, the throughput can be increased. However, it has not been previously shown that if a suitable queueing policy and scheduling algorithm are used then it is possible to achieve 100% throughput for all independent arrival processes. In this paper we prove this to be the case using a simple linear programming argument and quadratic Lyapunov function. In particular, we assume that each input maintains a separate FIFO queue for each output and that the switch is scheduled using a maximum weight bipartite matching algorithm.
@inproceedings{mckeown1996achieving,
title={Achieving 100\% Throughput in an Input-Queued Switch},
author={McKeown, Nick and Anantharam, Venkat and Walrand, Jean},
booktitle={IEEE INFOCOM},
year={1996}
}
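The scheduling policy the paper analyzes — serve the maximum weight bipartite matching, weighting each input-output pair by its virtual output queue (VOQ) occupancy — can be sketched by brute force over permutations. This is fine for illustration on tiny switches but nothing like a line-rate implementation.

```python
from itertools import permutations

# Sketch of maximum weight matching over VOQ occupancies (brute force).

def max_weight_matching(q):
    """q[i][j] = cells queued at input i destined for output j.
    Returns (best_weight, matching) where matching[i] is the output
    assigned to input i."""
    n = len(q)
    best_weight, best = -1, None
    for perm in permutations(range(n)):
        w = sum(q[i][perm[i]] for i in range(n))
        if w > best_weight:
            best_weight, best = w, perm
    return best_weight, list(best)
```

For occupancies [[3, 1], [2, 4]], the identity matching wins with weight 7, draining the longest queues first — the property the Lyapunov argument exploits.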
We present pFabric, a minimalistic datacenter transport design that achieves near-optimal flow completion times for short flows while maintaining high throughput for large flows. pFabric has a very simple design: switches implement a priority queue that schedules packets based on a single priority number, and the endpoints implement a minimal rate control mechanism. Despite its simplicity, we show through analysis and simulation that pFabric achieves flow completion times within 10-15% of the optimal for a wide range of workloads.
@inproceedings{alizadeh2013pfabric,
title={pFabric: Minimal Near-Optimal Datacenter Transport},
author={Alizadeh, Mohammad and Yang, Shuang and Sharif, Milad and Katti, Sachin and McKeown, Nick and Prabhakar, Balaji and Shenker, Scott},
booktitle={ACM SIGCOMM},
year={2013}
}
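The switch behavior the abstract describes — a single priority number per packet, highest priority dequeued first — is easy to model. In the sketch below (class and names are illustrative, not from the paper), priority is the flow's remaining size, matching pFabric's shortest-remaining-first flavor.

```python
import heapq

# Toy model of a pFabric switch queue: dequeue strictly by priority
# number (lower = more urgent), FIFO among equal priorities.

class PFabricQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: preserves FIFO within a priority

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]
```

A short flow's packets (small remaining size, hence small priority number) jump ahead of a large flow's backlog, which is how pFabric drives short-flow completion times toward optimal.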
All Internet routers contain buffers to hold packets during times of congestion. Today, the size of these buffers is determined by the rule-of-thumb that each link needs a buffer of size equal to the bandwidth-delay product. The widespread use of this rule has led to backbone routers with millions of packets of buffering. We question whether this much buffering is actually needed. Through a combination of analysis and simulation, we argue that buffers can be much smaller than the bandwidth-delay product, reducing router cost and latency.
@inproceedings{appenzeller2004sizing,
title={Sizing Router Buffers},
author={Appenzeller, Guido and Keslassy, Isaac and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2004}
}
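The paper's central result is quantitative: with n long-lived TCP flows sharing a link, a buffer of (RTT × C) / √n suffices, versus the classic RTT × C rule of thumb. A quick comparison (toy numbers in the comment):

```python
import math

# Buffer sizing: rule of thumb vs. the small-buffer result from the paper.

def rule_of_thumb_bits(rtt_s, capacity_bps):
    """Classic bandwidth-delay-product buffer: RTT x C."""
    return rtt_s * capacity_bps

def small_buffer_bits(rtt_s, capacity_bps, n_flows):
    """Small-buffer result for n desynchronized long-lived flows."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)

# A 10 Gb/s link with 250 ms RTT and 10,000 flows: 2.5e9 bits under the
# rule of thumb vs. 2.5e7 bits -- a 100x reduction in buffering.
```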
P4 has emerged as the de facto standard language for describing how network packets should be processed, and is becoming widely used by network owners, systems developers, researchers and in the classroom. The goal of the work presented here is to make it easier for engineers, researchers and students to learn how to program using P4, and to build prototypes running on real hardware. Our target is the NetFPGA SUME platform, a 4x10 Gb/s PCIe card designed for use in universities for teaching and research. Until now, NetFPGA users have needed to learn an HDL such as Verilog or VHDL, making it off limits to many software developers and students. Therefore, we developed the P4->NetFPGA workflow, allowing developers to describe how packets are to be processed in the high-level P4 language, then compile their P4 programs to run at line rate on the NetFPGA SUME board. The P4->NetFPGA workflow is built upon the Xilinx P4-SDNet compiler and the NetFPGA SUME open source code base. In this paper, we provide an overview of the P4 programming language and describe the P4->NetFPGA workflow. We also describe how the workflow is being used by the P4 community to build research prototypes, and to teach how network systems are built by providing students with hands-on experience working with real hardware.
@inproceedings{ibanez2019p4netfpga,
title={The P4$\rightarrow$NetFPGA Workflow for Line-Rate Packet Processing},
author={Ibanez, Stephen and Brebner, Gordon and McKeown, Nick and Zilberman, Noa},
booktitle={ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)},
year={2019}
}
We present the design and implementation of p4v, a practical tool for verifying data planes described using the P4 programming language. The design of p4v is based on classic verification techniques but adds several key innovations including a novel mechanism for incorporating assumptions about the control plane and domain-specific optimizations which are needed to scale to large programs. We present case studies showing that p4v verifies important properties and finds bugs in real-world programs. We conduct experiments to quantify the scalability of p4v on a wide range of additional examples. We show that with just a few hundred lines of control-plane annotations, p4v is able to verify critical safety properties for switch.p4, a program that implements the functionality of a modern data center switch, in under three minutes.
@inproceedings{liu2018p4v,
title={p4v: Practical Verification for Programmable Data Planes},
author={Liu, Jed and Hallahan, William and Schlesinger, Cole and Sharif, Milad and Lee, Jeongkeun and Soule, Robert and Wang, Han and Cascaval, Calin and McKeown, Nick and Foster, Nate},
booktitle={ACM SIGCOMM},
year={2018}
}
We present AppSwitch, an application-layer approach to load balancing within a software switch. AppSwitch uses system call interposition to transparently redirect connection setup to backend servers, without modifying applications or the network stack. It supports application-aware routing decisions and integrates with container orchestration platforms. AppSwitch achieves performance comparable to kernel-level load balancers while providing greater flexibility and simpler deployment.
@inproceedings{cidon2017appswitch,
title={AppSwitch: Application-layer Load Balancing within a Software Switch},
author={Cidon, Eyal and Choi, Sean and Katti, Sachin and McKeown, Nick},
booktitle={APNet},
year={2017}
}
Many algorithms for congestion control, scheduling, network measurement, active queue management, and traffic engineering require custom processing of packets in the data plane of a network switch. To run at line rate, these data-plane algorithms must be implemented in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built. This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on emerging programmable line-rate switching chips. The key challenge is that many data-plane algorithms create and modify algorithmic state. To achieve line-rate programmability for stateful algorithms, we introduce the notion of a packet transaction: a sequential packet-processing code block that is atomic and isolated from other such code blocks. We have developed this idea in Domino, a C-like imperative language to express data-plane algorithms. We show with many examples that Domino provides a convenient way to express sophisticated data-plane algorithms, and show that these algorithms can be run at line rate with modest estimated chip-area overhead.
@inproceedings{sivaraman2016packet,
title={Packet Transactions: High-Level Programming for Line-Rate Switches},
author={Sivaraman, Anirudh and Cheung, Alvin and Budiu, Mihai and Kim, Changhoon and Alizadeh, Mohammad and Balakrishnan, Hari and Varghese, George and McKeown, Nick and Licking, Steve},
booktitle={ACM SIGCOMM},
year={2016}
}
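The packet-transaction abstraction can be modeled in ordinary Python: a sequential code block over mutable switch state that runs to completion per packet, with no interleaving from other packets. The flowlet-style counter below is a simplified illustration with made-up state and thresholds, not code from the paper; Domino's contribution is compiling such blocks onto pipeline stages at line rate, which this model ignores.

```python
# Model of a packet transaction: atomic, isolated, per-packet execution
# over switch state (illustrative flowlet-style example).

state = {"count": {}, "last_seen": {}}
FLOWLET_GAP = 5  # time units between packets that starts a new flowlet

def transaction(pkt, now):
    """Runs to completion per packet: reads and writes shared state with
    no interleaving -- the isolation a packet transaction guarantees."""
    flow = pkt["flow"]
    # A large inter-packet gap starts a new flowlet: reset its counter.
    if now - state["last_seen"].get(flow, -FLOWLET_GAP - 1) > FLOWLET_GAP:
        state["count"][flow] = 0
    state["count"][flow] = state["count"].get(flow, 0) + 1
    state["last_seen"][flow] = now
    return state["count"][flow]
```

Because the read-modify-write on `count` and `last_seen` is atomic per packet, the programmer never reasons about two packets racing on the same state — the property the compiler must then preserve in hardware.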
Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown
Switches today provide a small menu of scheduling algorithms. While we can tweak scheduling parameters, we cannot modify algorithmic logic, or add a completely new algorithm, after the switch has been designed. This paper presents a design for a {\em programmable} packet scheduler, which allows scheduling algorithms---potentially algorithms that are unknown today---to be programmed into a switch without requiring hardware redesign. Our design uses the property that scheduling algorithms make two decisions: in what order to schedule packets and when to schedule them. Further, we observe that in many scheduling algorithms, definitive decisions on these two questions can be made when packets are enqueued. We use these observations to build a programmable scheduler using a single abstraction: the push-in first-out queue (PIFO), a priority queue that maintains the scheduling order or time. We show that a PIFO-based scheduler lets us program a wide variety of scheduling algorithms. We present a hardware design for this scheduler for a 64-port 10 Gbit/s shared-memory (output-queued) switch. Our design costs an additional 4% in chip area. In return, it lets us program many sophisticated algorithms, such as a 5-level hierarchical scheduler with programmable decisions at each level.
@inproceedings{sivaraman2016programmable,
title={Programmable Packet Scheduling at Line Rate},
author={Sivaraman, Anirudh and Subramanian, Suvinay and Alizadeh, Mohammad and Chole, Sharad and Chuang, Shang-Tse and Agrawal, Anurag and Balakrishnan, Hari and Edsall, Tom and Katti, Sachin and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2016}
}
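The PIFO abstraction itself is small enough to model directly: packets are pushed in at a position determined by a programmable rank and always popped from the head, so a scheduling algorithm reduces to a rank computation at enqueue time. A minimal sketch (class name and structure are illustrative; the paper's contribution is a hardware design for this):

```python
# Minimal model of a push-in first-out queue (PIFO).

class PIFO:
    def __init__(self):
        self._q = []  # head at index 0, kept sorted by rank

    def push(self, rank, pkt):
        # Insert after all entries with rank <= this one, so packets of
        # equal rank stay in FIFO order.
        i = len(self._q)
        while i > 0 and self._q[i - 1][0] > rank:
            i -= 1
        self._q.insert(i, (rank, pkt))

    def pop(self):
        return self._q.pop(0)[1]
```

Programming, say, strict priorities means ranking by priority class; weighted fair queueing means ranking by virtual finish time — the queue itself never changes.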
We present PISCES, a protocol-independent software switch derived from Open vSwitch. Like a hardware P4 switch, PISCES can be configured to parse and process any set of protocols using a P4 program, but it runs entirely in software. By automatically generating the switch's parser and match-action pipeline from a P4 specification, PISCES eliminates the need to manually modify switch source code when adding new protocols. We show that PISCES achieves competitive performance with Open vSwitch while offering significantly greater flexibility.
@inproceedings{shahbaz2016pisces,
title={PISCES: A Programmable, Protocol-Independent Software Switch},
author={Shahbaz, Muhammad and Choi, Sean and Pfaff, Ben and Kim, Changhoon and Feamster, Nick and McKeown, Nick and Rexford, Jennifer},
booktitle={ACM SIGCOMM},
year={2016}
}
Anirudh Sivaraman, Suvinay Subramanian, Anurag Agrawal, Sharad Chole, Shang-Tse Chuang, Tom Edsall, Mohammad Alizadeh, Sachin Katti, Nick McKeown, Hari Balakrishnan
We present a compiler that translates programs written in a high-level packet-processing language to run on reconfigurable switch hardware. The compiler maps packet-processing pipelines onto the available hardware resources of reconfigurable match-action table architectures, handling the constraints of limited table memory, match types, and pipeline depth. Our compiler enables network operators to rapidly deploy new protocols and features on existing hardware, without waiting for vendors to add support in firmware.
@inproceedings{sivaraman2015towards,
title={Towards Programmable Packet Scheduling},
author={Sivaraman, Anirudh and Subramanian, Suvinay and Agrawal, Anurag and Chole, Sharad and Chuang, Shang-Tse and Edsall, Tom and Alizadeh, Mohammad and Katti, Sachin and McKeown, Nick and Balakrishnan, Hari},
booktitle={ACM HotNets},
year={2015}
}
We present a compiler that maps packet-processing programs to the pipeline architecture of reconfigurable match-action switches. These switches have a fixed number of match-action stages with limited resources, and the compiler must map a logical program to these physical constraints. We study the interplay between the expressiveness of the programming model and the efficiency of the resulting pipeline, and show that our compiler can successfully map a variety of packet-processing programs to existing hardware targets.
@inproceedings{jose2015compiling,
title={Compiling Packet Programs to Reconfigurable Switches},
author={Jose, Lavanya and Yan, Lisa and Varghese, George and McKeown, Nick},
booktitle={USENIX NSDI},
year={2015}
}
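The core placement problem this compiler solves can be sketched as greedy levelization: logical match-action tables form a dependency DAG, and each table must land in a pipeline stage no earlier than the stages of the tables it depends on, subject to per-stage limits. The sketch below (table names and the single "tables per stage" limit are hypothetical) is a simplification of the paper's techniques, which also model memory types, widths, and sizes.

```python
# Greedy assignment of dependent match-action tables to pipeline stages.

def assign_stages(tables, deps, tables_per_stage):
    """deps[t] = set of tables that must sit in a strictly earlier stage
    than t. Returns {table: stage_index}; raises on a dependency cycle."""
    stage_of = {}
    remaining = list(tables)
    load = {}  # stage index -> number of tables placed there
    while remaining:
        progress = False
        for t in list(remaining):
            if all(d in stage_of for d in deps.get(t, ())):
                earliest = 1 + max(
                    (stage_of[d] for d in deps.get(t, ())), default=-1)
                s = earliest
                while load.get(s, 0) >= tables_per_stage:
                    s += 1  # spill to a later stage when this one is full
                stage_of[t] = s
                load[s] = load.get(s, 0) + 1
                remaining.remove(t)
                progress = True
        if not progress:
            raise ValueError("cyclic table dependencies")
    return stage_of
```

With one table per stage and both `acl` and `nexthop` depending on `ipv4`, the resource limit forces one of the two independent tables into a later stage — exactly the expressiveness-versus-efficiency interplay the paper studies.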
This article provides a Chinese-language introduction to the P4 programming language for programming network data planes. P4 enables network engineers to define how packets are processed by programmable switches, routers, and NICs. The article covers key P4 concepts including header definitions, parser specifications, match-action tables, and control flow.
@article{kim2015programming,
title={Programming the Network Dataplane in P4},
author={Kim, Changhoon and Kao, Ron and McKeown, Nick},
journal={Communications of the CCF},
year={2015}
}
It became clear in the early 2000s that the Internet faced challenges with ossification, slow innovation, and complex network management due to vendor-controlled hardware and software. Software-defined networking (SDN) emerged as a transformative solution, introducing an open interface for packet forwarding and logically centralized control. Its success stemmed from a virtuous cycle between early U.S. National Science Foundation (NSF)-funded academic research (for example, 100x100, GENI, FIND) and the pressing needs of cloud hyperscalers for flexible, scalable networks. SDN revolutionized network design and operation across public and private sectors, driving significant commercial adoption, fostering a broad research community, and enabling new capabilities like multi-tenant virtualization and efficient wide-area traffic engineering. The NSF investments of the past two decades were instrumental in this transformation.
@article{mckeown2025nsf,
title={How the U.S. National Science Foundation Enabled Software-Defined Networking},
author={McKeown, Nick and Rexford, Jennifer},
journal={Communications of the ACM},
volume={68},
number={12},
year={2025}
}
Revitalizing the Public Internet by Making It Extensible
Hari Balakrishnan, Sujata Banerjee, Israel Cidon, David Culler, Deborah Estrin, Ethan Katz-Bassett, Arvind Krishnamurthy, James McCauley, Nick McKeown, Aurojit Panda, Sylvia Ratnasamy, Jennifer Rexford, Michael Schapira, Scott Shenker, Ion Stoica, David Tennenhouse, Amin Vahdat, Ellen Zegura
ACM SIGCOMM Computer Communication Review, Vol. 51, No. 2, 2021
There is now a significant and growing functional gap between the public Internet, whose basic architecture has remained unchanged for several decades, and a new generation of more sophisticated private networks. To address this increasing divergence of functionality and overcome the Internet's architectural stagnation, we argue for the creation of an Extensible Internet (EI) that supports in-network services that go beyond best-effort packet delivery. To gain experience with this approach, we hope to soon deploy both an experimental version (for researchers) and a prototype version (for early adopters) of EI. In the longer term, making the Internet extensible will require a community to initiate and oversee the effort; this paper is the first step in creating such a community.
@article{balakrishnan2021revitalizing,
title={Revitalizing the Public Internet by Making It Extensible},
author={Balakrishnan, Hari and Banerjee, Sujata and Cidon, Israel and Culler, David and Estrin, Deborah and Katz-Bassett, Ethan and Krishnamurthy, Arvind and McCauley, James and McKeown, Nick and Panda, Aurojit and Ratnasamy, Sylvia and Rexford, Jennifer and Schapira, Michael and Shenker, Scott and Stoica, Ion and Tennenhouse, David and Vahdat, Amin and Zegura, Ellen},
journal={ACM SIGCOMM Computer Communication Review},
volume={51},
number={2},
year={2021}
}
Using Deep Programmability to Put Network Owners in Control
Nate Foster, Nick McKeown, Jennifer Rexford, Guru Parulkar, Larry Peterson, Oguz Sunay
ACM SIGCOMM Computer Communication Review, Vol. 50, No. 4, 2020
Controlling an opaque system by reading some "dials" and setting some "knobs," without really knowing what they do, is a hazardous and fruitless endeavor, particularly at scale. What we need are transparent networks, that start at the top with a high-level intent and map all the way down, through the control plane to the data plane. If we can specify the behavior we want in software, then we can check that the system behaves as we expect. This is impossible if the implementation is opaque. We therefore need to use open-source software or write it ourselves (or both), and have mechanisms for checking actual behavior against the specified intent. With fine-grain checking (e.g., every packet, every state variable), we can build networks that are more reliable, secure, and performant. In the limit, we can build networks that run autonomously under verifiable, closed-loop control. We believe this vision, while ambitious, is finally within our reach, due to deep programmability across the stack, both vertically (control and data plane) and horizontally (end to end). It will emerge naturally in some networks, as network owners take control of their software and engage in open-source efforts; whereas in enterprise networks it may take longer. In 5G access networks, there is a pressing need for our community to engage, so these networks, too, can operate autonomously under verifiable, closed-loop control.
@article{foster2020deep,
title={Using Deep Programmability to Put Network Owners in Control},
author={Foster, Nate and McKeown, Nick and Rexford, Jennifer and Parulkar, Guru and Peterson, Larry and Sunay, Oguz},
journal={ACM SIGCOMM Computer Communication Review},
volume={50},
number={4},
year={2020}
}
Democratizing the Network Edge
Larry Peterson, Tom Anderson, Sachin Katti, Nick McKeown, Guru Parulkar, Jennifer Rexford, Mahadev Satyanarayanan, Oguz Sunay, Amin Vahdat
ACM SIGCOMM Computer Communication Review, Vol. 49, No. 2, 2019
The cloud and telecommunications industry is in the midst of a transition towards the edge. There is a tremendous opportunity for the research community to influence this transformation, but doing so requires understanding industry momentum, and making a concerted effort to align with that momentum. We believe there are three keys to doing this: (1) focus on the intersection of the cloud and access networks, (2) contribute to the relevant open source projects, and (3) address the challenge of operationalizing the results. The paper puts forward a concrete proposal for all three, and discusses the opportunity to influence how the Internet evolves at the edge and enable innovative edge applications.
@article{peterson2019democratizing,
title={Democratizing the Network Edge},
author={Peterson, Larry and Anderson, Tom and Katti, Sachin and McKeown, Nick and Parulkar, Guru and Rexford, Jennifer and Satyanarayanan, Mahadev and Sunay, Oguz and Vahdat, Amin},
journal={ACM SIGCOMM Computer Communication Review},
volume={49},
number={2},
year={2019}
}
From Ethane to SDN and Beyond
Martin Casado, Nick McKeown, Scott Shenker
ACM SIGCOMM Computer Communication Review, Vol. 49, No. 5, 2019
We briefly describe the history behind the Ethane paper and its ultimate evolution into SDN and beyond.
@article{casado2019ethane,
title={From Ethane to SDN and Beyond},
author={Casado, Martin and McKeown, Nick and Shenker, Scott},
journal={ACM SIGCOMM Computer Communication Review},
volume={49},
number={5},
year={2019}
}
Event-Driven Packet Processing
Stephen Ibanez, Gianni Antichi, Gordon Brebner, Nick McKeown
The rise of programmable network devices and the P4 programming language has sparked an interest in developing new applications for packet processing data planes. Current data-plane programming models allow developers to express packet processing on a synchronous packet-by-packet basis, motivated by the goal of line rate processing in feed-forward pipelines. But some important data-plane operations do not naturally fit into this programming model. Sometimes we want to perform periodic tasks, or update the same state variables multiple times, or base a decision on state sitting at a different pipeline stage. While a P4-programmable device might contain special features to handle these tasks, such as packet generators and recirculation paths, there is currently no clean and consistent way to expose them to P4 programmers. We therefore propose a common, general way to express event processing using the P4 language, beyond just processing packet arrival and departure events. We believe that this more general notion of event processing can be supported without sacrificing line rate packet processing and we have developed a prototype event-driven architecture on the NetFPGA SUME platform to serve as an initial proof of concept.
@inproceedings{ibanez2019event,
title={Event-Driven Packet Processing},
author={Ibanez, Stephen and Antichi, Gianni and Brebner, Gordon and McKeown, Nick},
booktitle={ACM HotNets},
year={2019}
}
For the past two decades, the communication channel between the NIC and CPU has largely remained the same---issuing memory requests across a slow PCIe peripheral interconnect. Today, with application service times and network fabric delays measuring hundreds of nanoseconds, the NIC--CPU interface can account for most of the overhead when programming modern warehouse-scale computers. In this paper, we tackle this issue head-on by proposing a design for a fast path between the NIC and CPU, called Lightning NIC (L-NIC), which deviates from the established norms of offloading computation onto the NIC (inflating latency), or using centralized dispatcher cores for packet scheduling (limiting throughput). L-NIC adds support for a fast path from the network to the core of the CPU by writing and reading packets directly to/from the CPU register file. This approach minimizes network IO latency, providing significant performance improvements over traditional NIC--CPU interfaces.
@inproceedings{ibanez2019case,
title={The Case for a Network Fast Path to the CPU},
author={Ibanez, Stephen and Shahbaz, Muhammad and McKeown, Nick},
booktitle={ACM HotNets},
year={2019}
}
Dense WiFi deployments, where many access points operate in close proximity, suffer from poor channel selection, coverage gaps, and unsatisfying user experience. BeHop is a testbed that enables researchers to experiment with new approaches to managing dense WiFi networks. Built on commodity hardware, BeHop provides a programmable platform for experimenting with channel assignment, power control, and association policies in a realistic multi-AP environment.
@inproceedings{yiakoumis2014behop,
title={BeHop: A Testbed for Dense WiFi Networks},
author={Yiakoumis, Yiannis and Bansal, Manu and Covington, Adam and van Reijendam, Johan and Katti, Sachin and McKeown, Nick},
booktitle={WiNTECH},
year={2014}
}
Despite network monitoring and testing being critical for computer networks, current solutions are both extremely expensive and inflexible. Into this lacuna we launch the Open Source Network Tester, a fully open source traffic generator and capture system. Our prototype implementation on the NetFPGA-10G supports 4 ×
@article{antichi2014osnt,
title={OSNT: Open Source Network Tester},
author={Antichi, Gianni and Shahbaz, Muhammad and Geng, Yilong and Zilberman, Noa and Covington, Adam and Bruyere, Marc and McKeown, Nick and Feamster, Nick},
journal={IEEE Network},
year={2014}
}
Packet classification on general purpose CPUs remains expensive regardless of advances in classification algorithms. Unless the packet forwarding pipeline is both simple and static in function, fine-tuning the system for optimal forwarding is a time-consuming and brittle process. Network virtualization and network function virtualization value general purpose CPUs exactly for their flexibility: in such systems, a single x86 forwarding element does not implement a single, static classification step but a sequence of dynamically reconfigurable and potentially complex forwarding operations. This leaves a software developer looking for maximal packet forwarding throughput with few options besides flow caching. In this paper, we consider the problem of flow caching and more specifically the challenge of flow caching for packet header fields with high entropy.
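The flow-caching idea discussed above can be illustrated with a toy cache (our sketch, not the paper's system): run the expensive classifier once, then cache its verdict keyed only on the header fields the classifier actually consulted, so packets that differ only in high-entropy fields it ignored (e.g. source port) still hit the cache. All names here are illustrative.

```python
# Minimal flow-cache sketch (illustrative, not the paper's implementation).
def classify_slow(pkt):
    """Stand-in for a full classification pipeline; reports the fields it read."""
    fields_read = {"dst_ip"}  # this toy classifier only looks at dst_ip
    action = "fwd:1" if pkt["dst_ip"].startswith("10.") else "drop"
    return action, fields_read

cache = {}  # tuple of (field, value) pairs -> cached action

def lookup(pkt):
    # Fast path: reuse a cached verdict if the packet agrees on every
    # field the original classification actually depended on.
    for key, action in cache.items():
        if all(pkt.get(f) == v for f, v in key):
            return action
    # Slow path: classify, then cache on only the fields that were read,
    # so high-entropy fields the classifier ignored don't fragment the cache.
    action, fields_read = classify_slow(pkt)
    cache[tuple(sorted((f, pkt[f]) for f in fields_read))] = action
    return action
```

Because the cache key omits the ignored source port, the two packets below share one cache entry despite differing in a high-entropy field.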
@inproceedings{shelly2014flow,
title={Flow Caching for High Entropy Packet Fields},
author={Shelly, Nick and Jackson, Ethan J. and Koponen, Teemu and McKeown, Nick and Rajahalme, Jarno},
booktitle={HotSDN},
year={2014}
}
Since the publication of the original OpenFlow paper in 2008, software-defined networking (SDN) and OpenFlow have evolved from a research experiment into a widely-deployed production technology. We describe the key technical and operational lessons learned through deployment, including the evolution of the OpenFlow specification, the development of production SDN controllers, and the challenges of migrating from research prototypes to production systems.
@article{kobayashi2014maturing,
title={Maturing of OpenFlow and Software-Defined Networking through Deployments},
author={Kobayashi, Masayoshi and Seetharaman, Srini and Parulkar, Guru and Appenzeller, Guido and Little, Joseph and van Reijendam, Johan and Weissmann, Paul and McKeown, Nick},
journal={Computer Networks},
volume={61},
year={2014}
}
The Internet core today is completely based on IP routers. Circuits are only used to provide static point-to-point optical links between routers. We argue that circuits should play a bigger role in the core, because they are simpler, use less power, have lower latency, and are inherently more reliable than routers. In this paper we describe how a combination of circuit and packet switching could lead to a simpler, lower cost, and more reliable Internet backbone.
@article{das2013rethinking,
title={Rethinking IP Core Networks},
author={Das, Saurav and Parulkar, Guru and McKeown, Nick},
journal={Journal of Optical Communications and Networking},
year={2013}
}
All network devices must parse packet headers to decide how packets should be processed. A 64 × 10Gb/s Ethernet switch must parse one billion packets per second to extract fields used in forwarding decisions. Although a necessary part of all switch hardware, very little has been written on parser design and the trade-offs between different designs. Is it better to design one fast parser, or several slow parsers? What is the cost of making the parser reconfigurable in the field? What design decisions most impact power and area? In this paper, we describe trade-offs in parser design, identify design principles for switch and router designers, and describe a parser generator that outputs synthesizable Verilog and is available for download. We show that (i) packet parsers today occupy about 1-2% of the chip, and (ii) while future packet parsers will need to be programmable, this only doubles the (already small) area needed.
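As a rough software illustration of the parsing task the paper studies in hardware, a fixed Ethernet/IPv4 parse graph might be walked as below. The field offsets are the standard ones; the code itself is our sketch, not the paper's generator.

```python
# Walk a tiny fixed parse graph: Ethernet -> (if ethertype 0x0800) IPv4.
def parse(pkt: bytes):
    """Extract the header fields a forwarding decision might use."""
    fields = {}
    ethertype = int.from_bytes(pkt[12:14], "big")  # bytes 12-13 of Ethernet
    fields["eth.type"] = ethertype
    if ethertype == 0x0800 and len(pkt) >= 34:     # IPv4 follows
        ihl = pkt[14] & 0x0F                       # header length, 32-bit words
        fields["ipv4.proto"] = pkt[23]             # protocol byte
        fields["ipv4.dst"] = pkt[30:34].hex()      # destination address
        fields["l4.offset"] = 14 + ihl * 4         # where L4 parsing would resume
    return fields
```

A hardware parser makes the same graph walk, but the design questions above (one fast parser vs. several slow ones, fixed vs. programmable) concern how this state machine is laid out in silicon.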
@inproceedings{gibb2013design,
title={Design Principles for Packet Parsers},
author={Gibb, Glen and Varghese, George and Horowitz, Mark and McKeown, Nick},
booktitle={ACM/IEEE ANCS},
year={2013}
}
Network architectures such as Software-Defined Networks (SDNs) move the control logic off packet processing devices and onto external controllers. These network architectures with decoupled control planes open many unanswered questions regarding reliability, scalability, and performance when compared to more traditional purely distributed systems. This paper opens the investigation by focusing on two specific questions: given a topology, how many controllers are needed, and where should they go? To answer these questions, we examine fundamental limits to control plane propagation latency on an upcoming Internet2 production deployment, then expand our scope to over 100 publicly available WAN topologies. As expected, the answers depend on the topology. More surprisingly, one controller location is often sufficient to meet existing reaction-time requirements (though certainly not fault tolerance requirements).
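The single-controller version of the placement question above can be sketched in a few lines: pick the node that minimizes the worst-case shortest-path latency to every switch. The graph and latencies below are a toy example, not one of the paper's topologies.

```python
import heapq

def shortest_paths(graph, src):
    """Dijkstra over graph: node -> {neighbor: link_latency}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def best_single_controller(graph):
    """Return (node, latency) minimizing the worst-case latency to any switch."""
    best = None
    for cand in graph:
        worst = max(shortest_paths(graph, cand)[n] for n in graph)
        if best is None or worst < best[1]:
            best = (cand, worst)
    return best

# Toy 4-node WAN; per-link latencies in ms.
wan = {
    "A": {"B": 5, "C": 10},
    "B": {"A": 5, "C": 4, "D": 8},
    "C": {"A": 10, "B": 4, "D": 3},
    "D": {"B": 8, "C": 3},
}
```

Swapping `max` for an average gives the average-latency variant; the paper evaluates both objectives across real WAN topologies.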
@inproceedings{heller2012controller,
title={The Controller Placement Problem},
author={Heller, Brandon and Sherwood, Rob and McKeown, Nick},
booktitle={HotSDN},
year={2012}
}
This paper presents an architecture for adding functionality to networks via outsourcing. In this model, the enterprise network only forwards data; any additional processing is performed by external Feature Providers (FPs). FPs provide and manage features, scaling and moving them in response to customer demand, and providing automated recovery in case of failure. Benefits to the enterprise include reduced cost and management complexity, improved features through FP specialization, and increased choice in services. Central to the model are a policy component and a Feature API (FAPI). Policy is specified with features not locations, enabling features to be located anywhere. FAPI enables communication between enterprise and FP control planes to share policy and configuration.
@inproceedings{gibb2012outsourcing,
title={Outsourcing Network Functionality},
author={Gibb, Glen and Zeng, Hongyi and McKeown, Nick},
booktitle={HotSDN},
year={2012}
}
@inproceedings{das2012why,
title={Why OpenFlow/SDN Can Succeed Where GMPLS Failed},
author={Das, Saurav and Parulkar, Guru and McKeown, Nick},
booktitle={ECOC},
year={2012}
}
We demonstrate MPLS Traffic Engineering (MPLS-TE) and MPLS-based Virtual Private Networks (MPLS VPNs) using OpenFlow and NOX. The demonstration is the outcome of an engineering experiment to answer the following questions: How hard is it to implement a complex control plane on top of a network controller such as NOX? Does the global vantage point in NOX make the implementation easier than the traditional method of implementing it on every switch, embedded in the data plane? We implemented every major feature of MPLS-TE and MPLS-VPN in just 2,000 lines of code, far less than more traditional implementations such as Quagga-MPLS require. Because NOX maintains a consistent, up-to-date topology map, the MPLS control plane features are no longer tied to multiple protocols that would normally have to be changed.
@inproceedings{sharafat2011mpls,
title={MPLS-TE and MPLS VPNS with OpenFlow},
author={Sharafat, Ali Reza and Das, Saurav and Parulkar, Guru and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2011}
}
Despite the popularity of home networks, they face a number of systemic problems: (i) Broadband networks are expensive to deploy, and it is not clear how the cost can be shared by several service providers; (ii) Home networks are getting harder to manage as we connect more devices, use new applications, and rely on them for entertainment, communication and work—it is common for home networks to be poorly managed, insecure or just plain broken; and (iii) It is not clear how home networks will steadily improve, after they have been deployed, to provide steadily better service to home users. In this paper we propose slicing home networks as a way to overcome these problems. As a mechanism, slicing allows multiple service providers to share a common infrastructure and supports the independent evolution of services.
@inproceedings{yiakoumis2011slicing,
title={Slicing Home Networks},
author={Yiakoumis, Yiannis and Yap, Kok-Kiong and Katti, Sachin and Parulkar, Guru and McKeown, Nick},
booktitle={SIGCOMM Workshop on Home Networks},
year={2011}
}
We demonstrate a converged OpenFlow-enabled packet-circuit network, where circuit flow properties (guaranteed bandwidth, low latency, recovery) provide differential treatment to dynamically aggregated packet flows for voice, video, and web traffic.
@inproceedings{das2011applicationaware,
title = {Application-Aware Aggregation and Traffic Engineering in a Converged Packet-Circuit Network},
author = {Saurav Das and Yiannis Yiakoumis and Guru M. Parulkar and Preeti Singh and Daniel Getachew and Premal Dinesh Desai and Nick McKeown},
booktitle = {OFC/NFOEC},
year = {2011}
}
We propose a new approach to MPLS that keeps the standard MPLS data plane but pairs it with a simpler, extensible OpenFlow-based control plane. We demonstrate this approach using a prototype system for MPLS Traffic Engineering.
@inproceedings{das2011mpls,
title = {MPLS with a Simple OPEN Control Plane},
author = {Saurav Das and Ali Reza Sharafat and Guru M. Parulkar and Nick McKeown},
booktitle = {OFC/NFOEC},
year = {2011}
}
Network operators want additional functionality from the networks they manage. The current approach to adding functionality is to deploy middleboxes. Unfortunately, middleboxes raise concerns regarding robustness, correctness, and efficiency due to their need to be deployed at chokepoints. This paper provides some initial thoughts on solving the middlebox problem in an architectural way. We believe that waypoint services are the correct way to add functionality to a network. Network processing can be modeled as classification followed by action. Additional functionality should be added to the network through a service model exposed as new actions. Services would be implemented at waypoints which reside off the normal packet path; routers can send traffic to those services for additional processing. The waypoint service model allows services to be hosted anywhere within the network, allows services to be shared by multiple routers, and is accessible via a simple action API. Abstracting custom packet processing as waypoint services provides a systematic way to bring new functionality to the network.
@inproceedings{gibb2011initial,
title = {Initial thoughts on custom network processing via waypoint services},
author = {Glen Gibb and Hongyi Zeng and Nick McKeown},
booktitle = {Workshop on Infrastructures for Software/Hardware Co-Design (WISH), CGO},
year = {2011}
}
@article{ghodsi2011architecting,
title={Architecting for Innovation},
author={Ghodsi, Ali and Godfrey, P. Brighten and McKeown, Nick and Parulkar, Guru and Raghavan, Barath},
journal={ACM Computer Communication Review},
year={2011}
}
Distributing a hardware design across multiple physical devices is difficult: splitting a design across two chips requires considerable effort to partition the design and to build the communication mechanism between the chips. Designers and researchers would benefit enormously if this were easier, as it would, for example, allow multiple FPGAs to be used when building prototypes. To this end we propose OpenPipes, a platform that allows hardware designs to be distributed across physical resources. OpenPipes follows the model of many system-building platforms: systems are built by composing modules together. What makes it unique is that it uses an OpenFlow network as the interconnect between modules, providing OpenPipes with complete control over all traffic flows within the interconnect. Any device that can attach to the network can host modules, allowing software modules to be used alongside hardware modules. The control provided by OpenFlow allows running systems to be modified dynamically, and as we show in the paper, OpenPipes provides a mechanism for migrating from software to hardware modules that simplifies testing.
@inproceedings{gibb2010openpipes,
title = {OpenPipes: Making Distributed Hardware Systems Easier},
author = {Glen Gibb and Nick McKeown},
booktitle = {International Conference on Field-Programmable Technology (FPT)},
pages = {381--384},
year = {2010}
}
A persistent problem in computer network research is validation. When deciding how to evaluate a new feature or bug fix, researchers face an uncomfortable trade-off between the realism of the test environment and the effort required to deploy it. We argue that the production network itself should serve as the testbed. We describe FlowVisor, a special purpose OpenFlow controller that creates rich slices of network resources and delegates control of each slice to a different controller. This allows multiple isolated experimental networks to run simultaneously on the same production hardware, enabling realistic testing without disrupting production traffic.
@inproceedings{sherwood2010can,
title={Can the Production Network Be the Testbed?},
author={Sherwood, Rob and Gibb, Glen and Yap, Kok-Kiong and Appenzeller, Guido and Casado, Martin and McKeown, Nick and Parulkar, Guru},
booktitle={USENIX OSDI},
year={2010}
}
IP and transport networks are controlled and operated independently today, leading to significant capex and opex inefficiencies for providers. We discuss a unified approach based on OpenFlow, and present a recent demonstration of a unified control plane for OpenFlow-enabled IP/Ethernet and TDM switched networks.
@inproceedings{das2010packet,
title = {Packet and Circuit Network Convergence with OpenFlow},
author = {Saurav Das and Guru Parulkar and Preeti Singh and Daniel Getachew and Lyndon Ong and Nick McKeown},
booktitle = {Optical Fiber Communication Conference (OFC)},
year = {2010}
}
OpenRoads is a mobile wireless testbed built on the OpenFlow platform that empowers researchers to deploy and evaluate new protocols and services in a realistic environment. OpenRoads provides a programmatic interface for managing wireless networks, including WiMAX and WiFi, allowing researchers to control network behavior through software. We describe the architecture and deployment of OpenRoads on the Stanford campus, and demonstrate how it enables practical research in mobile networking.
@inproceedings{yap2009openroads,
title={OpenRoads: Empowering Research in Mobile Networks},
author={Yap, Kok-Kiong and Kobayashi, Masayoshi and Sherwood, Rob and Handigol, Nikhil and Huang, Te-Yuan and Chan, Michael and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2009}
}
Current approaches to network security try to prevent attackers from gaining access. We propose a complementary approach: delegate the decision of whether to trust a communication to the end-host by providing it with information about the provenance of traffic. We describe a system that allows end-hosts to learn more about where their traffic is coming from, enabling them to make informed decisions about which network communications to accept.
@inproceedings{naous2009delegating,
title = {Delegating Network Security Through More Information},
author = {Jad Naous and Ryan Stutsman and David Mazieres and Nick McKeown and Nickolai Zeldovich},
booktitle = {Workshop on Research on Enterprise Networking (WREN)},
year = {2009}
}
We demonstrate a lossless handover mechanism between WiFi and WiMAX networks on the OpenRoads testbed. Using OpenFlow-based n-casting, the system sends duplicate packets over multiple wireless networks simultaneously, enabling seamless transitions between access technologies without packet loss during handoff.
@inproceedings{yap2009demo,
title = {Demo: Lossless Handover with n-casting between WiFi-WiMAX on OpenRoads},
author = {Kok-Kiong Yap and Te-Yuan Huang and Masayoshi Kobayashi and Michael Chan and Rob Sherwood and Guru Parulkar and Nick McKeown},
booktitle = {ACM MobiCom},
year = {2009}
}
We have built and deployed OpenRoads, a testbed that allows multiple network experiments to be conducted concurrently in a production network. For example, multiple routing protocols, mobility managers, and network access controllers can run simultaneously in the same network. In this paper, we describe and discuss our deployment of the testbed at Stanford University. We focus on the challenges we faced deploying in a production network, and the tools we built to overcome these challenges. Our goal is to gain enough experience for other groups to deploy OpenRoads in their campus networks.
@inproceedings{yap2009stanford,
title = {The Stanford OpenRoads Deployment},
author = {Kok-Kiong Yap and Masayoshi Kobayashi and David Underhill and Srinivasan Seetharaman and Peyman Kazemian and Nick McKeown},
booktitle = {WiNTECH},
year = {2009}
}
There have been many attempts to unify the control and management of circuit and packet switched networks, but none have taken hold. In this paper we propose a simple way to unify both types of network using OpenFlow. The basic idea is that a simple flow abstraction fits well with both types of network, provides a common paradigm for control, and makes it easy to insert new functionality into the network. OpenFlow provides a common API to the underlying hardware, and allows all of the routing, control and management to be defined in software outside the datapath.
@inproceedings{das2009unifying,
title = {Unifying Packet and Circuit Switched Networks},
author = {Saurav Das and Guru Parulkar and Nick McKeown},
booktitle = {Workshop on Below IP Networking (GLOBECOM)},
year = {2009}
}
There has usually been a clean separation between networks and the applications that use them. Applications send packets over a simple socket API; the network delivers them. However, there are many occasions when applications can benefit from more direct interaction with the network: to observe more of the current network state and to obtain more control over the network behavior. This paper explores some of the potential benefits of closer interaction between applications and the network. Exploiting the emergence of so-called "software-defined networks" (SDN) built above network-wide control planes, we explore how to build a more "software-friendly network". We present results from a preliminary exploration that aims to provide network services to applications via an explicit communication channel.
@inproceedings{yap2010towards,
title = {Towards Software-Friendly Networks},
author = {Kok-Kiong Yap and Te-Yuan Huang and Ben Dodson and Monica S. Lam and Nick McKeown},
booktitle = {ACM APSys},
year = {2010}
}
This paper proposes PhoneNet, an application framework to support direct group communication among phones without relay nodes. PhoneNet presents the familiar abstraction of a multi-user chat service to application writers. It performs two main functions: inviting participants to the chat room and routing data between participants directly, without going through any intermediaries. Made possible by a generic chat-room service embedded in the network itself, all application-specific code in PhoneNet applications runs on the phones themselves. Unlike the conventional server-client model, this design does not require scalable central servers that can handle all simultaneous interactions. As a first step, we have created a prototype of PhoneNet that works within an administrative domain. The multicast functionality among phones is implemented on top of a software-defined network (SDN). We have developed two applications using PhoneNet: teleconferencing and photo sharing. Our experience suggests that it is easy to develop PhoneNet applications, and PhoneNet appears to be effective in reducing network traffic.
@inproceedings{huang2010phonenet,
title = {PhoneNet: a Phone-to-Phone Network for Group Communication within an Administrative Domain},
author = {Te-Yuan Huang and Kok-Kiong Yap and Ben Dodson and Monica S. Lam and Nick McKeown},
booktitle = {ACM MobiHeld},
year = {2010}
}
@inproceedings{gudla2010experimental,
title = {Experimental Demonstration of OpenFlow Control of Packet and Circuit Switches},
author = {Vinesh Gudla and Saurav Das and Anujit Shastri and Guru Parulkar and Nick McKeown and Leonid Kazovsky},
booktitle = {OFC/NFOEC},
year = {2010}
}
We demonstrate FlowVisor, a special purpose OpenFlow controller that slices the network, allowing multiple researchers to run independent experiments on the same production OpenFlow network simultaneously. FlowVisor provides isolation between slices, ensuring that research experiments cannot interfere with production traffic or with each other.
@inproceedings{sherwood2009demo,
title = {Demo: Carving Research Slices Out of Your Production Networks with OpenFlow},
author = {Rob Sherwood and Michael Chan and Adam Covington and Glen Gibb and Mario Flajslik and Nikhil Handigol and Te-Yuan Huang and Peyman Kazemian and Masayoshi Kobayashi and Jad Naous and Srinivasan Seetharaman and David Underhill and Tatsuya Yabe and Kok-Kiong Yap and Yiannis Yiakoumis and Hongyi Zeng and Guido Appenzeller and Ramesh Johari and Nick McKeown and Guru Parulkar},
booktitle = {ACM SIGCOMM},
year = {2009}
}
Enterprise networks are complex and difficult to manage. Current networks are built from a variety of devices from multiple vendors, each with their own configuration interfaces, making it hard to implement and enforce consistent network-wide policies. We have implemented Ethane in both hardware and software and deployed it in the Stanford Computer Science department. Ethane allows managers to define a single network-wide policy, and then enforces it directly using simple flow-based switches and a centralized controller.
@article{casado2009rethinking,
title={Rethinking Enterprise Network Control},
author={Casado, Martin and Freedman, Michael J. and Pettit, Justin and Luo, Jianying and Gude, Natasha and McKeown, Nick and Shenker, Scott},
journal={IEEE/ACM Transactions on Networking},
volume={17},
number={4},
year={2009}
}
We describe an implementation of an OpenFlow switch on the NetFPGA platform, an open-source hardware platform for networking research. Our implementation supports the OpenFlow specification, allowing researchers to experiment with OpenFlow-based networks using commodity hardware. We describe the architecture, implementation, and performance of the switch.
@inproceedings{naous2008implementing,
title = {Implementing an OpenFlow Switch on the NetFPGA platform},
author = {Jad Naous and David Erickson and Adam Covington and Guido Appenzeller and Nick McKeown},
booktitle = {ACM/IEEE ANCS},
year = {2008}
}
Networks are now an integral part of every aspect of our daily lives. Managing and controlling the network infrastructure has become increasingly complex. The programming model made available by current networking equipment is primitive, much like the computing environment before the development of operating systems. NOX is a network operating system that provides a programmatic interface to the entire network, giving applications a centralized network view and enabling high-level management policies. NOX manages a network the same way that a traditional operating system manages a computer: it provides useful abstractions for the underlying physical resources and a uniform programmatic interface for managing them.
@article{gude2008nox,
title={NOX: Towards an Operating System for Networks},
author={Gude, Natasha and Koponen, Teemu and Pettit, Justin and Pfaff, Ben and Casado, Martin and McKeown, Nick and Shenker, Scott},
journal={ACM Computer Communication Review},
year={2008}
}
This whitepaper proposes OpenFlow: a way for researchers to run experimental protocols in the networks they use every day. OpenFlow is based on an Ethernet switch, with an internal flow-table, and a standardized interface to add and remove flow entries. Our goal is to encourage networking vendors to add OpenFlow to their switch products for deployment in college campus backbones and wiring closets. We believe that OpenFlow is a pragmatic compromise: on one hand, it allows researchers to run experiments on heterogeneous switches in a uniform way at line-rate and with high port-density; while on the other hand, vendors do not need to expose the internal workings of their switches. In addition to allowing researchers to evaluate their ideas in real-world traffic settings, OpenFlow could serve as a useful campus component in testbeds like GENI.
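The flow-table abstraction at the heart of OpenFlow can be sketched in a few lines: priority-ordered entries match on header fields and yield an action, with a table-miss entry punting to the controller. The field names and actions below are illustrative, not the OpenFlow wire format.

```python
# Toy priority-ordered flow table (illustrative field names and actions).
FLOW_TABLE = [
    # (priority, match fields, action); absent fields are wildcarded
    (200, {"ip_dst": "10.0.0.2", "tcp_dst": 80}, "output:3"),
    (100, {"ip_dst": "10.0.0.2"},                "output:2"),
    (0,   {},                                    "controller"),  # table miss
]

def flow_lookup(pkt):
    """Return the action of the highest-priority matching entry."""
    for _, match, action in sorted(FLOW_TABLE, key=lambda e: e[0], reverse=True):
        if all(pkt.get(k) == v for k, v in match.items()):
            return action
    return "drop"
```

The standardized interface the whitepaper proposes is essentially remote add/remove of entries like these, leaving the vendor's internal switch implementation opaque.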
@article{mckeown2008openflow,
title={OpenFlow: Enabling Innovation in Campus Networks},
author={McKeown, Nick and Anderson, Tom and Balakrishnan, Hari and Parulkar, Guru and Peterson, Larry and Rexford, Jennifer and Shenker, Scott and Turner, Jonathan},
journal={ACM SIGCOMM Computer Communication Review},
volume={38},
number={2},
pages={69--74},
year={2008}
}
This paper presents Ethane, a new network architecture for the enterprise. Ethane allows managers to define a single network-wide fine-grain policy, and then enforces it directly. Ethane couples extremely simple flow-based Ethernet switches with a centralized controller that manages the admittance and routing of flows. While radical, this design is backwards-compatible with existing hosts and switches. We have built and deployed Ethane in our university department, and we describe our experience over a seven-month period.
@inproceedings{casado2007ethane,
title={Ethane: Taking Control of the Enterprise},
author={Casado, Martin and Freedman, Michael J. and Pettit, Justin and Luo, Jianying and McKeown, Nick and Shenker, Scott},
booktitle={ACM SIGCOMM},
year={2007}
}
Ethane is a new network architecture that uses a centralized controller to manage the admittance and routing of flows through the network. This paper describes the design and implementation of Ethane flow switches -- the simple forwarding elements that make up an Ethane network. Through the introduction of middleboxes, Ethane provides fine-grained access control and network management without requiring complex configuration of traditional enterprise switches and routers.
@inproceedings{luo2007prototyping,
title={Prototyping Fast, Simple, Secure Switches for Ethane},
author={Luo, Jianying and Pettit, Justin and Casado, Martin and Lockwood, John and McKeown, Nick},
booktitle={Hot Interconnects},
year={2007}
}
Despite the critical importance of enterprise network security, today's defenses are difficult to configure and manage. Enterprise networks are protected by a combination of complex routing and bridging policies, firewalls, NATs, and other middleboxes. SANE is a protection architecture for enterprise networks that replaces these mechanisms with a single, clean design. In SANE, all routing and access control decisions are made by a logically centralized server that authenticates all communication and grants access according to a network-wide policy.
@inproceedings{casado2006sane,
title={SANE: A Protection Architecture for Enterprise Networks},
author={Casado, Martin and Garfinkel, Tal and Akella, Aditya and Freedman, Michael and Boneh, Dan and McKeown, Nick and Shenker, Scott},
booktitle={15th USENIX Security Symposium},
year={2006}
}
Network Verification & Debugging
Hydra: Effective Runtime Network Verification
Sundararajan Renganathan, Benny Rubin, Hyojoon Kim, Pier Luigi Ventre, Carmelo Cascone, Daniele Moro, Charles Chan, Nick McKeown, Nate Foster
It is notoriously difficult to verify that a network is behaving as intended, especially at scale. This paper presents Hydra, a system that uses ideas from runtime verification to check that every packet is correctly processed with respect to a specification in real time. We propose a domain-specific language for writing properties, called Indus, and we develop a compiler that turns properties thus specified into executable P4 code that runs alongside the forwarding code at line rate. To evaluate our approach, we used Indus to model a range of properties, showing that it is expressive enough to capture examples studied in prior work. We also deployed Hydra checkers for validating paths in source routing and for enforcing slice isolation in Aether, an open-source cellular platform. We confirmed a subtle bug in Aether's 5G mobile core that would have been hard to detect using static techniques. We also evaluated the overheads of Hydra on hardware, finding that it does not significantly increase latency and often does not require additional pipeline stages.
@inproceedings{renganathan2023hydra,
title={Hydra: Effective Runtime Network Verification},
author={Renganathan, Sundararajan and Rubin, Benny and Kim, Hyojoon and Ventre, Pier Luigi and Cascone, Carmelo and Moro, Daniele and Chan, Charles and McKeown, Nick and Foster, Nate},
booktitle={ACM SIGCOMM},
year={2023}
}
@inproceedings{handigol2014know,
title={I Know What Your Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks},
author={Handigol, Nikhil and Heller, Brandon and Jeyakumar, Vimalkumar and Mazieres, David and McKeown, Nick},
booktitle={USENIX NSDI},
year={2014}
}
@inproceedings{zeng2014libra,
title={Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks},
author={Zeng, Hongyi and Zhang, Shidong and Ye, Fei and Jeyakumar, Vimalkumar and Ju, Mickey and Liu, Junda and McKeown, Nick and Vahdat, Amin},
booktitle={USENIX NSDI},
year={2014}
}
Network state may change rapidly in response to customer demands, load conditions or configuration changes. But the network must also ensure correctness conditions such as isolating tenants from each other and from critical services. Existing policy checkers cannot verify compliance in real time because of the need to collect state from the entire network and the time it takes to analyze this state. SDNs provide an opportunity in this respect as they provide a logically centralized view from which every proposed change can be checked for compliance with policy. But there remains the need for a fast compliance checker. Our paper introduces a real time policy checking tool called NetPlumber based on Header Space Analysis (HSA). Unlike HSA, however, NetPlumber incrementally checks for policy violations as new rules are inserted, rather than checking the entire network from scratch.
@inproceedings{kazemian2013real,
title={Real Time Network Policy Checking using Header Space Analysis},
author={Kazemian, Peyman and Chang, Michael and Zeng, Hongyi and Varghese, George and McKeown, Nick and Whyte, Scott},
booktitle={USENIX NSDI},
year={2013}
}
Brandon Heller, Colin Scott, Nick McKeown, Scott Shenker, Andreas Wundsam, Hongyi Zeng, Sam Whitlock, Vimalkumar Jeyakumar, Nikhil Handigol, James McCauley, Kyriakos Zarifis, Peyman Kazemian
We present a systematic approach to troubleshooting software-defined networks by leveraging the SDN layering architecture. When a network problem occurs, our system automatically identifies which layer of the SDN stack is responsible for the issue, using existing debugging tools such as NetPlumber and VeriFlow at the appropriate layer. By decomposing the debugging problem across SDN layers, we can pinpoint bugs faster and more accurately than approaches that treat the network as a monolithic system.
@inproceedings{heller2013leveraging,
title={Leveraging SDN Layering to Systematically Troubleshoot Networks},
author={Heller, Brandon and Scott, Colin and McKeown, Nick and Shenker, Scott and Wundsam, Andreas and Zeng, Hongyi and Whitlock, Sam and Jeyakumar, Vimalkumar and Handigol, Nikhil and McCauley, James and Zarifis, Kyriakos and Kazemian, Peyman},
booktitle={ACM HotSDN},
year={2013}
}
Networks are getting larger and more complex, yet administrators rely on rudimentary tools such as ping and traceroute to troubleshoot problems. We present ATPG, a system that automatically generates a minimum set of test packets to test the liveness and correctness of all rules in a network. ATPG reads router configurations and generates a minimum set of test packets that traverse every link and test every forwarding rule in the network, greatly simplifying the testing and debugging of networks.
@article{zeng2014automatic,
title={Automatic Test Packet Generation},
author={Zeng, Hongyi and Kazemian, Peyman and Varghese, George and McKeown, Nick},
journal={IEEE/ACM Transactions on Networking},
year={2014}
}
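Selecting a minimum set of test packets that exercises every rule is an instance of set cover, which is NP-hard, so a greedy approximation is the natural approach. A minimal sketch of that greedy step, with hypothetical probe and rule names (this is an illustration of the idea, not ATPG's implementation):

```python
def greedy_test_packets(candidates):
    """Pick a small set of test packets covering every forwarding rule.

    `candidates` maps each candidate packet to the set of rules it
    exercises. Greedy set cover: repeatedly keep the packet that
    covers the most not-yet-covered rules.
    """
    uncovered = set().union(*candidates.values())
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        if not candidates[best] & uncovered:
            break  # remaining rules are not exercised by any candidate
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen

# Example: three candidate probes over four rules.
packets = {
    "probe_a": {"r1", "r2"},
    "probe_b": {"r2", "r3", "r4"},
    "probe_c": {"r4"},
}
print(greedy_test_packets(packets))  # ['probe_b', 'probe_a']
```

Periodically replaying the chosen probes then monitors every rule; a failing probe narrows fault localization to the rules only it exercises.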
The behavior of a Software-Defined Network is controlled by programs, which like all software, will have bugs—but this programmatic control also enables new ways to debug networks. This paper introduces ndb, a prototype network debugger inspired by gdb, which implements two primitives useful for debugging an SDN: breakpoints and packet backtraces. We show how ndb modifies forwarding state and logs packet digests to rebuild the sequence of events leading to an errant packet, providing SDN programmers and operators with a valuable tool for tracking down the root cause of a bug.
@inproceedings{handigol2012where,
title={Where is the Debugger for my Software-Defined Network?},
author={Handigol, Nikhil and Heller, Brandon and Jeyakumar, Vimalkumar and Mazieres, David and McKeown, Nick},
booktitle={HotSDN},
year={2012}
}
We present Header Space Analysis, a framework for statically checking network specifications and configurations. Our framework uses a geometric model of packet headers to check for network-wide invariants, such as reachability failures, forwarding loops, and traffic isolation properties. By treating the header as a point in a geometric space, we can use set operations to compute the set of all possible packet headers that can reach from any port to any other port in the network. We implement our framework and apply it to the Stanford University backbone network, where it finds several configuration errors.
@inproceedings{kazemian2012header,
title={Header Space Analysis: Static Checking for Networks},
author={Kazemian, Peyman and Varghese, George and McKeown, Nick},
booktitle={USENIX NSDI},
year={2012}
}
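Header Space Analysis models each header bit as 0, 1, or wildcard "x", so a match pattern denotes a set of concrete headers and reachability reduces to set operations. A toy sketch of wildcard intersection under that model (an illustration, not the paper's implementation):

```python
def intersect(h1, h2):
    """Intersect two wildcard headers over {'0','1','x'}, bit by bit.

    Returns the pattern matching both, or None if the sets are disjoint.
    """
    out = []
    for a, b in zip(h1, h2):
        if a == "x":
            out.append(b)
        elif b == "x" or a == b:
            out.append(a)
        else:
            return None  # conflicting concrete bits: empty intersection
    return "".join(out)

# Headers matching "1x0x" that also match a rule pattern "110x":
print(intersect("1x0x", "110x"))  # -> "110x"
print(intersect("1x0x", "0xxx"))  # -> None (disjoint sets)
```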
Certain types of errors, such as hardware failure, link failure, or congestion, can only be detected at runtime, as these are failures that happen when the network is in operation. Today, there exist only preliminary tools to detect and pinpoint such kinds of errors, typified by ping and traceroute. In this paper we introduce a formal and comprehensive way to fully test networks at runtime which enables us to pinpoint the exact rule that is causing the problem. Our test packet generation algorithm first picks a relatively small number of test packets and corresponding input ports so that these packets exercise all the rules in the network. By periodically sending these packets through the network, we can constantly monitor the health of the network. Whenever an error happens, the fault localization algorithm uses the results of all test packets to find the exact rule that is at fault.
@techreport{zengformal,
title = {Formal Network Testing},
author = {Zeng, Hongyi and Kazemian, Peyman and Varghese, George and McKeown, Nick},
institution = {Technical Report, Stanford University, Stanford, CA / UCSD, San Diego / Yahoo! Labs, Santa Clara, CA},
year = {}
}
Data Center Networks
Chronos: Prescheduled Circuit Switching for LLM Training (Best Paper)
Sundararajan Renganathan, Nick McKeown
2nd Workshop on Networks for AI Computing (NAIC), ACM SIGCOMM 2025
The all-to-all collective communications used in training large language models create a predictable, periodic traffic pattern. We exploit this predictability with Chronos, a network that preschedules optical circuit switches to provide each server-pair with a guaranteed rate. Because the traffic matrix is known in advance, Chronos computes a recurring schedule of circuit configurations before training begins, eliminating run-time scheduling overhead. We show that Chronos can meet the bandwidth requirements of common LLM parallelism strategies while reducing network cost by replacing expensive packet switches with simpler optical circuit switches.
@inproceedings{renganathan2025chronos,
title={Chronos: Prescheduled Circuit Switching for LLM Training},
author={Renganathan, Sundararajan and McKeown, Nick},
booktitle={2nd Workshop on Networks for AI Computing (NAIC), ACM SIGCOMM},
year={2025}
}
The nanoPU: A Nanosecond Network Stack for Datacenters
Stephen Ibanez, Alex Mallery, Serhat Arslan, Theo Jepsen, Muhammad Shahbaz, Changhoon Kim, Nick McKeown
We present the nanoPU, a new NIC-CPU co-design to accelerate an increasingly pervasive class of datacenter applications: those that utilize many small Remote Procedure Calls (RPCs) with very short microsecond-scale processing times. The novel aspect of the nanoPU is the design of a fast path between the network and the CPU that bypasses the cache and memory hierarchy, placing arriving messages directly into the CPU register file. We implement a prototype nanoPU in an FPGA and show it can provide a thread-to-thread RPC latency of about 69 nanoseconds, more than ten times faster than existing systems.
@inproceedings{ibanez2021nanopu,
title={The nanoPU: A Nanosecond Network Stack for Datacenters},
author={Ibanez, Stephen and Mallery, Alex and Arslan, Serhat and Jepsen, Theo and Shahbaz, Muhammad and Kim, Changhoon and McKeown, Nick},
booktitle={USENIX OSDI},
year={2021}
}
NanoTransport: A Low-Latency, Programmable Transport Layer for NICs
Serhat Arslan, Stephen Ibanez, Alex Mallery, Changhoon Kim, Nick McKeown
Transport protocols can be implemented in NIC hardware to increase throughput and reduce latency, but hardware implementations are inflexible and difficult to modify. NanoTransport is a programmable transport layer implemented on a NIC that supports microsecond-scale latency while allowing transport protocols to be changed in software. NanoTransport provides a simple programming interface for defining custom transport protocols, enabling rapid experimentation with new transport designs without sacrificing performance.
@inproceedings{arslan2021nanotransport,
title={NanoTransport: A Low-Latency, Programmable Transport Layer for NICs},
author={Arslan, Serhat and Ibanez, Stephen and Mallery, Alex and Kim, Changhoon and McKeown, Nick},
booktitle={ACM SOSR},
year={2021}
}
A Distributed Algorithm to Calculate Max-Min Fair Rates Without Per-Flow State
Lavanya Jose, Stephen Ibanez, Mohammad Alizadeh, Nick McKeown
Most congestion control algorithms, like TCP, rely on a reactive control system that detects congestion, then marches carefully towards a desired operating point (e.g. by modifying the window size or adjusting a rate). In an effort to balance stability and convergence speed, they often take hundreds of RTTs to converge; an increasing problem as networks get faster, with less time to react. This paper is about an alternative class of congestion control algorithms based on proactive scheduling: switches and NICs "proactively" exchange control messages to run a distributed algorithm to pick "explicit rates" for each flow. We call these Proactive Explicit Rate Control (PERC) algorithms. They take as input the routing matrix and link speeds, but not a congestion signal. By exploiting information such as the number of flows at a link, they can converge an order of magnitude faster than reactive algorithms.
@inproceedings{jose2019distributed,
title={A Distributed Algorithm to Calculate Max-Min Fair Rates Without Per-Flow State},
author={Jose, Lavanya and Ibanez, Stephen and Alizadeh, Mohammad and McKeown, Nick},
booktitle={ACM SIGMETRICS},
year={2019}
}
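The fixed point such algorithms converge to is the classic max-min fair allocation, which a centralized water-filling computation finds by repeatedly freezing the most constrained link. A sketch of that baseline (the paper's contribution is computing the same allocation distributedly, without per-flow state; names here are illustrative):

```python
def max_min_rates(links, flows):
    """Centralized water-filling for max-min fair rates.

    links: link id -> capacity. flows: flow id -> set of links crossed.
    """
    rates = {}
    cap = dict(links)
    active = {f: set(path) for f, path in flows.items()}
    while active:
        # Fair share on each link still carrying unfrozen flows.
        share = {}
        for link in cap:
            n = sum(1 for path in active.values() if link in path)
            if n:
                share[link] = cap[link] / n
        # The bottleneck is the link with the smallest fair share;
        # freeze every flow crossing it at that rate.
        bottleneck = min(share, key=share.get)
        r = share[bottleneck]
        for f in [f for f, path in active.items() if bottleneck in path]:
            rates[f] = r
            for link in active[f]:
                cap[link] -= r
            del active[f]
    return rates

# Flow "f" uses link A (cap 10); flow "g" uses A and B (cap 4).
# "g" is bottlenecked at B (rate 4), leaving 6 on A for "f".
print(max_min_rates({"A": 10, "B": 4}, {"f": {"A"}, "g": {"A", "B"}}))
```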
We present pFabric, a minimalistic datacenter transport design that achieves near-optimal flow completion times for short flows while maintaining high throughput for large flows. pFabric has a very simple design: switches implement a priority queue that schedules packets based on a single priority number, and the endpoints implement a minimal rate control mechanism. Despite its simplicity, we show through analysis and simulation that pFabric achieves flow completion times within 10-15% of the optimal for a wide range of workloads.
@inproceedings{alizadeh2013pfabric,
title={pFabric: Minimal Near-Optimal Datacenter Transport},
author={Alizadeh, Mohammad and Yang, Shuang and Sharif, Milad and Katti, Sachin and McKeown, Nick and Prabhakar, Balaji and Shenker, Scott},
booktitle={ACM SIGCOMM},
year={2013}
}
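The switch-side mechanism in pFabric is deliberately minimal: dequeue whichever buffered packet carries the best (lowest) priority number, typically the flow's remaining size. A toy sketch of such an output port using a heap (illustrative, not the paper's hardware design):

```python
import heapq

class PriorityQueuePort:
    """Toy pFabric-style port: serve the lowest priority number first."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # FIFO tie-break among equal priorities

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

port = PriorityQueuePort()
port.enqueue(50_000, "pkt of a 50 KB flow")  # priority = remaining bytes
port.enqueue(1_200, "pkt of a short flow")
print(port.dequeue())  # the short flow's packet is served first
```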
We observe that the requirements for datacenter transport are fundamentally different from the wide-area Internet. In the datacenter, the network fabric is administered by a single entity, latencies are small, and the switching elements can be enhanced with new functionality. This suggests that datacenter transport should be deconstructed: instead of implementing all transport functions at the end-host, some functions are better performed inside the network fabric where switches have more timely and complete information about congestion. We describe a transport architecture that decouples flow scheduling from rate control, placing scheduling decisions in the network and rate control at the end-hosts.
@inproceedings{alizadeh2012deconstructing,
title={Deconstructing Datacenter Packet Transport},
author={Alizadeh, Mohammad and Yang, Shuang and Katti, Sachin and McKeown, Nick and Prabhakar, Balaji and Shenker, Scott},
booktitle={HotNets},
year={2012}
}
Martin Casado, David Erickson, Igor Anatolyevich Ganichev, Rean Griffith, Brandon Heller, Nick Mckeown, Daekyeong Moon, Teemu Koponen, Scott Shenker, Kyriakos Zarifis
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-93
In this demo, we present Ripcord, a modular platform for rapidly prototyping scale-out data center networks. Ripcord enables researchers to build and evaluate new network features and topologies, using only commercially available hardware and open-source software. The Ripcord demo will show three examples of custom network functions, operating together, on top of a 160-node cluster. The first is a routing engine that isolates classes of traffic. The second is a dynamic network manager that adjusts links and switch power states to reduce energy. The third is a statistics aggregator that supports network health monitoring and automatic alerts. The demo will be interactive, with a visualization of live parameters for each link and switch, such as bandwidth, drops, and power status, as well as a control panel to modify the traffic load. We feel that an interactive demo is the best way to introduce the research community to Ripcord and get their feedback.
@techreport{casado2010ripcord,
title = {Ripcord: A Modular Platform For Data Center Networking},
author = {Casado, Martin and Erickson, David and Ganichev, Igor Anatolyevich and Griffith, Rean and Heller, Brandon and McKeown, Nick and Moon, Daekyeong and Koponen, Teemu and Shenker, Scott and Zarifis, Kyriakos},
institution = {EECS Department, University of California, Berkeley, Technical Report No. UCB/EECS-2010-93},
year = {2010}
}
David Erickson, Brandon Heller, Shuang Yang, Jonathan Chu, Jonathan D. Ellithorpe, Scott Whyte, Stephen Stuart, Nick McKeown, Guru M. Parulkar, Mendel Rosenblum
Proceedings of the ACM SIGCOMM 2011 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Toronto, ON, Canada, August 2011
Many data centers extensively use virtual machines (VMs), which provide the flexibility to move workload among physical servers. VMs can be placed to maximize application performance, power efficiency, or even fault tolerance. However, VMs are typically repositioned without considering network topology, congestion, or traffic routes. In this demo, we show a system, Virtue, which enables the comparison of different algorithms for VM placement and network routing at the scale of an entire data center. Our goal is to understand how placement and routing affect overall application performance by varying the types and mix of workloads, network topologies, and compute resources; these parameters will be available for demo attendees to explore.
@inproceedings{erickson2011optimizing,
title = {Optimizing a Virtualized Data Center},
author = {Erickson, David and Heller, Brandon and Yang, Shuang and Chu, Jonathan and Ellithorpe, Jonathan D. and Whyte, Scott and Stuart, Stephen and McKeown, Nick and Parulkar, Guru M. and Rosenblum, Mendel},
booktitle = {ACM SIGCOMM (Demo)},
year = {2011}
}
Energy proportionality is a key goal for data center networks. Networks are a shared resource connecting critical IT infrastructure, and the general practice is to keep them always on, even during off-peak hours when they are lightly utilized. ElasticTree is a network-wide energy optimizer that dynamically adjusts the set of active network elements to satisfy changing data center traffic loads. It combines a formal model of the network with traffic information to find the minimum power network subset and to turn off unneeded links and switches. We demonstrate that ElasticTree can save 25-40% of network energy in a data center.
@inproceedings{heller2010elastictree,
title={ElasticTree: Saving Energy in Data Center Networks},
author={Heller, Brandon and Seetharaman, Srini and Mahadevan, Priya and Yiakoumis, Yiannis and Sharma, Puneet and Banerjee, Sujata and McKeown, Nick},
booktitle={USENIX NSDI},
year={2010}
}
David Erickson, Glen Gibb, Brandon Heller, David Underhill, Jad Naous, Guido Appenzeller, Guru Parulkar, Nick McKeown, Mendel Rosenblum, Monica Lam, Sailesh Kumar, Valentina Alaria, Pere Monclus, Flavio Bonomi, Jean Tourrilhes, Praveen Yalagandula, Sujata Banerjee, Charles Clark, Rick McGeer
ACM SIGCOMM, (Best Demo Award), Seattle, WA, August 2008
We demonstrate seamless virtual machine mobility in an OpenFlow network. OpenFlow is an architecture that allows researchers to run experimental protocols in campus networks. Using OpenFlow-enabled switches at Stanford, we show how virtual machines can be migrated between physical hosts while maintaining active network connections, without disrupting ongoing communication.
@inproceedings{erickson2008demo,
title = {Demo: A Demonstration of Virtual Machine Mobility in an OpenFlow network},
author = {Erickson, David and Gibb, Glen and Heller, Brandon and Underhill, David and Naous, Jad and Appenzeller, Guido and Parulkar, Guru and McKeown, Nick and Rosenblum, Mendel and Lam, Monica and Kumar, Sailesh and Alaria, Valentina and Monclus, Pere and Bonomi, Flavio and Tourrilhes, Jean and Yalagandula, Praveen and Banerjee, Sujata and Clark, Charles and McGeer, Rick},
booktitle = {ACM SIGCOMM (Best Demo Award)},
year = {2008}
}
The scale and expense of modern data centers motivates running them as efficiently as possible. This paper explores how virtualized data center performance can be improved when network traffic and topology data informs VM placement. Our practical heuristics, tested on network-heavy, scale-out workloads in an 80 server cluster, improve overall performance by up to 70% compared to random placement in a multi-tenant configuration.
@inproceedings{erickson2014using,
title={Using Network Knowledge to Improve Workload Performance in Virtualized Data Centers},
author={Erickson, David and Heller, Brandon and McKeown, Nick and Rosenblum, Mendel},
booktitle={IC2E},
year={2014}
}
Video Streaming
Sammy: Smoothing Video Traffic to Be a Friendly Internet Neighbor
Bruce Spang, Shravya Kunamalla, Renata Teixeira, Te-Yuan Huang, Grenville Armitage, Ramesh Johari, Nick McKeown
@inproceedings{spang2023sammy,
title={Sammy: Smoothing Video Traffic to Be a Friendly Internet Neighbor},
author={Spang, Bruce and Kunamalla, Shravya and Teixeira, Renata and Huang, Te-Yuan and Armitage, Grenville and Johari, Ramesh and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2023}
}
In this paper we report on measurements of Netflix video traffic, to understand how the size of router buffers affects the quality of video seen by the end-user (i.e. video QoE). In certain locations, Netflix streams video over congested links, and the resulting packet loss leads to video impairments that can cause the viewer to abandon the stream. Intuitively, larger router buffers lead to fewer drops, better video, and fewer abandonments. But how much buffer is enough? In this paper we describe a set of measurement and analysis techniques that give video service providers a framework for determining this. We describe experiments performed on a congested transoceanic link, where we measured the impact of buffer size on video quality as measured by Netflix's Video Quality of Experience (QoE) metrics. We found that in our setup, reducing the buffer size from 210ms to 30ms of buffering had little impact on the video quality experienced by subscribers.
@inproceedings{spang2019buffer,
title={Buffer Sizing and Video QoE Measurements at Netflix},
author={Spang, Bruce and Walsh, Brady and Huang, Te-Yuan and Rusnock, Tom and Lawrence, J.C. and McKeown, Nick},
booktitle={Workshop on Buffer Sizing},
year={2019}
}
Existing ABR algorithms face a significant challenge in estimating future capacity: capacity can vary widely over time, a phenomenon commonly observed in commercial services. In this work, we suggest an alternative approach to rate adaptation that avoids this problem. We design an ABR algorithm that chooses the video rate based on the current buffer occupancy. We show using analysis that a carefully designed buffer-based approach can provide near-optimal results: it can achieve a video rate close to the maximum average throughput, while maintaining a small probability of rebuffering. We also describe a deployment by Netflix over a period of several weeks with millions of users. Our deployments confirm the theoretical findings: our buffer-based algorithm provides a comparable or improved user experience.
@inproceedings{huang2014buffer,
title={A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service},
author={Huang, Te-Yuan and Johari, Ramesh and McKeown, Nick and Trunnell, Matthew and Watson, Mark},
booktitle={ACM SIGCOMM},
year={2014}
}
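The heart of a buffer-based algorithm is a rate map f(B) from current buffer occupancy to video rate: the lowest rate inside a safety "reservoir", the highest once the buffer reaches a "cushion", and a rising function in between. A sketch with illustrative bitrates and thresholds (the deployed parameters are not given in this abstract):

```python
# Available bitrates in kbps and illustrative buffer thresholds (seconds).
RATES = [235, 375, 560, 750, 1050, 1750, 3000]
RESERVOIR, CUSHION = 10.0, 110.0  # assumed values, not Netflix's actual ones

def rate_map(buffer_s):
    """Piecewise-linear map from buffer occupancy to video rate."""
    if buffer_s <= RESERVOIR:
        return RATES[0]       # buffer nearly empty: play it safe
    if buffer_s >= CUSHION:
        return RATES[-1]      # buffer comfortable: top rate
    # Linear ramp between the reservoir and the cushion...
    frac = (buffer_s - RESERVOIR) / (CUSHION - RESERVOIR)
    target = RATES[0] + frac * (RATES[-1] - RATES[0])
    # ...quantized down to the highest available rate not above the target.
    return max(r for r in RATES if r <= target)

print(rate_map(5))    # 235
print(rate_map(60))   # 1050
print(rate_map(120))  # 3000
```

Because the decision depends only on the buffer, no throughput estimate is needed; the buffer itself already integrates the history of capacity versus consumption.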
Recent work has shown how hard it is to pick a video streaming rate. Video service providers use heuristics to estimate the network capacity leading to unnecessary rebuffering events and suboptimal video quality. This paper begins with the observation that video streaming is, in essence, a scheduling problem: the video player must schedule the download of each chunk over a shared bottleneck link. We describe a principled approach based on Model Predictive Control that optimizes the bitrate selection and schedules the chunk downloads over a finite time horizon. We implement the algorithm in a client-side video player, and compare it to existing commercial clients in a wide range of network conditions. We find that it reduces the rebuffer rate by 10-20% while achieving similar or better average bitrate.
@inproceedings{huang2013downton,
title={Downton Abbey Without the Hiccups: Buffer-Based Rate Adaptation for HTTP Video Streaming},
author={Huang, Te-Yuan and Johari, Ramesh and McKeown, Nick},
booktitle={ACM SIGCOMM FhMN Workshop},
year={2013}
}
Today's commercial video streaming services use dynamic rate selection to provide a high-quality user experience. Most services host content on standard HTTP servers in CDNs, so rate selection must occur at the client. Picking the right video rate is hard because the capacity of the path between server and client varies over time and is hard to predict. If the selected rate is too high, the video player's buffer will run empty and the video will rebuffer; if the rate is too low, video quality will suffer unnecessarily. In this paper, we examine several state-of-the-art rate-selection algorithms used by commercial services and find that they fall into common traps, leading to behavior we describe as confused, timid, and unstable. We explain the root causes of these behaviors and their effects on user experience.
@inproceedings{huang2012confused,
title={Confused, Timid, and Unstable: Picking a Video Streaming Rate is Hard},
author={Huang, Te-Yuan and Handigol, Nikhil and Heller, Brandon and McKeown, Nick and Johari, Ramesh},
booktitle={ACM IMC},
year={2012}
}
Congestion Control
Unbiased Experiments in Congested Networks
Bruce Spang, Veronica Hannan, Shravya Kunamalla, Te-Yuan Huang, Nick McKeown, Ramesh Johari
When developing a new networking algorithm, it is established practice to run a randomized experiment, or A/B test, to evaluate its performance. In an A/B test, traffic is randomly allocated between a treatment group, which uses the new algorithm, and a control group, which uses the existing algorithm. However, because networks are congested, both treatment and control traffic compete against each other for resources in a way that biases the outcome of these tests. This bias can have a surprisingly large effect; for example, in lab A/B tests with two widely used congestion control algorithms, the treatment appeared to deliver 150% higher throughput when used by a few flows, and 75% lower throughput when used by most flows---despite the fact that the two algorithms have identical throughput when used by all traffic. Beyond the lab, we show that A/B tests can also be biased at scale. In an experiment run in cooperation with Netflix, estimates from A/B tests mistake the direction of change of some metrics, miss changes in other metrics, and overestimate the size of effects. We propose alternative experiment designs, previously used in online platforms, to more accurately evaluate new algorithms and allow experimenters to better understand the impact of congestion on their tests.
@inproceedings{spang2021unbiased,
title={Unbiased Experiments in Congested Networks},
author={Spang, Bruce and Hannan, Veronica and Kunamalla, Shravya and Huang, Te-Yuan and McKeown, Nick and Johari, Ramesh},
booktitle={ACM Internet Measurement Conference (IMC)},
year={2021}
}
Deployed congestion control algorithms rely on end hosts to figure out how congested the network is. Initially, for TCP, the congestion signal was packet drops; a binary signal that kicks in only after the congestion is well underway. More recently, the trend has been towards using RTT as a congestion signal instead (e.g. Timely and BBR). But RTT is a noisy surrogate for congestion; it contains a valuable signal about congestion at the bottleneck but also includes noise from the queuing delay at the non-bottlenecked switches. Taking a step back, it is worth asking: Why don't the switches and routers simply tell us the actual congestion they are experiencing? After all, they must keep track of the precise occupancy of their own queues anyway; they can directly tell the end-hosts. Conventional wisdom said this is too expensive (in terms of additional bits in headers, or complexity and power consumption). We argue that even if this was once the case, it no longer is. Today, it is quite feasible, with negligible increase in power or lost capacity, to report the precise queuing delay at the switches, allowing the end hosts to make more accurate decisions when minimizing required buffering. We explore how this might work using modern programmable switches and NICs that stamp each packet with the queue occupancy (or the maximum queue occupancy along the path), which can be thought of as a multi-bit ECN signal. We provide evidence that the resulting signal is a more accurate indication of congestion at the flow's bottleneck and can lead to higher link utilization and shorter flow completion times than RTT-based algorithms. Consequently, it becomes easier to control required buffer sizes. Our goal here is not to argue for a particular multi-bit ECN algorithm, but to point out that in the future, there is no longer a need to rely on noisy RTT measurements at the edges.
@inproceedings{arslan2019switches,
title={Switches Know the Exact Amount of Congestion},
author={Arslan, Serhat and McKeown, Nick},
booktitle={Workshop on Buffer Sizing},
year={2019}
}
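The proposal amounts to a multi-bit ECN field that each hop overwrites with the worst queueing seen so far, so the receiver learns the bottleneck queue directly instead of inferring it from RTT. A toy sketch of the per-hop update (field names are illustrative, not a real protocol):

```python
def stamp_queue_delay(packet, hop_queue_delay_us):
    """Per-hop update: keep the maximum queueing delay along the path.

    `packet` is a dict standing in for a header with a multi-bit
    congestion field.
    """
    packet["max_queue_delay_us"] = max(
        packet.get("max_queue_delay_us", 0), hop_queue_delay_us
    )
    return packet

pkt = {}
for delay in [3, 41, 7]:   # queueing at three successive switches
    stamp_queue_delay(pkt, delay)
print(pkt["max_queue_delay_us"])  # 41: the bottleneck hop's queue
```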
New congestion control algorithms are rapidly improving datacenter performance by reducing latency and overcoming incast collapse. However, deploying a new algorithm requires updating every endpoint in the datacenter, which is slow and error-prone. We present Virtualized Congestion Control, which decouples congestion control from the endpoint operating system and runs it in a virtual congestion control layer. This allows operators to deploy, update, and experiment with congestion control algorithms without modifying the host OS kernel.
@inproceedings{cronkite2016virtualized,
title={Virtualized Congestion Control},
author={Cronkite-Ratcliff, Bryce and Bergman, Aran and Vargaftik, Shay and Ravi, Madhusudhan and McKeown, Nick and Abraham, Ittai and Keslassy, Isaac},
booktitle={ACM SIGCOMM},
year={2016}
}
As datacenter speeds scale to 100 Gb/s and beyond, traditional congestion control algorithms like TCP and RCP converge slowly to steady sending rates, which leads to poorer and less predictable performance for short flows. We argue that at today's datacenter speeds, link bandwidth should be allocated proactively by the network, rather than reactively by end-hosts. We describe an architecture called Homa, which proactively allocates bandwidth using in-network priority queues and a receiver-driven protocol. We show that Homa can deliver messages with tail latencies within a small factor of optimal, while achieving high overall throughput.
@inproceedings{jose2015high,
title={High Speed Networks Need Proactive Congestion Control},
author={Jose, Lavanya and Yan, Lisa and Alizadeh, Mohammad and Varghese, George and McKeown, Nick and Katti, Sachin},
booktitle={ACM HotNets},
year={2015}
}
Buffer sizing has received a lot of attention recently since it is becoming increasingly difficult to use large buffers in high-speed routers. Much of the prior work has concentrated on analyzing the amount of buffering required in core routers assuming that TCP carries all the data traffic. In this paper, we evaluate the amount of buffering required for RCP on a single congested link, while explicitly modeling flow arrivals and departures. Our theoretical analysis and simulations indicate that buffer sizes of about 10% of the bandwidth-delay product are sufficient for RCP to deliver good performance to end-users.
@article{lakshmikantha2009buffer,
title = {Buffer Sizing Results for RCP Congestion Control under Connection Arrivals and Departures},
author = {Lakshmikantha, Ashvin and Srikant, R. and Dukkipati, Nandita and McKeown, Nick and Beck, Carolyn},
journal = {ACM SIGCOMM Computer Communication Review},
volume = {39},
number = {1},
pages = {5--15},
year = {2009}
}
In the context of explicit congestion control protocols like XCP and RCP where the equilibrium queue lengths are zero, we show that the stability region derived from traditional Nyquist analysis is not an accurate predictor of protocol stability for networks carrying a finite number of flows. This is because when there is no queueing, the flow rates do not change smoothly; they instead change abruptly creating limit cycles. We characterize the stability region analytically and find that it is a function of the number of flows. Our analysis provides a simple rule for choosing protocol parameters to ensure stability for a given network scenario.
@article{balakrishnan2007stability,
title={Stability Analysis of Explicit Congestion Control Protocols},
author={Balakrishnan, Hamsa and Dukkipati, Nandita and McKeown, Nick and Tomlin, Claire},
journal={IEEE Communications Letters},
volume={11},
number={10},
year={2007}
}
Most congestion control algorithms try to emulate processor sharing (PS) by giving each competing flow an equal share of a bottleneck link. This approach leads to fairness, and prevents long flows from hogging resources. For example, if a set of flows with the same round-trip time share a bottleneck link, TCP's congestion control mechanism tries to achieve PS; so do most of the proposed alternatives, such as eXplicit Control Protocol (XCP). But although they emulate PS well in a static scenario, they do not come close to PS when new flows arrive randomly and have a finite amount of data to send, as is the case in today's Internet. Typically, flows take an order of magnitude longer to complete with TCP or XCP than with PS, suggesting large room for improvement. And so in this paper, we explore how a new congestion control algorithm, Rate Control Protocol (RCP), comes much closer to emulating PS over a broad range of operating conditions. In RCP, a router assigns a single rate to all flows that pass through it. The router does not keep flow-state, and does no per-packet calculations. Yet we are able to show that under a wide range of traffic characteristics and network conditions, RCP's performance is very close to ideal processor sharing.
@techreport{dukkipatiprocessor,
title = {Processor Sharing Flows in the Internet},
author = {Dukkipati, Nandita and McKeown, Nick},
institution = {Stanford University High Performance Networking Group},
number = {TR04-HPNG-061604},
year = {2004}
}
@inproceedings{dukkipati2005processor,
title = {Processor Sharing Flows in the Internet},
author = {Dukkipati, Nandita and Kobayashi, Masayoshi and Zhang-Shen, Rui and McKeown, Nick},
booktitle = {IEEE IWQoS},
year = {2005}
}
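The RCP control law summarized in the abstracts above (one rate per router, no per-flow state, feedback from aggregate input rate and queue) can be sketched as follows. This is our illustration of an RCP-style update, not the authors' exact equation or tuning; the gain values alpha and beta are placeholders.

```python
def rcp_update(R_prev, C, y, q, d0, T, alpha=0.4, beta=0.2):
    """One control interval of an RCP-style rate update (a sketch; the
    precise equation and constants are in the papers above).

    R_prev : rate offered to every flow in the previous interval (b/s)
    C      : link capacity (b/s)
    y      : measured aggregate input rate over the last interval (b/s)
    q      : queue occupancy (bits)
    d0     : moving average of the flows' round-trip times (s)
    T      : update interval (s)
    """
    n_hat = C / R_prev            # estimated number of flows: no per-flow state kept
    spare = alpha * (C - y)       # give out spare capacity...
    drain = beta * (q / d0)       # ...and drain any standing queue
    R = R_prev + (T / d0) * (spare - drain) / n_hat
    return max(min(R, C), 1e-6)   # clamp to a sane range
```

Starting below capacity with an empty queue, the offered rate ramps up; at full utilization with a standing queue, it backs off.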
We recently proposed the Rate Control Protocol (RCP) as a way to minimize download times (or flow-completion times). Simulations suggest that if RCP were widely deployed, downloads would frequently finish ten times faster than with TCP. This is because RCP involves explicit feedback from the routers along the path, allowing a sender to pick a fast starting rate, and adapt quickly to network conditions. RCP is particularly appealing because it can be shown to be stable under broad operating conditions, and its performance is independent of the flow-size distribution and the RTT. Although it requires changes to the routers, the changes are small: the routers keep no per-flow state or per-flow queues, and the per-packet processing is minimal. However, the bar is high for a new congestion control mechanism: introducing a new scheme requires enormous change, and the argument needs to be compelling. And so, to enable some scientific and repeatable experiments with RCP, we have built and tested an open and public implementation of RCP; we have made available both the end-host software, and the router hardware. In this paper we describe our end-host implementation of RCP in Linux, and our router implementation in Verilog (on the NetFPGA platform). We hope that others will be able to use these implementations to experiment with RCP and further our understanding of congestion control.
@inproceedings{dukkipati2007building,
title = {Building a RCP (Rate Control Protocol) Test Network},
author = {Dukkipati, Nandita and Gibb, Glen and McKeown, Nick and Zhu, Jiang},
booktitle = {Hot Interconnects},
year = {2007}
}
Rate Control Protocol with Acceleration Control (RCP-AC) is a congestion control protocol designed to minimize flow completion time. RCP-AC operates at the router, which assigns rates to flows based on aggregate feedback. Unlike traditional TCP-based approaches that optimize for throughput or fairness, RCP-AC explicitly targets flow completion time, which enables typical Internet flows to complete several times faster than with TCP.
@inproceedings{dukkipati2006rcp,
title={RCP-AC: Congestion Control to Make Flows Complete Quickly in Any Environment},
author={Dukkipati, Nandita and McKeown, Nick and Fraser, Alexander G.},
booktitle={IEEE INFOCOM High-Speed Networking Workshop},
year={2006}
}
We argue that flow completion time is the right metric for evaluating congestion control in datacenter networks. Most datacenter applications care about how quickly their flows complete, not about the throughput or fairness achieved in steady state. We show that optimizing for flow completion time leads to different design choices than optimizing for throughput or fairness, and present a framework for reasoning about the flow completion time performance of different congestion control schemes.
@article{dukkipati2006why,
title={Why Flow-Completion Time is the Right Metric for Congestion Control},
author={Dukkipati, Nandita and McKeown, Nick},
journal={ACM SIGCOMM Computer Communication Review},
volume={36},
number={1},
year={2006}
}
Shortest path routing protocols may suffer from congestion due to the use of a single shortest path between a source and a destination. The goal of our work is to first understand how links become overloaded in an IP backbone, and then to explore whether the routing protocol, either in its existing form or in some enhanced form, could be made to respond immediately to overload and reduce the likelihood of its occurrence. Our method is to use extensive measurements of Sprint's backbone network, measuring 138 links between September 2000 and June 2001. We find that since the backbone is designed to be overprovisioned, link overload is rare, and when it occurs, 80% of the time it is caused by link failures. Furthermore, we find that when a link is overloaded, few (if any) other links in the network are also overloaded.
@inproceedings{iyer2003approach,
title={An Approach to Alleviate Link Overload as Observed on an IP Backbone},
author={Iyer, Sundar and Bhattacharyya, Supratik and Taft, Nina and Diot, Christophe and McKeown, Nick},
booktitle={IEEE INFOCOM},
year={2003}
}
Feedback based congestion control — used by TCP and other transport protocols — causes the transmission rate of a long-lived flow to oscillate. This paper is about the tendency of multiplexed flows on a bottleneck link to oscillate causing “aggregate periodic behavior”. As a consequence, bottleneck links can become underutilized, wasting capacity. One simple way to prevent this is to use larger buffers in the bottleneck router. We explore how large these buffers need to be to absorb oscillations. Another proposed solution is to use a randomized active queue management algorithm, such as RED. A goal of RED is to reduce “global synchronization” of TCP flows, and hence avoid oscillations. Our results suggest that RED does not reduce aggregate periodic behavior when compared to Drop Tail. Perhaps surprisingly, our results and a simple analytical model suggest that the tendency for aggregate traffic to oscillate is robust against many forms of disturbances added to the timing of feedback and reaction, even if each constituent flow shows little or no sign of periodicity. Different flows can be in different states and yet conspire to make their sum have periodic behavior.
@inproceedings{gilbert2001congestion,
title = {Congestion Control and Periodic Behavior},
author = {Gilbert, Anna and Joo, Youngmi and McKeown, Nick},
booktitle = {IEEE LANMAN Workshop},
year = {2001}
}
Should applications receive special treatment from the network? And if so, who decides which applications are preferred? This discussion, known as net neutrality, goes beyond technology and is a hot political topic. In this paper we approach net neutrality from a user's perspective. Through user studies, we demonstrate that users do indeed want some services to receive preferential treatment; and their preferences have a heavy tail: a one-size-fits-all approach will not work. We prototype Boost, a mechanism that gives users the ability to prioritize their own traffic, putting the net neutrality decision in the hands of the end user.
@inproceedings{yiakoumis2016neutral,
title={Neutral Net Neutrality},
author={Yiakoumis, Yiannis and Katti, Sachin and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2016}
}
There is an active policy debate about who can control the Internet traffic that flows over the last-mile connection, often causing congestion and bad user experience. Policy-makers, ISPs, and content providers each want a say, but the user is rarely consulted. We argue that users should be able to decide how their traffic is treated. We present a system that allows home users to prioritize their own traffic, giving them direct control over their network experience.
@inproceedings{yiakoumis2012putting,
title={Putting Home Users in Charge of their Network},
author={Yiakoumis, Yiannis and Huang, Te-Yuan and Yap, Kok-Kiong and Katti, Sachin and Johari, Ramesh and McKeown, Nick},
booktitle={ACM HomeSys Workshop},
year={2012}
}
Mobile devices today have access to multiple network interfaces simultaneously, including WiFi, cellular, and Bluetooth. We present a system that schedules packets across available interfaces to optimize performance while respecting user preferences about cost, energy consumption, and quality. Users express preferences as simple policies, which the scheduler translates into optimal scheduling decisions. We implement and evaluate our system on Android devices.
@inproceedings{yap2013scheduling,
title={Scheduling Packets Over Multiple Interfaces While Respecting User Preferences},
author={Yap, Kok-Kiong and Huang, Te-Yuan and Yiakoumis, Yiannis and Chinchali, Sandeep and McKeown, Nick and Katti, Sachin},
booktitle={CoNEXT},
year={2013}
}
Current networking stacks were designed for a single wired network interface. Today, it is common for a mobile device to connect to many networks that come and go, and whose rates are constantly changing. Current practice is to pick one network, hand off to it, and use it until a better one comes along, leading to packet loss and delay during the handoff. We describe a late-binding architecture, in which the sender does not choose the interface ahead of time. Instead, we use a simple multi-path protocol that delays the binding of a packet to a particular path. We find that late-binding can reduce median handoff packet loss by 5-10x.
@inproceedings{yap2013late,
title={Late-binding: How to Lose Fewer Packets During Handoff},
author={Yap, Kok-Kiong and Huang, Te-Yuan and Yiakoumis, Yiannis and McKeown, Nick and Katti, Sachin},
booktitle={CellNet},
year={2013}
}
End-hosts are increasingly equipped with multiple network interfaces, ranging from smartphones with multiple radios to servers with multi-homing. These interfaces are diverse; some are expensive to use (e.g. 4G), some are free (e.g. WiFi), and they have different rates and reliability. On the other hand, end-hosts now run diverse applications with different priorities, from relatively less important web browsing to higher-priority VoIP and video calls. Finally, users may have policies that constrain interface use (e.g. use 4G only for high-priority flows). This paper tackles the question of how different applications can use different subsets of the available network interfaces, while ensuring a fair resource allocation among flows and satisfying policy constraints. We generalize prior classical work on generalized processor sharing (GPS) to the case of flows sharing different subsets of the available interfaces. We show a simple scheduling scheme for packet-by-packet GPS over multiple interfaces, and prove that it can provide bounded delay and rate guarantees.
@inproceedings{yap2012multiserver,
title = {Multi-Server Generalized Processor Sharing},
author = {Yap, Kok-Kiong and McKeown, Nick and Katti, Sachin},
booktitle = {International Teletraffic Congress (ITC)},
address = {Krak{\'o}w, Poland},
year = {2012}
}
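For background, the classical single-link, packet-by-packet approximation of GPS that this paper generalizes can be sketched as follows. This sketch uses self-clocked fair queueing, a common simplification of WFQ, over one interface; the function and variable names are ours, and extending fairness across several heterogeneous interfaces is exactly the multi-server problem the paper addresses.

```python
import heapq

def scfq_schedule(flows: dict[str, tuple[float, list[int]]]) -> list[str]:
    """flows: name -> (weight, packet lengths). Returns the dequeue order
    under self-clocked fair queueing on a single link."""
    v = 0.0                                   # system virtual time
    last_finish = {name: 0.0 for name in flows}
    heap = []                                 # (finish tag, flow name, packet index)
    for name, (w, pkts) in flows.items():
        if pkts:
            tag = max(last_finish[name], v) + pkts[0] / w
            heapq.heappush(heap, (tag, name, 0))
    order = []
    while heap:
        tag, name, i = heapq.heappop(heap)
        v = tag                               # self-clocking: V tracks the served tag
        last_finish[name] = tag
        order.append(name)
        w, pkts = flows[name]
        if i + 1 < len(pkts):
            nxt = max(last_finish[name], v) + pkts[i + 1] / w
            heapq.heappush(heap, (nxt, name, i + 1))
    return order

# Two flows with equal-size packets; 'a' has twice the weight of 'b',
# so it is served roughly twice as often.
print(scfq_schedule({"a": (2.0, [100] * 4), "b": (1.0, [100] * 2)}))
# -> ['a', 'a', 'b', 'a', 'a', 'b']
```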
Poor connectivity is common when we use wireless networks on the go. A natural way to tackle the problem is to take advantage of the multiple network interfaces on our mobile devices, and use all the networks around us simultaneously. We describe Hercules, a system that uses all available network interfaces to increase throughput and reliability for mobile devices. We implement Hercules in Android and evaluate it through extensive measurements. We find that Hercules can increase throughput by up to 2x compared to using a single interface, while also reducing the duration of connectivity gaps.
@inproceedings{yap2012making,
title={Making Use of All the Networks Around Us: A Case Study in Android},
author={Yap, Kok-Kiong and Huang, Te-Yuan and Yiakoumis, Yiannis and Kobayashi, Masayoshi and Katti, Sachin and Parulkar, Guru and McKeown, Nick},
booktitle={ACM CellNet Workshop},
year={2012}
}
Guest WiFi systems have become an integral part of our lives, yet they continue to be plagued by largely unresolved issues. In this technical report, we argue for the separation of authentication, access and accounting. To understand our proposal, we prototyped the OpenWiFi system. We describe OpenWiFi, its uses, implementation, and the numerous challenges we faced, along with trials we have undertaken to share wireless access more broadly.
@techreport{yap2011separating,
title = {Separating Authentication, Access and Accounting: A Case Study with OpenWiFi},
author = {Yap, Kok-Kiong and Yiakoumis, Yiannis and Kobayashi, Masayoshi and Katti, Sachin and Parulkar, Guru and McKeown, Nick},
institution = {Stanford University and NEC},
number = {OPENFLOW-TR-2011-1},
year = {2011}
}
In the past couple of years we have seen quite a change in the mobile industry, driven largely by user demand for new applications and services. But the underlying wireless infrastructure is slow to evolve. We argue that the network itself must become programmable and open to innovation, just as the Internet was opened up through software-defined networking. We present a blueprint for making wireless and mobile networks more open to experimentation, enabling rapid deployment of new services and protocols.
@inproceedings{chan2010blueprint,
title={Blueprint for Introducing Innovation into Wireless Mobile Networks},
author={Chan, Michael and Handigol, Nikhil and McKeown, Nick and Parulkar, Guru},
booktitle={ACM SIGCOMM VISA Workshop},
year={2010}
}
Despite findings that only 5.2% of the spectrum from 30 MHz to 3 GHz is utilized, there is much talk of an impending spectrum crisis. This is no contradiction: spectrum is inefficiently utilized, with some parts being heavily used while others are barely used. In this paper, we explore a simple high-level approach to the problem: enable user mobility across networks and exploit all the available capacity. By doing so, we can create a better mobile network by stitching together existing ones spanning multiple wireless technologies. We briefly outline our exploratory foray into radio-agnostic handover and discuss the various challenges ahead.
@inproceedings{yap2010delivering,
title = {Delivering Capacity for the Mobile Internet by Stitching Together Networks},
author = {Yap, Kok-Kiong and Katti, Sachin and Parulkar, Guru and McKeown, Nick},
booktitle = {ACM S3 Workshop (Wireless of the Students, by the Students, for the Students)},
year = {2010}
}
In an ideal world, all research papers would be runnable: simply click to replicate all results, using the same setup as the authors. Unfortunately, reproducibility in networking research is rare. We present a system for creating reproducible network experiments using lightweight container-based emulation. Our approach combines Mininet with version-controlled experiment specifications, making it possible to package, share, and re-run network experiments with minimal effort. We demonstrate the system by reproducing several published research results.
@inproceedings{handigol2012reproducible,
title={Reproducible Network Experiments using Container-Based Emulation},
author={Handigol, Nikhil and Heller, Brandon and Jeyakumar, Vimalkumar and Lantz, Bob and McKeown, Nick},
booktitle={CoNEXT},
year={2012}
}
Mininet is a system for rapidly prototyping large networks on the limited resources of a single computer. It creates virtual networks using lightweight OS-level virtualization, providing hosts, switches, controllers, and links that behave like their hardware counterparts. Mininet enables interactive development, testing, and debugging of Software-Defined Networks, and its networks can be easily transferred to hardware for line-rate operation. We describe Mininet's design and evaluate its fidelity and performance.
@inproceedings{lantz2010network,
title={A Network in a Laptop: Rapid Prototyping for Software-Defined Networks},
author={Lantz, Bob and Heller, Brandon and McKeown, Nick},
booktitle={ACM SIGCOMM HotNets},
year={2010}
}
Network experiments often require precise timing and repeatability, which are difficult to achieve in real networks. We describe a system for performing time-sensitive network experiments that enables researchers to control buffer sizes, measure queue occupancies, and replay traffic patterns with high precision. Our system is built on the NetFPGA platform and provides the accuracy needed for buffer sizing experiments.
@inproceedings{beheshti2008performing,
title = {Performing Time-Sensitive Network Experiments},
author = {Beheshti, Neda and Ganjali, Yashar and Ghobadi, Monia and McKeown, Nick and Naous, Jad and Salmon, Geoff},
booktitle = {ANCS},
year = {2008}
}
If optical routers are to become reality, we will need several new optical technologies, one of which is to build sufficiently large optical buffers. Building optical buffers for routers is daunting: today's electronic routers often hold millions of packets, which is well beyond the capabilities of optical technology. In this paper, we argue that two new results offer a solution. First, we show that the size of buffers in backbone routers can be made very small, just about 20 packets per line card, at the expense of a small loss in throughput. Second, we show that integrated delay line optical buffers can store a few dozen packets on a photonic chip. With the combination of these two results, we conclude that future Internet routers could use optical buffers.
@article{beheshti2010optical,
title = {Optical Packet Buffers for Backbone Internet Routers},
author = {Beheshti, Neda and Burmeister, Emily F. and Ganjali, Yashar and Bowers, John E. and Blumenthal, Daniel J. and McKeown, Nick},
journal = {IEEE/ACM Transactions on Networking},
volume = {18},
number = {5},
year = {2010}
}
We describe how integrated photonic technologies can enable low-power packet networking. As network speeds increase, the power consumption of electronic packet processing becomes a growing concern. We discuss how photonic integration can reduce power in key networking functions including buffering, switching, and signal processing, and present results from several prototype demonstrations.
@article{blumenthal2011integrated,
title = {Integrated Photonics for Low-Power Packet Networking},
author = {Blumenthal, Daniel J. and Barton, John and Beheshti, Neda and Bowers, John E. and Burmeister, Emily and Coldren, Larry A. and Dummer, Matt and Epps, Garry and Fang, Alexander and Ganjali, Yashar and Garcia, John and Koch, Brian and Lal, Vikrant and Lively, Erica and Mack, John and Masanovic, Milan and McKeown, Nick and Nguyen, Kim and Nicholes, Steven C. and Park, Hyundai and Stamenic, Biljana and Tauke-Pedretti, Anna and Poulsen, Henrik and Sysak, Matt},
journal = {IEEE Journal of Selected Topics in Quantum Electronics},
volume = {17},
number = {2},
year = {2011}
}
Internet routers have packet buffers which reduce packet loss during times of congestion. Sizing the router buffer correctly is important: if a router buffer is too small, it can cause high packet loss and link under-utilization. If a buffer is too large, packets may have to wait an unnecessarily long time in the buffer during congested periods, often up to hundreds of milliseconds. While an operator can reduce the operational size of a router buffer, the maximum size of a router buffer is decided by the router manufacturer, and the operator typically configures the router to use all the available buffers. Without clear guidance about how big a buffer needs to be, manufacturers tend to oversize buffers and operators tend to configure larger buffers than necessary, leading to increased cost and delay.
@article{spang2021updating,
title={Updating the Theory of Buffer Sizing},
author={Spang, Bruce and Arslan, Serhat and McKeown, Nick},
journal={Performance Evaluation},
volume={151},
year={2021}
}
The queueing delay faced by a packet is arguably the largest source of uncertainty during its journey. It therefore seems crucial that we understand how big the buffers should be in Internet routers. Our 2004 Sigcomm paper revisited the existing rule of thumb that a buffer should hold one bandwidth-delay product of packets. We claimed that for long-lived TCP flows, it could be reduced by √N, where N is the number of active flows, potentially reducing the required buffers by well over 90% in Internet backbone routers. One might reasonably expect that such a result, which supports cheaper routers with smaller buffers, would be embraced by the ISP community. In this paper we revisit the result 15 years later, and explain where it has succeeded and failed to affect how buffers are sized.
@article{mckeown2019sizing,
title={Sizing Router Buffers (Redux)},
author={McKeown, Nick and Appenzeller, Guido and Keslassy, Isaac},
journal={ACM SIGCOMM Computer Communication Review},
volume={49},
number={5},
year={2019}
}
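The square-root-of-N rule discussed above is easy to illustrate with back-of-the-envelope numbers (the link parameters below are ours, for illustration, not the paper's):

```python
# Compare the bandwidth-delay-product rule of thumb with the
# BDP/sqrt(N) result for long-lived TCP flows.
from math import sqrt

def bdp_buffer_bits(rtt_s: float, capacity_bps: float) -> float:
    """Classic rule of thumb: one bandwidth-delay product of buffering."""
    return rtt_s * capacity_bps

def small_buffer_bits(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """The 2004 Sigcomm result: BDP / sqrt(N) for N long-lived TCP flows."""
    return rtt_s * capacity_bps / sqrt(n_flows)

# A 10 Gb/s link with a 250 ms RTT and 10,000 active flows:
rtt, cap, n = 0.25, 10e9, 10_000
print(bdp_buffer_bits(rtt, cap) / 8e6)       # -> 312.5 MB, rule of thumb
print(small_buffer_bits(rtt, cap, n) / 8e6)  # -> 3.125 MB, BDP/sqrt(N)
```

With 10,000 flows the required buffer shrinks by a factor of 100, consistent with the "well over 90%" reduction claimed for backbone routers.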
All packet switches contain packet buffers to hold packets during times of congestion. The capacity of a high performance router is often dictated by the speed of its packet buffers. This is particularly true for a shared memory switch where the memory needs to operate at N times the line rate, where N is the number of ports in the system. Even input queued switches must be able to buffer packets at the rate at which they arrive. And so as link rates increase memory bandwidth requirements grow. With today’s DRAM technology and for an OC192c (10Gb/s) link, it is barely possible to write packets to (read packets from) memory at the rate at which they arrive (depart). As link rates increase, the problem will get harder. There are several techniques for building faster packet buffers, based on ideas from computer architecture such as memory interleaving and banking. While not directly applicable to packet switches, they form the basis of several techniques in use today. In this paper we consider one particular packet buffer architecture consisting of large, slow, low cost, DRAMs coupled with a small, fast SRAM “buffer”. We describe and analyze a memory management algorithm (ECQF-MMA) for replenishing the cache and find a bound on the size of the SRAM.
@inproceedings{iyer2001analysis,
title = {Analysis of a Memory Architecture for Fast Packet Buffers},
author = {Iyer, Sundar and Kompella, Ramana Rao and McKeown, Nick},
booktitle = {IEEE High Performance Switching and Routing (HPSR)},
pages = {368--373},
year = {2001}
}
Packet switches (e.g., IP routers, ATM switches and Ethernet switches) maintain statistics for a variety of reasons: performance monitoring, network management, security, network tracing, and traffic engineering. The statistics are usually collected by counters which might, for example, count the number of arrivals of a specific type of packet, or count particular events, such as when a packet is dropped. The arrival of a packet may lead to several different statistics counters being updated. The number of statistics counters and the rate at which they are updated is often limited by memory technology. A small number of counters may be held in on-chip registers or in (onor off-chip) SRAM. But often, the number of counters is very large, and hence they need to be stored in off-chip DRAM. However, the large random access times of DRAMs make it difficult to support high bandwidth links. The time taken to read, update and write a single counter would be too large, and worse still multiple counters may need to be updated for each arriving packet. In this paper we consider a specific architecture for storing and updating statistics counters. Smaller sized counters are maintained in fast (potentially on-chip) SRAM, while a large, slower DRAM maintains the full-sized counters. The problem is to ensure that the counter values are always correctly maintained at line-rate. We describe and analyze an optimal counter management algorithm (LCF-CMA), which minimizes the size of the SRAM required while ensuring correct line-rate operation of a large number of counters.
@inproceedings{shah2001analysis,
title = {Analysis of a Statistics Counter Architecture},
author = {Shah, Devavrat and Iyer, Sundar and Prabhakar, Balaji and McKeown, Nick},
booktitle = {Hot Interconnects},
note = {Also appeared as "Maintaining Statistics Counters in Router Line Cards," IEEE Micro, Jan.--Feb. 2002, pp. 76--81},
year = {2001}
}
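The SRAM/DRAM split described above can be sketched in a few lines. This toy model keeps small residues in "SRAM" and flushes the largest residue to the full-sized "DRAM" counter one at a time, in the spirit of the largest-counter-first policy; the class and method names are ours, and the real LCF-CMA analysis bounds the SRAM counter width, which this sketch does not attempt.

```python
class CounterCache:
    """Toy two-level statistics counter store: fast small residues
    backed by slow full-sized counters."""

    def __init__(self, n_counters: int):
        self.sram = [0] * n_counters      # small, fast residues
        self.dram = [0] * n_counters      # full-sized counters

    def increment(self, i: int):
        self.sram[i] += 1                 # per-packet work touches only SRAM

    def flush_largest(self):
        # One slow DRAM update per call: pick the largest SRAM residue
        # (largest-counter-first) and fold it into the full counter.
        i = max(range(len(self.sram)), key=self.sram.__getitem__)
        self.dram[i] += self.sram[i]
        self.sram[i] = 0

    def read(self, i: int) -> int:
        return self.dram[i] + self.sram[i]
```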
Shared memory switches provide the best possible throughput and delay characteristics, but building them at high speeds is challenging because the memory must operate at N times the line rate, where N is the number of ports. We describe techniques to build shared memory switches at high speeds while reducing the memory speed requirement, making practical high-performance shared memory switch implementations feasible.
@techreport{iyertechniques,
title = {Techniques for Fast Shared Memory Switches},
author = {Iyer, Sundar and McKeown, Nick},
institution = {Stanford University High Performance Networking Group},
number = {TR01-HPNG-081501},
year = {2001}
}
During the past four years, several papers have proposed rules for sizing buffers in Internet core routers. We describe a set of experiments conducted on a commercial router to measure the relationship between buffer size and link utilization, queueing delay, and drop rate. Our results confirm the theoretical predictions that buffers can be significantly smaller than the bandwidth-delay product, and we characterize the trade-offs between buffer size, link utilization, and packet loss.
@inproceedings{beheshti2008experimental,
title={Experimental Study of Router Buffer Sizing},
author={Beheshti, Neda and Ganjali, Yashar and Ghobadi, Monia and McKeown, Nick and Salmon, Geoff},
booktitle={ACM IMC},
year={2008}
}
We have deployed a network of fully programmable routers in the Internet2 backbone. Each router, built on the NetFPGA platform [1], can process four ports of line-rate Gigabit Ethernet traffic. The routers are interconnected by a full mesh of dedicated links over the Internet2 backbone. Our goal is to demonstrate how such a network of routers can enable real-time experiments that are not possible with commercial routers. In particular, we will demonstrate the tradeoff between a router’s buffer size and its forwarding performance, as seen by both the end-user and the network operator. Attendees will be able to interactively set the buffer size while monitoring the real-time occupancy of the buffers, drop rate, and throughput with a resolution of 8ns.
@inproceedings{beheshti2008demo,
title = {Demo: Experimenting with Programmable Routers in Real Networks},
author = {Beheshti, N. and Underhill, D. and Heller, B. and Bolouki, S. and McKeown, N. and Ganjali, Y.},
booktitle = {ACM SIGCOMM (demo)},
note = {2nd-Best Demo Award},
year = {2008}
}
Internet routers need packet buffers to absorb bursts of traffic and avoid packet loss during congestion. On router linecards, these buffers must be arranged to serve multiple logical queues while meeting strict timing constraints. We analyze the design of packet buffers for router linecards, considering the trade-offs between memory technology, buffer organization, and performance. We describe practical designs that achieve high throughput with reasonable memory costs.
@article{iyer2008designing,
title={Designing Packet Buffers for Router Linecards},
author={Iyer, Sundar and Kompella, Ramana and McKeown, Nick},
journal={IEEE/ACM Transactions on Networking},
volume={16},
number={3},
year={2008}
}
Recent advances in optical technology show the possibility of building all-optical buffers in the near future. These buffers are usually composed of a number of fiber delay lines (FDLs) and optical switches. Incoming packets are stored for a limited time by going through optical delay lines. Optical switches transfer these packets among different delay lines, or send them towards the output line if a packet is to leave the system. As a direct consequence of using optical technology, one of the major constraints in this setting is that the size of the switch needs to be small. In this paper, we show the feasibility of constructing a FIFO queue of size N by using only O(log N) 2x2 switches. A simple scheduling algorithm that achieves this bound is developed. The proposed structure provides an efficient way of storing optical packets using a minimal number of delay lines and switches.
@inproceedings{beheshti2007packet,
title = {Packet Scheduling in Optical FIFO Buffers},
author = {Beheshti, N. and Ganjali, Y. and McKeown, N.},
booktitle = {IEEE INFOCOM High-Speed Networking Workshop},
year = {2007}
}
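The O(log N) result above rests on a counting argument that is easy to illustrate: any required delay d < N can be composed from delay lines of lengths 1, 2, 4, ..., i.e. from the binary representation of d. The sketch below (ours) shows only this composition; the paper's contribution is the 2x2-switch construction and the scheduling algorithm around it.

```python
def lines_for_delay(d: int) -> list[int]:
    """Delay-line lengths (powers of two) whose sum is exactly d packet-times."""
    lines, power = [], 1
    while d:
        if d & 1:              # this bit of d is set...
            lines.append(power)  # ...so route the packet through this line
        d >>= 1
        power <<= 1
    return lines

print(lines_for_delay(13))   # -> [1, 4, 8]: 13 packet-times from 3 lines
```

A FIFO of size N = 2^k thus needs only k delay lines, one per bit, rather than N storage slots.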
Recent theoretical results in buffer sizing research suggest that core Internet routers can achieve high link utilization, if they are capable of storing only a handful of packets. The underlying assumption is that the traffic is non-bursty, and that the system is operated below 85-90% utilization. In this paper, we present a test-bed for buffer sizing experiments using NetFPGA [2], a PCI-form factor board that contains reprogrammable FPGA elements, and four Gigabit Ethernet interfaces. We have designed and implemented a NetFPGA-based Ethernet switch with finely tunable buffer sizes, and an event capturing system to monitor buffer occupancies inside the switch. We show that reducing buffer sizes down to 20-50 packets does not necessarily degrade system performance.
@inproceedings{beheshti2007experimenting,
title = {Experimenting with Buffer Sizing in Routers},
author = {Neda Beheshti and Yashar Ganjali and Jad Naous and Nick McKeown},
booktitle = {ANCS, Orlando, FL},
year = {2007}
}
In this paper we explore whether a general-topology network built up of routers with very small buffers can maintain high throughput under TCP's congestion control mechanism. Recent results on buffer sizing challenged the widely used assumption that routers should buffer millions of packets. These new results suggest that when smooth TCP traffic goes through a single tiny buffer of size O(log W), then close-to-peak throughput can be achieved; W is the maximum window size of TCP flows. In this work, we want to know if a network of many routers can perform well when all buffers in the network are made very small, independent of the structure of the network. This scenario represents a real network where packets go through several buffering stages on their routes. Assuming the ingress TCP traffic to a network is paced, we first prove that all routers can get by with very small buffers if the network has a tree structure. For networks with general topology, we propose a simple active queue management policy called Bounded Jitter Policy (BJP), and show that under the proposed policy each flow will preserve its smooth pattern across the network. Logarithmic-size buffers would therefore be enough in every router of the network.
@inproceedings{beheshti2008obtaining,
title = {Obtaining High Throughput Networks with Tiny Buffers},
author = {Neda Beheshti and Yashar Ganjali and Ashish Goel and Nick McKeown},
booktitle = {16th International Workshop on Quality of Service (IWQoS), Enschede, The Netherlands},
year = {2008}
}
In the past two years, several papers have proposed rules that suggest two to five orders of magnitude reduction in Internet core router buffers. Others present scenarios where buffer sizes need to be significantly larger. Depending on the rule, a 40 Gb/s router might need a few bytes, or a few gigabytes of buffer. The choice of buffer size affects the design and cost of routers, network congestion, quality of service, and the fairness and stability of TCP. In this paper we discuss the state of the debate on buffer sizing, describe the areas of agreement, and try to identify areas where more work is needed.
@article{ganjali2006update,
title={Update on Buffer Sizing in Internet Routers},
author={Ganjali, Yashar and McKeown, Nick},
journal={ACM SIGCOMM Computer Communication Review},
volume={36},
number={5},
year={2006}
}
Internet routers require buffers to hold packets during times of congestion. The buffers need to be fast, and so ideally they should be small enough to use fast memory technologies such as SRAM or all-optical packet buffers. In this paper, we present results that suggest routers need only a very small amount of buffering. Specifically, we show that if the number of flows through the bottleneck link is large enough, the amount of buffering can be reduced by a factor equal to the square root of the number of flows, from the commonly used rule-of-thumb of B = RTT x C, to B = (RTT x C) / sqrt(n), where B is the buffer size, C is the link capacity, and n is the number of flows.
@inproceedings{enachescu2006routers,
title={Routers with Very Small Buffers},
author={Enachescu, Mihaela and Ganjali, Yashar and Goel, Ashish and McKeown, Nick and Roughgarden, Tim},
booktitle={IEEE INFOCOM},
year={2006}
}
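The square-root reduction quoted in the abstract above is easy to sanity-check numerically. The sketch below is ours, with illustrative link parameters rather than values from the paper:

```python
from math import sqrt

def buffer_bits(rtt_s, capacity_bps, n_flows=1):
    """Buffer size in bits: the classic rule B = RTT * C,
    reduced by sqrt(n) when n flows share the bottleneck."""
    return rtt_s * capacity_bps / sqrt(n_flows)

# Illustrative example: 250 ms RTT, 10 Gb/s link, 10,000 concurrent flows.
rule_of_thumb = buffer_bits(0.25, 10e9)           # B = RTT * C
small_buffer = buffer_bits(0.25, 10e9, 10_000)    # B = RTT * C / sqrt(n)
print(rule_of_thumb / small_buffer)               # reduction factor: sqrt(10000) = 100
```

With these (assumed) numbers the buffer shrinks from 2.5 Gbit to 25 Mbit, a factor of 100, which is the point of the paper's result.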
High-end routers need to store a large amount of data. Dynamic random access memories (DRAMs) are typically used for this purpose. However, DRAM devices don't match the bandwidth requirements, especially in terms of random access speeds. In this paper, we analyze a generalized memory interleaving scheme that implements memory using multiple, slower DRAMs. In the presence of a small amount of speed-up, we show that reasonable statistical guarantees (i.e., low drop probabilities) can be provided by using small SRAM buffers that queue read/write requests to the DRAMs. We then relate drop probabilities to SRAM buffer size for a wide range of statistical arrival patterns.
@inproceedings{shrimali2005building,
title = {Building Packet Buffers with Interleaved Memories},
author = {Gireesh Shrimali and Nick McKeown},
booktitle = {Workshop on High Performance Switching and Routing, Hong Kong},
year = {2005}
}
In this article we describe recent work on buffer sizing for core Internet routers. This work suggests that the widely used rule of thumb leads to buffers which are much larger than they need to be. For example, the buffer in a backbone router could be reduced from 1,000,000 packets to 10,000 without loss in performance. It could be reduced even further, perhaps to 10-20 packets, at the cost of a small amount of bandwidth utilization. This tradeoff is worth considering, for example for a possible future all-optical router.
@article{wischik2005part,
title = {Part I: Buffer Sizes for Core Routers},
author = {Damon Wischik and Nick McKeown},
journal = {ACM SIGCOMM Computer Communication Review},
volume = {35},
number = {3},
year = {2005}
}
@inproceedings{beheshti2006buffer,
title = {Buffer sizing in all-optical packet switches},
author = {Neda Beheshti and Yashar Ganjali and Ramesh Rajaduray and Daniel Blumenthal and Nick McKeown},
booktitle = {OFC/NFOEC, Anaheim, CA},
year = {2006}
}
All Internet routers contain buffers to hold packets during times of congestion. Today, the size of these buffers is determined by the rule-of-thumb that each link needs a buffer of size equal to the bandwidth-delay product. The widespread use of this rule has led to backbone routers with millions of packets of buffering. We question whether this much buffering is actually needed. Through a combination of analysis and simulation, we argue that buffers can be much smaller than the bandwidth-delay product, reducing router cost and latency.
@inproceedings{appenzeller2004sizing,
title={Sizing Router Buffers},
author={Appenzeller, Guido and Keslassy, Isaac and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2004}
}
Packet buffers are an essential part of routers. In high-end routers these buffers need to store a large amount of data at very high speeds. To satisfy these requirements, we need a memory with the speed of SRAM and the density of DRAM. A typical solution is to use hybrid packet buffers built from a combination of SRAM and DRAM, where the SRAM holds the heads and tails of per-flow packet FIFOs and the DRAM is used for bulk storage. The main challenge then is to minimize the size of the SRAM while providing reasonable performance guarantees. In this paper, we analyze a commonly used hybrid architecture from a statistical perspective, and ask the following question: if the packet buffer designer is willing to tolerate a certain drop probability, then how small can the SRAM get? To do so, we introduce an analytical model for representing the SRAM buffer occupancy, and derive drop probabilities as a function of SRAM size under a wide range of statistical traffic patterns. As a consequence of our analysis we show that, for low drop probability, the required SRAM size is proportional to the number of flows.
@inproceedings{shrimali2004designing,
title = {Designing Packet Buffers with Statistical Guarantees},
author = {Gireesh Shrimali and Isaac Keslassy and Nick McKeown},
booktitle = {Hot Interconnects, Stanford, CA},
year = {2004}
}
Memory bandwidth is frequently a limiting factor in the design of high-speed switches and routers. In this paper, we introduce a buffering scheme called ping-pong buffering, that increases memory bandwidth utilization by a factor of two, while adding less than one percent to total memory requirements. In particular, we show how ping-pong buffering can be used to double the throughput of a shared-memory switch, or reduce the memory speed requirement of an output-queued switch by a factor of two.
@inproceedings{joo1998doubling,
title={Doubling Memory Bandwidths for Network Buffers},
author={Joo, Youngmi and McKeown, Nick},
booktitle={IEEE INFOCOM},
year={1998}
}
Network operators connect their backbone networks together at peering points. It is well known that the peering points are the most congested parts of the backbone network. Network operators have little incentive to provision them well, and have few tools to decide how best to route traffic over them. In this paper we propose how peering networks can be congestion free, so long as we know the total amount of traffic between them. In particular, we propose the use of Valiant Load-Balancing (VLB), which has been previously studied for individual backbone networks. In our approach, the backbone networks do not need to use VLB internally; they simply load-balance traffic over their peering links. Our analysis shows how the load-balancing should be done, and we conclude that no other method is more efficient than VLB in achieving a congestion-free network.
@inproceedings{zhang-shen2008guaranteeing,
title = {Guaranteeing Quality of Service to Peering Traffic},
author = {Rui Zhang-Shen and Nick McKeown},
booktitle = {IEEE INFOCOM},
year = {2008}
}
Commercial backbone networks must continue to operate even when links and routers fail. Routing schemes such as OSPF, IS-IS, and MPLS reroute traffic, but they cannot guarantee that the resulting network will be congestion-free. As a result, backbone networks are grossly over-provisioned, sometimes running at a utilization below 10%, so they can remain uncongested under failure. Yet even with such large over-provisioning, they still cannot guarantee to be uncongested, sometimes even with just a single failure. With our proposed approach, a network can be designed to tolerate an almost arbitrary number of failures, and guarantee no congestion, usually with an extremely small amount of over-provisioning. In a typical case, a 50 node network can continue to run congestion-free when any 5 links or routers fail, with only 10% over-provisioning. The key to the approach is Valiant Load-Balancing (VLB). VLB's path diversity allows it to tolerate k arbitrary failures in an N-node network, with an over-provisioning ratio of approximately k/N.
@inproceedings{zhang-shen2008designing,
title = {Designing a Fault-Tolerant Network Using Valiant Load-Balancing},
author = {Rui Zhang-Shen and Nick McKeown},
booktitle = {IEEE INFOCOM},
year = {2008}
}
Routers built around a single-stage crossbar and a centralized scheduler do not scale, and (in practice) do not provide the throughput guarantees that network operators need to make efficient use of their network. We describe a scalable router architecture based on a three-stage Clos network topology, using an optical switch in the middle stage. The architecture allows the router to scale to very high aggregate bandwidths while maintaining deterministic performance guarantees. We address two main questions: what performance guarantees can a three-stage Clos-network router provide, and how optical switches can be used to build a practical implementation.
@inproceedings{keslassy2003scaling,
title={Scaling Internet Routers Using Optics},
author={Keslassy, Isaac and Chuang, Shang-Tse and Yu, Kyoungsik and Miller, David and Horowitz, Mark and Solgaard, Olav and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2003}
}
A parallel packet switch (PPS) is a switch in which the memories run slower than the line rate. Arriving packets are load-balanced packet-by-packet over multiple lower speed center stage packet switches. It is known that, for unicast traffic, a PPS can precisely emulate a FCFS output-queued (OQ) switch with a speedup of two and an OQ switch with delay guarantees with a speedup of three. In this paper we ask: is it possible for a PPS to emulate the behavior of an OQ multicast switch? The main result is that for multicast traffic an N-port PPS can precisely emulate a FIFO OQ switch with a speedup of S > 2√(N+1), and a switch that provides delay guarantees with a speedup of S > 2√(2N+2).
@article{iyer2001speedup,
title = {On the Speedup Required for a Multicast Parallel Packet Switch},
author = {Sundar Iyer and Nick McKeown},
journal = {IEEE Communications Letters},
volume = {5},
number = {6},
year = {2001}
}
We present a load-balanced switch architecture that can scale to an arbitrary number of linecards. The architecture uses a two-stage switching fabric that first load-balances traffic across the fabric, then delivers it to the correct output. We show that this architecture can provide 100% throughput without centralized scheduling, making it suitable for building very large routers.
@techreport{keslassyloadbalanced,
title = {A Load-Balanced Switch with an Arbitrary Number of Linecards},
author = {Isaac Keslassy and Shang-Tse Chuang and Nick McKeown},
institution = {Stanford University},
number = {TR03-HPNG-080102},
year = {}
}
Our work is motivated by the desire to design packet switches with large aggregate capacity and fast line rates. In this paper, we consider building a packet switch from multiple lower speed packet switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed over identical lower speed packet switches, switched to the correct output port, then recombined (multiplexed) before departing from the system. Essentially, the packet switch performs packet-by-packet load balancing, or inverse multiplexing, over multiple independent packet switches. Each lower speed packet switch operates at a fraction of the line rate. It is a goal of our work that all memory buffers in the PPS run slower than the line rate. Ideally, a PPS would share the benefits of an output-queued switch.
@article{iyer2003analysis,
title = {Analysis of the Parallel Packet Switch Architecture},
author = {Sundar Iyer and Nick McKeown},
journal = {IEEE/ACM Transactions on Networking},
pages = {314--324},
year = {2003}
}
Most high performance routers today use combined input and output queueing (CIOQ). The CIOQ router is also frequently used as an abstract model for routers: at one extreme is input queueing, at the other extreme is output queueing, and in between are CIOQ routers with various amounts of speedup. It is well known that a CIOQ router with a speedup of two can precisely emulate an output-queued router. In this paper, we ask whether it is possible to achieve similar performance guarantees with a CIOQ router that has a speedup of just one; i.e., with a router that has only a single stage of buffering. We find that it is possible, using a technique called buffered crossbar switching.
@inproceedings{iyer2002routers,
title={Routers with a Single Stage of Buffering},
author={Iyer, Sundar and Zhang, Rui and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={2002}
}
High performance packet switches frequently use a centralized scheduler (also known as an arbiter) to determine the configuration of a non-blocking crossbar. The scheduler often limits the scalability of the system because of the frequency and complexity of its decisions. A recent paper by C.-S. Chang et al. introduces an interesting two-stage switch, in which each stage uses a trivial deterministic sequence of configurations. The switch is simple to implement at high speed and has been proved to provide 100% throughput for a broad class of traffic. Furthermore, there is a bound between the average delay of the two-stage switch and that of an ideal output-queued switch. However, in its simplest form, the switch mis-sequences packets by an arbitrary amount. In this paper, building on the two-stage switch, we present an algorithm called Full Frames First (FFF), that prevents mis-sequencing while maintaining the performance benefits (in terms of throughput and delay) of the basic two-stage switch. FFF comes at some additional cost, which we evaluate in this paper.
@inproceedings{keslassy2002maintaining,
title = {Maintaining Packet Order in Two-Stage Switches},
author = {Isaac Keslassy and Nick McKeown},
booktitle = {IEEE INFOCOM, New York, NY},
year = {2002}
}
This paper is about load-balancing packets across multiple paths inside a switch, or across a network. It is motivated by the recent interest in load-balanced switches. Load-balanced switches provide an appealing alternative to crossbars with centralized schedulers. A load-balanced switch has no scheduler, is particularly amenable to optics, and, most relevant here, guarantees 100% throughput. A uniform mesh is used to load-balance packets uniformly across all 2-hop paths in the switch. In this paper we explore whether this particular method of load-balancing is optimal in the sense that it achieves the highest throughput for a given capacity of interconnect. The method we use allows the load-balanced switch to be compared with ring, torus and hypercube interconnects, too. We prove that for a given interconnect capacity, the load-balancing mesh has the maximum throughput. Perhaps surprisingly, we find that the best mesh is slightly non-uniform, or biased, and has a throughput of N/(2N-1), where N is the number of nodes.
@inproceedings{keslassy2005optimal,
title={Optimal Load-Balancing},
author={Keslassy, Isaac and Chang, Cheng-Shang and McKeown, Nick and Lee, Duan-Shin},
booktitle={IEEE INFOCOM},
year={2005}
}
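The closed-form throughput N/(2N-1) quoted above can be checked with a few lines; the sketch and its function name are ours:

```python
def biased_mesh_throughput(n):
    """Throughput of the optimal (slightly biased) load-balancing mesh,
    N/(2N-1), where n is the number of nodes."""
    return n / (2 * n - 1)

# The guarantee starts at 1 for a single node and decreases toward 1/2
# as the number of nodes grows.
print(biased_mesh_throughput(2))    # 2/3
print(biased_mesh_throughput(64))   # slightly above 0.5
```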
Designing a backbone network is hard. On one hand, users expect the network to have very high availability, little or no congestion, and hence little or no queueing delay. On the other hand, traffic conditions change frequently, making it hard to predict future traffic demands. Network operators try to build enough capacity for the worst case, but this can be very expensive. In this paper, we describe an approach to designing backbone networks that provide predictable performance in the face of unpredictable demands. The approach is based on the idea of designing the network so that its performance degrades gracefully under a wide range of traffic conditions, and in particular under link failures.
@inproceedings{zhangshen2004designing,
title={Designing a Predictable Internet Backbone Network},
author={Zhang-Shen, Rui and McKeown, Nick},
booktitle={HotNets III},
year={2004}
}
We describe recent developments in how optics can be used inside Internet routers to scale capacity and reduce power. At Stanford University we are designing a 100 Tb/s Internet router with an optical switch fabric that guarantees 100% throughput for all traffic. We discuss how optical switches, combined with load-balancing techniques, can overcome the scaling limitations of electronic crossbar switches.
@inproceedings{mckeown2003optics,
title = {Optics inside Routers},
author = {Nick McKeown},
booktitle = {ECOC 2003, Rimini, Italy},
year = {2003}
}
We present a load-balanced switch architecture that can scale to an arbitrary number of linecards. The architecture uses a two-stage switching fabric that first load-balances traffic across the fabric, then delivers it to the correct output. We show that this architecture can provide 100% throughput without centralized scheduling, making it suitable for building very large routers.
@inproceedings{keslassy2004loadbalanced,
title = {A Load-Balanced Switch with an Arbitrary Number of Linecards},
author = {Isaac Keslassy and Shang-Tse Chuang and Nick McKeown},
booktitle = {IEEE INFOCOM, Hong Kong},
year = {2004}
}
The load-balanced switch architecture is a promising way to scale router capacity. We explained in [1] how it can be used to build a 100Tb/s router with no centralized scheduler, no memory operating faster than the line-rate, no packet mis-sequencing, a 100% throughput guarantee for all traffic patterns, and an optical switch fabric that simply spreads traffic evenly among linecards. This switch fabric uses optical MEMS switches that are reconfigured only when linecards are added and deleted, allowing the router to function when any subset of linecards is present and working. In [2] we introduced a configuration algorithm that will find a correct configuration of the MEMS switches in polynomial time. However, we found that our algorithm takes over 50 seconds to run in software for a 100Tb/s router. Our goal is to restore the router to operation within 50ms upon failure. So we modify our algorithm for implementation in dedicated hardware. In particular, to simplify the Ford-Fulkerson algorithm in bipartite matching, we reduce memory accesses and use bit manipulation schemes. Then, we decompose permutations using the Slepian-Duguid algorithm and reduce the configuration time with a simplified memory scheme. Our experimental results show that it is possible to achieve the 50ms target.
@inproceedings{arekapudi2004configuring,
title = {Configuring a Load-Balanced Switch in Hardware},
author = {Srikanth Arekapudi and Shang-Tse Chuang and Isaac Keslassy and Nick McKeown},
booktitle = {Hot Interconnects 12, Stanford, CA},
year = {2004}
}
A parallel packet switch (PPS) is a switch in which the memories run slower than the line rate. Arriving packets are spread (or load-balanced) packet-by-packet over multiple slower-speed packet switches. It is already known that with a speedup of S >= 2, a PPS can theoretically mimic a FCFS output-queued (OQ) switch. However, the theory relies on a centralized packet scheduling algorithm that is essentially impractical because of high communication complexity. In this paper, we attempt to make a high performance PPS practical by introducing two results. First, we show that small co-ordination buffers can eliminate the need for a centralized packet scheduling algorithm, allowing a fully distributed implementation with low computational and communication complexity. Second, we show that without speedup, the resulting PPS can mimic an FCFS OQ switch within a delay bound.
Our work is motivated by the desire to build a very high speed packet switch with extremely high line rates. In this paper, we consider building a packet switch from multiple, lower speed packet switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed over k identical, lower speed packet switches, switched to the correct output port, then recombined (multiplexed) before departing from the system. Essentially, the packet switch performs packet-by-packet load balancing, or "inverse multiplexing", over multiple independent packet switches. Each lower speed packet switch operates at a fraction of the line rate R; for example, if each packet switch operates at rate R/k, no memory buffers are required to operate at the full line rate of the system. Ideally, a PPS would share the benefits of an output-queued switch; i.e., the delay of individual packets could be precisely controlled, allowing the provision of guaranteed qualities of service. In this paper, we ask the question: is it possible for a PPS to precisely emulate the behavior of an output-queued packet switch with the same capacity and with the same number of ports? The main result of this paper is that it is theoretically possible for a PPS to emulate a FCFS output-queued packet switch if each layer operates at a rate of approximately 2R/k. This simple result is analogous to Clos' theorem for a three-stage circuit switch to be strictly non-blocking. We further show that the PPS can emulate any QoS queueing discipline if each layer operates at a rate of approximately 3R/k.
@inproceedings{iyer2000analysis,
title = {Analysis of a Packet Switch with Memories Running Slower than the Line Rate},
author = {Sundar Iyer, Amr A. Awadallah, and Nick McKeown},
booktitle = {IEEE INFOCOM, Tel Aviv, Israel},
pages = {529--537},
year = {2000}
}
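The layer-rate conditions from the abstract above (approximately 2R/k for FCFS emulation, 3R/k for QoS emulation) can be captured in a tiny helper; the function name and example numbers are ours:

```python
def pps_layer_rate(line_rate_bps, k, qos=False):
    """Per-layer rate sufficient for a k-layer PPS to emulate an OQ switch:
    ~2R/k for FCFS emulation, ~3R/k when QoS (delay) guarantees are needed."""
    return (3 if qos else 2) * line_rate_bps / k

# Illustrative: a 40 Gb/s line split over k = 8 slower switches.
fcfs = pps_layer_rate(40e9, 8)            # each layer runs at 10 Gb/s
qos = pps_layer_rate(40e9, 8, qos=True)   # each layer runs at 15 Gb/s
print(fcfs, qos)
```

The point of the result is visible here: each layer's memory runs at a small fraction of the 40 Gb/s line rate, at the cost of a constant-factor speedup.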
Input-queued packet switches use a matching algorithm to configure a nonblocking switch fabric (e.g., a crossbar). Ideally, the matching algorithm will guarantee 100% throughput for a broad class of traffic, so long as the switch is not oversubscribed. An intuitive choice is the maximum size matching (MSM) algorithm, which maximizes the instantaneous throughput. It was shown by McKeown et al. in 1999 that with MSM the throughput can be less than 100% when N = 3, even with benign Bernoulli i.i.d. arrivals. In this letter, we extend this result to any N >= 2, and hence show it to be true for switches of any size.
@techreport{keslassymaximum,
title = {Maximum Size Matching is Unstable for Any Packet Switch},
author = {Isaac Keslassy and Rui Zhang-Shen and Nick McKeown},
institution = {Stanford University},
number = {TR03-HPNG-030100},
year = {}
}
This paper is about high capacity switches and routers that give throughput, rate and delay guarantees. Many routers are built using input queueing or combined input and output queueing (CIOQ), using crossbar switching fabrics. But such routers require impractically complex scheduling algorithms to provide the desired guarantees. We explore how a buffered crossbar, a crossbar switch with a packet buffer at each crosspoint, can provide guaranteed performance (throughput, rate, and delay) with less complex, practical scheduling algorithms. We describe scheduling algorithms that operate in parallel on each input and output port, and hence are scalable. With these algorithms, buffered crossbars with a speedup of two can provide 100% throughput, rate, and delay guarantees.
@techreport{chuangpractical,
title = {Practical Algorithms for Performance Guarantees in Buffered Crossbars},
author = {Shang-Tse Chuang and Sundar Iyer and Nick McKeown},
institution = {Stanford University},
number = {TR03-HPNG-061501},
year = {}
}
Internet routers frequently use a crossbar switch to interconnect linecards. The crossbar switch is scheduled using an algorithm that picks a new crossbar configuration every cycle. Several scheduling algorithms have been shown to guarantee 100% throughput under a variety of traffic patterns. The first such algorithm was the maximum weight matching (MWM) algorithm, in which the weight is the sum of the occupancies of the queues. We explore whether alternative weight functions, such as the sum of the squares of the occupancies, lead to stronger or weaker stability. The first result of this paper is that a broad class of weight functions gives rise to strong stability, including the sum of the squares, the sum of the cubes, and so on. A counter-intuitive result, indicating a limitation of the Lyapunov method, is that the sum of the square roots of the occupancies is not included in this class, even though simulation suggests that the resulting average delay is lower than for the other functions. We also consider the simpler randomized scheduling algorithm (TASS) proposed by Tassiulas. We show similar results for different weight functions as for MWM. We finally show that TASS gives 100% throughput when the weights are noisy, or out-of-date.
@inproceedings{keslassy2001analysis,
title = {Analysis of Scheduling Algorithms That Provide 100% Throughput in Input-Queued Switches},
author = {Isaac Keslassy and Nick McKeown},
booktitle = {39th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL},
year = {2001}
}
Simulation results suggest that a maximum size matching (MSM) algorithm will lead to 100% throughput for uniform Bernoulli i.i.d. arrival traffic. In this paper we explore the throughput of MSM algorithms under Bernoulli arrivals. We show that when the arrival traffic is admissible and cells are scheduled in batches, a sub-class of MSMs called Critical Maximum Size Matching achieves 100% throughput under (non-uniform) Bernoulli i.i.d. arrivals. Further, we show that, with batch scheduling, all MSMs achieve 100% throughput under Bernoulli i.i.d. uniform load.
@inproceedings{iyermaximum,
title = {Maximum Size Matching and Input Queued Switches},
author = {Sundar Iyer and Nick McKeown},
booktitle = {40th Annual Allerton Conference on Communication, Control and Computing},
year = {2002}
}
We recently proposed Constraint Sets as a simple technique to analyze routers with a single stage of buffering. In this letter, we extend the technique to analyze combined input and output queued (CIOQ) routers with two stages of buffering.
@article{iyer2003using,
title = {Using Constraint Sets to Achieve Delay Bounds in CIOQ Switches},
author = {Sundar Iyer and Nick McKeown},
journal = {IEEE Communications Letters},
pages = {275--277},
year = {2003}
}
Input-queued packet switches use a matching algorithm to configure a nonblocking switch fabric (e.g., a crossbar). Ideally, the matching algorithm will guarantee 100% throughput for a broad class of traffic, so long as the switch is not oversubscribed. An intuitive choice is the maximum size matching (MSM) algorithm, which maximizes the instantaneous throughput. It was shown by McKeown et al. in 1999 that with MSM the throughput can be less than 100% when N = 3, even with benign Bernoulli i.i.d. arrivals. In this letter, we extend this result to any N >= 2, and hence show it to be true for switches of any size.
@article{keslassy2003maximum,
title = {Maximum Size Matching is Unstable for Any Packet Switch},
author = {Isaac Keslassy and Rui Zhang-Shen and Nick McKeown},
journal = {IEEE Communications Letters},
volume = {7},
number = {10},
pages = {496--498},
year = {2003}
}
The throughput of an input-queued crossbar switch, with a single FIFO queue at each input, is limited to 2-√2 (approximately 58.6%) for uniformly distributed, Bernoulli i.i.d. arrivals of fixed length packets. In this letter we prove that if the crossbar switch can buffer one packet at each crosspoint, then the throughput increases to 100% asymptotically as N grows, where N is the number of switch ports.
@article{lin2004throughput,
title = {The Throughput of a Buffered Crossbar Switch},
author = {Mingjie Lin and Nick McKeown},
journal = {IEEE Communications Letters},
year = {2004}
}
This paper is about high capacity switches and routers that give throughput, rate and delay guarantees. Many routers are built using input queueing or combined input and output queueing (CIOQ), using crossbar switching fabrics. But such routers require impractically complex scheduling algorithms to provide the desired guarantees. We explore how a buffered crossbar, a crossbar switch with a packet buffer at each crosspoint, can provide guaranteed performance (throughput, rate, and delay) with less complex, practical scheduling algorithms. We describe scheduling algorithms that operate in parallel on each input and output port, and hence are scalable. With these algorithms, buffered crossbars with a speedup of two can provide 100% throughput, rate, and delay guarantees.
@inproceedings{chuang2005practical,
title = {Practical Algorithms for Performance Guarantees in Buffered Crossbars},
author = {Shang-Tse Chuang and Sundar Iyer and Nick McKeown},
booktitle = {IEEE INFOCOM, Miami, FL},
year = {2005}
}
It is well known that head-of-line (HOL) blocking limits the throughput of an input-queued switch with FIFO queues. Under certain conditions, the throughput can be shown to be limited to approximately 58%. It is also known that if non-FIFO queueing policies are used, the throughput can be increased. However, it has not been previously shown that if a suitable queueing policy and scheduling algorithm are used then it is possible to achieve 100% throughput for all independent arrival processes. In this paper we prove this to be the case using a simple linear programming argument and quadratic Lyapunov function. In particular, we assume that each input maintains a separate FIFO queue for each output and that the switch is scheduled using a maximum weight bipartite matching algorithm.
@article{mckeown1999achieving,
title={Achieving 100\% Throughput in an Input-Queued Switch},
author={McKeown, Nick and Mekkittikul, Adisak and Anantharam, Venkat and Walrand, Jean},
journal={IEEE Transactions on Communications},
volume={47},
number={8},
year={1999}
}
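The maximum weight bipartite matching schedule analyzed in this paper can be illustrated with a brute-force sketch (exhaustive search over permutations, purely didactic; the paper analyzes the matching itself, not this way of computing it):

```python
from itertools import permutations

def max_weight_match(q):
    """Return the input-output permutation maximizing total queue weight.

    q[i][j] is the VOQ length at input i for output j. Exhaustive search,
    so only usable for tiny N; real schedulers use far cheaper algorithms.
    """
    n = len(q)
    best, best_w = None, -1
    for perm in permutations(range(n)):
        w = sum(q[i][perm[i]] for i in range(n))
        if w > best_w:
            best, best_w = perm, w
    return list(best), best_w

# Inputs are matched to the outputs holding the longest queues.
schedule, weight = max_weight_match([[3, 0], [0, 5]])
```

With the queue matrix above, input 0 is matched to output 0 and input 1 to output 1, for a total weight of 8.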
We describe and analyze iSLIP, a low-complexity scheduling algorithm for input-queued switches. iSLIP uses multiple iterations of a parallel round-robin matching algorithm to achieve high throughput and low latency. We prove that for uniform i.i.d. Bernoulli arrivals, iSLIP achieves 100% throughput. Under non-uniform traffic, iSLIP performs well in practice, converging quickly to a stable scheduling pattern. The algorithm is simple enough to implement in hardware at high line rates and has been used in several commercial switch designs.
@article{mckeown1999islip,
title={iSLIP: A Scheduling Algorithm for Input-Queued Switches},
author={McKeown, Nick},
journal={IEEE/ACM Transactions on Networking},
volume={7},
number={2},
year={1999}
}
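One iteration of the request-grant-accept exchange behind iSLIP can be sketched as follows (a simplified single-iteration model; the pointer discipline shown, advancing a pointer only past an accepted match, follows the published algorithm, but the queue bookkeeping is pared down for illustration):

```python
def islip_iteration(queues, grant_ptr, accept_ptr):
    """One request-grant-accept round over an N x N switch.

    queues[i][j] > 0 means input i holds cells for output j; grant_ptr
    and accept_ptr are the per-output and per-input round-robin pointers.
    Cells are not dequeued here; the function only computes the match.
    """
    N = len(queues)
    # Request: input i requests every output for which it has a queued cell.
    requests = [[j for j in range(N) if queues[i][j] > 0] for i in range(N)]
    # Grant: each output grants the requesting input nearest its pointer.
    grants = {}
    for j in range(N):
        reqs = [i for i in range(N) if j in requests[i]]
        if reqs:
            grants[j] = min(reqs, key=lambda i: (i - grant_ptr[j]) % N)
    # Accept: each input accepts the granting output nearest its pointer.
    matches = {}
    for i in range(N):
        offers = [j for j, gi in grants.items() if gi == i]
        if offers:
            j = min(offers, key=lambda j: (j - accept_ptr[i]) % N)
            matches[i] = j
            # Pointers move one beyond the match, and only on acceptance:
            # this is what desynchronizes the round-robin schedules.
            grant_ptr[j] = (i + 1) % N
            accept_ptr[i] = (j + 1) % N
    return matches
```

Starting from all pointers at zero with all queues occupied, the first iteration matches only input 0; because the pointers desynchronize, the very next iteration produces a full match.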
The Internet is facing two problems simultaneously: there is a need for a faster switching/routing infrastructure, and a need to introduce guaranteed qualities of service (QoS). Each problem can be solved independently: switches and routers can be made faster by using input-queued crossbars, instead of shared memory systems; and QoS can be provided using WFQ-based packet scheduling. However, until now, the two solutions have been mutually exclusive; all of the work on WFQ-based scheduling algorithms has required that switches/routers use output-queueing, or centralized shared memory. This paper demonstrates that a Combined Input Output Queueing (CIOQ) switch running twice as fast as an input-queued switch can provide precise emulation of a broad class of packet scheduling algorithms, including WFQ and strict priorities. More precisely, we show that for an N x N switch, a "speedup" of 2 - 1/N is necessary and a speedup of two is sufficient for this exact emulation. Perhaps most interestingly, this result holds for all traffic arrival patterns. On its own, the result is primarily a theoretical observation; it shows that it is possible to emulate purely OQ switches with CIOQ switches running at approximately twice the line-rate. To make the result more practical, we introduce several scheduling algorithms that, with a speedup of two, can emulate an OQ switch. We focus our attention on the simplest of these algorithms, Critical Cells First (CCF), and consider its running-time and implementation complexity. We conclude that additional techniques are required to make the scheduling algorithms implementable at high speed, and propose two specific strategies.
@inproceedings{chuang1999matching,
title={Matching Output Queueing with a Combined Input Output Queued Switch},
author={Chuang, Shang-Tse and Goel, Ashish and McKeown, Nick and Prabhakar, Balaji},
booktitle={IEEE INFOCOM},
year={1999}
}
At very high aggregate bandwidths, output queueing is impractical because of insufficient memory bandwidth. This problem is getting worse: memory bandwidth is improving slowly, whereas the demand for network bandwidth continues to grow exponentially. The difficulty is that output-queued switches require memories that run at a speedup of N, where N is equal to the number of switch ports. This paper addresses the following question: Is it possible for a switch to exactly match output queueing with a reduced speedup? We prove that if virtual output queueing is used, a combined input-output queued switch is always work-conserving if its speedup is greater than N/2. This result is proved using a novel scheduling algorithm, the Home Territory Algorithm (HTA).
@inproceedings{mckeown1997matching,
title = {Matching Output Queueing with Combined Input and Output Queueing},
author = {Nick McKeown and Balaji Prabhakar and Mingyan Zhu},
booktitle = {Proceedings of the 35th Annual Allerton Conference on
Communication, Control, and Computing. Monticello, Illinois, October 1997},
year = {1997}
}
We consider the speedup needed for a combined input and output queued (CIOQ) switch to emulate an output-queued switch. It is known that a speedup of two suffices. We examine this question from a theoretical perspective, deriving conditions on the speedup factor that allow the CIOQ switch to precisely mimic the behavior of an ideal output-queued switch, including providing identical departure times for all packets.
@techreport{prabhakar1997speedup,
title = {On the Speedup Required for Combined Input and Output Queued Switching},
author = {Balaji Prabhakar and Nick McKeown},
institution = {Stanford University},
number = {CSL-TR-97-738},
note = {Also published in Automatica, Vol. 35, No. 12, December 1999},
year = {1997}
}
Input queueing is becoming increasingly used for high-bandwidth switches and routers. In previous work, it was proved that it is possible to achieve 100% throughput for input-queued switches using a combination of virtual output queueing and a scheduling algorithm called LQF. However, this is only a theoretical result: LQF is too complex to implement in hardware. In this paper we introduce a new algorithm called Longest Port First (LPF), which is designed to overcome the complexity problems of LQF, and can be implemented in hardware at high speed. By giving preferential service based on queue lengths, we prove that LPF can achieve 100% throughput.
@inproceedings{mekkittikul1998practical,
title={A Practical Scheduling Algorithm to Achieve 100\% Throughput in Input-Queued Switches},
author={Mekkittikul, Adisak and McKeown, Nick},
booktitle={IEEE INFOCOM},
year={1998}
}
In this paper we quantitatively evaluate three iterative algorithms for scheduling cells in a high-bandwidth input-queued ATM switch. In particular, we compare the performance of an algorithm described in this paper (iSLIP) with two recently proposed algorithms: PIM and iRRM. These algorithms find a maximal match between input and output ports using multiple iterations of request-grant-accept, and can be readily implemented in hardware at high speed. We compare their throughput and delay performance under various traffic conditions, and identify the trade-offs between complexity and performance.
@article{mckeown1998quantitative,
title={A Quantitative Comparison of Scheduling Algorithms for Input-Queued Switches},
author={McKeown, Nick and Anderson, Thomas E.},
journal={Computer Networks and ISDN Systems},
volume={30},
number={24},
year={1998}
}
We consider the problem of scheduling cells in an input-queued switch. Input-queued switches are attractive for high-speed networking because they require memory that operates only at the line rate rather than N times the line rate. However, input queuing suffers from head-of-line blocking, which limits throughput to approximately 58.6%. We describe scheduling algorithms that can overcome this limitation.
@article{mckeown1993scheduling,
title = {Scheduling Cells in an Input-Queued Switch},
author = {Nick McKeown and Pravin Varaiya and Jean Walrand},
journal = {IEE Electronics Letters},
pages = {2174--2175},
year = {1993}
}
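The roughly 58.6% head-of-line blocking limit cited above can be reproduced with a small saturation simulation (a Monte Carlo sketch of the standard HOL model; the port count, slot count and seed are arbitrary choices for the example):

```python
import random

def hol_throughput(n_ports, n_slots, seed=1):
    """Saturated FIFO input-queued switch: every input always has a cell,
    and only the head-of-line (HOL) cell can contend for its output."""
    rng = random.Random(seed)
    # Destination of each input's current HOL cell, uniform over outputs.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(n_slots):
        winners = {}
        # Each output serves one contender; ties broken in random order.
        for i in rng.sample(range(n_ports), n_ports):
            winners.setdefault(hol[i], i)
        for _out, i in winners.items():
            served += 1
            hol[i] = rng.randrange(n_ports)  # winner reveals its next cell
    return served / (n_ports * n_slots)
```

For a 16-port switch the simulated throughput settles around 0.60, approaching the 2 - sqrt(2) ≈ 0.586 limit as the port count grows.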
We describe a fast scheduling algorithm for input-queued switches. Input-queued switches are attractive at high line rates because they require memory bandwidth proportional to only the line rate rather than the aggregate switch rate. We present an iterative algorithm that converges quickly to a high-throughput matching between inputs and outputs, enabling practical implementation at high speeds.
@inproceedings{mckeown1995fast,
title = {A Fast Scheduling Algorithm for Input-Queued Switches},
author = {Nick McKeown and Jean Walrand},
booktitle = {Proceedings of 7th IEEE LAN/MAN Workshop, Florida, March 1995},
year = {1995}
}
We consider input-buffered ATM switches that provide guaranteed-rate service. We describe a class of scheduling algorithms that can guarantee a minimum rate to each input-output pair while achieving high overall switch throughput. The algorithms operate by computing matchings in a bipartite graph and can be implemented efficiently in hardware.
@inproceedings{hung1998atm,
title = {ATM Input-Buffered Switches with Guaranteed-Rate Property},
author = {Anthony Hung and George Kesidis and Nick McKeown},
booktitle = {Proc. IEEE ISCC '98, Athens, July 1998},
pages = {331--335},
year = {1998}
}
We describe the design and implementation of a fast crossbar scheduler for input-queued switches. The scheduler implements the iSLIP algorithm, which achieves high throughput with low implementation complexity. We present a hardware implementation that operates at speeds sufficient for gigabit-rate switches and describe the design tradeoffs involved in building practical crossbar schedulers.
@article{gupta1998design,
title = {Design and Implementation of a Fast Crossbar Scheduler},
author = {Pankaj Gupta and Nick McKeown},
journal = {IEEE Micro, Jan-Feb 1999},
note = {Presented at Hot Interconnects VI, Stanford University, August 1998},
year = {1998}
}
In this paper we consider policies for scheduling cells in an input-queued multicast switch. It is assumed that each input maintains a single queue for arriving multicast cells and that only the cell at the head of line (HOL) can be observed and scheduled at one time. The policies are assumed to be work-conserving, which means that cells may be copied to the outputs that they request over several cell times. When a scheduling policy decides which cells to schedule, contention may require that it leave a residue of cells to be scheduled in the next cell time. The selection of where to place the residue uniquely defines the scheduling policy. We prove that for a 2 x N switch, a policy that always concentrates the residue, subject to a natural fairness constraint, always outperforms all other policies. Simulation results indicate that this policy also performs well for more general M x N switches. We present a heuristic round-robin policy called mRRM that is simple to implement in hardware, fair, and performs almost as well as the concentrating policy.
@article{ahuja1997multicast,
title={Multicast Scheduling for Input-Queued Switches},
author={Ahuja, Ritesh and Prabhakar, Balaji and McKeown, Nick},
journal={IEEE Journal on Selected Areas in Communications},
volume={15},
number={5},
year={1997}
}
We analyze the performance of scheduling algorithms for virtual output queued (VOQ) switches under non-uniform traffic patterns. While many scheduling algorithms are known to achieve 100% throughput under uniform traffic, their performance under non-uniform traffic can vary significantly. We characterize the throughput regions of several common scheduling algorithms and provide guidelines for algorithm selection.
@techreport{mekkittikul1997scheduling,
title = {Scheduling VOQ Switches under Non-Uniform Traffic},
author = {Adisak Mekkittikul and Nick McKeown},
institution = {Stanford University},
number = {CSL-TR-97-747},
year = {1997}
}
It is well known that head-of-line (HOL) blocking limits the throughput of an input-queued switch with FIFO queues. Under certain conditions, the throughput can be shown to be limited to approximately 58%. It is also known that if non-FIFO queueing policies are used, the throughput can be increased. However, it has not been previously shown that if a suitable queueing policy and scheduling algorithm are used then it is possible to achieve 100% throughput for all independent arrival processes. In this paper we prove this to be the case using a simple linear programming argument and quadratic Lyapunov function. In particular, we assume that each input maintains a separate FIFO queue for each output and that the switch is scheduled using a maximum weight bipartite matching algorithm.
@inproceedings{mckeown1996achieving,
title={Achieving 100\% Throughput in an Input-Queued Switch},
author={McKeown, Nick and Anantharam, Venkat and Walrand, Jean},
booktitle={IEEE INFOCOM},
year={1996}
}
This paper presents the design of a scheduler for an input-queued multicast switch. It is assumed that each input maintains a single queue for arriving multicast cells and that only the cell at the head of line (HOL) can be observed and scheduled at one time. The scheduler is required to be work-conserving, which means that no output port may be idle as long as there is an input cell destined to it. The scheduler is also required to be fair, which means that no input cell may be held at HOL for more than M cell times (M is the number of input ports). The aim is to find a scheduling policy that achieves high throughput and minimizes input queue latency. When a scheduling policy decides which cells to schedule, contention may require that it leave a residue of cells to be scheduled in the next cell time. The selection of where to place the residue uniquely defines the scheduling policy. We prove that a policy that always concentrates the residue, subject to our fairness constraint, always outperforms all other policies. We present one such policy, called TATRA, and analyze it geometrically. We also present a heuristic round-robin policy called mRRM that is simple to implement in hardware, fair, and performs quite well when compared to a concentrating algorithm.
@inproceedings{prabhakar1995designing,
title = {Designing a Multicast Switch Scheduler},
author = {Balaji Prabhakar and Nick McKeown},
booktitle = {Proceedings of the 33rd Annual Allerton Conference on
Communication, Control, and Computing. pp. 984-993.
Monticello, Illinois, October 1995},
year = {1995}
}
We describe an architecture for an 8 Tb/s ATM interconnection network using optical wavelength division multiplexing (WDM). The architecture combines high-speed ATM switching with WDM optical networking to achieve very high aggregate bandwidths. We analyze the performance of the proposed architecture and discuss the feasibility of implementation using available optical and electronic technologies.
@inproceedings{mekkittikul1996,
title = {8 Tb/s ATM Interconnection through optical WDM networks},
author = {Adisak Mekkittikul and D. Sadot and L. G. Kazovsky and Nick McKeown},
booktitle = {High-Speed Semiconductor Laser Sources, San Jose, CA, Vol. 2684, pp. 186-98, 1-2 February, 1996},
year = {1996}
}
In this paper we consider policies for scheduling cells in an input-queued multicast switch. It is assumed that each input maintains a single queue for arriving multicast cells and that only the cell at the head of line (HOL) can be observed and scheduled at one time. The policies are assumed to be work-conserving, which means that cells may be copied to the outputs that they request over several cell times. When a scheduling policy decides which cells to schedule, contention may require that it leave a residue of cells to be scheduled in the next cell time. The selection of where to place the residue uniquely defines the scheduling policy. We prove that for a 2 x N switch, a policy that always concentrates the residue, subject to a natural fairness constraint, always outperforms all other policies. Simulation results indicate that this policy also performs well for more general M x N switches. We present a heuristic round-robin policy called mRRM that is simple to implement in hardware, fair, and performs almost as well as the concentrating policy.
@inproceedings{mckeown1996scheduling,
title = {Scheduling Multicast Cells in an Input-Queued Switch},
author = {Nick McKeown and Balaji Prabhakar},
booktitle = {Proceedings of IEEE Infocom '96, San Francisco, Vol 1, pp. 271-278, March 1996},
year = {1996}
}
We introduce Tetris models for analyzing multicast switches. In a multicast switch, a single input packet may need to be delivered to multiple outputs simultaneously. We develop an analytical framework based on the Tetris game metaphor that captures the combinatorial nature of multicast scheduling and derive performance bounds for various scheduling policies.
@inproceedings{prabhakar1996tetris,
title = {Tetris Models for Multicast Switches},
author = {Balaji Prabhakar and Nick McKeown and Jean Mairesse},
booktitle = {Proceedings of the Princeton Conference, Princeton, March 1996},
year = {1996}
}
This thesis addresses the scheduling problem for input-queued cell switches. We describe and analyze scheduling algorithms that determine which cells are transferred across the switch fabric in each time slot. We present new algorithms that are simple enough to implement in hardware at high speed, yet provide high throughput and low delay. The main contributions include the iSLIP algorithm, which uses iterated round-robin matching to achieve 100% throughput for uniform traffic, and new analysis techniques for understanding the behavior of input-queued switches.
@phdthesis{mckeown1995scheduling,
title={Scheduling Algorithms for Input-Queued Cell Switches},
author={McKeown, Nick},
school={University of California at Berkeley},
year={1995}
}
The process of categorizing packets into “flows” in an Internet router is called packet classification. All packets belonging to the same flow obey a pre-defined rule and are processed in a similar manner by the router. For example, all packets with the same source and destination IP addresses may be defined to form a flow. Packet classification is needed for non “best-effort” services, such as firewalls and quality of service; services that require the capability to distinguish and isolate traffic in different flows for suitable processing. In general, packet classification on multiple fields is a difficult problem. Hence, researchers have proposed a variety of algorithms which, broadly speaking, can be categorized as “basic search algorithms,” geometric algorithms, heuristic algorithms, or hardware-specific search algorithms. In this tutorial we describe algorithms that are representative of each category, and discuss which type of algorithm might be suitable for different applications.
@article{gupta2001algorithms,
title={Algorithms for Packet Classification},
author={Gupta, Pankaj and McKeown, Nick},
journal={IEEE Network},
year={2001}
}
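The "basic search" category in this tutorial is essentially a priority-ordered linear scan over the rule list. A minimal sketch (the rules, field choices and default action are invented for the example):

```python
from ipaddress import ip_address, ip_network

# Hypothetical two-field classifier: (priority, source prefix,
# destination prefix, action); the lowest priority number wins.
RULES = [
    (1, ip_network("10.0.0.0/8"), ip_network("192.168.1.0/24"), "deny"),
    (2, ip_network("10.0.0.0/8"), ip_network("0.0.0.0/0"), "allow"),
    (3, ip_network("0.0.0.0/0"), ip_network("0.0.0.0/0"), "deny"),
]

def classify(src, dst):
    """Linear scan: correct but O(number of rules) per packet, which is
    exactly why the faster algorithm families in the tutorial exist."""
    s, d = ip_address(src), ip_address(dst)
    for _prio, src_net, dst_net, action in sorted(RULES):
        if s in src_net and d in dst_net:
            return action
    return "deny"  # default action when no rule matches
```

A packet from 10.1.2.3 to 192.168.1.5 hits the first rule and is denied; the same source to any other destination falls through to the second rule and is allowed.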
We present dynamic algorithms for packet classification that provide worst-case performance guarantees. Packet classification is the process of categorizing packets based on multiple header fields, and is a fundamental operation in routers and firewalls. Our algorithms support efficient insertion and deletion of classification rules while maintaining fast lookup times with provable worst-case bounds.
@inproceedings{gupta2000dynamic,
title = {Dynamic Algorithms with Worst-case Performance for Packet Classification},
author = {Pankaj Gupta and Nick McKeown},
booktitle = {Proceedings IFIP Networking, May 2000, Paris, France},
year = {2000}
}
Routers classify packets to determine which flow they belong to, and to decide what service they should receive. Classification may, in general, be based on an arbitrary number of fields in the packet header. Performing classification quickly on an arbitrary number of fields is known to be difficult, and has poor worst-case performance. In this paper, we consider a number of classifiers taken from real networks. We find that the classifiers contain considerable structure and redundancy that can be exploited by the classification algorithm. In particular, we find that a simple multi-stage classification algorithm, called RFC (recursive flow classification), can classify 30 million packets per second in pipelined hardware, or one million packets per second in software.
@inproceedings{gupta1999packet,
title={Packet Classification on Multiple Fields},
author={Gupta, Pankaj and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={1999}
}
Internet routers that operate as firewalls, or provide a variety of service classes, perform different operations on different flows. A flow is defined to be all the packets sharing common header characteristics; for example a flow may be defined as all the packets between two specific IP addresses. In order to classify a packet, a router consults a table (or classifier) using one or more fields from the packet header to search for the corresponding flow; the rules could be based on several fields of the packet, including layer 2, 3, 4 and maybe 5 addressing and protocol information. The classifier is a list of rules that identify each flow and the actions to be performed on each. The simplest, and most well-known, form of packet classification is used in routing IP datagrams, where each rule specifies a destination prefix. The associated action is the IP address of the next-hop where the packet needs to be routed to. The classification process requires determining the longest prefix which matches the destination address of the packet. With the increasing demands on router performance, there is a need for algorithms that can classify packets quickly with minimal storage requirements and allow new flows to be frequently added and deleted. In the worst case, packet classification is hard, requiring routers to use heuristics that exploit the structure present in the classifiers.
@inproceedings{gupta1999hierarchical,
title={Packet Classification using Hierarchical Intelligent Cuttings},
author={Gupta, Pankaj and McKeown, Nick},
booktitle={Hot Interconnects VII},
year={1999}
}
The increased bandwidth in the Internet puts great demands on network routers; for example, to route minimum sized Gigabit Ethernet packets, an IP router must process about 1.5 million packets per second per port. Using the "rule-of-thumb" that it takes roughly 1000 packets per second for every 10^6 bits per second of line rate, an OC-192 line requires 10 million routing lookups per second; well above current router capabilities. One limitation of router performance is the route lookup mechanism. IP routing requires that a router perform a longest-prefix-match address lookup for each incoming datagram in order to determine the datagram's next hop. We present a route lookup mechanism that, when implemented in a pipelined fashion in hardware, can achieve one route lookup every memory access. With current 50 ns DRAM, this corresponds to approximately 20 million packets per second; much faster than current commercially available routing lookup schemes. We also present novel schemes for performing quick updates to the forwarding table in hardware. We demonstrate using real routing update patterns that the routing tables can be updated with negligible overhead to the central processor.
@inproceedings{gupta1998routing,
title={Routing Lookups in Hardware at Memory Access Speeds},
author={Gupta, Pankaj and Lin, Steven and McKeown, Nick},
booktitle={IEEE INFOCOM},
year={1998}
}
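The longest-prefix-match operation at the heart of this lookup scheme can be sketched with a binary trie (a software illustration only; the paper's contribution is a pipelined hardware mechanism, not this data structure):

```python
def add_route(trie, prefix, plen, next_hop):
    """Insert an IPv4 prefix (as a 32-bit int) of length plen into a trie
    built from nested dicts keyed by bit value."""
    node = trie
    for b in range(plen):
        bit = (prefix >> (31 - b)) & 1
        node = node.setdefault(bit, {})
    node["hop"] = next_hop

def lookup(trie, addr):
    """Walk the trie bit by bit, remembering the last next-hop seen,
    so the longest matching prefix wins."""
    node, best = trie, None
    for b in range(32):
        if "hop" in node:
            best = node["hop"]
        nxt = node.get((addr >> (31 - b)) & 1)
        if nxt is None:
            return best
        node = nxt
    return node.get("hop", best)

table = {}
add_route(table, 0x0A000000, 8, "A")   # 10.0.0.0/8  -> next hop A
add_route(table, 0x0A010000, 16, "B")  # 10.1.0.0/16 -> next hop B
```

A lookup of 10.1.2.3 returns "B" (the /16 is the longer match), while 10.2.0.1 falls back to "A" and an address outside both prefixes matches nothing.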
In this paper, we present the Tiny Tera: a small packet switch with an aggregate bandwidth of 320 Gb/s. The Tiny Tera is a CMOS-based input-queued, fixed-size packet switch suitable for a wide range of applications such as a high-performance ATM switch, the core of an Internet router or as a fast multiprocessor interconnect. Using off-the-shelf technology, we plan to demonstrate that a very high-bandwidth switch can be built without the need for esoteric optical switching technology. By employing novel scheduling algorithms for both unicast and multicast traffic, the switch will have a maximum throughput close to 100%. Using novel high-speed chip-to-chip serial link technology, we plan to reduce the physical size and complexity of the switch, as well as the system pin-count.
@inproceedings{mckeown1996tinytera,
title={The Tiny Tera: A Packet Switch Core},
author={McKeown, Nick and Izzard, Martin and Mekkittikul, Adisak and Ellersick, Bill and Horowitz, Mark},
booktitle={Hot Interconnects V},
year={1996}
}
In this paper, we present the Tiny Tera: a small packet switch with an aggregate bandwidth of 320 Gb/s. The Tiny Tera is a CMOS-based input-queued, fixed-size packet switch suitable for a wide range of applications such as a high-performance ATM switch, the core of an Internet router or as a fast multiprocessor interconnect. Using off-the-shelf technology, we plan to demonstrate that a very high-bandwidth switch can be built without the need for esoteric optical switching technology. By employing novel scheduling algorithms for both unicast and multicast traffic, the switch will have a maximum throughput close to 100%. Using novel high-speed chip-to-chip serial link technology, we plan to reduce the physical size and complexity of the switch, as well as the system pin-count.
@inproceedings{mckeown1996tiny,
title = {The Tiny Tera: A Small High-Bandwidth Packet Switch Core},
author = {Nick McKeown and Martin Izzard and Adisak Mekkittikul and Bill Ellersick and Mark Horowitz},
booktitle = {Proceedings of Hot Interconnects IV, pp. 161-173, Stanford, August 1996},
year = {1996}
}
A 32 x 32 synchronous crossbar chip was designed in a 0.27 um CMOS technology for use in a high-speed network switch [1]. The crossbar chip uses 32 Asymmetric Serial Links [2][3] to achieve high speed at the interfaces and to reduce both power and area. The crossbar switch core is implemented with static CMOS multi-stage multiplexors with multicast capability. The chip operates successfully with links running at 1.6 Gb/s when all channels and the switch core are operating. The crossbar chip consumes 5 W and provides a total bandwidth above 50 Gb/s. During normal operation, all 32 serial link blocks work as Dumb Ends of the Asymmetric Serial Links, which do not adjust their own transmitter and receiver clocks; for testing purposes, two serial link blocks were designed as Smart Ends with PLLs to adjust their clocks, but may also be operated as Dumb Ends. The crossbar controller manages the initial calibration process, disabling links before they are properly synchronized. As a result, when a serial link is disconnected it will never be synchronized, so it is always disabled by the controller without any explicit logic. The Asymmetric Serial Links use differential-pair open-drain transmitters and current-integrating receivers [6].
@inproceedings{chang1999crossbar,
title={A 50 Gb/s 32x32 CMOS Crossbar Chip Using Asymmetric Serial Links},
author={Chang, Kun-Yung Ken and Chuang, Shang-Tse and McKeown, Nick and Horowitz, Mark},
booktitle={Symposium on VLSI Circuits},
year={1999}
}
In this paper, we present the Tiny Tera: a small packet switch with an aggregate bandwidth of 320 Gb/s. The Tiny Tera is a CMOS-based input-queued, fixed-size packet switch suitable for a wide range of applications such as a high-performance ATM switch, the core of an Internet router or as a fast multiprocessor interconnect. Using off-the-shelf technology, we plan to demonstrate that a very high-bandwidth switch can be built without the need for esoteric optical switching technology. By employing novel scheduling algorithms for both unicast and multicast traffic, the switch will have a maximum throughput close to 100%. Using novel high-speed chip-to-chip serial link technology, we plan to reduce the physical size and complexity of the switch, as well as the system pin-count.
@inproceedings{mckeown1996tinyatm,
title = {The Tiny Tera: A Small High-Bandwidth ATM Switch},
author = {Nick McKeown and Martin Izzard and Adisak Mekkittikul},
booktitle = {Proceedings of SPIE 96, Vol. 2917, pp. 387-397, Boston, November 1996},
year = {1996}
}
This paper describes the design of a novel CMOS 2 Gb/s asymmetric serial link. The serial link is designed for systems that use high speed chip-to-chip communications. In such designs, power dissipation is a common problem, particularly when multiple serial links are required on one chip. The power arises primarily from the phase adjustment circuitry used to align data with the clock. This circuitry is usually placed at the receiver, but in our asymmetric link design we take a different approach. We first assume that a link consists of two unidirectional connections, one for each direction of the link. We move the phase adjustment circuitry from one end of the link to the other, adjusting the phase of the transmitter rather than the receiver. Although this does not reduce overall system power, it allows us to choose the location of the phase adjustment circuitry, moving it from chips with a large number of links to chips with a smaller number. The link was designed for use in the Tiny Tera packet switch. Simulations suggest that the power dissipation of serial links on the crossbar switch can be reduced by a factor of 4.
@inproceedings{chang1997,
title = {A 2 Gb/s Asymmetric Serial Link for High-Bandwidth Packet Switches},
author = {Ken K.-Y. Chang, William Ellersick, Shang-Tse Chuang, Stefanos Sidiropoulos, Mark Horowitz, Nick McKeown},
booktitle = {Hot Interconnects VI, pp. 171-179, Stanford University, August 1997},
year = {1997}
}
We analyze the performance of circuit switching as an alternative to packet switching in the Internet. While packet switching has traditionally been considered superior for bursty data traffic, we show that circuit switching can be surprisingly efficient for certain Internet traffic patterns. We compare the throughput and delay characteristics of circuit and packet switching under realistic traffic conditions.
@article{molinero-fernandez2003performance,
title = {The performance of circuit switching in the Internet},
author = {Pablo Molinero-Fernandez and Nick McKeown},
journal = {OSA Journal of Optical Networking, Vol. 2, No. 4, March 2003},
year = {2003}
}
The Internet Protocol (IP) has rapidly become the dominant protocol for data networks and is increasingly being used for voice, video, and other communications. We examine whether IP will eventually replace all other communications technologies, or whether some applications will continue to use specialized networks and protocols. We consider the technical, economic, and operational factors that will determine the extent to which IP takes over the world of communications.
@inproceedings{molinero2002ip,
title={Is IP Going to Take Over the World (of Communications)?},
author={Molinero-Fernandez, Pablo and McKeown, Nick and Zhang, Hui},
booktitle={HotNets-I},
year={2002}
}
We propose TCP Switching, a mechanism that exposes circuit-like behavior to IP networks. The key idea is to identify long-lived TCP flows and switch them onto dedicated circuits, while short flows continue to be packet-switched. We show that this hybrid approach can reduce the load on packet routers while maintaining the flexibility of IP networking.
@inproceedings{molinero-fernandez2001tcp,
title = {TCP Switching: Exposing circuits to IP},
author = {Pablo Molinero-Fernandez and Nick McKeown},
booktitle = {Hot Interconnects IX, Stanford University, August 2001},
year = {2001}
}
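The long-lived-flow test described in this abstract can be sketched as a byte-count threshold kept at the edge router (the threshold value, flow keying and function name are assumptions for illustration, not details from the paper):

```python
def place_flow(byte_counts, flow_id, nbytes, threshold=1_000_000):
    """Decide whether a flow should ride a dedicated 'circuit'.

    byte_counts is per-flow state kept by the edge device; once a flow's
    cumulative byte count crosses the (invented) threshold it is treated
    as long-lived and moved onto a circuit, while short flows stay
    packet-switched.
    """
    byte_counts[flow_id] = byte_counts.get(flow_id, 0) + nbytes
    return "circuit" if byte_counts[flow_id] >= threshold else "packet"
```

A flow stays packet-switched until its cumulative bytes cross the threshold, after which its traffic is placed on the circuit path.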
We describe a 2.5 Tb/s switch core with an LCS (Linecard to Switch) interface. The switch core is designed for high-bandwidth packet switching and uses a single-stage crossbar architecture with the iSLIP scheduling algorithm. We describe the chip design, the interface protocol, and the performance characteristics of the switch core.
@inproceedings{mckeown2001tbs,
title = {A 2.5Tb/s Switch Core with LCS Interface},
author = {Nick McKeown and Costas Calamvokis and Shang-tse Chuang},
booktitle = {Hot Chips 13, Stanford, CA, August 2001},
year = {2001}
}
We propose TCP Switching, a mechanism that exposes circuit-like behavior to IP networks. The key idea is to identify long-lived TCP flows and switch them onto dedicated circuits, while short flows continue to be packet-switched. We show that this hybrid approach can reduce the load on packet routers while maintaining the flexibility of IP networking.
@article{molinero-fernandez2002tcp,
title = {TCP Switching: Exposing circuits to IP},
author = {Pablo Molinero-Fernandez and Nick McKeown},
journal = {IEEE Micro},
volume = {22},
number = {1},
year = {2002}
}
Recently there has been much interest in combining the speed of layer-2 switching with the features of layer-3 routing. This has been prompted by numerous proposals, including: IP Switching, Tag Switching, ARIS, CSR, and IP over ATM. In this paper, we study IP Switching using a detailed simulation model. We simulate the handling of IP traffic by an IP Switch, comparing it with a conventional store-and-forward IP router. We find that IP Switching can significantly reduce the amount of IP route processing needed in the router, and that most data packets can be switched directly at layer 2.
@inproceedings{lin1997simulation,
title={A Simulation Study of IP Switching},
author={Lin, Steven and McKeown, Nick},
booktitle={ACM SIGCOMM},
year={1997}
}
We describe the Bay Bridge, a high-speed bridge/router designed to interconnect FDDI and SMDS networks. The Bay Bridge operates at rates significantly higher than existing bridge/routers, using a custom hardware architecture optimized for high-throughput packet forwarding. We describe the design, implementation, and performance of the system.
@inproceedings{mckeown1992bay,
title = {The Bay Bridge: A High Speed Bridge/Router},
author = {Nick McKeown and Richard Edell and My T. Le},
booktitle = {IFIP Workshop},
address = {Stockholm, Sweden},
year = {1992}
}
@techreport{mckeownarchitecture,
title = {Architecture and Performance of The BayBridge: A High Speed Bridge/Router between FDDI and SMDS},
author = {Nick McKeown and Richard Edell and My T. Le},
institution = {Technical Report},
year = {}
}
We describe a high-performance SMDS (Switched Multi-megabit Data Service) interface operating at the STS-3c rate of 155 Mb/s. The interface is designed for use in high-speed bridge/routers and provides efficient packet processing for SMDS traffic. We describe the architecture, implementation, and performance characteristics of the interface.
@techreport{lehigh,
title = {A High Performance SMDS Interface at STS-3c Rate},
author = {My T. Le and Nick McKeown and Richard Edell},
institution = {Technical Report},
year = {}
}
We describe an output-buffered ATM packet switch architecture for integrated-services communication networks. Output-buffered switches provide ideal performance characteristics, including deterministic delay bounds and lossless operation. We analyze the design requirements and present an architecture that makes output buffering practical for ATM networks carrying mixed traffic types.
@inproceedings{kesidis1997outputbuffer,
title = {Output-buffer ATM Packet Switching for Integrated-Services Communication Networks},
author = {George Kesidis and Nick McKeown},
booktitle = {IEEE ICC '97},
address = {Montreal, Canada},
month = {August},
year = {1997}
}
In the ideal CS1 classroom, we should understand programming process---how student code evolves over time. However, for graphics-based programming assignments, the task of understanding and grading final solutions, let alone thousands of intermediate steps, is incredibly labor-intensive. In this work, we present a challenge, a dataset, and a promising first solution to autonomously use image output to identify functional, intermediate stages of a student solution. By using computer vision techniques to associate visual output of intermediate student code with functional progress, we supplement a lot of the teacher labor associated with understanding graphics-based, open-ended assignments. We hope our publication of the dataset used in our case study sparks discussion in the community on how to analyze programs with visual program output.
@inproceedings{yan2019pyramid,
title={The PyramidSnapshot Challenge: Understanding Student Process from Visual Output of Programs},
author={Yan, Lisa and McKeown, Nick and Piech, Chris},
booktitle={ACM SIGCSE},
year={2019}
}
Co-Teaching Computer Science Across Borders: Human-Centric Learning at Scale (Best Paper, 2020)
Programming is fast becoming a required skill set for students in every country. We present CS Bridge, a model for cross-border co-teaching of CS1, along with a corresponding open-source course-in-a-box curriculum made for easy localization. In the CS Bridge model, instructors and student-teachers from different countries come together to teach a short, stand-alone CS1 course to hundreds of local high school students. The corresponding open-source curriculum has been specifically designed to be easily adapted to a wide variety of local teaching practices, languages, and cultures. Over the past six years, the curriculum has been used to teach CS1 material to over 1,000 high school students in Colombia, the Czech Republic, Turkey, and Guinea. A large majority of our students continue on to study CS or CS-related fields in university. More importantly, many of our undergraduate student-teachers stay involved with teaching beyond the program. Joint teaching creates a positive, high-quality learning experience for students around the world and a powerful, high-impact professional development experience for the teaching team---instructors and student-teachers alike.
@inproceedings{piech2020coteaching,
title={Co-Teaching Computer Science Across Borders: Human-Centric Learning at Scale},
author={Piech, Chris and Yan, Lisa and Einstein, Lisa and Saavedra, Ana and Bozkurt, Baris and Sestakova, Eliska and Guth, Ondrej and McKeown, Nick},
booktitle={ACM L@S},
year={2020}
}
We present TMOSS, a tool that uses intermediate snapshots of student programming assignments to identify excessive collaboration. Unlike traditional plagiarism detection tools such as MOSS that compare only final submissions, TMOSS analyzes the evolution of student code over time, comparing intermediate versions to detect patterns of code sharing that would otherwise be missed. We evaluate TMOSS on a large introductory programming course at Stanford and show that it can more accurately identify cases where students have shared code.
@inproceedings{yan2018tmoss,
title={TMOSS: Using Intermediate Assignment Work to Understand Excessive Collaboration in Large Classes},
author={Yan, Lisa and McKeown, Nick and Sahami, Mehran and Piech, Chris},
booktitle={ACM SIGCSE},
year={2018}
}
In the past five years, the graduate networking course at Stanford has assigned over 200 students the task of reproducing results from over 40 networking papers. We began the project as a means of teaching both engineering rigor and critical thinking, qualities that are necessary for careers in networking research and industry. We have observed that reproducing research can simultaneously be a tool for education and a means for students to contribute to the networking community. Through this editorial we describe our project in reproducing network research and show through anecdotal evidence that this project is important for both the classroom and the networking community at large, and we hope to encourage other institutions to host similar class projects.
@article{yan2017learning,
title={Learning Networking by Reproducing Research Results},
author={Yan, Lisa and McKeown, Nick},
journal={ACM SIGCOMM Computer Communication Review},
year={2017}
}
Our goal is to enable fast prototyping of networking hardware (e.g. modified Ethernet switches and IP routers) for teaching and research. To this end, we built and made available the NetFPGA platform. Starting from open-source reference designs, students and researchers create their designs in Verilog, and then download them to the NetFPGA board where they can process packets at line-rate for 4 ports of 1GE. The board is becoming widely used for teaching and research, and so it has become important to make it easy to re-use modules and designs. We have created a standard interface between modules, making it easier to plug modules together in pipelines, and to create new re-usable designs. In this paper we describe our modular design, and how we have used it to build several systems, including our IP router reference design and some extensions to it.
@inproceedings{naous2008netfpga,
title = {NetFPGA: Reusable Router Architecture for Experimental Research},
author = {Jad Naous and Glen Gibb and Sara Bolouki and Nick McKeown},
booktitle = {ACM SIGCOMM PRESTO Workshop},
address = {Seattle, WA},
month = {August},
year = {2008}
}
We describe how the NetFPGA platform, an FPGA-based research design platform, has fueled advances in networking research. NetFPGA provides researchers with a hardware platform for building and testing networking systems at gigabit rates. We survey the research projects that have used NetFPGA and describe how the platform has enabled rapid prototyping of new networking ideas.
@article{blott2010fpga,
title = {FPGA Research Design Platform Fuels Network Advances},
author = {Michaela Blott and Jonathan Ellithorpe and Nick McKeown and Kees Vissers and Hongyi Zeng},
journal = {Xcell Journal},
note = {Fourth Quarter 2010},
year = {2010}
}
The NetFPGA platform enables students and researchers to build high-performance networking systems using Field Programmable Gate Array (FPGA) hardware. A new version of the NetFPGA platform has been developed and is available for use by the academic community. The NetFPGA platform has modular interfaces that enable development of complex hardware designs by integration of simple building blocks. FPGA logic is used to implement the core data processing functions while software running on an attached host computer or embedded cores within the device implement control functions. Reference designs and component libraries have been developed for the CS344 course at Stanford University and an open-source Verilog reference design is available for download from the project website.
@article{gibb2008netfpga,
title={NetFPGA: An Open Platform for Teaching How to Build Gigabit-Rate Network Switches and Routers},
author={Gibb, Glen and Lockwood, John W. and Naous, Jad and Hartke, Paul and McKeown, Nick},
journal={IEEE Transactions on Education},
year={2008}
}
We describe our experience using reconfigurable networking hardware to teach networking concepts. Students build complete networking systems, including routers and switches, using the NetFPGA platform. The hands-on approach gives students a deeper understanding of how networking hardware works and enables them to experiment with new ideas in a realistic setting.
@inproceedings{casado2005teaching,
title = {Teaching Networking Hardware},
author = {Martin Casado and Gregory Watson and Nick McKeown},
booktitle = {ACM ITiCSE},
address = {Monte de Caparica, Portugal},
month = {June},
year = {2005}
}
We present an educational platform for teaching the design, debugging and deployment of real networking equipment in the operational Internet. The emphasis of our work is on teaching and, therefore, on providing an environment that is flexible, robust, low cost and easy to use. The platform is built around 'NetFPGAs': custom boards containing eight Ethernet ports and two FPGAs. The boards can be programmed and debugged over the network, and therefore allow students to build sophisticated hardware, deploy and test it from the other side of the country, without ever having to see the physical board. NetFPGA boards, when used with VNS (Virtual Network System, another tool we have developed), can be integrated into dynamically configurable network topologies reachable from the Internet. VNS enables a user-space process running on any remote computer to function as a system controller for the NetFPGA boards, and a small network of commodity PCs running Linux can emulate thousands of different isolated virtual networks running real services. NetFPGA and VNS are used at Stanford in a graduate-level networking course to teach router implementation in hardware and software.
@inproceedings{casado2005reconfigurable,
title = {Reconfigurable Networking Hardware: A Classroom Tool},
author = {Martin Casado and Gregory Watson and Nick McKeown},
booktitle = {Hot Interconnects 13},
address = {Stanford, CA},
month = {August},
year = {2005}
}
We present Clack, a graphical environment for teaching students how Internet routers work and other core networking concepts. Clack is a router written as a Java applet, and routes live network traffic in real-time. Students can look inside the router to see how packets are processed, and watch the dynamics of the queues. They can modify and enhance the router, making it process packets as they wish. Clack provides multiple views of the operational router including the full network topology, the router's software components, and a packet-level view of the traffic as it passes through the router. Clack's detailed visual interface to the software internals of a functioning router, as well as its ability to modify and observe live Internet traffic, provide a unique environment to aid in networking education. Over the last two years, Clack has been used in the classroom at six universities. Feedback from the students through anonymous, formal evaluations has been positive. In this paper we describe the goals and design of Clack as well as our experiences using it in the classroom.
@inproceedings{wendlandt2006clack,
title = {The Clack Graphical Router: Visualizing Network Software},
author = {Dan Wendlandt and Martin Casado and Paul Tarjan and Nick McKeown},
booktitle = {ACM Symposium on Software Visualization},
month = {September},
year = {2006}
}
The NetFPGA platform enables students and researchers to build high-performance networking systems in hardware. A new version of the NetFPGA platform has been developed and is available for use by the academic community. The NetFPGA 2.1 platform now has interfaces that can be parameterized, therefore enabling development of modular hardware designs with varied word sizes. It also includes more logic and faster memory than the previous platform. Field Programmable Gate Array (FPGA) logic is used to implement the core data processing functions while software running on embedded cores within the FPGA and/or programs running on an attached host computer implement only control functions. Reference designs and component libraries have been developed for the CS344 course at Stanford University. Open-source Verilog code is available for download from the project website.
@inproceedings{lockwood2007netfpga,
title = {NetFPGA - An Open Platform for Gigabit-rate Network Switching and Routing},
author = {John W. Lockwood and Nick McKeown and Greg Watson and Glen Gibb and Paul Hartke and Jad Naous and Ramanan Raghuraman and Jianying Luo},
booktitle = {MSE 2007},
address = {San Diego, CA},
month = {June},
year = {2007}
}
We describe the Stanford Virtual Router, a software tool designed for teaching networking concepts and simulating network behavior. The virtual router allows students to build and configure network topologies, implement routing protocols, and observe packet-level behavior without requiring physical networking hardware. We describe the tool's architecture and our experience using it in networking courses.
@article{casado2002stanford,
title = {The Stanford Virtual Router: a teaching tool and network simulator},
author = {Martin Casado and Vikram Vijayaraghavan and Guido Appenzeller and Nick McKeown},
journal = {ACM SIGCOMM Computer Communication Review},
volume = {32},
number = {3},
year = {2002}
}
Machine learning is increasingly being used in the criminal justice system, particularly for recidivism prediction and bail decisions. We propose a new direction: instead of using machine learning to make predictions about people, we should use it to improve the quality and consistency of human decision-making in criminal law. We identify applications where ML can reduce bias and improve accuracy, including analyzing police reports, improving forensic evidence analysis, and helping to identify potential wrongful convictions.
@article{bell2021new,
title={A New Direction for Machine Learning in Criminal Law},
author={Bell, Kristen and Hong, Jenny and McKeown, Nick and Voss, Catalin},
year={2021}
}
We explore whether overlay hosting services can make IP ossification irrelevant. IP ossification refers to the difficulty of deploying new protocols and services in the Internet due to the large installed base of IP routers. We argue that overlay networks and hosting services can provide a practical path for deploying new network services without requiring changes to the underlying IP infrastructure.
@inproceedings{turner2007can,
title = {Can Overlay Hosting Services Make IP Ossification Irrelevant?},
author = {J. Turner and N. McKeown},
booktitle = {cabernet.cs.princeton.edu, 2007},
year = {2007}
}
We believe this paper is the first extensive user-study of whitelisting email addresses. While whitelists are common in social networking and instant messaging (e.g., buddy lists), they have not caught on in email systems. Instead, most of us use spam filters that try to identify all the senders we do not want to accept email from. With whitelists we only need to identify the much smaller set of users who legitimately send us email - generally most of these users are known to us. In our study we built and deployed a whitelisting email service - named DOEmail - along with a Thunderbird add-on to try and make whitelists easy to manage. During the past two years, over 120 users have used DOEmail. As expected, we find that almost no spam makes it to users' inboxes, and less than 1% of legitimate email is mis-classified. We measure how hard it is for users to manage their whitelists, and conclude that after initial setup, the burden is very low. Our system uses challenge-responses to identify whether unknown senders are legitimate. The jury is still out as to how effective they are, and we continue to study them.
@inproceedings{erickson2008effectiveness,
title = {The Effectiveness of Whitelisting: a User-Study},
author = {David Erickson and Martin Casado and Nick McKeown},
booktitle = {Conference on Email and Anti-Spam (CEAS)},
address = {Mountain View, CA},
month = {August},
year = {2008}
}
Our goal is to build passive monitoring equipment for use at 10 Gb/s (e.g. 10GE and OC-192) and above. We already have in place an OC-48 passive monitoring system for capturing and storing a detailed record for every packet. But because of constraints on storage and bus bandwidth this will not be feasible at 10 Gb/s and above. Therefore, taking advantage of the fact that packets can be considered as belonging to flows, our system will store per-flow records that are created at the time of capture, and stored alongside small per-packet records. This way storage requirements can be reduced several-fold. Results indicate that it will be possible to capture and store detailed flow information at OC-192 without losing much information compared to our OC-48 packet traces.
@inproceedings{iannaccone2001monitoring,
title={Monitoring Very High Speed Links},
author={Iannaccone, Gianluca and Diot, Christophe and Graham, Ian and McKeown, Nick},
booktitle={ACM SIGCOMM Internet Measurement Workshop},
year={2001}
}
We describe Xdistribute, a system for distributing processes across multiple machines. Xdistribute provides a simple programming model that allows applications to take advantage of distributed computing resources transparently. We describe the system architecture, the distribution mechanisms, and the performance characteristics of the system.
@techreport{pettyxdistribute,
title = {Xdistribute: A Process Distribution System},
author = {Karl Petty and Nick McKeown},
institution = {Technical Report},
year = {}
}
We address the problem of how to bill users for TCP traffic and how to design pricing schemes that promote efficient use of network resources. We analyze the interaction between TCP's congestion control mechanism and usage-based pricing, and show that naive pricing approaches can lead to undesirable outcomes. We propose pricing mechanisms that are compatible with TCP behavior and provide proper incentives for users to choose appropriate service levels.
@article{edell1995billing,
title={Billing Users and Pricing for TCP},
author={Edell, Richard and McKeown, Nick and Varaiya, Pravin},
journal={IEEE JSAC},
year={1995}
}
This paper describes the accomplishments and directions of the California PATH program in automatic vehicle control. The research and development work progresses from individual vehicle control, through coordination of closely spaced vehicles in platoons, to the full automation of mixed traffic on a highway. We describe the key technology developments and field demonstrations, including longitudinal and lateral vehicle control, vehicle-to-vehicle communication, and automated highway systems.
@article{shladover1991automatic,
title={Automatic Vehicle Control Developments in the PATH Program},
author={Shladover, Steven E. and Desoer, Charles A. and Hedrick, J. Karl and Tomizuka, Masayoshi and Walrand, Jean and Zhang, Wei-Bin and McMahon, Donn H. and Peng, Huei and Sheikholeslam, Shahab and McKeown, Nick},
journal={IEEE Transactions on Vehicular Technology},
volume={40},
number={1},
month={February},
year={1991}
}