Publications
Publications by category in reverse chronological order, generated by jekyll-scholar.
2026
- [NSDI] MetaEase: Heuristic Analysis from Source Code via Symbolic-Guided Optimization. Pantea Karimi, Siva Kesava Reddy Kakarla, Pooria Namyar, Santiago Segarra, Ryan Beckett, Mohammad Alizadeh, and Behnaz Arzani. In 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26), Apr 2026.
@inproceedings{karimi2026metaease, author = {Karimi, Pantea and Kakarla, Siva Kesava Reddy and Namyar, Pooria and Segarra, Santiago and Beckett, Ryan and Alizadeh, Mohammad and Arzani, Behnaz}, title = {MetaEase: Heuristic Analysis from Source Code via Symbolic-Guided Optimization}, booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)}, year = {2026}, publisher = {USENIX Association}, month = apr, }
2025
- [SIGCOMM] Firefly: Scalable, Ultra-Accurate Clock Synchronization for Datacenters. Pooria Namyar, Yuliang Li, Weitao Wang, Nandita Dukkipati, Kk Yap, Junzhi Gong, Chen Chen, Peixuan Gao, Devdeep Ray, Gautam Kumar, Yidan Ma, Ramesh Govindan, and Amin Vahdat. In Proceedings of the ACM SIGCOMM 2025 Conference, São Francisco Convent, Coimbra, Portugal, 2025.
Cloud-based financial exchanges require sub-10ns device-to-device clock synchronization accuracy while adhering to Coordinated Universal Time (UTC). Existing clock sync techniques struggle to meet this demand at scale and are vulnerable to clock drift, jitter, and path asymmetries. Firefly, a software-driven datacenter clock sync system, scalably, cost-effectively, and reliably achieves very high clock sync accuracy. It employs a distributed consensus algorithm on a random overlay graph to rapidly converge to a common time while applying gradual adjustments to device hardware clocks. To realize consistent sync-to-UTC (external sync) across devices while maintaining a stable device-to-device internal sync, Firefly uses a novel technique, layered synchronization, that decouples internal and external syncs. In a 248-machine Clos network, Firefly achieves sub-10ns device-to-device and ≤1μs device-to-UTC sync, and is resilient to time server failure and unstable clocks.
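As a rough illustration of the distributed-consensus idea in the abstract, the following Python sketch runs a toy averaging loop on a random overlay graph with bounded per-round corrections. It is not Firefly's algorithm; the graph size, peer count, and the max_adjust_ns cap are made-up parameters for intuition only.

# Illustrative sketch only: a toy averaging-consensus loop on a random overlay
# graph, in the spirit of the distributed consensus the abstract describes.
# All names and parameters here are hypothetical, not from the paper or its code.
import random

def consensus_step(offsets_ns, neighbors, max_adjust_ns=50.0):
    """One round: each node nudges its clock toward the mean of itself and its peers.

    offsets_ns: dict node -> current clock offset (ns) from some reference
    neighbors:  dict node -> list of peer nodes (random overlay graph)
    max_adjust_ns: cap per round, so corrections stay gradual (no clock jumps)
    """
    new_offsets = {}
    for node, offset in offsets_ns.items():
        peers = neighbors[node]
        target = (offset + sum(offsets_ns[p] for p in peers)) / (1 + len(peers))
        delta = target - offset
        # Apply only a bounded, gradual correction to the hardware clock.
        delta = max(-max_adjust_ns, min(max_adjust_ns, delta))
        new_offsets[node] = offset + delta
    return new_offsets

# Toy run: 8 nodes, each peered with 3 random others, iterated over rounds.
nodes = list(range(8))
neighbors = {n: random.sample([m for m in nodes if m != n], 3) for n in nodes}
offsets = {n: random.uniform(-500, 500) for n in nodes}  # ns
for _ in range(50):
    offsets = consensus_step(offsets, neighbors)
print(max(offsets.values()) - min(offsets.values()))  # spread typically shrinks over rounds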
@inproceedings{namyar2025Firefly, author = {Namyar, Pooria and Li, Yuliang and Wang, Weitao and Dukkipati, Nandita and Yap, Kk and Gong, Junzhi and Chen, Chen and Gao, Peixuan and Ray, Devdeep and Kumar, Gautam and Ma, Yidan and Govindan, Ramesh and Vahdat, Amin}, title = {Firefly: Scalable, Ultra-Accurate Clock Synchronization for Datacenters}, year = {2025}, isbn = {9798400715242}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3718958.3750502}, doi = {10.1145/3718958.3750502}, booktitle = {Proceedings of the ACM SIGCOMM 2025 Conference}, pages = {434–452}, numpages = {19}, keywords = {clock synchronization, financial exchange, UTC synchronization, path asymmetry, distributed consensus}, location = {S\~{a}o Francisco Convent, Coimbra, Portugal}, series = {SIGCOMM '25}, }
- [SIGCOMM] ZENITH: Towards A Formally Verified Highly-Available Control Plane. Pooria Namyar, Arvin Ghavidel, Mingyang Zhang, Harsha V. Madhyastha, Srivatsan Ravi, Chao Wang, and Ramesh Govindan. In Proceedings of the ACM SIGCOMM 2025 Conference, São Francisco Convent, Coimbra, Portugal, 2025.
Today, large-scale software-defined networks use microservice-based controllers. Bugs in these controllers can reduce network availability by making the data plane state inconsistent with the high-level intent. To recover from such inconsistencies, modern controllers periodically reconcile the state of all the switches with the desired intent. However, periodic reconciliation limits the availability and performance of the network at scale. We introduce Zenith, a microservice-based controller that avoids inconsistencies by design rather than always relying on recovery mechanisms. We have formally verified Zenith’s specifications and have proved that it ensures the network state will eventually be consistent with intent. We automatically generate Zenith’s code from its specification to minimize the likelihood of errors in the final implementation. Zenith’s guarantees and abstractions also enable developers to independently verify SDN applications and ensure end-to-end safety and correctness. Zenith resolves inconsistencies 5× faster than today’s designs and significantly improves availability.
@inproceedings{namyar2025Zenith, author = {Namyar, Pooria and Ghavidel, Arvin and Zhang, Mingyang and Madhyastha, Harsha V. and Ravi, Srivatsan and Wang, Chao and Govindan, Ramesh}, title = {ZENITH: Towards A Formally Verified Highly-Available Control Plane}, year = {2025}, isbn = {9798400715242}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3718958.3750533}, doi = {10.1145/3718958.3750533}, booktitle = {Proceedings of the ACM SIGCOMM 2025 Conference}, pages = {409–433}, numpages = {25}, keywords = {software defined networking, formal methods, availability}, location = {S\~{a}o Francisco Convent, Coimbra, Portugal}, series = {SIGCOMM '25}, }
- [SIGCOMM] Raha: A General Tool to Analyze WAN Degradation. Behnaz Arzani, Sina Taheri, Pooria Namyar, Ryan Beckett, Siva Kesava Reddy Kakarla, and Elnaz Jalilipour. In Proceedings of the ACM SIGCOMM 2025 Conference, São Francisco Convent, Coimbra, Portugal, 2025.
Raha is the first general tool that can analyze probable degradation of traffic-engineered networks under arbitrary failures and traffic shifts to prevent outages. Raha addresses a significant gap in prior work, which considers only (1) ≤ k failures; (2) specific traffic engineering schemes; and (3) the maximum impact of failures irrespective of the network design point. Our insight is to formulate the problem in terms of heuristic analysis, where one seeks to maximize the performance gap between the network design point (i.e., the network with no failures) and the network under failures. We invent techniques that exploit the mechanisms within existing heuristic analyzers to encode the problem into components they can handle. We present extensive experiments on Microsoft’s production network and those of Topology Zoo that demonstrate Raha is scalable and can effectively solve the problem. We use Raha to propose capacity augments that allow operators to mitigate potential problems and avoid future outages. Our results show Raha can find ≥ 2× higher degradations compared to tools that only consider up to 2 failures.
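For intuition about what "maximizing the performance gap between the design point and the network under failures" means, here is a deliberately naive Python sketch that enumerates ≤ k link failures on a toy network and scores each by the drop in max-flow. Raha's point is precisely to avoid this bounded enumeration and instead encode arbitrary failures and traffic shifts for a solver; the topology and the use of networkx max-flow here are illustrative assumptions.

# Illustrative sketch only: brute-force search over <= k failures on a toy
# network, scoring each failure set by how far it degrades max-flow relative
# to the no-failure design point.
import itertools
import networkx as nx

def worst_degradation(graph, src, dst, max_failures=2):
    """Return (degradation, failed_edges) maximizing baseline minus failed max-flow."""
    baseline = nx.maximum_flow_value(graph, src, dst)
    worst = (0.0, ())
    edges = list(graph.edges)
    for k in range(1, max_failures + 1):
        for failed in itertools.combinations(edges, k):
            g = graph.copy()
            g.remove_edges_from(failed)
            flow = nx.maximum_flow_value(g, src, dst) if nx.has_path(g, src, dst) else 0.0
            worst = max(worst, (baseline - flow, failed))
    return worst

g = nx.DiGraph()
g.add_edge("a", "b", capacity=10)
g.add_edge("a", "c", capacity=10)
g.add_edge("b", "d", capacity=10)
g.add_edge("c", "d", capacity=5)
print(worst_degradation(g, "a", "d"))  # the failure set causing the largest drop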
@inproceedings{Arzani2025Raha, author = {Arzani, Behnaz and Taheri, Sina and Namyar, Pooria and Beckett, Ryan and Kakarla, Siva Kesava Reddy and Jalilipour, Elnaz}, title = {Raha: A General Tool to Analyze WAN Degradation}, year = {2025}, isbn = {9798400715242}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3718958.3754348}, doi = {10.1145/3718958.3754348}, booktitle = {Proceedings of the ACM SIGCOMM 2025 Conference}, pages = {184–202}, numpages = {19}, keywords = {traffic engineering, network performance analysis, network reliability}, location = {S\~{a}o Francisco Convent, Coimbra, Portugal}, series = {SIGCOMM '25}, }
- [NSDI] Enhancing Network Failure Mitigation with Performance-Aware Ranking. Pooria Namyar, Arvin Ghavidel, Daniel Crankshaw, Daniel S. Berger, Kevin Hsieh, Srikanth Kandula, Ramesh Govindan, and Behnaz Arzani. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25), Apr 2025.
Cloud providers install mitigations to reduce the impact of network failures within their datacenters. Existing network mitigation systems rely on simple local criteria or global proxy metrics to determine the best action. In this paper, we show that we can support a broader range of actions and select more effective mitigations by directly optimizing end-to-end flow-level metrics and analyzing actions holistically. To achieve this, we develop novel techniques to quickly estimate the impact of different mitigations and rank them with high fidelity. Our results on incidents from a large cloud provider show orders of magnitude improvements in flow completion time and throughput. We also show our approach scales to large datacenters.
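A minimal sketch of the ranking idea, assuming a hypothetical estimator of flow completion time under each candidate action; the action names and the estimator interface are invented for illustration and are not the paper's API.

# Illustrative sketch only: rank candidate mitigations by an estimated
# end-to-end flow-level metric, as the abstract describes at a high level.
def rank_mitigations(actions, estimate_mean_fct):
    """Return actions sorted so the most effective mitigation comes first.

    actions: iterable of candidate mitigations (e.g., "disable_link_X")
    estimate_mean_fct: callable(action) -> predicted mean flow completion time
    """
    scored = [(estimate_mean_fct(a), a) for a in actions]
    scored.sort()  # lower predicted FCT is better
    return [a for _, a in scored]

# Usage with a stand-in estimator (a real system would use a fast model or
# simulation of the datacenter under each action).
fake_fct = {"do_nothing": 12.0, "disable_link_X": 4.5, "reroute_pod_Y": 6.1}
print(rank_mitigations(fake_fct, fake_fct.__getitem__))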
@inproceedings{namyar2025mitigation, author = {Namyar, Pooria and Ghavidel, Arvin and Crankshaw, Daniel and Berger, Daniel S. and Hsieh, Kevin and Kandula, Srikanth and Govindan, Ramesh and Arzani, Behnaz}, title = {Enhancing Network Failure Mitigation with {Performance-Aware} Ranking}, booktitle = {22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)}, year = {2025}, isbn = {978-1-939133-46-5}, address = {Philadelphia, PA}, pages = {335--357}, url = {https://www.usenix.org/conference/nsdi25/presentation/namyar}, publisher = {USENIX Association}, month = apr, }
- [NSDI] Everything Matters in Programmable Packet Scheduling. Albert Gran Alcoz, Balázs Vass, Pooria Namyar, Behnaz Arzani, Gabor Retvari, and Laurent Vanbever. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25), Apr 2025.
Operators can deploy any scheduler they desire on existing switches through programmable packet schedulers: they tag packets with ranks (which indicate their priority) and schedule them in the order of these ranks. The ideal programmable scheduler is the Push-In First-Out (PIFO) queue, which schedules packets in a perfectly sorted order by “pushing” packets into any position of the queue based on their ranks. However, it is hard to implement PIFO queues in hardware due to their need to sort packets at line rate (based on their ranks). Recent proposals approximate PIFO behaviors on existing data-planes. While promising, they fail to simultaneously capture both of the necessary behaviors of PIFO queues: their scheduling behavior and admission control. We introduce PACKS, an approximate PIFO scheduler that addresses this problem. PACKS runs on top of a set of priority queues and uses packet-rank information and queue-occupancy levels during enqueue to determine whether to admit each incoming packet and to which queue it should be mapped. We fully implement PACKS in P4 and evaluate it on real workloads. We show that PACKS better approximates PIFO than state-of-the-art approaches. Specifically, PACKS reduces the rank inversions by up to 7× and 15× with respect to SP-PIFO and AIFO, and the number of packet drops by up to 60% compared to SP-PIFO. Under pFabric ranks, PACKS reduces the mean FCT across small flows by up to 33% and 2.6×, compared to SP-PIFO and AIFO. We also show that PACKS runs at line rate on existing hardware (Intel Tofino).
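To make the enqueue-time logic concrete, here is a small Python sketch of a rank-to-priority-queue mapping with occupancy-aware admission. The fixed rank boundaries, queue depth, and drop policy are simplifying assumptions; PACKS adapts its decisions to observed ranks and queue occupancy rather than using static thresholds.

# Illustrative sketch only: admit or drop each packet at enqueue time, then
# map its rank to one of a few strict-priority queues.
from collections import deque

QUEUE_DEPTH = 64

def enqueue(packet_rank, queues, rank_boundaries):
    """queues: list of deques, highest priority first.
    rank_boundaries: ascending rank upper bounds, one per queue."""
    for q, bound in zip(queues, rank_boundaries):
        if packet_rank <= bound:            # first queue covering this rank
            if len(q) < QUEUE_DEPTH:        # occupancy-aware admission
                q.append(packet_rank)
                return True
            return False                     # queue full: drop rather than invert order
    return False                             # rank beyond all boundaries: drop

queues = [deque(), deque(), deque()]
for rank in [3, 70, 15, 200, 42]:
    enqueue(rank, queues, rank_boundaries=[16, 64, 256])
print([list(q) for q in queues])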
@inproceedings{alcoz2025packs, author = {Alcoz, Albert Gran and Vass, Bal{\'a}zs and Namyar, Pooria and Arzani, Behnaz and Retvari, Gabor and Vanbever, Laurent}, title = {Everything Matters in Programmable Packet Scheduling}, booktitle = {22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)}, year = {2025}, isbn = {978-1-939133-46-5}, address = {Philadelphia, PA}, pages = {1467--1485}, url = {https://www.usenix.org/conference/nsdi25/presentation/alcoz}, publisher = {USENIX Association}, month = apr, }
2024
- [NSDI] Solving Max-Min Fair Resource Allocations Quickly on Large Graphs. Pooria Namyar, Behnaz Arzani, Srikanth Kandula, Santiago Segarra, Daniel Crankshaw, Umesh Krishnaswamy, Ramesh Govindan, and Himanshu Raj. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), Apr 2024.
We consider the max-min fair resource allocation problem. The best-known solutions use either a sequence of optimizations or waterfilling, which only applies to a narrow set of cases. These solutions have become a practical bottleneck in WAN traffic engineering and cluster scheduling, especially at larger problem sizes. We improve both approaches: (1) we show how to convert the optimization sequence into a single fast optimization, and (2) we generalize waterfilling to the multi-path case. We empirically show our new algorithms Pareto-dominate prior techniques: they produce faster, fairer, and more efficient allocations. Some of our allocators also have theoretical guarantees: they trade off a bounded amount of unfairness for faster allocation. We have deployed our allocators in Azure’s WAN traffic engineering pipeline, where we preserve solution quality and achieve a roughly 3× speedup.
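For readers unfamiliar with waterfilling, here is the textbook single-resource version in Python. The paper's contribution is speeding up the optimization-sequence approach and generalizing waterfilling to multi-path allocations on large graphs, which this sketch does not attempt.

# Illustrative sketch only: classic single-resource waterfilling for max-min fairness.
def waterfill(capacity, demands):
    """Split `capacity` among `demands` (dict flow -> demand) max-min fairly."""
    alloc = {f: 0.0 for f in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        share = remaining / len(active)
        satisfied = {f for f in active if demands[f] - alloc[f] <= share}
        if not satisfied:
            # No flow can be fully satisfied: give everyone an equal share and stop.
            for f in active:
                alloc[f] += share
            remaining = 0.0
        else:
            # Freeze satisfied flows at their demand and redistribute the leftover.
            for f in satisfied:
                remaining -= demands[f] - alloc[f]
                alloc[f] = demands[f]
            active -= satisfied
    return alloc

print(waterfill(10.0, {"a": 2.0, "b": 5.0, "c": 9.0}))  # {'a': 2.0, 'b': 4.0, 'c': 4.0}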
@inproceedings{namyar2024maxminfair, author = {Namyar, Pooria and Arzani, Behnaz and Kandula, Srikanth and Segarra, Santiago and Crankshaw, Daniel and Krishnaswamy, Umesh and Govindan, Ramesh and Raj, Himanshu}, title = {Solving {Max-Min} Fair Resource Allocations Quickly on Large Graphs}, booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)}, year = {2024}, isbn = {978-1-939133-39-7}, address = {Santa Clara, CA}, pages = {1937--1958}, url = {https://www.usenix.org/conference/nsdi24/presentation/namyar-solving}, publisher = {USENIX Association}, month = apr, }
- [NSDI] Finding Adversarial Inputs for Heuristics using Multi-level Optimization. Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Himanshu Raj, Umesh Krishnaswamy, Ramesh Govindan, and Srikanth Kandula. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), Apr 2024.
Production systems use heuristics because they are faster or scale better than their optimal counterparts. Yet, practitioners are often unaware of the performance gap between a heuristic and the optimum or between two heuristics in realistic scenarios. We present MetaOpt, a system that helps analyze heuristics. Users specify the heuristic and the optimal (or another heuristic) as input, and MetaOpt automatically encodes these efficiently for a solver to find performance gaps and their corresponding adversarial inputs. Its suite of built-in optimizations helps it scale its analysis to practical problem sizes. To show it is versatile, we used MetaOpt to analyze heuristics from three domains (traffic engineering, vector bin packing, and packet scheduling). We found a production traffic engineering heuristic can require 30% more capacity than the optimal to satisfy realistic demands. Based on the patterns in the adversarial inputs MetaOpt produced, we modified the heuristic to reduce its performance gap by 12.5×. We examined adversarial inputs to a vector bin packing heuristic and proved a new lower bound on its performance.
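The sketch below conveys what a "performance gap" and an "adversarial input" look like on a toy heuristic (first-fit bin packing versus a brute-force optimum), using plain random search. MetaOpt instead encodes the heuristic and the optimal into a solver via multi-level optimization; the item sizes and search budget here are arbitrary illustrative choices.

# Illustrative sketch only: random search for inputs that maximize the gap
# between a toy heuristic and the optimum, on tiny bin-packing instances.
import itertools
import random

def first_fit(items, cap=1.0):
    bins = []
    for x in items:
        for b in bins:
            if sum(b) + x <= cap:
                b.append(x)
                break
        else:
            bins.append([x])
    return len(bins)

def optimal(items, cap=1.0):
    # Brute force: try every assignment of items to bins (fine for tiny inputs).
    n, best = len(items), len(items)
    for assign in itertools.product(range(n), repeat=n):
        loads = [0.0] * n
        for size, b in zip(items, assign):
            loads[b] += size
        if all(load <= cap for load in loads):
            best = min(best, len(set(assign)))
    return best

def adversarial_search(trials=200, n_items=5, seed=0):
    rng = random.Random(seed)
    worst_gap, worst_input = -1, None
    for _ in range(trials):
        items = [round(rng.uniform(0.2, 0.8), 2) for _ in range(n_items)]
        gap = first_fit(items) - optimal(items)
        if gap > worst_gap:
            worst_gap, worst_input = gap, items
    return worst_gap, worst_input

print(adversarial_search())  # the largest heuristic-vs-optimal gap found, and its input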
@inproceedings{namyar2024metaopt, author = {Namyar, Pooria and Arzani, Behnaz and Beckett, Ryan and Segarra, Santiago and Raj, Himanshu and Krishnaswamy, Umesh and Govindan, Ramesh and Kandula, Srikanth}, title = {Finding Adversarial Inputs for Heuristics using Multi-level Optimization}, booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)}, year = {2024}, isbn = {978-1-939133-39-7}, address = {Santa Clara, CA}, pages = {927--949}, url = {https://www.usenix.org/conference/nsdi24/presentation/namyar-finding}, publisher = {USENIX Association}, month = apr, }
- [HotNets] End-to-End Performance Analysis of Learning-enabled Systems. Pooria Namyar, Michael Schapira, Ramesh Govindan, Santiago Segarra, Ryan Beckett, Siva Kesava Reddy Kakarla, and Behnaz Arzani. In Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, Irvine, CA, USA, 2024.
We propose a performance analysis tool for learning-enabled systems that allows operators to uncover potential performance issues before deploying DNNs in their systems. The tools that exist for this purpose require operators to faithfully model all components (a white-box approach) or do inefficient black-box local search. We propose a gray-box alternative, which eliminates the need to precisely model all the system’s components. Our approach is faster and finds substantially worse scenarios compared to prior work. We show that a state-of-the-art learning-enabled traffic engineering pipeline can underperform the optimal by 6×, a much higher number compared to what the authors found.
@inproceedings{namyar2024learning, author = {Namyar, Pooria and Schapira, Michael and Govindan, Ramesh and Segarra, Santiago and Beckett, Ryan and Kakarla, Siva Kesava Reddy and Arzani, Behnaz}, title = {End-to-End Performance Analysis of Learning-enabled Systems}, year = {2024}, isbn = {9798400712722}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3696348.3696875}, doi = {10.1145/3696348.3696875}, booktitle = {Proceedings of the 23rd ACM Workshop on Hot Topics in Networks}, pages = {86–94}, numpages = {9}, keywords = {Machine Learning for Systems, Performance Analysis}, location = {Irvine, CA, USA}, series = {HOTNETS '24}, }
- [HotNets] Towards Safer Heuristics With XPlain. Pantea Karimi, Solal Pirelli, Siva Kesava Reddy Kakarla, Ryan Beckett, Santiago Segarra, Beibin Li, Pooria Namyar, and Behnaz Arzani. In Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, Irvine, CA, USA, 2024.
Many problems that cloud operators solve are computationally expensive, and operators often use heuristic algorithms (that are faster and scale better than optimal) to solve them more efficiently. Heuristic analyzers enable operators to find when and by how much their heuristics underperform. However, these tools do not provide enough detail for operators to mitigate the heuristic’s impact in practice: they only discover a single input instance that causes the heuristic to underperform (and not the full set) and they do not explain why. We propose XPlain, a tool that extends these analyzers and helps operators understand when and why their heuristics underperform. We present promising initial results that show such an extension is viable.
@inproceedings{karimi2024Xplain, author = {Karimi, Pantea and Pirelli, Solal and Kakarla, Siva Kesava Reddy and Beckett, Ryan and Segarra, Santiago and Li, Beibin and Namyar, Pooria and Arzani, Behnaz}, title = {Towards Safer Heuristics With XPlain}, year = {2024}, isbn = {9798400712722}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3696348.3696884}, doi = {10.1145/3696348.3696884}, booktitle = {Proceedings of the 23rd ACM Workshop on Hot Topics in Networks}, pages = {68–76}, numpages = {9}, keywords = {Domain-Specific Language, Explainable Analysis, Heuristic Analysis}, location = {Irvine, CA, USA}, series = {HOTNETS '24}, }
2023
- [ToN] Optimal Oblivious Routing With Concave Objectives for Structured Networks. Kanatip Chitavisutthivong, Sucha Supittayapornpong, Pooria Namyar, Mingyang Zhang, Minlan Yu, and Ramesh Govindan. IEEE/ACM Transactions on Networking, 2023.
Oblivious routing distributes traffic from sources to destinations following predefined routes with rules independent of traffic demands. While finding optimal oblivious routing with a concave objective is intractable for general topologies, we show that it is tractable for structured topologies often used in datacenter networks. To achieve this, we apply graph automorphism and prove the existence of the optimal automorphism-invariant solution. This result reduces the search space to targeting the optimal automorphism-invariant solution. We design an iterative algorithm to obtain such a solution by alternating between convex optimization and a linear program. The convex optimization finds an automorphism-invariant solution based on representative variables and constraints, making the problem tractable. The linear program generates adversarial demands to ensure the final result satisfies all possible demands. Since constructing the representative variables and constraints is a combinatorial problem, we design polynomial-time algorithms for the construction. We evaluate the iterative algorithm in terms of throughput performance, scalability, and generality over three potential applications. The algorithm i) improves throughput by up to 87.5% for a partially deployed FatTree and achieves up to 2.55× throughput gain for DRing over heuristic algorithms, ii) scales to the three considered topologies with a thousand switches, iii) applies to a general structured topology with non-uniform link capacity and server distribution.
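The alternation the abstract describes (solve a restricted routing problem, then generate an adversarial demand, and repeat) can be illustrated on a toy two-link, two-flow example. This Python sketch uses a grid search and demand-polytope vertices in place of the paper's convex program, linear program, and automorphism-based reduction; the capacities, demand set, and grid resolution are all assumptions of the sketch.

# Illustrative sketch only: alternate between (1) picking the best oblivious
# split against the demands seen so far and (2) finding the demand that makes
# that split look worst, until no new adversarial demand appears.
import itertools

CAPS = (1.0, 1.0)                      # two parallel links
DEMAND_VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 0.5), (0.5, 1.0)]

def max_util(split, demand):
    (x, y), (da, db) = split, demand   # x, y: fraction of each flow on link 1
    link1 = x * da + y * db
    link2 = (1 - x) * da + (1 - y) * db
    return max(link1 / CAPS[0], link2 / CAPS[1])

def best_split(demands):
    grid = [i / 20 for i in range(21)]
    return min(itertools.product(grid, grid),
               key=lambda s: max(max_util(s, d) for d in demands))

def worst_demand(split):
    return max(DEMAND_VERTICES, key=lambda d: max_util(split, d))

demands = [DEMAND_VERTICES[0]]
for _ in range(10):                    # restrict, then separate
    split = best_split(demands)
    adversarial = worst_demand(split)
    if adversarial in demands:
        break
    demands.append(adversarial)
print(split, max_util(split, worst_demand(split)))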
@article{Chitavis2023, author = {Chitavisutthivong, Kanatip and Supittayapornpong, Sucha and Namyar, Pooria and Zhang, Mingyang and Yu, Minlan and Govindan, Ramesh}, journal = {IEEE/ACM Transactions on Networking}, title = {Optimal Oblivious Routing With Concave Objectives for Structured Networks}, year = {2023}, pages = {1-13}, doi = {10.1109/TNET.2023.3264632}, url = {https://ieeexplore.ieee.org/document/10100699} }
2022
- [HotNets] Minding the Gap between Fast Heuristics and Their Optimal Counterparts. Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Himanshu Raj, and Srikanth Kandula. In Proceedings of the 21st ACM Workshop on Hot Topics in Networks, Austin, Texas, 2022.
Production systems use heuristics because they are faster or scale better than the corresponding optimal algorithms. Yet, practitioners are often unaware of how much worse off a heuristic’s solution may be with respect to the optimum in realistic scenarios. Leveraging two-stage games and convex optimization, we present a provable framework that unveils settings where a given heuristic underperforms.
@inproceedings{HeuristicVerifier, author = {Namyar, Pooria and Arzani, Behnaz and Beckett, Ryan and Segarra, Santiago and Raj, Himanshu and Kandula, Srikanth}, title = {Minding the Gap between Fast Heuristics and Their Optimal Counterparts}, year = {2022}, isbn = {9781450398992}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3563766.3564102}, doi = {10.1145/3563766.3564102}, booktitle = {Proceedings of the 21st ACM Workshop on Hot Topics in Networks}, pages = {138–144}, numpages = {7}, keywords = {network management, adversarial inputs, heuristics}, location = {Austin, Texas}, series = {HotNets '22} }
- [INFOCOM] Optimal Oblivious Routing for Structured Networks. Sucha Supittayapornpong, Pooria Namyar, Mingyang Zhang, Minlan Yu, and Ramesh Govindan. In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications, 2022.
Oblivious routing distributes traffic from sources to destinations following predefined routes with rules independent of traffic demands. While finding optimal oblivious routing is intractable for general topologies, we show that it is tractable for structured topologies often used in datacenter networks. To achieve this, we apply graph automorphism and prove the existence of the optimal automorphism-invariant solution. This result reduces the search space to targeting the optimal automorphism-invariant solution. We design an iterative algorithm to obtain such a solution by alternating between two linear programs. The first program finds an automorphism-invariant solution based on representative variables and constraints, making the problem tractable. The second program generates adversarial demands to ensure the final result satisfies all possible demands. Since the construction of the representative variables and constraints is a combinatorial problem, we design polynomial-time algorithms for the construction. We evaluate the proposed iterative algorithm in terms of throughput performance, scalability, and generality over three potential applications. The algorithm i) improves throughput by up to 87.5% over a heuristic algorithm for a partially deployed FatTree, ii) scales for FatClique with a thousand switches, iii) is applicable to a general structured topology with non-uniform link capacity and server distribution.
@inproceedings{Sucha2022, author = {Supittayapornpong, Sucha and Namyar, Pooria and Zhang, Mingyang and Yu, Minlan and Govindan, Ramesh}, booktitle = {IEEE INFOCOM 2022 - IEEE Conference on Computer Communications}, title = {Optimal Oblivious Routing for Structured Networks}, url = {https://ieeexplore.ieee.org/abstract/document/9796682}, year = {2022}, volume = {}, number = {}, pages = {1988-1997}, doi = {10.1109/INFOCOM48880.2022.9796682} }
2021
- [SIGCOMM] A Throughput-Centric View of the Performance of Datacenter Topologies. Pooria Namyar, Sucha Supittayapornpong, Mingyang Zhang, Minlan Yu, and Ramesh Govindan. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference, Aug 2021.
While prior work has explored many proposed datacenter designs, only two designs, Clos-based and expander-based, are generally considered practical because they can scale using commodity switching chips. Prior work has used two different metrics, bisection bandwidth and throughput, for evaluating these topologies at scale. Little is known, theoretically or practically, how these metrics relate to each other. Exploiting characteristics of these topologies, we prove an upper bound on their throughput, then show that this upper bound better estimates worst-case throughput than all previously proposed throughput estimators and scales better than most of them. Using this upper bound, we show that for expander-based topologies, unlike Clos, beyond a certain size of the network, no topology can have full throughput, even if it has full bisection bandwidth; in fact, even relatively small expander-based topologies fail to achieve full throughput. We conclude by showing that using throughput to evaluate datacenter performance instead of bisection bandwidth can alter conclusions in prior work about datacenter cost, manageability, and reliability.
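As background on throughput upper bounds, here is the standard path-length-based bound for uniform all-to-all traffic, in Python with networkx. It is a well-known, generally looser bound shown only for intuition; the paper proves a different, tighter upper bound that exploits the structure of Clos-based and expander-based topologies. The switch-only topology, unit link capacities, and the random regular graph are assumptions of this sketch.

# Illustrative sketch only: classic path-length upper bound on per-flow
# throughput under uniform all-to-all traffic. Each unit of flow consumes
# capacity on at least its shortest-path number of hops, so total demand
# times average hop count cannot exceed total link capacity.
import networkx as nx

def path_length_throughput_bound(graph, link_capacity=1.0):
    n = graph.number_of_nodes()
    num_flows = n * (n - 1)
    avg_hops = nx.average_shortest_path_length(graph)
    total_capacity = 2 * graph.number_of_edges() * link_capacity  # both directions
    return total_capacity / (num_flows * avg_hops)

g = nx.random_regular_graph(d=8, n=64, seed=1)  # small expander-like topology
print(path_length_throughput_bound(g))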
@inproceedings{PooriaTUB, title = {A Throughput-Centric View of the Performance of Datacenter Topologies}, author = {Namyar, Pooria and Supittayapornpong, Sucha and Zhang, Mingyang and Yu, Minlan and Govindan, Ramesh}, url = {https://dx.doi.org/10.1145/3452296.3472913}, doi = {10.1145/3452296.3472913}, isbn = {9781450383837}, year = {2021}, date = {2021-08-09}, urldate = {2021-08-09}, booktitle = {Proceedings of the 2021 ACM SIGCOMM 2021 Conference}, pages = {349–369}, publisher = {Association for Computing Machinery}, address = {Virtual Event, USA}, series = {SIGCOMM '21}, keywords = {datacenter, network management, nsl, throughput}, pubstate = {published}, tppubtype = {inproceedings}, }