Research

In my research, I identify and solve fundamental problems that limit the performance and availability of large-scale cloud networks. To do so, I develop techniques grounded in a broad set of domains, including graph theory, optimization, and formal methods, while closely tying them to real-world systems and their practical constraints.

So far, my approach has led to the development of:

Scalable algorithms and optimization methods with formal guarantees, which have resulted in both conceptual advances and tangible impact in areas such as resource allocation and clock synchronization.
Systems and methods that identify, explain, and effectively address performance gaps in both handcrafted and learning-enabled heuristics.

Currently, I am extending this agenda by redesigning large-scale training and inference pipelines to power the next generation of AI models.

Selected Publications

SIGCOMM
Firefly: Scalable, Ultra-Accurate Clock Synchronization for Datacenters

Pooria Namyar, Yuliang Li, Weitao Wang, Nandita Dukkipati, Kk Yap, Junzhi Gong, Chen Chen, Peixuan Gao, Devdeep Ray, Gautam Kumar, Yidan Ma, Ramesh Govindan, and Amin Vahdat

In Proceedings of the ACM SIGCOMM 2025 Conference, São Francisco Convent, Coimbra, Portugal, 2025

Cloud-based financial exchanges require sub-10ns device-to-device clock synchronization accuracy while adhering to Coordinated Universal Time (UTC). Existing clock sync techniques struggle to meet this demand at scale and are vulnerable to clock drift, jitter, and path asymmetries. Firefly, a software-driven datacenter clock sync system, scalably, cost-effectively, and reliably achieves very high clock sync accuracy. It employs a distributed consensus algorithm on a random overlay graph to rapidly converge to a common time while applying gradual adjustments to device hardware clocks. To realize consistent sync-to-UTC (external sync) across devices while maintaining a stable device-to-device internal sync, Firefly uses a novel technique, layered synchronization, that decouples internal and external syncs. In a 248-machine Clos network, Firefly achieves sub-10ns device-to-device and ≤1μs device-to-UTC sync, and is resilient to time server failure and unstable clocks.
@inproceedings{namyar2025Firefly,
  author = {Namyar, Pooria and Li, Yuliang and Wang, Weitao and Dukkipati, Nandita and Yap, Kk and Gong, Junzhi and Chen, Chen and Gao, Peixuan and Ray, Devdeep and Kumar, Gautam and Ma, Yidan and Govindan, Ramesh and Vahdat, Amin},
  title = {Firefly: Scalable, Ultra-Accurate Clock Synchronization for Datacenters},
  year = {2025},
  isbn = {9798400715242},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3718958.3750502},
  doi = {10.1145/3718958.3750502},
  booktitle = {Proceedings of the ACM SIGCOMM 2025 Conference},
  pages = {434--452},
  numpages = {19},
  keywords = {clock synchronization, financial exchange, UTC synchronization, path asymmetry, distributed consensus},
  location = {S\~{a}o Francisco Convent, Coimbra, Portugal},
  series = {SIGCOMM '25},
}
SIGCOMM
ZENITH: Towards A Formally Verified Highly-Available Control Plane

Pooria Namyar, Arvin Ghavidel, Mingyang Zhang, Harsha V. Madhyastha, Srivatsan Ravi, Chao Wang, and Ramesh Govindan

In Proceedings of the ACM SIGCOMM 2025 Conference, São Francisco Convent, Coimbra, Portugal, 2025

Today, large-scale software-defined networks use microservice-based controllers. Bugs in these controllers can reduce network availability by making the data plane state inconsistent with the high-level intent. To recover from such inconsistencies, modern controllers periodically reconcile the state of all the switches with the desired intent. However, periodic reconciliation limits the availability and performance of the network at scale. We introduce Zenith, a microservice-based controller that avoids inconsistencies by design rather than always relying on recovery mechanisms. We have formally verified Zenith’s specifications and have proved that it ensures the network state will eventually be consistent with intent. We automatically generate Zenith’s code from its specification to minimize the likelihood of errors in the final implementation. Zenith’s guarantees and abstractions also enable developers to independently verify SDN applications and ensure end-to-end safety and correctness. Zenith resolves inconsistencies 5× faster than today’s designs and significantly improves availability.
@inproceedings{namyar2025Zenith,
  author = {Namyar, Pooria and Ghavidel, Arvin and Zhang, Mingyang and Madhyastha, Harsha V. and Ravi, Srivatsan and Wang, Chao and Govindan, Ramesh},
  title = {ZENITH: Towards A Formally Verified Highly-Available Control Plane},
  year = {2025},
  isbn = {9798400715242},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3718958.3750533},
  doi = {10.1145/3718958.3750533},
  booktitle = {Proceedings of the ACM SIGCOMM 2025 Conference},
  pages = {409--433},
  numpages = {25},
  keywords = {software defined networking, formal methods, availability},
  location = {S\~{a}o Francisco Convent, Coimbra, Portugal},
  series = {SIGCOMM '25},
}
NSDI
Solving Max-Min Fair Resource Allocations Quickly on Large Graphs

Pooria Namyar, Behnaz Arzani, Srikanth Kandula, Santiago Segarra, Daniel Crankshaw, Umesh Krishnaswamy, Ramesh Govindan, and Himanshu Raj

In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), Apr 2024

We consider the max-min fair resource allocation problem. The best-known solutions use either a sequence of optimizations or waterfilling, which only applies to a narrow set of cases. These solutions have become a practical bottleneck in WAN traffic engineering and cluster scheduling, especially at larger problem sizes. We improve both approaches: (1) we show how to convert the optimization sequence into a single fast optimization, and (2) we generalize waterfilling to the multi-path case. We empirically show our new algorithms Pareto-dominate prior techniques: they produce faster, fairer, and more efficient allocations. Some of our allocators also have theoretical guarantees: they trade off a bounded amount of unfairness for faster allocation. We have deployed our allocators in Azure’s WAN traffic engineering pipeline, where we preserve solution quality and achieve a roughly 3× speedup.
@inproceedings{namyar2024maxminfair,
  author = {Namyar, Pooria and Arzani, Behnaz and Kandula, Srikanth and Segarra, Santiago and Crankshaw, Daniel and Krishnaswamy, Umesh and Govindan, Ramesh and Raj, Himanshu},
  title = {Solving {Max-Min} Fair Resource Allocations Quickly on Large Graphs},
  booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
  year = {2024},
  isbn = {978-1-939133-39-7},
  address = {Santa Clara, CA},
  pages = {1937--1958},
  url = {https://www.usenix.org/conference/nsdi24/presentation/namyar-solving},
  publisher = {USENIX Association},
  month = apr,
}
NSDI
Finding Adversarial Inputs for Heuristics using Multi-level Optimization

Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Himanshu Raj, Umesh Krishnaswamy, Ramesh Govindan, and Srikanth Kandula

In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), Apr 2024

Production systems use heuristics because they are faster or scale better than their optimal counterparts. Yet, practitioners are often unaware of the performance gap between a heuristic and the optimum or between two heuristics in realistic scenarios. We present MetaOpt, a system that helps analyze heuristics. Users specify the heuristic and the optimal (or another heuristic) as input, and MetaOpt automatically encodes these efficiently for a solver to find performance gaps and their corresponding adversarial inputs. Its suite of built-in optimizations helps it scale its analysis to practical problem sizes. To show it is versatile, we used MetaOpt to analyze heuristics from three domains (traffic engineering, vector bin packing, and packet scheduling). We found a production traffic engineering heuristic can require 30% more capacity than the optimal to satisfy realistic demands. Based on the patterns in the adversarial inputs MetaOpt produced, we modified the heuristic to reduce its performance gap by 12.5×. We examined adversarial inputs to a vector bin packing heuristic and proved a new lower bound on its performance.
@inproceedings{namyar2024metaopt,
  author = {Namyar, Pooria and Arzani, Behnaz and Beckett, Ryan and Segarra, Santiago and Raj, Himanshu and Krishnaswamy, Umesh and Govindan, Ramesh and Kandula, Srikanth},
  title = {Finding Adversarial Inputs for Heuristics using Multi-level Optimization},
  booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
  year = {2024},
  isbn = {978-1-939133-39-7},
  address = {Santa Clara, CA},
  pages = {927--949},
  url = {https://www.usenix.org/conference/nsdi24/presentation/namyar-finding},
  publisher = {USENIX Association},
  month = apr,
}
SIGCOMM
A Throughput-Centric View of the Performance of Datacenter Topologies

Pooria Namyar, Sucha Supittayapornpong, Mingyang Zhang, Minlan Yu, and Ramesh Govindan

In Proceedings of the 2021 ACM SIGCOMM 2021 Conference, Aug 2021

While prior work has explored many proposed datacenter designs, only two designs, Clos-based and expander-based, are generally considered practical because they can scale using commodity switching chips. Prior work has used two different metrics, bisection bandwidth and throughput, for evaluating these topologies at scale. Little is known, theoretically or practically, about how these metrics relate to each other. Exploiting characteristics of these topologies, we prove an upper bound on their throughput, then show that this upper bound better estimates worst-case throughput than all previously proposed throughput estimators and scales better than most of them. Using this upper bound, we show that for expander-based topologies, unlike Clos, beyond a certain size of the network, no topology can have full throughput, even if it has full bisection bandwidth; in fact, even relatively small expander-based topologies fail to achieve full throughput. We conclude by showing that using throughput to evaluate datacenter performance instead of bisection bandwidth can alter conclusions in prior work about datacenter cost, manageability, and reliability.
@inproceedings{PooriaTUB,
  title = {A Throughput-Centric View of the Performance of Datacenter Topologies},
  author = {Namyar, Pooria and Supittayapornpong, Sucha and Zhang, Mingyang and Yu, Minlan and Govindan, Ramesh},
  url = {https://dx.doi.org/10.1145/3452296.3472913},
  doi = {10.1145/3452296.3472913},
  isbn = {9781450383837},
  year = {2021},
  date = {2021-08-09},
  urldate = {2021-08-09},
  booktitle = {Proceedings of the 2021 ACM SIGCOMM 2021 Conference},
  pages = {349--369},
  publisher = {Association for Computing Machinery},
  address = {Virtual Event, USA},
  series = {SIGCOMM '21},
  keywords = {datacenter, network management, nsl, throughput},
  pubstate = {published},
  tppubtype = {inproceedings},
}