
I joined OpenAI shortly after ChatGPT launched and have worked on both training infra and systems. I was a founding member of the Sora project. Notable contributions:
• Agent and sampling infra, including a more efficient sampling algorithm for evals
• Collective communication algorithms that achieved better performance and scalability
• A pretraining framework that abstracts away system complexity and makes model research easier
• Data and caching systems that remove bottlenecks for large- and small-scale training workloads
• Systems that improve the resiliency of training workloads

I joined Anyscale, the company behind Ray, as the co-TL for infra. Notable contributions:
• Collaborated with OpenAI to improve the end-to-end startup time for launching new clusters and jobs within a cluster
• OOM detection and handling to respond to CPU memory pressure more gracefully
• Instance picker to optimize cluster cost