I work at the intersection of model training and ML systems engineering. Scaled up the first diffusion-based transformer for VideoGen, distilled larger models, and researched new architectures to improve long-context performance.
My other expertise is in distributed systems: fault tolerance at scale, consensus protocols, data consistency. Tons of fun (and hard) problems that are well understood at this point, some of which carry over to machine learning.
Computer use (Jan 2026) Improved the reasoning data and trained a foundation model from scratch targeted at computer use.
Grok Fast (Sept 2025) Advocated for Grok Fast and led the midtrain infra work. Scaled the agent pipelines and datasets.
Grok (Jul 2025-) Shipped parallel test-time compute. Enabled GB inference. Scaled and stabilized the big RL run.
Sora (Jan 2024) Shipped Sora 1 as the #1 core contributor, starting as a forward-deployed engineer before transitioning to the project full-time.
Ray (Mar 2023) TL for Anyscale / Ray. Shipped optimistic memory-based scheduling, faster job starts, and autoscaling; worked on the distributed object store & queue.
Kubernetes (Jul 2015) Core contributor to Google's cluster management systems, transitioning the company from Borg to Kubernetes.