About

Welcome to my homepage!

I am a Scientist on the AWS Neuron Science team at Annapurna Labs, working on the design and support of large-scale pretraining algorithms for AWS Trainium. I received my Ph.D. in Computer Science from Cornell University, advised by Chris De Sa. I’m also grateful to work with Vitaly Shmatikov on machine learning privacy. Prior to Cornell, I obtained a bachelor's degree in Mathematics (ZhiYuan Honors) from Shanghai Jiao Tong University, where I was fortunate to be advised by John E. Hopcroft and Huan Long.

I work on improving the efficiency of machine learning systems to address growing computational demands. My research focuses on:

Optimizing training and inference efficiency through low-precision methods (e.g., FP8, FP4) and post-training quantization
Robust and scalable learning, including training instabilities and predictive scaling behaviors in LLMs
Designing data-aware representations that leverage the inherent geometry of data

Recently, I have been particularly passionate about algorithm–hardware co-design: understanding the limitations and capabilities of both hardware and software, leveraging their strengths, and developing methods that enable efficient and reliable training and inference. My goal is to simplify the development and deployment of foundation models on hardware. My research interests also extend to private and robust machine learning algorithms.

Tao Yu