Publications
See a full list on Google Scholar
Efficient Long-context Language Model Training by Core Attention Disaggregation MLSys 2026 [paper]
Yonghao Zhuang, Junda Chen, Bo Pang, Yi Gu, Yibo Zhu, Yimin Jiang, Ion Stoica, Eric Xing, Hao Zhang
Efficiently Scaling LLM Reasoning Programs with Certaindex NeurIPS 2025 [paper]
Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Yonghao Zhuang, Yian Ma, Aurick Qiao, Tajana Rosing, Ion Stoica, Hao Zhang
Scaling Long Context Training Data by Long-Distance Referrals ICLR 2025 [paper]
Yonghao Zhuang, Lanxiang Hu, Longfei Yun, Souvik Kundu, Zhengzhong Liu, Eric P. Xing, Hao Zhang
Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs ASPLOS 2025 [paper]
Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak
LLM360: Towards Fully Transparent Open-source LLMs COLM 2024 [blogpost] [blog]
LLM360 Team
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset ICLR 2024 [paper]
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena NeurIPS 2023 Datasets and Benchmarks [paper]
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
On Optimizing the Communication of Model Parallelism MLSys 2023 [paper]
Yonghao Zhuang *, Hexu Zhao *, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning OSDI 2022 [paper] [code]
Lianmin Zheng *, Zhuohan Li *, Hao Zhang *, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica
