Publications
See a full list on Google Scholar
Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs
Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak ASPLOS 2025 [paper]
LLM360: Towards Fully Transparent Open-source LLMs
LLM360 Team COLM 2024 [paper] [blog]
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang ICLR 2024 [paper]
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica NeurIPS 2023 Datasets and Benchmarks [paper]
On Optimizing the Communication of Model Parallelism
Yonghao Zhuang *, Hexu Zhao *, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang MLSys 2023 [paper]
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Lianmin Zheng *, Zhuohan Li *, Hao Zhang *, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica OSDI 2022 [paper] [code]