Dan Qiao’s Homepage

要人心之自由，胸襟开放。要拿全世界人类曾经走过的路，都要算是我走过的路之一；要有一个远见，能超越你未见。要想办法设想，我没见到的地方，那个世界还有可能什么样。 —— 许倬云

Seek freedom of the heart and an open mind. Count every path that humanity has ever walked as one of the paths you have walked yourself. Have a vision that reaches beyond what you have seen, and try to imagine what the world might be like in the places you have not seen. – Prof. Xu Zhuoyun

About me

I’m currently a fourth-year Ph.D. candidate advised by Assistant Prof. Baoxiang Wang and Prof. Hongyuan Zha in the School of Data Science at the Chinese University of Hong Kong, Shenzhen (CUHKSZ). My research interests are multi-agent reinforcement learning, LLM post-training, multi-agent systems, and social welfare. My Ph.D. topic focuses on “How to maintain cooperation among multiple AI agents in an autonomous AI decision-making society”. For more details, please refer to my Google Scholar profile.

Before that, I received my M.Eng. and B.Eng. degrees in Automotive Engineering, advised by Associate Prof. Zhaoxia Peng, from the School of Transportation Science and Engineering, Beihang University, in 2021 and 2018, respectively. I worked as a research assistant with Prof. Junge Zhang at the Institute of Automation, Chinese Academy of Sciences (CASIA) in 2022. I have also been fortunate to collaborate with Assistant Prof. Wenhao Li at Tongji University, Dr. Binbin Chen at ByteDance Inc., and Dr. Lei Song and Dr. Jiang Bian at Microsoft Research Asia. I am grateful to all my collaborators and mentors for their passion for research and their selfless support throughout my academic journey.

Education

2022 - Present, Ph.D. Candidate - Computer Science, The Chinese University of Hong Kong, Shenzhen, China
2018 - 2021, M.Eng. - Automotive Engineering, Beihang University, China
2014 - 2018, B.Eng. - Automotive Engineering, Beihang University, China

Internship

May 2026 - Present, Algorithm Intern - XYZL AI Research Institute, Shanghai, China
Research topic: Agentic RL post-training pipeline.
Dec 2025 - Apr 2026, Algorithm Intern - Microsoft Research Asia (MSRA), Beijing, China
Research topic: data selection for more efficient Agentic RL training.
Apr 2025 - Nov 2025, Algorithm Intern - ByteDance Inc., Beijing, China
Research topic: MARL fine-tuning and collaborative reasoning with multiple LLMs.
Dec 2024 - Feb 2025, Algorithm Intern - Gusheng Intelligence, Shenzhen, China
Research topic: distributed RL training for racing-game strategy learning.
Sep 2021 - Jun 2022, Research Assistant - Institute of Automation, CAS (CASIA), Beijing, China
Research topic: privacy-preserving distributed networked MARL and convergence analysis.

Research Interests

My research interests include:

Multi-agent Reinforcement Learning
Sequential Social Dilemmas
Large Language Models & Agents
Mechanism Design for Social Welfare

Pre-prints

Y. Lin, S. Zhu, W. Li, A. Li, D. Qiao, P. Poupart, H. Zha, B. Wang, Policy-Conditioned Policies for Multi-Agent Task Solving, 2025. arXiv:2512.21024
D. Qiao, G. Wen, Z. Peng*, Novel Saturated Nussbaum-type Function based Adaptive Distributed Consensus Control of Multi-agent Systems with Unknown Arbitrary Control Directions. Preprint, 2021. arXiv:2201.09453

Publications

D. Qiao, et al., Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning. ICML 2026. arXiv:2603.01221
D. Qiao, W. Li, S. Yang, H. Zha, B. Wang*, Offline Multi-Agent Reinforcement Learning via Sequential Score Decomposition. ICML 2026. arXiv:2505.05968
Z. Li, D. Qiao, A. Rahman, S. Leonardos, Y. Du, S. V. Albrecht, STAR-MARL: LLM-based Sub-task Curricula Design. AAAI 2026 LaMAS Workshop.
W. Li, D. Qiao, B. Wang, X. Wang, B. Jin, H. Zha*, Multi-Agent Credit Assignment with Pretrained Language Models. AISTATS 2025. [pdf]

Misc

You are welcome to follow my Zhihu and BiliBili accounts.

Contact

Office: Floor 4, Zhixin Building, CUHKSZ, Shenzhen, 518172