This page lists selected publications grouped by topic. Please refer to my CV for a full list of publications in reverse chronological order.
* indicates equal contribution. Advisees are underlined. † indicates equal advising. 🎉 indicates highlighted works.
Stochastic Approximation Theory and Convergence of Reinforcement Learning
- [JMLR 2025] The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise.
Shuze Liu, Shuhang Chen, Shangtong Zhang.
- [arXiv 2024] Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise.
Xiaochi Qian*, Zixuan Xie*, Xinyu Liu*, Shangtong Zhang.
- [arXiv 2024] Almost Sure Convergence of Average Reward Temporal Difference Learning.
Ethan Blaser, Shangtong Zhang.
- [arXiv 2024] Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features.
Jiuqi Wang, Shangtong Zhang.
- [ICML 2023] On the Convergence of SARSA with Linear Function Approximation.
Shangtong Zhang, Remi Tachet des Combes, Romain Laroche.
- [JMLR 2022] Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch.
Shangtong Zhang, Remi Tachet des Combes†, Romain Laroche†.
In-Context Reinforcement Learning
- [ICLR 2025] Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning.
Jiuqi Wang*, Ethan Blaser*, Hadi Daneshmand, Shangtong Zhang.
QuantCo Spotlight Award at the ICML Workshop on In-Context Learning, 2024.
Efficient Monte Carlo Evaluation
- [ICLR 2025] Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning.
Claire Chen*, Shuze Liu*, Shangtong Zhang.
- [ICLR 2025] Doubly Optimal Policy Evaluation for Reinforcement Learning.
Shuze Liu, Claire Chen, Shangtong Zhang.
- [AAAI 2025] Efficient Multi-Policy Evaluation for Reinforcement Learning.
Shuze Liu, Claire Chen, Shangtong Zhang.
Oral Presentation
- [ICML 2024] Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design.
Shuze Liu, Shangtong Zhang.
- [AAAI 2023] A New Challenge in Policy Evaluation.
Shangtong Zhang.
Applications
- [arXiv 2024] CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening.
Amar Kulkarni, Shangtong Zhang, Madhur Behl.
- [arXiv 2023] StarCraft II Unplugged: Large Scale Offline Reinforcement Learning.
Michael Mathieu*, Sherjil Ozair*, Srivatsan Srinivasan*, Caglar Gulcehre*, Shangtong Zhang*, Ray Jiang*, Tom Le Paine*, Richard Powell, Konrad Zolna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gomez Colmenarejo, Aaron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals.
Breaking the Deadly Triad
- [ICLR 2025] Revisiting a Design Choice in Gradient Temporal Difference Learning.
Xiaochi Qian, Shangtong Zhang.
- [JMLR 2022] Truncated Emphatic Temporal Difference Methods for Prediction and Control.
Shangtong Zhang, Shimon Whiteson.
- [ICML 2021] Breaking the Deadly Triad with a Target Network.
Shangtong Zhang, Hengshuai Yao, Shimon Whiteson.
- [ICML 2021] Average-Reward Off-Policy Policy Evaluation with Function Approximation.
Shangtong Zhang*, Yi Wan*, Richard S. Sutton, Shimon Whiteson.
- [ICML 2020] GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values.
Shangtong Zhang, Bo Liu, Shimon Whiteson.
- [ICML 2020] Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation.
Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson.
- [AAMAS 2020] Deep Residual Reinforcement Learning.
Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson.
Best Paper Award.
- [NeurIPS 2019] Generalized Off-Policy Actor-Critic.
Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson.
Risk-Averse Reinforcement Learning
- [AAAI 2021] Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning.
Shangtong Zhang, Bo Liu, Shimon Whiteson.
Predictive Knowledge and General Value Function
- [NeurIPS 2020] Learning Retrospective Knowledge with Reverse Reinforcement Learning.
Shangtong Zhang, Vivek Veeriah, Shimon Whiteson.
Hierarchical Reinforcement Learning
- [NeurIPS 2019] DAC: The Double Actor-Critic Architecture for Learning Options.
Shangtong Zhang, Shimon Whiteson.
- [AAAI 2019] ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.
Shangtong Zhang, Hao Chen, Hengshuai Yao.
Spotlight Presentation
- [AAAI 2019] QUOTA: The Quantile Option Architecture for Reinforcement Learning.
Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao.
Oral Presentation
A Deeper Look at Something
- [AAMAS 2022] A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms.
Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes.
Oral Presentation
- [Deep RL Symposium, NIPS 2017] A Deeper Look at Experience Replay.
Shangtong Zhang, Richard S. Sutton.