* indicates equal contribution. † indicates equal advising.
- Direct Gradient Temporal Difference Learning.
Xiaochi Qian, Shangtong Zhang.
arXiv:2308.01170, 2023.
- Improving Monte Carlo Evaluation with Offline Data.
Shuze Liu, Shangtong Zhang.
arXiv:2301.13734, 2023.
- A New Challenge in Policy Evaluation.
Shangtong Zhang.
New Faculty Highlights at AAAI Conference on Artificial Intelligence (AAAI), 2023.
- On the Convergence of SARSA with Linear Function Approximation.
Shangtong Zhang, Remi Tachet des Combes, Romain Laroche.
International Conference on Machine Learning (ICML), 2023.
- Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch.
Shangtong Zhang, Remi Tachet des Combes†, Romain Laroche†.
Journal of Machine Learning Research (JMLR), 2022.
- Truncated Emphatic Temporal Difference Methods for Prediction and Control.
Shangtong Zhang, Shimon Whiteson.
Journal of Machine Learning Research (JMLR), 2022.
- A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms.
Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes.
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2022.
Oral Presentation.
- Learning Expected Emphatic Traces for Deep RL.
Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt.
AAAI Conference on Artificial Intelligence (AAAI), 2022.
- StarCraft II Unplugged: Large Scale Offline Reinforcement Learning.
Michael Mathieu*, Sherjil Ozair*, Srivatsan Srinivasan*, Caglar Gulcehre*, Shangtong Zhang*, Ray Jiang*, Tom Le Paine*, Richard Powell, Konrad Zolna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gomez Colmenarejo, Aaron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals.
arXiv:2308.03526, 2023.
Deep RL Workshop at NeurIPS, 2021.
- Breaking the Deadly Triad with a Target Network.
Shangtong Zhang, Hengshuai Yao, Shimon Whiteson.
International Conference on Machine Learning (ICML), 2021.
- Average-Reward Off-Policy Policy Evaluation with Function Approximation.
Shangtong Zhang*, Yi Wan*, Richard S. Sutton, Shimon Whiteson.
International Conference on Machine Learning (ICML), 2021.
- Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning.
Shangtong Zhang, Bo Liu, Shimon Whiteson.
AAAI Conference on Artificial Intelligence (AAAI), 2021.
- Learning Retrospective Knowledge with Reverse Reinforcement Learning.
Shangtong Zhang, Vivek Veeriah, Shimon Whiteson.
Conference on Neural Information Processing Systems (NeurIPS), 2020.
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values.
Shangtong Zhang, Bo Liu, Shimon Whiteson.
International Conference on Machine Learning (ICML), 2020.
- Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation.
Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson.
International Conference on Machine Learning (ICML), 2020.
- Deep Residual Reinforcement Learning.
Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson.
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2020.
Best Paper Award.
- Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards.
Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Shangtong Zhang, Mai Xu.
AAAI Conference on Artificial Intelligence (AAAI), 2020.
- Generalized Off-Policy Actor-Critic.
Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson.
Conference on Neural Information Processing Systems (NeurIPS), 2019.
- DAC: The Double Actor-Critic Architecture for Learning Options.
Shangtong Zhang, Shimon Whiteson.
Conference on Neural Information Processing Systems (NeurIPS), 2019.
- Distributional Reinforcement Learning for Efficient Exploration.
Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yaoliang Yu.
International Conference on Machine Learning (ICML), 2019.
A short version was accepted as an extended abstract at AAMAS 2019.
- ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.
Shangtong Zhang, Hao Chen, Hengshuai Yao.
AAAI Conference on Artificial Intelligence (AAAI), 2019.
- QUOTA: The Quantile Option Architecture for Reinforcement Learning.
Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao.
AAAI Conference on Artificial Intelligence (AAAI), 2019.
- mlpack 3: A Fast, Flexible Machine Learning Library.
Ryan R. Curtin, Marcus Edel, Mikhail Lozhnikov, Yannis Mentekidis, Sumedh Ghaisas, Shangtong Zhang.
Journal of Open Source Software (JOSS), 2018.
- Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks.
Vivek Veeriah*, Shangtong Zhang*, Richard S. Sutton.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2017.
- A Deeper Look at Experience Replay.
Shangtong Zhang, Richard S. Sutton.
Deep RL Symposium at NIPS, 2017.
- Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control.
Shangtong Zhang, Osmar R. Zaiane.
Deep RL Symposium at NIPS, 2017.
- A Demon Control Architecture with Off-Policy Learning and Flexible Behavior Policy.
Shangtong Zhang, Richard S. Sutton.
Hierarchical RL Workshop at NIPS, 2017.
- A Deep Neural Network for Modeling Music.
Pengjing Zhang, Xiaoqing Zheng, Wenqiang Zhang, Siyan Li, Sheng Qian, Wenqi He, Shangtong Zhang, Ziyuan Wang.
International Conference on Multimedia Retrieval (ICMR), 2015.