Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw, Shivam Singhal, and Anca Dragan. arXiv Preprint 2024.
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
Cassidy Laidlaw, Banghua Zhu, Stuart Russell, and Anca Dragan. ICLR 2024.
Spotlight (given to ~16% of accepted papers)
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan*, Cassidy Laidlaw*, and Dylan Hadfield-Menell. ICLR 2024.
Best paper honorable mention at the 2023 NeurIPS Workshop on Instruction Tuning and Instruction Following
Bridging RL Theory and Practice with the Effective Horizon
Cassidy Laidlaw, Stuart Russell, and Anca Dragan. NeurIPS 2023.
Oral (given to ~2% of accepted papers)
Best paper award at the 2023 ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems
The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models
Cassidy Laidlaw and Anca Dragan. ICLR 2022.
Uncertain Decisions Facilitate Better Preference Learning
Cassidy Laidlaw and Stuart Russell. NeurIPS 2021.
Spotlight (given to ~12% of accepted papers)
Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
Cassidy Laidlaw, Sahil Singla, and Soheil Feizi. ICLR 2021.
Functional Adversarial Attacks
Cassidy Laidlaw and Soheil Feizi. NeurIPS 2019.
Capture, Learning, and Synthesis of 3D Speaking Styles
Daniel Cudeiro*, Timo Bolkart*, Cassidy Laidlaw, Anurag Ranjan, and Michael Black. CVPR 2019.
Playing it Safe: Adversarial Robustness with an Abstain Option
Cassidy Laidlaw and Soheil Feizi. arXiv Preprint 2019.