Rohin Shah

Cited by

	All	Since 2019
Citations	939	889
h-index	15	15
i10-index	17	17

Year	Citations
2015	6
2016	16
2017	16
2018	11
2019	14
2020	61
2021	116
2022	191
2023	361
2024	145

Public access

4 articles available

0 articles not available

Based on funding mandates

Rohin Shah

Research Scientist, Google DeepMind

Verified email at deepmind.com

AI alignment


Title	Authors	Venue	Cited by	Year
On the utility of learning about humans for human-AI coordination	M Carroll, R Shah, MK Ho, T Griffiths, S Seshia, P Abbeel, A Dragan	Advances in Neural Information Processing Systems, 5174-5185, 2019	325	2019
Chlorophyll: Synthesis-aided compiler for low-power spatial architectures	PM Phothilimthana, T Jelvis, R Shah, N Totla, S Chasins, R Bodik	ACM SIGPLAN Notices 49 (6), 396-407, 2014	84	2014
Preferences Implicit in the State of the World	R Shah, D Krasheninnikov, J Alexander, P Abbeel, A Dragan	arXiv preprint arXiv:1902.04198, 2019	78*	2019
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference	R Shah, N Gundotra, P Abbeel, A Dragan	International Conference on Machine Learning, 5670-5679, 2019	64	2019
Optimal Policies Tend to Seek Power	AM Turner, L Smith, R Shah, A Critch, P Tadepalli	arXiv preprint arXiv:1912.01683, 2019	61*	2019
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals	R Shah, V Varma, R Kumar, M Phuong, V Krakovna, J Uesato, Z Kenton	arXiv preprint arXiv:2210.01790, 2022	45	2022
The MAGICAL Benchmark for Robust Imitation	S Toyer, R Shah, A Critch, S Russell	Advances in Neural Information Processing Systems 33, 2020	43	2020
An Empirical Investigation of Representation Learning for Imitation	X Chen, S Toyer, C Wild, S Emmons, I Fischer, KH Lee, N Alex, SH Wang, ...	Thirty-fifth Conference on Neural Information Processing Systems Datasets …, 2021	29	2021
Active Inverse Reward Design	S Mindermann, R Shah, A Gleave, D Hadfield-Menell	arXiv preprint arXiv:1809.03060, 2018	28	2018
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla	T Lieberum, M Rahtz, J Kramár, G Irving, R Shah, V Mikulik	arXiv preprint arXiv:2307.09458, 2023	26	2023
The MineRL BASALT Competition on Learning from Human Feedback	R Shah, C Wild, SH Wang, N Alex, B Houghton, W Guss, S Mohanty, ...	arXiv preprint arXiv:2107.01969, 2021	26	2021
Evaluating the Robustness of Collaborative Agents	P Knott, M Carroll, S Devlin, K Ciosek, K Hofmann, AD Dragan, R Shah	arXiv preprint arXiv:2101.05507, 2021	26	2021
Benefits of Assistance over Reward Learning	R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...		22	2020
Explaining grokking through circuit efficiency	V Varma, R Shah, Z Kenton, J Kramár, R Kumar	arXiv preprint arXiv:2309.02390, 2023	16	2023
Choice Set Misspecification in Reward Inference	R Freedman, R Shah, A Dragan	CEUR Workshop Proceedings, 2020	16	2020
SIRL: Similarity-based Implicit Representation Learning	A Bobu, Y Liu, R Shah, DS Brown, AD Dragan	Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot …, 2023	13	2023
Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition	S Milani, A Kanervisto, K Ramanauskas, S Schulhoff, B Houghton, ...	arXiv preprint arXiv:2303.13512, 2023	10	2023
Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback	R Shah, SH Wang, C Wild, S Milani, A Kanervisto, VG Goecks, ...	NeurIPS 2021 Competitions and Demonstrations Track, 259-272, 2022	8*	2022
Challenges with unsupervised LLM knowledge discovery	S Farquhar, V Varma, Z Kenton, J Gasteiger, V Mikulik, R Shah	arXiv preprint arXiv:2312.10029, 2023	4	2023
AtP*: An efficient and scalable method for localizing LLM behaviour to components	J Kramár, T Lieberum, R Shah, N Nanda	arXiv preprint arXiv:2403.00745, 2024	3	2024
