Arash Ahmadian on Rethinking RLHF
Arash Ahmadian is a Researcher at Cohere and Cohere For AI focused on preference training of large language models. He's also a researcher at the Vector Institute for Artificial Intelligence.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
Additional References
- Self-Rewarding Language Models, Yuan et al., 2024
- Reinforcement Learning: An Introduction, Sutton and Barto, 1998
- Learning from Delayed Rewards, Chris Watkins, 1989
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams, 1992
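For listeners new to the REINFORCE estimator that the featured paper revisits, here is a minimal, self-contained sketch on a toy bandit problem. This is an illustrative example only, not the paper's RLHF implementation; the arm rewards, learning rate, and running-mean baseline are all assumptions chosen for the demo.

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy 3-armed bandit: arm 2 has the highest expected reward (assumed values).
TRUE_MEANS = [0.1, 0.4, 0.9]

logits = [0.0, 0.0, 0.0]   # policy parameters, one logit per arm
baseline = 0.0             # running-mean reward baseline to reduce variance
lr = 0.1

for step in range(2000):
    probs = softmax(logits)
    # Sample an action from the current softmax policy.
    a = random.choices(range(3), weights=probs)[0]
    r = TRUE_MEANS[a] + random.gauss(0, 0.1)
    baseline += 0.01 * (r - baseline)
    advantage = r - baseline
    # REINFORCE update: for a softmax policy,
    # d/d_logit_i log pi(a) = one_hot(a)_i - probs_i.
    for i in range(3):
        grad_logp = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * advantage * grad_logp

probs = softmax(logits)
print([round(p, 2) for p in probs])  # probability mass should concentrate on the best arm
```

The same score-function gradient, with a reward model standing in for the bandit reward and a language model as the policy, is the "back to basics" alternative to PPO discussed in the episode.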
Creators and Guests
![Robin Ranjit Singh Chauhan](https://img.transistor.fm/qrlD8dfZtBRez_zCzQ6Ah8j57V13cwEjT7sHiXJGHDI/rs:fill:400:400:1/q:60/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.webp)
Host
Robin Ranjit Singh Chauhan
🌱 Head of Eng @AgFunder 🧠 AI:Reinforcement Learning/ML/DL/NLP🎙️Host @TalkRLPodcast 💳 ex-@Microsoft ecomm PgmMgr 🤖 @UWaterloo CompEng 🇨🇦 🇮🇳
![Arash Ahmadian on Rethinking RLHF](https://img.transistor.fm/IUwGz7BU8qkkkR9Rz-bKwAClY1dz6VAWieSkjnQxOM8/rs:fill:800:800:1/q:60/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE4MDY5NzMv/MTcxMTIyNjA1Mi1h/cnR3b3JrLmpwZw.webp)