The log-derivative trick

Reviews

4.2 (404 ratings)
  • 5 stars: 57.17%
  • 4 stars: 23.76%
  • 3 stars: 8.91%
  • 2 stars: 4.45%
  • 1 star: 5.69%

FZ
February 13, 2019

A great course with very practical assignments to help you learn how to implement RL algorithms. But it also has some stupid quiz questions which make you feel confused.

LJ
October 6, 2019

Challenging (unlike many other courses on Coursera, it does not baby you and does not seem to be targeting as high a pass rate as possible), but very very rewarding.

From this lesson
Policy-based methods
We spent the previous three modules on value-based methods: learning state values, action values and whatnot. Now it's time to see an alternative approach that doesn't require you to predict all future rewards in order to learn something.

Instructors

  • Pavel Shvechikov, Researcher at HSE and Sberbank AI Lab
  • Alexander Panin, Lecturer
