### What you will learn

Build a Reinforcement Learning system for sequential decision making.

Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).

Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.

Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised learning, and unsupervised learning.

### Skills you will gain

## About this Specialization

## Applied Learning Project

Through programming assignments and quizzes, students will:

Build a Reinforcement Learning system that knows how to make automated decisions.

Understand how RL relates to and fits under the broader umbrella of machine learning, deep learning, supervised and unsupervised learning.

Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).

Understand how to formalize your task as an RL problem, and how to begin implementing a solution.

Prerequisites: probabilities and expectations, basic linear algebra, basic calculus, Python 3 (at least 1 year of experience), and implementing algorithms from pseudocode.

### This Specialization includes 4 courses

### Fundamentals of Reinforcement Learning

Reinforcement Learning is a subfield of Machine Learning, but is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the promise and challenges of learning agents that make decisions is vital today, with more and more companies interested in interactive agents and intelligent decision-making.

### Sample-based Learning Methods

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, from the agent's own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal-difference learning methods including Q-learning. We will wrap up this course by investigating how to get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal-difference updates to radically accelerate learning.
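To make the idea of learning from the agent's own experience concrete, here is a minimal sketch of tabular Q-learning. It is not taken from the course materials: the corridor environment, the function name `q_learning_corridor`, and all parameter values are assumptions chosen for illustration. The agent never sees the environment's dynamics; it only observes the next state and reward after each action.

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.1,
                        gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. The agent starts in state 0 and the episode
    ends when it reaches the rightmost state, which pays reward +1.
    Every other step pays 0. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # One row per state, one action value per action, initialized to zero.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Environment step (the agent only sees next_state and reward).
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: bootstrap from the best next action.
            # The goal row is never updated, so its value stays 0.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(q_learning_corridor()):
        print(f"state {s}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

In this toy setting the greedy policy with respect to the learned table should prefer moving right in every non-terminal state. A fixed seed is used only so that runs are repeatable.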

### Prediction and Control with Function Approximation

In this course, you will learn how to solve problems with large, high-dimensional, and potentially infinite state spaces. You will see that estimating value functions can be cast as a supervised learning problem (function approximation), allowing you to build agents that carefully balance generalization and discrimination in order to maximize reward. We will begin this journey by investigating how our policy evaluation or prediction methods like Monte Carlo and TD can be extended to the function approximation setting. You will learn about feature construction techniques for RL, and representation learning via neural networks and backpropagation. We conclude this course with a deep dive into policy gradient methods, a way to learn policies directly without learning a value function. In this course you will solve two continuous-state control tasks and investigate the benefits of policy gradient methods in a continuous-action environment.
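As a rough illustration of extending TD prediction to function approximation, the sketch below runs semi-gradient TD(0) with a linear value function. It is not from the course: the random-walk task, the two hand-picked features, the function name `semi_gradient_td0`, and the step size are all assumptions made for this example.

```python
import random


def semi_gradient_td0(episodes=20000, alpha=0.005, gamma=1.0,
                      n_states=5, seed=0):
    """Semi-gradient TD(0) prediction with a linear value function.

    Hypothetical task: a random walk over states 1..n_states, starting in
    the middle. Stepping off the left end terminates with reward 0 and
    stepping off the right end terminates with reward 1. The policy moves
    left or right with equal probability.
    """
    rng = random.Random(seed)
    right_terminal = n_states + 1

    def features(state):
        # Two features: a bias term and the normalized position.
        return [1.0, state / right_terminal]

    def value(w, state):
        # Terminal states have value 0 by definition.
        if state == 0 or state == right_terminal:
            return 0.0
        x = features(state)
        return w[0] * x[0] + w[1] * x[1]

    w = [0.0, 0.0]
    for _ in range(episodes):
        state = right_terminal // 2
        while 0 < state < right_terminal:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == right_terminal else 0.0

            # TD error uses the current estimate of the next state's value.
            td_error = reward + gamma * value(w, next_state) - value(w, state)

            # For a linear value function the gradient w.r.t. w is the
            # feature vector itself.
            x = features(state)
            w[0] += alpha * td_error * x[0]
            w[1] += alpha * td_error * x[1]
            state = next_state
    return w


if __name__ == "__main__":
    w = semi_gradient_td0()
    for s in range(1, 6):
        print(f"state {s}: estimated value {w[0] + w[1] * s / 6:.3f}")
```

For this particular walk the true value of state s is s / 6, which these two features can represent exactly, so the estimates should settle near those values up to noise from the constant step size. With only two weights shared across all states, an update in one state also changes the estimates of the others, which is the generalization the paragraph above refers to.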

### A Complete Reinforcement Learning System (Capstone)

In this final course, you will put together your knowledge from Courses 1, 2 and 3 to implement a complete RL solution to a problem. This capstone will let you see how each component (problem formulation, algorithm selection, parameter selection and representation design) fits together into a complete solution, and how to make appropriate choices when deploying RL in the real world. This project will require you to implement both the environment to simulate your problem, and a control agent with neural network function approximation. In addition, you will conduct a scientific study of your learning system to develop your ability to assess the robustness of RL agents. To use RL in the real world, it is critical to (a) appropriately formalize the problem as an MDP, (b) select appropriate algorithms, (c) identify what choices in your implementation will have large impacts on performance and (d) validate the expected behaviour of your algorithms. This capstone is valuable for anyone who is planning on using RL to solve real problems.

Great course! Lots of hands-on RL algorithms. I'm looking forward to the next course in the specialization.

Excellent final course for the specialization. Moon Lander project was informative and fun.

Well-paced and thoughtfully explained course. Highly recommended for anyone wanting a solid grounding in Reinforcement Learning. Thank you Coursera and Univ. of Alberta for the masterclass.

I understood all the necessary concepts of RL. I've been working on RL for some time now, but thanks to this course, I now have a firmer grasp of the basics of RL and can't wait to watch the other courses.

The concepts are a bit hard, but it is nice once you understand them well, especially the Bellman equations and dynamic programming. Sometimes visualizing the problem is hard, so you need to prepare thoroughly.

The comments given by the autograder are not informative about the errors causing a problem, and it is not sensitive enough to capture problems with action selection steps based on the current state.

An excellent introduction to the subject of Reinforcement Learning, accompanied by a very clear textbook. The Python assignments in Jupyter notebooks are both informative and helpful.

Really great resource to follow along with the RL book. Important suggestion: do not skip the reading assignments. They are really helpful, and they make following the videos and assignments easy.

It is recommended that learners take 4 to 6 months to complete the specialization.

It is recommended that learners have at least one year of undergraduate computer science or 2-3 years of professional experience in software development. Experience and comfort with programming in Python are required, and learners must be comfortable converting algorithms and pseudocode into Python. A basic understanding of concepts from statistics (distributions, sampling, expected values), linear algebra (vectors and matrices), and calculus (computing derivatives) is also expected.

Yes, it is recommended that courses are taken sequentially.

Learners that complete the specialization will earn a Coursera specialization certificate signed by the professors of record, not a University of Alberta credit.

By the end of this specialization, you will be able to:

- Build a Reinforcement Learning system for sequential decision making.
- Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).
- Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.
- Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised learning, and unsupervised learning.

