So, in the last video, we talked about what the Black-Scholes model does. It finds a perfect replicating portfolio whose price is always equal to the option price, no matter where the stock price goes in the future. By the law of one price, the fair option price now should be equal to the price of this replicating portfolio now. So, if you know how many shares you should buy right now, after you have just sold an option, you know how much the option should cost. So far so good, but if you just say that the option price today should be equal to the price of the replicating portfolio today, you have not solved the problem yet. Why is that? This is because we don't know how many shares we should buy for our replicating portfolio. What the Black-Scholes-Merton model essentially does is answer exactly this question. The Black-Scholes model does it by mimicking the option through very frequent shuffling of money between your stock investment and your bank cash account. The model gives you a way to determine how many shares you should have in your replicating portfolio in each future scenario, such that the total portfolio value, which is the value of the sold option plus the replicating portfolio, will always be zero in the future no matter what happens with the stock price. It turns out that this amazing result is possible, but only if your time steps are infinitesimal. For this very special setting, and for a very specific model of the future stock price evolution, the Black-Scholes model finds a unique option price and a unique number of shares that you have to hold in your replicating portfolio now.
So, in a nutshell, the Black-Scholes model established the fact that even though the option price can and will change in the future, because it depends on the future stock price which is also unknown, a unique fair option price can be found by using the law of one price together with the method of pricing by replication, for a very special choice of dynamics of the stock price. Another name for this procedure is pricing by hedging, because your replicating portfolio is a hedge for your option. In the Black-Scholes model, the stock dynamics is chosen to follow the law of Geometric Brownian Motion with a drift. It turns out that this choice leads to a closed-form expression for a unique price of a European call or put option on the stock, which is given by the celebrated Black-Scholes formula. But simultaneously, the classical Black-Scholes model leads to a somewhat paradoxical conclusion that options are altogether completely redundant, as they can always be perfectly replicated by a simple portfolio made of a stock and a bond or a cash account. Now, if this were indeed the case in real life, and options were totally redundant, nobody would ever trade them except for possibly very bored traders. Yet option trading is a multibillion business where people make and lose money daily. Traders use options and other financial derivatives both as investment vehicles and as hedging instruments. And this means that options are not redundant. The reason that options are not redundant is that they carry substantial risk, notwithstanding the proposition of the classical Black-Scholes model. Nobody in the market trades options at their Black-Scholes prices. And the differences between Black-Scholes prices and trader prices reflect dealers' perception of the actual risk embedded in options.
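To make the closed-form result concrete, here is a minimal sketch of the standard Black-Scholes formula for a European call or put on a stock that pays no dividends. The function and parameter names (`black_scholes`, `s0`, `strike`, `maturity`, `rate`, `sigma`) are our own choices for illustration and do not come from the lecture. Besides the price, the function returns the option delta, which in this model is the number of shares held in the replicating portfolio.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(s0, strike, maturity, rate, sigma, kind="call"):
    """Return (price, delta) of a European option on a non-dividend stock.

    s0: current stock price, strike: option strike,
    maturity: time to expiry in years, rate: continuously compounded
    risk-free rate, sigma: annualized volatility.
    All names are illustrative assumptions, not the lecture's notation.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    discount = math.exp(-rate * maturity)
    if kind == "call":
        price = s0 * norm_cdf(d1) - strike * discount * norm_cdf(d2)
        delta = norm_cdf(d1)
    elif kind == "put":
        price = strike * discount * norm_cdf(-d2) - s0 * norm_cdf(-d1)
        delta = norm_cdf(d1) - 1.0
    else:
        raise ValueError("kind must be 'call' or 'put'")
    return price, delta


if __name__ == "__main__":
    call_price, call_delta = black_scholes(100.0, 100.0, 1.0, 0.05, 0.2, "call")
    put_price, put_delta = black_scholes(100.0, 100.0, 1.0, 0.05, 0.2, "put")
    print(f"call: price={call_price:.4f}, delta={call_delta:.4f}")
    print(f"put:  price={put_price:.4f}, delta={put_delta:.4f}")
```

Note that the drift of the stock does not appear among the inputs: in this model the price depends only on the current stock price, the strike, the time to expiry, the interest rate, and the volatility.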
Financial professionals are well aware of the fact that the classical Black-Scholes model completely eliminates any risk in options for the sake of tractability, by making two strong assumptions that do not hold in practice. The assumptions of the classical Black-Scholes model that make it totally miss the risk in options are continuous hedging and zero transaction costs. These assumptions do not hold for real markets, where hedging is always done with a finite frequency, for example daily. The reason you can't hedge continuously is that trading stocks involves transaction costs. If you rebalanced your replicating (hedge) portfolio continuously, total transaction costs would be infinite. So in reality, rebalancing is never done too frequently, let alone continuously. This is one of the reasons why all practical uses of the Black-Scholes model involve some modifications of either the dynamics or the hedging and pricing methods. Now, here is our plan for what we're going to do next. We will use a discrete-time version of the Black-Scholes model as a simple laboratory to study financial reinforcement learning models. On the one hand, this is a well-understood extension of the Black-Scholes model that brings back some realism of actual option trading by considering rehedging at discrete times, as opposed to the continuous rehedging obtained in the Black-Scholes limit of infinitesimal time steps. On the other hand, keeping the rehedging frequency finite, which is similar to how rehedging is done in real life, will allow us to focus on the key objective of option pricing, hedging, and trading, which is risk minimization by hedging in a sequential decision-making process. Now, there are various extensions of the classical Black-Scholes model to a discrete-time setting which are well studied in the literature.
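The effect of a finite rehedging frequency can be illustrated with a small simulation. The sketch below is our own illustration and not the specific discrete-time formulation used later in the course: an option seller receives the Black-Scholes premium for a European call, rehedges with the Black-Scholes delta at `n_steps` equally spaced times, and we measure the spread of the terminal hedging error across simulated Geometric Brownian Motion paths. The proportional transaction cost `cost_rate`, the parameter defaults, and all function names are assumptions made for this example. Costs are only tallied, not charged to the hedge account, so the two effects can be read separately.

```python
import math
import random
import statistics


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price_and_delta(s, strike, tau, rate, sigma):
    """Black-Scholes price and delta of a European call with time tau > 0 left."""
    d1 = (math.log(s / strike) + (rate + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = s * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)


def simulate_discrete_hedging(n_steps, n_paths=1000, s0=100.0, strike=100.0,
                              maturity=1.0, rate=0.0, mu=0.05, sigma=0.2,
                              cost_rate=0.001, seed=0):
    """Return (std of terminal hedging error, mean total transaction cost).

    cost_rate is a hypothetical proportional cost per unit of traded value.
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    errors, costs = [], []
    for _ in range(n_paths):
        s = s0
        price, delta = call_price_and_delta(s, strike, maturity, rate, sigma)
        cash = price - delta * s           # premium received minus cost of shares
        cost = cost_rate * abs(delta) * s  # cost of the initial stock purchase
        for k in range(1, n_steps + 1):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
            cash *= math.exp(rate * dt)    # cash account accrues interest
            if k < n_steps:                # rehedge at every step except expiry
                _, new_delta = call_price_and_delta(s, strike, maturity - k * dt, rate, sigma)
                cash -= (new_delta - delta) * s
                cost += cost_rate * abs(new_delta - delta) * s
                delta = new_delta
        # Hedge portfolio value minus the option payoff owed to the buyer.
        errors.append(cash + delta * s - max(s - strike, 0.0))
        costs.append(cost)
    return statistics.pstdev(errors), statistics.mean(costs)


if __name__ == "__main__":
    for n in (4, 16, 64, 252):
        err_std, mean_cost = simulate_discrete_hedging(n)
        print(f"steps={n:4d}  hedging error std={err_std:.3f}  mean cost={mean_cost:.3f}")
```

Under these assumptions, rehedging more often shrinks the spread of the hedging error while the accumulated transaction cost grows, which is the trade-off that makes continuous rehedging impractical.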
We will take just one such formulation and use it to set up an environment for reinforcement learning, where we model an option seller as a reinforcement learning agent that hedges its risk in an option by trading in the underlying stock at discrete times. To simplify things even further, we can discretize the state space in order to map the problem onto the setting of a finite-state Markov decision process. This will let us try some simple reinforcement learning algorithms for a discrete state space, such as the famous Q-learning, and not worry about the additional complexity due to the need for function approximation that would be necessary within a continuous-state formulation. However, by simply reversing the last step of state space discretization in our scheme, we could use the same framework to test reinforcement learning methods for continuous action and continuous state spaces in finance. So, in a nutshell, this is what we do. We generate data by simulating stock price histories, together with actions, that is, rehedges that implement a risk minimization strategy, and rewards from taking these actions. Then we give this data to Q-learning algorithms and task them with finding the best hedging strategy, which means the risk minimization strategy, directly from this data, without knowing anything about the dynamics and the hedging strategy that generated the data. But because we already know the best strategy, we can continuously monitor the progress of the Q-learning algorithms towards this goal. We can also randomize the actions in the data, for example by intentionally doing sub-optimal hedges from time to time, and again ask Q-learning to find the best hedging strategy by looking at data collected under such a sub-optimal strategy. This would be a simple prototype of how reinforcement learning could be used, if it works, in a real trading environment. It goes like this.
Take the history of the market and of your own trading strategy, then give it to a reinforcement learning agent and ask it to improve the strategy while keeping the same goals. At the same time, such a problem setting is quite standard for Q-learning, which is an off-policy algorithm that is able to learn an optimal policy even when the data used for training is produced using a sub-optimal policy. All such questions of direct relevance for financial applications become answerable in such a setting. Now, let's follow up with the next video to see how it works.
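The off-policy property mentioned above can be seen on a toy example. The sketch below is not the hedging environment from this course: it is a hypothetical five-state chain with two actions, chosen only because its optimal policy is obvious (always move right towards the rewarding end state). The data is collected by a purely random, and therefore sub-optimal, behavior policy, and tabular Q-learning is then run over the recorded transitions without any further interaction with the environment. All names and parameter values are assumptions made for illustration.

```python
import random

N_STATES = 5             # states 0..4 on a chain; state 4 is terminal
ACTIONS = (0, 1)         # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Toy deterministic dynamics: reward 1 on reaching GOAL, else 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def collect_random_data(n_episodes=200, max_steps=50, seed=0):
    """Record transitions generated by a uniformly random behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        state = rng.randrange(GOAL)  # start in any non-terminal state
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            data.append((state, action, reward, next_state, done))
            if done:
                break
            state = next_state
    return data


def batch_q_learning(data, gamma=0.9, alpha=0.1, n_sweeps=50):
    """Tabular Q-learning updates applied repeatedly to a fixed data set."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        for s, a, r, s2, done in data:
            # The target uses the max over next actions, not the action the
            # behavior policy actually took: this is what makes it off-policy.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    q_table = batch_q_learning(collect_random_data())
    print("greedy policy for states 0..3:", greedy_policy(q_table))
```

Even though every recorded action was chosen at random, the greedy policy read off the learned Q-table moves right in every state. The same logic is what lets us hand Q-learning data produced by a sub-optimal hedging strategy and still ask for the optimal one.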