In this video, I will introduce you to log likelihoods. These are just logarithms of the probabilities we calculated in the last video. They are much more convenient to work with, and they appear throughout deep learning and NLP. Let's go back to the table you saw previously, which contains the conditional probabilities of each word for positive or negative sentiment. Words can have many shades of emotional meaning, but for the purpose of sentiment classification, they're simplified into three categories: neutral, positive, and negative. All can be identified by using their conditional probabilities. These categories can be numerically estimated just by dividing the corresponding conditional probabilities in this table. Now, let's see how this ratio looks for the words in your vocabulary. The ratio for the word I is 0.2 divided by 0.2, or 1. The ratio for the word am is again 1. The ratio for the word happy is 0.14 divided by 0.1, or 1.4. Do the same for because, learning, and NLP: the ratio is 1. For sad and not, the ratio is 0.1 divided by 0.15, or roughly 0.67. In general, neutral words have a ratio near 1. Positive words have a ratio larger than 1; the larger the ratio, the more positive the word. On the other hand, negative words have a ratio smaller than 1; the smaller the value, the more negative the word. In this week's assignment, you'll implement a function that filters words depending on their positivity or negativity. You will find the expression shown here to be very helpful with that. These ratios are essential in Naive Bayes for binary classification. I'll illustrate why using an example you've seen before. Recall that earlier, you used a formula to categorize a tweet as positive if the product of the corresponding ratios of every word that appears in the tweet is bigger than 1, and as negative if that product is less than 1. This product is called the likelihood.
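As a quick sketch, the ratio test described above could look like this in Python. The probability table uses the illustrative numbers from the video, and the function names are my own, not part of the assignment:

```python
# Illustrative conditional probabilities P(word | positive) and
# P(word | negative), taken from the example table in the video.
cond_prob_pos = {"I": 0.2, "am": 0.2, "happy": 0.14, "sad": 0.10, "not": 0.10}
cond_prob_neg = {"I": 0.2, "am": 0.2, "happy": 0.10, "sad": 0.15, "not": 0.15}

def ratio(word):
    """Ratio P(word|pos) / P(word|neg) used to gauge a word's sentiment."""
    return cond_prob_pos[word] / cond_prob_neg[word]

def sentiment_category(word, tol=1e-6):
    """Label a word neutral (ratio near 1), positive (> 1), or negative (< 1)."""
    r = ratio(word)
    if abs(r - 1.0) < tol:
        return "neutral"
    return "positive" if r > 1.0 else "negative"

print(round(ratio("happy"), 2))     # 1.4
print(sentiment_category("happy"))  # positive
print(sentiment_category("sad"))    # negative
print(sentiment_category("I"))      # neutral
```
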
If you were to take the ratio between the number of positive and negative tweets, you'd have what's called the prior ratio. I haven't mentioned it until now because in this small example, you had exactly the same number of positive and negative tweets, making the ratio 1. In this week's assignment, you'll have a balanced dataset, so you'll again be working with a ratio of 1. In the future though, when you're building your own applications, remember that this term becomes important for unbalanced datasets. With the addition of the prior ratio, you now have the full Naive Bayes formula for binary classification: a simple, fast, and powerful method that you can use to establish a baseline quickly. Now is a good time to mention some other important considerations for your implementation of Naive Bayes. Calculating a sentiment probability requires multiplying many numbers with values between 0 and 1. Carrying out such multiplications on a computer runs the risk of numerical underflow, which happens when the number returned is so small that it can't be stored on your device. Luckily, there is a mathematical trick to solve this. It involves using a property of logarithms. Recall that the formula you're using to calculate a score for Naive Bayes is the prior multiplied by the likelihood. The trick is to use the log of the score instead of the raw score. This allows you to write the previous expression as the sum of the log prior and the log likelihood, where the log likelihood is the sum of the logarithms of the conditional probability ratios of all the unique words in your corpus. Let's use this method to classify the tweet: I'm happy because I'm learning. Remember how you used the Naive Bayes inference condition earlier to get the sentiment score for your tweet? Now, you're going to do something very similar to get the log of that score. What you'll need to calculate the log of the score is something called Lambda.
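A minimal sketch of why the log trick matters, using a made-up probability value purely for illustration: multiplying many small numbers underflows to zero in floating point, while summing their logarithms stays perfectly representable.

```python
import math

# Multiplying many probabilities between 0 and 1 eventually underflows
# to 0.0 in floating point; summing their logarithms does not.
p = 1e-10          # an illustrative, very small word probability
product = 1.0
log_sum = 0.0
for _ in range(40):
    product *= p              # 1e-400 is below the smallest float -> 0.0
    log_sum += math.log(p)    # stays an ordinary, representable number

print(product)         # 0.0: numerical underflow
print(round(log_sum))  # -921: the same quantity in log space, still usable
```
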
This is the log of the ratio of the probability that your word is positive, divided by the probability that your word is negative. Now, let's calculate Lambda for every word in our vocabulary. For the word I, you get the logarithm of 0.05 divided by 0.05, or the logarithm of 1, which is equal to 0. Remember, a tweet is labeled positive if the product of ratios is larger than 1, which in log space means the sum of Lambdas is larger than 0. By this logic, with a Lambda of 0, I would be classified as neutral. For am, you take the log of 0.04 over 0.04, which again is equal to 0, so you enter 0 in the table. For happy, you get a Lambda of 2.2, which is greater than 0, indicating a positive sentiment. From here on out, you can calculate the log score of an entire tweet just by summing up the Lambdas of its words. You're almost done with the log likelihood. Let's stop here and take a quick look back at what you did so far. Words are often emotionally ambiguous, but you can simplify them into three categories and then measure exactly where they fall within those categories for binary classification. You do so by dividing the conditional probabilities of the words in each category. This ratio can also be expressed as a logarithm, called Lambda, which you can use to reduce the risk of numerical underflow. In this video, you learned about the ratio of positive to negative word probabilities: the higher the ratio, the more positive the word. As the number of words in a tweet gets larger and larger, the product of their ratios is very likely to get very close to 0, so we end up taking the log of that product instead.
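The Lambda computation above can be sketched in a few lines of Python. Note one assumption: the video only states that happy has a Lambda of 2.2, so the conditional probabilities 0.09 and 0.01 below are illustrative values chosen to reproduce that result (log of 9 is about 2.2):

```python
import math

# Conditional probabilities (P(word|pos), P(word|neg)) from the example;
# the pair for "happy" is an assumption chosen to give Lambda = 2.2.
probs = {"I": (0.05, 0.05), "am": (0.04, 0.04), "happy": (0.09, 0.01)}

def lam(word):
    """Lambda(word) = log( P(word|pos) / P(word|neg) )."""
    p_pos, p_neg = probs[word]
    return math.log(p_pos / p_neg)

print(lam("I"))                 # 0.0 -> neutral
print(round(lam("happy"), 1))   # 2.2 -> positive

# The log score of a tweet is just the sum of the Lambdas of its words;
# a sum larger than 0 means positive sentiment.
tweet = ["I", "am", "happy"]
print(sum(lam(w) for w in tweet) > 0)  # True
```
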