One characteristic of a good model is accuracy. In this article, we discuss what accuracy means and how we define an accurate prediction.
Let’s first review what we mean when we say we want our models to be accurate. Accurate models are those that render correct predictions. However, no model is perfect: its predictions will not be correct all the time, or they will carry some error, so we have to accept some degree of incorrectness. This is why there are different measures of accuracy.
Most, but not all, of the models used in predictive analytics fall into two categories: classification models and regression models. Classification models predict the outcome as a discrete value. In most cases, classification models are required to predict a binary outcome (such as 1/0 or Y/N). Regression models, on the other hand, predict a continuous value. It is important not to confuse this use of the name regression with specific regression methods such as linear or logistic regression. The term regression here simply denotes that we are trying to predict a continuous value.
The distinction between classification and regression models is important because the way we define a correct prediction is different for each of these two types of models. In the case of a binary classification model, a correct prediction means that the model predicts the same value as the actual value of the dependent variable. An example of a classification problem is when we want to predict a binary outcome representing the default status of a credit card account (Good/Bad), where Bad means that the account has been in arrears for more than 90 days, and Good means that it has not. In this case, a correct prediction of Bad means that the model predicted the status of the account in question as Bad and in reality it was also Bad.
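The rule for a correct classification can be sketched in a few lines of Python. This is only an illustration: the function name `is_correct_classification` and the four sample accounts are made up for the example and are not taken from any real dataset.

```python
def is_correct_classification(predicted: str, actual: str) -> bool:
    """A classification is correct when the predicted label equals the actual label."""
    return predicted == actual


# Hypothetical default statuses for four credit card accounts.
actual_status = ["Bad", "Good", "Good", "Bad"]
predicted_status = ["Bad", "Good", "Bad", "Bad"]

# One flag per account: True where the model matched reality.
correct_flags = [
    is_correct_classification(p, a)
    for p, a in zip(predicted_status, actual_status)
]

print(correct_flags)  # [True, True, False, True]
```

The third account is the only miss: the model predicted Bad, but the account was actually Good.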
The case of predicting a continuous value in regression models is not as clear-cut. For example, let’s consider the case of predicting the total expected annual revenue that will be generated from each customer in a business database. If for a specific customer the actual value was $3,000 per year and the model predicted a value of $2,928, would we then consider this prediction to be accurate? How about when the predicted value is $2,993.30? In other words, how close should the prediction be to the actual value in order to qualify as being correct? There is no definitive answer to this question. However, there is a practical one: using our understanding of the business problem, we specify a hit range such that when the predicted value falls within this range, we consider it to be a correct prediction. For example, when the predicted value is within 5% of the actual value (on either side), we consider that prediction to be correct; otherwise, it is not.
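A hit range check of this kind might look like the following Python sketch. The function name `is_within_hit_range` and its `tolerance` parameter are hypothetical names chosen for the example. The 5% default comes from the text above, while the 1% comparison is an added assumption to show how the choice of range changes the verdict.

```python
def is_within_hit_range(predicted: float, actual: float, tolerance: float = 0.05) -> bool:
    """Return True when the prediction is within `tolerance` of the actual value.

    The range is symmetric: with tolerance=0.05, any prediction between
    95% and 105% of the actual value counts as correct.
    Note: if `actual` is 0, only an exact match counts as a hit.
    """
    return abs(predicted - actual) <= tolerance * abs(actual)


actual_revenue = 3000.00

# With a 5% hit range (plus or minus $150), both predictions count as correct.
print(is_within_hit_range(2928.00, actual_revenue))  # True
print(is_within_hit_range(2993.30, actual_revenue))  # True

# With a tighter, assumed 1% hit range (plus or minus $30), only the closer one does.
print(is_within_hit_range(2928.00, actual_revenue, tolerance=0.01))  # False
print(is_within_hit_range(2993.30, actual_revenue, tolerance=0.01))  # True
```

The same prediction can therefore be a hit or a miss depending on the range, which is why the range should be set from the business problem rather than picked arbitrarily.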
Now that we have established a method of determining when a prediction is correct, we can define accuracy measures of models as a function of the count of the correctly predicted records. This function need not be a simple count or ratio; as we shall see in an upcoming post, we can devise more complex functions to measure model accuracy in different ways.
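As a starting point, the simplest such function is the share of records that were predicted correctly. The sketch below assumes we already have one True/False flag per record, produced by whichever correctness rule applies. The name `hit_rate` and the sample flags are illustrative only, and this is just one of many possible accuracy measures.

```python
from typing import Iterable


def hit_rate(correct_flags: Iterable[bool]) -> float:
    """Fraction of records whose prediction was judged correct."""
    flags = list(correct_flags)
    if not flags:
        raise ValueError("Cannot compute a hit rate for zero records.")
    # True counts as 1 and False as 0, so sum() gives the number of correct predictions.
    return sum(flags) / len(flags)


# Hypothetical correctness flags for four scored records.
sample_flags = [True, True, False, True]
print(hit_rate(sample_flags))  # 0.75
```

Because the function only sees flags, it works the same way for classification and regression models once correctness has been decided.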