Today I am going to cover an important aspect of any machine learning model. When we build a machine learning model, we always want the model to be fit; that is, we want neither overfitting nor underfitting. I’ll present a very simple example to show the implications of overfitting and underfitting.
Suppose our model predicts whether a fruit is an apple or not. The model extracts some features and, based on them, decides whether the fruit is an apple or something else. Let’s assume the following are examples of fit, underfit, and overfit models for this task:
Fit: Anything that looks almost round but has two conical concave shapes on top and bottom and a shiny surface is an apple.
Underfit: Anything that looks almost round but has two conical concave shapes on top and bottom is an apple.
Overfit: Anything that looks almost round, has two conical concave shapes on top and bottom and a shiny surface, weighs around 400-500 grams, and has a stalk on top is an apple.
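To make the three rules concrete, here is a minimal Python sketch that writes each one as a hand-coded predicate over a hypothetical `Fruit` record. The field names and the use of simple boolean features are assumptions for illustration; a real model would learn its decision rule from data rather than have it written by hand.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    # Hypothetical feature set mirroring the features named in the rules above
    roundish: bool         # looks (almost) round
    conical_dimples: bool  # conical concave shapes on top and bottom
    shiny: bool            # shiny surface
    weight_g: float        # weight in grams
    has_stalk: bool        # stalk on top


def underfit_is_apple(f: Fruit) -> bool:
    # Uses only the shape and the conical concave shapes
    return f.roundish and f.conical_dimples


def fit_is_apple(f: Fruit) -> bool:
    # Adds the shiny-surface feature
    return underfit_is_apple(f) and f.shiny


def overfit_is_apple(f: Fruit) -> bool:
    # Additionally demands a 400-500 g weight and a stalk
    return fit_is_apple(f) and 400 <= f.weight_g <= 500 and f.has_stalk


typical_apple = Fruit(roundish=True, conical_dimples=True, shiny=True,
                      weight_g=450, has_stalk=True)
print([rule(typical_apple)
       for rule in (underfit_is_apple, fit_is_apple, overfit_is_apple)])
# [True, True, True]
```

Each rule adds conditions on top of the previous one, so the underfit rule accepts the most fruits and the overfit rule accepts the fewest.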
Here, we can see that the fit model decides between apple and non-apple based on three features: 1. an approximately round shape, 2. conical concave shapes on top and bottom, and 3. a shiny surface.
The underfit model, on the other hand, fails to use some of the important features and decides on only two: 1. an approximately round shape, 2. conical concave shapes on top and bottom. It does not consider the shiny surface as a feature. The problem with this model is that other fruits that share the first two features might also be classified as apples. For example, a peach has the first two features but does not have a shiny surface. Because the underfit model decides on the first two features only, it may classify a peach as an apple. So, the model might produce many false positives. Now, let’s consider the overfit model.
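A minimal sketch of this false-positive problem, assuming hand-coded rules and a hypothetical `Fruit` record that carries a ground-truth `is_apple` label. The peach is described exactly as in the text: round, with the conical shapes, but not shiny.

```python
from typing import NamedTuple


class Fruit(NamedTuple):
    # Hypothetical labelled example: three features plus the true label
    name: str
    roundish: bool
    conical_dimples: bool
    shiny: bool
    is_apple: bool


def underfit_is_apple(f: Fruit) -> bool:
    # Ignores the shiny surface
    return f.roundish and f.conical_dimples


def fit_is_apple(f: Fruit) -> bool:
    # Uses all three features
    return f.roundish and f.conical_dimples and f.shiny


def false_positives(predict, fruits):
    # Non-apples that the rule nevertheless labels as apples
    return [f.name for f in fruits if predict(f) and not f.is_apple]


fruits = [
    Fruit("apple", roundish=True, conical_dimples=True, shiny=True, is_apple=True),
    Fruit("peach", roundish=True, conical_dimples=True, shiny=False, is_apple=False),
]
print(false_positives(underfit_is_apple, fruits))  # ['peach']
print(false_positives(fit_is_apple, fruits))       # []
```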
The overfit model is a powerful one. Let’s assume it has extracted many features from the data and, in addition to the three features of the fit model, also considers weight and the presence of a stalk when deciding between apple and non-apple. At first glance, nothing might seem wrong with this model; after all, it is a more powerful model that can extract more features. But it has a serious drawback. If it relies on these extra features to decide between apple and non-apple, what happens when we test it on apples that are slightly different? Say that during testing we pick an apple that has all of the first three features but is too large, over 500 grams. The model will predict that it is not an apple! Or, if the weight is between 400 and 500 grams but the apple has no stalk (bottom right apple in our image), the model will again predict that it is not an apple. Thus, this model will have many false negatives.
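The false-negative problem can be sketched the same way. Again, the `Fruit` record, its fields, and the test apples are hypothetical; the three fit features default to `True` because every test fruit here is a real apple that has them, and only the weight and the stalk vary.

```python
from typing import NamedTuple


class Fruit(NamedTuple):
    # Hypothetical labelled example with the extra features the overfit rule uses
    name: str
    weight_g: float
    has_stalk: bool
    is_apple: bool
    # The three "fit" features; all test apples below have them
    roundish: bool = True
    conical_dimples: bool = True
    shiny: bool = True


def fit_is_apple(f: Fruit) -> bool:
    return f.roundish and f.conical_dimples and f.shiny


def overfit_is_apple(f: Fruit) -> bool:
    # Also requires a 400-500 g weight and a stalk
    return fit_is_apple(f) and 400 <= f.weight_g <= 500 and f.has_stalk


def false_negatives(predict, fruits):
    # Real apples that the rule rejects
    return [f.name for f in fruits if f.is_apple and not predict(f)]


test_apples = [
    Fruit("typical apple", weight_g=450, has_stalk=True, is_apple=True),
    Fruit("large apple", weight_g=550, has_stalk=True, is_apple=True),
    Fruit("stalkless apple", weight_g=450, has_stalk=False, is_apple=True),
]
print(false_negatives(fit_is_apple, test_apples))      # []
print(false_negatives(overfit_is_apple, test_apples))  # ['large apple', 'stalkless apple']
```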
The above example shows why it’s important to build a fit ML model. These days, with so many powerful ML algorithms available, overfitting is a more common problem than underfitting. We therefore need to be careful with such models, and we can use techniques such as regularization to control overfitting.
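Regularization comes in several forms. One common choice is L2 (ridge) regularization, which adds a penalty lam * ||w||^2 to the squared-error loss of a linear model and so keeps the learned weights small. The sketch below is a minimal pure-Python illustration of that idea, solving the ridge normal equations (X^T X + lam * I) w = X^T y directly; the tiny dataset and the function name `ridge_fit` are hypothetical.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for w (L2-regularized least squares).

    With lam == 0 this is ordinary least squares and needs X^T X to be invertible.
    """
    d = len(X[0])
    # Normal-equation matrix with the L2 penalty added on the diagonal
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(d)]

    # Gaussian elimination with partial pivoting
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back-substitution on the upper-triangular system
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_fit(X, y, lam=0.0))  # unregularized: [1.0, 2.0]
print(ridge_fit(X, y, lam=1.0))  # shrunk toward zero: about [0.875, 1.375]
```

Larger values of `lam` shrink the weights further; in practice the penalty strength is usually chosen on held-out data.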