Imagine two students preparing for a history exam. One managed to read only the introduction to the textbook – they know too little to answer confidently. The other crammed every exam ticket by heart, word for word, but if the teacher changes the wording of a question even slightly, they are lost: they didn't master the material, they simply memorized it. Both will fail, but for different reasons.
The same thing happens with machine learning models. Too few examples, and the model fails to capture the patterns. Too much identical or poorly selected data, and it starts cramming instead of learning. In both cases, the result will be unsatisfactory. The developer's task is to find a balance between these extremes. And this, as it turns out, is one of the most non-trivial problems in AI work.
What is Underfitting in AI Models
When Examples Are Lacking
Let's start with the first extreme. Suppose we want to teach a model to distinguish cats from dogs in photos. If we show it just ten pictures – five of each kind – it will most likely not form an accurate picture of what these animals look like. Ten examples is too narrow a sample. Among them, there might be three ginger cats and two white dogs, and the model will decide that ginger color is a sign of a cat and white of a dog. This is not knowledge but a random correlation: it works on those ten pictures and proves useless on the eleventh.
This state is called underfitting. The model failed to grasp the real structure of the data because there wasn't enough of it, or because it was too homogeneous and incomplete and did not reflect the full diversity of the world.
A good analogy is a child who was shown one single bird and told, "This is a bird." Now they are convinced that all birds are gray pigeons. They will recognize a sparrow, but a parrot or a pelican will baffle them.
An underfitted model behaves similarly: it makes overgeneralized assumptions because it lacks sufficient material to identify subtle differences. Its answers are often either too generic or systematically incorrect – it misses not by chance, but predictably, in the same direction.
This is noticeable even on simple tasks. Ask such a model to classify movie reviews as positive or negative, and it might learn to react only to the most obvious words – "excellent" or "terrible" – while completely ignoring context. A sarcastic phrase like "Well, what a masterpiece, I must say" will go right past it.
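To make this concrete, here is a toy sketch in plain Python of such an "underfitted" classifier. The keyword lists and example phrases are invented purely for illustration; real sentiment models are, of course, far more elaborate.

```python
# A toy "underfitted" sentiment classifier: it has learned only two
# obvious keywords and nothing about context or sarcasm.
POSITIVE_WORDS = {"excellent"}
NEGATIVE_WORDS = {"terrible"}

def classify(review: str) -> str:
    words = set(review.lower().split())
    if words & POSITIVE_WORDS:
        return "positive"
    if words & NEGATIVE_WORDS:
        return "negative"
    # Everything it has never seen falls into one default bucket -
    # the model misses predictably, in the same direction.
    return "positive"

print(classify("An excellent film"))         # positive - correct
print(classify("A terrible waste of time"))  # negative - correct
# Sarcasm relies on context the model never learned:
print(classify("Well, what a masterpiece, I must say"))  # positive - wrong
```

The last call shows the systematic miss described above: without any signal words, the model falls back on its default answer regardless of meaning.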
Underfitting is not a catastrophe, but a signal: the model needs more quality and diverse data.
What is Overfitting in Machine Learning
When There Are Too Many Examples or They Are Too Homogeneous
Now let's consider the opposite situation. We decided to fix the problem and loaded a huge array of data into the model. But we did it recklessly: we didn't check the quality or care about diversity; we simply increased the volume. Or, even worse, we forced the model to train on the same examples over and over until it memorized them down to the smallest details.
This is where overfitting begins.
Let's go back to the student with the tickets. If they learned every word and every comma, they will likely answer the questions on the tickets themselves perfectly. But on the real exam, where the wording will be different, they will be helpless because they did not understand the material, they only reproduced it. Essentially, this is an extreme case of over-specialization: the system works flawlessly under strictly defined conditions and is completely lost outside of them – we discussed a similar logic in the article Narrow AI, General AI, and Illusions of the Future when talking about narrow models.
An overfitted model works by the same logic. On training data, it shows brilliant results: almost zero error and high accuracy. But give it new inputs from the real world – inputs it has never seen before – and quality drops sharply. The model has become too "attached" to the training examples: it learned specific cases, not the underlying pattern.
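This gap between training and real-world performance can be shown numerically. Below is a small sketch (assuming NumPy is available; the data is synthetic and the degrees chosen just for illustration): a degree-9 polynomial has one coefficient per training point, so it passes through ten noisy points almost exactly, while a plain straight line captures the actual relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training points from a simple linear relationship y = 2x.
x_train = np.linspace(-1, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.3, size=10)

# Held-out points from the same relationship, between the training points.
x_test = np.linspace(-0.9, 0.9, 10)
y_test = 2 * x_test

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 1)    # straight line
complex_ = np.polyfit(x_train, y_train, 9)  # one coefficient per point

print("train error, degree 1:", mse(simple, x_train, y_train))
print("train error, degree 9:", mse(complex_, x_train, y_train))  # near zero
print("test error,  degree 1:", mse(simple, x_test, y_test))
print("test error,  degree 9:", mse(complex_, x_test, y_test))    # typically far larger
```

The degree-9 fit "wins" on the training set by memorizing the noise, and pays for it on every point it hasn't seen – exactly the pattern described above.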
A good illustration: imagine a detective who investigated crimes only in a small provincial town. They know every local criminal by face, remember their habits, and can accurately describe exactly how a certain Vanya Ivanov hides stolen goods. But in a big city, such a detective will be lost. Their experience is too specific: they know how to work with familiar cases, but struggle to generalize information.
Overfitting is particularly dangerous because it is easy to miss during the development stage. As long as developers test the model on the same data it was trained on, everything looks wonderful. The problem is discovered later – when the model goes out into the real world and encounters examples it has never met before.
This is a well-known trap: impressive results during training and disappointing ones in practice. That is why separate data sets are always used for training and for checking. But more on that later.
Balancing Data Diversity and Model Training
In Search of the Golden Mean
So, we have two extremes. When data is scarce, the model doesn't understand what is wanted from it. When there is an excess of monotonous data, it "tries too hard" and can no longer see the forest for the trees. Where is the balance point?
There is no universal answer. This is not a formula, but a process of constant tuning that involves several tools simultaneously.
The first is data diversity. The model must see enough different examples to form stable and flexible representations: cats of different colors, sizes, and poses, in different lighting; reviews of different lengths, tones, and styles. Rare situations must also be in the sample; otherwise, the model will be helpless when it encounters them in reality.
The second tool is data splitting. Part of the examples is used only for training, the other exclusively for validation. The model does not see the second part during training, so such a test objectively shows whether it generalizes rather than memorizes. It's like a test in school: if the student really understood the topic, they will handle new tasks; if they only crammed, they won't.
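The split itself is mechanically simple. Here is a minimal sketch in plain Python; the 80/20 ratio is just a common convention, not a rule, and the shuffling matters so that the validation set isn't accidentally all of one kind.

```python
import random

def train_val_split(examples, val_fraction=0.2, seed=42):
    """Shuffle the examples and hold out a fraction for validation."""
    rng = random.Random(seed)  # fixed seed: the split is reproducible
    shuffled = examples[:]     # copy, so the original list stays intact
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

data = list(range(100))
train, val = train_val_split(data)
print(len(train), len(val))  # 80 20
```

Every example ends up in exactly one of the two sets, so validation measures performance on data the model genuinely never saw.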
The third tool is regular error monitoring. If the error on training data decreases, but on validation data it grows, this is a clear sign of overfitting. The model has started to memorize, not understand. At this point, you need to stop and correct the process.
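That monitoring rule can be sketched directly. The error curves below are made-up numbers purely for illustration, and the "wait two epochs" patience threshold is a hypothetical choice; the logic simply flags the moment when training error keeps falling while validation error starts growing.

```python
def first_overfitting_epoch(train_errors, val_errors, patience=2):
    """Return the last epoch before validation error began growing for
    `patience` consecutive epochs while training error kept falling,
    or None if that divergence never happens."""
    grow_streak = 0
    for epoch in range(1, len(val_errors)):
        val_growing = val_errors[epoch] > val_errors[epoch - 1]
        train_falling = train_errors[epoch] < train_errors[epoch - 1]
        if val_growing and train_falling:
            grow_streak += 1
            if grow_streak >= patience:
                return epoch - patience  # last epoch before the divergence
        else:
            grow_streak = 0
    return None

# Made-up curves: training error keeps dropping,
# validation error turns upward after epoch 3.
train_err = [1.0, 0.6, 0.4, 0.3, 0.2, 0.1]
val_err   = [1.1, 0.7, 0.5, 0.45, 0.5, 0.6]

print(first_overfitting_epoch(train_err, val_err))  # 3
```

This is the idea behind what practitioners call early stopping: once the curves diverge, further training only deepens the memorization.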
All this resembles the work of a good coach. They won't run an athlete along the same route until exhaustion – that way the athlete will get used to a specific track but will not be ready for competitions on another one. The coach alternates loads, changes conditions, and adds new exercises. They periodically check not only how well familiar things are going, but also how confidently the athlete handles unknown challenges.
This flexibility is the goal: not to memorize, but to understand.
Summary of Model Training Challenges
The Bottom Line
Overfitting and underfitting are not exotic technical problems but fundamental traps that any learning system faces, be it a person preparing for an exam or a neural network processing millions of images.
The key idea is simple: the quality of training is determined not only by the amount of data but also by how well it reflects the real diversity of the world. Ten thousand almost identical examples are worse than a thousand different ones. And a thousand different ones are better than ten random ones.
A good model is not one that handles familiar tasks perfectly, but one that confidently works with new ones. The difference between these approaches is the difference between memorization and understanding.
In the following materials, we will talk about how developers determine the state of the model and what techniques help maintain balance throughout the entire training process.