Published February 27, 2026

Why AI Doesn't Understand Language Like a Human: A Lesson from Cases and Markers

Researchers trained a language model on synthetic languages and found that AI picks up some grammatical patterns intuitively while seeming to miss others entirely.

Computer Science
Author: Dr. Sophia Chen | Reading time: 9–13 minutes
"While I was digging into this research, one question kept nagging me: what does it even mean to 'understand' grammar if a model, without any explanation, rediscovers a principle that linguists took years to formulate, yet completely misses another pattern that any native speaker feels intuitively? It's like a student who brilliantly solves one type of problem but is completely blind to the pattern in another. I wonder if this very 'blind spot' will turn out to be the key to the difference between statistical and genuine linguistic intelligence." – Dr. Sophia Chen

Imagine you're watching an episode of "Black Mirror" where the main character learns a foreign language just by reading a million books in a row. He starts picking up on patterns, building sentences, and guessing what comes next. But at some point, it turns out he never understood one particular rule – the very one that seems self-evident to native speakers. This is more or less what happened in a recent linguistics experiment with language models based on GPT-2. Only instead of a TV series, it was a scientific paper. And instead of a character, it was a neural network.

Why Teach AI "Cases" Using a Fictional Language?

Let's start with some context. Linguists have long known that the world's languages share a surprising feature: even though Russian, Swahili, Japanese, and Turkish emerged independently on different continents, they regularly develop similar grammatical patterns. These are called linguistic universals – patterns that occur in languages far more often than can be explained by mere chance.

One such pattern is differential argument marking. It sounds intimidating, but it's actually about something very familiar. Think about how in Russian you say "vizhu tebya" ("I see you", where the pronoun carries a special case ending) but "vizhu stol" ("I see a table", where the noun looks the same as in the nominative). Or how in Turkish, the accusative case marker only appears on animate or specific nouns. It's almost unnoticeable in English, but this is how many languages work: a special marker doesn't appear on every word, only on "special" ones – those that break the typical pattern.

What does "break the pattern" mean? Let's use an example. Typically, the subject of a sentence (the one performing the action) is animate: a person, a cat, a character. And the object (the one the action is directed at) is more often inanimate: a table, an apple, a stone. When an object suddenly turns out to be a living being, that's atypical. And it's precisely these atypical cases that languages tend to "flag" with a special marker, as if to say, "Hey, something unusual is happening here, pay attention!"

Researchers wondered: does artificial intelligence reproduce this logic? Not a specific language, but the very idea of marking the atypical? And if so, does it learn everything in the same way humans do?


The Experiment: 18 Fictional Languages and One Neural Network

To answer this question cleanly, without extra variables, the researchers took a clever approach: they didn't train the model on real languages like Russian, Turkish, or Hindi. Instead, they constructed 18 synthetic languages – completely artificial, like something from a linguistic thought experiment.

Each of these languages had its own marking logic. In some, the marker was placed on the object; in others, on the subject. Some marked the "atypical" (e.g., an animate object), while others did the opposite, marking the "typical" (an inanimate object). Word order also varied: some languages built sentences on the "Subject-Verb-Object" principle (like Russian and English), others on the "Subject-Object-Verb" principle (like Japanese or Turkish).

The markers in these languages were intentionally abstract – just letters like "X" or "Y" attached to a word. No meaning, no etymology. Only position and context. It was as if someone took a grammar, stripped it of all its "human" content, and left only the structure.
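To make this concrete, here is a minimal sketch of how sentences in one such synthetic language might be generated. The vocabulary, the marker symbol, and the exact animacy rule are assumptions made for illustration; the paper's actual grammars may differ.

```python
import random

# Hypothetical vocabulary; the study's languages used abstract tokens.
ANIMATE = ["cat", "child", "farmer"]
INANIMATE = ["table", "stone", "apple"]
VERBS = ["sees", "pushes", "finds"]

def make_sentence(marked_role="object", mark_atypical=True, order="SVO"):
    """Generate one sentence under a given marking scheme and word order."""
    subj = random.choice(ANIMATE + INANIMATE)
    obj = random.choice(ANIMATE + INANIMATE)
    verb = random.choice(VERBS)

    def maybe_mark(word, role):
        animate = word in ANIMATE
        # An animate object or an inanimate subject counts as "atypical".
        atypical = animate if role == "object" else not animate
        if role == marked_role and atypical == mark_atypical:
            word += "-X"  # abstract marker with no meaning of its own
        return word

    subj, obj = maybe_mark(subj, "subject"), maybe_mark(obj, "object")
    return " ".join([subj, verb, obj] if order == "SVO" else [subj, obj, verb])

print(make_sentence())             # e.g. "stone sees child-X" (animate object, marked)
print(make_sentence(order="SOV"))  # e.g. "cat apple finds" (inanimate object, unmarked)
```

Varying these three parameters – which role gets the marker, whether the typical or atypical case is marked, and the word order – yields the family of languages described above.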

A separate version of GPT-2 medium – a model with 345 million parameters, powerful enough to grasp subtle patterns – was trained on each of these 18 languages. Each version "read" a million sentences in its synthetic language over ten training epochs.
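For a feel of the setup, here is a rough sketch of such a training run using the Hugging Face transformers library. The tokenizer choice, file name, and most hyperparameters are assumptions; only the model size, corpus size, and epoch count come from the study.

```python
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2Config,
                          GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                          TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # assumed tokenizer
tokenizer.pad_token = tokenizer.eos_token

# GPT-2 medium-sized configuration, trained from scratch (not fine-tuned).
config = GPT2Config(n_layer=24, n_head=16, n_embd=1024)
model = GPT2LMHeadModel(config)

# One synthetic sentence per line; the file name is hypothetical.
dataset = load_dataset("text", data_files={"train": "synthetic_lang_01.txt"})
tokenized = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True),
                        batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=10),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```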

After training, the models were tested using minimal pairs, a classic linguistic technique: you take two nearly identical sentences that differ in exactly one detail (in this case, the presence or absence of a marker) and see which one the model considers more "plausible". If the model has learned the rule well, it should confidently prefer the correct version.
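In code, such a comparison might look like the sketch below. The sentence pair and the model path are invented for illustration:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("out").eval()  # hypothetical checkpoint

def log_prob(sentence: str) -> float:
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # .loss is the mean negative log-likelihood per predicted token
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

marked, unmarked = "stone sees child-X", "stone sees child"
print(log_prob(marked) > log_prob(unmarked))  # True if the model prefers the marked form
```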


The Results: Two Findings, One of Them a Surprise

Finding #1: AI understands the logic of "marking the unusual" ✓

The first result was encouraging. The models did indeed learn the logic of the "natural direction of marking" – that is, the idea that a marker should appear precisely where something is atypical.

If a language marked an animate object (an atypical case, since objects are usually inanimate), the model confidently preferred the marked version. If a language was "inverted" – marking the typical instead, i.e., inanimate objects or animate subjects – the models struggled more: they were worse at generalizing the rule and more often "hesitated".

This aligns beautifully with what we know about real languages. In the world's languages, marking almost always "points" to something unusual, like an exclamation mark in text: putting one after every sentence is pointless, but placing it where something really matters is exactly what makes it useful. The neural network essentially "rediscovered" this principle on its own, just by reading sentences without any explanations.

This is curious in itself: the model received no instructions like "mark the atypical". It simply saw patterns and formed an internal "feeling" that some marking systems were more "logical" than others. It's almost like a child who has never read a grammar textbook but intuitively senses that "I saw he" sounds wrong.

Finding #2: AI fails to "see" something important ✗

The second result, however, was unexpected – and it's what makes this study truly interesting.

In human languages, differential marking systems are far more likely to mark objects than subjects. This is a robust tendency observed by linguists worldwide: if a language decides to "flag" something, it will almost always be the object. Subjects are marked significantly less often.

Why is this the case in human languages? One popular hypothesis is that the subject is usually the agent, the "doer", and understanding who is doing what is critical to the sentence's meaning. Therefore, it's the object that more often needs clarification – it's the "unexpected guest" in the sentence that's worth pointing out explicitly.

Well, GPT-2 did not reproduce this tendency. The models performed equally well (or equally poorly) at learning both object-marking and subject-marking systems, provided that both followed the logic of "mark the atypical". No statistically significant preference for objects was found.
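The paper's exact statistical procedure isn't reproduced here, but the comparison is conceptually simple: gather per-language minimal-pair accuracies for the two groups and test whether they differ. A generic sketch with invented numbers:

```python
from scipy import stats

# Hypothetical per-language accuracies; the study's figures differ.
object_marking_acc = [0.91, 0.88, 0.93, 0.90]
subject_marking_acc = [0.89, 0.92, 0.90, 0.91]

t, p = stats.ttest_ind(object_marking_acc, subject_marking_acc)
print(f"t = {t:.2f}, p = {p:.3f}")  # a large p-value -> no significant preference
```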

To use an analogy: imagine you're teaching a child traffic rules using only pictures, with no explanations. They'll quickly grasp that a red light means "stop" and a green light means "go", because the contrast is obvious. But they might not understand why the crosswalk is drawn before the intersection rather than after it – that requires a deeper logic: who yields to whom and why. GPT-2 learned that "red means stop", but it never grasped why the crosswalk sits exactly where it does.


Why This is Important: What the Gap Tells Us

The researchers call this phenomenon dissociation – when two things that are "linked" for humans turn out to be independent for the model. And this isn't just a technical detail. It hints at something fundamental.

The preference for "marking the atypical" is essentially a principle of informativeness. Mark what is unexpected; what is expected needs no explanation. This principle works directly with probabilities and frequencies – which is exactly what a language model works with. It's trained to predict the next word, and atypical combinations are, by definition, harder to predict. So it makes sense that the model "notices" them and starts treating them as special cases.
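In information-theoretic terms, "harder to predict" means higher surprisal. A toy calculation with made-up probabilities:

```python
import math

def surprisal(p: float) -> float:
    """Information content of an event with probability p, in bits."""
    return -math.log2(p)

print(surprisal(0.90))  # ~0.15 bits: a typical inanimate object, little news
print(surprisal(0.05))  # ~4.32 bits: an atypical animate object, worth flagging
```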

But the preference for the object is a different story. It appears to be linked not just to frequencies but to deeper communicative and possibly cognitive factors: how people structure attention in discourse, whom they consider the default "doer", how they assign roles in an event. None of this is present in synthetic "Subject-Verb-Object" sentences with abstract markers.

This doesn't mean the model is "bad". It means that different linguistic universals have different origins. Some arise from statistics and frequencies, and AI learns these naturally. Others arise from cognitive and social mechanisms shaped in humans over millennia, and texts alone are not enough to learn them.

It's a bit like how a neural network can learn to paint "in the style of Malevich" from examples of his work – it will have the squares, the colors, and the minimalism. But it doesn't "understand" why the Black Square was a statement in 1915, because it lacks the context: what came before, what came after, and what it meant to the people of that time.


Word Order Is Irrelevant Here

One nice control result is worth mentioning separately: word order did not affect how the models learned the marking. Both "Subject-Verb-Object" languages and "Subject-Object-Verb" languages yielded roughly the same results.

This suggests that the effects observed by the researchers are truly related to semantics – to the animacy and typicality of the arguments – and not to the order in which the words appear in the sentence. The surface structure of the sentence is secondary here; what's important is the word's semantic role.


What This Means for Understanding AI and Language

This study is part of a large and important discussion that has intensified, especially since 2020, when powerful language models became widely available. The question is roughly this: do language models understand language, or do they merely imitate it?

The answer offered by the authors of this paper is more nuanced: models learn what can be learned from the distribution of words and contexts. Principles that are directly "encoded" in the language's statistics – like "mark the atypical" – they grasp quite well. Principles that require an understanding of social context, communicative roles, and language evolution – like "objects are marked more often than subjects" – come harder to them or are not learned at all.

Interestingly, the preference for marking objects over subjects in real languages didn't appear overnight. It's the result of millennia of linguistic evolution, communicative pressures, and social conventions. This history is absent from a synthetic corpus of a million sentences. It's no wonder the model didn't learn it.

On the other hand, the fact that the model reproduced any typological universals at all – without special training, without instructions, just from text – is remarkable in itself. It means that some of what we consider "human" properties of language actually stem from very general statistical principles, not just from biology or culture.


What's Next?

The authors point to several directions for future work. The first is to make the synthetic languages more complex: add discourse context, multiple participants, and chains of events. Perhaps in such conditions, the model will begin to "see" the difference between a subject-agent and an object-patient the way a human does.

The second direction is to study how the picture changes when training on real languages – Turkish, Finnish, Hindi – where differential marking is well documented. Synthetic data provides control but loses richness; real languages provide richness but lose the purity of the experiment. The answer probably lies somewhere between these two approaches.

And third, to test whether the lack of preference for the object is specific to the GPT-2 architecture or if it's a more general property of transformer models. Perhaps larger models or models with different architectures will perform differently.

For now, the conclusion is this: language models are not just "parrots" that repeat text. Nor are they humans who understand language from the inside. They are somewhere in between – and it is this "somewhere in between" that deserves our close attention, because it is in this space that the most interesting questions hide: what language, understanding, and intelligence truly are.

Original Title: Differences in Typological Alignment in Language Models' Treatment of Differential Argument Marking
Article Publication Date: Feb 19, 2026
Original Article Authors: Iskar Deng, Nathalia Xu, Shane Steinert-Threlkeld

How This Text Was Created

This material is based on a real scientific study, not generated "from scratch". First, neural networks analyze the original publication: its goals, methods, and conclusions. Then the author creates a coherent text that preserves the scientific meaning but translates it from academic format into a clear, readable exposition – without formulas, yet without loss of accuracy.

Cultural perspective: 87% · Accessible for everyone: 85% · Pop-culture references: 89%

Neural Networks Involved in the Process

We show which models were used at each stage – from research analysis to editorial review and illustration creation. Each neural network performs a specific role: some handle the source material, others work on phrasing and structure, and others focus on the visual representation. This ensures transparency of the process and trust in the results.

1. Research Summarization – Gemini 2.5 Flash (Google DeepMind): highlighting key ideas and results
2. Creating Text from Summary – Claude Sonnet 4.6 (Anthropic): transforming the summary into a coherent explanation
3. Translation into English – Gemini 2.5 Pro (Google DeepMind)
4. Editorial Review – Gemini 2.5 Flash (Google DeepMind): correcting errors and clarifying conclusions
5. Preparing Description for Illustration – DeepSeek-V3.2 (DeepSeek): generating a textual prompt for the visual model
6. Creating Illustration – FLUX.2 Pro (Black Forest Labs): generating an image based on the prepared prompt
