Imagine a Brazilian samba school before Carnival. Hundreds of people rehearse for hours: dancers, drummers, costume makers. But when the parade begins, the audience only sees the glitter of feathers and the rhythm of the bateria. No one thinks about those who sewed, painted, and rehearsed all night long. Modern artificial intelligence is built in much the same way – it dazzles with its answers and predictions, while behind the scenes remains an army of people who actually trained the machine.
These people are called data annotation specialists, or annotators. They are the ones who review thousands of photos, texts, and audio recordings, labeling them: “this is a cat,” “this is a malicious comment,” “this is news about violence.” Without their work, a neural network is just an empty box. And here's what's interesting: even though all of AI literally stands on the shoulders of these people, their work is traditionally seen as mechanical, cheap, and interchangeable. It's as if they replaced live musicians with a metronome and said, “There, that's good enough.”
But a group of researchers decided to ask: what if we did it differently? What if we treated annotators not as cogs in a machine, but as experts whose opinions shape the very reality of the data?
Why Data Isn't Just Numbers 🧩
Before we talk about the experiment, let's debunk a common myth. Many people think that data is something objective. Like, facts are facts. A cat is a cat. Violence is violence.
But dig a little deeper, and everything becomes much more complicated. Take a simple example: a news article describes an assault on a woman. One annotator reads it as a crime report. Another, as a story of systemic inequality. A third, as an example of media silencing. Who is right? All three of them. Because data doesn't exist in a vacuum – it is always embedded in a cultural, social, and historical context.
This isn't a philosophical abstraction. It's a practical problem. If an AI system is trained on data annotated by people from the same cultural background, it learns to see the world through only that single lens. And then such a system starts making decisions that seem “neutral” but actually carry specific biases – they're just invisible to those who built them in.
Feminist data theory has long pointed to this problem. Researchers like Catherine D'Ignazio and Lauren Klein, in their book “Data Feminism” (2020), detail how data is shaped by power relations and propose alternative principles: acknowledging context, embracing pluralism, and practicing an ethic of care. It sounds nice. But what it looks like in practice – that's the real question.
The Experiment: Journalists, Activists, and Gender-Based Violence in the News
This is precisely the question – “what does this look like in practice” – that the group of researchers we're discussing set out to answer. They organized a series of workshops, not with random freelancers from a gig platform, but with journalists and activists who professionally work on the topic of gender-based violence in the media.
The task was specific: to create a labeled dataset about how news materials cover gender-based violence. This is an important resource – such data is needed to train systems that automatically analyze media texts, identify patterns in reporting, and help journalists and human rights defenders work with large arrays of materials.
But the format of the workshops was fundamentally different from standard data annotation projects. There were no instructions like “press button A if you see X.” Instead, there was dialogue, iteration, and collective decision-making.
Multilingualism as a Principle, Not a Technical Task
The workshops were designed to be multilingual from the outset. Participants worked with texts in different languages, and this was no accident. Gender-based violence is a global problem, but its media coverage varies dramatically depending on the country, language, and cultural norms. What is considered a neutral description in one context may be a gross violation of reporting ethics in another.
The multilingual format allowed participants to share these very cultural and linguistic observations. An annotator from one country would notice a nuance invisible to a colleague from another. And this wasn't a hindrance to the process – it was its value. It's like a good jazz ensemble: each instrument hears the melody a little differently, and it's from these differences that something rich is born.
Iteration: When You Can Change Your Mind
The workshops were held in several rounds. After each cycle, participants returned to their previous decisions, reconsidered them, and refined the categories. This is a stark contrast to the standard model, where an annotator gets a task, completes it, and never sees the result again.
An iterative approach allows for the formation of collective knowledge. In the first round, someone proposes a definition; in the second, another participant challenges it; in the third, they find a more precise formulation together. It's like a football team reviewing a match: not to punish for mistakes, but to understand together what worked, what didn't, and how to play the next match better.
Facilitation Instead of Dictation
A key element of the workshops was the role of the facilitator. Not an expert who knows the right answer and passes it on to others. But a moderator who creates the conditions for the participants to find the right answers themselves.
This is a subtle but important distinction. When a person feels they are being asked, not taught, they speak differently. They risk voicing an unconventional opinion. They share context that they might otherwise have considered insignificant. And it is this very context that can be decisive for the quality of the data.
Two Discoveries That Complicate the Picture 🔍
The researchers honestly admit: beautiful principles and real-world practice are not the same thing. During the workshops, two tensions emerged that required rethinking.
Discovery One: Context Needs to Be Limited
Feminist approaches to data insist on prioritizing context. Consider everything: history, culture, politics, personal experience. This is correct in theory. But in practice, context is infinite. If every annotator unpacks the full context of every text, the process would never end, and the data would be incomparable.
The workshop participants independently came to the conclusion that they needed to agree on boundaries. Not to abandon context, but to purposefully choose which specific context would be considered within the scope of this project. For example, to focus on how specific cultural and regional norms influence the reporting of gender-based violence – and to temporarily set aside other relevant dimensions.
This sounds like a compromise. And it is. But the researchers call it a conscious limitation of context – and this is fundamentally different from ignoring it. The difference between “we didn't know” and “we agreed” is huge, especially when it comes to responsibility for the data.
Discovery Two: Tactical Consensus as a Working Tool
The principle of pluralism says: there is no single correct interpretation. Different people see things differently, and this must be respected. Wonderful. But to create an annotation system that other researchers and developers can use, you need some kind of common language. Otherwise, each label will mean something different to each participant, and the dataset will turn into a Tower of Babel.
The solution found by the participants was what the researchers called tactical consensus. The essence is this: the parties agree on a working definition or category – not because everyone considers it the one and only truth, but because it works well enough for the project's goals. At the same time, individual interpretations and doubts are not erased – they are documented, remain part of the process, and can be taken into account in the future.
It's like agreeing on the rules of the game before a football match. No one is saying that these exact rules are the absolute truth for all time. But today, on this field, this is how we play. And after the match, we can discuss again what should be changed.
Money and Relationships: The Most Awkward Conversation
One of the most interesting and honest sections of the study is about money. Feminist ethics insist: labor must be recognized and fairly compensated. The workshop participants are professionals; they are spending their time and expertise. Not paying them would be unethical.
But a subtle contradiction arises here. As soon as monetary compensation is involved, relationships risk becoming transactional: “I did the work, got paid, we parted ways.” But the researchers wanted to create something different: a space for dialogue, solidarity, and a common goal. A community, not a market.
The solution they arrived at was this: pay – yes, openly and fairly. But at the same time, consistently emphasize in communication that the value of participation is not reducible to its monetary equivalent. Discussions about compensation were held openly and transparently, without being hushed up. This in itself built trust.
It sounds simple, but this is a rarity in academia. Usually, the financial terms of participation in a study are set out in documents and are not discussed further. Here, money became part of a larger conversation about the value of labor and respect for people.
Hierarchies of Knowledge: Who's the Expert Here? 🎓
Another tension that the researchers candidly describe is that even with the best intentions to dismantle knowledge hierarchies, they still exist. Researchers come with prepared questions, frameworks, and categories. They define what is being studied and how. This is a power that is difficult to fully transfer to the participants without turning the project into something else entirely.
The solution that was found was not to pretend that the hierarchy doesn't exist, but to create a mechanism for constant reflection. The researchers regularly asked themselves and the participants: Where is the power here? Who is making the decisions? Whose voices are quieter? This doesn't eliminate the problem, but it makes it visible – and a visible problem can be worked on.
Compare this to how standard data annotation is usually organized. There, an annotator simply performs a task according to instructions. Their opinion about the task itself is of no interest to anyone. The system of their work is a black box. In the approach described here, the black box is opened, and everyone looks inside together.
Why This Matters for AI as a Whole
A question may arise: okay, they held some nice workshops, talked about feminism and context – but what does this have to do with real AI systems? A great deal.
Any machine learning system is, in essence, a snapshot of the judgments of the people who annotated the data. If these people worked under pressure, for minimal pay, without the opportunity to ask questions or express doubts, their judgments will reflect that. Not because they are bad specialists, but because the system left them no room for quality work.
Natural language processing models that analyze news texts, content moderation systems, tools for human rights defenders – they all depend on the quality of the data. And the quality of the data depends on the conditions under which the people who created that data worked.
The study we are discussing shows that when annotators are not interchangeable units but experts with a voice, the data turns out fundamentally different. Richer, more contextual, and more resilient to the blind spots of bias.
This doesn't mean that such an approach is cheap or easy to scale. Workshops with journalists and activists are expensive and time-consuming compared to the standard crowdsourcing model of annotation. But for tasks where data quality is critical and errors can affect real people, this is not a luxury, but a necessity.
Data as a Space of Relationships
Perhaps the most radical idea of this study is this: data is not an object, but a relationship. It doesn't exist separately from the people who create it, the context in which it is created, or the purposes for which it will be used.
The traditional engineering model of working with data presents it as a neutral resource: collect, label, load into a model. The feminist model says: data is always someone's point of view, always the result of someone's labor, always embedded in relationships of power and care.
To accept this is to accept responsibility. To ask: Who annotated this data? Under what conditions? Whose categories are we using? Whose voices are not heard in this data?
These are precisely the questions that everyone involved in developing AI systems should be asking themselves – not just researchers with a feminist agenda, but also engineers, product managers, and investors. Because AI built on invisible and devalued labor reproduces that invisibility and devaluation. But AI built on respect for expertise and dialogue is a different machine. Literally.
To think of this as a technical issue is to see only the carnival drums. The whole dance begins long before the celebration, in the rehearsal halls where people agree on the rhythm together.