Published January 28, 2026

How Chinese Open Source Handles Architectures: What Happens After DeepSeek

We explore the architectural solutions developers of Chinese open-source models are choosing and why decoder-based approaches continue to dominate the ecosystem.

Research
Event Source: Hugging Face
Reading Time: 3–4 minutes

DeepSeek sparked a noticeable surge of interest in Chinese open-source models. But when the initial hype began to subside, a perfectly legitimate question arose: what's next? What architectural solutions are other teams in China choosing? How diverse are they, or, on the contrary, are they following established paths?

Researchers from Hugging Face decided to systematically examine this landscape – they analyzed data on uploaded models to understand which architectures currently dominate the Chinese ecosystem and if there's any room for experimentation.

Current Architectures of Chinese Models

🧱 What's Actually Happening with Architectures

In short: the vast majority of Chinese open-source models are built upon decoder architectures. These are the very transformers that generate text sequentially, token by token. GPT, LLaMA, Mistral – these are all examples of decoder models.
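The token-by-token loop that defines decoder models can be sketched in a few lines. The "model" below is a hypothetical stand-in that scores a toy vocabulary, not a real network; only the greedy autoregressive loop itself is the point.

```python
# Minimal sketch of autoregressive (decoder-style) generation.
# `next_token_logits` is a toy stand-in for a real model's forward
# pass: it scores every vocabulary item given the context so far.

VOCAB = ["<eos>", "open", "source", "models", "generate", "text"]

def next_token_logits(context):
    # Toy rule: prefer the token that follows the last one seen,
    # wrapping around to <eos> at the end of the vocabulary.
    last = VOCAB.index(context[-1]) if context else 0
    return [1.0 if i == (last + 1) % len(VOCAB) else 0.0
            for i in range(len(VOCAB))]

def generate(prompt, max_new_tokens=4):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        best = VOCAB[logits.index(max(logits))]  # greedy decoding
        if best == "<eos>":
            break
        tokens.append(best)  # the output is fed back in as context
    return tokens

print(generate(["open"]))
```

A real decoder replaces the toy scoring rule with a transformer forward pass, but the structure is the same: each new token is produced from everything generated so far.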

According to Hugging Face's analysis, about 95% of Chinese models specifically use the decoder approach. The remaining 5% are distributed between encoders (models like BERT, which excel at text understanding but not generation) and encoder-decoder hybrids (like T5 or BART).
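A split like this can be estimated from Hub metadata: each model repo's `config.json` records its class name under `architectures`. The bucketing rules below are a rough heuristic of ours based on common `transformers` naming conventions, not an official Hugging Face taxonomy.

```python
from collections import Counter

# Bucket `architectures` strings (as found in config.json) into the
# three families discussed above. Heuristic: class-name suffixes.

ENCODER_DECODER_SUFFIX = "ForConditionalGeneration"  # T5, BART, ...
DECODER_SUFFIX = "ForCausalLM"                       # Llama, Qwen2, ...

def family(arch: str) -> str:
    if arch.endswith(ENCODER_DECODER_SUFFIX):
        return "encoder-decoder"
    if arch.endswith(DECODER_SUFFIX):
        return "decoder"
    return "encoder-or-other"                        # BertModel, etc.

sample = ["LlamaForCausalLM", "Qwen2ForCausalLM", "LlamaForCausalLM",
          "BertModel", "T5ForConditionalGeneration"]
print(Counter(family(a) for a in sample))
```

Run over the full set of Chinese-origin repos instead of this toy sample, a count like this is presumably how the ~95% figure was obtained.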

This concentration on decoders has a simple explanation: they have shown the best results in text generation, which is currently the main focus of large language model development.

Popular Architectures Among Decoders

What Exactly Is Popular Within the Decoder Camp

Among decoder models, several clear favorites stand out. In first place is the LLaMA architecture (and its variations). This is not surprising: Meta open-sourced it, the documentation is good, the community is large, and the results are impressive.

Second place goes to Qwen, Alibaba Cloud's own open-weight family. Qwen is actively evolving, with several versions available in different parameter sizes, and many Chinese teams adopt it as a base for their projects.

Also in the top are ChatGLM from Zhipu AI, Baichuan, and Yi. All of them are decoder transformers; each has its own quirks, but the general working principle is the same.

The Importance of Model Architecture Choice

Why Look at Architectural Choices at All?

At first glance, these might seem like mere technical details. But in reality, the choice of architecture reveals a lot about the direction an ecosystem is heading.

If everyone builds models on the same base, this simplifies knowledge sharing and the reuse of code and infrastructure. On the other hand, this might imply fewer experiments and fewer chances for unexpected breakthroughs.

In the case of the Chinese ecosystem, the dominance of decoders indicates that the main focus is on generative tasks: dialogue, content creation, and assistants. Pure text-understanding tasks, such as classification and information extraction, meanwhile fade into the background.
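Part of why understanding tasks "fade" rather than disappear is that decoder models absorb them by reframing classification as generation. The prompt template and label set below are purely illustrative, not from any specific library or model:

```python
# Sketch: recasting a classification ("understanding") task as text
# generation, the way decoder-only models typically take over work
# that once went to encoders like BERT. Template and labels are
# illustrative assumptions, not a real API.

LABELS = ["positive", "negative", "neutral"]

def classification_prompt(text: str) -> str:
    options = ", ".join(LABELS)
    return (
        f"Classify the sentiment of the following review as one of: {options}.\n"
        f"Review: {text}\n"
        "Answer:"
    )

def parse_label(generated: str) -> str:
    # The decoder answers in free text; map it back to a fixed label.
    answer = generated.strip().lower()
    for label in LABELS:
        if answer.startswith(label):
            return label
    return "unknown"

prompt = classification_prompt("The battery lasts two full days.")
# A real decoder model would continue `prompt`; here we simulate its reply.
print(parse_label(" Positive. The reviewer praises the battery."))
```

The trade-off is visible in `parse_label`: generation is flexible, but the output must be coerced back into a discrete label, something an encoder classifier gives you directly.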

Encoders and Hybrid Models: Their Role

What About Encoders and Hybrids?

Encoder models like BERT were once at the peak of popularity. They excel at tasks requiring text understanding: search, classification, and sentiment analysis. However, in the era of large language models that can both generate and understand, pure encoders have become less in demand.
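What "excel at understanding" means in practice: an encoder turns text into per-token vectors, which are pooled into one embedding, and tasks like search reduce to comparing embeddings. The vectors below are toy values standing in for real encoder activations:

```python
import math

# Sketch of the typical encoder usage pattern: pool per-token
# vectors into a sentence embedding, then compare embeddings by
# cosine similarity. Numbers are toy stand-ins, not BERT outputs.

def mean_pool(token_vectors):
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / len(token_vectors)
            for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(x * x for x in b)))
    return dot / norm

doc = mean_pool([[1.0, 0.0], [0.8, 0.2]])  # pretend encoder output
query = [0.9, 0.1]                          # pretend query embedding
print(round(cosine(doc, query), 3))
```

No text is generated anywhere in this loop, which is exactly why pure encoders lost ground once generation became the headline capability.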

Encoder-decoder hybrids (like T5) also did not achieve widespread adoption in the Chinese ecosystem. Their advantage is the ability to transform one sequence into another, which suits tasks like translation and summarization, but large decoder models now handle these tasks well enough that the hybrid design rarely justifies its extra complexity.

#analysis #systemic analysis #neural networks #ai development #engineering #model architecture #open technologies #model benchmarks #open-language-models
Original Title: Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek
Publication Date: Jan 27, 2026

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.5 (Anthropic) — Analyzing the Original Publication and Writing the Text: the neural network studies the original material and generates a coherent text.

2. Gemini 3 Pro Preview (Google DeepMind) — Translating the Text into English.

3. Gemini 2.5 Flash (Google DeepMind) — Text Review and Editing: correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek) — Preparing the Illustration Description: generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs) — Creating the Illustration: generating an image based on the prepared prompt.

