DeepSeek sparked a noticeable surge of interest in Chinese open-source models. But when the initial hype began to subside, a perfectly legitimate question arose: what's next? What architectural solutions are other teams in China choosing? How diverse are they, or, on the contrary, are they following established paths?
Researchers from Hugging Face decided to systematically examine this landscape – they analyzed data on uploaded models to understand which architectures currently dominate the Chinese ecosystem and if there's any room for experimentation.
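A survey like this essentially boils down to grouping models by the `model_type` field found in each repository's `config.json`. Here is a minimal sketch of that grouping step, using made-up sample data and a simplified family mapping (both are illustrative assumptions, not real Hub counts):

```python
from collections import Counter

# Hypothetical sample of `model_type` values, as they might appear in
# the config.json files of models on the Hub (illustrative data only).
model_types = [
    "llama", "llama", "qwen2", "qwen2", "chatglm",
    "baichuan", "bert", "t5", "llama", "qwen2",
]

# Rough mapping from model_type to architecture family -- an assumption
# made for this sketch; real taxonomies are more nuanced.
FAMILY = {
    "llama": "decoder", "qwen2": "decoder", "chatglm": "decoder",
    "baichuan": "decoder", "gpt2": "decoder",
    "bert": "encoder", "roberta": "encoder",
    "t5": "encoder-decoder", "bart": "encoder-decoder",
}

counts = Counter(FAMILY.get(t, "other") for t in model_types)
total = sum(counts.values())
for family, n in counts.most_common():
    print(f"{family}: {n / total:.0%}")
```

On real data one would pull `model_type` from the Hub's model metadata instead of a hard-coded list, but the aggregation logic stays the same.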
🧱 What's Actually Happening with Architectures
In short: the vast majority of Chinese open-source models are built on decoder architectures. These are transformers that generate text sequentially, token by token. GPT, LLaMA, and Mistral are all examples of decoder models.
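That token-by-token loop is the defining trait of a decoder model. A toy sketch of autoregressive generation, where a hard-coded bigram table stands in for the transformer's learned next-token distribution (all names and data here are illustrative):

```python
# Toy "language model": maps the last token to the most likely next one.
# A real decoder conditions on the entire generated prefix, but the
# control flow at inference time is exactly this kind of loop.
BIGRAMS = {
    "<s>": "the", "the": "model", "model": "generates",
    "generates": "text", "text": "</s>",
}

def generate(max_tokens: int = 10) -> str:
    tokens = ["<s>"]
    for _ in range(max_tokens):
        # Predict the next token from what has been generated so far.
        next_token = BIGRAMS.get(tokens[-1], "</s>")
        if next_token == "</s>":  # stop token ends generation
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])  # drop the start-of-sequence marker

print(generate())  # → the model generates text
```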
According to Hugging Face's analysis, about 95% of Chinese models specifically use the decoder approach. The remaining 5% are distributed between encoders (models like BERT, which excel at text understanding but not generation) and encoder-decoder hybrids (like T5 or BART).
Such a concentration on decoders is easy to explain: they have demonstrated the best results in text generation tasks, which is currently the main focus of large language model development.
What Exactly Is Popular Within the Decoder Camp
Among decoder models, several clear favorites stand out. In first place is the LLaMA architecture (and its variations). This is not surprising: Meta open-sourced it, the documentation is good, the community is large, and the results are impressive.
Second place goes to Qwen, Alibaba Cloud's own model family. Qwen is actively evolving, with several versions available at different parameter scales, and many Chinese teams adopt it as a base for their projects.
Also near the top are ChatGLM from Zhipu AI, Baichuan, and Yi. All of them are decoder transformers, each with its own distinctive features, but their general working principle is the same.
Why Look at Architectural Choices at All?
At first glance, these might seem like mere technical details. But in reality, the choice of architecture reveals a lot about the direction an ecosystem is heading.
If everyone builds models on the same base, this simplifies knowledge sharing and the reuse of code and infrastructure. On the other hand, this might imply fewer experiments and fewer chances for unexpected breakthroughs.
In the case of the Chinese ecosystem, the dominance of decoders indicates that the main focus is on generative tasks: dialogues, content creation, and assistants. Meanwhile, tasks of pure text understanding (such as classification and information extraction) fade into the background.
What About Encoders and Hybrids?
Encoder models like BERT were once at the peak of popularity. They excel at tasks requiring text understanding: search, classification, and sentiment analysis. However, in the era of large language models that can both generate and understand, pure encoders have become less in demand.
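Under the hood, the encoder/decoder split comes down largely to the attention mask. A minimal sketch contrasting the two patterns (assuming NumPy; the sequence length is chosen arbitrarily for illustration):

```python
import numpy as np

n = 4  # sequence length (arbitrary, for illustration)

# Encoder (BERT-style): every token attends to every other token in
# both directions -- well suited to understanding tasks like
# classification, where the whole input is visible at once.
bidirectional_mask = np.ones((n, n), dtype=int)

# Decoder (GPT/LLaMA-style): a lower-triangular "causal" mask lets each
# token attend only to itself and earlier positions, which is what makes
# left-to-right, token-by-token generation possible.
causal_mask = np.tril(np.ones((n, n), dtype=int))

print(causal_mask)
```

In the causal mask, row *i* has ones only in columns 0..*i*, so position *i* never sees future tokens; the bidirectional mask is all ones.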
Encoder-decoder hybrids (like T5) also did not achieve widespread adoption in the Chinese ecosystem. Their advantage is their ability to handle