DeepSeek sparked a noticeable surge of interest in Chinese open-source models. But when the initial hype began to subside, a perfectly legitimate question arose: what's next? What architectural solutions are other teams in China choosing? How diverse are they, or, on the contrary, are they following established paths?
Researchers from Hugging Face decided to systematically examine this landscape – they analyzed data on uploaded models to understand which architectures currently dominate the Chinese ecosystem and if there's any room for experimentation.
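A survey like this essentially boils down to grouping models by the `model_type` field found in each repository's `config.json`. Here is a minimal sketch of that grouping step, using made-up sample data and a simplified family mapping (both are illustrative assumptions, not real Hub counts):

```python
from collections import Counter

# Hypothetical sample of `model_type` values, as they might appear in
# the config.json files of models on the Hub (illustrative data only).
model_types = [
    "llama", "llama", "qwen2", "qwen2", "chatglm",
    "baichuan", "bert", "t5", "llama", "qwen2",
]

# Rough mapping from model_type to architecture family -- an assumption
# made for this sketch; real taxonomies are more nuanced.
FAMILY = {
    "llama": "decoder", "qwen2": "decoder", "chatglm": "decoder",
    "baichuan": "decoder", "gpt2": "decoder",
    "bert": "encoder", "roberta": "encoder",
    "t5": "encoder-decoder", "bart": "encoder-decoder",
}

counts = Counter(FAMILY.get(t, "other") for t in model_types)
total = sum(counts.values())
for family, n in counts.most_common():
    print(f"{family}: {n / total:.0%}")
```

On real data one would pull `model_type` from the Hub's model metadata instead of a hard-coded list, but the aggregation logic stays the same.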
🧱 What's Actually Happening with Architectures
In short: the vast majority of Chinese open-source models are built on decoder architectures. These are transformers that generate text sequentially, token by token. GPT, LLaMA, and Mistral are all examples of decoder models.
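That token-by-token loop is the defining trait of a decoder model. A toy sketch of autoregressive generation, where a hard-coded bigram table stands in for the transformer's learned next-token distribution (all names and data here are illustrative):

```python
# Toy "language model": maps the last token to the most likely next one.
# A real decoder conditions on the entire generated prefix, but the
# control flow at inference time is exactly this kind of loop.
BIGRAMS = {
    "<s>": "the", "the": "model", "model": "generates",
    "generates": "text", "text": "</s>",
}

def generate(max_tokens: int = 10) -> str:
    tokens = ["<s>"]
    for _ in range(max_tokens):
        # Predict the next token from what has been generated so far.
        next_token = BIGRAMS.get(tokens[-1], "</s>")
        if next_token == "</s>":  # stop token ends generation
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])  # drop the start-of-sequence marker

print(generate())  # → the model generates text
```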
According to Hugging Face's analysis, about 95% of Chinese models specifically use the decoder approach. The remaining 5% are distributed between encoders (models like BERT, which excel at text understanding but not generation) and encoder-decoder hybrids (like T5 or BART).
Such a concentration on decoders is easy to explain: they have demonstrated the best results in text generation tasks, which is currently the main focus of large language model development.
What Exactly Is Popular Within the Decoder Camp
Among decoder models, several clear favorites stand out. In first place is the LLaMA architecture (and its variations). This is not surprising: Meta open-sourced it, the documentation is good, the community is large, and the results are impressive.
Second place goes to Qwen, Alibaba Cloud's own model family. Qwen is actively evolving, with several versions available at different parameter scales, and many Chinese teams adopt it as a base for their projects.
Also near the top are ChatGLM from Zhipu AI, Baichuan, and Yi. All of them are decoder transformers, each with its own distinctive features, but their general working principle is the same.
Why Look at Architectural Choices at All?
At first glance, these might seem like mere technical details. But in reality, the choice of architecture reveals a lot about the direction an ecosystem is heading.
If everyone builds models on the same base, this simplifies knowledge sharing and the reuse of code and infrastructure. On the other hand, this might imply fewer experiments and fewer chances for unexpected breakthroughs.
In the case of the Chinese ecosystem, the dominance of decoders indicates that the main focus is on generative tasks: dialogues, content creation, and assistants. Meanwhile, tasks of pure text understanding (such as classification and information extraction) fade into the background.
What About Encoders and Hybrids?
Encoder models like BERT were once at the peak of popularity. They excel at tasks requiring text understanding: search, classification, and sentiment analysis. However, in the era of large language models that can both generate and understand, pure encoders have become less in demand.
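Under the hood, the encoder/decoder split comes down largely to the attention mask. A minimal sketch contrasting the two patterns (assuming NumPy; the sequence length is chosen arbitrarily for illustration):

```python
import numpy as np

n = 4  # sequence length (arbitrary, for illustration)

# Encoder (BERT-style): every token attends to every other token in
# both directions -- well suited to understanding tasks like
# classification, where the whole input is visible at once.
bidirectional_mask = np.ones((n, n), dtype=int)

# Decoder (GPT/LLaMA-style): a lower-triangular "causal" mask lets each
# token attend only to itself and earlier positions, which is what makes
# left-to-right, token-by-token generation possible.
causal_mask = np.tril(np.ones((n, n), dtype=int))

print(causal_mask)
```

In the causal mask, row *i* has ones only in columns 0..*i*, so position *i* never sees future tokens; the bidirectional mask is all ones.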
Encoder-decoder hybrids (like T5) also did not achieve widespread adoption in the Chinese ecosystem. Their advantage is their ability to handle