Published February 14, 2026

Higress Update: Gateway API Support and AI Inference Features


The Higress cloud gateway has been updated to support the Gateway API standard and now includes specialized features for working with artificial intelligence models.

Category: Infrastructure · Source: Alibaba Cloud · Reading time: 4–6 minutes

Higress is a cloud gateway from Alibaba, designed to manage traffic between services and applications. It acts as a dispatcher, receiving user requests and routing them to their destinations. Recently, its developers released an update that introduces two key features: support for the new Gateway API standard and specialized capabilities for working with AI models.

What's Changed in Traffic Management?

Previously, such gateways were configured with the Kubernetes Ingress standard. It has been around for a long time and still works, but its capabilities have proven insufficient for modern tasks. The Gateway API is a more flexible alternative that allows the request-handling process to be described more precisely.

The main difference lies in the separation of roles. One specialist can configure the gateway's overall infrastructure, while another can define routes for a specific application. This is convenient for large teams where different specialists are responsible for different parts of the system.
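This split shows up directly in the Gateway API resource model: the infrastructure team owns the Gateway object, while an application team owns an HTTPRoute that merely attaches to it. Below is a minimal sketch that builds the two manifests as plain Python dictionaries; all names (`shared-gateway`, `team-a`, `app-service`) are illustrative, not taken from Higress's documentation.

```python
# Sketch: the Gateway API splits responsibility into separate resources.
# An infrastructure operator owns the Gateway; an application team owns
# the HTTPRoute that attaches to it. Names here are illustrative.

gateway = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "Gateway",
    "metadata": {"name": "shared-gateway", "namespace": "infra"},
    "spec": {
        "gatewayClassName": "higress",  # implemented by the gateway controller
        "listeners": [{"name": "http", "port": 80, "protocol": "HTTP"}],
    },
}

http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "app-route", "namespace": "team-a"},
    "spec": {
        # The route only *references* the shared gateway; the app team
        # never edits the Gateway object itself.
        "parentRefs": [{"name": "shared-gateway", "namespace": "infra"}],
        "rules": [{
            "matches": [{"path": {"type": "PathPrefix", "value": "/api"}}],
            "backendRefs": [{"name": "app-service", "port": 8080}],
        }],
    },
}

print(gateway["spec"]["gatewayClassName"])          # "higress"
print(http_route["spec"]["parentRefs"][0]["name"])  # "shared-gateway"
```

Because the two objects live in different namespaces and are linked only by a reference, each team can change its own resource without touching the other's.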

Higress now fully supports this standard. This means that if you are already using the Gateway API with other tools, migrating to Higress will be easier, as the configuration remains familiar and requires no retraining.

Why Does a Gateway Need AI Features?

The second part of the update relates to artificial intelligence. When working with large language models or other AI services, it is often necessary to send requests to various models and receive responses. This presents several challenges.

First, different providers use different request formats: one model might expect data in one structure, while another needs it in a different one. Second, load management is crucial: if one model is overloaded, a request can be redirected to another. Third, speed is important: users shouldn't have to wait too long.

Higress now includes an extension for AI inference – the process of getting responses from models. This extension allows you to:

  • Route requests to different models through a single interface, without rewriting application code for each provider.
  • Automatically distribute the load across multiple models.
  • Cache responses, so that identical requests can be processed faster and more cheaply.
  • Control token usage – the units used to measure the volume of requests to language models.

In essence, the gateway takes on tasks that previously had to be implemented separately within each application.

How It Works in Practice

Imagine you have an application that uses several AI models. One model answers user questions, another generates text, and a third processes images. Without a gateway, you would have to write logic into your application's code for each model: where to send the request, how to process the response, and what to do if a model becomes unavailable.

With Higress, you configure the routes once at the gateway level. The application simply sends a standardized request, and the gateway itself decides which model to use, how to transform the request, and how to return the result. If one of the models is overloaded, the gateway automatically switches to another.
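The failover behavior described above amounts to trying models in priority order and moving on when one fails. A minimal sketch, with stand-in callables instead of real upstream HTTP calls:

```python
# Failover sketch: try the primary model and, if it is unavailable,
# transparently retry against a fallback. The model callables are
# stand-ins for real upstream calls.

class ModelUnavailable(Exception):
    pass

def call_with_failover(request, models):
    """Try each model in priority order; return the first successful answer."""
    last_error = None
    for model in models:
        try:
            return model(request)
        except ModelUnavailable as err:
            last_error = err  # record the failure, fall through to the next model
    raise RuntimeError("all models unavailable") from last_error

def overloaded_model(request):
    raise ModelUnavailable("primary is overloaded")

def healthy_model(request):
    return f"answer to: {request}"

result = call_with_failover("hello", [overloaded_model, healthy_model])
print(result)  # "answer to: hello"
```

A production gateway would add timeouts, health checks, and retry budgets on top of this, but the control flow is the same.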

Caching also simplifies the process. If someone has already asked a similar question, the gateway can return the cached response without querying the model again. This saves both time and money, especially when working with paid APIs.
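The caching idea can be sketched as a lookup keyed on the normalized prompt: a hit returns the stored answer without a model call. In a real gateway the key would also include the model name, request parameters, and a TTL; this is only an illustration of the principle.

```python
# Response-cache sketch: identical (normalized) prompts are answered
# from the cache instead of calling the model again.

cache = {}
model_calls = 0

def normalize(prompt):
    """Collapse case and whitespace so trivially different prompts share a key."""
    return " ".join(prompt.lower().split())

def ask(prompt):
    global model_calls
    key = normalize(prompt)
    if key in cache:       # cache hit: no model call, no token cost
        return cache[key]
    model_calls += 1       # cache miss: pay for one real inference
    answer = f"model answer for '{key}'"
    cache[key] = answer
    return answer

ask("What is Higress?")
ask("what is   higress?")  # normalizes to the same key: served from cache
print(model_calls)  # 1
```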

Who Can Benefit from This?

The Higress update is particularly relevant for teams that develop applications using AI models. If you are working with multiple providers or experimenting with different models, managing them through a gateway can significantly simplify your architecture.

It's also useful for those looking to adopt a more modern approach to traffic management. The Gateway API is actively evolving and is becoming an industry standard, so Higress's support helps you stay current.

For small projects using a single model with no complex routing logic, this functionality might be overkill. But if you plan to scale or are already facing challenges managing multiple AI services, it's worth taking a look at Higress.

What Questions Remain?

As with any update, some aspects will only become clear with practical use. How stable is the new functionality under a heavy load? How quickly can the gateway switch between models if one becomes unavailable? How flexible is the caching configuration for different request types?

Furthermore, Higress is an Alibaba product, and its ecosystem may be more convenient for those already using the company's cloud services. For teams working with other cloud providers, alternative solutions might exist that offer better integration with their infrastructure.

Nevertheless, the fact that cloud gateways are beginning to integrate features for AI workloads signals the direction in which the industry is heading. Artificial intelligence is becoming more deeply embedded in our infrastructure, and the tools are adapting to these changes.

Original Title: Higress Has Supported the New Gateway API and Its AI Inference Extension
Publication Date: Feb 13, 2026
Source: Alibaba Cloud (www.alibabacloud.com), the cloud and AI division of Alibaba, providing infrastructure and AI services for businesses.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text — Claude Sonnet 4.5 (Anthropic). The neural network studies the original material and generates a coherent text.

2. Translation into English — Gemini 2.5 Pro (Google DeepMind).

3. Text Review and Editing — Gemini 2.5 Flash (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.

4. Preparing the Illustration Description — DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.

5. Creating the Illustration — FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.

