Higress is a cloud-native gateway open-sourced by Alibaba, designed to manage traffic between services and applications. It acts as a dispatcher, receiving user requests and routing them to their destinations. Recently, its developers released an update that introduces two key features: support for the Kubernetes Gateway API standard and specialized capabilities for working with AI models.
What's Changed in Traffic Management?
Previously, such gateways were configured with the Kubernetes Ingress standard. Ingress has been around for a long time and still works, but its capabilities have proven insufficient for modern tasks. The Gateway API is a more flexible successor that describes the request-handling process more precisely.
The main difference lies in the separation of roles: a cluster operator configures the gateway's shared infrastructure, while application teams define routes for their own services. This is convenient for large organizations where different people are responsible for different parts of the system.
Higress now fully supports this standard. This means that if you are already using the Gateway API with other tools, migrating to Higress will be easier, as the configuration remains familiar and requires no retraining.
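The role split maps onto two separate resource kinds: a Gateway, owned by the operator, and HTTPRoutes that application teams attach to it. A minimal sketch, written here as Python dicts mirroring the YAML manifests; the resource names (`prod-gateway`, `app-route`, `app-service`) and the `higress` gateway class are illustrative assumptions, not taken from the release notes.

```python
# Sketch of the Gateway API role split, as Python dicts mirroring the
# YAML manifests. All names are hypothetical.

# Managed by the cluster operator: the shared entry point.
gateway = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "Gateway",
    "metadata": {"name": "prod-gateway"},
    "spec": {
        "gatewayClassName": "higress",
        "listeners": [{"name": "http", "port": 80, "protocol": "HTTP"}],
    },
}

# Managed by an application team: routes for one service only.
http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "app-route"},
    "spec": {
        # Attaches to the operator's Gateway without touching its config.
        "parentRefs": [{"name": "prod-gateway"}],
        "rules": [{
            "matches": [{"path": {"type": "PathPrefix", "value": "/api"}}],
            "backendRefs": [{"name": "app-service", "port": 8080}],
        }],
    },
}
```

Because the route only references the Gateway by name, the application team can change its rules without needing permissions on the shared infrastructure object.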
Why Does a Gateway Need AI Features?
The second part of the update relates to artificial intelligence. When working with large language models or other AI services, it is often necessary to send requests to various models and receive responses. This presents several challenges.
First, different providers use different request formats: one model might expect data in one structure, while another needs it in a different one. Second, load management is crucial: if one model is overloaded, a request can be redirected to another. Third, speed is important: users shouldn't have to wait too long.
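The first challenge, format translation, is the kind of adaptation a gateway performs so the application can speak one dialect. A minimal sketch with two invented provider formats (both `to_provider_a` and `to_provider_b` and their field names are assumptions for illustration):

```python
# Sketch: translating one unified chat request into two hypothetical
# provider-specific payloads.

def to_provider_a(prompt: str) -> dict:
    # Provider A expects an OpenAI-style message list (assumed format).
    return {"model": "model-a", "messages": [{"role": "user", "content": prompt}]}

def to_provider_b(prompt: str) -> dict:
    # Provider B expects a flat prompt field (assumed format).
    return {"model_id": "model-b", "input": {"prompt": prompt}}

unified = "Summarize this article."
payload_a = to_provider_a(unified)
payload_b = to_provider_b(unified)
print(payload_a["messages"][0]["content"])  # Summarize this article.
print(payload_b["input"]["prompt"])         # Summarize this article.
```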
Higress now includes an extension for AI inference – the process of getting responses from models. This extension allows you to:
- Route requests to different models through a single interface, without rewriting application code for each provider.
- Automatically distribute the load across multiple models.
- Cache responses, so that identical requests can be processed faster and more cheaply.
- Control token usage – the units used to measure the volume of requests to language models.
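The last item, token control, amounts to metering each consumer against a budget. A minimal sketch under simplifying assumptions: the `TokenBudget` class is hypothetical, and tokens are approximated by whitespace splitting, whereas a real gateway would use the model's tokenizer.

```python
# Sketch of per-consumer token accounting, as a gateway might enforce it.
# Budget numbers and the whitespace token count are illustrative only.

class TokenBudget:
    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def charge(self, text: str) -> bool:
        """Charge the request against the budget; reject when exhausted."""
        tokens = len(text.split())  # crude stand-in for a real tokenizer
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True

budget = TokenBudget(limit=10)
print(budget.charge("short request please"))  # True: 3 tokens fit
print(budget.charge("a much longer request that will not fit in budget"))  # False
```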
In essence, the gateway takes on tasks that previously had to be implemented separately within each application.
How It Works in Practice
Imagine you have an application that uses several AI models. One model answers user questions, another generates text, and a third processes images. Without a gateway, you would have to write logic into your application's code for each model: where to send the request, how to process the response, and what to do if a model becomes unavailable.
With Higress, you configure the routes once at the gateway level. The application simply sends a standardized request, and the gateway itself decides which model to use, how to transform the request, and how to return the result. If one of the models is overloaded, the gateway automatically switches to another.
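The failover behavior described above boils down to trying backends in priority order and falling through when one is unavailable. A minimal sketch of that logic (model names and the `route` helper are hypothetical, not Higress APIs):

```python
# Sketch of failover routing: try models in priority order and fall
# through when one raises. Backend names are made up.

def route(prompt: str, backends: list) -> str:
    """backends: list of (name, handler) pairs; handler raises on overload."""
    for name, handler in backends:
        try:
            return handler(prompt)
        except RuntimeError:
            continue  # backend overloaded or down; try the next one
    raise RuntimeError("all backends unavailable")

def overloaded(prompt: str) -> str:
    raise RuntimeError("503 from provider")

def healthy(prompt: str) -> str:
    return f"answer to: {prompt}"

backends = [("model-a", overloaded), ("model-b", healthy)]
print(route("hello", backends))  # answer to: hello
```

The application never sees the failed first attempt; it only receives the answer from whichever backend succeeded.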
Caching also simplifies the process. If someone has already asked a similar question, the gateway can return the cached response without querying the model again. This saves both time and money, especially when working with paid APIs.
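At its simplest, that cache is a lookup keyed on a normalized prompt, so only the first occurrence reaches the paid API. A minimal exact-match sketch (real gateways can also match semantically similar prompts; `ask_model` stands in for the provider call):

```python
# Sketch of response caching keyed on a normalized prompt.

cache: dict = {}
calls = 0

def ask_model(prompt: str) -> str:
    global calls
    calls += 1  # stands in for a billable API call
    return f"response to: {prompt}"

def cached_ask(prompt: str) -> str:
    key = " ".join(prompt.lower().split())  # normalize case and spacing
    if key not in cache:
        cache[key] = ask_model(prompt)
    return cache[key]

cached_ask("What is Higress?")
cached_ask("what is  Higress?")  # served from cache, no second model call
print(calls)  # 1
```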
Who Can Benefit from This?
The Higress update is particularly relevant for teams that develop applications using AI models. If you are working with multiple providers or experimenting with different models, managing them through a gateway can significantly simplify your architecture.
It's also useful for those looking to adopt a more modern approach to traffic management. The Gateway API is actively developing and becoming an industry standard, so Higress's support helps you stay current.
For small projects using a single model with no complex routing logic, this functionality might be overkill. But if you plan to scale or are already facing challenges managing multiple AI services, it's worth taking a look at Higress.
What Questions Remain?
As with any update, some aspects will only become clear with practical use. How stable is the new functionality under heavy load? How quickly can the gateway switch between models if one becomes unavailable? How flexible is the caching configuration for different request types?
Furthermore, Higress is an Alibaba product, and its ecosystem may be more convenient for those already using the company's cloud services. For teams working with other cloud providers, alternative solutions might exist that offer better integration with their infrastructure.
Nevertheless, the fact that cloud gateways are beginning to integrate features for AI workloads signals the direction in which the industry is heading. Artificial intelligence is becoming more deeply embedded in our infrastructure, and the tools are adapting to these changes.