QuickStart of Higress AI Gateway
Release Time: 2025-03-24
Higress is a cloud-native API gateway with a core based on Istio and Envoy. It integrates traffic gateways, microservice gateways, security gateways, and AI gateways into one solution. It supports writing Wasm plugins in Go/Rust/JS, provides dozens of ready-to-use general plugins, and offers an out-of-the-box console.
Among these, the Higress AI Gateway integrates the protocols of AI service providers such as OpenAI, DeepSeek, and Qwen, and supports functional plugins such as token rate limiting, consumer authentication, WAF protection, and semantic caching, helping developers and enterprises quickly build reliable AI services.
This guide provides a quick way to deploy the Higress AI Gateway using Docker. For other deployment methods (such as standard Kubernetes clusters or local Kubernetes clusters), please refer to the Quick Start.
Install Higress AI Gateway
In a local terminal, execute the following command:
curl -sS https://higress.cn/ai-gateway/install.sh | bash
Follow the prompts to enter the API keys for the model providers; you can also press Enter to skip and configure them later in the console.
If the default ports are occupied and you need to use different ones, download the installer script with wget https://higress.cn/ai-gateway/install.sh, modify DEFAULT_GATEWAY_HTTP_PORT/DEFAULT_GATEWAY_HTTPS_PORT/DEFAULT_CONSOLE_PORT, and then run the script with bash, as sketched below.
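For example (a minimal sketch: the variable names come from the install script itself, while the sed patterns and the new port numbers 18080/18443/18001 are illustrative and assume the script assigns each variable on a single line):

# Download the installer instead of piping it straight to bash
wget https://higress.cn/ai-gateway/install.sh

# Point the defaults at free ports (18080/18443/18001 are example values)
sed -i 's/^DEFAULT_GATEWAY_HTTP_PORT=.*/DEFAULT_GATEWAY_HTTP_PORT=18080/' install.sh
sed -i 's/^DEFAULT_GATEWAY_HTTPS_PORT=.*/DEFAULT_GATEWAY_HTTPS_PORT=18443/' install.sh
sed -i 's/^DEFAULT_CONSOLE_PORT=.*/DEFAULT_CONSOLE_PORT=18001/' install.sh

# Run the modified installer
bash install.sh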
After the deployment is completed, the installer prints a confirmation with the addresses of the gateway and the console.
Console Configuration
Access the Higress console via a browser at http://localhost:8001/. The first login requires setting up an administrator account and password.
In the LLM Provider Management panel, you can configure API keys for the integrated providers. Currently supported providers include Alibaba Cloud, DeepSeek, Azure OpenAI, OpenAI, Doubao, etc.
Each AI service provider can independently configure a token failover strategy: when the number of abnormal responses from a particular authentication token exceeds the threshold, Higress suspends requests using that token until subsequent health-check requests receive a sufficient number of normal responses.
In the AI Route Config panel, you can configure the domain, model match types, fallback configuration, and allowed consumers for each route. Through the Strategy settings, you can also configure different authentication methods, rate-limiting strategies, and AI features such as RAG, prompt templates, and semantic caching. A request against a route with consumer authentication enabled is sketched below.
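If a route only allows authenticated consumers, each request must carry the consumer's credential. A minimal sketch, assuming a key-auth setup that reads the key from the Authorization header as a Bearer token (the header scheme and the placeholder key are assumptions; match them to your console configuration):

curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your-consumer-key>' \
  -d '{"model": "qwen-max", "messages": [{"role": "user", "content": "Who are you?"}]}'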
Debugging
Open a terminal and send a request with the following command (if the HTTP service is not listening on port 8080, replace it with the port you configured):
curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen-max",
    "messages": [
      {
        "role": "user",
        "content": "Who are you?"
      }
    ]
  }'
Sample response:
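The gateway returns the model's answer in the OpenAI-compatible chat completion format, along these lines (illustrative only; the id, timestamp, token counts, and generated text will differ on every run):

{
  "id": "chatcmpl-xxxxxxxx",
  "object": "chat.completion",
  "created": 1742783000,
  "model": "qwen-max",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am Qwen, a large language model developed by Alibaba Cloud..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 26,
    "total_tokens": 38
  }
}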
Observability
In the AI Dashboard, you can observe AI requests. Observability metrics include the number of input/output tokens per second, token usage by each provider/model, etc.
These metrics let you compare the usage and latency of the models currently in use, helping developers optimize their model strategies.
If you encounter any issues during deployment, feel free to open a Higress GitHub Issue.
If you are interested in future updates to Higress or would like to provide feedback, please star the Higress GitHub repo.