QuickStart of Higress AI Gateway
Release Time: 2025-03-24
Higress is a cloud-native API gateway with a core based on Istio and Envoy. It integrates traffic gateways, microservice gateways, security gateways, and AI gateways into one solution. It supports writing Wasm plugins in Go/Rust/JS, provides dozens of ready-to-use general plugins, and offers an out-of-the-box console.
Among these, the Higress AI Gateway integrates the protocols of AI service providers such as OpenAI, DeepSeek, and Qwen, and supports functional plugins such as token rate limiting, consumer authentication, WAF protection, and semantic caching, helping developers and enterprises quickly build reliable AI services.
This guide provides a quick way to deploy the Higress AI Gateway using Docker. For other deployment methods (such as standard Kubernetes clusters or local Kubernetes clusters), please refer to the Quick Start.
Install Higress AI Gateway
In a local terminal, execute the following command:
curl -sS https://higress.cn/ai-gateway/install.sh | bash
Follow the prompts to enter the API keys for the model providers; you can also press Enter to skip and configure them later in the console.
If the default ports are occupied and you need to use different ones, download the installer script with wget https://higress.cn/ai-gateway/install.sh, modify DEFAULT_GATEWAY_HTTP_PORT/DEFAULT_GATEWAY_HTTPS_PORT/DEFAULT_CONSOLE_PORT, and then run the script with bash, as sketched below.
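For example (a minimal sketch: the variable names come from the install script itself, while the sed patterns and the new port numbers 18080/18443/18001 are illustrative and assume the script assigns each variable on a single line):

# Download the installer instead of piping it straight to bash
wget https://higress.cn/ai-gateway/install.sh

# Point the defaults at free ports (18080/18443/18001 are example values)
sed -i 's/^DEFAULT_GATEWAY_HTTP_PORT=.*/DEFAULT_GATEWAY_HTTP_PORT=18080/' install.sh
sed -i 's/^DEFAULT_GATEWAY_HTTPS_PORT=.*/DEFAULT_GATEWAY_HTTPS_PORT=18443/' install.sh
sed -i 's/^DEFAULT_CONSOLE_PORT=.*/DEFAULT_CONSOLE_PORT=18001/' install.sh

# Run the modified installer
bash install.sh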
After the deployment is completed, the installer prints a confirmation with the addresses of the gateway and the console.
Console Configuration
Access the Higress console via a browser at http://localhost:8001/. The first login requires setting up an administrator account and password.
In the LLM Provider Management panel, you can configure API keys for the integrated providers. Currently supported providers include Alibaba Cloud, DeepSeek, Azure OpenAI, OpenAI, Doubao, etc.
Each AI service provider can independently configure a token failover strategy: when the number of abnormal responses from a particular authentication token exceeds the threshold, Higress suspends requests using that token until subsequent health-check requests receive a sufficient number of normal responses.
In the AI Route Config panel, you can configure the domain, model match types, fallback configuration, and allowed consumers for each route. Through the Strategy settings, you can also configure different authentication methods, rate-limiting strategies, and AI features such as RAG, prompt templates, and semantic caching. A request against a route with consumer authentication enabled is sketched below.
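If a route only allows authenticated consumers, each request must carry the consumer's credential. A minimal sketch, assuming a key-auth setup that reads the key from the Authorization header as a Bearer token (the header scheme and the placeholder key are assumptions; match them to your console configuration):

curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your-consumer-key>' \
  -d '{"model": "qwen-max", "messages": [{"role": "user", "content": "Who are you?"}]}'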
Debugging
Open a terminal and send a request with the following command (if the HTTP service is not listening on port 8080, replace it with the port you configured):
curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen-max",
    "messages": [
      {
        "role": "user",
        "content": "Who are you?"
      }
    ]
  }'
Sample response:
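The gateway returns the model's answer in the OpenAI-compatible chat completion format, along these lines (illustrative only; the id, timestamp, token counts, and generated text will differ on every run):

{
  "id": "chatcmpl-xxxxxxxx",
  "object": "chat.completion",
  "created": 1742783000,
  "model": "qwen-max",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am Qwen, a large language model developed by Alibaba Cloud..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 26,
    "total_tokens": 38
  }
}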
Observability
In the AI Dashboard, you can observe AI requests. Observability metrics include the number of input/output tokens per second, token usage by each provider/model, etc.
These metrics let you compare the usage and latency of the models currently in use, helping developers optimize their model strategies.
If you encounter any issues during deployment, feel free to open a Higress GitHub Issue.
If you are interested in future updates to Higress or would like to provide feedback, please star the Higress GitHub repo.