AI Proxy
Function Description
The AI Proxy
plugin implements AI proxy functionality based on the OpenAI API contract. It currently supports AI service providers such as OpenAI, Azure OpenAI, Moonshot, and Qwen.
Note:
When the request path suffix matches
/v1/chat/completions
, it corresponds to text-to-text scenarios. The request body will be parsed using OpenAI’s text-to-text protocol and then converted to the corresponding LLM vendor’s text-to-text protocol.
When the request path suffix matches
/v1/embeddings
, it corresponds to text vector scenarios. The request body will be parsed using OpenAI’s text vector protocol and then converted to the corresponding LLM vendor’s text vector protocol.
Configuration Fields
Basic Configuration
Name | Data Type | Requirement | Default | Description |
---|---|---|---|---|
provider | object | Required | - | Configures information for the target AI service provider |
Details for the provider
configuration fields:
Name | Data Type | Requirement | Default | Description |
---|---|---|---|---|
type | string | Required | - | Name of the AI service provider |
apiTokens | array of string | Optional | - | Tokens used for authentication when accessing AI services. If multiple tokens are configured, the plugin randomly selects one for each request. Some service providers only support configuring a single token. |
timeout | number | Optional | - | Timeout for accessing AI services, in milliseconds. The default value is 120000, which equals 2 minutes. |
modelMapping | map of string | Optional | - | Mapping table for AI models, used to map model names in requests to names supported by the service provider. 1. Supports prefix matching. For example, “gpt-3-” matches all model names starting with “gpt-3-”; 2. Supports using "" as a key for a general fallback mapping; 3. If the mapped target name is an empty string "", the original model name is preserved. |
protocol | string | Optional | - | API contract provided by the plugin. Currently supports the following values: openai (default, uses OpenAI’s interface contract), original (uses the raw interface contract of the target service provider) |
context | object | Optional | - | Configuration for AI conversation context information |
customSettings | array of customSetting | Optional | - | Specifies overrides or fills parameters for AI requests |
Details for the context
configuration fields:
Name | Data Type | Requirement | Default | Description |
---|---|---|---|---|
fileUrl | string | Required | - | File URL to save AI conversation context. Only supports file content of plain text type |
serviceName | string | Required | - | Full name of the Higress backend service corresponding to the URL |
servicePort | number | Required | - | Port for accessing the Higress backend service corresponding to the URL |
Details for the customSettings
configuration fields:
Name | Data Type | Requirement | Default | Description |
---|---|---|---|---|
name | string | Required | - | Name of the parameter to set, e.g., max_tokens |
value | string/int/float/bool | Required | - | Value of the parameter to set, e.g., 0 |
mode | string | Optional | ”auto” | Mode for setting the parameter, can be set to “auto” or “raw”; if “auto”, the parameter name will be automatically rewritten based on the protocol; if “raw”, no rewriting or restriction checks will be applied |
overwrite | bool | Optional | true | If false, the parameter is only filled if the user has not set it; otherwise, it directly overrides the user’s existing parameter settings |
The custom-setting
adheres to the following table, replacing the corresponding field based on name
and protocol. Users need to fill in values from the settingName
column that exists in the table. For instance, if a user sets name
to max_tokens
, in the openai protocol, it replaces max_tokens
; for gemini, it replaces maxOutputTokens
. "none"
indicates that the protocol does not support this parameter. If name
is not in this table or the corresponding protocol does not support the parameter, and “raw” mode is not set, the configuration will not take effect.
settingName | openai | baidu | spark | qwen | gemini | hunyuan | claude | minimax |
---|---|---|---|---|---|---|---|---|
max_tokens | max_tokens | max_output_tokens | max_tokens | max_tokens | maxOutputTokens | none | max_tokens | tokens_to_generate |
temperature | temperature | temperature | temperature | temperature | temperature | Temperature | temperature | temperature |
top_p | top_p | top_p | none | top_p | topP | TopP | top_p | top_p |
top_k | none | none | top_k | none | topK | none | top_k | none |
seed | seed | none | none | seed | none | none | none | none |
If raw mode is enabled, custom-setting
will directly alter the JSON content using the input name
and value
, without any restrictions or modifications to the parameter names.
For most protocols, custom-setting
modifies or fills parameters at the root path of the JSON content. For the qwen
protocol, ai-proxy configures under the parameters
subpath. For the gemini
protocol, it configures under the generation_config
subpath.
Provider-Specific Configurations
OpenAI
For OpenAI, the corresponding type
is openai
. Its unique configuration fields include:
Name | Data Type | Requirement | Default | Description |
---|---|---|---|---|
openaiCustomUrl | string | Optional | - | Custom backend URL based on the OpenAI protocol, e.g., www.example.com/myai/v1/chat/completions |
responseJsonSchema | object | Optional | - | Predefined Json Schema that OpenAI responses must adhere to; note that currently only a few specific models support this usage |
Azure OpenAI
For Azure OpenAI, the corresponding type
is azure
. Its unique configuration field is:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
azureServiceUrl | string | Required | - | The URL of the Azure OpenAI service, must include the api-version query parameter. |
Note: Azure OpenAI only supports configuring one API Token.
Moonshot
For Moonshot, the corresponding type
is moonshot
. Its unique configuration field is:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
moonshotFileId | string | Optional | - | The file ID uploaded via the file interface to Moonshot, whose content will be used as context for AI conversations. Cannot be configured with the context field. |
Qwen (Tongyi Qwen)
For Qwen (Tongyi Qwen), the corresponding type
is qwen
. Its unique configuration fields are:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
qwenEnableSearch | boolean | Optional | - | Whether to enable the built-in Internet search function provided by Qwen. |
qwenFileIds | array of string | Optional | - | The file IDs uploaded via the Dashscope file interface, whose content will be used as context for AI conversations. Cannot be configured with the context field. |
Baichuan AI
For Baichuan AI, the corresponding type
is baichuan
. It has no unique configuration fields.
Yi (Zero One Universe)
For Yi (Zero One Universe), the corresponding type
is yi
. It has no unique configuration fields.
Zhipu AI
For Zhipu AI, the corresponding type
is zhipuai
. It has no unique configuration fields.
DeepSeek
For DeepSeek, the corresponding type
is deepseek
. It has no unique configuration fields.
Groq
For Groq, the corresponding type
is groq
. It has no unique configuration fields.
ERNIE Bot
For ERNIE Bot, the corresponding type
is baidu
. It has no unique configuration fields.
360 Brain
For 360 Brain, the corresponding type
is ai360
. It has no unique configuration fields.
Mistral
For Mistral, the corresponding type
is mistral
. It has no unique configuration fields.
Minimax
For Minimax, the corresponding type
is minimax
. Its unique configuration field is:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
minimaxGroupId | string | Required when using models abab6.5-chat , abab6.5s-chat , abab5.5s-chat , abab5.5-chat | - | When using models abab6.5-chat , abab6.5s-chat , abab5.5s-chat , abab5.5-chat , Minimax uses ChatCompletion Pro and requires setting the groupID. |
Anthropic Claude
For Anthropic Claude, the corresponding type
is claude
. Its unique configuration field is:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
claudeVersion | string | Optional | - | The version of the Claude service’s API, default is 2023-06-01. |
Ollama
For Ollama, the corresponding type
is ollama
. Its unique configuration field is:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
ollamaServerHost | string | Required | - | The host address of the Ollama server. |
ollamaServerPort | number | Required | - | The port number of the Ollama server, defaults to 11434. |
Hunyuan
For Hunyuan, the corresponding type
is hunyuan
. Its unique configuration fields are:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
hunyuanAuthId | string | Required | - | Hunyuan authentication ID for version 3 authentication. |
hunyuanAuthKey | string | Required | - | Hunyuan authentication key for version 3 authentication. |
Stepfun
For Stepfun, the corresponding type
is stepfun
. It has no unique configuration fields.
Cloudflare Workers AI
For Cloudflare Workers AI, the corresponding type
is cloudflare
. Its unique configuration field is:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
cloudflareAccountId | string | Required | - | Cloudflare Account ID. |
Spark
For Spark, the corresponding type
is spark
. It has no unique configuration fields.
The apiTokens
field value for Xunfei Spark (Xunfei Star) is APIKey:APISecret
. That is, enter your own APIKey and APISecret, separated by :
.
Gemini
For Gemini, the corresponding type
is gemini
. Its unique configuration field is:
Name | Data Type | Filling Requirements | Default Value | Description |
---|---|---|---|---|
geminiSafetySetting | map of string | Optional | - | Gemini AI content filtering and safety level settings. Refer to Safety settings. |
DeepL
For DeepL, the corresponding type
is deepl
. Its unique configuration field is:
Name | Data Type | Requirement | Default | Description |
---|---|---|---|---|
targetLang | string | Required | - | The target language required by the DeepL translation service |
Usage Examples
Using OpenAI Protocol Proxy for Azure OpenAI Service
Using the basic Azure OpenAI service without configuring any context.
Configuration Information
Request Example
Response Example
Using OpenAI Protocol Proxy for Qwen Service
Using Qwen service and configuring the mapping relationship between OpenAI large models and Qwen models.
Configuration Information
AI Conversation Request Example
URL: http://your-domain/v1/chat/completions
Request Example:
Response Example:
Multimodal Model API Request Example (Applicable to qwen-vl-plus
and qwen-vl-max
Models)
URL: http://your-domain/v1/chat/completions
Request Example:
Response Example:
Text Embedding Request Example
URL: http://your-domain/v1/embeddings
Request Example:
Response Example:
Using Qwen Service with Pure Text Context Information
Using Qwen service while configuring pure text context information.
Configuration Information
Request Example
Response Example
Using Qwen Service with Native File Context
Uploading files to Qwen in advance to use them as context when utilizing its AI service.
Configuration Information
Request Example
Response Example
Utilizing Moonshot with its Native File Context
Upload files to Moonshot in advance and use its AI services based on file content.
Configuration Information
Example Request
Example Response
Using OpenAI Protocol Proxy for Groq Service
Configuration Information
Example Request
Example Response
Using OpenAI Protocol Proxy for Claude Service
Configuration Information
Example Request
Example Response
Using OpenAI Protocol Proxy for Hunyuan Service
Configuration Information
Example Request
Request script:
Example Response
Using OpenAI Protocol Proxy for ERNIE Bot Service
Configuration Information
Request Example
Response Example
Using OpenAI Protocol Proxy for MiniMax Service
Configuration Information
Request Example
Response Example
Using OpenAI Protocol Proxy for 360 Brain Services
Configuration Information
Request Example
Response Example
Text Embedding Request Example
URL: http://your-domain/v1/embeddings
Request Example
Response Example
Using OpenAI Protocol Proxy for Cloudflare Workers AI Service
Configuration Information
Request Example
Response Example
Using OpenAI Protocol Proxy for Spark Service
Configuration Information
Request Example
Response Example
Utilizing OpenAI Protocol Proxy for Gemini Services
Configuration Information
Request Example
Response Example
Utilizing OpenAI Protocol Proxy for DeepL Text Translation Service
Configuration Information
Request Example
Here, model
denotes the service tier of DeepL and can only be either Free
or Pro
. The content
field contains the text to be translated; within role: system
, content
may include context that influences the translation but isn’t translated itself. For instance, when translating product names, including a product description as context could enhance translation quality.
Response Example
Full Configuration Example
Kubernetes Example
Here’s a full plugin configuration example using the OpenAI protocol proxy for Groq services.
Access Example:
Docker-Compose Example
docker-compose.yml
configuration file:
envoy.yaml
configuration file:
Access Example: