
AI Proxy

Function Description

The AI Proxy plugin implements AI proxy functionality based on the OpenAI API contract. It currently supports AI service providers such as OpenAI, Azure OpenAI, Moonshot, and Qwen.

Note: When the request path suffix matches /v1/chat/completions, corresponding to text generation scenarios, the request body will be parsed using OpenAI’s text generation protocol and then converted to the corresponding LLM vendor’s text generation protocol.

When the request path suffix matches /v1/embeddings, corresponding to text vector scenarios, the request body will be parsed using OpenAI’s text vector protocol and then converted to the corresponding LLM vendor’s text vector protocol.

Running Attributes

Plugin execution phase: Default phase
Plugin execution priority: 100

Configuration Fields

Basic Configuration

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| provider | object | Required | - | Information about the target AI service provider |

The description of fields in provider is as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| type | string | Required | - | Name of the AI service provider |
| apiTokens | array of string | Optional | - | Tokens for authentication when accessing the AI service. If multiple tokens are provided, the plugin randomly chooses one for each request. Some service providers only support a single token. |
| timeout | number | Optional | - | Timeout for accessing the AI service, in milliseconds. The default value is 120000, which is 2 minutes. |
| modelMapping | map of string | Optional | - | AI model mapping table that maps model names in requests to model names supported by the service provider. 1. Supports prefix matching. For example, "gpt-3-*" matches all models whose names start with "gpt-3-"; 2. Supports using "*" as a key to configure a general fallback mapping; 3. If the target name in the mapping is an empty string "", the original model name is retained. |
| protocol | string | Optional | - | The API contract exposed by the plugin. Currently supports the following values: openai (default, uses OpenAI's interface contract), original (uses the original interface contract of the target service provider) |
| context | object | Optional | - | Configuration for AI conversation context information |
| customSettings | array of customSetting | Optional | - | Parameters to override or fill in for AI requests |
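The three modelMapping rules can be combined in a single provider block. The following is a minimal sketch, not a recommended setup: the model names on both sides and the timeout value are illustrative assumptions chosen to show one rule each.

```yaml
provider:
  type: qwen
  apiTokens:
    - "YOUR_QWEN_API_TOKEN"
  # Illustrative value: 60 seconds instead of the 120000 ms default.
  timeout: 60000
  modelMapping:
    # Rule 1: prefix match, applies to any requested model starting with "gpt-4-".
    "gpt-4-*": "qwen-max"
    # Rule 3: empty target, requests that already name a qwen- model keep that name.
    "qwen-*": ""
    # Rule 2: general fallback for every other model name.
    "*": "qwen-turbo"
```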

The description of fields in context is as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| fileUrl | string | Required | - | URL of the file that stores AI conversation context. Only plain text file content is supported. |
| serviceName | string | Required | - | The complete name of the Higress backend service corresponding to the URL. |
| servicePort | number | Required | - | The access port of the Higress backend service corresponding to the URL. |
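As a sketch of how these three fields fit together, the block below attaches a plain text context file to a provider. The file URL, service name, and port are hypothetical placeholders and need to be replaced with a service that is actually registered in Higress.

```yaml
provider:
  type: qwen
  apiTokens:
    - "YOUR_QWEN_API_TOKEN"
  modelMapping:
    "*": "qwen-turbo"
  context:
    # Hypothetical URL of a plain text file holding the conversation context.
    fileUrl: "http://file.example.internal/ai/context.txt"
    # Hypothetical Higress backend service that serves the URL above.
    serviceName: "YOUR_FILE_SERVICE_NAME"
    servicePort: 80
```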

The description of fields in customSettings is as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| name | string | Required | - | Name of the parameter to set, e.g., max_tokens |
| value | string/int/float/bool | Required | - | Value for the parameter to set, e.g., 0 |
| mode | string | Optional | "auto" | Mode for parameter settings, can be set to "auto" or "raw". If "auto", parameter names will be automatically rewritten based on the protocol; if "raw", no rewriting or validation checks will be done. |
| overwrite | bool | Optional | true | If false, the parameter will only be filled if the user hasn't set it; otherwise, it will overwrite the user's original parameter settings. |

Custom settings will follow the table below to replace corresponding fields based on name and protocol. Users need to fill in values that exist in the settingName column of the table. For example, if the user sets name to max_tokens, it will be replaced by max_tokens in the OpenAI protocol, and by maxOutputTokens in Gemini. none indicates that the protocol does not support this parameter. If name is not in this table or the corresponding protocol does not support this parameter, and if raw mode is not set, the configuration will not take effect.

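| settingName | openai | baidu | spark | qwen | gemini | hunyuan | claude | minimax |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| max_tokens | max_tokens | max_output_tokens | max_tokens | max_tokens | maxOutputTokens | none | max_tokens | tokens_to_generate |
| temperature | temperature | temperature | temperature | temperature | temperature | Temperature | temperature | temperature |
| top_p | top_p | top_p | none | top_p | topP | TopP | top_p | top_p |
| top_k | none | none | top_k | none | topK | none | top_k | none |
| seed | seed | none | none | seed | none | none | none | none |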

If raw mode is enabled, custom settings will directly use the input name and value to change the JSON content of the request without any restrictions or modifications to the parameter names.

For most protocols, custom settings will modify or fill parameters at the root path of the JSON content. For the qwen protocol, the ai-proxy will configure under the parameters sub-path in JSON. For the gemini protocol, it will be configured under the generation_config sub-path.
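A customSettings list might look like the following sketch. It assumes a Gemini provider so that the rewriting described above is visible; the numeric values are arbitrary illustrations, not recommended defaults.

```yaml
provider:
  type: gemini
  apiTokens:
    - "YOUR_GEMINI_API_TOKEN"
  modelMapping:
    "*": "gemini-pro"
  customSettings:
    # mode defaults to "auto": max_tokens is rewritten to maxOutputTokens for gemini.
    - name: "max_tokens"
      value: 1024
      # Fill the value only when the caller did not set it.
      overwrite: false
    # top_k is rewritten to topK for gemini; overwrite defaults to true,
    # so this replaces whatever the caller sent.
    - name: "top_k"
      value: 40
      mode: "auto"
```

With a provider whose column shows none for a given setting (for example seed with gemini), the same entry in auto mode would not take effect.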

Provider-Specific Configuration

OpenAI

The type corresponding to OpenAI is openai. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| openaiCustomUrl | string | Optional | - | Custom backend URL based on OpenAI protocol, e.g., www.example.com/myai/v1/chat/completions |
| responseJsonSchema | object | Optional | - | Predefined JSON Schema that OpenAI responses must satisfy, currently only supported by specific models. |

Azure OpenAI

The type corresponding to Azure OpenAI is azure. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| azureServiceUrl | string | Required | - | URL of Azure OpenAI service, must include api-version query parameter. |

Note: Azure OpenAI only supports the configuration of one API Token.

Moonshot

The type corresponding to Moonshot is moonshot. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| moonshotFileId | string | Optional | - | ID of a file uploaded to Moonshot via the file interface; its content will be used as the context for AI conversation. Cannot be configured simultaneously with the context field. |

Qwen

The type corresponding to Qwen is qwen. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| qwenEnableSearch | boolean | Optional | - | Whether to enable the built-in internet search functionality of Qwen. |
| qwenFileIds | array of string | Optional | - | IDs of files uploaded to Dashscope via the file interface; their contents will be used as the context for AI conversation. Cannot be configured simultaneously with the context field. |
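A minimal sketch of the Qwen-specific fields, assuming internet search is wanted and no context field is configured. The file ID is a placeholder for an ID returned by the Dashscope file interface.

```yaml
provider:
  type: qwen
  apiTokens:
    - "YOUR_QWEN_API_TOKEN"
  # Turn on Qwen's built-in internet search.
  qwenEnableSearch: true
  # Placeholder file ID; do not combine qwenFileIds with the context field.
  qwenFileIds:
    - "YOUR_DASHSCOPE_FILE_ID"
  modelMapping:
    "*": "qwen-turbo"
```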

Baichuan AI

The type corresponding to Baichuan AI is baichuan. It has no specific configuration fields.

Yi

The type corresponding to Yi is yi. It has no specific configuration fields.

Zhipu AI

The type corresponding to Zhipu AI is zhipuai. It has no specific configuration fields.

DeepSeek

The type corresponding to DeepSeek is deepseek. It has no specific configuration fields.

Groq

The type corresponding to Groq is groq. It has no specific configuration fields.

Baidu

The type corresponding to Baidu is baidu. It has no specific configuration fields.

AI360

The type corresponding to AI360 is ai360. It has no specific configuration fields.

Mistral

The type corresponding to Mistral is mistral. It has no specific configuration fields.

MiniMax

The type corresponding to MiniMax is minimax. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| minimaxGroupId | string | Required when using abab6.5-chat, abab6.5s-chat, abab5.5s-chat, or abab5.5-chat models | - | When using these models, ChatCompletion Pro will be used, and groupID needs to be set. |

Anthropic Claude

The type corresponding to Anthropic Claude is claude. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| claudeVersion | string | Optional | - | The API version for Claude service, defaults to 2023-06-01 |

Ollama

The type corresponding to Ollama is ollama. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| ollamaServerHost | string | Required | - | Host address for the Ollama server |
| ollamaServerPort | number | Required | - | Port number for the Ollama server, defaults to 11434 |
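For illustration, an Ollama provider block built from the two fields above could look like this. The host is a placeholder, and "llama3" is only an assumed model name; use whichever model is actually available on the Ollama server.

```yaml
provider:
  type: ollama
  # Placeholder host of the Ollama server.
  ollamaServerHost: "YOUR_OLLAMA_SERVER_HOST"
  # 11434 is the port named in the table above.
  ollamaServerPort: 11434
  modelMapping:
    # Assumed model name, for illustration only.
    "*": "llama3"
```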

Hunyuan

The type corresponding to Hunyuan is hunyuan. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| hunyuanAuthId | string | Required | - | ID used for Hunyuan authentication with version v3 |
| hunyuanAuthKey | string | Required | - | Key used for Hunyuan authentication with version v3 |

Stepfun

The type corresponding to Stepfun is stepfun. It has no specific configuration fields.

Cloudflare Workers AI

The type corresponding to Cloudflare Workers AI is cloudflare. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| cloudflareAccountId | string | Required | - | Cloudflare Account ID |

Spark

The type corresponding to Spark is spark. It has no specific configuration fields.

The apiTokens field value for iFlytek’s Spark cognitive large model is APIKey:APISecret. That is, fill in your own APIKey and APISecret, separated by :.

Gemini

The type corresponding to Gemini is gemini. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| geminiSafetySetting | map of string | Optional | - | Gemini AI content filtering and safety level settings. Refer to Safety settings. |

DeepL

The type corresponding to DeepL is deepl. Its specific configuration fields are as follows:

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| targetLang | string | Required | - | Target language required by DeepL translation service. |

Cohere

The type corresponding to Cohere is cohere. It has no specific configuration fields.

Usage Examples

Using OpenAI Protocol to Proxy Azure OpenAI Service

Using the most basic Azure OpenAI service with no context configured.

Configuration Information

provider:
  type: azure
  apiTokens:
    - "YOUR_AZURE_OPENAI_API_TOKEN"
  azureServiceUrl: "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-02-15-preview"

Using OpenAI Protocol to Proxy Qwen Service

Using Qwen service with a model mapping from OpenAI large models to Qwen.

Configuration Information

provider:
  type: qwen
  apiTokens:
    - "YOUR_QWEN_API_TOKEN"
  modelMapping:
    'gpt-3': "qwen-turbo"
    'gpt-35-turbo': "qwen-plus"
    'gpt-4-turbo': "qwen-max"
    'gpt-4-*': "qwen-max"
    'gpt-4o': "qwen-vl-plus"
    'text-embedding-v1': 'text-embedding-v1'
    '*': "qwen-turbo"

Using the original Protocol to Proxy Alibaba Cloud Bailian Agent Applications

Configuration Information

provider:
  type: qwen
  apiTokens:
    - "YOUR_DASHSCOPE_API_TOKEN"
  protocol: original

Using OpenAI Protocol to Proxy Doubao Large Model Service

Configuration Information

provider:
  type: doubao
  apiTokens:
    - "YOUR_DOUBAO_API_KEY"
  modelMapping:
    '*': YOUR_DOUBAO_ENDPOINT
  timeout: 1200000

Using Moonshot with its native file context

Pre-upload a file to Moonshot to use its content as context for its AI service.

Configuration Information

provider:
  type: moonshot
  apiTokens:
    - "YOUR_MOONSHOT_API_TOKEN"
  moonshotFileId: "YOUR_MOONSHOT_FILE_ID"
  modelMapping:
    '*': "moonshot-v1-32k"

Using OpenAI Protocol to Proxy Groq Service

Configuration Information

provider:
  type: groq
  apiTokens:
    - "YOUR_GROQ_API_TOKEN"

Using OpenAI Protocol to Proxy Claude Service

Configuration Information

provider:
  type: claude
  apiTokens:
    - "YOUR_CLAUDE_API_TOKEN"
  claudeVersion: "2023-06-01"

Using OpenAI Protocol to Proxy Hunyuan Service

Configuration Information

provider:
  type: "hunyuan"
  hunyuanAuthKey: "<YOUR AUTH KEY>"
  apiTokens:
    - ""
  hunyuanAuthId: "<YOUR AUTH ID>"
  timeout: 1200000
  modelMapping:
    "*": "hunyuan-lite"

Using OpenAI Protocol to Proxy Baidu Wenxin Service

Configuration Information

provider:
  type: baidu
  apiTokens:
    - "YOUR_BAIDU_API_TOKEN"
  modelMapping:
    'gpt-3': "ERNIE-4.0"
    '*': "ERNIE-4.0"

Using OpenAI Protocol to Proxy MiniMax Service

Configuration Information

provider:
  type: minimax
  apiTokens:
    - "YOUR_MINIMAX_API_TOKEN"
  modelMapping:
    "gpt-3": "abab6.5g-chat"
    "gpt-4": "abab6.5-chat"
    "*": "abab6.5g-chat"
  minimaxGroupId: "YOUR_MINIMAX_GROUP_ID"

Using OpenAI Protocol to Proxy AI360 Service

Configuration Information

provider:
  type: ai360
  apiTokens:
    - "YOUR_AI360_API_TOKEN"
  modelMapping:
    "gpt-4o": "360gpt-turbo-responsibility-8k"
    "gpt-4": "360gpt2-pro"
    "gpt-3.5": "360gpt-turbo"
    "text-embedding-3-small": "embedding_s1_v1.2"
    "*": "360gpt-pro"

Using OpenAI Protocol to Proxy Cloudflare Workers AI Service

Configuration Information

provider:
  type: cloudflare
  apiTokens:
    - "YOUR_WORKERS_AI_API_TOKEN"
  cloudflareAccountId: "YOUR_CLOUDFLARE_ACCOUNT_ID"
  modelMapping:
    "*": "@cf/meta/llama-3-8b-instruct"

Using OpenAI Protocol to Proxy Spark Service

Configuration Information

provider:
  type: spark
  apiTokens:
    - "APIKey:APISecret"
  modelMapping:
    "gpt-4o": "generalv3.5"
    "gpt-4": "generalv3"
    "*": "general"

Using OpenAI Protocol to Proxy Gemini Service

Configuration Information

provider:
  type: gemini
  apiTokens:
    - "YOUR_GEMINI_API_TOKEN"
  modelMapping:
    "*": "gemini-pro"
  geminiSafetySetting:
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE"
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE"
    "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE"
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE"

Using OpenAI Protocol to Proxy DeepL Text Translation Service

Configuration Information

provider:
  type: deepl
  apiTokens:
    - "YOUR_DEEPL_API_TOKEN"
  targetLang: "ZH"

Request Example

Here, model indicates the type of DeepL service and can only be Free or Pro. Each content field holds a text to be translated. The content of the role: system message can carry context that may influence the translation but is not itself translated. For example, when translating product names, product descriptions can be passed as context, and this additional context may improve the quality of the translation.

{
  "model": "Free",
  "messages": [
    {
      "role": "system",
      "content": "money"
    },
    {
      "content": "sit by the bank"
    },
    {
      "content": "a bank in China"
    }
  ]
}

Response Example

{
  "choices": [
    {
      "index": 0,
      "message": { "name": "EN", "role": "assistant", "content": "坐庄" }
    },
    {
      "index": 1,
      "message": { "name": "EN", "role": "assistant", "content": "中国银行" }
    }
  ],
  "created": 1722747752,
  "model": "Free",
  "object": "chat.completion",
  "usage": {}
}