AI Cache

Function Description

An LLM result caching plugin. The default configuration works out of the box for caching results under the OpenAI protocol, and it supports caching both streaming and non-streaming responses.
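As a rough mental model of result caching, the following Python sketch uses a cache-aside flow with an in-memory dictionary standing in for Redis. The names `ResultCache`, `answer`, and `call_llm` are hypothetical, and the flow itself is an assumption about how such a cache typically behaves, not the plugin's actual code. The default prefix and the "0 means never expire" rule are taken from the configuration table in this document.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResultCache:
    """In-memory stand-in for Redis (hypothetical helper for illustration)."""

    def __init__(self, prefix: str = "higress-ai-cache:", ttl_seconds: int = 0):
        self.prefix = prefix
        self.ttl_seconds = ttl_seconds  # 0 means entries never expire
        self._store: Dict[str, Tuple[str, Optional[float]]] = {}

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        full_key = self.prefix + key
        entry = self._store.get(full_key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._store[full_key]  # entry has expired
            return None
        return value

    def set(self, key: str, value: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        expires_at = now + self.ttl_seconds if self.ttl_seconds > 0 else None
        self._store[self.prefix + key] = (value, expires_at)


def answer(question: str, cache: ResultCache, call_llm: Callable[[str], str]) -> str:
    """Return the cached result if present; otherwise call the model and store it."""
    cached = cache.get(question)
    if cached is not None:
        return cached
    result = call_llm(question)
    cache.set(question, result)
    return result
```

In this sketch a repeated question is served from the cache and the model is called only once.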

Runtime Properties

Plugin Execution Phase: Authentication Phase
Plugin Execution Priority: 10

Configuration Description

| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| cacheKeyFrom.requestBody | string | optional | "messages.@reverse.0.content" | Extracts a string from the request Body based on GJSON PATH syntax |
| cacheValueFrom.responseBody | string | optional | "choices.0.message.content" | Extracts a string from the response Body based on GJSON PATH syntax |
| cacheStreamValueFrom.responseBody | string | optional | "choices.0.delta.content" | Extracts a string from the streaming response Body based on GJSON PATH syntax |
| cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for the Redis cache key |
| cacheTTL | integer | optional | 0 | Cache expiration time in seconds, default value is 0, which means never expire |
| redis.serviceName | string | required | - | The complete FQDN name of the Redis service, including the service type, e.g., my-redis.dns, redis.my-ns.svc.cluster.local |
| redis.servicePort | integer | optional | 6379 | Redis service port |
| redis.timeout | integer | optional | 1000 | Timeout for requests to Redis, in milliseconds |
| redis.username | string | optional | - | Username for logging into Redis |
| redis.password | string | optional | - | Password for logging into Redis |
| returnResponseTemplate | string | optional | {"id":"from-cache","choices":[%s],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}} | Template for returning HTTP response, with %s marking the part to be replaced by cache value |
| returnStreamResponseTemplate | string | optional | data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n | Template for returning streaming HTTP response, with %s marking the part to be replaced by cache value |
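To make the %s placeholder concrete, here is a minimal Python sketch that fills the default returnStreamResponseTemplate with a cached value. The helper name `render_stream_response` is hypothetical, and the JSON escaping step is an assumption made so that the result stays valid JSON; the plugin's actual substitution and escaping behavior may differ. The \n\n sequences shown in the table are treated here as real newlines.

```python
import json

# Default streaming template, copied from the configuration table.
STREAM_TEMPLATE = (
    'data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant",'
    '"content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o",'
    '"object":"chat.completion","usage":{"prompt_tokens":0,'
    '"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n'
)


def render_stream_response(cached_value: str) -> str:
    """Substitute the cached value for %s in the streaming template."""
    # Assumption: escape quotes, backslashes and newlines so the value can sit
    # inside a JSON string literal. json.dumps adds surrounding quotes, which
    # are stripped because the template already provides them.
    escaped = json.dumps(cached_value)[1:-1]
    return STREAM_TEMPLATE % escaped


if __name__ == "__main__":
    print(render_stream_response('He said "hi"'))
```

The rendered body is a single data: event carrying the cached content, followed by the data:[DONE] terminator.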

Configuration Example

redis:
  serviceName: my-redis.dns
  timeout: 2000

Advanced Usage

The default cache key uses the GJSON PATH expression messages.@reverse.0.content, which reverses the messages array and takes the content of the first item, i.e. the content of the last message in the request.
GJSON PATH supports conditional syntax. For instance, to use the content of the last message whose role is user as the key, write: messages.@reverse.#(role=="user").content
To collect the content of all messages whose role is user into an array and use that as the key, write: messages.@reverse.#(role=="user")#.content
It also supports pipeline syntax. For example, to use the second user message counting from the end (index 1 of the reversed array) as the key, write: messages.@reverse.#(role=="user")#.content|1
For more usage, refer to the official GJSON documentation and use the GJSON Playground to test path expressions.
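The four path expressions above can be traced by hand. The following Python sketch emulates what each one selects, using plain list operations rather than a GJSON library; the request body is hypothetical sample data, and the variable names are illustrative only.

```python
# Hypothetical OpenAI-style request body used only for illustration.
request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a cache key?"},
        {"role": "assistant", "content": "It identifies a cached entry."},
        {"role": "user", "content": "How long is it stored?"},
    ],
}

# messages.@reverse : the messages array in reverse order (newest first)
reversed_messages = list(reversed(request_body["messages"]))

# messages.@reverse.0.content : content of the last message in the request
default_key = reversed_messages[0]["content"]

# messages.@reverse.#(role=="user").content : first match in the reversed
# array, i.e. the content of the last user message
last_user = next(m["content"] for m in reversed_messages if m["role"] == "user")

# messages.@reverse.#(role=="user")#.content : all user contents, newest first
all_user = [m["content"] for m in reversed_messages if m["role"] == "user"]

# messages.@reverse.#(role=="user")#.content|1 : index 1 of that array,
# i.e. the second user message counting from the end
second_from_end = all_user[1]

if __name__ == "__main__":
    print(default_key)
    print(last_user)
    print(all_user)
    print(second_from_end)
```

In this sample the default key and the last-user key coincide because the final message comes from the user; they would differ if the request ended with a message from another role.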