AI Data Masking
Function Description
Interception and replacement of sensitive words in requests/responses
Data Handling Scope
- openai protocol: Request/response conversation content
- jsonpath: Only process specified fields
- raw: Entire request/response body
Sensitive Word Interception
- Directly intercept sensitive words in the data handling scope and return preset error messages
- Supports system’s built-in sensitive word library and custom sensitive words
Sensitive Word Replacement
- Replace sensitive words in request data with masked strings before passing to back-end services. Ensures that sensitive data does not leave the domain
- Some masked data can be restored after being returned by the back-end service
- Custom rules support standard regular expressions and grok rules, and replacement strings support variable substitution
Execution Properties
Plugin Execution Phase: Authentication Phase
Plugin Execution Priority: 991
Configuration Fields
Name | Data Type | Default Value | Description |
---|---|---|---|
deny_openai | bool | true | Intercept openai protocol |
deny_jsonpath | string | [] | Intercept specified jsonpath |
deny_raw | bool | false | Intercept raw body |
system_deny | bool | true | Enable built-in interception rules |
deny_code | int | 200 | HTTP status code when intercepted |
deny_message | string | Sensitive words found in the question or answer have been blocked | AI returned message when intercepted |
deny_raw_message | string | {“errmsg”:“Sensitive words found in the question or answer have been blocked”} | Content returned when not openai intercepted |
deny_content_type | string | application/json | Content type header returned when not openai intercepted |
deny_words | array of string | [] | Custom sensitive word list |
replace_roles | array | - | Custom sensitive word regex replacement |
replace_roles.regex | string | - | Rule regex (built-in GROK rule) |
replace_roles.type | [replace, hash] | - | Replacement type |
replace_roles.restore | bool | false | Whether to restore |
replace_roles.value | string | - | Replacement value (supports regex variables) |
Configuration Example
system_deny: truedeny_openai: truedeny_jsonpath: - "$.messages[*].content"deny_raw: truedeny_code: 200deny_message: "Sensitive words found in the question or answer have been blocked"deny_raw_message: "{\"errmsg\":\"Sensitive words found in the question or answer have been blocked\"}"deny_content_type: "application/json"deny_words: - "Custom sensitive word 1" - "Custom sensitive word 2"replace_roles: - regex: "%{MOBILE}" type: "replace" value: "****" # Mobile number 13800138000 -> **** - regex: "%{EMAILLOCALPART}@%{HOSTNAME:domain}" type: "replace" restore: true value: "****@$domain" # Email admin@gmail.com -> ****@gmail.com - regex: "%{IP}" type: "replace" restore: true value: "***.***.***.***" # IP 192.168.0.1 -> ***.***.***.*** - regex: "%{IDCARD}" type: "replace" value: "****" # ID card number 110000000000000000 -> **** - regex: "sk-[0-9a-zA-Z]*" restore: true type: "hash" # hash sk-12345 -> 9cb495455da32f41567dab1d07f1973d # The hashed value is provided to the large model, and the hash value will be restored to the original value from the data returned by the large model
Sensitive Word Replacement Example
User Request Content
Please change curl http://172.20.5.14/api/openai/v1/chat/completions -H "Authorization: sk-12345" -H "Auth: test@gmail.com"
to POST method
Processed Request Large Model Content
curl http://***.***.***.***/api/openai/v1/chat/completions -H "Authorization: 48a7e98a91d93896d8dac522c5853948" -H "Auth: ****@gmail.com"
change to POST method
Large Model Returned Content
You want to convert a curl
GET request to a POST request, and this request is sending data to a specific API. Below is the modified curl
command to send as POST:
curl -X POST \ -H "Authorization: 48a7e98a91d93896d8dac522c5853948" \ -H "Auth: ****@gmail.com" \ -H "Content-Type: application/json" \ -d '{"key":"value"}' \ http://***.***.***.***/api/openai/v1/chat/completions
Here are the following modifications made:
-X POST
sets the request method to POST.-H "Content-Type: application/json"
sets theContent-Type
in the request header toapplication/json
, which is typically used to inform the server that the data you are sending is in JSON format.-d '{"key":"value"}'
sets the data to be sent, where'{"key":"value"}'
is a simple example of a JSON object. You need to replace it with the actual data you want to send.
Please note that you need to replace "key":"value"
with the actual data content you want to send. If your API accepts a different data structure or requires specific fields, please adjust this part according to your actual situation.
Processed Return to User Content
You want to convert a curl
GET request to a POST request, and this request is sending data to a specific API. Below is the modified curl
command to send as POST:
curl -X POST \ -H "Authorization: sk-12345" \ -H "Auth: test@gmail.com" \ -H "Content-Type: application/json" \ -d '{"key":"value"}' \ http://172.20.5.14/api/openai/v1/chat/completions
Here are the following modifications made:
-X POST
sets the request method to POST.-H "Content-Type: application/json"
sets theContent-Type
in the request header toapplication/json
, which is typically used to inform the server that the data you are sending is in JSON format.-d '{"key":"value"}'
sets the data to be sent, where'{"key":"value"}'
is a simple example of a JSON object. You need to replace it with the actual data you want to send.
Please note that you need to replace "key":"value"
with the actual data content you want to send. If your API accepts a different data structure or requires specific fields, please adjust this part according to your actual situation.
Related Notes
- In streaming mode, if the masked words are split across multiple chunks, restoration may not be possible
- In streaming mode, if sensitive words are split across multiple chunks, there may be cases where part of the sensitive word is returned to the user
- Grok built-in rule list: https://help.aliyun.com/zh/sls/user-guide/grok-patterns
- Built-in sensitive word library data source: https://github.com/houbb/sensitive-word/tree/master/src/main/resources