feat: add ai-lakera-guard plugin #13570

Open: janiussyafiq wants to merge 7 commits into apache:master from janiussyafiq:feat/ai-lakera-guard-pr1

janiussyafiq (Contributor) commented Jun 18, 2026

Description

This PR adds a new plugin, ai-lakera-guard, that integrates APISIX with the Lakera Guard v2 /guard API to perform ML-based security scanning of LLM requests at the gateway — prompt injection / jailbreak, PII leakage, content-policy violations, and malicious / unknown links — so each backend LLM service no longer has to implement its own guardrails.

This is PR-1 (input guard MVP) of a planned, independently shippable series (input → output → streaming → observability), modeled closely on ai-aliyun-content-moderation.

How it works

  • Runs in the access phase at priority 1028, just below ai-proxy (1040) and ai-proxy-multi (1041), so the AI context is already populated. The plugin is meant to run behind one of those proxies; requests that did not pass through ai-proxy/ai-proxy-multi are handled per the configurable fail_mode (default skip — passed through unchecked; set fail_mode: error to reject them with 500).
  • Extracts the whole request conversation via apisix.plugins.ai-protocols (no role distinction) and sends it to Lakera POST /v2/guard.
  • On a flagged verdict it applies the configured action:
    • block (default) — returns a provider-compatible deny response (a valid chat-completion, or SSE for streaming requests) carrying request_failure_message, built via proto.build_deny_response, so client SDKs render the refusal as a normal completion. The status is deny_code (default 200; set a 4xx to surface blocks as HTTP errors).
    • alert — log-only shadow mode; traffic passes through.
  • Lakera errors / timeouts are governed by fail_open (fail-closed by default).
  • api_key is secret-managed via encrypt_fields + native $secret:// / $env:// resolution.
  • reveal_failure_categories optionally appends the matched detectors to the deny message; every flagged verdict logs Lakera's full per-detector breakdown and request_uuid.
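The flow above can be read as a small decision table. The sketch below is a hypothetical, standalone rendering of it, not the plugin's code: `decide`, the `verdict.categories` list and the "(a, b)" suffix used when revealing categories are illustrative assumptions, and the status code used when failing closed on a Lakera error is left unspecified because the description does not state it.

```lua
-- Hypothetical sketch of the access-phase decision described above.
-- `verdict` stands in for a decoded Lakera /v2/guard response (only
-- `flagged` is used); `categories` is a hypothetical list of matched
-- detector names, and `err` is non-nil when the Lakera call failed.
local function decide(conf, verdict, err)
  -- Lakera error or timeout: governed by fail_open (fail-closed by default)
  if err then
    if conf.fail_open then
      return { pass = true, reason = "lakera error, fail-open" }
    end
    -- The exact status returned when failing closed is not stated in the PR
    return { pass = false, reason = "lakera error, fail-closed" }
  end

  if not verdict.flagged then
    return { pass = true, reason = "clean" }
  end

  -- action = "alert": log-only shadow mode, traffic passes through
  if conf.action == "alert" then
    return { pass = true, reason = "flagged, alert only" }
  end

  -- action = "block": deny with deny_code and request_failure_message;
  -- the "(a, b)" suffix format for revealed categories is an assumption
  local msg = conf.request_failure_message
  if conf.reveal_failure_categories and verdict.categories then
    msg = msg .. " (" .. table.concat(verdict.categories, ", ") .. ")"
  end
  return { pass = false, status = conf.deny_code, message = msg }
end
```

Note that the two failure knobs are checked on different paths: `fail_open` only matters when there is no verdict at all, while `action` only matters once Lakera has returned a flagged verdict.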

Configuration

api_key is the only required field. Others: lakera_endpoint, project_id, direction (input only in this PR), action, fail_open, timeout, ssl_verify, reveal_failure_categories, deny_code, request_failure_message.
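As a rough illustration of what an effective configuration looks like, the sketch below fills in only the defaults that this description states (action, fail_open, deny_code) plus `direction = "input"`, which is assumed to be the default because it is the only value in this PR. In the real plugin the defaults presumably come from `schema.lua` during schema validation; `with_defaults` and `DEFAULTS` are hypothetical names.

```lua
-- Hypothetical sketch: apply the defaults stated in this PR to a plugin
-- config table. Defaults for lakera_endpoint, project_id, timeout and
-- ssl_verify are not given in the description, so they are not filled in.
local DEFAULTS = {
  direction = "input",  -- the only direction in this PR (assumed default)
  action = "block",     -- block is the default action
  fail_open = false,    -- fail-closed by default
  deny_code = 200,      -- default deny status
}

local function with_defaults(conf)
  -- api_key is the only required field
  if type(conf.api_key) ~= "string" then
    return nil, "api_key is required"
  end
  local out = {}
  for k, v in pairs(conf) do
    out[k] = v
  end
  for k, v in pairs(DEFAULTS) do
    -- compare with nil so an explicit `false` from the user is preserved
    if out[k] == nil then
      out[k] = v
    end
  end
  return out
end
```

Comparing against `nil` rather than testing truthiness matters here: `fail_open = false` is itself a meaningful value and must not be treated as "unset".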

Files

  • Plugin: apisix/plugins/ai-lakera-guard.lua, apisix/plugins/ai-lakera-guard/schema.lua, apisix/plugins/ai-lakera-guard/client.lua
  • Registration: apisix/cli/config.lua, conf/config.yaml.example
  • Docs: docs/en/latest/plugins/ai-lakera-guard.md, docs/en/latest/config.json
  • Tests: t/plugin/ai-lakera-guard.t, t/plugin/ai-lakera-guard-secrets.t, fixtures under t/fixtures/lakera/

Which issue(s) this PR fixes:

Part of #13291

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (new, opt-in plugin disabled by default; additive registration only)

Add the ai-lakera-guard plugin (PR-1, input guard MVP) integrating APISIX
with the Lakera Guard v2 /guard API to scan LLM request prompts for prompt
injection, PII, content-policy violations, and malicious/unknown links at
the gateway.

The plugin runs in the access phase at priority 1028, below ai-proxy /
ai-proxy-multi, which it requires. It extracts the whole request
conversation via apisix.plugins.ai-protocols and calls Lakera POST
/v2/guard. On a flagged verdict it either blocks with a provider-compatible
deny response (a valid chat-completion or SSE carrying request_failure_message,
returned with deny_code, default 200) or alerts (log-only shadow mode).
Lakera errors and timeouts are governed by fail_open (fail-closed by
default). The api_key is secret-managed via encrypt_fields and the native
$secret:// / $env:// resolution.

Signed-off-by: janiussyafiq <izzraff.js@gmail.com>
dosubot (bot) added the enhancement (New feature or request), plugin, and size:XXL (This PR changes 1000+ lines, ignoring generated files.) labels Jun 18, 2026
Comment thread apisix/plugins/ai-lakera-guard.lua
- Makefile: install apisix/plugins/ai-lakera-guard/*.lua so the
  luarocks 'diff -rq' check no longer reports the dir as uninstalled
- t/admin/plugins.t: add ai-lakera-guard to the priority-ordered
  expected plugin list (priority 1028, between
  ai-aliyun-content-moderation 1029 and proxy-mirror 1010)
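APISIX runs plugins in descending priority order, which is why the expected list in `t/admin/plugins.t` is sorted by priority. The sketch below uses only the priorities quoted in this PR (the real list has other plugins in between) to show where ai-lakera-guard lands.

```lua
-- Sketch: sort a subset of plugins by descending priority, the order in
-- which APISIX runs them. Only priorities quoted in this PR are used.
local plugins = {
  { name = "proxy-mirror", priority = 1010 },
  { name = "ai-lakera-guard", priority = 1028 },
  { name = "ai-proxy", priority = 1040 },
  { name = "ai-aliyun-content-moderation", priority = 1029 },
  { name = "ai-proxy-multi", priority = 1041 },
}

table.sort(plugins, function(a, b) return a.priority > b.priority end)

local names = {}
for i, p in ipairs(plugins) do
  names[i] = p.name
end
print(table.concat(names, "\n"))
```

Because ai-proxy and ai-proxy-multi sort ahead of ai-lakera-guard, their access-phase work (picking the AI instance) has already happened when the guard runs.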
Handle requests this plugin cannot inspect (no picked ai instance, or an
unsupported protocol) via the shared ai-protocols.binding helper and a
configurable fail_mode (skip/warn/error, default skip) instead of a hard
500, matching ai-aliyun-content-moderation. This lets non-AI traffic pass
through unchecked when the plugin is bound at the Consumer/Service level.

fail_mode is distinct from fail_open, which governs Lakera API failures.

Also collapse the test routes onto a single route id (overwrite-in-place,
grouping default-config tests first) to match the convention used by the
sibling AI plugins.

- schema: add fail_mode = binding.schema_property("skip")
- access: route no-instance / unsupported-protocol through on_unsupported
- docs: document fail_mode; clarify non-ai-proxy traffic behavior
- t: fail_mode=error (500) and default skip (pass-through) coverage
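A minimal sketch of the fail_mode routing, for requests the plugin cannot inspect. The real logic lives in the shared `ai-protocols.binding` helper; `on_unsupported` here is a standalone stand-in, and the `warn` behaviour (log, then pass through) is an assumption, since the commit message only names the three modes. Lakera API failures never reach this path; they are governed by `fail_open`.

```lua
-- Hypothetical sketch: route an uninspectable request (no picked AI
-- instance, unsupported protocol) according to fail_mode. The "warn"
-- behaviour below is an assumption, not taken from the PR.
local function on_unsupported(fail_mode, reason, log)
  fail_mode = fail_mode or "skip"  -- default per the PR
  if fail_mode == "error" then
    return 500, reason             -- reject, matching the fail_mode=error test
  end
  if fail_mode == "warn" then
    log("ai-lakera-guard: cannot inspect request: " .. reason)
  end
  return nil                       -- nil status: pass through unchecked
end
```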
nic-6443 previously approved these changes Jun 20, 2026

Copilot AI left a comment

Pull request overview

This PR introduces a new APISIX AI security plugin, ai-lakera-guard, which calls Lakera Guard v2 (/v2/guard) during the access phase to scan LLM request content for prompt injection, PII, and content-policy issues, and either blocks (default) or alerts (shadow mode) based on the verdict.

Changes:

  • Added the ai-lakera-guard plugin implementation (schema + HTTP client + access-phase enforcement and provider-compatible deny responses).
  • Registered the plugin in default configs/build install rules and documentation navigation.
  • Added end-to-end tests (including $secret:// and $env:// api_key resolution) and Lakera response fixtures.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| apisix/plugins/ai-lakera-guard.lua | Main plugin logic: extract request content via ai-protocols, call Lakera, block/alert, build provider-compatible deny responses. |
| apisix/plugins/ai-lakera-guard/client.lua | Lakera /v2/guard HTTP client (request building, timeout/ssl_verify handling, response decoding). |
| apisix/plugins/ai-lakera-guard/schema.lua | Plugin schema, defaults, and secret encryption (encrypt_fields). |
| apisix/cli/config.lua | Adds ai-lakera-guard to the default CLI plugin list. |
| conf/config.yaml.example | Documents plugin ordering/priority in the example configuration. |
| Makefile | Installs the new plugin directory and Lua files during make install. |
| docs/en/latest/plugins/ai-lakera-guard.md | New English plugin documentation page (usage, attributes, examples). |
| docs/en/latest/config.json | Adds ai-lakera-guard to the English docs sidebar under AI plugins. |
| t/admin/plugins.t | Adds plugin name to the admin plugin list test coverage. |
| t/plugin/ai-lakera-guard.t | Core behavioral tests: clean/flagged, fail-open/closed, reveal categories, fail_mode behavior, etc. |
| t/plugin/ai-lakera-guard-secrets.t | Tests secret reference and env var resolution for api_key. |
| t/fixtures/lakera/scan-clean.json | Fixture for non-flagged Lakera response. |
| t/fixtures/lakera/scan-flagged.json | Fixture for flagged Lakera response with per-detector breakdown. |
Comments suppressed due to low confidence (1)

docs/en/latest/config.json:85

  • This English sidebar adds plugins/ai-lakera-guard, but the Chinese sidebar (docs/zh/latest/config.json) still lists the AI plugins and currently does not include this new entry. If the plugin docs are intended to be discoverable in the zh docs as well, add the corresponding plugins/ai-lakera-guard entry there (and ideally a zh doc page).
            "plugins/ai-proxy",
            "plugins/ai-proxy-multi",
            "plugins/ai-rate-limiting",
            "plugins/ai-prompt-guard",
            "plugins/ai-aws-content-moderation",
            "plugins/ai-aliyun-content-moderation",
            "plugins/ai-lakera-guard",
            "plugins/ai-prompt-decorator",
            "plugins/ai-prompt-template",
            "plugins/ai-rag",
            "plugins/ai-request-rewrite"
          ]


Comment thread apisix/plugins/ai-lakera-guard/schema.lua
Comment thread apisix/plugins/ai-lakera-guard/schema.lua
api_key is required but the string had no length constraint, so an empty
value passed validation and would have sent an empty Authorization header.
Add minLength = 1, matching the credential fields in ai-aliyun-content-moderation
and ai-proxy.
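To make the failure mode concrete, here is a sketch of the check that `minLength = 1` effectively enforces before a credential is used. The `Bearer` scheme and the `auth_header` helper are assumptions for illustration; the real client may build the header differently.

```lua
-- Hypothetical sketch: refuse to build an Authorization header from an
-- empty api_key, mirroring the schema's minLength = 1. The "Bearer"
-- scheme is assumed for illustration.
local function auth_header(api_key)
  if type(api_key) ~= "string" or #api_key < 1 then
    return nil, "api_key must be a non-empty string"
  end
  return "Bearer " .. api_key
end
```

Rejecting the value at schema-validation time is preferable to a runtime check like this one, because the misconfiguration surfaces when the route is created rather than as a Lakera authentication error (and, under fail-closed, blocked traffic) later.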
Translate the ai-lakera-guard plugin page into Chinese and add it to the
zh sidebar, mirroring the English version. Code samples are kept identical.
nic-6443 previously approved these changes Jun 22, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Comment thread docs/en/latest/plugins/ai-lakera-guard.md
Comment thread docs/zh/latest/plugins/ai-lakera-guard.md
Comment thread apisix/plugins/ai-lakera-guard/schema.lua
Comment thread apisix/plugins/ai-lakera-guard.lua Outdated
membphis (Member)

P1: Preserve Lakera message roles instead of flattening the conversation into one user message

The plugin currently calls proto.extract_request_content(request_tab), concatenates all extracted text, and client.scan sends the result as:

messages = { { role = "user", content = content } }

This loses the original role and turn boundaries. For OpenAI Chat, this can turn system, assistant, historical user, and current user content into one current user message. For Anthropic and Responses requests, the protocol adapters already have role-preserving canonical message helpers, so flattening here bypasses information the codebase can keep.

Why this blocks merge: Lakera Guard's /v2/guard API is message-based, and role/context semantics matter for policy behavior. Sending the system prompt, assistant output, or older history as a new user message can block valid follow-up requests because old or non-user content is rescanned as the current user input. It can also make the gateway's enforcement differ from the API contract this plugin is integrating with.

Suggested fix:

  • Pass a messages array to client.scan, not a flattened string.
  • Build it from the protocol-normalized message helper, preserving system, user, and assistant roles where available.
  • Only fall back to one user message when the protocol has no role-preserving representation.
  • Update the "whole conversation is scanned" test to verify the full message array is sent without converting history/system/assistant messages into the latest user input.
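The suggested fix can be sketched as follows, assuming the role-preserving helper yields an array of `{ role, content }` entries with text content. `build_guard_messages` is a hypothetical name; the result would be sent as the `messages` field of the /v2/guard request body.

```lua
-- Hypothetical sketch: forward role-tagged messages to Lakera instead of
-- one flattened user message. `messages` stands in for the output of a
-- role-preserving helper; `flat_text` is used only as a fallback.
local function build_guard_messages(messages, flat_text)
  if type(messages) == "table" and #messages > 0 then
    local out = {}
    for i, m in ipairs(messages) do
      out[i] = { role = m.role, content = m.content }
    end
    return out
  end
  -- No role-preserving representation: fall back to one user message
  return { { role = "user", content = flat_text } }
end
```

With this shape, a system prompt or an earlier assistant turn keeps its own role, so Lakera can judge the latest user turn in context instead of treating the whole history as new user input.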

Address review feedback on the input-guard MVP:

- Forward the role-tagged conversation to Lakera via proto.get_messages
  instead of flattening it into one user message. Normalize each message's
  content to text and drop non-text parts so multimodal requests stay
  within Lakera /v2/guard's text-only contract; fall back to a single user
  message only when a protocol has no role-preserving representation.
- Guard the nil return from get_json_request_body_table() and route it
  through binding.on_unsupported so fail_mode is honored.
- Clarify in the schema and the en/zh docs that action=alert governs
  flagged verdicts only; Lakera API errors stay controlled by fail_open.
- Update the conversation test to assert roles reach Lakera unflattened.
- Decode the Lakera response with null_as_nil and guard the result by
  type, so a JSON null (e.g. "metadata": null) cannot surface the truthy
  cjson.null sentinel and error when indexed.
- Stop logging the Authorization header in the test mocks so the api key
  / resolved secret is never written to CI logs.
- Strengthen the role-preservation test to assert each role is paired
  with its own content, not just that the role labels are present.
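The JSON-null bullet above is about a classic lua-cjson pitfall: `null` decodes to the truthy `cjson.null` sentinel, so `if resp.metadata then` does not filter it out. The sketch below uses a plain table `NULL` as a stand-in for that sentinel (the real one is a lightuserdata, which `type(x) == "table"` already rejects) and assumes, for illustration, that `request_uuid` sits under `metadata`.

```lua
-- Sketch of guarding decoded fields by type. NULL is a stand-in for
-- cjson.null: a truthy sentinel that a plain truthiness check lets through.
local NULL = setmetatable({}, {})

local function request_uuid(resp)
  if type(resp) ~= "table" then
    return nil
  end
  local md = resp.metadata
  -- The identity check is only needed for this table stand-in; the real
  -- cjson.null is a lightuserdata and already fails type(md) == "table"
  if type(md) ~= "table" or md == NULL then
    return nil
  end
  local id = md.request_uuid  -- field location assumed for illustration
  if type(id) ~= "string" then
    return nil
  end
  return id
end
```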
janiussyafiq (Contributor, Author)

Reply to @membphis:

  • client.scan now receives and forwards a role-tagged messages array instead of a concatenated string.
  • The array is built from proto.get_messages() — the protocol's canonical {role, content} helper — so system/user/assistant turns are preserved for openai-chat, Anthropic and Responses.
  • Each message's content is coerced to text and non-text parts (e.g. multimodal image_url) are dropped. Lakera /v2/guard rejects non-text content with HTTP 400, which under the default fail-closed mode would otherwise block legitimate multimodal requests.
  • Falls back to a single user message only when a protocol exposes no role-preserving representation.
  • The "whole conversation" test now asserts the full role-tagged array reaches Lakera, with each role paired to its own content.
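The normalization step in the reply can be sketched like this for OpenAI-style content, which is either a string or an array of parts such as `{ type = "text", text = ... }` and `{ type = "image_url", ... }`. Joining multiple text parts with a newline and dropping messages that have no text at all are assumptions of this sketch, not details confirmed by the PR.

```lua
-- Hypothetical sketch: coerce message content to text for Lakera's
-- text-only /v2/guard contract, keeping roles and dropping non-text parts.
local function content_to_text(content)
  if type(content) == "string" then
    return content
  end
  if type(content) ~= "table" then
    return nil
  end
  local texts = {}
  for _, part in ipairs(content) do
    if type(part) == "table" and part.type == "text"
        and type(part.text) == "string" then
      texts[#texts + 1] = part.text
    end
  end
  if #texts == 0 then
    return nil
  end
  return table.concat(texts, "\n")  -- separator is an assumption
end

local function normalize(messages)
  local out = {}
  for _, m in ipairs(messages) do
    local text = content_to_text(m.content)
    if text then  -- assumed: messages with no text at all are dropped
      out[#out + 1] = { role = m.role, content = text }
    end
  end
  return out
end
```

Filtering on the gateway side keeps an image part from turning an otherwise clean request into a Lakera HTTP 400, which under the default fail-closed setting would block it.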

This also matches how other gateways integrate Lakera.
