Skip to content

fix(server): pin dp-manager REST clients to HTTP/1.1 (#535)#538

Closed
moonming wants to merge 1 commit into
mainfrom
fix/issue-535-dp-rest-http1-only
Closed

fix(server): pin dp-manager REST clients to HTTP/1.1 (#535)#538
moonming wants to merge 1 commit into
mainfrom
fix/issue-535-dp-rest-http1-only

Conversation

@moonming

@moonming moonming commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Problem

After the object_store observability sink (#531) landed, every DP→dp-manager REST call — budget_check, heartbeat, telemetry — started failing at the transport layer (error sending request for url (https://dpm:7944/dp/...), latency_ms=1). budget_check fails closed, so chat completions 429. The kine/etcd config path keeps working, which is why models still resolve. Blocks AISIX-Cloud main regression (#535).

Root cause

object_store's aws/gcp/azure features pull reqwest with its http2 feature. Cargo feature unification is per-package across the workspace, so reqwest/http2 flips on for the single shared reqwest — including the mTLS client these REST calls use. Confirmed with the resolver (cargo tree -e features -i reqwest):

reqwest features REST wire
before #531 (881c2e2, green) rustls-tls only — no http2 HTTP/1.1
origin/main (#531+, red) http2 + h2 negotiates ALPN h2

dp-manager serves kine/etcd gRPC and the /dp/* REST API on one TLS port, fanned out by cmux: HTTP/2 streams with content-type: application/grpc → kine, everything else → an HTTP/1.1-only gin http.Server. With http2 on, the REST client negotiates ALPN h2; cmux routes the non-grpc h2 connection to the gin listener, which can't parse the h2 preface → connection dies → transport error. etcd/kine is unaffected because it speaks real gRPC.

This is a feature flip, not a version bump — invisible in Cargo.lock, which is why a bisect shows the TLS stack "unchanged".

Fix

Add .http1_only() to both CP-REST mTLS client builders (heartbeat::build_client, telemetry::build_client; BudgetClient reuses heartbeat's). They now advertise only http/1.1 in ALPN regardless of which reqwest features other deps unify in. The dp-manager REST surface is HTTP/1.1 by design; only kine/etcd is h2 and that goes through the separate etcd-client.

Tests

Follow-up (separate, not blocking)

Defensive hardening on the CP side — make dp-manager's cmux REST branch h2-capable so a future h2 REST client can't silently break this way again. Recommend a separate AISIX-Cloud issue.

Test plan

  • cargo test -p aisix-server (52 pass)
  • cargo clippy -p aisix-server --tests (clean)
  • regression test fails without fix, passes with it
  • cross-repo DP-mediated e2e green on the merged :dev image (AISIX-Cloud)

Fixes #535

Summary by CodeRabbit

  • Bug Fixes

    • Improved reliability of internal service-to-service communication by correcting protocol negotiation behavior to prevent routing issues and ensure consistent request handling.
  • Tests

    • Added regression tests to validate protocol negotiation works as expected and requests are properly routed through the system.

The DP→dp-manager REST clients (budget_check, heartbeat, telemetry) all
began failing at the transport layer ("error sending request") after the
object_store observability sink landed. object_store pulls reqwest with
its `http2` feature, and Cargo feature unification enables http2 on the
single shared workspace reqwest — including the mTLS client these REST
calls use.

dp-manager multiplexes kine/etcd gRPC and the /dp/* REST API on one TLS
port via cmux: HTTP/2 streams carrying `content-type: application/grpc`
go to kine, everything else to an HTTP/1.1-only gin server. With http2
enabled the REST client negotiates ALPN h2, cmux routes the non-grpc h2
connection to the gin listener, which can't parse the h2 preface, so
every budget_check / heartbeat / telemetry request dies before a
response. The etcd/kine path is unaffected because it speaks real gRPC.

Pin these clients to HTTP/1.1 with `.http1_only()` so they advertise only
http/1.1 in ALPN regardless of which reqwest features other deps unify
in. Add regression tests asserting the CP-REST ClientHello advertises
http/1.1 and not h2.
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Free

Run ID: 920249d3-c8c4-4394-8df7-9ac69631d3b5

📥 Commits

Reviewing files that changed from the base of the PR and between af43b3c and b68c584.

📒 Files selected for processing (2)
  • crates/aisix-server/src/heartbeat.rs
  • crates/aisix-server/src/telemetry.rs

📝 Walkthrough

Walkthrough

The heartbeat and telemetry mTLS clients used for DP → cp-api REST calls are updated to explicitly pin TLS ALPN to HTTP/1.1 by adding .http1_only() to the reqwest client builder. Regression tests verify the TLS ClientHello advertises only http/1.1 and not h2.

Changes

HTTP/1.1 ALPN Pinning and Verification

Layer / File(s) Summary
HTTP/1.1 ALPN pinning in mTLS clients
crates/aisix-server/src/heartbeat.rs, crates/aisix-server/src/telemetry.rs
Both heartbeat and telemetry mTLS client builders add .http1_only() to the reqwest TLS client configuration with explanatory comments, ensuring only HTTP/1.1 is advertised during TLS negotiation.
Regression tests for ALPN ClientHello verification
crates/aisix-server/src/heartbeat.rs, crates/aisix-server/src/telemetry.rs
New regression tests in both heartbeat and telemetry capture raw TLS ClientHello bytes from a local TCP listener, verify the ALPN list includes http/1.1, and assert that h2 is not present.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Note

🎁 Summarized by CodeRabbit Free

Your organization has reached its limit of developer seats under the Pro Plan. For new users, CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please add seats to your subscription by visiting https://app.coderabbit.ai/login.If you believe this is a mistake and have available seats, please assign one to the pull request author through the subscription management page using the link above.

Comment @coderabbitai help to get the list of available commands and usage tips.

@moonming

moonming commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Closing as a duplicate: #536 (af43b3c) already fixed #535 on main with the identical approach — .http1_only() on the dp-manager REST mTLS clients in heartbeat::build_client and telemetry::build_client — and merged while this branch was in flight.

#536 shipped the fix without a test, and main currently has no guard for the ALPN contract. The regression tests from this PR are split into a focused test-only follow-up: #539.

@moonming moonming closed this Jun 8, 2026
@moonming moonming deleted the fix/issue-535-dp-rest-http1-only branch June 8, 2026 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DP can't reach dp-manager /dp/budget_check (transport error) → chats fail-close to 429; breaks all DP-mediated AISIX-Cloud e2e

1 participant