fix(server): pin dp-manager REST clients to HTTP/1.1 (#535) #538
moonming wants to merge 1 commit
The DP→dp-manager REST clients (budget_check, heartbeat, telemetry) all
began failing at the transport layer ("error sending request") after the
object_store observability sink landed. object_store pulls reqwest with
its `http2` feature, and Cargo feature unification enables http2 on the
single shared workspace reqwest — including the mTLS client these REST
calls use.
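As a toy illustration of why one dependency's features reach another crate's client: Cargo builds a single `reqwest` for the workspace and gives it the union of every feature any dependent requests. The sketch below models only that union step. Real Cargo resolution also handles default features and optional dependencies, and resolver 2 keeps some build-, dev-, and target-specific dependency features apart. The `Request` type and the feature lists are hypothetical stand-ins, not this repository's manifests.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One dependent asking for `package` with a set of features (toy model).
struct Request<'a> {
    dependent: &'a str,
    package: &'a str,
    features: &'a [&'a str],
}

/// Models only feature unification: each package is built once, with the
/// union of every feature any dependent requested for it.
fn unify<'a>(requests: &[Request<'a>]) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut unified: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for req in requests {
        unified
            .entry(req.package)
            .or_default()
            .extend(req.features.iter().copied());
    }
    unified
}

fn main() {
    // Hypothetical requests: the server's own client asks only for rustls,
    // while object_store's cloud backends also ask for http2.
    let requests = [
        Request { dependent: "aisix-server", package: "reqwest", features: &["rustls-tls"] },
        Request { dependent: "object_store", package: "reqwest", features: &["http2"] },
    ];
    for req in &requests {
        println!("{} asks for {} with {:?}", req.dependent, req.package, req.features);
    }
    // The single shared reqwest is built with both features, so the
    // server's mTLS client gets http2 even though it never asked for it.
    let unified = unify(&requests);
    println!("reqwest built with {:?}", unified["reqwest"]);
}
```

Because the union is per package rather than per dependent, nothing about the server's own client code changes when another crate adds a feature.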
dp-manager multiplexes kine/etcd gRPC and the /dp/* REST API on one TLS
port via cmux: HTTP/2 streams carrying `content-type: application/grpc`
go to kine, everything else to an HTTP/1.1-only gin server. With http2
enabled the REST client negotiates ALPN h2, cmux routes the non-grpc h2
connection to the gin listener, which can't parse the h2 preface, so
every budget_check / heartbeat / telemetry request dies before a
response. The etcd/kine path is unaffected because it speaks real gRPC.
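The failure path can be modeled in a few lines, contrasting a client that offers `h2` with one that offers only `http/1.1`. This sketch makes three illustrative assumptions: the server picks ALPN by its own preference order (a common policy; RFC 7301 leaves the choice to the server), dp-manager lists `h2` ahead of `http/1.1` so gRPC clients get HTTP/2, and the REST calls carry a non-gRPC content type such as `application/json`. `negotiate_alpn`, `route`, and `gin_can_serve` are hypothetical names, not functions from dp-manager or cmux.

```rust
/// ALPN protocol IDs as registered for use with RFC 7301.
const H2: &str = "h2";
const HTTP11: &str = "http/1.1";

/// Server-side ALPN choice: first protocol in the server's preference list
/// that the client also offered (assumed policy for this sketch).
fn negotiate_alpn(server_prefs: &[&'static str], client_offer: &[&str]) -> Option<&'static str> {
    server_prefs
        .iter()
        .copied()
        .find(|p| client_offer.iter().any(|c| c == p))
}

#[derive(Debug, PartialEq)]
enum Backend {
    Kine, // gRPC listener (kine/etcd)
    Gin,  // HTTP/1.1-only REST listener
}

/// cmux-style dispatch as described above: HTTP/2 with
/// `content-type: application/grpc` goes to kine, everything else to gin.
fn route(protocol: &str, content_type: &str) -> Backend {
    if protocol == H2 && content_type == "application/grpc" {
        Backend::Kine
    } else {
        Backend::Gin
    }
}

/// Gin's listener speaks HTTP/1.1 only; an h2 connection routed there opens
/// with the HTTP/2 connection preface, which it cannot parse.
fn gin_can_serve(protocol: &str) -> bool {
    protocol == HTTP11
}

fn main() {
    // Hypothetical dp-manager preference: h2 first so gRPC clients get HTTP/2.
    let server_prefs = [H2, HTTP11];
    let cases: [(&str, &[&str]); 2] = [
        ("h2 offered", &[H2, HTTP11]),
        ("http/1.1 only", &[HTTP11]),
    ];
    for (label, offer) in cases {
        let proto = negotiate_alpn(&server_prefs, offer).expect("no common ALPN protocol");
        // REST calls are not gRPC; the exact content type is an assumption.
        let backend = route(proto, "application/json");
        let served = backend == Backend::Kine || gin_can_serve(proto);
        println!("{label}: ALPN={proto}, routed to {backend:?}, served={served}");
    }
}
```

Both REST connections land on gin; only the one that negotiated `http/1.1` is something gin can actually read.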
Pin these clients to HTTP/1.1 with `.http1_only()` so they advertise only
http/1.1 in ALPN regardless of which reqwest features other deps unify
in. Add regression tests asserting the CP-REST ClientHello advertises
http/1.1 and not h2.
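A regression check of this kind only needs the ALPN extension (type 16, RFC 7301) from the captured ClientHello. The following is a minimal std-only sketch of that parsing step, assuming the whole ClientHello arrives in a single TLS record. `client_hello_alpn`, `synthetic_client_hello`, and `advertises_only_http1` are hypothetical helpers, not the test code in this PR, and the synthetic hello is not a valid handshake: it carries just enough structure to parse.

```rust
/// Minimal bounds-checked big-endian reader over a byte slice.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(buf: &'a [u8]) -> Self { Reader { buf, pos: 0 } }
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let buf: &'a [u8] = self.buf;
        let slice = buf.get(self.pos..self.pos.checked_add(n)?)?;
        self.pos += n;
        Some(slice)
    }
    fn read_u8(&mut self) -> Option<u8> { self.take(1).map(|b| b[0]) }
    fn read_u16(&mut self) -> Option<u16> { self.take(2).map(|b| u16::from_be_bytes([b[0], b[1]])) }
    fn read_u24(&mut self) -> Option<usize> {
        self.take(3).map(|b| ((b[0] as usize) << 16) | ((b[1] as usize) << 8) | (b[2] as usize))
    }
    // TLS length-prefixed vectors with a 1- or 2-byte length.
    fn vec8(&mut self) -> Option<&'a [u8]> { let n = self.read_u8()? as usize; self.take(n) }
    fn vec16(&mut self) -> Option<&'a [u8]> { let n = self.read_u16()? as usize; self.take(n) }
    fn is_empty(&self) -> bool { self.pos >= self.buf.len() }
}

const ALPN_EXTENSION: u16 = 0x0010; // RFC 7301

/// ALPN protocol names offered in a ClientHello record; empty if there is no
/// ALPN extension, None if the bytes are not a well-formed ClientHello.
fn client_hello_alpn(record: &[u8]) -> Option<Vec<String>> {
    let mut rec = Reader::new(record);
    if rec.read_u8()? != 0x16 { return None; } // content type: handshake
    rec.take(2)?; // legacy record version
    let mut hs = Reader::new(rec.vec16()?);
    if hs.read_u8()? != 0x01 { return None; } // handshake type: ClientHello
    let len = hs.read_u24()?;
    let mut ch = Reader::new(hs.take(len)?);
    ch.take(2 + 32)?; // legacy_version + random
    ch.vec8()?; // legacy_session_id
    ch.vec16()?; // cipher_suites
    ch.vec8()?; // legacy_compression_methods
    let mut protocols = Vec::new();
    if ch.is_empty() { return Some(protocols); } // no extensions block
    let mut exts = Reader::new(ch.vec16()?);
    while !exts.is_empty() {
        let ext_type = exts.read_u16()?;
        let data = exts.vec16()?;
        if ext_type == ALPN_EXTENSION {
            // ProtocolNameList: 2-byte total length, then 1-byte-prefixed names.
            let mut names = Reader::new(Reader::new(data).vec16()?);
            while !names.is_empty() {
                protocols.push(String::from_utf8_lossy(names.vec8()?).into_owned());
            }
        }
    }
    Some(protocols)
}

/// Test fixture: a minimal ClientHello record whose only extension is ALPN.
fn synthetic_client_hello(alpn: &[&str]) -> Vec<u8> {
    let mut names = Vec::new();
    for p in alpn {
        names.push(p.len() as u8);
        names.extend_from_slice(p.as_bytes());
    }
    let mut ext = ALPN_EXTENSION.to_be_bytes().to_vec();
    ext.extend_from_slice(&(names.len() as u16 + 2).to_be_bytes()); // extension_data length
    ext.extend_from_slice(&(names.len() as u16).to_be_bytes()); // ProtocolNameList length
    ext.extend_from_slice(&names);
    let mut ch = vec![0x03, 0x03]; // legacy_version
    ch.extend_from_slice(&[0u8; 32]); // random
    ch.extend_from_slice(&[0x00, 0x00, 0x02, 0x13, 0x01, 0x01, 0x00]); // no session id, 1 suite, null compression
    ch.extend_from_slice(&(ext.len() as u16).to_be_bytes()); // extensions length
    ch.extend_from_slice(&ext);
    let mut hs = vec![0x01]; // ClientHello
    hs.extend_from_slice(&(ch.len() as u32).to_be_bytes()[1..]); // 24-bit length
    hs.extend_from_slice(&ch);
    let mut rec = vec![0x16, 0x03, 0x01]; // handshake record
    rec.extend_from_slice(&(hs.len() as u16).to_be_bytes());
    rec.extend_from_slice(&hs);
    rec
}

/// The regression property: http/1.1 is offered and h2 is not.
fn advertises_only_http1(alpn: &[String]) -> bool {
    alpn.iter().any(|p| p == "http/1.1") && !alpn.iter().any(|p| p == "h2")
}

fn main() {
    let pinned = client_hello_alpn(&synthetic_client_hello(&["http/1.1"])).unwrap();
    let unpinned = client_hello_alpn(&synthetic_client_hello(&["h2", "http/1.1"])).unwrap();
    assert!(advertises_only_http1(&pinned));
    assert!(!advertises_only_http1(&unpinned));
    println!("pinned: {pinned:?}, unpinned: {unpinned:?}");
}
```

Checking the wire bytes rather than the client's configuration means the test fails whenever anything, including feature unification, puts `h2` back into the offer.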
Closing as a duplicate: #536 (af43b3c) already fixed #535 on … #536 shipped the fix without a test, and …
Problem
After the `object_store` observability sink (#531) landed, every DP→dp-manager REST call (`budget_check`, `heartbeat`, `telemetry`) started failing at the transport layer (`error sending request for url (https://dpm:7944/dp/...)`, `latency_ms=1`). `budget_check` fails closed, so chat completions return 429. The kine/etcd config path keeps working, which is why models still resolve. Blocks AISIX-Cloud main regression (#535).
Root cause
`object_store`'s `aws`/`gcp`/`azure` features pull `reqwest` with its `http2` feature. Cargo feature unification is per-package across the workspace, so `reqwest/http2` flips on for the single shared reqwest, including the mTLS client these REST calls use. Confirmed with the resolver (`cargo tree -e features -i reqwest`):

| Build | Resolved reqwest features |
| --- | --- |
| … (881c2e2, green) | `rustls-tls` only, no `http2` |
| … | `http2` + `h2` |

dp-manager serves kine/etcd gRPC and the `/dp/*` REST API on one TLS port, fanned out by cmux: HTTP/2 streams with `content-type: application/grpc` → kine, everything else → an HTTP/1.1-only gin `http.Server`. With http2 on, the REST client negotiates ALPN h2; cmux routes the non-gRPC h2 connection to the gin listener, which can't parse the h2 preface, so the connection dies with a transport error. etcd/kine is unaffected because it speaks real gRPC.
This is a feature flip, not a version bump, so it is invisible in `Cargo.lock`, which is why a bisect shows the TLS stack "unchanged".
Fix
Add `.http1_only()` to both CP-REST mTLS client builders (`heartbeat::build_client`, `telemetry::build_client`; `BudgetClient` reuses heartbeat's). They now advertise only `http/1.1` in ALPN regardless of which reqwest features other deps unify in. The dp-manager REST surface is HTTP/1.1 by design; only kine/etcd is h2, and that goes through the separate `etcd-client`.
Tests
- `cp_rest_client_advertises_only_http1_alpn` (heartbeat + telemetry): captures the raw TLS ClientHello and asserts ALPN advertises `http/1.1` and not `h2`. Verified RED without the fix (reproduces #535), GREEN with it.
- `build_client` tests + full `aisix-server` suite (52 tests) pass; `clippy -D warnings` clean.
Follow-up (separate, not blocking)
Defensive hardening on the CP side — make dp-manager's cmux REST branch h2-capable so a future h2 REST client can't silently break this way again. Recommend a separate AISIX-Cloud issue.
Test plan
- `cargo test -p aisix-server` (52 pass)
- `cargo clippy -p aisix-server --tests` (clean)
- … `:dev` image (AISIX-Cloud)
Fixes #535