[feature] Native EXPath HTTP Client (java.net.http + Methanol); remove Apache HttpClient#6482

Open
joewiz wants to merge 8 commits into
eXist-db:developfrom
joewiz:feature/native-expath-http-client

Conversation

@joewiz

@joewiz joewiz commented Jun 15, 2026

Member

[This PR was co-authored with Claude Code. -Joe]

Summary

Replaces eXist's EXPath HTTP Client (http:send-request) with a native implementation built on the JDK java.net.http.HttpClient, augmented by Methanol (thanks to @dizzzz for the suggestion to investigate Methanol), and removes Apache HttpClient from the code base entirely — production and tests. Along the way it brings the client closer to the EXPath spec: full authentication (preemptive and challenge-response, Basic and Digest) and spec-compliant error codes.

After this PR, git grep org.apache.httpcomponents over the poms and import org.apache.http / import org.apache.hc over the sources both return nothing.

This is the rewrite requested in #5771, and it also fixes #4256 (non-spec-compliant errors). It supersedes #6473 (which migrated to Apache HttpClient 5 as an intermediate step) and #6474 (the draft HC5 EXPath prerelease) — both can be closed once this lands. With the native module now living in core, the eXist-db/exist-http-client prototype repo can also be archived after merge.

Why

eXist's EXPath HTTP client depended on Apache HttpClient (HC4) and the third-party http-client-java EXPath library, and the integration test suite drove the server through HC4's fluent client (plus, for WebDAV, the milton-client library, which itself pulls in Apache HttpClient). Methanol is a thin, Apache-2.0 augmentation of the JDK client: it adds transparent gzip/deflate decoding, a read (inactivity) timeout, MediaType, and MultipartBodyPublisher — the pieces the bare JDK client lacks — while pulling in no Apache HttpClient.

What changed

Native EXPath HTTP Client module (extensions/modules/http-client)

  • New org.exist.xquery.modules.httpclient module implementing http:send-request (namespace http://expath.org/ns/http-client) on java.net.http.HttpClient. The http:request element, namespace, and function signatures are unchanged, so existing queries are unaffected.
  • The client is a Methanol client with autoAcceptEncoding (advertises Accept-Encoding and transparently decodes gzip/deflate) and a read timeout.
  • Multipart request bodies are built with Methanol's MultipartBodyPublisher (the JDK client has no multipart support); each http:body part keeps its own Content-Type.
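
As a rough illustration of the mapping the first bullet describes, the sketch below turns values already pulled out of an http:request element (method, href, headers, an optional text body) into a JDK HttpRequest. The class and method names are hypothetical, XML parsing is omitted, and this is not the module's actual RequestBuilder. It does show one porting detail worth knowing: the JDK builder rejects restricted headers such as Content-Length or Host with an IllegalArgumentException, so a port has to decide what to do with them. Here they are simply skipped.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: maps already-extracted http:request values onto a JDK HttpRequest.
final class RequestMappingSketch {

    static HttpRequest toHttpRequest(String method, String href,
                                     Map<String, String> headers, String textBody) {
        HttpRequest.BodyPublisher body = (textBody == null)
                ? BodyPublishers.noBody()
                : BodyPublishers.ofString(textBody);

        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(href))
                // Assumption: the @method value is normalised to upper case.
                .method(method.toUpperCase(Locale.ROOT), body);

        for (Map.Entry<String, String> h : headers.entrySet()) {
            try {
                builder.header(h.getKey(), h.getValue());
            } catch (IllegalArgumentException rejected) {
                // Restricted (e.g. Content-Length, Host) or invalid header: the JDK
                // client manages these itself, so this sketch skips them.
            }
        }
        return builder.build();
    }
}
```

Skipping is only one option; a stricter port could instead raise an error for headers the JDK client will not send.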

Authentication (per EXPath spec §3.3)

  • @send-authorization='true' sends Basic credentials preemptively.
  • Otherwise (the spec default), credentials are withheld until the server returns a 401 challenge, and the request is re-sent once with the computed Authorization header — Basic or RFC 2617 Digest (MD5, qop=auth). This challenge-response mode was missing from the initial port; it matches the previous eXist client and the BaseX reference implementation.
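
The two modes above can be sketched over an abstract transport, so the control flow is visible without a network. Everything here (AuthFlowSketch, Reply, Transport) is a hypothetical stand-in rather than the module's RequestBuilder/SendRequestFunction, and only the Basic scheme is handled; a Digest answer would compute a different Authorization value at the same point.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two EXPath auth modes, over an abstract transport.
final class AuthFlowSketch {

    /** Minimal view of a response: status code plus the WWW-Authenticate value (or null). */
    record Reply(int status, String wwwAuthenticate) {}

    /** Stand-in for "send this request with these extra headers". */
    interface Transport {
        Reply send(Map<String, String> headers);
    }

    static String basic(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    static Reply send(Transport transport, String user, String password,
                      boolean sendAuthorization) {
        Map<String, String> headers = new HashMap<>();
        if (sendAuthorization) {
            // @send-authorization='true': credentials go out on the first request.
            headers.put("Authorization", basic(user, password));
            return transport.send(headers);
        }
        // Spec default: the first attempt carries no credentials.
        Reply first = transport.send(headers);
        String challenge = first.wwwAuthenticate();
        if (first.status() == 401 && challenge != null
                && challenge.regionMatches(true, 0, "Basic", 0, 5)) {
            // Re-send exactly once, now answering the Basic challenge.
            headers.put("Authorization", basic(user, password));
            return transport.send(headers);
        }
        return first;
    }
}
```

Re-sending at most once keeps a server that rejects the credentials from turning the exchange into a loop.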

Error handling (fixes #4256)

  • Connection / I/O failures surface as expath-err:HC001, timeouts as expath-err:HC006, in the EXPath error namespace (http://expath.org/ns/error) — catchable from XQuery, rather than the raw org.expath.httpclient.HttpClientException the old client raised.
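
A minimal sketch of the mapping, assuming failures arrive as the IOException subtypes java.net.http throws. The class name is hypothetical, and mapping connect timeouts to HC006 rather than HC001 is this sketch's choice, not something the PR states. The ordering matters because HttpTimeoutException is itself an IOException.

```java
import java.io.IOException;
import java.net.http.HttpTimeoutException;

// Hypothetical sketch: choose the EXPath error code for a client-side failure.
final class ErrorCodeSketch {

    /** Namespace the codes belong to (expath-err prefix). */
    static final String EXPATH_ERR_NS = "http://expath.org/ns/error";

    static String errorCodeFor(IOException failure) {
        // HttpTimeoutException (and its subclass HttpConnectTimeoutException) extends
        // IOException, so the timeout check must come before the generic case.
        if (failure instanceof HttpTimeoutException) {
            return "HC006";
        }
        // Refused connections, unknown hosts, resets and other I/O failures.
        return "HC001";
    }
}
```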

Old EXPath client removed

  • Deletes the old org.expath.exist HTTP client (SendRequestFunction, HttpClientModule, the EXistResult/EXistTreeBuilder adapters, and the orphaned org.expath.tools model classes). The exist-expath module keeps only its Zip functions. Drops http-client-java, tools-java, apache-mime4j-core and the HC4 httpcore dependency.
  • debuggee's HttpSession (the XDEBUG_SESSION form POST) is migrated to the JDK client.
  • Every conf.xml re-points http://expath.org/ns/http-client to the native module.

Tests migrated off Apache HttpClient

  • AbstractHttpTest now builds on java.net.http.HttpClient; the integration tests across exist-core, restxq, file and persistentlogin use it. GetParameterTest (multipart) uses Methanol's MultipartBodyPublisher.
  • The WebDAV tests were the last test code pulling in Apache HttpClient — they drove the server through milton-client. They are replaced with JDK java.net.http round-trip tests (WebDavRoundTripTest), and the now-unused Milton stack is removed (the WebDAV server already runs on Apache Jackrabbit). The standalone webapp web.xml is corrected to map the WebDAV path to the Jackrabbit ExistWebdavServlet (develop still referenced the long-deleted MiltonWebDAVServlet).

Build

  • exist-distribution bundles the new exist-http-client module (runtime dependency) so the repointed conf.xml resolves the module class at runtime.
  • exist-parent BOM: drop the Apache HttpClient (HC4) and Milton managed dependencies and their version properties; add Methanol.

Spec references

  • EXPath HTTP Client Module 1.0: http:send-request, the http:request element, the response format, authentication (§3.3), and error codes (HC001/HC006).

Test plan

  • New module: 72 http:send-request integration tests (in-process com.sun.net.httpserver target + embedded eXist) and 43 ContentTypeHelper unit tests, all green. Coverage includes HTTP methods, request/response bodies, multipart, gzip/deflate decoding, Content-Type classification (XML/HTML/text/JSON/binary), redirects, timeouts (HC006), status codes, multiple response headers, authentication (Basic preemptive, plus Basic and Digest challenge-response, with the test server validating the Digest response hash end-to-end), and connection errors catchable as expath-err:HC001 (#4256).
  • Migrated integration tests run green across exist-core, restxq, file and persistentlogin.
  • WebDAV: WebDavRoundTripTest green — exact round-trip of XML DOCTYPE, XML declaration, CDATA, namespaces, non-ASCII content, and a binary document, over the JDK client against the Jackrabbit servlet.
  • Full reactor build (clean install) green; full unit suite green; container CI (incl. litmus WebDAV compliance) exercised.

Notes for reviewers

  • The only remaining Codacy/PMD warning is NPathComplexity on the test class SendRequestFunctionTest.startHttpServer() (the in-process test server's endpoint registration). It is below where it started before this PR's auth additions; per review it can be ignored for now.
  • The Milton/WebDAV removal is a consequence of removing all Apache HttpClient: milton-client is an Apache HttpClient consumer, so retiring those tests is required. Their WebDAV protocol coverage is provided by the litmus compliance suite in container CI (98/98); WebDavRoundTripTest covers the complementary eXist content-fidelity angle.
  • Dropping http-client-java removes eXist's dependency on the upstream EXPath HTTP client, so merging this removes the eXist-related motivation from expath/expath-http-client-java#59, "Cut a release from main (Apache HttpComponents 5)" (that issue can remain open on its own merits).

Closes #5771
Closes #4256

Rewrites the HTTP-client side of the integration test suite onto the JDK
java.net.http.HttpClient, removing Apache HttpClient from the test scope (the
WebDAV tests are migrated separately in the following commit).

- AbstractHttpTest: the shared test HTTP infrastructure now builds on
  java.net.http.HttpClient, exposing newHttpClient(), authenticatedRequest(),
  basicAuthorizationHeader(), executeForStatus(), executeForStatusAndBody(),
  assertRequestResponse() and the HttpResponseResult record (no Apache HC).
- Integration tests across exist-core, restxq, file and persistentlogin migrated
  to those helpers; behavior and assertions unchanged. Requests that need
  preemptive HTTP Basic auth set the Authorization header explicitly.
- GetParameterTest sends a multipart/form-data body, so it uses Methanol's
  MultipartBodyPublisher (Methanol added to the BOM and as an exist-core test
  dependency; exist-core also now publishes its test-jar for the new module).
- Dropped the now-unused Apache HttpClient test dependencies from exist-core,
  restxq, file and persistentlogin.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
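
For readers who have not seen the new test base class, here is a guess at the shape of three of the helpers named above. The names come from the commit message, but the signatures and bodies are assumptions, not copies of AbstractHttpTest.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical shapes for helpers named in the commit message; real signatures may differ.
final class TestHelpersSketch {

    /** Status and body of a completed exchange, in the spirit of HttpResponseResult. */
    record HttpResponseResult(int status, String body) {}

    static String basicAuthorizationHeader(String user, String password) {
        byte[] token = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(token);
    }

    /** A request builder that carries preemptive Basic credentials explicitly. */
    static HttpRequest.Builder authenticatedRequest(URI uri, String user, String password) {
        return HttpRequest.newBuilder(uri)
                .header("Authorization", basicAuthorizationHeader(user, password));
    }

    static HttpResponseResult executeForStatusAndBody(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return new HttpResponseResult(response.statusCode(), response.body());
    }
}
```

Setting the Authorization header directly is what "set the Authorization header explicitly" amounts to: the JDK client has no built-in preemptive mode, and its java.net.Authenticator only answers challenges.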
joewiz and others added 2 commits June 15, 2026 04:28
…ip tests

The WebDAV integration tests were the last test code pulling in Apache HttpClient:
they drove the server through the milton-client WebDAV library, which depends on
Apache HttpClient. Replace them with JDK java.net.http round-trip tests and finish
removing the now-unused Milton stack (the WebDAV server itself already runs on
Apache Jackrabbit since the Jackrabbit migration).

Why delete rather than port the old tests: their WebDAV *protocol* coverage (COPY,
MOVE, LOCK/UNLOCK, PROPFIND, DELETE) is already provided, far more thoroughly, by the
litmus compliance suite that runs in container CI (exist-docker/src/test/bats/
04-webdav-litmus.bats -- 98/98: basic 16, copymove 13, props 33, locks 36). The old
milton-client JUnit tests were both redundant with litmus and coupled to the milton
client library this PR removes. WebDavRoundTripTest is intentionally narrower and
complementary: it covers the eXist-specific concern litmus does not -- that content
stored over WebDAV round-trips faithfully through eXist's storage and serialization.

- Delete the milton-client-based tests (Copy/Delete/Lock/Rename/Replace/Serialization/
  StoreAndRetrieve/CData) and the com.ettrema cache + AlwaysBasicPreAuth test helpers.
- Add WebDavRoundTripTest with a small WebDavHttpClient helper (JDK java.net.http,
  PUT/GET/DELETE against /webdav/db). It asserts exact round-trip of an XML DOCTYPE,
  an XML declaration, a CDATA section, namespace declarations, non-ASCII content, and
  a binary document.
- standalone webapp web.xml: map the WebDAV path to the Jackrabbit ExistWebdavServlet
  instead of the long-deleted MiltonWebDAVServlet (a dangling reference on develop that
  the round-trip test now depends on being correct).
- Drop the milton-client (and its Apache httpclient/httpcore) dependencies from the
  webdav pom, the com.bradmcevoy log4j2 logger, and the ettrema dependabot group.
- BUILD.md: document running the WebDAV round-trip tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
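
A sketch of the request side of such a round-trip helper, built only on java.net.http. The class name, constructor and the localhost URL in the test are hypothetical placeholders; the PR's WebDavHttpClient may look different. Sending raw bytes with BodyPublishers.ofByteArray is what lets a test control the XML declaration, DOCTYPE and encoding exactly.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Hypothetical request builders in the spirit of WebDavHttpClient; not the PR's helper.
final class WebDavRequestsSketch {

    private final URI collection;       // must end with "/" so resolve() appends the name
    private final String authorization; // value for the Authorization header

    WebDavRequestsSketch(URI collection, String authorization) {
        this.collection = collection;
        this.authorization = authorization;
    }

    /** PUT the exact bytes, so the test controls declaration, DOCTYPE and encoding. */
    HttpRequest put(String name, byte[] content, String contentType) {
        return HttpRequest.newBuilder(collection.resolve(name))
                .header("Authorization", authorization)
                .header("Content-Type", contentType)
                .PUT(BodyPublishers.ofByteArray(content))
                .build();
    }

    HttpRequest get(String name) {
        return HttpRequest.newBuilder(collection.resolve(name))
                .header("Authorization", authorization)
                .GET()
                .build();
    }

    HttpRequest delete(String name) {
        return HttpRequest.newBuilder(collection.resolve(name))
                .header("Authorization", authorization)
                .DELETE()
                .build();
    }
}
```

A round-trip assertion would then send the GET with HttpResponse.BodyHandlers.ofByteArray() and compare with java.util.Arrays.equals against the bytes that were PUT.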
Adds extensions/modules/http-client: a native implementation of the EXPath HTTP
Client (http:send-request, namespace http://expath.org/ns/http-client) built on the
JDK java.net.http.HttpClient, augmented by Methanol. No Apache HttpClient and no
third-party EXPath HTTP library.

- The client is a Methanol client with autoAcceptEncoding (advertises Accept-Encoding
  and transparently decodes gzip/deflate) and a read (inactivity) timeout.
- Multipart request bodies are built with Methanol's MultipartBodyPublisher (the JDK
  client has no multipart support); each http:body part keeps its own Content-Type.
- ResponseHandler relies on the client for transfer decoding rather than hand-rolling
  gzip handling.

Includes a self-contained integration test (in-process com.sun.net.httpserver target
plus embedded eXist): 70 send-request tests and 43 content-type unit tests, all green.
This is the production home of the work prototyped at eXist-db/exist-http-client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/native-expath-http-client branch from d6b5b3f to 0496911 Compare June 15, 2026 08:29
@joewiz

joewiz commented Jun 15, 2026

Member Author

[This response was co-authored with Claude Code. -Joe]

A note on the WebDAV test change in this PR, since it may raise eyebrows in review:

The deleted CopyTest / DeleteTest / LockTest / RenameTest / ReplaceTest / SerializationTest / StoreAndRetrieveTest were built on the milton-client library, which pulls in Apache HttpClient — the very dependency this PR removes. So they could not simply be "ported to the new client": there is no drop-in JDK equivalent for milton-client's object model.

Their WebDAV protocol coverage (COPY, MOVE, LOCK/UNLOCK, PROPFIND, DELETE) is, however, already provided — and far more thoroughly — by the litmus compliance suite that runs in container CI (exist-docker/src/test/bats/04-webdav-litmus.bats), currently at 98/98: basic 16, copymove 13, props 33, locks 36. The milton JUnit tests were redundant with it.

WebDavRoundTripTest is intentionally narrower and complementary: it covers the eXist-specific concern litmus does not — that content stored over WebDAV round-trips faithfully through eXist's own storage and serialization. It asserts exact round-trip of an XML DOCTYPE, an XML declaration, a CDATA section, namespace declarations, non-ASCII content, and a binary document, via a small JDK java.net.http helper (WebDavHttpClient).

@duncdrum

Contributor

@joewiz the repeated failures in the docker job are a concern. eXist has errors in the logs upon a clean boot.

not ok 5 logs are error free
# (in test file exist-docker/src/test/bats/01-connect-spec.bats, line 26)
#   `[ "$result" -eq 0 ]' failed

@line-o line-o left a comment

Member

I bet we have these constants in some package we already depend on.

Comment thread exist-core/src/test/java/org/exist/http/urlrewrite/ControllerTest.java Outdated
@dizzzz dizzzz requested review from a team, dizzzz and duncdrum June 15, 2026 11:25
joewiz and others added 2 commits June 15, 2026 08:20
…p http-client-java

Switches the http://expath.org/ns/http-client registration in every conf.xml to the
new native module (org.exist.xquery.modules.httpclient.HttpClientModule) and removes
the old implementation, which depended on Apache HttpClient and the third-party EXPath
http-client-java library.

- extensions/expath: delete the old SendRequestFunction, HttpClientModule, EXistResult,
  EXistTreeBuilder and the now-orphaned org.expath.tools adapters; the module keeps only
  its Zip functions. Drop the http-client-java, tools-java, apache-mime4j-core and HC4
  httpcore dependencies (and the now-unused junit).
- exist-distribution: add the new exist-http-client module as a runtime dependency so it
  ships in the assembled distribution (and container). The old HttpClientModule shipped
  inside exist-expath; without this, the repointed conf.xml registration fails to load
  the class on a clean boot (ClassNotFoundException).
- restxq: its XQSuite tests resolve http:send-request from the EXPath HTTP client, so
  swap the exist-expath test dependency for the new exist-http-client module (the client
  no longer lives in exist-expath).
- debuggee: migrate HttpSession's XDEBUG_SESSION form POST from Apache HttpClient to the
  JDK java.net.http.HttpClient; drop the Apache HttpClient dependency from its pom.

No production code uses Apache HttpClient after this change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e build

With the EXPath HTTP client now native (java.net.http + Methanol), the integration
tests on the JDK client, debuggee migrated, and the milton-client WebDAV tests
replaced, nothing in the code base uses Apache HttpClient. Remove it from the build:

- exist-parent BOM: drop the httpcore/httpclient/httpmime/fluent-hc (Apache HC4)
  managed dependencies and the apache.httpcomponents.* version properties; drop the
  now-unused milton-api/milton-client/milton-servlet managed dependencies and their
  version properties (Methanol stays).
- exist-installer: drop the removed version properties from the IzPack includeProperties.

`git grep org.apache.httpcomponents` over the poms and `import org.apache.hc` /
`org.apache.http` over the sources now return nothing. Full reactor build is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/native-expath-http-client branch from 0496911 to 8217ed6 Compare June 15, 2026 12:21
Addresses review feedback: replace the remaining magic HTTP status integers in the
migrated integration tests with java.net.HttpURLConnection.HTTP_* constants, matching
the rest of the migrated suite (most of which already uses them; removing Apache
HttpClient's HttpStatus.SC_* had left a few call sites on raw ints).

- ControllerTest: 200 -> HTTP_OK, 201 -> HTTP_CREATED
- JmxRemoteTest: 200 -> HTTP_OK
- WebDavRoundTripTest: 201 -> HTTP_CREATED, 200 -> HTTP_OK

Scoped to the tests this PR already touches; a wider sweep of status-code literals
across the rest of the codebase is left as a separate change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
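
In JUnit terms the change is assertEquals(HTTP_OK, status) in place of assertEquals(200, status). The dependency-free sketch below shows the same idea; its class and method names are illustrative only.

```java
import java.net.HttpURLConnection;

// Illustrative only: named constants instead of bare integers in status assertions.
final class StatusConstantsSketch {

    static void assertStatus(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected HTTP " + expected + " but got " + actual);
        }
    }

    static void example(int putStatus, int getStatus) {
        assertStatus(HttpURLConnection.HTTP_CREATED, putStatus); // was: 201
        assertStatus(HttpURLConnection.HTTP_OK, getStatus);      // was: 200
    }
}
```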
@joewiz

joewiz commented Jun 15, 2026

Member Author

[This response was co-authored with Claude Code. -Joe]

Thanks for catching it — root-caused and fixed. The clean-boot error was:

Configuration problem: class not found for module 'http://expath.org/ns/http-client'
(ClassNotFoundException); class:'org.exist.xquery.modules.httpclient.HttpClientModule'

This was a real packaging bug, not a test-only issue: the old HttpClientModule shipped inside exist-expath; this PR moves the EXPath HTTP client into a new exist-http-client module and repoints conf.xml to it, but the distribution wasn't yet bundling that module — so the class wasn't on the runtime classpath and boot logged the error (in a real install, http:send-request would have failed to load too).

Fixed by adding exist-http-client as a runtime dependency of exist-distribution (00b194c), mirroring how exist-file/exist-expath are bundled; confirmed with dependency:tree that it now resolves as a runtime artifact. The container job should be back to error-free on this run.
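
This class of failure can also be checked without a full boot, for example from a small test run against the assembled distribution's classpath. The probe below is a hypothetical helper, not part of eXist. It asks a class loader whether the class named in conf.xml resolves, without initialising it.

```java
// Hypothetical probe, not part of eXist: is a module class named in conf.xml loadable?
final class ModuleClassProbe {

    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize = false: resolve the class without running static initialisers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Run outside the distribution, isLoadable("org.exist.xquery.modules.httpclient.HttpClientModule", ...) returns false, which is the same condition the clean-boot log reported.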

…hentication

The EXPath HTTP Client spec (3.3) defines @send-authorization=false (the default) as:
send the request without credentials, and only if the server answers with a 401
challenge, resend with the credentials. The native module previously implemented
preemptive auth only — it required @send-authorization='true' and otherwise ignored
the credentials — so the spec's default auth mode did not work. That is a regression
versus both eXist's previous EXPath client and the BaseX reference implementation,
which both perform challenge-response.

- RequestBuilder: withhold credentials unless @send-authorization is true; add
  challengeResponse() to answer a 401 WWW-Authenticate challenge, computing a Basic or
  RFC 2617 Digest (MD5, qop=auth) Authorization header from the request credentials.
  build() now accepts an explicit Authorization header for the re-send. The body and
  auth assembly is split into helpers to keep NPath in check.
- SendRequestFunction: on a 401 with credentials present and not yet sent, re-send once
  with the computed Authorization header.
- Tests: add basicChallengeResponse and digestChallengeResponse — the test server issues
  the challenge and validates the Digest response hash end-to-end (72 module tests pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
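
For reference, the RFC 2617 Digest computation for algorithm=MD5 and qop=auth is three MD5 hashes chained together. The sketch below is a hypothetical stand-alone version, not the module's challengeResponse(). It leaves out parsing the WWW-Authenticate parameters, generating the cnonce, and echoing an opaque value back, all of which a real client also needs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch of the RFC 2617 Digest "response" value for algorithm=MD5, qop=auth.
final class DigestSketch {

    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return HexFormat.of().formatHex(md.digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    static String response(String user, String realm, String password,
                           String method, String uri,
                           String nonce, String nc, String cnonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password); // HA1 = MD5(A1)
        String ha2 = md5Hex(method + ":" + uri);                  // HA2 = MD5(A2)
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
    }

    static String authorizationHeader(String user, String realm, String password,
                                      String method, String uri,
                                      String nonce, String nc, String cnonce) {
        String resp = response(user, realm, password, method, uri, nonce, nc, cnonce);
        return "Digest username=\"" + user + "\", realm=\"" + realm + "\", nonce=\"" + nonce
                + "\", uri=\"" + uri + "\", qop=auth, nc=" + nc + ", cnonce=\"" + cnonce
                + "\", response=\"" + resp + "\"";
    }
}
```

With the worked example from RFC 2617 §3.5 (user Mufasa, realm testrealm@host.com, password "Circle Of Life", GET /dir/index.html), response() yields 6629fae49393a05397450978507c4ef1. A test server can validate this value the same way to check the hash end-to-end.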
@line-o line-o self-requested a review June 15, 2026 13:30

@line-o line-o left a comment

Member

LGTM!
This will make our lives much easier going forward.
It will be interesting to learn if we get failures in apps making heavy use of http-client.
The remaining Codacy warning is in a test class, which I would vote to ignore for the moment.

@line-o

line-o commented Jun 15, 2026

Member

Does this PR fix #4256?

eXist-db#4256)

Adds connectionErrorIsCatchableAsExpathHC001, mirroring the reproducer in eXist-db#4256: a
send-request to an unreachable host must surface the EXPath error expath-err:HC001
(namespace http://expath.org/ns/error), catchable from XQuery, rather than the raw
org.expath.httpclient.HttpClientException the old Apache-based client raised. The native
module maps connection/IO failures to HC001, so the issue is fixed; this test pins it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joewiz

joewiz commented Jun 15, 2026

Member Author

[This response was co-authored with Claude Code. -Joe]

Yes — it fixes #4256. The old client raised a raw org.expath.httpclient.HttpClientException; the native module maps connection/I/O failures to expath-err:HC001 (and timeouts to expath-err:HC006) in the EXPath error namespace. I added a regression test mirroring the issue's reproducer — a send-request to an unreachable host caught with catch expath-err:HC001 — and added Closes #4256 to the description.

On apps that make heavy use of http-client: the surface is deliberately unchanged — same http:request element, same http://expath.org/ns/http-client namespace, same function signatures — so existing queries don't need edits. The 72 module integration tests exercise the behaviors such apps lean on: methods, request/response bodies, multipart, gzip/deflate decoding, Content-Type classification, redirects, timeouts, status codes, multiple response headers, and authentication (Basic preemptive, plus Basic and Digest challenge-response, validated end-to-end). Where behavior changed, it changed toward the spec: challenge-response auth and the HC001/HC006 error codes both restore conformance the old client lacked. The honest caveat is that the JDK/Methanol client can differ from Apache in edge cases (specific header quirks, proxy/TLS behavior); if any such reports surface we can address them, but the EXPath contract itself is preserved.

Agreed on the remaining Codacy warning — it's the in-process test server's setup method; happy to leave it for now (it's actually lower than before this PR's additions).

Thanks for the review!

@dizzzz dizzzz requested review from a team and reinhapa June 15, 2026 20:08
Comment thread extensions/expath/pom.xml
<artifactId>jsr305</artifactId>
</dependency>

<dependency>

Member

Did not expect junit to disappear here. No tests remaining?

Member Author

[This response was co-authored with Claude Code. -Joe]

Right — HttpClientTest was the only test in exist-expath, and it exercised the old Apache-HttpClient-based EXPath client that this PR removes. With it gone, there are no test classes or test runner left in the module (only the log4j2.xml/conf.xml test resources), so junit is genuinely unused here. The HTTP client's tests now live in the new exist-http-client module.

Those leftover log4j2.xml/conf.xml test resources are now orphaned too — happy to strip them for a fully clean src/test, or leave them in case the module grows tests again. Whichever you'd prefer.

*/
package org.exist.xquery.modules.httpclient;

import com.sun.net.httpserver.HttpServer;

Member

I'd stay out of the com.sun packages. These are not guaranteed to be present.

A better and more sustainable route is to use WireMock.

Member Author

[This response was co-authored with Claude Code. -Joe]

Fair flag — sharing what we found so you and the devs can steer it. com.sun.net.httpserver here is the exported public API of the JDK jdk.httpserver module (documented since JDK 9), not a sun.* internal, so it is present in every standard JDK; it let the tests stand up a tiny in-process target with no extra dependency. There's no existing precedent in eXist's own tests either way — neither com.sun.net.httpserver nor WireMock is used today. WireMock would be the more conventional choice but adds a test dependency and a rewrite of the in-process server. No strong preference from our side; happy to go with whatever you and the team settle on as the house convention for this kind of HTTP test target.

@dizzzz

dizzzz commented Jun 15, 2026

Member

I see that on some locations "preemptive authorization" is applied.

For running the tests this makes sense, as it will potentially save a handshake. But I am not sure whether this is switched on for every outbound HTTP call, as that might leak sensitive data.

@dizzzz

dizzzz commented Jun 15, 2026

Member

About performance/heavy load, as asked by @line-o: I need to deep dive here. If we are able to use the connection pool and have good (default) Executor management, I expect it will work out well. But the proof is in the pudding.

@dizzzz dizzzz left a comment

Member

LGTM! Will spin up IntelliJ for a more in-depth review.

@joewiz

joewiz commented Jun 15, 2026

Member Author

[This response was co-authored with Claude Code. -Joe]

On preemptive authorization: it's opt-in per request, not applied to every outbound call. Credentials are sent preemptively only when the request sets send-authorization="true" (the EXPath default is false). Without it, no Authorization header goes out until the server replies with a 401 challenge, and then it is sent only to that challenging origin. The places you saw it are the tests deliberately setting send-authorization="true". This follows the EXPath spec's auth model, so there's no blanket credential leak.
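
One way to enforce "only to that challenging origin" when redirects are followed is to compare the origin of the response that issued the 401 (HttpResponse.uri() reports the URI a response was actually received from, which differs from the request URI after a redirect) against the origin the caller asked for. The helper below is a hypothetical sketch that assumes absolute http/https URIs. The module may enforce this differently, for example by not following redirects on the retry.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: may a 401 challenge from `challenging` be answered with credentials?
final class OriginCheckSketch {

    private static int effectivePort(URI u) {
        if (u.getPort() != -1) {
            return u.getPort();
        }
        // Assumes http/https only.
        return "https".equalsIgnoreCase(u.getScheme()) ? 443 : 80;
    }

    /** Same scheme, host (case-insensitive) and effective port. */
    static boolean sameOrigin(URI requested, URI challenging) {
        return requested.getScheme().equalsIgnoreCase(challenging.getScheme())
                && requested.getHost().toLowerCase(Locale.ROOT)
                        .equals(challenging.getHost().toLowerCase(Locale.ROOT))
                && effectivePort(requested) == effectivePort(challenging);
    }
}
```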

On performance / connection pooling and Executor management: good idea, and it looks readily achievable — the per-request client construction is the single seam where a shared, pooled client (with sensible Executor defaults, proxy, etc.) would slot in. Probably cleanest as a follow-on, but we're happy to fold it into this PR if you'd prefer it here. And agreed — the proof is in the pudding; glad to benchmark once there's a pooled path to measure.
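
A possible shape for that seam, sketched with the plain JDK builder so it stays dependency-free. The holder class, pool size and timeout are placeholders, not a proposal from the PR. The JDK client is immutable and safe to share, and each instance keeps its own connection pool, so building it once is what makes pooling possible.

```java
import java.net.http.HttpClient;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical shared-client holder; pool size and timeouts are placeholders.
final class SharedHttpClient {

    private SharedHttpClient() {}

    private static final class Holder {
        private static final AtomicInteger COUNTER = new AtomicInteger();

        private static final ExecutorService EXECUTOR = Executors.newFixedThreadPool(8, r -> {
            Thread t = new Thread(r, "http-client-" + COUNTER.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive on shutdown
            return t;
        });

        // Built once, on first use, via the lazy holder idiom.
        static final HttpClient CLIENT = HttpClient.newBuilder()
                .executor(EXECUTOR)
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    static HttpClient get() {
        return Holder.CLIENT;
    }
}
```

One wrinkle for a shared client: in java.net.http the redirect policy is a client-level setting, so honouring the per-request follow-redirect attribute of http:request would mean keeping one client per policy, or following redirects manually. Per-request timeouts are unaffected, since HttpRequest.Builder.timeout is set on each request.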

Development

Successfully merging this pull request may close these issues.

[feature] http-client rewrite using modern java API
EXPath HttpClient errors are not spec compliant

4 participants