[test] Add cross-surface resource-naming conformance ratchet #6450

Open
joewiz wants to merge 3 commits into
eXist-db:develop from
joewiz:feature/naming-conformance-harness

joewiz commented Jun 7, 2026

Member

[This PR was co-authored with Claude Code. -Joe]

Summary

Adds a test that pins how eXist handles "awkward" resource names across API surfaces — the missing coverage for the resource-naming-stability work (issues #3795, #3665, #1824, #5299, #1612). It changes no behavior. It prints the current behavior as a matrix and enforces a ratchet: the set of names that fail to round-trip cross-surface must exactly equal a documented KNOWN_FAILURES allowlist.

The effect: the moment this merges, every name that already works is guarded against regression, and as each naming fix lands you remove an entry from the allowlist to lock the improvement in. The test can neither silently regress (an unlisted name that breaks fails the build) nor silently drift (a listed name that starts working fails the build and tells you to remove it).

There is no cross-surface special-character test in the suite today; this is the first.

What it does

For a corpus of awkward leaf names (space, +, literal %, literal %20, #, @, &, parentheses, apostrophe, café, Cyrillic, CJK), per name it:

  1. stores it via WebDAV (raw HTTP PUT, so the on-the-wire encoding is explicit);
  2. reports what name actually landed in storage, via eXist's native REST collection listing;
  3. reads the content back by the requested name via WebDAV and via REST;
  4. prints the matrix, then asserts the ratchet (failing-set == KNOWN_FAILURES).

Design notes:

  • No XQuery-module dependency — the WebDAV module's test conf.xml registers no builtin-modules, so the harness uses the REST collection listing rather than xmldb:*.
  • No WebDAV-client dependency — the collection is created with a REST PUT (auto-create) and every probe is raw HTTP.
  • Probe content is valid XML because the corpus names end in .xml, which eXist parses on store.
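A "raw HTTP PUT with explicit on-the-wire encoding" can be sketched roughly as follows. This is only an illustration, not the harness's code: it uses java.net.http (Java 11+), and the host, port, and collection path are placeholders. The request is built but never sent, so the sketch runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RawPutSketch {
    // Probe body; well-formed XML because eXist parses *.xml resources on store.
    static final String CONTENT = "<probe>naming-probe-content</probe>";

    /**
     * Builds (but does not send) a PUT whose path is already percent-encoded,
     * so the bytes on the wire are exactly what the caller wrote.
     * Host and port are placeholders for this sketch.
     */
    static HttpRequest putRequest(String encodedPath) {
        // URI.create parses the string as given; it does not re-encode "%20".
        URI uri = URI.create("http://localhost:8080" + encodedPath);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/xml")
                .PUT(HttpRequest.BodyPublishers.ofString(CONTENT, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // "/webdav/db/naming-probe" is a hypothetical collection path.
        HttpRequest req = putRequest("/webdav/db/naming-probe/with%20space.xml");
        System.out.println(req.method() + " " + req.uri().getRawPath());
    }
}
```

Because the path is supplied pre-encoded, the choice of encoding is visible in the test source rather than hidden inside a client library.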

What the matrix shows today

probe            requested          stored-as (oracle)        dav-read  rest-read
ascii-control    plain.xml          plain.xml                 PASS      PASS
space            with space.xml     with%20space.xml (≠)      PASS      PASS
plus             a+b.xml            a%2Bb.xml (≠)             PASS      FAIL
literal-percent  a%b.xml            a%25b.xml (≠)             PASS      PASS
encoded-space    a%20b.xml          a%2520b.xml (≠)           PASS      PASS
hash             a#b.xml            a%23b.xml (≠)             PASS      PASS
at               a@b.xml            a%40b.xml (≠)             PASS      FAIL
ampersand        a&b.xml            a%26b.xml (≠)             PASS      FAIL
parens           report(2024).xml   report%282024%29.xml (≠)  PASS      FAIL
apostrophe       o'brien.xml        o%27brien.xml (≠)         PASS      FAIL
non-ascii        café.xml           caf%C3%A9.xml (≠)         PASS      PASS
cyrillic         Привет.xml         %D0%9F…%82.xml (≠)        PASS      PASS
cjk              文書.xml           %E6%96%87%E6%9B%B8.xml (≠) PASS      PASS

So today: every name with a non-unreserved character is stored percent-encoded; WebDAV self-reads always round-trip; reading by the requested name via REST fails specifically for sub-delim/reserved characters (+, @, &, (/), '). Those five are the current KNOWN_FAILURES allowlist.

What devs see in CI

  • Normal (green): the matrix prints into the build log; Tests run: 1, Failures: 0.
  • A naming fix lands (a listed name now round-trips) → build fails with: "N name(s) now round-trip cross-surface but are still listed as known failures. Remove them from KNOWN_FAILURES so they become regression-guarded: …" + the matrix. You delete those lines from the allowlist and they become guarded.
  • A regression (an unlisted name breaks) → build fails with: "N name(s) regressed — expected to round-trip cross-surface but failed: …" + the matrix.
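The two failure directions boil down to a pair of set differences. Here is a minimal sketch of that check, assuming probes are identified by their labels; the class and method names are hypothetical and the messages are shortened versions of the ones quoted above.

```java
import java.util.Set;
import java.util.TreeSet;

class RatchetSketch {
    /**
     * Returns null when the observed failures equal the allowlist exactly,
     * otherwise a message describing the direction of the mismatch.
     */
    static String check(Set<String> failing, Set<String> knownFailures) {
        // Failed, but not allowlisted: a regression.
        Set<String> regressed = new TreeSet<>(failing);
        regressed.removeAll(knownFailures);

        // Allowlisted, but no longer failing: an improvement to lock in.
        Set<String> nowFixed = new TreeSet<>(knownFailures);
        nowFixed.removeAll(failing);

        if (!regressed.isEmpty()) {
            return regressed.size() + " name(s) regressed: " + regressed;
        }
        if (!nowFixed.isEmpty()) {
            return nowFixed.size()
                    + " name(s) now round-trip; remove them from KNOWN_FAILURES: " + nowFixed;
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("plus", "at", "ampersand", "parens", "apostrophe");
        // A listed name ("plus") has started working, so the check complains.
        System.out.println(check(Set.of("at", "ampersand", "parens", "apostrophe"), known));
    }
}
```

Requiring equality rather than "failing is a subset of known" is what makes the allowlist shrink over time instead of going stale.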

Scope / follow-ups (noted in the class Javadoc)

  • XML-RPC surface (create/read against /xmlrpc).
  • The full N×N cross-surface matrix (create-via-X then read-via-every-Y).
  • existdb-openapi's special-character suite lives in that repo (a separate PR).
  • This first increment treats "stored-as ≠ requested" as informational (the deeper, format-level concern handled later in the tasking); the ratchet enforces round-trip-by-requested-name.

Where it lives

In the extensions/webdav test module — the only place that can drive REST and WebDAV and XML-RPC against one database in one test (module dependency direction means exist-core can't reach the WebDAV extension). A dedicated cross-surface integration-test module would be the alternative if preferred.

Test plan

  • Runs green locally (Tests run: 1, Failures: 0); matrix prints to the build log.
  • Ratchet verified in both directions — a simulated regression and a simulated now-fixed entry each fail with the correct message + matrix.
  • Codacy/PMD clean on the new file.
  • No production code changed; no other test affected (the test log4j2 config used during diagnosis was reverted).

Related

First increment of the naming conformance harness (resource-naming tasking, PR A).
Boots ExistWebServer (REST + WebDAV/Jackrabbit + XML-RPC on one port), stores a corpus
of "awkward" resource names through WebDAV (raw HTTP PUT, explicit on-the-wire
encoding), reports what name actually landed in storage (via eXist's native REST
collection listing — the WebDAV test conf.xml registers no XQuery modules, so the
harness depends on none), and reads each back by the requested name via WebDAV and REST.

It prints a matrix of current behavior (visible in the CI log) and enforces a RATCHET:
the set of names that fail to round-trip cross-surface must exactly equal the documented
KNOWN_FAILURES allowlist. So merging this immediately guards every already-correct name
against regression; when a naming fix lands and a listed name starts round-tripping, the
test fails and tells you to remove it from the allowlist (locking in the improvement);
and a name that is not listed can never silently regress. Verified both failure
directions ("regressed" and "now round-trips — remove from KNOWN_FAILURES").

The harness is module- and WebDAV-client-independent: the collection is created with a
REST PUT (auto-create), probes are raw HTTP, and probe content is valid XML (corpus
names end in .xml, which eXist parses on store).

Current allowlist (5 names that don't yet round-trip cross-surface): plus, at,
ampersand, parens, apostrophe — i.e. several sub-delim/reserved characters. XML-RPC and
the full N×N cross-surface matrix are TODO follow-ups noted in the class Javadoc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joewiz joewiz requested a review from a team as a code owner June 7, 2026 04:59

Member

Composing the WebDAV URL with the multi-argument URI constructor would make the custom encoding logic obsolete.

Locations such as

final URL url = URI.create(webdavBase() + encodePathSegments(dbPath)).toURL();

could then be replaced with

final URL url = createWebdavUrl(dbPath);

Using this method:

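private URL createWebdavUrl(final String dbPath) throws URISyntaxException, MalformedURLException {
  // The multi-argument constructor quotes illegal path characters itself.
  return new URI("http", null, "localhost", existWebServer.getPort(), "/webdav" + dbPath, null, null).toURL();
}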

Also, I would prefer to use the new HttpClient API introduced with Java 11.

Address review feedback on eXist-db#6450: switch the harness from HttpURLConnection
to java.net.http.HttpClient, and build request URLs with the multi-argument
java.net.URI constructor instead of a custom path-segment encoder.

This changes what goes on the wire for RFC 3986 sub-delimiters: the URI
constructor leaves + @ & ( ) ' literal in the path (as a browser or curl
does), whereas the previous URLEncoder-based helper percent-encoded them.
Under conventional encoding, every corpus name -- including those five plus
the non-ASCII names -- round-trips cross-surface between WebDAV and REST.
The five "failures" the earlier revision recorded were artifacts of
percent-encoding sub-delimiters, not of eXist's storage.

KNOWN_FAILURES is therefore now empty: the ratchet guards the all-green
state and fails if any corpus name ever stops round-tripping cross-surface.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
private static final String MARKER = "naming-probe-content";
private static final String CONTENT = "<probe>" + MARKER + "</probe>";

private static final HttpClient HTTP = HttpClient.newHttpClient();

Member

Create this within the createTestCollection() method, as the client needs to be closed at the end of the test (within an @AfterClass-annotated method).

Address review feedback on eXist-db#6450: create the HttpClient in
createTestCollection() (@BeforeClass) and close it in removeTestCollection()
(@AfterClass), rather than holding it in a static final field. HttpClient is
AutoCloseable as of Java 21, so it is closed once after the test class runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
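The lifecycle described in that commit might look roughly like the sketch below. To keep it free of a JUnit dependency, the annotations are only indicated in comments. The instanceof guard is an addition for this sketch so that it also compiles on Java 11 to 20, where HttpClient has no close() method. It is not taken from the PR.

```java
import java.net.http.HttpClient;

class ClientLifecycleSketch {
    // Not final: assigned in setup, released in teardown.
    private static HttpClient http;

    // In a JUnit 4 test this method would carry @BeforeClass.
    static void createTestCollection() {
        http = HttpClient.newHttpClient();
        // ... create the probe collection here ...
    }

    // In a JUnit 4 test this method would carry @AfterClass.
    static void removeTestCollection() {
        // ... remove the probe collection here ...
        try {
            // HttpClient implements AutoCloseable only from Java 21 onward.
            if (http instanceof AutoCloseable) {
                ((AutoCloseable) http).close();
            }
        } catch (Exception e) {
            throw new IllegalStateException("failed to close HttpClient", e);
        } finally {
            http = null;
        }
    }

    static boolean isOpen() {
        return http != null;
    }

    public static void main(String[] args) {
        createTestCollection();
        System.out.println("open: " + isOpen());
        removeTestCollection();
        System.out.println("open: " + isOpen());
    }
}
```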
joewiz commented Jun 7, 2026

Member Author

[This response was co-authored with Claude Code. -Joe]

Thanks Patrick — adopted both: the harness now uses java.net.http.HttpClient, and request URLs are built with the multi-argument java.net.URI constructor instead of the custom path-segment encoder, via a small endpointUri(endpoint, dbPath) helper. That dropped ~40 lines. And per your follow-up, the client is now created in createTestCollection() (@BeforeClass) and closed in removeTestCollection() (@AfterClass) — HttpClient is AutoCloseable as of Java 21.

Your suggestion turned out to do more than simplify the code — it corrected the test. The URI constructor leaves RFC 3986 sub-delimiters (+ @ & ( ) ') literal in the path, whereas the old URLEncoder-based helper percent-encoded them. With the conventional encoding (what a browser or curl actually sends), every corpus name round-trips cross-surface between WebDAV and REST — including the five the earlier revision had recorded as known failures:

probe         requested          stored-as              dav-read  rest-read
plus          a+b.xml            a+b.xml                PASS      PASS
at            a@b.xml            a@b.xml                PASS      PASS
ampersand     a&b.xml            a&b.xml                PASS      PASS
parens        report(2024).xml   report(2024).xml       PASS      PASS
apostrophe    o'brien.xml        o'brien.xml            PASS      PASS
space         with space.xml     with%20space.xml (≠)   PASS      PASS
café          café.xml           caf%C3%A9.xml (≠)       PASS      PASS
...all 13 PASS/PASS

So those five "failures" were artifacts of the harness percent-encoding sub-delimiters, not eXist mis-storing names. eXist may store a percent-encoded form (the (≠) rows), but both surfaces agree on read, so the content is reachable by the requested name. KNOWN_FAILURES is now empty, and the ratchet guards the all-green state: it fails if any corpus name ever stops round-tripping cross-surface.
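The difference between the two encodings can be reproduced in isolation. In the sketch below, viaUrlEncoder is a stand-in for a URLEncoder-based segment encoder, not the helper that was removed from the PR, and the host and port are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

class EncodingComparisonSketch {
    /** Leaf name as it appears in the raw path built by the multi-argument URI constructor. */
    static String viaUriConstructor(String name) {
        try {
            URI uri = new URI("http", null, "localhost", 8080, "/" + name, null, null);
            return uri.getRawPath().substring(1); // drop the leading slash
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Stand-in for a URLEncoder-based encoder (an assumption of this sketch). */
    static String viaUrlEncoder(String name) {
        // URLEncoder implements form encoding, where a space becomes "+";
        // map that back to "%20" so the result is usable in a path.
        return URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        List<String> names = List.of("a+b.xml", "a@b.xml", "a&b.xml",
                "report(2024).xml", "o'brien.xml", "with space.xml");
        for (String name : names) {
            System.out.printf("%-18s %-22s %s%n",
                    name, viaUriConstructor(name), viaUrlEncoder(name));
        }
    }
}
```

For the five sub-delimiter names the two columns differ, while for the space they agree on %20. That matches the observation that only those five were affected by the switch.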

(One genuinely interesting residual the percent-encoded run hinted at: since + has no delimiting role inside a path segment, a%2Bb and a+b arguably identify the same resource (RFC 3986 §2.2), even though the RFC does not count the two spellings as equivalent URIs. eXist's REST read didn't treat them as the same name. That's a much narrower question than "names don't round-trip" and I've left it out of this test; worth a separate look as part of the naming work.)
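The residual can be seen with java.net.URI alone, without involving eXist. The sketch below is a small illustration in which the /rest/db prefix, host, and port are placeholders: the two spellings decode to the same path, yet URI.equals compares raw forms and reports them as different.

```java
import java.net.URI;

class PlusEquivalenceSketch {
    /** True when both URIs yield the same path once percent-escapes are decoded. */
    static boolean samePathAfterDecoding(URI a, URI b) {
        return a.getPath().equals(b.getPath());
    }

    public static void main(String[] args) {
        URI literal = URI.create("http://localhost:8080/rest/db/a+b.xml");
        URI encoded = URI.create("http://localhost:8080/rest/db/a%2Bb.xml");

        System.out.println("raw paths:     " + literal.getRawPath() + " | " + encoded.getRawPath());
        System.out.println("decoded paths: " + literal.getPath() + " | " + encoded.getPath());
        System.out.println("same after decoding: " + samePathAfterDecoding(literal, encoded));
        System.out.println("URI.equals:          " + literal.equals(encoded));
    }
}
```

A server that looks resources up by the raw path will treat these as two names. One that decodes first will treat them as one.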

Pushed as 9ae2d8a (refactor + the empty allowlist) and 5851167 (HttpClient lifecycle).


2 participants