defer CBOR/string pre-encoding in SoftReferencedFieldMap until needed#2461
Open
thjaeckle wants to merge 1 commit into
Open
defer CBOR/string pre-encoding in SoftReferencedFieldMap until needed#2461thjaeckle wants to merge 1 commit into
thjaeckle wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes ditto-json’s ImmutableJsonObject construction path by deferring CBOR/string pre-encoding inside SoftReferencedFieldMap until a serialized form is actually needed, reducing CPU cost for short-lived/intermediate objects.
Changes:
- Make
SoftReferencedFieldMaphold fields strongly until a recoverable serialized representation (string/CBOR) is materialized, then soften the fields reference. - Lazily materialize string/CBOR representations on first
toString()/writeValue()/getUpperBoundForStringSize()demand. - Add targeted regression tests and a JMH micro-benchmark; wire JMH annotation processing in
json/pom.xml.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| json/src/main/java/org/eclipse/ditto/json/ImmutableJsonObject.java | Refactors SoftReferencedFieldMap to defer encoding and only soften fields after representations exist. |
| json/src/test/java/org/eclipse/ditto/json/ImmutableJsonObjectTest.java | Replaces the prior soft-reference strategy test with more specific tests for lazy materialization + softening. |
| json/src/test/java/org/eclipse/ditto/json/ImmutableJsonObjectBenchmark.java | Adds a JMH benchmark covering build/serialize/access scenarios for regression/perf validation. |
| json/pom.xml | Adds JMH dependencies and configures the JMH annotation processor for test compilation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
thjaeckle
added a commit
to beyonnex-io/ditto
that referenced
this pull request
May 26, 2026
Addresses Copilot review feedback on eclipse-ditto#2461: - recoverFields() now wraps its result in Collections.unmodifiableMap so callers that get the field map via getIterator() / values() cannot mutate the recovered map and break this object's immutability after a SoftReference clearance. This was a latent issue in master as well; fix here so the new lazy-recovery path is hardened. - fieldsRef, jsonObjectStringRepresentation and cborObjectRepresentation are now volatile. The invariant "SoftReference implies at least one representation is non-null" depends on a happens-before between the representation write and the fieldsRef softening; without memory barriers a concurrent reader could observe the SoftReference while still seeing both representations as null, tripping IllegalStateException("Fatal cache miss") if the soft ref is then cleared. volatile gives us the publication semantics we need at negligible cost (see JMH below). JMH (single fork, 3 warmup + 5x2s, Temurin 25) before/after this commit: buildSmallNoSerialize 120 -> 119 ns/op (~0%) buildLargeNoSerialize 1360 -> 1291 ns/op (~5%, noise) buildSmallAccessFields 785 -> 797 ns/op (~2% slower, noise) buildSmallThenToString 409 -> 402 ns/op (~2% faster, noise) buildLargeThenToString 3678 -> 3648 ns/op (~1% faster, noise) All deltas are within measurement noise (errors 4-12% of scores). The original PR's speedups vs master (2.87x / 2.64x / 1.47x) are preserved. All 920 ditto-json tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every ImmutableJsonObject construction eagerly serialised its field map to CBOR (or to a JSON string when CBOR is unavailable) so the SoftReference could be safely cleared. JFR on prod things-service (3.9.1) showed this accounting for ~6-10% of CPU on the build path. Defer encoding to the first call that actually needs it (toString, writeValue, upperBoundForStringSize) and only soften the field-map reference once a recoverable representation exists. The fieldsRef and representation fields are volatile so concurrent readers cannot observe a SoftReference before its backing representation, and the equals fallback compares both the field set and the rendered JSON string form so that JsonValues which encode identically but are class-incompatible (e.g. NullAttributes vs JsonValue.nullLiteral()) continue to compare equal as they did when CBOR encoding was eager. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ae5dce9 to
2f2395a
Compare
Member
Author
|
System tests passed: https://github.com/eclipse-ditto/ditto/actions/runs/26495875590 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
JFR profiling of production
things-servicepods on Ditto 3.9.1 showed the constructor chainaccounting for ~6–10 % of CPU (combining
SoftReferencedFieldMap.<init>,createCborRepresentationand downstreamJacksonSerializationContextallocation/flush samples). EveryImmutableJsonObjectconstruction eagerly serialised the field map to CBOR (or to a string fallback when CBOR is unavailable) so the cached bytes could later survive aSoftReferenceclearance. Intermediate JsonObjects produced by transformation/projection pipelines pay this cost and never benefit, because the cached bytes are discarded along with the object.This PR defers encoding to the first call that actually needs it (
toString/writeValue/upperBoundForStringSize) and only softens the field-map reference once a recoverable representation exists. While no serialised form is cached, the field map is held strongly to preserve the "recover-after-soft-clear" invariant.Changes (all internal to
SoftReferencedFieldMap)SoftReference<Map>field with anObject fieldsRefholding either aMap<String, JsonField>(strong) or aSoftReference<Map<String, JsonField>>once a representation has been materialised.fields()unwraps both cases.stringRepresentationorcborObjectRepresentationis already provided. No more eagercreateCborRepresentation/createStringRepresentationin the no-representation branch.asJsonObjectString()andwriteValue()materialise their representation lazily, then call a newsoftenFieldsRef()helper to swap the strong map for aSoftReference, restoring the previous GC behaviour from that point on.upperBoundForStringSize()triggersasJsonObjectString()when neither representation exists. Size-validation callers (Modify commands, CBOR sizing) still get a real bound; those callers are no faster than before, but every other build path that never calls a size or serialise method gains the saving.Measured impact
JMH
AverageTime, single fork, 3 warmup + 5 measurement iterations × 2 s, Temurin 25. Lower is better. The benchmarks run with no CBOR factory on the test classpath, so the eager-string fallback path is what's being eliminated; the production CBOR-encoded path eliminates a heavier code path with the same shape.buildSmallNoSerializebuildLargeNoSerialize(50 fld)buildSmallAccessFieldsbuildSmallThenToStringbuildLargeThenToStringThe build-then-toString rows simply shift the encoding work from build time to the first
toString()call; total work is unchanged. In production where many JsonObjects are intermediate (projections, JSON-patch transforms, partial-update assembly) and never reachtoString, this shift is the net saving.Compatibility
No public API change.
SoftReferencedFieldMapis package-private and theImmutableJsonObjectcontract is preserved — equality, hashCode, soft-reference reclamation andrecoverFields()all continue to work as before. The eager-constructionIOExceptioncatch is no longer reachable because CBOR encoding now happens on a method that already declaresthrows IOException.All 920
ditto-jsontests pass, including new targeted tests:buildingDoesNotEagerlyEncodeAnyRepresentation— asserts neither rep is set on construction and the field map is held strongly.toStringMaterialisesStringRepAndSoftensFieldMap— assertstoString()triggers materialisation and softens the field-map reference.upperBoundForStringSizeMaterialisesStringRepLazily— asserts the size-validation contract is preserved.softReferenceClearedAfterSerialisationRecoversFields— asserts the post-serialise soft-reference clearance still recovers viarecoverFields().A new JMH benchmark (
ImmutableJsonObjectBenchmark, test scope) covers the five scenarios above and serves as a regression net for future changes in the build/serialise path. JMH wiring injson/pom.xmlmirrors the additions made in #2453 (and duplicates the additions in #2460 — trivial pom merge once either lands).