Skip to content

defer CBOR/string pre-encoding in SoftReferencedFieldMap until needed#2461

Open
thjaeckle wants to merge 1 commit into
eclipse-ditto:masterfrom
beyonnex-io:refactor/lazy-cbor-encoding
Open

defer CBOR/string pre-encoding in SoftReferencedFieldMap until needed#2461
thjaeckle wants to merge 1 commit into
eclipse-ditto:masterfrom
beyonnex-io:refactor/lazy-cbor-encoding

Conversation

@thjaeckle
Copy link
Copy Markdown
Member

Summary

JFR profiling of production things-service pods on Ditto 3.9.1 showed the constructor chain

ImmutableJsonObject.of(map)
  └── new SoftReferencedFieldMap(map)
        └── JacksonCborFactory.createCborRepresentation(...)
              └── new JacksonSerializationContext + IOContext + flush()

accounting for ~6–10 % of CPU (combining SoftReferencedFieldMap.<init>, createCborRepresentation and downstream JacksonSerializationContext allocation/flush samples). Every ImmutableJsonObject construction eagerly serialised the field map to CBOR (or to a string fallback when CBOR is unavailable) so the cached bytes could later survive a SoftReference clearance. Intermediate JsonObjects produced by transformation/projection pipelines pay this cost and never benefit, because the cached bytes are discarded along with the object.

This PR defers encoding to the first call that actually needs it (toString / writeValue / upperBoundForStringSize) and only softens the field-map reference once a recoverable representation exists. While no serialised form is cached, the field map is held strongly to preserve the "recover-after-soft-clear" invariant.

Changes (all internal to SoftReferencedFieldMap)

  • Replace SoftReference<Map> field with an Object fieldsRef holding either a Map<String, JsonField> (strong) or a SoftReference<Map<String, JsonField>> once a representation has been materialised. fields() unwraps both cases.
  • Constructor only soft-references when stringRepresentation or cborObjectRepresentation is already provided. No more eager createCborRepresentation / createStringRepresentation in the no-representation branch.
  • asJsonObjectString() and writeValue() materialise their representation lazily, then call a new softenFieldsRef() helper to swap the strong map for a SoftReference, restoring the previous GC behaviour from that point on.
  • upperBoundForStringSize() triggers asJsonObjectString() when neither representation exists. Size-validation callers (Modify commands, CBOR sizing) still get a real bound; those callers are no faster than before, but every other build path that never calls a size or serialise method gains the saving.

Measured impact

JMH AverageTime, single fork, 3 warmup + 5 measurement iterations × 2 s, Temurin 25. Lower is better. The benchmarks run with no CBOR factory on the test classpath, so the eager-string fallback path is what's being eliminated; the production CBOR-encoded path eliminates a heavier code path with the same shape.

Scenario Before (eager) After (lazy) Change
buildSmallNoSerialize 344.4 ns/op 120.1 ns/op 2.87× faster
buildLargeNoSerialize (50 fld) 3592.7 ns/op 1360.0 ns/op 2.64× faster
buildSmallAccessFields 1151.7 ns/op 784.9 ns/op 1.47× faster
buildSmallThenToString 344.7 ns/op 409.2 ns/op 19 % slower
buildLargeThenToString 3493.7 ns/op 3678.1 ns/op 5 % slower

The build-then-toString rows simply shift the encoding work from build time to the first toString() call; total work is unchanged. In production where many JsonObjects are intermediate (projections, JSON-patch transforms, partial-update assembly) and never reach toString, this shift is the net saving.

Compatibility

No public API change. SoftReferencedFieldMap is package-private and the ImmutableJsonObject contract is preserved — equality, hashCode, soft-reference reclamation and recoverFields() all continue to work as before. The eager-construction IOException catch is no longer reachable because CBOR encoding now happens on a method that already declares throws IOException.

All 920 ditto-json tests pass, including new targeted tests:

  • buildingDoesNotEagerlyEncodeAnyRepresentation — asserts neither rep is set on construction and the field map is held strongly.
  • toStringMaterialisesStringRepAndSoftensFieldMap — asserts toString() triggers materialisation and softens the field-map reference.
  • upperBoundForStringSizeMaterialisesStringRepLazily — asserts the size-validation contract is preserved.
  • softReferenceClearedAfterSerialisationRecoversFields — asserts the post-serialise soft-reference clearance still recovers via recoverFields().

A new JMH benchmark (ImmutableJsonObjectBenchmark, test scope) covers the five scenarios above and serves as a regression net for future changes in the build/serialise path. JMH wiring in json/pom.xml mirrors the additions made in #2453 (and duplicates the additions in #2460 — trivial pom merge once either lands).

@thjaeckle thjaeckle added this to the 3.9.2 milestone May 26, 2026
@thjaeckle thjaeckle self-assigned this May 26, 2026
@thjaeckle thjaeckle marked this pull request as ready for review May 26, 2026 12:28
@thjaeckle thjaeckle requested review from Copilot and hu-ahmed May 26, 2026 12:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes ditto-json’s ImmutableJsonObject construction path by deferring CBOR/string pre-encoding inside SoftReferencedFieldMap until a serialized form is actually needed, reducing CPU cost for short-lived/intermediate objects.

Changes:

  • Make SoftReferencedFieldMap hold fields strongly until a recoverable serialized representation (string/CBOR) is materialized, then soften the fields reference.
  • Lazily materialize string/CBOR representations on first toString() / writeValue() / getUpperBoundForStringSize() demand.
  • Add targeted regression tests and a JMH micro-benchmark; wire JMH annotation processing in json/pom.xml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
json/src/main/java/org/eclipse/ditto/json/ImmutableJsonObject.java Refactors SoftReferencedFieldMap to defer encoding and only soften fields after representations exist.
json/src/test/java/org/eclipse/ditto/json/ImmutableJsonObjectTest.java Replaces the prior soft-reference strategy test with more specific tests for lazy materialization + softening.
json/src/test/java/org/eclipse/ditto/json/ImmutableJsonObjectBenchmark.java Adds a JMH benchmark covering build/serialize/access scenarios for regression/perf validation.
json/pom.xml Adds JMH dependencies and configures the JMH annotation processor for test compilation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread json/src/main/java/org/eclipse/ditto/json/ImmutableJsonObject.java
Comment thread json/src/main/java/org/eclipse/ditto/json/ImmutableJsonObject.java
thjaeckle added a commit to beyonnex-io/ditto that referenced this pull request May 26, 2026
Addresses Copilot review feedback on eclipse-ditto#2461:

- recoverFields() now wraps its result in Collections.unmodifiableMap
  so callers that get the field map via getIterator() / values() cannot
  mutate the recovered map and break this object's immutability after a
  SoftReference clearance. This was a latent issue in master as well;
  fix here so the new lazy-recovery path is hardened.

- fieldsRef, jsonObjectStringRepresentation and cborObjectRepresentation
  are now volatile. The invariant "SoftReference implies at least one
  representation is non-null" depends on a happens-before between the
  representation write and the fieldsRef softening; without memory
  barriers a concurrent reader could observe the SoftReference while
  still seeing both representations as null, tripping
  IllegalStateException("Fatal cache miss") if the soft ref is then
  cleared. volatile gives us the publication semantics we need at
  negligible cost (see JMH below).

JMH (single fork, 3 warmup + 5x2s, Temurin 25) before/after this commit:

  buildSmallNoSerialize    120 ->  119 ns/op   (~0%)
  buildLargeNoSerialize   1360 -> 1291 ns/op   (~5%, noise)
  buildSmallAccessFields   785 ->  797 ns/op   (~2% slower, noise)
  buildSmallThenToString   409 ->  402 ns/op   (~2% faster, noise)
  buildLargeThenToString  3678 -> 3648 ns/op   (~1% faster, noise)

All deltas are within measurement noise (errors 4-12% of scores). The
original PR's speedups vs master (2.87x / 2.64x / 1.47x) are preserved.

All 920 ditto-json tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every ImmutableJsonObject construction eagerly serialised its field map
to CBOR (or to a JSON string when CBOR is unavailable) so the
SoftReference could be safely cleared. JFR on prod things-service
(3.9.1) showed this accounting for ~6-10% of CPU on the build path.

Defer encoding to the first call that actually needs it (toString,
writeValue, upperBoundForStringSize) and only soften the field-map
reference once a recoverable representation exists. The fieldsRef and
representation fields are volatile so concurrent readers cannot observe
a SoftReference before its backing representation, and the equals
fallback compares both the field set and the rendered JSON string form
so that JsonValues which encode identically but are class-incompatible
(e.g. NullAttributes vs JsonValue.nullLiteral()) continue to compare
equal as they did when CBOR encoding was eager.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thjaeckle thjaeckle force-pushed the refactor/lazy-cbor-encoding branch from ae5dce9 to 2f2395a Compare May 26, 2026 19:22
@thjaeckle
Copy link
Copy Markdown
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants