feat: add egress_format_version option for legacy vCon output #190

Merged
pavanputhra merged 4 commits into main from pavankumar/con-581-egress-compatibility-link-configurable-vcon-format-version on Jun 7, 2026
Conversation

@pavanputhra pavanputhra commented Jun 6, 2026

Summary

Adds an opt-in egress_format_version option that emits an older vCon format version from individual egress points, without changing the canonical representation used inside the pipeline. This unblocks deployments whose downstream consumers (analytics pipelines, indexes, BI tooling) were built against an older vCon schema and cannot be migrated yet.

How it works

A single converter, lib.vcon_egress_compat.to_legacy, downgrades the outgoing payload to a target legacy version. It is the inverse of the read/write normalization that brings legacy producers up to the current spec, and it never mutates the canonical vCon in Redis — only the emitted/persisted copy.

Set egress_format_version on any of these egress points to opt in; leave it unset for current behavior (byte-identical to before):

  • the webhook link
  • the postgres, s3, and elasticsearch storage modules

Example configuration for the postgres storage:

storages:
  postgres:
    module: storage.postgres
    options:
      egress_format_version: "0.0.1"

Deltas applied for 0.0.1

  • vcon set to 0.0.1
  • amended → appended, critical → must_support (top level + dialog/analysis entries)
  • attachment purpose → type (only when type is absent)
  • mediatype → mimetype, schema → schema_version
  • analysis/attachment JSON-string bodies re-inflated to native objects with encoding: "none"
  • empty group/redacted/appended re-added at the top level
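
The deltas above can be sketched in a few lines of Python. This is an illustrative approximation, not the code in lib.vcon_egress_compat: the name to_legacy_sketch, the exact scope of each rename, the `encoding == "json"` check used to detect JSON-string bodies, and the empty values chosen for group/redacted/appended are all assumptions made for the example.

```python
import copy
import json

LEGACY_VERSION = "0.0.1"

# Assumed scope of each rename; the PR lists the renames, not the full mapping.
TOP_LEVEL_RENAMES = {"amended": "appended", "critical": "must_support"}
ENTRY_RENAMES = {
    "amended": "appended",
    "critical": "must_support",
    "mediatype": "mimetype",
    "schema": "schema_version",
}


def _rename_keys(obj, renames):
    """Move each current-spec key to its legacy name, keeping a legacy value that already exists."""
    for current, legacy in renames.items():
        if current in obj:
            value = obj.pop(current)
            obj.setdefault(legacy, value)


def _inflate_json_body(entry):
    """Parse a JSON-string body into a native object and mark it encoding "none"."""
    if entry.get("encoding") != "json" or not isinstance(entry.get("body"), str):
        return
    try:
        entry["body"] = json.loads(entry["body"])
    except json.JSONDecodeError:
        return  # leave unparseable bodies untouched
    entry["encoding"] = "none"


def to_legacy_sketch(vcon, version=LEGACY_VERSION):
    """Return a legacy-shaped deep copy; the input dict is never mutated."""
    if version != LEGACY_VERSION:
        raise ValueError(f"unsupported legacy version: {version!r}")
    out = copy.deepcopy(vcon)
    out["vcon"] = version
    _rename_keys(out, TOP_LEVEL_RENAMES)
    for section in ("dialog", "analysis", "attachments"):
        for entry in out.get(section) or []:
            _rename_keys(entry, ENTRY_RENAMES)
    for attachment in out.get("attachments") or []:
        # purpose -> type only when type is absent
        if "type" not in attachment and "purpose" in attachment:
            attachment["type"] = attachment.pop("purpose")
    for section in ("analysis", "attachments"):
        for entry in out.get(section) or []:
            _inflate_json_body(entry)
    # Re-add the legacy top-level keys (empty shapes are assumed).
    out.setdefault("group", [])
    out.setdefault("redacted", {})
    out.setdefault("appended", {})
    return out
```

Working on a deep copy is what keeps the canonical vCon untouched: only the returned dict carries the legacy shape, so it can be handed to the egress point while the pipeline keeps the original.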

Read-path safety

When a storage backend is configured to persist an older format, a vCon loaded back from storage (e.g. after the Redis copy expires) must not leak that older shape into the pipeline or to API clients. This PR ensures both storage-fallback paths canonicalize to the current spec before caching/returning:

  • VconRedis.get_vcon / get_vcon_dict already did this.
  • api.sync_vcon_from_storage now does too (it previously cached and returned the raw stored dict), so the API and the conserver share identical canonicalization.

Additionally, the spec-enforcement step now stamps the current spec version on every write. Because the field names are normalized up to the current spec on the same pass, a stale legacy version string (e.g. on a payload loaded back from older-format storage) is upgraded to match rather than left inconsistent.
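
A rough sketch of the read-path behavior described above, under stated assumptions: plain dicts stand in for Redis and the storage backend, CURRENT_SPEC_VERSION is a placeholder (the real value is not stated in this PR), and the rename table covers only two fields for brevity. The function names are hypothetical and are not the project's `_enforce_spec_on_write` or `sync_vcon_from_storage`.

```python
CURRENT_SPEC_VERSION = "<current>"  # placeholder; not the project's real value

# Subset of legacy -> current renames, for illustration only.
LEGACY_TO_CURRENT = {"appended": "amended", "must_support": "critical"}


def enforce_spec_sketch(vcon):
    """Return a copy with legacy field names upgraded and the version stamped."""
    out = dict(vcon)  # shallow copy; top-level keys of the input are not touched
    for legacy, current in LEGACY_TO_CURRENT.items():
        if legacy in out:
            value = out.pop(legacy)
            out.setdefault(current, value)
    # Stamp unconditionally so a stale "0.0.1" never survives next to
    # field names that were just normalized to the current spec.
    out["vcon"] = CURRENT_SPEC_VERSION
    return out


def load_with_fallback(uuid, cache, storage):
    """Read from the cache, falling back to storage and canonicalizing before caching."""
    vcon = cache.get(uuid)
    if vcon is not None:
        return vcon
    stored = storage.get(uuid)
    if stored is None:
        return None
    vcon = enforce_spec_sketch(stored)
    cache[uuid] = vcon  # the cache only ever holds the canonical shape
    return vcon
```

The point of canonicalizing before the cache write is that every later reader, whether an API client or a downstream link, sees the same current-spec shape regardless of what format the storage backend persisted.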

Other changes

  • The elasticsearch attachment index-name lookup now accepts both type and purpose, so it no longer raises on current-spec payloads.
  • Adds unit, schema-validation, and integration tests; a derived 0.0.1 JSON schema; and a configuration doc.
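
For the elasticsearch lookup in the first bullet, a tolerant key lookup might look like the following. The function name, the index-name prefix, and the "unknown" fallback are hypothetical and chosen only for illustration; the PR does not show the module's actual naming scheme.

```python
def attachment_index_name_sketch(attachment, prefix="vcon_attachments"):
    """Build an index name from an attachment, accepting legacy `type` or current `purpose`."""
    kind = attachment.get("type") or attachment.get("purpose") or "unknown"
    # Elasticsearch index names must be lowercase.
    return f"{prefix}_{kind}".lower()
```

Using `dict.get` for both keys avoids the `KeyError` that a direct `attachment["type"]` lookup would raise on a current-spec payload that only carries `purpose`.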

Testing

Full test suite passes in the dev container (641 passed, 18 skipped).

🤖 Generated with Claude Code

@pavanputhra pavanputhra force-pushed the pavankumar/con-581-egress-compatibility-link-configurable-vcon-format-version branch 2 times, most recently from 41d974c to 3f6c7a4 Compare June 6, 2026 02:35
pavanputhra and others added 3 commits June 6, 2026 08:07
Add an opt-in `egress_format_version` option that converts an outgoing
vCon to an older format version at the egress point, while keeping the
canonical in-pipeline representation on the current spec. This lets
deployments whose downstream consumers were built against an older vCon
schema keep working while a migration is planned.

- New shared converter `lib.vcon_egress_compat.to_legacy`, the inverse
  of `lib.vcon_compat.normalize_legacy_fields`: reverses the field
  renames (purpose->type, mediatype->mimetype, schema->schema_version,
  amended->appended, critical->must_support), re-inflates JSON-string
  analysis/attachment bodies to native objects, and re-adds the legacy
  top-level keys. Supports target version 0.0.1.
- Wire the option into the webhook link and the postgres, s3 and
  elasticsearch storage modules. Unset -> current behavior, unchanged.
- Make the elasticsearch attachment index-name lookup tolerant of both
  `type` and `purpose` so it no longer errors on current-spec payloads.
- Add unit, schema-validation and integration tests, a derived 0.0.1
  JSON schema, and configuration docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`api.sync_vcon_from_storage` previously cached and returned the raw dict
read from a storage backend. With a backend configured to persist an
older format (egress_format_version), a Redis miss could surface a
legacy-shaped vCon to API clients and poison the Redis cache that
downstream links read from. Run the same spec-enforcement the
VconRedis storage-fallback uses before caching/returning, so the API
and the conserver share identical canonicalization. Add a regression
test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_enforce_spec_on_write normalizes field names up to the current spec but
previously preserved a stale top-level version string (e.g. "0.0.1" on a
payload loaded back from older-format storage), leaving the declared
version inconsistent with the now-canonical data. Stamp the current spec
version unconditionally instead, and add a test for the legacy-upgrade
case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pavanputhra pavanputhra force-pushed the pavankumar/con-581-egress-compatibility-link-configurable-vcon-format-version branch from 3f6c7a4 to 70ec78c Compare June 6, 2026 02:38
…ION setting

Replace the per-module `egress_format_version` option with one deployment-wide
`EGRESS_FORMAT_VERSION` setting consulted by every egress point, so the legacy
output format is configured once instead of on each link/storage. A new
`to_configured_legacy()` helper reads the setting and applies the conversion
when it is set.

- Add EGRESS_FORMAT_VERSION to settings.
- Switch the webhook link and the postgres/s3/elasticsearch storages to the
  helper; drop the per-module option.
- Apply the conversion to the API read endpoints (GET /vcon, GET /vcons) so
  external consumers also receive the legacy format. The Redis cache and
  internal processing reads stay canonical.
- Update docs and tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
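
As a rough illustration of the deployment-wide setting described in the commit above, a helper in the spirit of `to_configured_legacy()` could look like this. It is a sketch under assumptions: the setting is read from an environment variable of the same name (the project's settings module may load it differently), and the converter is passed in as a parameter so the example stays self-contained.

```python
import os


def to_configured_legacy_sketch(vcon, convert):
    """Apply `convert(vcon, version)` only when EGRESS_FORMAT_VERSION is set."""
    version = os.environ.get("EGRESS_FORMAT_VERSION", "").strip()
    if not version:
        return vcon  # unset: current behavior, payload returned unchanged
    return convert(vcon, version)
```

Each egress point would call the helper on its outgoing payload, so the legacy format is switched on or off in one place rather than per link or storage.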
@pavanputhra pavanputhra merged commit 4944a98 into main Jun 7, 2026
1 check passed
@pavanputhra pavanputhra deleted the pavankumar/con-581-egress-compatibility-link-configurable-vcon-format-version branch June 7, 2026 06:28