Config v2.0 #1351

Open
MrAlias wants to merge 28 commits into open-telemetry:main from MrAlias:config-v2

Conversation

Contributor

@MrAlias MrAlias commented Feb 23, 2026

  • Design plan methodology for v2.0 of OBI config
  • JSON Schema for v2.0
  • Migration plan
  • Examples and tooling

codecov Bot commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 0% with 321 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.45%. Comparing base (4793b9b) to head (842e765).
⚠️ Report is 367 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| devdocs/config/version-2.0/verify.go | 0.00% | 321 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1351       +/-   ##
===========================================
+ Coverage   43.76%   68.45%   +24.69%     
===========================================
  Files         308      277       -31     
  Lines       33495    32996      -499     
===========================================
+ Hits        14658    22588     +7930     
+ Misses      17894     9230     -8664     
- Partials      943     1178      +235     
| Flag | Coverage Δ |
|---|---|
| integration-test | 56.93% <ø> (+35.25%) ⬆️ |
| integration-test-arm | 29.55% <ø> (+29.55%) ⬆️ |
| integration-test-vm-x86_64-5.15.152 | ? |
| integration-test-vm-x86_64-6.10.6 | ? |
| k8s-integration-test | 43.39% <ø> (+41.06%) ⬆️ |
| oats-test | 38.09% <ø> (+38.09%) ⬆️ |
| unittests | 56.71% <0.00%> (+12.10%) ⬆️ |

Contributor

@NimrodAvni78 NimrodAvni78 left a comment

Nice!
I know it's in draft, but I wanted to give some input as well.

Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml Outdated
Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml Outdated
Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml Outdated
Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml Outdated
Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml
@MrAlias MrAlias marked this pull request as ready for review February 25, 2026 20:11
@MrAlias MrAlias requested a review from a team as a code owner February 25, 2026 20:11
Copilot AI review requested due to automatic review settings February 25, 2026 20:11
Contributor

Copilot AI left a comment

Pull request overview

This pull request introduces the v2.0 configuration schema for OBI (OpenTelemetry eBPF Instrumentation), a comprehensive redesign of the configuration model that better aligns with OpenTelemetry's declarative configuration format and improves the user experience.

Changes:

  • Design documentation defining principles, user journeys, and configuration structure for v2.0
  • JSON Schema definition for the extensions.obi section with comprehensive validation rules
  • Migration plan outlining the strategy for transitioning from v1 to v2 configuration
  • Example default configuration in v2 format with detailed comments
  • Verification tooling (Go and Python) to validate schema correctness and ensure feature parity between v1 and v2

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| devdocs/config/version-2.0/config-v2.md | Design document describing principles, user journeys, and the target v2 configuration structure with detailed field mappings from v1 |
| devdocs/config/version-2.0/migration.md | Migration plan outlining parsing behavior, CLI tooling requirements, and phased rollout strategy |
| devdocs/config/version-2.0/obi-extension.schema.json | JSON Schema (Draft 2020-12) defining the complete v2 configuration structure for the extensions.obi section |
| devdocs/config/version-2.0/examples/default-configuration.yaml | Comprehensive example showing the default v2 configuration with inline comments and OTel integration |
| devdocs/config/version-2.0/verify.go | Go verification tool that validates feature parity between v1 defaults and v2 defaults through 94+ mapping checks |
| devdocs/config/version-2.0/validate_example.py | Python validation script that validates configuration files against both the OBI extension schema and OTel declarative schema |
| devdocs/config/version-2.0/.verify/dump_default_config.go | Utility to dump the current v1 default configuration for verification purposes |
| devdocs/config/version-2.0/.verify/default-config-current.yaml | Snapshot of the current v1 default configuration used as baseline for parity verification |

Comment thread devdocs/config/version-2.0/migration.md Outdated
Comment thread devdocs/config/version-2.0/migration.md Outdated
Comment thread devdocs/config/version-2.0/migration.md Outdated
Comment thread devdocs/config/version-2.0/migration.md Outdated
Comment thread devdocs/config/version-2.0/config-v2.md Outdated
Comment thread devdocs/config/version-2.0/config-v2.md Outdated
Comment thread devdocs/config/version-2.0/migration.md Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Member

fstab commented Feb 26, 2026

Hi Tyler, thanks a lot for your awesome work!

Quick question: One requirement that we are frequently seeing at Grafana is configuration per service. For example, if tracing causes issues for a shopping cart service, you might want to disable it for that specific service but keep it for all other services. Another example: You may want specific HTTP route matchers for individual services.

Is per-service configuration possible with the new config? If not, it would be great to add that.

Contributor Author

MrAlias commented Feb 26, 2026

One requirement that we are frequently seeing at Grafana is configuration per service.

@fstab yes, that should be possible. As I have designed the configuration, something like this would address the user concern:

extensions:
  obi:
    version: "2.0"
    selection:
      policy:
        # Exclude all services not matched
        default_action: exclude
      rules:
        - action: include
          match:
            process:
              exe_path_glob:
                - "/path/to/my_service"
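
To illustrate how a default action plus include rules could resolve for a given process, here is a minimal Python sketch. It is not OBI's implementation: the `decide` helper, the in-memory `CONFIG` dict, and the first-match-wins evaluation order are all assumptions made for illustration; only the field names come from the YAML above.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the YAML above; field names follow the example.
CONFIG = {
    "policy": {"default_action": "exclude"},
    "rules": [
        {
            "action": "include",
            "match": {"process": {"exe_path_glob": ["/path/to/my_service"]}},
        },
    ],
}


def decide(exe_path, config):
    """Return the action of the first rule whose glob matches exe_path.

    Falls back to policy.default_action when no rule matches.
    First-match-wins ordering is an assumption of this sketch.
    """
    for rule in config["rules"]:
        globs = rule["match"].get("process", {}).get("exe_path_glob", [])
        if any(fnmatchcase(exe_path, glob) for glob in globs):
            return rule["action"]
    return config["policy"]["default_action"]


print(decide("/path/to/my_service", CONFIG))
print(decide("/usr/bin/other", CONFIG))
```

With `default_action: exclude`, only processes matched by an include rule are instrumented, which is what makes the per-service opt-in in the YAML work.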

Contributor Author

MrAlias commented Feb 26, 2026

Another example: You may want specific HTTP route matchers for individual services.

This is an interesting idea. I do not think it is possible in the current configuration, nor is it explicitly covered by what I have specified here. I can take a look at the HTTP routes section a bit more and see about per-service matching. 👍

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@grcevski grcevski left a comment

This is fantastic! I really like the structure and the focus on what the end user wants to achieve.

I just have a comment on providing a way to override defaults per selection rule. Given that we run in "daemonset" mode and instrument more and more services, we have had requests to make certain settings configurable per selection criteria.

It would be really nice if we could provide a way to override certain global settings.

Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml
Contributor

@mariomac mariomac left a comment

Amazing! I'd like to drop a few minor comments.

Comment thread devdocs/config/version-2.0/.verify/default-config-current.yaml
Comment thread devdocs/config/version-2.0/.verify/default-config-current.yaml
Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml
Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml
Comment thread devdocs/config/version-2.0/examples/default-configuration.yaml
MrAlias added 3 commits April 14, 2026 13:33
Add `refine` block to selection rules for per-workload configuration
overrides, and document the k8s enricher / Collector receiver boundary.

Per-workload overrides (grcevski, fstab):
- Add `SelectionRuleRefine` schema definition with explicit closed
  vocabulary: exports (traces/metrics), http.routes, http.filters
- Add `refine` property to selection rule items in the schema
- Add commented example in default-configuration.yaml showing
  signal disable and per-service HTTP route patterns
- Document the refine block design in config-v2.md with examples

Kubernetes enricher / Collector receiver (dmitryax):
- Add `mode` field to enrich.enrichers.kubernetes in schema with
  autodetect/enabled/disabled options
- Add documentation section in config-v2.md explaining the
  standalone vs receiver tradeoff and when to use k8sattributesprocessor
- Annotate the example config's kubernetes.mode comment

Split the flat obi extension config into deployment-scoped sections:

- capture: receiver-embeddable block containing everything OBI needs to
  select workloads and capture telemetry. Valid in all deployment modes.
  When running OBI as a Collector receiver, this block is embedded
  directly in the receiver configuration.
  - policy + rules flattened to capture.policy / capture.rules
    (removes the selection indirection for the 80% case)
  - instrumentation, runtimes, network moved under capture
  - operations.capture renamed to capture.engine (eBPF internals)
  - operations.limits, safety, runtime.channels, telemetry (cache/TTL)
    moved under capture.*

- enrich, correlation: standalone-mode only (unchanged paths)

- daemon: new section for OBI process controls (standalone only).
  Contains logging, profiling, shutdown, internal_metrics, and
  Prometheus-specific telemetry shaping. Replaces the daemon-facing
  fields that were previously in operations.

Update obi-extension.schema.json, default-configuration.yaml,
verify.go, and config-v2.md (high-level shape, section descriptions,
and v1->v2 mapping table) to reflect new paths throughout.

All 85 verify.go parity checks pass. OBI and OTel schema validation pass.

Expand config-v2.md with rationale sections for each major structural
choice: why capture is a named grouping (vs. flat + deployment flag),
why policy/rules are direct children of capture (vs. nested under
capture.selection), why refine uses a closed vocabulary (vs. deep-merge),
why sampling stays in tracer_provider.sampler (vs. capture.rules refine),
why engine not capture.capture, why enrich/correlation/daemon are
standalone-only, and why daemon not operations.

Update migration.md to reflect the new capture/daemon/enrich/correlation
structure, add deployment-mode validation semantics, document what
non-deterministic migration means with concrete examples, add rationale
for keeping parser scope narrow (parse+validate only, not setup/routing),
and explain the phased rollout rationale including why dual-read is
necessary.
Contributor Author

MrAlias commented Apr 14, 2026

Design update — capture/daemon split + per-workload refine

The design has been significantly revised based on reviewer feedback. Summary of the main changes:

Deployment-aware structure (capture vs. daemon/enrich/correlation)

The flat extensions.obi top-level structure has been reorganized around deployment scope:

  • capture — receiver-embeddable, valid in all deployment modes. Contains everything OBI needs to select workloads and capture telemetry: policy, rules, instrumentation, runtimes, network, limits, engine, safety, channels, telemetry.
  • enrich, correlation, daemon — standalone-mode only. Not valid in Collector receiver deployments.

This directly addresses @dmitryax's concerns: when OBI runs as a Collector receiver, enrich (including k8s metadata, service name, attribute enrichment) is absent. The Collector pipeline handles enrichment via k8sattributesprocessor and other processors — no duplication. log_trace_annotation (correlation) is also standalone-only; a future standalone Collector component is the planned path there.
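
The deployment-scope rule described above (capture valid everywhere; enrich, correlation, and daemon standalone-only) could be checked with something like the following Python sketch. The section names come from this comment; the mode strings "standalone" and "receiver" and the `validate_sections` helper are hypothetical and not taken from the PR's schema or tooling.

```python
# Section names come from the comment above; the mode names are hypothetical.
STANDALONE_ONLY = {"enrich", "correlation", "daemon"}


def validate_sections(obi_config, mode):
    """Return a list of error strings for sections that are not valid in `mode`."""
    if mode not in ("standalone", "receiver"):
        raise ValueError(f"unknown deployment mode: {mode!r}")
    if mode == "standalone":
        # Every section is accepted when OBI runs as its own process.
        return []
    return [
        f"section {name!r} is standalone-only and not valid in receiver mode"
        for name in sorted(obi_config)
        if name in STANDALONE_ONLY
    ]


errors = validate_sections({"capture": {}, "enrich": {}, "daemon": {}}, "receiver")
for message in errors:
    print(message)
```

Reporting every offending section at once, rather than failing on the first, tends to make migration errors easier to fix in one pass.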

Per-workload refine block

Addresses the feedback from @grcevski and @fstab on per-service configuration. Include rules now support an optional refine block:

capture:
  rules:
    - action: include
      match:
        kubernetes:
          namespace_glob: ["staging-*"]
      refine:
        exports:
          traces: false
          metrics: true
    - action: include
      match:
        kubernetes:
          namespace_glob: ["orders"]
      refine:
        http:
          routes:
            unmatched: wildcard
            patterns:
              - /orders/{id}

refine uses an explicit, closed vocabulary (exports, http.routes, http.filters) rather than generic deep-merge, to avoid ambiguous array/map merge semantics. New fields can be added deliberately as use cases emerge.
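
As a rough illustration of why a closed vocabulary sidesteps merge ambiguity, the Python sketch below replaces each allowed path wholesale and rejects everything else. The allowed paths are the three named above; the `apply_refine` helper, the settings layout, and the replace-wholesale semantics are assumptions of this sketch, not the PR's implementation.

```python
import copy

# Closed vocabulary named in the comment above, written as dotted paths.
ALLOWED_REFINE_PATHS = ("exports", "http.routes", "http.filters")


def _flatten_refine(refine):
    """Yield (dotted_path, value) pairs, descending one level into 'http'."""
    for key, value in refine.items():
        if key == "http" and isinstance(value, dict):
            for sub_key, sub_value in value.items():
                yield f"http.{sub_key}", sub_value
        else:
            yield key, value


def apply_refine(global_settings, refine):
    """Return a copy of global_settings with each allowed refine path replaced.

    Assumption: an override replaces the whole value at that path, so there
    are no array or map merge rules to define.
    """
    result = copy.deepcopy(global_settings)
    for path, value in _flatten_refine(refine):
        if path not in ALLOWED_REFINE_PATHS:
            raise ValueError(f"refine does not support {path!r}")
        parts = path.split(".")
        target = result
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = copy.deepcopy(value)
    return result
```

For example, `apply_refine(defaults, {"exports": {"traces": False, "metrics": True}})` would return settings with traces disabled for the matched workload while leaving `defaults` untouched, and a key outside the vocabulary raises instead of being silently merged.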

Per-workload sampling stays in tracer_provider.sampler (canonical OTel location) via the planned obi_rule_based custom sampler plugin — keeping sampling out of the OBI extension namespace.

policy/rules flattened into capture

The earlier capture.selection.policy / capture.selection.rules nesting has been removed. policy and rules are now direct children of capture, saving one indentation level on the most-written field.

operations split and renamed

The old operations section mixed capture-valid and daemon-only fields. It's now split:

  • eBPF engine internals → capture.engine
  • Process management (logging, profiling, shutdown, internal metrics) → daemon

Updated files: config-v2.md (design rationale for every structural decision), migration.md (deployment-mode validation, non-deterministic migration definition, phased rollout rationale), obi-extension.schema.json, examples/default-configuration.yaml, verify.go.

Marking this ready for re-review. Happy to discuss any of the above.

Contributor Author

MrAlias commented Apr 14, 2026

@fstab — yes, both of your examples are now supported. The updated design adds a refine block on include rules with an explicit vocabulary of per-workload overrides:

  • exports: override which signals (traces, metrics) are emitted for a matched workload — so you can disable traces for a specific service while keeping them globally.
  • http.routes: override HTTP route patterns and fallback policy for a specific service.
  • http.filters: replace HTTP trace/metric filters for a specific service.

Example covering both your cases:

capture:
  rules:
    # Disable traces for shopping cart, keep metrics
    - action: include
      name: shopping-cart
      match:
        kubernetes:
          namespace_glob: ["shopping-cart"]
      refine:
        exports:
          traces: false
          metrics: true

    # Custom HTTP route matchers for orders service
    - action: include
      name: orders-service
      match:
        kubernetes:
          namespace_glob: ["orders"]
      refine:
        http:
          routes:
            unmatched: wildcard
            patterns:
              - /orders/{id}
              - /orders/{id}/items
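
To make the route override in the example above concrete, here is a small Python sketch of matching request paths against route templates. The `compile_route` and `route_for` helpers are hypothetical, and treating `unmatched: wildcard` as "report a single catch-all route `/**`" is an assumption about the fallback policy, not something this thread confirms.

```python
import re


def compile_route(pattern):
    """Turn '/orders/{id}' into a regex where each {name} matches one path segment."""
    parts = []
    for segment in pattern.strip("/").split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            parts.append("[^/]+")
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "/?$")


def route_for(path, patterns, unmatched="wildcard"):
    """Return the first matching pattern, or apply the fallback policy."""
    for pattern in patterns:
        if compile_route(pattern).match(path):
            return pattern
    if unmatched == "wildcard":
        # Assumption: collapse unknown paths into one catch-all route.
        return "/**"
    # Other fallback policies are left out of this sketch.
    return None


PATTERNS = ["/orders/{id}", "/orders/{id}/items"]
for request_path in ("/orders/42", "/orders/42/items", "/health"):
    print(request_path, "->", route_for(request_path, PATTERNS))
```

Mapping concrete paths to templates like this keeps the route attribute low-cardinality, which is the usual reason for wanting per-service patterns.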

For per-workload sampling overrides, the design keeps sampling in tracer_provider.sampler (the canonical OTel location) via a planned obi_rule_based custom sampler plugin that supports workload-matching rules. See the updated config-v2.md for the full design rationale.

# Delay before Java route template discovery to allow runtime readiness.
delay: 1m0s
# HTTP payload-level extraction features.
payload_extraction:
Contributor

I think the current configuration for all of these (and for other protocols we infer on top of HTTP) is really complicated: it needs a new field and env var per protocol, unlike the normal instrumentations field, which is just a list of protocols. The only difference is that this detection is done over HTTP and probably needs large HTTP buffers enabled to work reliably.
I think we should just unify this with the instrumentations config, and maybe warn on startup or document that some of these won't work reliably without large buffers.
We could also allow some unifying semantics: for example, instead of specifying anthropic, openai, and gemini, users could write genai, with something similar for couchbase (KV + n1ql), aws (sqs and s3), elasticsearch (elasticsearch + opensearch), sql (mysql + postgresql + mssql), and so on.
This could maybe be fixed in OBI now, or we can wait for config v2.
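
The grouping idea in this comment could look roughly like the Python sketch below, which expands group names in an instrumentations list into concrete protocols. The `PROTOCOL_GROUPS` table and `expand_instrumentations` helper are hypothetical; the group memberships are copied from the examples in the comment and do not reflect any existing OBI option.

```python
# Hypothetical group aliases, taken from the groupings suggested in the comment above.
PROTOCOL_GROUPS = {
    "genai": ["anthropic", "openai", "gemini"],
    "sql": ["mysql", "postgresql", "mssql"],
    "aws": ["sqs", "s3"],
    "elasticsearch": ["elasticsearch", "opensearch"],
}


def expand_instrumentations(names):
    """Expand group aliases into protocol names, keeping order and dropping duplicates."""
    expanded = []
    for name in names:
        # Names that are not group aliases pass through unchanged.
        for protocol in PROTOCOL_GROUPS.get(name, [name]):
            if protocol not in expanded:
                expanded.append(protocol)
    return expanded


print(expand_instrumentations(["http", "genai", "openai", "sql"]))
```

Expanding aliases once at parse time would let the rest of the pipeline keep working with a flat list of protocols.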

Contributor

@grcevski grcevski left a comment

LGTM! Great work!

7 participants