Filter OTEL by environment #47574

Open
getvictor wants to merge 2 commits into main from victor/47573-otel-environment

Conversation

@getvictor

@getvictor getvictor commented Jun 14, 2026

Member

Related issue: Resolves #47573

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.

Testing

  • QA'd all new/changed functionality manually

Summary by CodeRabbit

  • New Features

    • Added new HTTP errors dashboard for monitoring client and server errors
    • Implemented environment filtering across SigNoz dashboards to scope telemetry per deployment environment
  • Improvements

    • Enhanced host cache dashboard with improved hit rate threshold and environment filtering
    • Updated multiple dashboard queries to support dynamic environment selection

@getvictor

Member Author

@coderabbitai full review

@getvictor

Member Author

/agentic_review

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Contributor
✅ Action performed

Full review finished.

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 14, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)

Remediation recommended

1. Duplicate env resource attribute 🐞 Bug ⚙ Maintainability
Description
initOTELProviders sets the deployment environment twice (via semconv.DeploymentEnvironmentName and a
separate hard-coded "deployment.environment" attribute), creating two sources of truth that can
silently diverge and confuse/break environment scoping in SigNoz. This also adds an extra OTEL
attribute import solely to support the duplicate attribute.
Code

cmd/fleet/otel.go[R40-50]

+	// can override the defaults below. resource.WithFromEnv() runs after WithAttributes, so env-provided values win on conflict.
+	// We always emit a deployment.environment so the attribute key exists in every SigNoz instance Fleet reports to. This lets
+	// dashboards use a dynamic environment selector.
	res, err := resource.New(context.Background(),
		resource.WithSchemaURL(semconv.SchemaURL),
		resource.WithAttributes(
			semconv.ServiceName("fleet"),
			semconv.ServiceVersion(version.Version().Version),
+			semconv.DeploymentEnvironmentName("default"),
+			attribute.String("deployment.environment", "default"), // 2026-06-14: deprecated attribute still used by SigNoz
		),
Evidence
The OTEL resource is currently configured with two separate environment attributes at initialization
time, while the SigNoz dashboards consistently filter on deployment.environment; two sources of
truth makes future changes risky and can lead to mismatched filtering/scoping behavior.

cmd/fleet/otel.go[39-52]
tools/signoz/database_custom_dashboard.json[8-16]
tools/signoz/http_errors_dashboard.json[5-22]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`cmd/fleet/otel.go` sets the deployment environment resource attribute twice: once using `semconv.DeploymentEnvironmentName("default")` and once using a raw `attribute.String("deployment.environment", "default")`. Having two parallel definitions for the same conceptual attribute increases the risk of accidental divergence (e.g., one gets updated/overridden while the other does not), which would make SigNoz environment filtering confusing or incorrect.

## Issue Context
SigNoz dashboards in `tools/signoz/*.json` filter on the `deployment.environment` resource attribute and advertise that Fleet always emits it.

## Fix Focus Areas
- cmd/fleet/otel.go[39-52]

## Suggested fix
Choose a single source of truth for the deployment environment resource attribute:
- Prefer using the semantic convention helper (`semconv.DeploymentEnvironmentName(...)`) *or* a single explicitly-named attribute key, but not both.
- Remove the now-unneeded import (`go.opentelemetry.io/otel/attribute`) if you drop the explicit attribute.
- Ensure the chosen key is the one the dashboards query (`deployment.environment`).
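
If both keys need to stay for SigNoz compatibility, one possible way to reduce the divergence risk described above is to derive both defaults from a single value. The following is a minimal sketch under that assumption, using only the standard library. The `kv` type and the names `defaultEnvironment`, `keyEnvName`, `keyEnvDeprecated`, and `environmentAttrs` are hypothetical and do not appear in `cmd/fleet/otel.go`.

```go
package main

import "fmt"

// Hypothetical names for illustration; none of these exist in cmd/fleet/otel.go.
const (
	defaultEnvironment = "default"
	// Key produced by the current semantic convention helper.
	keyEnvName = "deployment.environment.name"
	// Deprecated key that the SigNoz dashboards in this PR filter on.
	keyEnvDeprecated = "deployment.environment"
)

// kv is a stand-in for an OTEL attribute key/value pair.
type kv struct {
	Key, Value string
}

// environmentAttrs derives both attribute keys from one value, so the two
// defaults cannot drift apart when one of them is edited later.
func environmentAttrs(env string) []kv {
	if env == "" {
		env = defaultEnvironment
	}
	return []kv{
		{Key: keyEnvName, Value: env},
		{Key: keyEnvDeprecated, Value: env},
	}
}

func main() {
	for _, a := range environmentAttrs("") {
		fmt.Printf("%s=%s\n", a.Key, a.Value)
	}
}
```

This only keeps the defaults aligned. An operator who overrides just one of the two keys through the environment could still end up with different values, so it does not replace the single-key fix suggested above.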

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Contributor

Review Change Stack

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (1)
  • tools/signoz/database_custom_dashboard.json
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8abfd14d-d8f3-46dc-b828-dac502581dac

📥 Commits

Reviewing files that changed from the base of the PR and between 93851e0 and c40c8da.

📒 Files selected for processing (1)
  • tools/signoz/database_custom_dashboard.json

Walkthrough

The Fleet OTEL provider (cmd/fleet/otel.go) is extended to emit deployment.environment as a resource attribute set to "default", using both the current semantic convention and the deprecated key form for SigNoz compatibility. The existing tools/signoz/host_cache_dashboard.json dashboard gains a dynamic environment template variable backed by deployment.environment, and all four panel metric queries are updated to filter with deployment.environment IN $environment; the Hit rate orange threshold is also revised from 0.8 to 0.9. A new dashboard file tools/signoz/http_errors_dashboard.json is added, defining two stacked bar chart panels for fleet.http.client_errors and fleet.http.server_errors, both filtered by the same environment variable and grouped by error.type.
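
To make the filter semantics concrete, here is a small sketch of what `deployment.environment IN $environment` followed by a group-by on `error.type` amounts to: keep the data points whose environment is in the selected set, then sum per error type. It models set membership only and does not reflect how SigNoz evaluates dashboard variables internally. The `sample` type, the function names, and the example error type values are all hypothetical.

```go
package main

import "fmt"

// sample is a hypothetical metric data point tagged with its environment.
type sample struct {
	Environment string
	ErrorType   string
	Count       int
}

// filterByEnvironment keeps samples whose environment is one of selected,
// mirroring the set-membership meaning of an IN filter.
func filterByEnvironment(samples []sample, selected []string) []sample {
	allowed := make(map[string]struct{}, len(selected))
	for _, env := range selected {
		allowed[env] = struct{}{}
	}
	var kept []sample
	for _, s := range samples {
		if _, ok := allowed[s.Environment]; ok {
			kept = append(kept, s)
		}
	}
	return kept
}

// countByErrorType sums counts per error type, similar to a group-by panel.
func countByErrorType(samples []sample) map[string]int {
	totals := make(map[string]int)
	for _, s := range samples {
		totals[s.ErrorType] += s.Count
	}
	return totals
}

func main() {
	samples := []sample{
		{Environment: "default", ErrorType: "not_found", Count: 3},
		{Environment: "staging", ErrorType: "not_found", Count: 2},
		{Environment: "staging", ErrorType: "timeout", Count: 1},
	}
	fmt.Println(countByErrorType(filterByEnvironment(samples, []string{"staging"})))
}
```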

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Filter OTEL by environment' clearly and concisely summarizes the main change: adding environment filtering to OpenTelemetry dashboards. |
| Description check | ✅ Passed | The PR description includes the related issue (#47573), checks the 'Changes file added' box as required, and confirms manual QA was performed, meeting template requirements. |
| Linked Issues check | ✅ Passed | The PR successfully implements all key requirements: adds deployment.environment templating to SigNoz dashboards, updates OTEL resource attributes in Fleet code, creates new HTTP errors dashboard, and updates existing dashboards with environment filtering. |
| Out of Scope Changes check | ✅ Passed | All changes are directly within scope: updates to SigNoz dashboards and OTEL configuration to add environment filtering as required by issue #47573. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
tools/signoz/host_cache_dashboard.json (1)

46-46: 💤 Low value

Incidental improvement: hit rate target raised to 90%.

The hit rate description and threshold are updated from 80% to 90%. This is a sensible operational improvement but is not directly related to the environment filtering feature. Consider noting this change separately in the changelog or PR description for clarity.

Also applies to: 106-106

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/signoz/host_cache_dashboard.json` at line 46, The hit rate target has
been updated from 80% to 90% in the host_cache_dashboard.json file at line 46
(and also at line 106). While this is a sensible operational improvement, it is
not directly related to the main feature being implemented. Add a note to the
changelog or PR description clearly documenting this as a separate, incidental
improvement to the cache hit rate target. This will help clarify that the change
is an independent operational update and not part of the core feature work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 65590ecb-154c-47bb-8543-eb8005a23856

📥 Commits

Reviewing files that changed from the base of the PR and between 268c918 and 93851e0.

⛔ Files ignored due to path filters (1)
  • tools/signoz/README.md is excluded by !**/*.md
📒 Files selected for processing (5)
  • changes/47573-signoz-dashboards-deployment-environment
  • cmd/fleet/otel.go
  • tools/signoz/database_custom_dashboard.json
  • tools/signoz/host_cache_dashboard.json
  • tools/signoz/http_errors_dashboard.json

Copilot AI left a comment

Contributor

Warning

  • Copilot's review of this pull request may be incomplete because some of the changed files are excluded by your Copilot content exclusion settings. See Excluding content from Copilot for details.

Pull request overview

Updates Fleet’s OpenTelemetry resource attributes and bundled SigNoz dashboards so operators can filter metrics by deployment environment using a shared environment dashboard variable backed by the deployment.environment resource attribute.

Changes:

  • Emit deployment.environment (default value "default", overridable via OTEL_RESOURCE_ATTRIBUTES) in Fleet OTEL resource attributes.
  • Add an environment dashboard variable + deployment.environment IN $environment filtering to the existing SigNoz dashboards.
  • Add a new pre-canned http_errors_dashboard.json and document it in tools/signoz/README.md.
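
The override behavior in the first bullet can be illustrated with a short standard-library sketch: parse a comma-separated `key=value` list in the style of `OTEL_RESOURCE_ATTRIBUTES` and overlay it on the defaults so that env-provided values win on conflict. This is a simplified stand-in for what the OTEL SDK does, not a copy of it. The function names are hypothetical, and treating values as possibly percent-encoded is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseResourceAttrs parses a comma-separated key=value list in the style of
// OTEL_RESOURCE_ATTRIBUTES. Pairs without "=" or with an empty key are skipped.
func parseResourceAttrs(raw string) map[string]string {
	out := make(map[string]string)
	for _, pair := range strings.Split(raw, ",") {
		key, value, ok := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			continue
		}
		value = strings.TrimSpace(value)
		// Assumption: values may be percent-encoded; keep the raw value on error.
		if decoded, err := url.PathUnescape(value); err == nil {
			value = decoded
		}
		out[key] = value
	}
	return out
}

// mergeAttrs overlays env-provided attributes on the defaults, so the
// environment wins whenever both define the same key.
func mergeAttrs(defaults, fromEnv map[string]string) map[string]string {
	merged := make(map[string]string, len(defaults)+len(fromEnv))
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range fromEnv {
		merged[k] = v
	}
	return merged
}

func main() {
	defaults := map[string]string{
		"service.name":           "fleet",
		"deployment.environment": "default",
	}
	// A literal stands in for os.Getenv("OTEL_RESOURCE_ATTRIBUTES") to keep the demo deterministic.
	fromEnv := parseResourceAttrs("deployment.environment=staging")
	fmt.Println(mergeAttrs(defaults, fromEnv)["deployment.environment"])
}
```

With no override set, the merged result keeps `default`, which is what lets the dashboards rely on the attribute key always being present.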

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
|---|---|
| cmd/fleet/otel.go | Adds default environment resource attributes so SigNoz can dynamically discover and filter by environment. |
| tools/signoz/README.md | Documents the new HTTP errors dashboard and the cross-dashboard environment selector convention. |
| tools/signoz/http_errors_dashboard.json | New dashboard for fleet.http.client_errors / fleet.http.server_errors with environment filtering. |
| tools/signoz/host_cache_dashboard.json | Adds environment variable and filters queries by deployment.environment (also changes hit-rate target/threshold). |
| tools/signoz/database_custom_dashboard.json | Adds environment variable and filters all panels by deployment.environment. |
| changes/47573-signoz-dashboards-deployment-environment | User-visible change entry (content excluded by policy). |
Files excluded by content exclusion policy (1)
  • changes/47573-signoz-dashboards-deployment-environment

Comment thread tools/signoz/host_cache_dashboard.json
Comment thread tools/signoz/host_cache_dashboard.json
@codecov

codecov Bot commented Jun 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.19%. Comparing base (0e9b27f) to head (c40c8da).
⚠️ Report is 30 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main   #47574    +/-   ##
========================================
  Coverage   67.19%   67.19%            
========================================
  Files        3489     3491     +2     
  Lines      228536   228717   +181     
  Branches    11872    11872            
========================================
+ Hits       153557   153695   +138     
- Misses      61154    61186    +32     
- Partials    13825    13836    +11     
| Flag | Coverage Δ |
|---|---|
| backend | 68.83% <100.00%> (+<0.01%) ⬆️ |

@getvictor getvictor marked this pull request as ready for review June 14, 2026 10:35
@getvictor getvictor requested a review from a team as a code owner June 14, 2026 10:35
@qodo-free-for-open-source-projects

CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: aggregate-result

Failed stage: Check for failures [❌]

Failed test name: integration-enterprise-mysql8.0.44

Failure summary:

The action failed because the workflow’s Run failed_tests="" step detected a failed test job from the downloaded status artifacts:
- The status file ./integration-enterprise-mysql8.0.44-status/status contained fail (lines 179-180).
- The script then exited with code 1 after reporting ❌ One or more test jobs failed: integration-enterprise-mysql8.0.44 (lines 189-190).

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

124:  Artifact download completed successfully.
125:  Extracting artifact entry: /home/runner/work/fleet/fleet/mysql-mysql8.0.44-status/status
126:  Extracting artifact entry: /home/runner/work/fleet/fleet/main-mysql8.0.44-status/status
127:  Extracting artifact entry: /home/runner/work/fleet/fleet/integration-core-mysql8.0.44-status/status
128:  Artifact download completed successfully.
129:  Artifact download completed successfully.
130:  Artifact download completed successfully.
131:  Extracting artifact entry: /home/runner/work/fleet/fleet/integration-mdm-mysql8.0.44-status/status
132:  Artifact download completed successfully.
133:  Redirecting to blob download url: https://productionresultssa0.blob.core.windows.net/actions-results/56fb93cc-43c6-4e4d-9729-498524455d84/workflow-job-run-2855d22e-56e8-5feb-91f2-5aa5b8b4a56a/artifacts/023cd68f0028f26ec8a302741c14d1063773dfa881aec556f78d076f4e6e56a0.zip
134:  Starting download of artifact to: /home/runner/work/fleet/fleet/integration-enterprise-mysql8.0.44-status
135:  Extracting artifact entry: /home/runner/work/fleet/fleet/integration-enterprise-mysql8.0.44-status/status
136:  Artifact download completed successfully.
137:  Total of 10 artifact(s) downloaded
138:  Download artifact has finished successfully
139:  ##[group]Run failed_tests=""
140:  failed_tests=""
141:  status_count=0
142:  # Find all status files (they are in directories like 'fleetctl-mysql8.0.44-status/status')
143:  for status_file in $(find ./ -type f -name 'status'); do
144:    status_count=$((status_count + 1))
145:    # Extract test name from parent directory (e.g., 'fleetctl-mysql8.0.44-status')
146:    test_dir=$(basename $(dirname "$status_file"))
147:    # Remove '-status' suffix to get the test name
148:    test_name="${test_dir%-status}"
149:    status_content=$(cat "$status_file")
150:    echo "Processing: $status_file (Test: $test_name) with status content: $status_content"
151:    if grep -q "fail" "$status_file"; then
152:      echo "  ❌ Test failed: $test_name"
153:      failed_tests="${failed_tests}${test_name}, "
154:    else
155:      echo "  ✅ Test passed: $test_name"
156:    fi
157:  done
158:  if [[ $status_count -eq 0 ]]; then
159:    echo "❌ ERROR: No status files found! This indicates a workflow issue."
160:    exit 1
161:  fi
162:  if [[ -n "$failed_tests" ]]; then
163:    echo "❌ One or more test jobs failed: ${failed_tests%, }"
164:    exit 1
165:  fi
166:  echo "✅ All test jobs succeeded."
167:  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
168:  ##[endgroup]
169:  Processing: ./fleetctl-mysql8.0.44-status/status (Test: fleetctl-mysql8.0.44) with status content: success
170:  ✅ Test passed: fleetctl-mysql8.0.44
171:  Processing: ./vuln-mysql8.0.44-status/status (Test: vuln-mysql8.0.44) with status content: success
172:  ✅ Test passed: vuln-mysql8.0.44
173:  Processing: ./service-mysql8.0.44-status/status (Test: service-mysql8.0.44) with status content: success
174:  ✅ Test passed: service-mysql8.0.44
175:  Processing: ./integration-core-mysql8.0.44-status/status (Test: integration-core-mysql8.0.44) with status content: success
176:  ✅ Test passed: integration-core-mysql8.0.44
177:  Processing: ./mysql-mysql8.0.44-status/status (Test: mysql-mysql8.0.44) with status content: success
178:  ✅ Test passed: mysql-mysql8.0.44
179:  Processing: ./integration-enterprise-mysql8.0.44-status/status (Test: integration-enterprise-mysql8.0.44) with status content: fail
180:  ❌ Test failed: integration-enterprise-mysql8.0.44
181:  Processing: ./integration-mdm-mysql8.0.44-status/status (Test: integration-mdm-mysql8.0.44) with status content: success
182:  ✅ Test passed: integration-mdm-mysql8.0.44
183:  Processing: ./scripts-status/status (Test: scripts) with status content: success
184:  ✅ Test passed: scripts
185:  Processing: ./fast-status/status (Test: fast) with status content: success
186:  ✅ Test passed: fast
187:  Processing: ./main-mysql8.0.44-status/status (Test: main-mysql8.0.44) with status content: success
188:  ✅ Test passed: main-mysql8.0.44
189:  ❌ One or more test jobs failed: integration-enterprise-mysql8.0.44
190:  ##[error]Process completed with exit code 1.
191:  Post job cleanup.

Development

Successfully merging this pull request may close these issues.

Standardize SigNoz OTEL dashboards on the deployment.environment variable

3 participants