Skip to content

Fix shared pidns disambiguation#1875

Open
NimrodAvni78 wants to merge 7 commits intoopen-telemetry:mainfrom
coralogix:nimrodavni78/fix-shared-pidns-disambiguation
Open

Fix shared pidns disambiguation#1875
NimrodAvni78 wants to merge 7 commits intoopen-telemetry:mainfrom
coralogix:nimrodavni78/fix-shared-pidns-disambiguation

Conversation

@NimrodAvni78
Copy link
Copy Markdown
Contributor

@NimrodAvni78 NimrodAvni78 commented Apr 18, 2026

Summary

  • Fix PodContainerByPIDNs to use the host PID for disambiguation when multiple containers share a PID namespace (e.g. hostPID: true pods all map to the host init_pid_ns inode)
  • Previously, the function iterated a Go map and returned the first entry, causing non-deterministic misattribution of k8s metadata across pods
  • Add unit tests for the shared PID namespace scenario

Resolves #1871

Testing

  • Added unit tests for PodContainerByPIDNs
  • Added integration tests for situations where an application and a daemonset are running together to make sure there is no misattribution
  • Tested on openshift cluster and missattributions were gone

Validation

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 80.55556% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.32%. Comparing base (4590ae5) to head (a34c8b9).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
pkg/kube/store.go 78.78% 6 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1875      +/-   ##
==========================================
- Coverage   69.40%   69.32%   -0.08%     
==========================================
  Files         277      277              
  Lines       32992    33280     +288     
==========================================
+ Hits        22897    23072     +175     
- Misses       8882     8972      +90     
- Partials     1213     1236      +23     
Flag Coverage Δ
integration-test 56.31% <0.00%> (+0.08%) ⬆️
integration-test-arm 29.38% <0.00%> (+0.12%) ⬆️
integration-test-vm-x86_64-5.15.152 30.09% <0.00%> (+0.84%) ⬆️
integration-test-vm-x86_64-6.10.6 29.60% <0.00%> (-0.44%) ⬇️
k8s-integration-test 42.70% <70.96%> (-0.23%) ⬇️
oats-test 38.16% <0.00%> (+0.26%) ⬆️
unittests 58.39% <93.54%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Kubernetes metadata misattribution when multiple containers share the same PID namespace (notably hostPID: true pods mapping to the host init_pid_ns) by disambiguating via host PID, and adds coverage for the shared-PID-namespace scenario.

Changes:

  • Update PodContainerByPIDNs to prefer an exact host-PID match and avoid nondeterministic selection when PID namespaces are shared.
  • Propagate host PID into call sites that decorate spans and process events.
  • Add unit + integration tests covering shared PID namespace attribution (Deployment + hostPID DaemonSet).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/transform/k8s.go Passes host PID into PodContainerByPIDNs for correct disambiguation during decoration.
pkg/kube/store.go Implements deterministic PID-namespace lookup with host-PID-based disambiguation and safe fallback behavior.
pkg/kube/store_test.go Adds unit tests for exact-match, unambiguous fallback, and ambiguous shared-namespace behavior.
internal/test/integration/k8s/sharedpidns/k8s_sharedpidns_test.go Adds integration assertions ensuring no cross-pod metadata leakage under shared PID namespaces.
internal/test/integration/k8s/sharedpidns/k8s_sharedpidns_main_test.go Sets up a dedicated Kind-based integration scenario for shared PID namespaces.
internal/test/integration/k8s/manifests/05-hostpid-daemonset.yml Introduces a hostPID DaemonSet workload to reproduce the shared PID namespace condition.
internal/test/integration/k8s/manifests/06-obi-daemonset-sharedpidns.yml Adds an OBI config/DaemonSet manifest to instrument both workloads in the shared-PIDNS test.

Comment thread pkg/kube/store.go Outdated
Comment thread internal/test/integration/k8s/sharedpidns/k8s_sharedpidns_test.go Outdated
Comment thread internal/test/integration/k8s/sharedpidns/k8s_sharedpidns_test.go Outdated
NimrodAvni78 and others added 3 commits April 19, 2026 10:53
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread pkg/kube/store.go
continue
}
if info.ContainerID != pick.ContainerID {
s.log.Debug("cannot disambiguate shared PID namespace without matching host PID; skipping k8s decoration",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we increase log verbosity here? not sure between info/warn

Copy link
Copy Markdown
Contributor

@rafaelroquetto rafaelroquetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-rhetorical question: why can't we just rely on the host pid?

@NimrodAvni78
Copy link
Copy Markdown
Contributor Author

NimrodAvni78 commented Apr 21, 2026

Non-rhetorical question: why can't we just rely on the host pid?

I think it might work, i can try it out and test myself
maybe @grcevski @mariomac have any input here on why it was done using pidNs instead of hostPID

Edit: not sure we can only rely on hostPID, we do register child processes to the map, but it is done via polling, if an event from a child process of a normal pod (no hostPID:true) comes without the childPID being registered yet we wont find it, right now we will find the correct pidNs and attribute it to the correct pod

i think the only situation where there will be a problem with the newly implemented solution is for hostPID:true pods with child processes, and they have yet to be registered via the polling cycle. in that case we will hit the case where we see multiple processes from multiple containers under the same pidNS, we will see multiple infos from multiple containers without an exact hostPID match, so we decide to not enrich the span instead of picking a random container (what the previous solution has done)

tbh not sure how to fix that, unless we move to monitoring fork/exec via eBPF to not allow for periods where pids are not registred

@MrAlias
Copy link
Copy Markdown
Contributor

MrAlias commented Apr 22, 2026

SIG meeting: this may be specific to deployment difference between sidecar and daemonset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Spans misattributed across pods sharing a PID namespace

5 participants