feat: Implement process context publishing (OTEP-4719)#3460
feat: Implement process context publishing (OTEP-4719)#3460scottgerring wants to merge 6 commits intoopen-telemetry:mainfrom
Conversation
0c9f6f1 to
f57d295
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3460 +/- ##
=======================================
- Coverage 83.2% 83.2% -0.1%
=======================================
Files 128 130 +2
Lines 25086 25364 +278
=======================================
+ Hits 20896 21125 +229
- Misses 4190 4239 +49 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f57d295 to
1099f1b
Compare
| @@ -0,0 +1,139 @@ | |||
| //! Process context sharing via memory-mapped regions. | |||
There was a problem hiding this comment.
I've pulled this up to represent the public API, subbing in either a linux-64 or a no-op impl based on the target
| tonic-prost-build = { workspace = true } | ||
| tempfile = { workspace = true } | ||
| serde_json = { workspace = true } | ||
| serial_test = { workspace = true } |
There was a problem hiding this comment.
Makes it easier for testing
|
@cijothomas @lalitb it would be great to get alignment on where this should live (see my idea of the options in the PR text) first! |
| /// is Undefined Behavior, for example. We assume that a forking runtime (such as Python or | ||
| /// Ruby) that doesn't follow with an immediate `exec` is already "taking that risk", so to | ||
| /// speak (typically, if no thread is ever spawned before the fork, things are mostly fine). | ||
| pub(crate) fn publish_raw_payload(payload: Vec<u8>) -> Result<(), Error> { |
There was a problem hiding this comment.
I wonder if the whole fork business should really be here? I suppose support for languages with a forking runtime isn't really Datadog-specific, but one could also leave that out for a first implementation. I don't really have an opinion myself, just asking.
There was a problem hiding this comment.
Good catch; this is certainly not a common pattern in Rust! I will remove it.
There was a problem hiding this comment.
I also meant the whole pid business in the code. That is, I think we can simplify the code a bit if we don't care about forks (basically removing the pid field and the case Some(handle) => below when pid != getpid()). We would also get away from the whole "memory allocation is UB after a fork" trap: we don't pretend to handle arbitrary forks anymore.
Co-authored-by: Yann Hamdaoui <yann.hamdaoui@gmail.com> Co-authored-by: Scott Gerring <scottgerring@users.noreply.github.com>
One or more co-authors of this pull request were not found. You must specify co-authors in commit message trailer via: Supported
Alternatively, if the co-author should not be included, remove the Please update your commit message(s) by doing |
Hey @yannham , do you mind signing the CTA now that i've taken your suggestion ? |
Summary
Implements the publisher side of OTEP-4719 (Process Context Sharing) behind the
process-contextfeature flag onopentelemetry-proto. This is helpful for signal correlation - and together with the thread context sharing (OTEP-4947), will allow the Rust community to have request-correlated profiling support. This is a linux-only mechanism by design and the API to support it provides a no-op impl for other targets.On Linux, this publishes SDK resource attributes to a named memory mapping (
OTEL_CTX) so external readers (such as the OpenTelemetry eBPF Profiler) can discover process metadata without direct integration or process activity.This is a precursor for OTEP-4947 which will enable per-thread resource attribution building on top of the process-level mechanism implemented here.
The implementation is adapted from the implementation used in libdatadog, a shared library we use underneath many of our own datadog libraries.
Usage
Decisions
Why
opentelemetry-proto?I reckon this probably belongs in
opentelemetry-sdk, but becauseopentelemetry-protodepends on that for its transformation layer we can't reference our proto types there (see #3045).We have options:
I'd like to do (3) (and am happy to impl it myself), but as I poked this issue recently and heard nothing back I am hesitant to put it in the dependency path for this.
Publishing mechanism
I've made this explicit - the user must explicitly use this API to publish - look at the updated example code. I'd prefer if this happened automatically, but because each signal stands alone and is provided its own resource, it is difficult to envisage how an automatic publishing mechanism should work. If anyone has any better ideas lmk.
Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial, user-facing changes