OTEP: Thread Context: Sharing Thread-Level Information with external readers #4947
scottgerring wants to merge 13 commits into open-telemetry:main
Now that #4719 has been accepted, this one is ready for review!
Co-authored-by: Attila Szegedi <szegedi@users.noreply.github.com>
* **Inability to correlate observations with context metadata** - Without visibility into context information such as the active span, an external reader cannot attribute its observations to particular HTTP endpoints or other request characteristics
* **Lack of request metadata for samples collected on threads with un-sampled traces** - in many cases, the active span observed by the external process *may not* be sampled by the OpenTelemetry SDK. In these cases it is useful to make extra metadata available directly to the external process, so that its samples maintain useful context even in the face of sampling on the tracer side.
Preventing duplication - out-of-process readers, such as OTel OBI, could avoid redundant instrumentation efforts if processes instrumented with the OTel SDK shared the necessary thread context information.
## Explanation

We propose a mechanism for OpenTelemetry SDKs to publish thread-level information reflecting the context of the active request, if any, through a standard format based on the ELF Thread Local Storage (TLS) TLSDESC dialect.
As the focus of this OTEP is on ELF Thread Local Storage, maybe there should be a paragraph about the support scope, for example stating that it is limited to Unix environments and excludes Windows and macOS.
# Thread Context: Sharing Thread-Level Information with External Readers

Introduce a standard mechanism for OpenTelemetry SDKs to publish thread-level attributes for out-of-process readers such as the OpenTelemetry eBPF profilers.
It is related to [OTEP 4719: Process Context](4719-process-ctx.md).
Maybe it should be mentioned how this OTEP relates to 4719: how it extends that effort and what the key differences are.
The following values are stored:

* `threadlocal.schema_version` - type and version of the schema - initially "tlsdesc\_v1\_dev" for experimentation (to be changed to "tlsdesc\_v1" once the OTEP gets merged)
Has there been any progress or discussion with the SemConv SIG on adding these attributes and claiming the `threadlocal` namespace?
Changes
External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot collect information about active OpenTelemetry traces running within the process they are sampling. We (@ivoanjo and @scottgerring) propose a mechanism for OpenTelemetry SDKs to publish thread-level attributes, including trace ID, span ID, and configurable custom attributes, through a standard format based on the ELF Thread Local Storage (TLS) TLSDESC dialect.
Because this mechanism relies on having a native component and on knowing when a runtime switches contexts, we consider it optional for SDKs to support: for some runtimes (or even runtime versions), implementing it may be infeasible, inefficient, or undesirable from a maintenance perspective.
When a request context is attached or detached from a thread, the SDK publishes select information to a thread-local variable that external readers such as the eBPF profiler can discover and read. This enables correlation of profiling samples with request context, even when the active span was not sampled by the SDK.
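As a rough illustration of the attach/detach flow described above, here is a minimal C sketch. The record layout, the `valid` flag, and every `example_*` name are assumptions made for this example only; the OTEP itself defines the actual format and symbol names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, for illustration only. */
typedef struct {
    volatile uint8_t valid;   /* 1 while a request context is attached */
    uint8_t trace_id[16];
    uint8_t span_id[8];
} example_thread_ctx;

/* One instance per thread, zero-initialized. The symbol name is an
 * assumption; an external reader would need to discover its location. */
_Thread_local example_thread_ctx example_otel_thread_ctx;

/* Called when a request context is attached to the current thread. */
void example_ctx_attach(const uint8_t trace_id[16], const uint8_t span_id[8]) {
    assert(trace_id != NULL && span_id != NULL);
    example_thread_ctx *ctx = &example_otel_thread_ctx;

    ctx->valid = 0;                             /* hide the record while updating */
    atomic_signal_fence(memory_order_seq_cst);  /* compiler barrier only */
    memcpy(ctx->trace_id, trace_id, sizeof ctx->trace_id);
    memcpy(ctx->span_id, span_id, sizeof ctx->span_id);
    atomic_signal_fence(memory_order_seq_cst);
    ctx->valid = 1;                             /* publish the new context */
}

/* Called when the request context is detached from the current thread. */
void example_ctx_detach(void) {
    example_otel_thread_ctx.valid = 0;
}
```

The fences here only constrain compiler reordering, on the assumption that the reader samples the thread while it is interrupted; a reader running concurrently on another CPU would need stronger synchronization.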
This builds on and extends OTEP 4719: Process Context, using its process context mechanism to store the static, process-scoped reference data (schema version and attribute key map) that the thread-local records reference.
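To make the split between process-scoped and thread-scoped data more concrete, the following C sketch shows how a reader might resolve an attribute in a thread-local record against the process-level key map. It assumes that records refer to keys by index, and all struct and function names are hypothetical; only the schema version strings come from the OTEP text. A real external reader would also read these values from another process's memory rather than through plain pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the static, process-scoped reference data. */
typedef struct {
    const char *schema_version;         /* e.g. "tlsdesc_v1_dev" */
    const char *const *attribute_keys;  /* attribute key map */
    size_t attribute_key_count;
} example_process_ctx;

/* Hypothetical thread-local attribute entry: key index plus value. */
typedef struct {
    uint8_t key_index;
    const char *value;
} example_thread_attr;

/* Accept only the schema versions named in the OTEP text. */
int example_schema_supported(const char *version) {
    return strcmp(version, "tlsdesc_v1_dev") == 0 ||
           strcmp(version, "tlsdesc_v1") == 0;
}

/* Returns the attribute key name, or NULL if the schema is unknown
 * or the index falls outside the key map. */
const char *example_resolve_key(const example_process_ctx *pc,
                                const example_thread_attr *attr) {
    assert(pc != NULL && attr != NULL);
    if (!example_schema_supported(pc->schema_version)) {
        return NULL;
    }
    if (attr->key_index >= pc->attribute_key_count) {
        return NULL;
    }
    return pc->attribute_keys[attr->key_index];
}
```

Keeping the key names in the process context means each thread-local record only has to carry small indices and values, which keeps the per-thread update on context switches cheap.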
Why open as draft: OTEP 4719 is a dependency for this OTEP, and thus we'll need to wait for that OTEP to land to ensure we have a solid underpinning to build on.
This OTEP is based on and heavily inspired by the custom-labels work by [Polar Signals](https://www.polarsignals.com/) and the universal profiling integration by Elastic. Big thanks to them for the prior art and inspiration, and to everyone who collaborated with us on the draft Google Doc this is based on.
CHANGELOG.md file updated for non-trivial changes