
Commit 660b8a8

feat(otel_thread_ctx): thread-level ctx publication (#1791)
# What does this PR do?

This PR implements a first version of the publishing side of the [OTel proposal on thread-level context sharing](https://github.com/open-telemetry/opentelemetry-specification/pull/4947/changes). The changes are twofold:

1. Add a C shim for accessing the correct thread-local variable that holds the thread context. The spec mandates the use of the TLSDESC dialect, but this isn't possible in Rust, which uses the classical TLS dialect (`gnu` instead of `gnu2`) by default. This is not configurable on stable Rust. See the additional notes below for more details.
2. Add a new `otel_thread_ctx` module, similar to the `otel_process_ctx` one, which provides an abstraction over the thread-level context and handles attach/detach/modify.

The part of the spec around the interned string table, hooking it up in the process-level ctx, and tracer metadata is left for future work.

# Motivation

See the corresponding OTEP linked above for more details on the motivation.

# Additional Notes

TLSDESC is chosen in the spec for performance reasons. I initially found it a bit sad that we have to call a C function from `libdatadog` to retrieve the TLS address, which incurs an additional cost. I researched potential alternatives:

- force Rust to use the TLSDESC dialect. This is just not possible/supported.
- cache the address (which is stable per thread). But this must be put in...drumroll... another thread-local variable, since it's, well, thread-local, so back to square one. One possibility would be to use a `cached_addr` thread-local and force the access to use the `initial-exec` model, which is very fast (the price to pay is that libdatadog couldn't be loaded with `dlopen` at runtime, but I'm not sure there's any usage for that). We would trade a function call for an offset and loads, which I expect to be faster. But this requires [nightly Rust](https://doc.rust-lang.org/unstable-book/compiler-flags/tls-model.html), unfortunately, and would apply to the entirety of `libdatadog`.
- inline-asm: the access sequence for TLSDESC is really simple (it's a few LLVM IR instructions, calling a function obtained from the global offset table). I looked into inline LLVM assembly for Rust, but it's not supported and is a non-goal (LLVM is almost considered an implementation detail, as rust-gcc could one day become a viable alternative, for example). Native ASM is just too much hassle to pay off.

All in all, I think there's no simple, reasonable, portable and maintainable alternative to the C shim for now. It is worth noting that calling the C shim is likely to still be faster than the default TLS model chosen by Rust in a dynamic library (the latter requires a function call to `__tls_get_addr` anyway, and that function is more involved). I wonder if some aggressive cross-language late LTO could inline the C shim.

# How to test the change?

There are a bunch of tests in this PR. Ideally we will later test this against other thread-level ctx reader implementations once we hook this work into the FFI; this is left for a follow-up.

Co-authored-by: yann.hamdaoui <yann.hamdaoui@datadoghq.com>
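
The note above that the TLS address is stable per thread (and therefore can only be cached in something that is itself thread-local) can be checked with a small sketch in plain Rust. This is not code from the PR: `CTX_SLOT` and `slot_addr` are hypothetical names, and the slot is an ordinary `thread_local!` using Rust's default TLS model, not the TLSDESC-accessed variable exposed by the C shim.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical stand-in for the thread context slot; not the PR's variable.
    static CTX_SLOT: Cell<usize> = Cell::new(0);
}

/// Returns the address of the calling thread's instance of `CTX_SLOT`.
fn slot_addr() -> usize {
    CTX_SLOT.with(|slot| slot as *const Cell<usize> as usize)
}

fn main() {
    // Same thread: repeated accesses resolve to the same address.
    let first = slot_addr();
    let second = slot_addr();
    assert_eq!(first, second);

    // A second thread, running while this one is still alive, has its own instance.
    let other = thread::spawn(slot_addr).join().unwrap();
    assert_ne!(first, other);

    println!("main thread slot: {first:#x}, other thread slot: {other:#x}");
}
```

Because the address differs per thread, a cache for it cannot live in a plain `static`; it would need its own thread-local lookup, which is the circularity described in the notes.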
1 parent 15860bb commit 660b8a8

7 files changed

Lines changed: 671 additions & 2 deletions

Cargo.lock

Lines changed: 1 addition & 0 deletions

libdd-library-config/src/otel_process_ctx.rs

Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,4 @@
-// Copyright 2021-Present Datadog, Inc. https://www.datadoghq.com/
+// Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/
 // SPDX-License-Identifier: Apache-2.0
 
 //! Implementation of the publisher part of the [OTEL process
@@ -8,7 +8,8 @@
 //!
 //! Process context sharing implies concurrently writing to a memory area that another process
 //! might be actively reading. However, reading isn't done as direct memory accesses but goes
-//! through the OS, so the Rust definition of race conditions doesn't really apply.
+//! through the OS, so the Rust definition of race conditions doesn't really apply. We also use
+//! atomics and fences, see MappingHeader's documentation.
 
 #[cfg(target_os = "linux")]
 #[cfg(target_has_atomic = "64")]

libdd-profiling/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -81,4 +81,5 @@ strum = { version = "0.26", features = ["derive"] }
 tempfile = "3"
 
 [build-dependencies]
+cc = "1.1.31"
 cxx-build = { version = "1.0", optional = true }

libdd-profiling/build.rs

Lines changed: 16 additions & 0 deletions
@@ -11,4 +11,20 @@ fn main() {
 
         println!("cargo:rerun-if-changed=src/cxx.rs");
     }
+
+    // Only compile the TLS shim on Linux; the thread-level context feature is Linux-only.
+    #[cfg(target_os = "linux")]
+    {
+        let mut build = cc::Build::new();
+
+        // - On aarch64, TLSDESC is already the only dynamic TLS model so no flag is needed.
+        // - On x86-64, we use `-mtls-dialect=gnu2` (supported since GCC 4.4 and Clang 19+) to force
+        //   the use of TLSDESC as mandated by the spec. If it's not supported, this build will
+        //   fail.
+        #[cfg(target_arch = "x86_64")]
+        build.flag("-mtls-dialect=gnu2");
+
+        build.file("src/tls_shim.c").compile("tls_shim");
+        println!("cargo:rerun-if-changed=src/tls_shim.c");
+    }
 }

libdd-profiling/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -14,5 +14,6 @@ pub mod cxx;
 pub mod exporter;
 pub mod internal;
 pub mod iter;
+pub mod otel_thread_ctx;
 pub mod pprof;
 pub mod profiles;
