feat(v17): Implement tracing channels by logaretm · Pull Request #4670 · graphql/graphql-js

logaretm · 2026-04-18T04:22:20Z

This PR implements tracing channel support as discussed in #4629

graphql-js auto-resolves node:diagnostics_channel at module load via process.getBuiltinModule, with a require fallback for Node 20.0–20.15. On runtimes where node:diagnostics_channel is unavailable (e.g. browsers) the load silently no-ops and emission sites short-circuit. APMs subscribe purely on the subscriber side — no interaction with the graphql API required.

Most of the changes are tests; the actual source changes are mostly tame and introduce wrappers around pre-existing code paths.

API

graphql-js vendors a minimal structural subset of the tracing channel types for internal use. Ideally these wouldn't include deep specifics like runStores, but that surface is required to maintain the sync-or-async behavior we have today. The alternative is to normalize to async for all calls which isn't ideal and probably would have a perf implication.

The implemented channels are:

Channel	Context fields
`graphql:parse`	`source`
`graphql:validate`	`schema`, `document`
`graphql:execute`	`schema`, `document`, `variableValues`, `operationName` (lazy), `operationType` (lazy)
`graphql:execute`	`schema`, `operation`, `variableValues`, `operationName`, `operationType`
`graphql:subscribe`	`schema`, `document`, `variableValues`, `operationName` (lazy), `operationType` (lazy)
`graphql:resolve`	`fieldName`, `parentType`, `fieldType`, `args`, `isTrivialResolver`, `fieldPath` (lazy)

Usage

APMs and users subscribe by channel name on their own node:diagnostics_channel import — no graphql API call needed:

import dc from 'node:diagnostics_channel';

// Subscribe to any of the 5 tracing channels; an APM would typically
// subscribe to all of them.
dc.tracingChannel('graphql:resolve').subscribe({
  start: () => {
    // start a span or log or whatever....
  },
  end: ({ result }) => {
    // if result is present then it is a sync resolve, otherwise they should end the span in asyncEnd
  },
  asyncEnd: ({ result }) => {
    // end the span
  },
  error: ({ error }) => {
    // report resolve errors
  }
});

I tested this locally in a dashboard and it produces waterfall structure like so with basic Sentry integration:

You can view that trace here.

closes #4629

vercel · 2026-04-18T04:22:27Z

@logaretm is attempting to deploy a commit to the The GraphQL Foundation Team on Vercel.

A member of the Team first needs to authorize it.

linux-foundation-easycla · 2026-04-18T04:22:29Z

The committers listed above are authorized under a signed CLA.

✅ login: logaretm / name: Abdelrahman Awad (03c9fea, 04a3054, 11e890d, 13ccd02, 1d469bb, 23124e5, 2a9d88a, 2e9ebd5, 68d0f57, 82d00ac, 8a1e473, 8a70ca2, 9ad8549, 9efde95, abc84db, acb1945, b84efbd, bbddc0c, d3f2943, ea6e2d5, ec51b09)

Copilot

Pull request overview

This PR introduces opt-in diagnostics tracing channels for GraphQL core operations while keeping the codebase runtime-agnostic (no direct node:diagnostics_channel dependency) by requiring consumers to register a compatible module via enableDiagnosticsChannel().

Changes:

Added src/diagnostics.ts with minimal duck-typed interfaces and maybeTrace* helpers, plus public exports from src/index.ts.
Wrapped parse, validate, execute/subscribe, and field resolution to emit graphql:* tracing channel lifecycles with lazy context fields where appropriate.
Added unit tests (via a fake diagnostics channel) and a Node integration test project to validate real node:diagnostics_channel behavior.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/diagnostics.ts	New core diagnostics/tracing channel registration + trace helper implementations.
src/index.ts	Exposes `enableDiagnosticsChannel` and related public types from the package entrypoint.
src/language/parser.ts	Wraps `parse()` with tracing emission.
src/validation/validate.ts	Wraps `validate()` with tracing emission.
src/execution/execute.ts	Wraps `execute()`/`subscribe()` and related execution entrypoints with tracing + lazy operation metadata.
src/execution/Executor.ts	Wraps field resolver invocation to emit `graphql:resolve` lifecycle events with lazy `fieldPath`.
src/testUtils/fakeDiagnosticsChannel.ts	Adds fake tracing channel implementation to support deterministic unit tests.
src/tests/diagnostics-test.ts	Verifies registration behavior and channel identity semantics.
src/language/tests/parser-diagnostics-test.ts	Validates `graphql:parse` lifecycle emission.
src/validation/tests/validate-diagnostics-test.ts	Validates `graphql:validate` lifecycle emission including error paths.
src/execution/tests/execute-diagnostics-test.ts	Validates `graphql:execute` lifecycle emission for sync/async/error paths.
src/execution/tests/subscribe-diagnostics-test.ts	Validates `graphql:subscribe` emission for sync vs async subscription setup.
src/execution/tests/resolve-diagnostics-test.ts	Validates `graphql:resolve` emission for sync/async/error and lazy `fieldPath`.
resources/integration-test.ts	Adds the new `diagnostics` integration test project to the integration suite.
integrationTests/diagnostics/test.js	Real Node integration tests for channel lifecycles and ALS propagation.
integrationTests/diagnostics/package.json	Adds diagnostics integration test project metadata and node engine constraint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

yaacovCR · 2026-04-18T19:19:04Z

+ *   - `graphql:subscribe`
+ *   - `graphql:resolve`
+ *
+ * Calling this repeatedly is safe: subsequent calls replace the stored


is a potential failure mode if this is for some reason called by different APMs with different values?

As long as it is the actual diagnostic channel module, it will work fine. The failure mode is if one APM provides an implementation and a different one provides another. The diagnostic channel runtime implementations today are idempotent AFAIK so it doesn't matter as long everyone ends up on the same channels.

I think this is a highly specialized case, I initially considered either throwing to force only implementation over the other OR silently NOOP but I that favors whoever comes first which isn't ideal.

WDYT, should this be tweaked?

looping @Qard here as he might have some thoughts here.

We could support multiple diagnostic channels just to avoid this failure mode. Most of the time there would be just one and we would pay the cost of indexing the one item array on every publish, but it seems safest. Thoughts?

in terms of costs in general the introspection query seems to take a 5% hit.

I wonder if that could be reduced if instead of using wrapping functions, we embedded the published decision within the execution code which already tests for sync/async. I guess we would also need some logic to make sure is Publishing/hasSubscribers is uniform

We could support multiple diagnostic channels just to avoid this failure mode...

Storing two refs means publishing the same event to the same underlying channel twice, which double-fires every subscriber or more depending on how many enableDiagnosticsChannel calls with the same dc module were made.

In server environments it is very unlikely they would resort to anything else other than the diagnostic_channels module available on the runtime because it is already on all these runtimes, yet it is still possible. I think this failure is not worth the added complexity here.

Dedup isn't easy either, import/require would normally resolve the same module, but with mixed cjs/esm cases they would have different references so double fires will happen in that case. A shared global channel is a strength of diag channels, so we should lean on it and trust the users/runtimes do it properly.

We could throw in that case specifically when different dc implementations are detected to make this visible. What do you think?

I wonder if that could be reduced if instead of using wrapping functions, we embedded the published decision within the execution code...

We can try but the issue is runStores will have to wrap everything in the code paths to maintain AsyncLocalStorage store propagation from start->asyncEnd which means either we duplicate a lot of code (traced vs. untraced paths) or take the hit 🤔

I might've overthinked that last part, pushed 908e1ee to see if it can help. It caches the trace decision on the executor instance and avoids allocating closures if no traced calls are made.

Thanks for checking, I had AI run similar tests but it seems like it used a different module loader or createRequire and they were two different references 🤔 but maybe the test was flawed.

Uh-oh I would be interested in how this could fail? Do you have a way of sharing the failing test code?

Sure I put it up here:

https://github.com/logaretm/node-dc-identity-repro

That makes sense, and you also point to how we can deduplicate around the identity of dc.tracingChannel, so that seems workable for this hopefully unlikely failure mode.

I would assume import * as dc from 'node:diagnostics_channel would be the same as dc = import('node:diagnostics_channel), both would be covered by checking dc.tracingChannel

Actually, @logaretm on further reflection, because APMs would own the process and instructions for browser bundlers, I think we can use the original mechanism you suggested which implicitly enables directly from node. Apologies for the back and forth.

No worries, happy to re-work the PR and thanks for discussing it!

yaacovCR · 2026-04-21T07:52:05Z

The implemented channels are...

Please excuse my limited understanding of the current tracing space, but is there a concept of nesting of channels? In theory, I would expect we would want to:

(1) nest the "execute" channel for subscription events under "subscribe" and
(2) nest "resolve" under "execute"

Naively, I would assume this would be very useful information to understanding the hierarchy.

And I would think this would provide a way to pass the overarching context of the subscribe option to execute (i.e. execute would always have the document, as in your description of the channels/context within the linked issue #4629)?

And similarly, we could also have the overall execute (or subscribe/execute) passed to resolve?

logaretm · 2026-04-21T16:18:03Z

@yaacovCR Happy to elaborate more!

is there a concept of nesting of channels?

From a purely diag/tracing channels API perspective, there's no parent/child relationship in the API. Each channel is independent. graphql:execute and graphql:resolve don't know about each other at the module level.

That said, the hierarchy you're describing does exist at runtime, it's just reconstructed on the subscriber side rather than on the emitter side. The mechanism is done through AsyncLocalStorage:

You can check the image I had in the PR description that represents how it looks like, or check this trace view as you can see the spans are nested under one another.

This works because tracing channels propagate the async execution context (unlike plain event emitters). Subscribers bind an AsyncLocalStorage to the channel, and every downstream channel fired under that sees the same store via async_hooks. Think of it as an implicit requestId threaded through by the runtime.

It's why I kept the runStores wrapper in the types and impl even after dropping tracePromise. TracingChannel.traceSync/tracePromise wrap runStores for you, but we roll the lifecycle manually here to handle the mixed sync-or-promise return path cleanly, so we need to call runStores ourselves to preserve ALS propagation.

Now, for graphql as the emitter, there are a couple of reasons not to encode the hierarchy ourselves:

GraphQL wouldn't know if the channels are nested, unless we implement logic that does that or use ALS ourselves but that's strictly a node API that's not polyfillable.
We'd have to track the hierarchy unconditionally even when nothing is subscribed because we don't know at emit time whether a subscriber cares. That defeats the hasSubscribers short-circuit we use on the hot paths today.
Subscribers can attach to one channel or several or all of them. If graphql baked the hierarchy into the ctx object, we risk breaking it for those partial subs.

The good news is subscribers will have access to document and other parent event objects naturally through their ALS if they want to, graphql doesn't have to do it and subscribers get it for free.

Here is a quick example snippet:

import { AsyncLocalStorage } from 'node:async_hooks';
import dc from 'node:diagnostics_channel';
import { enableDiagnosticsChannel } from 'graphql';

enableDiagnosticsChannel(dc);

const executeCtx = new AsyncLocalStorage();

const executeChannel = dc.tracingChannel('graphql:execute');
const resolveChannel = dc.tracingChannel('graphql:resolve');

// Bind on `start` only, the store activates when execute's runStores frame
// opens and stays active through the entire lifecycle. `end`, `asyncStart`,
// `asyncEnd`, and `error` all fire inside that frame (sync handlers directly,
// async ones via async_hooks propagation), so they inherit the same store
// without any extra wiring.
executeChannel.start.bindStore(executeCtx, ctx => ({
  document: ctx.document,
  operationName: ctx.operationName,
  startedAt: Date.now(),
}));

// Subscribe to the full execute lifecycle. Every handler sees the same store.
executeChannel.subscribe({
  start: () => {
    const { operationName } = executeCtx.getStore();
    console.log(`[execute:start] ${operationName}`);
  },
  end: () => {
    const { operationName, startedAt } = executeCtx.getStore();
    console.log(`[execute:end]   ${operationName} (+${Date.now() - startedAt}ms sync)`);
  },
  asyncStart: () => {
    const { operationName } = executeCtx.getStore();
    console.log(`[execute:asyncStart] ${operationName}`);
  },
  asyncEnd: () => {
    const { operationName, startedAt } = executeCtx.getStore();
    console.log(`[execute:asyncEnd]   ${operationName} (+${Date.now() - startedAt}ms total)`);
  },
  error: ({ error }) => {
    const { operationName } = executeCtx.getStore();
    console.log(`[execute:error] ${operationName} -> ${error.message}`);
  },
});

// Nested channels inherit the parent's store too, because resolve fires
// beneath execute's runStores frame.
resolveChannel.subscribe({
  start: resolveCtx => {
    const parent = executeCtx.getStore();
    console.log(`  [resolve:start] ${resolveCtx.fieldPath} (under ${parent.operationName})`);
  },
  asyncEnd: resolveCtx => {
    const parent = executeCtx.getStore();
    console.log(`  [resolve:asyncEnd] ${resolveCtx.fieldPath} (under ${parent.operationName})`);
  },
  error: ({ error }, name) => {
    const parent = executeCtx.getStore();
    console.log(`  [resolve:error] under ${parent.operationName}: ${error.message}`);
  },
});

yaacovCR · 2026-04-22T10:24:58Z

@yaacovCR Happy to elaborate more!

This was hugely helpful. On this background, I am wondering whether instead of:

Channel	Context fields
`graphql:parse`	`source`
`graphql:validate`	`schema`, `document`
`graphql:execute`	`schema`, `document`, `variableValues`, `operationName` (lazy), `operationType` (lazy)
`graphql:subscribe`	`schema`, `document`, `variableValues`, `operationName` (lazy), `operationType` (lazy)
`graphql:resolve`	`fieldName`, `parentType`, `fieldType`, `args`, `isTrivialResolver`, `fieldPath` (lazy)

Maybe something like the below would be better for our official tracing channels:

Channel	Context fields
`graphql:parse`	`source`
`graphql:validate`	`schema`, `document`
`graphql:execute`	`schema`, `document`, `variableValues`, `operationName` (lazy), `operationType` (lazy)
`graphql:executeQueryMutationOrSubscriptionEvent`	`schema`, `operation`, `variableValues`, `operationName`, `operationType`
`graphql:resolve`	`fieldName`, `parentType`, `fieldType`, `args`, `isTrivialResolver`, `fieldPath` (lazy)

The idea would be that subscribe is not really "a thing" according to the spec, and in theory our separate execute and subscribe endpoints are just a way to keep track of the relationship between execute and executeSubscriptionEvent. This would nest executeQueryMutationOrSubscriptionEvent under every execute call for queries, mutations, and subscriptions, with one executeQueryMutationOrSubscriptionEvent for each query/mutation execute, and a variable number of executeQueryMutationOrSubscriptionEvent for a subscription execute.

The downside is you can no longer top-level distinguish between subscriptions and queries/mutations. Do we want to preserve that? Do we want to top-level distinguish between each of our operations? What about between queries and mutations?

Again, I am asking these questions with very little knowledge of how the tracing system is used in practice, hopefully you can guide me here?

Looping in @graphql/tsc if they have any thoughts.

logaretm · 2026-04-22T21:27:51Z

@yaacovCR

Maybe something like the below would be better for our official tracing channels:

We could do graphql:operation as top level and graphql:execute 1x for queries/mutations and N x per sub. Does that track or do we want that lengthy name instead 😅

The downside is you can no longer top-level distinguish between subscriptions and queries/mutations...

The distinction can be done with the args already sent to the execute channel via the operationType property we have right now. Channel level distinction is unlikely to be worth it given how low the overhead is.

I initially added a subscribe channel because it felt like its own thing, but it is better to name channels from GQL's POV, like "I'm doing this thing right here....".

What do u think?

yaacovCR · 2026-04-24T04:02:07Z

What do u think?

Naming = so hard. Definitely like shorter. But I do prefer consistency with the other verbs we have for our channels.

We could consider:

execute vs. executeQueryMutationOrSubscriptionEvent (not great if another operation type lands)
operation vs. execute (not great as operation = noun)
execute vs. operation (ha! both are not great as it's not clear really which should be the wrapper, although this is incorrect as the nested executeSubscriptionEvent is not an operation, it's more like a sub-operation which is not a thing.)
executeRequest vs. executeOperation (not correct, a subscription operation does not contain multiple underlying operations, as above)
executeRequest vs. executeRootSelectionSet (correct and technically disambiguated, as when we execute a single root field to create the source event stream for a subscription, we do not [currently] allow for merging multiple fields by utilizing multiple root fields, so on the subscription side it is [presently] executeSingleAllowedRootField and if we ever change that we could just call that part createSourceEventStream)

I am leaning toward the last option, again looping in better namers @graphql/tsc @graphql/tsc-emeriti

yaacovCR · 2026-04-24T04:22:18Z

@logaretm

From my machine on 17.x.x I ran:

npm run benchmark -- benchmark/introspectionFromSchema-benchmark.js benchmark/list-sync-benchmark.js benchmark/list-async-benchmark.js benchmark/list-asyncIterable-benchmark.js benchmark/object-async-benchmark.js --revs pr/4670 17.x.x

And got:

On our real-world large sync query we got a 7% slowdown. This is the most concerning result.
On a synthetic sync query with a lot of list items we got a 23% slowdown, also concerning, although the query is artificial.
On synthetic async queries, we were at parity or faster, somehow thanks to this PR and presumably v8 magic!

Not sure if you have any insight. I would assume most of this is secondary to graphql:resolve. Ideally we would of course have no slowdowns at all, at least on these benchmarks when tracing is not enabled.

logaretm · 2026-04-24T06:32:02Z

@yaacovCR I will re-work the PR to use the autoloading snippet and I will dig into the benchmarks and see what I can do, ideally it should offer close to zero slowdown so I must be doing something wrong.

yaacovCR · 2026-04-24T06:52:04Z

executeRequest vs. executeRootSelectionSet (correct and technically disambiguated, as when we execute a single root field to create the source event stream for a subscription, we do not [currently] allow for merging multiple fields by utilizing multiple root fields, so on the subscription side it is [presently] executeSingleAllowedRootField and if we ever change that we could just call that part createSourceEventStream)

I am leaning toward the last option, again looping in better namers @graphql/tsc @graphql/tsc-emeriti

Although maybe we should do instead:
graphql:execute and graphql:execute:rootSelectionSet
or even
graphql:execute and graphql:execute:root-selection-set

is there a convention in the diagnostic channels world?

logaretm · 2026-04-24T07:29:24Z

@yaacovCR Not a strict convention, but this is what is currently recommended.

For the names, I think we can keep the inner one called graphql:execute for the outer one, how about:

graphql:request
graphql:run
graphql:process
graphql:handle

Naming is hard 😅

yaacovCR · 2026-04-24T13:00:41Z

Not necessarily related, but possibly, what would a trace look like for incremental delivery?

Options abound because of the disconnect between execution groups and delivery groups.

Introduces src/diagnostics.ts which exposes a single public function, enableDiagnosticsChannel(dc), that APMs use to register a node:diagnostics_channel-compatible module with graphql-js. Channel names (graphql:parse, graphql:validate, graphql:execute, graphql:resolve, graphql:subscribe) are owned by graphql-js so multiple APM subscribers converge on the same cached TracingChannel instances. Structural MinimalChannel / MinimalTracingChannel / MinimalDiagnosticsChannel types describe the subset of the Node API graphql-js needs; no dependency on @types/node is introduced, and no runtime-specific import is added to the core. Subsequent commits wire emission sites into parse, validate, execute, subscribe, and the resolver path.

Wraps parse() with the graphql:parse channel when any subscribers are attached. The published context carries the Source (or string) handed to parse() so APM tools can record the GraphQL source on their span. Adds two shared helpers on src/diagnostics.ts used by every subsequent emission site: maybeTraceSync(name, ctxFactory, fn) maybeTracePromise(name, ctxFactory, fn) They share a shouldTrace gate that uses hasSubscribers !== false so runtimes without an aggregated hasSubscribers getter (notably Node 18, see nodejs/node#54470) still publish; Node's traceSync/tracePromise gate each sub-channel internally. Context construction is lazy through the factory so the no-subscriber path allocates nothing beyond the closure.

…gnostics_channel Adds an integrationTests/diagnostics/ project that installs the built graphql package, registers node:diagnostics_channel, and asserts that subscribers to graphql:parse receive the expected start/end/error lifecycle events. Complements the unit tests (which use a FakeDc) with coverage against a real TracingChannel. The in-tree eslint rules forbid node: imports and flag TracingChannel as experimental in src/; integrationTests/* overrides already allow the node: import, and a file-scope disable handles the experimental warning. The integration project pins engines to >=22 to match where graphql-js v17 is actively tested. This file grows as later channels (validate, execute, subscribe, resolve) are wired up.

Wraps the public validate() entry point with maybeTraceSync. The context carries both the schema being validated against and the parsed document, so APM tools can associate the span with a concrete GraphQL operation. validate() is synchronous so traceSync is appropriate; errors thrown by assertValidSchema or the visitor rethrow propagate through the channel's error sub-channel. Also extracts the in-memory TracingChannel fake used by parser-diagnostics into src/__testUtils__/fakeDiagnosticsChannel.ts so subsequent emission sites (validate, execute, subscribe, resolve) can reuse it without duplicating the lifecycle simulation.

Wraps the public execute / experimentalExecuteIncrementally / executeIgnoringIncremental / executeSubscriptionEvent entry points with maybeTraceMixed so subscribers see a single graphql:execute span per top-level operation invocation, including every per-event execution of a subscription stream. The context exposes the document, schema, variableValues, operationName, and operationType. operationName and operationType are lazy getters that only resolve the operation AST (getOperationAST) if a subscriber reads them, keeping the gate cheap for APMs that do not need them. The ValidatedExecutionArgs variant used by executeSubscriptionEvent uses the already-resolved operation from validation and therefore needs no lazy lookup; it carries the resolved operation in place of the (unavailable) document. Adds a shared maybeTraceMixed helper to src/diagnostics.ts for PromiseOrValue-returning functions. It delegates start / end / error to Node's traceSync (which also runs fn inside end.runStores for AsyncLocalStorage propagation) and adds the asyncStart / asyncEnd / error-on-rejection emission when the return value is a promise.

… shape maybeTraceMixed now publishes asyncStart immediately once it knows the operation is in-flight asynchronously and asyncEnd in a .finally once the promise settles, matching the shape subscribers expect from a tracePromise-style channel (asyncStart brackets the async tail rather than being paired with asyncEnd in a single microtask). Also brings the in-memory FakeTracingChannel in line with Node's actual traceSync behavior: Node sets ctx.result before publishing end, letting subscribers check isPromise(ctx.result) inside their end handler to decide whether asyncEnd will follow or the span is complete. The fake now does the same so unit tests aren't looser than real-Node behavior.

Wraps the public subscribe() entry point with maybeTraceMixed using the same ExecutionArgs-based context factory as graphql:execute. The subscription setup path may return sync (when the subscribe resolver is synchronous) or async (when the resolver returns a promise resolving to an AsyncIterable), so maybeTraceMixed's sync / promise branching fits exactly. Per-subscription-event executions continue to publish on graphql:execute via executeSubscriptionEvent, which was wired in the previous commit. The graphql:subscribe context therefore owns the document reference and covers the setup span; subscribers that need to correlate per-event execute spans to their parent subscription use AsyncLocalStorage context propagation provided by Channel.runStores.

Wraps the single resolver invocation site in Executor.executeField with maybeTraceMixed so each field resolver call emits start/end (and the async tail when the resolver returns a promise, or error when it throws). The emission is inside the engine — no field.resolve mutation, no schema walking, nothing stacks across multiple APM subscribers. The published context captures the field-level metadata APM tools use to name and attribute per-field spans: - fieldName from info.fieldName - parentType from info.parentType.name - fieldType stringified info.returnType - args resolver arguments, by reference - isTrivialResolver (field.resolve === undefined), a hint for subscribers that want to skip default property-access resolvers - fieldPath lazy getter, serializes info.path only on first read since it is O(depth) work and many subscribers depth-filter without reading it graphql:resolve is the noisiest channel (one event per field per operation); all emission sites are gated on hasSubscribers before any context is constructed, so it remains zero-cost when no APM is loaded.

maybeTraceMixed previously delegated start / end to traceSync and then attached its own .then / .finally outside that scope. That worked for event ordering but broke AsyncLocalStorage propagation: a subscriber that called channel.start.bindStore(als, ...) could read that store in its start and end handlers but not in asyncStart or asyncEnd, because those fired in a .then attached outside the start.runStores scope. Promise continuations inherit the AsyncLocalStorage context of the frame that attaches them, so the fix is to attach .then inside the start.runStores block. This mirrors Node's own TracingChannel.tracePromise structure exactly, just with an additional sync branch that short- circuits to [start, end] when fn returns synchronously. Also updates the FakeChannel helper to match: its runStores now publishes the context on entry (the behavior Node's Channel.runStores has), so the fake traceSync / tracePromise implementations match real Node's event counts without the old end.runStores workaround. Adds an integration test that binds a store on graphql:execute's start sub-channel and asserts every lifecycle handler sees it when a resolver returns a promise.

maybeTraceMixed was firing asyncStart and asyncEnd back-to-back inside the .then callback. That left no meaningful window between the two events: for APM subscribers, 'async work begins' and 'async work completes' fired in the same microtask, making the pair useless for measuring the async duration. Correct shape: asyncStart publishes synchronously as soon as we know the operation is in-flight asynchronously (right after end), and asyncEnd publishes in a .finally after settlement. Both the .then and .finally are attached inside start.runStores, so AsyncLocalStorage context is preserved end-to-end via async_hooks. Event order now: sync path: [start, end] async path: [start, end, asyncStart, ..., asyncEnd] async error: [start, end, asyncStart, error, asyncEnd] Mirrors the same fix in the FakeTracingChannel helper.

Calling enableDiagnosticsChannel again with the same dc is now a no-op; a different dc throws to surface the misconfiguration rather than silently replacing channel references that subscribers already hold.

…el function

On parse/validate/execute/subscribe entry points, the previous `maybeTrace*` wrapper allocated a ctx-factory closure and a body closure on every call before short-circuiting on `hasSubscribers`. Inline the gate as `if (!channel?.hasSubscribers) return impl(args)` so closures and ctx objects are only allocated when a subscriber is attached. Recovers roughly half of the tracing-channel overhead seen in benchmarks. List-sync is back within noise of the pre-tracing baseline. Introspection remains ~3% slower due to the per-field resolve gate; the shape-change fix (dropping `_resolveChannel` from Executor) in the previous commit already captured the other half of that regression.

logaretm · 2026-04-24T14:38:20Z

@yaacovCR So I pushed a couple of optimizations, I think it is down significantly after running the benchmark u gave me. It was at 6% overhead at the time of your posting on my machine and now its down to 2.3%-3%.

Can u have a go and see if that's acceptable? I have more ideas but will require juggling alternate Executor class implementations.

Not necessarily related, but possibly, what would a trace look like for incremental delivery?

Right now they aren't handled and any deferred executions will be orphaned (up to the APM how to deal with that) since they will still fire the resolve channels.

So it would look like this:

  time ──────────────────────────────────────────────────────►

  execute   ◄────── runStores frame #1 ──────►
  ├─ resolve fieldA        (#2, parented to #1)  ✓
  ├─ resolve fieldB        (#3, parented to #1)  ✓
  └─ returns {initialResult, subsequentResults}
                      ▲ frame #1 ends here ─ generator has not run yet

          ─── gap: framework owns the generator ───

           await subsequentResults.next()   (fresh async context, no ALS)
                                                                       
  resolve fieldC (@defer)  (#4, ⚠ ORPHANED — no parent span)
  resolve fieldD (@stream) (#5, ⚠ ORPHANED — no parent span)

I think we can keep the current PR scoped to this at the moment, and I can try a follow up if this one goes through. I'm thinking we can add another :group channel, so it looks like:

 time ──────────────────────────────────────────────────────►

  execute   ◄────── runStores frame #1 ──────►
  ├─ resolve fieldA        (#2, parented to #1)  ✓
  ├─ resolve fieldB        (#3, parented to #1)  ✓
  └─ returns {initialResult, subsequentResults}
                      ▲ frame #1 ends

          ─── gap: framework drives the generator ───

           await subsequentResults.next()

  execute:group 'slowDefer'  ◄── runStores frame #4 ──►
  └─ resolve fieldC (@defer) (#5, parented to #4)  ✓

  execute:group 'feed@stream' ◄── runStores frame #6 ──►
  └─ resolve fieldD (@stream) (#7, parented to #6)  ✓

The group events look like extra spans, but their real job is to keep an ALS frame alive while the deferred resolvers run, so anything the APM bound stays readable. I'll verify end-to-end in the follow-up

…aceSync The traced branch of `executeIgnoringIncremental` was not exercised by any subscriber-attached test, tripping the 100% coverage threshold on CI. Add a case in execute-diagnostics-test that subscribes to `graphql:execute` and asserts start/end around a call. Also drop the `traceSync` helper from diagnostics.ts; every call site invokes `channel.traceSync(...)` directly, so the export was dead code. Mark the `require('node:diagnostics_channel')` CJS fallback and the enclosing `catch` block with c8 ignore comments — they are only reachable on Node 20.0-20.15 (pre-`getBuiltinModule`) and on runtimes without `diagnostics_channel` at all, neither of which the unit tests run on.

Hoist the resolve tracing decision once per `executeFields` / `executeFieldsSerially` call instead of re-checking `shouldTrace(resolveChannel)` per field, and move the traced branch's `traceMixed` call + ctx closure into a module-scope helper `invokeResolverWithTracing`. The latter is the load-bearing change: once `executeField`'s body referenced `traceMixed` and the `() => resolveFn(...)` closure, V8 stopped inlining `executeField` into `executeFields`, paying a real call frame per field. Extracting the two-function tail keeps `executeField`'s bytecode small enough to inline again. Effect on the introspection benchmark (the worst-case fan-out shape, ~1000 resolver calls per iteration, no subscribers attached): - with the inline ternary: -5.94% vs upstream - with this refactor: -2.05% / -2.30% across two paired runs Other benchmarks (list-sync, list-async, list-asyncIterable, object-async) sit at the noise floor with 95% CIs that cross zero. Adds a serial-mutation case to resolve-diagnostics-test to cover the `executeFieldsSerially` branch the hoisted snapshot now lives in.

logaretm · 2026-04-24T22:50:47Z

Squeezed a bit of performance with the last couple of commits 🙌 for the channel names happy to pick whatever works for us here to make this ready to go.

Copilot AI review requested due to automatic review settings April 18, 2026 04:22

Copilot started reviewing on behalf of logaretm April 18, 2026 04:22 View session

Copilot AI reviewed Apr 18, 2026

View reviewed changes

Comment thread src/diagnostics.ts Outdated

Comment thread src/__testUtils__/fakeDiagnosticsChannel.ts Outdated

Comment thread integrationTests/diagnostics/package.json

logaretm force-pushed the tracing-channel-support branch from 5333b30 to faf327b Compare April 18, 2026 04:30

yaacovCR reviewed Apr 18, 2026

View reviewed changes

yaacovCR mentioned this pull request Apr 21, 2026

Use Tracing Channel for observability #4629

Open

logaretm added 12 commits April 24, 2026 10:14

chore: remove old comments no longer apply

acb1945

ref: remove tracePromise as it was not needed with runStores

ea6e2d5

logaretm added 7 commits April 24, 2026 10:14

ref(perf): cache publish decision in excutor

9ad8549

test: coverage and cleanup unused type

8a70ca2

feat(diagnostics): throw on re-registration with different dc module

11e890d

Calling enableDiagnosticsChannel again with the same dc is now a no-op; a different dc throws to surface the misconfiguration rather than silently replacing channel references that subscribers already hold.

feat(diagnostics): allow re-registration with equivalent tracingChann…

b84efbd

…el function

ref: autoload tracing channels

2a9d88a

ref: create direct pointers to tracing channels

68d0f57

logaretm force-pushed the tracing-channel-support branch from a001dd3 to 03c9fea Compare April 24, 2026 14:17

logaretm force-pushed the tracing-channel-support branch from b6189b1 to 183bb06 Compare April 24, 2026 15:33

logaretm force-pushed the tracing-channel-support branch from 183bb06 to bbddc0c Compare April 24, 2026 15:34

Conversation

logaretm commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

API

Usage

Uh oh!

vercel Bot commented Apr 18, 2026

Uh oh!

linux-foundation-easycla Bot commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaacovCR Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

logaretm Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaacovCR commented Apr 21, 2026

Uh oh!

logaretm commented Apr 21, 2026

Uh oh!

yaacovCR commented Apr 22, 2026

Uh oh!

logaretm commented Apr 22, 2026

Uh oh!

yaacovCR commented Apr 24, 2026

Uh oh!

yaacovCR commented Apr 24, 2026

Uh oh!

logaretm commented Apr 24, 2026

Uh oh!

yaacovCR commented Apr 24, 2026

Uh oh!

logaretm commented Apr 24, 2026

Uh oh!

yaacovCR commented Apr 24, 2026

Uh oh!

logaretm commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

logaretm commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

logaretm commented Apr 18, 2026 •

edited

Loading

linux-foundation-easycla Bot commented Apr 18, 2026 •

edited

Loading

yaacovCR Apr 19, 2026 •

edited

Loading

logaretm Apr 20, 2026 •

edited

Loading

logaretm commented Apr 24, 2026 •

edited

Loading