Skip to content

telemetry: Reduce cardinality of some otel span names#3558

Open
lann wants to merge 1 commit into
mainfrom
otel-reduce-span-name-cardinality
Open

telemetry: Reduce cardinality of some otel span names#3558
lann wants to merge 1 commit into
mainfrom
otel-reduce-span-name-cardinality

Conversation

@lann
Copy link
Copy Markdown
Collaborator

@lann lann commented Jun 4, 2026

This PR removes a lot of detail from some span names.

https://github.com/open-telemetry/opentelemetry-specification/blob/v1.56.0/specification/trace/api.md#span

The span name SHOULD be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances while still being human-readable. [...] Generality SHOULD be prioritized over human-readability.

This is somewhat subjective but I think in the context of a general-purpose runtime it makes sense to err on the side of not blowing up someone's metrics/traces indexing:

  • SQL stores (mysql, pg, sqlite) now share a helper function that uses a very simple tokenizer to (attempt to) build a span name as described by OTel guidelines: https://opentelemetry.io/docs/specs/semconv/db/database-spans/#name
  • Redis span names are reduced to just command, which seems to be conventional.
  • MQTT span names dropped topic; many MQTT schemas use dynamic (e.g. per-user) topics.

Given that a large number of #[instrument(...)] attrs were being touched anyway, I took this opportunity to split most of them across two (or more) lines, including some lines (but not files) that weren't otherwise updated.

@lann lann requested a review from calebschoepp June 4, 2026 19:15
@lann lann force-pushed the otel-reduce-span-name-cardinality branch from d8b23cd to 6165f42 Compare June 4, 2026 19:26
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time trying to simplify this but everything I did broke one or more test cases below and the test cases all look pretty reasonable.

otel.name = "publish",
messaging.operation = "publish",
messaging.system = "mqtt",
messaging.destination.name = topic,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved topic here (per otel semantic conventions)

Copy link
Copy Markdown
Collaborator

@rylev rylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure the query parsing is worth the hassle. I agree with the goal of this PR, but I'm not convinced that we need to be this clever.

messaging.system = "mqtt"))]
fields(
otel.kind = "producer",
otel.name = "publish",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

publish seems too general. I don't understand why using the "spin_outbound_mqtt.publish" name field would be inappropriate here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that would probably be fine. I was just going for minimal changes: "{topic} publish" -> "publish". 🤷

messaging.system = "mqtt"))]
fields(
otel.kind = "producer",
otel.name = "publish",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.


impl<C: Client> v2::HostConnection for InstanceState<C> {
#[instrument(name = "spin_outbound_mysql.open", skip(self, address), err(level = Level::INFO), fields(otel.kind = "client", db.system = "mysql", db.address = Empty, server.port = Empty, db.namespace = Empty))]
#[instrument(name = "spin_outbound_mysql.open", skip(self, address), err(level = Level::INFO),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: shouldn't we use skip_all instead of skip(self, address)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds fine but I don't really want to audit all arguments in this PR.

Comment on lines +170 to +171
#[instrument(name = "spin_outbound_mysql.execute", skip(self, connection, params), err(level = Level::INFO),
fields(otel.kind = "client", db.system = "mysql", otel.name = spin_telemetry::db::sql_span_name(&statement)))]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'm not convinced that using some sort of simplified query representation is better than name being "spin_outbound_mysql.execute" and the statement being captured...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends a lot on your tooling. It doesn't really matter for something like Honeycomb which gives you a lot of flexibility in how you visualize traces, but for e.g. Jaeger you only see a few specific attributes at the top level and span name is a big one.

#[instrument(name = "spin_outbound_mysql.query", skip(self, connection, params), err(level = Level::INFO), fields(otel.kind = "client", db.system = "mysql", otel.name = statement))]
#[instrument(name = "spin_outbound_mysql.query", skip(self, connection, params), err(level = Level::INFO),
fields(otel.kind = "client", db.system = "mysql", otel.name = spin_telemetry::db::sql_span_name(&statement)))]
async fn query(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love that otel consumers lose the knowledge that this comes from the query method.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, though there is quiet a bit of other info available:

code.filepath: [...SNIP...]/crates/factor-outbound-mysql/src/host.rs
code.lineno: 57
code.namespace: spin_factor_outbound_mysql::host

Copy link
Copy Markdown
Collaborator

@calebschoepp calebschoepp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this PR is plugging a big gap in our tracing cardinality risk — I could be convinced to approve this ~as is.

However, I'm feeling vaguely uncomfortable about two things.

  1. The implementation of sql_span_name seems like an endlessly deep tunnel of edge cases. It seems like the kind of code we're going to have to keep coming back to time after time to fix little things. I like that it is dependency free, but did we fully rule out that there isn't an existing library that could solve this problem for us?
  2. It feels like we have a mix of OTel convention span names and "spin style" span names. Maybe this is just fine and we can't do much better, but I feel weird about them being mixed. (I'm also aware this is a bit outside the scope of your PR so feel free to ignore this concern)

}

#[cfg(test)]
mod tests {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this implementation robust to CTEs, subqueries, etc.

I wonder if there is an easy way to fuzz/property-based test this

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CTEs: no; I think it would probably just report WITH.
Subqueries: a little; see select_anonymous_table_has_no_target below.

This is very much focused on lowering cardinality first and retaining a little bit of readability second.

async fn execute(
&mut self,
connection: Resource<v2::Connection>,
statement: String,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do anything to obfuscate the statement in the attribute?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one is tricky. There really shouldn't be anything sensitive in the statement, but of course that isn't going to stop people from doing it. My current opinion is that we should leave it up to the user's telemetry pipeline to deal with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants