telemetry: Reduce cardinality of some otel span names#3558
Conversation
I tried to follow OpenTelemetry naming guidance, e.g. https://opentelemetry.io/docs/specs/semconv/db/database-spans/#name
d8b23cd to
6165f42
Compare
There was a problem hiding this comment.
I spent some time trying to simplify this but everything I did broke one or more test cases below and the test cases all look pretty reasonable.
| otel.name = "publish", | ||
| messaging.operation = "publish", | ||
| messaging.system = "mqtt", | ||
| messaging.destination.name = topic, |
There was a problem hiding this comment.
Moved topic here (per otel semantic conventions)
rylev
left a comment
There was a problem hiding this comment.
I'm really not sure the query parsing is worth the hassle. I agree with the goal of this PR, but I'm not convinced that we need to be this clever.
| messaging.system = "mqtt"))] | ||
| fields( | ||
| otel.kind = "producer", | ||
| otel.name = "publish", |
There was a problem hiding this comment.
publish seems too general. I don't understand why using the "spin_outbound_mqtt.publish" name field would be inappropriate here?
There was a problem hiding this comment.
Oh that would probably be fine. I was just going for minimal changes: "{topic} publish" -> "publish". 🤷
| messaging.system = "mqtt"))] | ||
| fields( | ||
| otel.kind = "producer", | ||
| otel.name = "publish", |
|
|
||
| impl<C: Client> v2::HostConnection for InstanceState<C> { | ||
| #[instrument(name = "spin_outbound_mysql.open", skip(self, address), err(level = Level::INFO), fields(otel.kind = "client", db.system = "mysql", db.address = Empty, server.port = Empty, db.namespace = Empty))] | ||
| #[instrument(name = "spin_outbound_mysql.open", skip(self, address), err(level = Level::INFO), |
There was a problem hiding this comment.
Nit: shouldn't we use skip_all instead of skip(self, address)?
There was a problem hiding this comment.
That sounds fine but I don't really want to audit all arguments in this PR.
| #[instrument(name = "spin_outbound_mysql.execute", skip(self, connection, params), err(level = Level::INFO), | ||
| fields(otel.kind = "client", db.system = "mysql", otel.name = spin_telemetry::db::sql_span_name(&statement)))] |
There was a problem hiding this comment.
In general, I'm not convinced that using some sort of simplified query representation is better than name being "spin_outbound_mysql.execute" and the statement being captured...
There was a problem hiding this comment.
It depends a lot on your tooling. It doesn't really matter for something like Honeycomb which gives you a lot of flexibility in how you visualize traces, but for e.g. Jaeger you only see a few specific attributes at the top level and span name is a big one.
| #[instrument(name = "spin_outbound_mysql.query", skip(self, connection, params), err(level = Level::INFO), fields(otel.kind = "client", db.system = "mysql", otel.name = statement))] | ||
| #[instrument(name = "spin_outbound_mysql.query", skip(self, connection, params), err(level = Level::INFO), | ||
| fields(otel.kind = "client", db.system = "mysql", otel.name = spin_telemetry::db::sql_span_name(&statement)))] | ||
| async fn query( |
There was a problem hiding this comment.
I don't love that otel consumers lose the knowledge that this comes from the query method.
There was a problem hiding this comment.
That's fair, though there is quiet a bit of other info available:
code.filepath: [...SNIP...]/crates/factor-outbound-mysql/src/host.rs
code.lineno: 57
code.namespace: spin_factor_outbound_mysql::host
calebschoepp
left a comment
There was a problem hiding this comment.
Overall I think this PR is plugging a big gap in our tracing cardinality risk — I could be convinced to approve this ~as is.
However, I'm feeling vaguely uncomfortable about two things.
- The implementation of
sql_span_nameseems like an endlessly deep tunnel of edge cases. It seems like the kind of code we're going to have to keep coming back to time after time to fix little things. I like that it is dependency free, but did we fully rule out that there isn't an existing library that could solve this problem for us? - It feels like we have a mix of OTel convention span names and "spin style" span names. Maybe this is just fine and we can't do much better, but I feel weird about them being mixed. (I'm also aware this is a bit outside the scope of your PR so feel free to ignore this concern)
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
Is this implementation robust to CTEs, subqueries, etc.
I wonder if there is an easy way to fuzz/property-based test this
There was a problem hiding this comment.
CTEs: no; I think it would probably just report WITH.
Subqueries: a little; see select_anonymous_table_has_no_target below.
This is very much focused on lowering cardinality first and retaining a little bit of readability second.
| async fn execute( | ||
| &mut self, | ||
| connection: Resource<v2::Connection>, | ||
| statement: String, |
There was a problem hiding this comment.
Do we need to do anything to obfuscate the statement in the attribute?
There was a problem hiding this comment.
That one is tricky. There really shouldn't be anything sensitive in the statement, but of course that isn't going to stop people from doing it. My current opinion is that we should leave it up to the user's telemetry pipeline to deal with that.
This PR removes a lot of detail from some span names.
https://github.com/open-telemetry/opentelemetry-specification/blob/v1.56.0/specification/trace/api.md#span
This is somewhat subjective but I think in the context of a general-purpose runtime it makes sense to err on the side of not blowing up someone's metrics/traces indexing:
Given that a large number of
#[instrument(...)]attrs were being touched anyway, I took this opportunity to split most of them across two (or more) lines, including some lines (but not files) that weren't otherwise updated.