Skip to content

[CELEBORN-2357] Add Metrics Prefix#3732

Open
afterincomparableyum wants to merge 1 commit into
apache:mainfrom
afterincomparableyum:celeborn-2357
Open

[CELEBORN-2357] Add Metrics Prefix#3732
afterincomparableyum wants to merge 1 commit into
apache:mainfrom
afterincomparableyum:celeborn-2357

Conversation

@afterincomparableyum

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Enable users the option to prefix their metrics with something unique that will allow them to easily find any celeborn related metircs in observability platforms.

Why are the changes needed?

Users will be able to easily identify celeborn metrics on observability tools with this prefix.

Does this PR resolve a correctness bug?

  • Yes

Does this PR introduce any user-facing change?

  • Yes

How was this patch tested?

CI/CD

Enable users the option to prefix their metrics with something unique that will allow them to easily find any celeborn related metircs in observability platforms.
@s0nskar

s0nskar commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Replaces the hardcoded metrics_ prefix in metric names with a configurable
celeborn.metrics.prefix (default "metrics"). Small, focused change touching
CelebornConf, AbstractSource.normalizeKey, docs, and a test.

Verdict

Solid, mergeable with one substantive concern.

What's good

  • Backward compatible. Default is "metrics", so normalizeKey produces the
    exact same output (metrics_abc_) as before. Every other "metrics_" reference
    in the codebase lives in test files that rely on the default — none break.
  • Version is correct. .version("0.7.0") matches project.version = 0.7.0-SNAPSHOT
    in pom.xml.
  • Single point of change. normalizeKey is the only place metric names are built
    (5 callers, all in AbstractSource), so the change is complete — no missed call sites.
  • Test + docs included, and the existing default-prefix test
    (CelebornSourceSuite.scala:38) implicitly guards backward compatibility.

Inline comments

1. The prefix is not sanitized (substantive)

File: common/src/main/scala/org/apache/celeborn/common/metrics/source/AbstractSource.scala
Line: 704

s"${metricsPrefix}_${key.replaceAll("[^a-zA-Z0-9]", "_")}_"

The metric key is sanitized to [a-zA-Z0-9_], but the user-supplied metricsPrefix
is interpolated raw. The hardcoded "metrics" was always alphanumeric; an arbitrary
user value may not be. Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*,
so a natural choice like celeborn.metrics.prefix = my-app produces my-app_abc_Count
— an invalid Prometheus name that silently breaks scraping. This is a regression in
robustness from the previously-guaranteed-valid hardcoded prefix.

Recommend one of:

  1. Apply the same sanitization to the prefix:
    s"${metricsPrefix.replaceAll("[^a-zA-Z0-9]", "_")}_${key.replaceAll("[^a-zA-Z0-9]", "_")}_"
  2. Or validate at config time with .checkValue(...) on METRICS_PREFIX to reject
    invalid prefixes early with a clear error.

Option 1 is the most user-forgiving and matches the existing key-handling intent.

2. Empty prefix yields a leading underscore (minor)

File: common/src/main/scala/org/apache/celeborn/common/metrics/source/AbstractSource.scala
Line: 704

An empty prefix ("") yields a leading-underscore name _abc_Count — valid Prometheus,
but worth noting in the doc, or guard against it if undesired.

3. Document the prefix constraint (minor)

File: common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
Lines: 5905–5912

val METRICS_PREFIX: ConfigEntry[String] =
  buildConf("celeborn.metrics.prefix")
    .categories("metrics")
    .doc("Prefix metrics with this value.")
    .version("0.7.0")
    .stringConf
    .createWithDefault("metrics")

The doc string "Prefix metrics with this value." could mention the alphanumeric
constraint so users don't trip over it (also regenerate docs/configuration/metrics.md
if the doc string changes).

4. Add a test for a "dirty" prefix (minor)

File: common/src/test/scala/org/apache/celeborn/common/metrics/source/CelebornSourceSuite.scala
Lines: 41–58

The new test only covers a clean prefix ("celeborn"). If you adopt sanitization
(comment #1), add a case asserting a dirty prefix (e.g. "my-app") gets normalized —
that's exactly the behavior most likely to regress later.

– Reviewed with Claude Code

@afterincomparableyum

Copy link
Copy Markdown
Contributor Author

@s0nskar none of the comments Claude gave were good. no need to do them. purpose of this metric is to let users apply whatever header they want for their metrics, or use the default name provided/that was there originally. @SteNicholas could you please take a look when you get the chance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants