
Prototype bound instruments #8314

Draft

jack-berg wants to merge 12 commits into open-telemetry:main from jack-berg:prototype-bound-instruments-1

Conversation

@jack-berg
Member

@jack-berg jack-berg commented Apr 21, 2026

Builds off of #8308, #8313.

Related to open-telemetry/opentelemetry-specification#4126

Usage example:

  LongCounter rolls =
      meter
          .counterBuilder("dice.rolls")
          .setDescription("The number of times each side of the die was rolled")
          .setUnit("{roll}")
          .build();

  // Bind one LongCounterOp per die face. Each bind() call resolves the underlying timeseries
  // once, so subsequent add() calls record directly without any attribute lookup.
  //
  // Equivalent unbound setup (no bind calls needed, but per-recording overhead is higher):
  //   // no setup — just call rolls.add(1, ROLL_N) inline below
  LongCounterOp face1 = rolls.bind(ROLL_1);
  LongCounterOp face2 = rolls.bind(ROLL_2);
  LongCounterOp face3 = rolls.bind(ROLL_3);
  LongCounterOp face4 = rolls.bind(ROLL_4);
  LongCounterOp face5 = rolls.bind(ROLL_5);
  LongCounterOp face6 = rolls.bind(ROLL_6);

  // Simulate 600 rolls with a fixed seed for a reproducible distribution.
  Random random = new Random(42);
  long[] counts = new long[7]; // indexed 1..6; index 0 unused

  for (int i = 0; i < 600; i++) {
    int result = random.nextInt(6) + 1;
    counts[result]++;
    switch (result) {
      case 1:
        face1.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_1);
        break;
      case 2:
        face2.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_2);
        break;
      case 3:
        face3.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_3);
        break;
      case 4:
        face4.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_4);
        break;
      case 5:
        face5.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_5);
        break;
      case 6:
        face6.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_6);
        break;
      default:
        break;
    }
  }

MetricRecordBenchmark has been updated with a new isBound=true|false parameter. The following characterizes the change in performance from isBound=false to isBound=true:

Threads Temporality Cardinality Instrument false (ops/s) true (ops/s) Δ ops/s Δ %
1 DELTA 1 COUNTER_SUM 118,208,794 130,599,635 † +12,390,841 +10.5%
1 DELTA 1 UP_DOWN_COUNTER_SUM 102,544,568 129,911,312 † +27,366,744 +26.7%
1 DELTA 1 GAUGE_LAST_VALUE 36,996,170 48,977,805 +11,981,635 +32.4%
1 DELTA 1 HISTOGRAM_EXPLICIT 58,932,937 122,397,793 +63,464,856 +107.7%
1 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL 43,456,120 44,312,699 +856,579 +2.0%
1 DELTA 128 COUNTER_SUM 94,602,944 114,178,751 † +19,575,807 +20.7%
1 DELTA 128 UP_DOWN_COUNTER_SUM 99,861,595 114,805,474 † +14,943,879 +15.0%
1 DELTA 128 GAUGE_LAST_VALUE 30,515,600 36,287,352 +5,771,752 +18.9%
1 DELTA 128 HISTOGRAM_EXPLICIT 68,355,768 89,869,366 +21,513,598 +31.5%
1 DELTA 128 HISTOGRAM_BASE2_EXPONENTIAL 40,758,132 46,997,748 +6,239,616 +15.3%
1 CUMULATIVE 1 COUNTER_SUM 165,529,273 216,674,644 +51,145,371 +30.9%
1 CUMULATIVE 1 UP_DOWN_COUNTER_SUM 167,603,291 216,483,041 +48,879,750 +29.2%
1 CUMULATIVE 1 GAUGE_LAST_VALUE 48,499,078 75,562,700 +27,063,622 +55.8%
1 CUMULATIVE 1 HISTOGRAM_EXPLICIT 98,273,713 134,302,636 +36,028,923 +36.7%
1 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL 46,687,571 51,549,345 +4,861,774 +10.4%
1 CUMULATIVE 128 COUNTER_SUM 88,309,144 214,865,227 +126,556,083 +143.3%
1 CUMULATIVE 128 UP_DOWN_COUNTER_SUM 97,199,593 205,305,518 +108,105,925 +111.2%
1 CUMULATIVE 128 GAUGE_LAST_VALUE 102,375,233 204,971,971 +102,596,738 +100.2%
1 CUMULATIVE 128 HISTOGRAM_EXPLICIT 75,716,082 113,538,456 +37,822,374 +49.9%
1 CUMULATIVE 128 HISTOGRAM_BASE2_EXPONENTIAL 44,206,432 48,689,276 +4,482,844 +10.1%
4 DELTA 1 COUNTER_SUM 15,865,839 18,124,749 +2,258,910 +14.2%
4 DELTA 1 UP_DOWN_COUNTER_SUM 18,307,609 17,372,346 -935,263 -5.1%
4 DELTA 1 GAUGE_LAST_VALUE 12,223,215 17,542,726 +5,319,511 +43.5%
4 DELTA 1 HISTOGRAM_EXPLICIT 12,133,325 12,767,563 +634,238 +5.2%
4 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL 10,102,515 10,626,084 +523,569 +5.2%
4 DELTA 128 COUNTER_SUM 76,699,353 68,831,065 -7,868,288 -10.3%
4 DELTA 128 UP_DOWN_COUNTER_SUM 74,165,415 65,990,906 -8,174,509 -11.0%
4 DELTA 128 GAUGE_LAST_VALUE 50,121,436 50,574,389 +452,953 +0.9%
4 DELTA 128 HISTOGRAM_EXPLICIT 60,837,029 59,375,391 -1,461,638 -2.4%
4 DELTA 128 HISTOGRAM_BASE2_EXPONENTIAL 61,647,646 57,771,684 -3,875,962 -6.3%
4 CUMULATIVE 1 COUNTER_SUM 72,849,913 74,565,813 +1,715,900 +2.4%
4 CUMULATIVE 1 UP_DOWN_COUNTER_SUM 75,558,659 55,029,449 -20,529,210 -27.2%
4 CUMULATIVE 1 GAUGE_LAST_VALUE 28,552,341 28,218,133 -334,208 -1.2%
4 CUMULATIVE 1 HISTOGRAM_EXPLICIT 15,952,626 21,438,633 +5,486,007 +34.4%
4 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL 16,650,679 17,252,153 +601,474 +3.6%
4 CUMULATIVE 128 COUNTER_SUM 114,706,054 123,732,233 +9,026,179 +7.9%
4 CUMULATIVE 128 UP_DOWN_COUNTER_SUM 110,953,644 122,402,433 +11,448,789 +10.3%
4 CUMULATIVE 128 GAUGE_LAST_VALUE 101,753,088 108,879,557 +7,126,469 +7.0%
4 CUMULATIVE 128 HISTOGRAM_EXPLICIT 77,679,812 83,538,482 +5,858,670 +7.5%
4 CUMULATIVE 128 HISTOGRAM_BASE2_EXPONENTIAL 75,555,258 79,784,244 +4,228,986 +5.6%

⚠️ Several rows with isBound=true have very high variance (>20%) — marked with † — treat those deltas with caution.

Modest to large gains across the board, with larger gains for cases with reduced contention and cumulative temporality, where the map lookup represents a larger share of the time to record.
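To make the mechanism concrete, here is a minimal sketch (hypothetical internals, not the actual SDK implementation) of why binding removes per-record overhead: the unbound path performs an attributes-to-aggregator map lookup on every add(), while bind() performs that lookup once and returns a handle that records directly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a counter's internals (not the actual SDK classes).
class SketchCounter {
  private final Map<String, LongAdder> storage = new ConcurrentHashMap<>();

  // Unbound path: one map lookup per recording.
  void add(long value, String attributes) {
    storage.computeIfAbsent(attributes, k -> new LongAdder()).add(value);
  }

  // Bound path: the lookup happens once here; the returned handle records
  // directly, with no attribute resolution on the hot path.
  LongAdder bind(String attributes) {
    return storage.computeIfAbsent(attributes, k -> new LongAdder());
  }

  long valueOf(String attributes) {
    LongAdder adder = storage.get(attributes);
    return adder == null ? 0 : adder.sum();
  }
}
```

This also suggests why the cumulative cases benefit most: the aggregator handle stays valid across collections, so amortizing the lookup pays off on every recording.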

Leaving as draft because:

  1. Need to land a spec PR first
  2. Need to restructure to move this to the API incubator

@otelbot added the api-change (Changes to public API surface area) label Apr 21, 2026
@otelbot
Contributor

otelbot Bot commented Apr 21, 2026

⚠️ API changes detected — additional maintainer review required

@jack-berg @jkwatson

This PR modifies the public API surface area of the following module(s):

  • opentelemetry-api

Please review the changes in docs/apidiffs/current_vs_latest/ carefully before approving.

@dashpole
Contributor

Threads Temporality Cardinality Instrument false (ops/s) true (ops/s) Δ ops/s Δ %
1 DELTA 1 COUNTER_SUM 118,208,794 130,599,635 † +12,390,841 +10.5%

For the bound=false case, 1/118,208,794 = 8 ns. Dang. Are java concurrent map lookups just that fast?

@jack-berg
Member Author

For the bound=false case, 1/118,208,794 = 8 ns. Dang. Are java concurrent map lookups just that fast?

That case has no concurrency (threads=1). We optimize map lookups slightly by caching the hashcode of our Attribute implementation.
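A minimal sketch of that hashCode-caching trick (illustrative only; the real Attributes implementation differs): compute the hash once at construction so repeated map lookups skip re-hashing the key contents.

```java
import java.util.Arrays;

// Illustrative immutable key that caches its hashCode at construction.
// (Hypothetical class, not the SDK's Attributes implementation.)
final class CachedHashKey {
  private final String[] parts;
  private final int hash; // computed once, reused on every map access

  CachedHashKey(String... parts) {
    this.parts = parts.clone();
    this.hash = Arrays.hashCode(this.parts);
  }

  @Override
  public int hashCode() {
    return hash; // no per-lookup re-hash of the key contents
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof CachedHashKey && Arrays.equals(parts, ((CachedHashKey) o).parts);
  }
}
```

Since ConcurrentHashMap calls hashCode() on every get(), the cached field turns an O(key size) hash into a field read; equals() still compares contents on a bucket hit.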

That number (and all of them, frankly) is suspiciously fast, so I'm double checking things. Things are checking out so far. There is an issue with the cardinality=1 case, where it's possible the JIT compiler is hoisting the map lookup, but the JIT could do that in a real application in a cardinality=1 case as well, so it's not wrong per se. But even the cardinality=128 cases, where a JIT hoist is unlikely, are blazing fast, so the speed can't be attributed to JIT alone.

I ran those benchmarks on my Mac mini, which uses an Apple M4 chip. Currently on the main branch, running on the dedicated bare metal benchmark hardware, that same series gets 29,340,717 ops/s, or ~34 ns/op, which is fast but more believable. Maybe Apple silicon / ARM is exceptionally fast for these types of benchmarks.

https://open-telemetry.github.io/opentelemetry-java/benchmarks/
But you'll probably have to go to the raw data backing those graphs, because the charts are currently pretty unusable for zooming in on a specific series and copying figures: https://raw.githubusercontent.com/open-telemetry/opentelemetry-java/refs/heads/benchmarks/benchmarks/data.js

If you spot any problems with the methodology of MetricRecordBenchmark, please let me know.

@dashpole
Contributor

Yeah, I couldn't find any problem with the methodology or the code. For Go, the map lookup takes ~20 ns (and ~45 ns under high concurrency) and the atomic counter increment takes <10 ns, so we see a bigger difference. I was curious if you had any tricks up your sleeve, or if Java maps were just faster.

@jack-berg
Member Author

I was curious if you had any tricks up your sleeve, or if java maps were just faster

I was curious about the map lookup perf as well, so created a dedicated benchmark based on it: jack-berg@b9cf4c4

Parameters I test:

  • Concurrent access: 1 or 4 threads
  • Cardinality (size of map): 1, 128, 1024
  • Size of map keys: small (1×26-char key), medium (10×26-char keys), large (100×26-char keys)
  • Key type: string (plain ole string), attributes_cached (java attr impl w/ cached hashCode), attributes_uncached (java attrs impl w/o cached hashCode)

Results:

threads=1 — ns/op

keySize cardinality STRING ATTR_CACHED ATTR_UNCACHED
SMALL 1 1.4 2.4 3.7
SMALL 128 2.1 6.6 7.3
SMALL 1024 2.1 6.5 8.6
MEDIUM 1 1.4 2.5 13.7
MEDIUM 128 2.4 7.0 19.5
MEDIUM 1024 2.8 7.0 26.4
LARGE 1 1.4 2.5 157.2
LARGE 128 8.8 11.0 179.2
LARGE 1024 9.5 10.9 186.8

threads=4 — ns/op (4-thread aggregate; multiply by 4 for per-thread cost)

keySize cardinality STRING ATTR_CACHED ATTR_UNCACHED
SMALL 1 0.7† 0.7† 0.9
SMALL 128 1.7 1.7 1.8
SMALL 1024 0.9††† 1.7 2.1
MEDIUM 1 0.8† 0.8† 3.3
MEDIUM 128 1.9 1.8 5.0
MEDIUM 1024 1.9 1.9 6.8
LARGE 1 0.7† 0.8† 39.2
LARGE 128 3.3 2.7 44.7
LARGE 1024 3.7 2.9 46.7

† ±11–16% variance ††† ±46% variance (discard)

So lookups are really fast. Caching hashCodes matters a lot, especially as keys become larger (this is intuitive). Cardinality matters a little, but not as much as key size. I only tested up to 1024, but given that the default cardinality limit is 2k, I think this reasonably represents the use case.

Taking these conclusions back to bound instruments, I think the benchmark setup I have for MetricRecordBenchmark is reasonable. The cardinality is small (128) and the attributes are small (just a 1×26-char key), but since we cache the hashCode, those don't matter much. I could increase the cardinality and attribute size to increase the positive impact of bound instruments.
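The lookup experiment above can be sketched as a plain, non-JMH hot loop (the real benchmark is JMH-based; names here are illustrative, and System.nanoTime() over a loop only gives a ballpark, not a proper measurement):

```java
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of the map-lookup experiment: populate a map at a given
// cardinality, then hammer get() in a hot loop, one lookup per "recording",
// as the unbound hot path would do. Illustrative only; JMH handles warmup,
// dead-code elimination, and statistics properly.
public class MapLookupSketch {
  public static void main(String[] args) {
    int cardinality = 128; // must be a power of two for the mask below
    ConcurrentHashMap<String, long[]> storage = new ConcurrentHashMap<>();
    String[] keys = new String[cardinality];
    for (int i = 0; i < cardinality; i++) {
      keys[i] = "key-" + i;
      storage.put(keys[i], new long[1]);
    }

    int iterations = 1_000_000;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      storage.get(keys[i & (cardinality - 1)])[0]++;
    }
    long elapsed = System.nanoTime() - start;
    System.out.println("~" + (elapsed / iterations) + " ns per lookup+increment (ballpark only)");
  }
}
```

Note that java.lang.String caches its own hashCode, which is one reason the STRING column above is so cheap; the ATTR_UNCACHED column pays the re-hash on every get().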

