
Prototype bound instruments #8314

Draft

jack-berg wants to merge 12 commits into open-telemetry:main from jack-berg:prototype-bound-instruments-1

Conversation

@jack-berg
Member

@jack-berg jack-berg commented Apr 21, 2026

Builds off of #8308, #8313.

Related to open-telemetry/opentelemetry-specification#4126

Usage example:

  LongCounter rolls =
      meter
          .counterBuilder("dice.rolls")
          .setDescription("The number of times each side of the die was rolled")
          .setUnit("{roll}")
          .build();

  // Bind one LongCounterOp per die face. Each bind() call resolves the underlying timeseries
  // once, so subsequent add() calls record directly without any attribute lookup.
  //
  // Equivalent unbound setup (no bind calls needed, but per-recording overhead is higher):
  //   // no setup — just call rolls.add(1, ROLL_N) inline below
  LongCounterOp face1 = rolls.bind(ROLL_1);
  LongCounterOp face2 = rolls.bind(ROLL_2);
  LongCounterOp face3 = rolls.bind(ROLL_3);
  LongCounterOp face4 = rolls.bind(ROLL_4);
  LongCounterOp face5 = rolls.bind(ROLL_5);
  LongCounterOp face6 = rolls.bind(ROLL_6);

  // Simulate 600 rolls with a fixed seed for a reproducible distribution.
  Random random = new Random(42);
  long[] counts = new long[7]; // indexed 1..6; index 0 unused

  for (int i = 0; i < 600; i++) {
    int result = random.nextInt(6) + 1;
    counts[result]++;
    switch (result) {
      case 1:
        face1.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_1);
        break;
      case 2:
        face2.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_2);
        break;
      case 3:
        face3.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_3);
        break;
      case 4:
        face4.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_4);
        break;
      case 5:
        face5.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_5);
        break;
      case 6:
        face6.add(1);
        // Equivalent unbound: rolls.add(1, ROLL_6);
        break;
      default:
        break;
    }
  }

MetricRecordBenchmark has been updated with a new isBound=true|false parameter. The following characterizes the change in performance from isBound=false to isBound=true:

Threads Temporality Cardinality Instrument false (ops/s) true (ops/s) Δ ops/s Δ %
1 DELTA 1 COUNTER_SUM 118,208,794 130,599,635 † +12,390,841 +10.5%
1 DELTA 1 UP_DOWN_COUNTER_SUM 102,544,568 129,911,312 † +27,366,744 +26.7%
1 DELTA 1 GAUGE_LAST_VALUE 36,996,170 48,977,805 +11,981,635 +32.4%
1 DELTA 1 HISTOGRAM_EXPLICIT 58,932,937 122,397,793 +63,464,856 +107.7%
1 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL 43,456,120 44,312,699 +856,579 +2.0%
1 DELTA 128 COUNTER_SUM 94,602,944 114,178,751 † +19,575,807 +20.7%
1 DELTA 128 UP_DOWN_COUNTER_SUM 99,861,595 114,805,474 † +14,943,879 +15.0%
1 DELTA 128 GAUGE_LAST_VALUE 30,515,600 36,287,352 +5,771,752 +18.9%
1 DELTA 128 HISTOGRAM_EXPLICIT 68,355,768 89,869,366 +21,513,598 +31.5%
1 DELTA 128 HISTOGRAM_BASE2_EXPONENTIAL 40,758,132 46,997,748 +6,239,616 +15.3%
1 CUMULATIVE 1 COUNTER_SUM 165,529,273 216,674,644 +51,145,371 +30.9%
1 CUMULATIVE 1 UP_DOWN_COUNTER_SUM 167,603,291 216,483,041 +48,879,750 +29.2%
1 CUMULATIVE 1 GAUGE_LAST_VALUE 48,499,078 75,562,700 +27,063,622 +55.8%
1 CUMULATIVE 1 HISTOGRAM_EXPLICIT 98,273,713 134,302,636 +36,028,923 +36.7%
1 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL 46,687,571 51,549,345 +4,861,774 +10.4%
1 CUMULATIVE 128 COUNTER_SUM 88,309,144 214,865,227 +126,556,083 +143.3%
1 CUMULATIVE 128 UP_DOWN_COUNTER_SUM 97,199,593 205,305,518 +108,105,925 +111.2%
1 CUMULATIVE 128 GAUGE_LAST_VALUE 102,375,233 204,971,971 +102,596,738 +100.2%
1 CUMULATIVE 128 HISTOGRAM_EXPLICIT 75,716,082 113,538,456 +37,822,374 +49.9%
1 CUMULATIVE 128 HISTOGRAM_BASE2_EXPONENTIAL 44,206,432 48,689,276 +4,482,844 +10.1%
4 DELTA 1 COUNTER_SUM 15,865,839 18,124,749 +2,258,910 +14.2%
4 DELTA 1 UP_DOWN_COUNTER_SUM 18,307,609 17,372,346 -935,263 -5.1%
4 DELTA 1 GAUGE_LAST_VALUE 12,223,215 17,542,726 +5,319,511 +43.5%
4 DELTA 1 HISTOGRAM_EXPLICIT 12,133,325 12,767,563 +634,238 +5.2%
4 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL 10,102,515 10,626,084 +523,569 +5.2%
4 DELTA 128 COUNTER_SUM 76,699,353 68,831,065 -7,868,288 -10.3%
4 DELTA 128 UP_DOWN_COUNTER_SUM 74,165,415 65,990,906 -8,174,509 -11.0%
4 DELTA 128 GAUGE_LAST_VALUE 50,121,436 50,574,389 +452,953 +0.9%
4 DELTA 128 HISTOGRAM_EXPLICIT 60,837,029 59,375,391 -1,461,638 -2.4%
4 DELTA 128 HISTOGRAM_BASE2_EXPONENTIAL 61,647,646 57,771,684 -3,875,962 -6.3%
4 CUMULATIVE 1 COUNTER_SUM 72,849,913 74,565,813 +1,715,900 +2.4%
4 CUMULATIVE 1 UP_DOWN_COUNTER_SUM 75,558,659 55,029,449 -20,529,210 -27.2%
4 CUMULATIVE 1 GAUGE_LAST_VALUE 28,552,341 28,218,133 -334,208 -1.2%
4 CUMULATIVE 1 HISTOGRAM_EXPLICIT 15,952,626 21,438,633 +5,486,007 +34.4%
4 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL 16,650,679 17,252,153 +601,474 +3.6%
4 CUMULATIVE 128 COUNTER_SUM 114,706,054 123,732,233 +9,026,179 +7.9%
4 CUMULATIVE 128 UP_DOWN_COUNTER_SUM 110,953,644 122,402,433 +11,448,789 +10.3%
4 CUMULATIVE 128 GAUGE_LAST_VALUE 101,753,088 108,879,557 +7,126,469 +7.0%
4 CUMULATIVE 128 HISTOGRAM_EXPLICIT 77,679,812 83,538,482 +5,858,670 +7.5%
4 CUMULATIVE 128 HISTOGRAM_BASE2_EXPONENTIAL 75,555,258 79,784,244 +4,228,986 +5.6%

⚠️ Several rows with isBound=true have very high variance (>20%) — marked with † — treat those deltas with caution.

Modest to large gains across the board, with larger gains for cases with reduced contention and cumulative temporality, where the map lookup represents a larger share of the time to record.
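To make the mechanism concrete, here is a minimal sketch (hypothetical internals, not the actual SDK implementation) of why binding removes per-record overhead: the unbound path performs an attributes-to-aggregator map lookup on every add(), while bind() performs that lookup once and returns a handle that records directly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a counter's internals (not the actual SDK classes).
class SketchCounter {
  private final Map<String, LongAdder> storage = new ConcurrentHashMap<>();

  // Unbound path: one map lookup per recording.
  void add(long value, String attributes) {
    storage.computeIfAbsent(attributes, k -> new LongAdder()).add(value);
  }

  // Bound path: the lookup happens once here; the returned handle records
  // directly, with no attribute resolution on the hot path.
  LongAdder bind(String attributes) {
    return storage.computeIfAbsent(attributes, k -> new LongAdder());
  }

  long valueOf(String attributes) {
    LongAdder adder = storage.get(attributes);
    return adder == null ? 0 : adder.sum();
  }
}
```

This also suggests why the cumulative cases benefit most: the aggregator handle stays valid across collections, so amortizing the lookup pays off on every recording.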

Leaving as draft because:

  1. Need to land a spec PR first
  2. Need to restructure to move this to the API incubator

@otelbot added the api-change (Changes to public API surface area) label Apr 21, 2026
@otelbot
Contributor

otelbot Bot commented Apr 21, 2026

⚠️ API changes detected — additional maintainer review required

@jack-berg @jkwatson

This PR modifies the public API surface area of the following module(s):

  • opentelemetry-api

Please review the changes in docs/apidiffs/current_vs_latest/ carefully before approving.

@dashpole
Contributor

Threads Temporality Cardinality Instrument false (ops/s) true (ops/s) Δ ops/s Δ %
1 DELTA 1 COUNTER_SUM 118,208,794 130,599,635 † +12,390,841 +10.5%

For the bound=false case, 1/118,208,794 = 8 ns. Dang. Are java concurrent map lookups just that fast?

@jack-berg
Member Author

For the bound=false case, 1/118,208,794 = 8 ns. Dang. Are java concurrent map lookups just that fast?

That case has no concurrency (threads=1). We optimize map lookups slightly by caching the hashcode of our Attribute implementation.
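A minimal sketch of that hashCode-caching trick (illustrative only; the real Attributes implementation differs): compute the hash once at construction so repeated map lookups skip re-hashing the key contents.

```java
import java.util.Arrays;

// Illustrative immutable key that caches its hashCode at construction.
// (Hypothetical class, not the SDK's Attributes implementation.)
final class CachedHashKey {
  private final String[] parts;
  private final int hash; // computed once, reused on every map access

  CachedHashKey(String... parts) {
    this.parts = parts.clone();
    this.hash = Arrays.hashCode(this.parts);
  }

  @Override
  public int hashCode() {
    return hash; // no per-lookup re-hash of the key contents
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof CachedHashKey && Arrays.equals(parts, ((CachedHashKey) o).parts);
  }
}
```

Since ConcurrentHashMap calls hashCode() on every get(), the cached field turns an O(key size) hash into a field read; equals() still compares contents on a bucket hit.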

That number (and all of them, frankly) is suspiciously fast, so I'm double checking things. Things are checking out so far. There is an issue with the cardinality=1 case, where it's possible the JIT compiler is hoisting the map lookup, but the JIT could do that in a real application in a cardinality=1 case as well, so it's not wrong per se. But even the cardinality=128 cases, where a JIT hoist is unlikely, are blazing fast, so the speed can't be attributed to JIT alone.

I ran those benchmarks on my Mac mini, which uses an Apple M4 chip. Currently on the main branch, running on the dedicated bare metal benchmark hardware, that same series gets 29,340,717 ops/s, or ~34 ns/op, which is fast but more believable. Maybe Apple silicon / ARM is exceptionally fast for these types of benchmarks.

https://open-telemetry.github.io/opentelemetry-java/benchmarks/
But you'll probably have to go to the raw data backing those graphs, because the charts are currently pretty unusable for zooming in on a specific series and copying figures: https://raw.githubusercontent.com/open-telemetry/opentelemetry-java/refs/heads/benchmarks/benchmarks/data.js

If you spot any problems with the methodology of MetricRecordBenchmark, please let me know.

@dashpole
Contributor

Yeah, I couldn't find any problem with the methodology or the code. For Go, the map lookup takes ~20 ns (and ~45 ns under high concurrency) and the atomic counter increment takes <10 ns, so we see a bigger difference. I was curious if you had any tricks up your sleeve, or if Java maps were just faster.

@jack-berg
Member Author

I was curious if you had any tricks up your sleeve, or if java maps were just faster

I was curious about the map lookup perf as well, so created a dedicated benchmark based on it: jack-berg@b9cf4c4

Parameters I test:

  • Concurrent access: 1 or 4 threads
  • Cardinality (size of map): 1, 128, 1024
  • Size of map keys: small (1×26-char key), medium (10×26-char keys), large (100×26-char keys)
  • Key type: string (plain ole string), attributes_cached (java attr impl w/ cached hashCode), attributes_uncached (java attrs impl w/o cached hashCode)

Results:

threads=1 — ns/op

keySize cardinality STRING ATTR_CACHED ATTR_UNCACHED
SMALL 1 1.4 2.4 3.7
SMALL 128 2.1 6.6 7.3
SMALL 1024 2.1 6.5 8.6
MEDIUM 1 1.4 2.5 13.7
MEDIUM 128 2.4 7.0 19.5
MEDIUM 1024 2.8 7.0 26.4
LARGE 1 1.4 2.5 157.2
LARGE 128 8.8 11.0 179.2
LARGE 1024 9.5 10.9 186.8

threads=4 — ns/op (4-thread aggregate; multiply by 4 for per-thread cost)

keySize cardinality STRING ATTR_CACHED ATTR_UNCACHED
SMALL 1 0.7† 0.7† 0.9
SMALL 128 1.7 1.7 1.8
SMALL 1024 0.9††† 1.7 2.1
MEDIUM 1 0.8† 0.8† 3.3
MEDIUM 128 1.9 1.8 5.0
MEDIUM 1024 1.9 1.9 6.8
LARGE 1 0.7† 0.8† 39.2
LARGE 128 3.3 2.7 44.7
LARGE 1024 3.7 2.9 46.7

† ±11–16% variance ††† ±46% variance (discard)

So lookups are really fast. Caching hashCodes matters a lot, especially as keys become larger (this is intuitive). Cardinality matters a little, but not as much as key size. I only tested up to 1024, but given that the default cardinality limit is 2k, I think this reasonably represents the use case.

Taking these conclusions back to bound instruments, I think the benchmark setup I have for MetricRecordBenchmark is reasonable. The cardinality is small (128) and the attributes are small (just a 1×26-char key), but since we cache the hashCode, those don't matter much. I could increase the cardinality and attribute size to increase the positive impact of bound instruments.
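The lookup experiment above can be sketched as a plain, non-JMH hot loop (the real benchmark is JMH-based; names here are illustrative, and System.nanoTime() over a loop only gives a ballpark, not a proper measurement):

```java
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of the map-lookup experiment: populate a map at a given
// cardinality, then hammer get() in a hot loop, one lookup per "recording",
// as the unbound hot path would do. Illustrative only; JMH handles warmup,
// dead-code elimination, and statistics properly.
public class MapLookupSketch {
  public static void main(String[] args) {
    int cardinality = 128; // must be a power of two for the mask below
    ConcurrentHashMap<String, long[]> storage = new ConcurrentHashMap<>();
    String[] keys = new String[cardinality];
    for (int i = 0; i < cardinality; i++) {
      keys[i] = "key-" + i;
      storage.put(keys[i], new long[1]);
    }

    int iterations = 1_000_000;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      storage.get(keys[i & (cardinality - 1)])[0]++;
    }
    long elapsed = System.nanoTime() - start;
    System.out.println("~" + (elapsed / iterations) + " ns per lookup+increment (ballpark only)");
  }
}
```

Note that java.lang.String caches its own hashCode, which is one reason the STRING column above is so cheap; the ATTR_UNCACHED column pays the re-hash on every get().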

