[ntuple] Support multiple column representations in the merger by silverweed · Pull Request #22017 · root-project/root

silverweed · 2026-04-22T15:07:43Z

This Pull request:

Significantly reworks the innards of the RNTupleMerger to support fast merging of fields with different but compatible column representations.
Basically it does two things:

- turns all L3 merging cases into L2/L1.
- no longer rejects merging fields with different column representations (previously this was only supported for representations that were the split/unsplit version of each other, and only via L3 merging).

A potentially negative consequence that we might want to revisit is that the merger now never adapts the columns' splitness to the output compression (e.g. if merging changes the compression from 0 to 505, the columns will still be encoded as unsplit, and vice versa). This will probably be re-added in a future PR.
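To make the trade-off concrete, here is a minimal sketch of the behaviour described above. It is a toy model, not the merger's code: the enum, both function names, and the "split when compressing, unsplit otherwise" default are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-state model of a column's encoding.
enum class ESplitness { kUnsplit, kSplit };

// Assumed writer default: split encodings when compression is on, unsplit when it is off.
constexpr ESplitness PreferredSplitness(std::uint32_t compression)
{
   return compression == 0 ? ESplitness::kUnsplit : ESplitness::kSplit;
}

// Behaviour described in this PR: the source encoding is kept as-is,
// whatever the output compression is.
constexpr ESplitness MergedSplitness(ESplitness source, std::uint32_t /*outputCompression*/)
{
   return source;
}

// Source written with compression 0, merged into an output with compression 505.
inline void DemoSplitness()
{
   const auto source = PreferredSplitness(0);
   const auto merged = MergedSplitness(source, 505);
   assert(merged == ESplitness::kUnsplit);      // stays unsplit
   assert(merged != PreferredSplitness(505));   // not what a fresh write at 505 would pick in this model
}
```

In this model, the gap between `MergedSplitness` and `PreferredSplitness` is exactly the adaptation that would have to be re-added.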

In order to achieve this, some new internal functionality had to be added, most notably RPagePersistentSink::AddColumnRepresentation.

Note that this PR is independent of #21740, which in fact might not be needed at all.

IMPORTANT

This PR introduces our first feature flag and thus the first bump to the specs' major version (1.1.0.0). This means we can now start producing RNTuples which cannot be read by older ROOT versions.
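As a rough illustration of why a feature flag can make files unreadable by older versions, the sketch below models the flags as a single 64-bit word and lets a reader refuse any file that sets a bit it does not know. The constant and function names are made up for this example, and the real on-disk encoding of the flags is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Bit 0 is the "Nested Deferred Columns" flag introduced by this PR;
// the constant name is hypothetical.
constexpr std::uint64_t kFlagNestedDeferredColumns = std::uint64_t(1) << 0;

// A reader can open a file only if it understands every flag the file sets.
constexpr bool CanRead(std::uint64_t fileFlags, std::uint64_t knownFlags)
{
   return (fileFlags & ~knownFlags) == 0;
}

inline void DemoFeatureFlags()
{
   const std::uint64_t oldReader = 0;                           // knows no flags
   const std::uint64_t newReader = kFlagNestedDeferredColumns;  // knows bit 0

   // A file that does not set any flag is readable by both.
   assert(CanRead(0, oldReader));
   assert(CanRead(0, newReader));

   // A file that sets bit 0 is readable only by the new reader.
   assert(!CanRead(kFlagNestedDeferredColumns, oldReader));
   assert(CanRead(kFlagNestedDeferredColumns, newReader));
}
```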

TODO

- check if we need a feature flag for the changes in AddExtendedColumnRanges
- add a test for merging of Real32Trunc/Quant columns with different bit width/value range
- properly split the big merger commit
- update Merging.md

Checklist:

- tested changes locally
- updated the docs (if necessary)

github-actions · 2026-04-22T17:16:23Z

Test Results

21 files, 21 suites, 3d 3h 27m 32s ⏱️
3 862 tests: 3 862 ✅ passed, 0 💤 skipped, 0 ❌ failed
73 374 runs: 73 374 ✅ passed, 0 💤 skipped, 0 ❌ failed

Results for commit ab08930.

♻️ This comment has been updated with latest results.

We are currently serializing columns per-field, but in case of late column extension this might result in inconsistent sorting of the columns in the serialized footer. E.g. assume you have fields "A" and "B", both late model extended, both with a single column:

- col 0 -> field A, repr 0
- col 1 -> field B, repr 0

Now you add a new column representation to field "A"; this new column has id 2:

- col 2 -> field A, repr 1

When serializing this RNTuple, all columns are written in the footer by RNTupleSerialize::SerializeColumnsForFields(). Before this change, they would end up on disk in order: [0, 2, 1]. This would corrupt the data by swapping the pages for columns 2 and 1. After this change, they get written as [0, 1, 2] which is the correct order.

Note that this exact case is tested in ntuple_merger in the unit test MergeDeferredAdvanced.
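The ordering problem in this commit message can be reproduced with a small toy model. The sketch below does not use the real serializer: the `Column` struct and both ordering functions are hypothetical stand-ins that only mimic "emit per field" versus "emit by column id" for the A/B scenario.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a column descriptor (not the ROOT class).
struct Column {
   std::uint64_t fId;      // column id, assigned in creation order
   std::uint64_t fFieldId; // owning field
   std::uint32_t fReprIdx; // representation index within the field
};

// Toy version of the old behaviour: emit columns grouped per field.
inline std::vector<std::uint64_t>
OrderPerField(const std::vector<Column> &cols, const std::vector<std::uint64_t> &fieldIds)
{
   std::vector<std::uint64_t> order;
   for (auto fieldId : fieldIds)
      for (const auto &c : cols)
         if (c.fFieldId == fieldId)
            order.push_back(c.fId);
   return order;
}

// Toy version of the new behaviour: emit columns sorted by their id.
inline std::vector<std::uint64_t> OrderByColumnId(std::vector<Column> cols)
{
   std::sort(cols.begin(), cols.end(),
             [](const Column &a, const Column &b) { return a.fId < b.fId; });
   std::vector<std::uint64_t> order;
   for (const auto &c : cols)
      order.push_back(c.fId);
   return order;
}

// Field A has id 0, field B has id 1; column 2 is A's second representation.
inline std::vector<Column> ExampleColumns()
{
   return {{0, 0, 0}, {1, 1, 0}, {2, 0, 1}};
}

inline void DemoColumnOrder()
{
   const std::vector<std::uint64_t> perField = OrderPerField(ExampleColumns(), {0, 1});
   const std::vector<std::uint64_t> byId = OrderByColumnId(ExampleColumns());
   assert((perField == std::vector<std::uint64_t>{0, 2, 1})); // columns 2 and 1 swapped
   assert((byId == std::vector<std::uint64_t>{0, 1, 2}));     // matches the column ids
}
```

Sorting by id is only one way to get a stable order in this toy model; the commit's actual implementation may differ.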

Internal functionality to be used by the Merger. This entails 2 additional changes:

- AddExtendedColumnRanges needs to be updated to handle the case where a column representation is added to a field during writing after some clusters have already been written;
- ShiftAliasColumns needs to properly shift the ids of extended alias columns when called, otherwise a mismatch may happen when serializing the footer.

pcanal · 2026-05-23T05:19:25Z

> now the merger won't ever adapt the columns' splitness to the output compression (e.g. if merging changes the source compression from 0 to 505 it will still encode the columns as unsplit, and vice-versa). This will probably be readded in a future PR.

Indeed, we do need to provide a way for the user to request L3-type merging (this is less urgent if we already have a way to do L4-type merging).

Addendum: L4 is currently not supported, so re-adding L3 would be helpful, in particular to allow re-selecting the compression algorithm used.

pcanal · 2026-05-23T05:23:43Z

+
+| Flag Bit | Introduced in | Name                    | Meaning                                      |
+|----------|---------------|-------------------------|----------------------------------------------|
+| 0        | 1.1.0.0       | Nested Deferred Columns | Signals that the RNTuple contains at least one deferred column that is part of a collection and was extended<br>(i.e. it appears in the footer). This can happen when merging two RNTuples that have the same collection field<br>backed by columns with different encoding, e.g. a `vector<float>` whose elements are represented by SplitReal32<br>in the first ntuple and by Real32 in the second. |


This explicitly mentions collections. Is the feature (merging RNTuples whose 'same' column has different representations) not supported for simple types (i.e. just a float instead of a vector<float>)? If not, why not?

pcanal · 2026-05-23T05:24:36Z

I did not mean to close this.

silverweed requested a review from jblomer as a code owner April 22, 2026 15:07

silverweed marked this pull request as draft April 22, 2026 15:07

silverweed changed the title ~~Ntuple merge colrep2~~ [ntuple] Support multiple column representations in the merger Apr 22, 2026

silverweed added the in:RNTuple label Apr 22, 2026

silverweed self-assigned this Apr 22, 2026

silverweed force-pushed the ntuple_merge_colrep2 branch 3 times, most recently from b2ae5fc to 1db6b5e Compare April 22, 2026 15:22

silverweed force-pushed the ntuple_merge_colrep2 branch 9 times, most recently from 82e5299 to d3efcfc Compare April 30, 2026 09:45

silverweed force-pushed the ntuple_merge_colrep2 branch 2 times, most recently from 857d425 to 60b1bde Compare May 4, 2026 13:36

silverweed mentioned this pull request May 4, 2026

[ntuple] Some changes/fixes to better handle feature flags #22135

Merged

2 tasks

silverweed force-pushed the ntuple_merge_colrep2 branch 3 times, most recently from 1d462ee to a249301 Compare May 5, 2026 07:42

silverweed added 6 commits May 13, 2026 17:03

[ntuple] Fix RFieldFundamental::GenerateColumns using the wrong repr idx

6765b07

[ntuple] Small improvement in RNTupleMerger

5e53d8e

Instead of calling continue multiple times in the AddColumnFromField loop, just early return in case of projected fields.

[ntuple] update merger test to make sure we test nRepetitions

4f57dd7

[ntuple] Clarify a bit RFieldBase::EntryToColumnElementIndex

355862b

Also fix the type of result

[ntuple] Add RClusterDescriptor::TryGetColumnRange

da1c6e0

silverweed added 7 commits May 13, 2026 17:03

[ntuple] Introduce feature flag 0 (nested deferred columns)

211b602

[ntuple] add RPagePersistentSink::AddAliasColumn

d59e5ef

[ntuple] Add multiple column representation support in the Merger

8fcbcfa

[ntuple] support merging columns with metadata (with different types)

44c939d

[ntuple] Support merging columns with same type and different metadata

f4c002b

[ntuple][NFC] update Merging.md

ab08930

silverweed force-pushed the ntuple_merge_colrep2 branch from a249301 to ab08930 Compare May 13, 2026 15:05

silverweed marked this pull request as ready for review May 15, 2026 06:48

silverweed requested review from enirolf, hahnjo, pcanal and vepadulano May 15, 2026 06:48

pcanal closed this May 23, 2026

pcanal reviewed May 23, 2026

pcanal reopened this May 23, 2026

silverweed wants to merge 13 commits into root-project:master from silverweed:ntuple_merge_colrep2

2 participants
