feat: handle cross-batch schema evolution in ArrowToParquetWriter (#3… by AyushPatel101 · Pull Request #3896 · dlt-hub/dlt

AyushPatel101 · 2026-04-27T16:56:23Z

Description

arrow_concat_promote_options currently handles type mismatches only within a single flush batch (via pa.concat_tables). However, pyarrow.ParquetWriter fixes its schema on the first write_table() call, so mismatches that span different flush batches fail with ArrowInvalid, even for safe promotions like float32 -> float64.

This makes correctness depend on data volume: a pipeline that works with 2000 rows per batch crashes with 6000 rows when batches land in separate flushes.

This PR extends ArrowToParquetWriter.write_data() to reconcile schemas across flush batches using pa.unify_schemas() with the same promote_options value already used for within-batch concat:

- Incoming narrower than writer (e.g. float32 into float64 writer): cast up to match. Lossless, same file.
- Incoming wider than writer (e.g. float64 into float32 writer): rotate to a new parquet file. Destinations already handle multiple files per table.

promote_options="none" (default) is completely unchanged.

Related Issues

Fixes #3895 (ParquetWriter rejects cross-batch type mismatches that arrow_concat_promote_options should handle).

…t-hub#3895) ParquetWriter locks schema on first write_table() call, rejecting subsequent batches with different types even when arrow_concat_promote_options is set to handle them. This extends type promotion to work across flush batch boundaries by casting narrower types up or rotating to a new file for wider types.

Ayush Patel added 2 commits April 27, 2026 11:52

refactor: rename SchemaChanged to SchemaEvolutionRequired

c2404d9

rudolfix self-assigned this Apr 27, 2026

AyushPatel101 wants to merge 2 commits into dlt-hub:devel from AyushPatel101:fix/3895-cross-batch-schema-promotion
