feat(ipc): configurable zstd compression level #9748

Open
andraztori wants to merge 4 commits into apache:main from andraztori:zstd-ipc-compression-level

@andraztori

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

arrow-ipc currently hardcodes zstd to zstd::DEFAULT_COMPRESSION_LEVEL (level 3). Users who want tighter compression (for cold storage / WAN transfer) or faster compression (for hot paths) have no way to tune this without forking the crate.

parquet::basic::Compression::ZSTD(ZstdLevel) already exposes the exact same knob, so users writing both Parquet and IPC get an inconsistent experience today.

This PR adds configurable zstd compression levels to arrow-ipc, mirroring the parquet API as closely as possible so the two stay familiar side-by-side.

What changes are included in this PR?

  • New arrow_ipc::compression::ZstdLevel(i32) — validated newtype matching the shape of parquet::basic::ZstdLevel (same range 1..=22, same try_new / compression_level() / Default).
  • New arrow_ipc::compression::IpcCompression enum — writer-side codec + parameter selector, analogous to parquet::basic::Compression:
    pub enum IpcCompression {
        Lz4Frame,
        Zstd(ZstdLevel),
    }
  • IpcWriteOptions::try_with_compression now takes Option<IpcCompression> instead of Option<CompressionType> (source-breaking change, see below).
  • CompressionContext::with_zstd_level(ZstdLevel) constructor; FileWriter / StreamWriter build their context via the configured level instead of the hardcoded default.
  • ZstdLevel and IpcCompression are re-exported from arrow_ipc::writer so the public surface stays in one place.
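For readers who want to see the shape of the two new types without opening the diff, here is a minimal, self-contained sketch. It is not the code from this PR: the error type `InvalidZstdLevel` is a hypothetical stand-in for `ArrowError`, the constant names are illustrative, and the `Default` value of 3 is an assumption based on the statement above that the previously hardcoded level was 3 and that `zstd_default()` keeps the old behavior.

```rust
use std::fmt;

/// Hypothetical stand-in for `ArrowError`, used only to keep this sketch
/// free of external crates.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidZstdLevel(pub i32);

impl fmt::Display for InvalidZstdLevel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "zstd level {} is outside the valid range 1..=22", self.0)
    }
}

impl std::error::Error for InvalidZstdLevel {}

/// Validated newtype: a value can only be built through `try_new` or `Default`,
/// so every `ZstdLevel` in circulation is known to be in range.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ZstdLevel(i32);

impl ZstdLevel {
    const MIN: i32 = 1;
    const MAX: i32 = 22;
    // Assumption: matches the level the description says was hardcoded before.
    const DEFAULT: i32 = 3;

    /// Returns an error for any level outside 1..=22.
    pub fn try_new(level: i32) -> Result<Self, InvalidZstdLevel> {
        if (Self::MIN..=Self::MAX).contains(&level) {
            Ok(Self(level))
        } else {
            Err(InvalidZstdLevel(level))
        }
    }

    /// The raw level to hand to the zstd encoder.
    pub fn compression_level(&self) -> i32 {
        self.0
    }
}

impl Default for ZstdLevel {
    fn default() -> Self {
        Self(Self::DEFAULT)
    }
}

/// Writer-side codec plus parameter selector.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum IpcCompression {
    Lz4Frame,
    Zstd(ZstdLevel),
}

impl IpcCompression {
    /// Zstd at the default level, i.e. the pre-change behavior.
    pub fn zstd_default() -> Self {
        Self::Zstd(ZstdLevel::default())
    }
}
```

Keeping the inner `i32` private means an out-of-range level is rejected once, at construction, so the writer does not need to re-validate it.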

On-wire format is unchanged: the IPC flatbuffer BodyCompression.codec field still records only the codec, and the zstd level is a purely writer-side parameter (decoders do not need to know it, same as in parquet).
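The point about the wire format can be illustrated with a small sketch of how a writer-side selector collapses to the codec written into the message. `WireCodec`, `WriterCompression`, and `wire_codec` are hypothetical names standing in for the flatbuffer codec enum and `IpcCompression`; the real conversion in this PR may be structured differently.

```rust
/// Hypothetical stand-in for the codec value stored in the IPC message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireCodec {
    Lz4Frame,
    Zstd,
}

/// Hypothetical stand-in for the writer-side selector; the level is a bare
/// `i32` here to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriterCompression {
    Lz4Frame,
    Zstd { level: i32 },
}

impl WriterCompression {
    /// Maps the writer-side choice to what is recorded on the wire.
    /// The zstd level is dropped here: it only affects how hard the encoder
    /// works, not how the reader decodes the frame.
    pub fn wire_codec(self) -> WireCodec {
        match self {
            WriterCompression::Lz4Frame => WireCodec::Lz4Frame,
            WriterCompression::Zstd { .. } => WireCodec::Zstd,
        }
    }
}
```

Under this mapping, two writers using different zstd levels produce messages with the same codec value, which is why an unmodified reader can consume either.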

Are these changes tested?

Yes:

  • test_write_file_with_zstd_non_default_level — writes a record batch at a non-default zstd level through the public FileWriter API and reads it back with the stock FileReader, verifying identity.
  • Existing zstd round-trip / compression tests continue to pass (test_write_file_with_zstd_compression, etc.).
  • All in-crate callers (arrow-ipc tests/benches, arrow-integration-testing) updated to the new IpcCompression type.

Verified locally with cargo fmt, cargo build -p arrow-ipc --all-features, cargo test -p arrow-ipc --all-features (107 unit tests + doctests pass), and builds of arrow-flight / arrow-integration-testing.

Are there any user-facing changes?

Yes — one source-breaking change to a public API:

// Before:
pub fn try_with_compression(self, batch_compression: Option<CompressionType>) -> Result<Self, ArrowError>

// After:
pub fn try_with_compression(self, batch_compression: Option<IpcCompression>) -> Result<Self, ArrowError>

Call-site migration:

// Before
.try_with_compression(Some(CompressionType::ZSTD))?
.try_with_compression(Some(CompressionType::LZ4_FRAME))?

// After
.try_with_compression(Some(IpcCompression::zstd_default()))?           // same behavior as before
.try_with_compression(Some(IpcCompression::Zstd(ZstdLevel::try_new(9)?)))? // new: non-default level
.try_with_compression(Some(IpcCompression::Lz4Frame))?

Because this is a breaking change, it should land in the next major release (59.0.0). Happy to gate or defer if maintainers prefer.


Disclosure

This PR was drafted with AI assistance (Cursor / Anthropic Claude). All code has been reviewed, built, tested, and formatted locally by me. The design was chosen to mirror existing parquet crate conventions; no LLM-authored code was committed without review.

Made with Cursor

Add IpcCompression::Zstd(ZstdLevel) and thread the level into
CompressionContext, mirroring parquet::basic::Compression::ZSTD.
IpcWriteOptions::try_with_compression now takes Option<IpcCompression>
(breaking source change; on-wire format unchanged).

Co-generated-by: Cursor/Opus
github-actions bot added the arrow (Changes to the arrow crate) label on Apr 16, 2026
alamb added the api-change (Changes to the arrow API) label on Apr 22, 2026