Open
Changes from 1 commit
parquet/src/encodings/encoding/dict_encoder.rs (4 changes: 2 additions & 2 deletions)
@@ -64,7 +64,7 @@ impl<T: DataType> Storage for KeyStorage<T> {
     }

     fn estimated_memory_size(&self) -> usize {
-        self.size_in_bytes + self.uniques.capacity() * std::mem::size_of::<T::T>()
+        self.uniques.capacity() * std::mem::size_of::<T::T>()
     }
 }

@@ -183,6 +183,6 @@ impl<T: DataType> Encoder<T> for DictEncoder<T> {
     ///
     /// For this encoder, the indices are unencoded bytes (refer to [`Self::write_indices`]).
     fn estimated_memory_size(&self) -> usize {
-        self.interner.storage().size_in_bytes + self.indices.len() * std::mem::size_of::<usize>()
+        self.interner.estimated_memory_size() + self.indices.len() * std::mem::size_of::<usize>()
Contributor


I think this is the right direction (in the sense of accounting for the size in the structures that hold the memory, rather than in the wrapper).

However, it is not clear to me that KeyStorage includes the heap bytes for types like BYTE_ARRAY.

I think we need to add some tests for this code to make sure we have it right.
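To make the BYTE_ARRAY concern concrete, here is a minimal sketch that uses `Vec<Vec<u8>>` as a stand-in for a store of variable-length values (an assumption; the parquet crate's own byte-array type is different). A `capacity() * size_of::<T>()` estimate, like the one in the first hunk, counts only the fixed-size slots of the outer buffer, so it cannot see the bytes each element points to. The function names are hypothetical.

```rust
use std::mem::size_of;

/// Shallow estimate in the style of the first hunk: only the outer buffer's
/// slots are counted. size_of::<Vec<u8>>() is the same for every element,
/// whatever its length.
fn shallow_estimate(values: &Vec<Vec<u8>>) -> usize {
    values.capacity() * size_of::<Vec<u8>>()
}

/// Deep estimate: additionally counts each element's own heap buffer.
fn deep_estimate(values: &Vec<Vec<u8>>) -> usize {
    shallow_estimate(values) + values.iter().map(|v| v.capacity()).sum::<usize>()
}

fn main() {
    let mut small: Vec<Vec<u8>> = Vec::with_capacity(2);
    small.push(vec![0u8; 1]);
    small.push(vec![0u8; 1]);

    let mut large: Vec<Vec<u8>> = Vec::with_capacity(2);
    large.push(vec![0u8; 1 << 20]);
    large.push(vec![0u8; 1 << 20]);

    // The shallow estimates look only at the outer buffers, so they miss the
    // roughly 2 MiB owned by the elements of `large`; the deep estimates do not.
    println!("shallow: small={} large={}", shallow_estimate(&small), shallow_estimate(&large));
    println!("deep:    small={} large={}", deep_estimate(&small), deep_estimate(&large));
}
```

The shallow figure depends only on how many slots the outer buffer has, which is why heap-owning value types need a separate term.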

Contributor Author


Testing upper bounds would require intimate knowledge of the reallocation behavior of Vec and hashbrown, but I'll try to get some confirmation.
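One way around that, sketched here under assumptions: `Vec` explicitly leaves its growth strategy unspecified, and `std::collections::HashMap` (a port of hashbrown) documents `capacity()` only as a lower bound on how many elements fit without reallocating. A test that relies only on those documented guarantees can assert inequalities rather than exact byte counts. The helper names are hypothetical, and the hash-map estimate deliberately ignores control bytes and spare buckets.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Estimate in the style of the diff: allocated slots times slot size.
fn vec_estimate<T>(v: &Vec<T>) -> usize {
    v.capacity() * size_of::<T>()
}

/// What a test can rely on without knowing Vec's growth policy:
/// capacity() is always >= len().
fn vec_lower_bound<T>(v: &Vec<T>) -> usize {
    v.len() * size_of::<T>()
}

/// Same idea for a hash map. HashMap::capacity() is a documented lower bound
/// on how many elements fit without reallocating, so it is >= len().
/// The per-entry size used here ignores the table's control bytes.
fn map_estimate<K, V>(m: &HashMap<K, V>) -> usize {
    m.capacity() * size_of::<(K, V)>()
}

fn map_lower_bound<K, V>(m: &HashMap<K, V>) -> usize {
    m.len() * size_of::<(K, V)>()
}

fn main() {
    let mut v: Vec<u64> = Vec::new();
    let mut m: HashMap<u64, u64> = HashMap::new();
    for i in 0..1000 {
        v.push(i);
        m.insert(i, i);
        // Assert bounds, not exact values: the exact capacity after each
        // insertion depends on unspecified growth policies.
        assert!(vec_estimate(&v) >= vec_lower_bound(&v));
        assert!(map_estimate(&m) >= map_lower_bound(&m));
    }
    println!("vec: {} bytes, map: {} bytes", vec_estimate(&v), map_estimate(&m));
}
```

Lower-bound assertions like these stay valid across standard-library versions; an upper bound would still need knowledge of the growth factor, which is the difficulty raised above.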

Contributor Author


Oh, you are correct about the byte arrays; I need to account for variable- and fixed-length array values.
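A possible shape for that accounting, as a sketch only: a hypothetical `HeapSize` trait reports the bytes a value owns on the heap beyond `size_of::<Self>()`, returning 0 for fixed-width types. `Vec<u8>` and `Box<[u8]>` stand in for BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY values; this is neither the parquet crate's API nor necessarily the approach taken in this PR.

```rust
use std::mem::size_of;

/// Hypothetical trait: bytes a value owns on the heap, beyond size_of::<Self>().
trait HeapSize {
    fn heap_size(&self) -> usize;
}

// Fixed-width physical types (INT32, INT64, DOUBLE, ...) own no heap memory.
macro_rules! no_heap {
    ($($t:ty),*) => {
        $(impl HeapSize for $t {
            fn heap_size(&self) -> usize { 0 }
        })*
    };
}
no_heap!(bool, i32, i64, f32, f64);

// Stand-in for a variable-length BYTE_ARRAY value: owns `capacity()` heap bytes.
impl HeapSize for Vec<u8> {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

// Stand-in for a FIXED_LEN_BYTE_ARRAY value stored out of line: a boxed
// byte slice's allocation is exactly `len()` bytes.
impl HeapSize for Box<[u8]> {
    fn heap_size(&self) -> usize {
        self.len()
    }
}

/// Hypothetical stand-in for a dictionary's unique-value storage.
struct Uniques<T> {
    values: Vec<T>,
}

impl<T: HeapSize> Uniques<T> {
    /// Slots of the outer buffer plus the heap bytes owned by each value.
    fn estimated_memory_size(&self) -> usize {
        self.values.capacity() * size_of::<T>()
            + self.values.iter().map(|v| v.heap_size()).sum::<usize>()
    }
}

fn main() {
    let ints = Uniques { values: vec![1i64, 2, 3] };
    let var = Uniques { values: vec![b"abc".to_vec(), vec![0u8; 100]] };
    let fixed = Uniques { values: vec![vec![0u8; 16].into_boxed_slice(); 2] };
    println!("INT64: {}", ints.estimated_memory_size());
    println!("BYTE_ARRAY: {}", var.estimated_memory_size());
    println!("FIXED_LEN_BYTE_ARRAY: {}", fixed.estimated_memory_size());
}
```

Keeping the per-value term behind a trait means fixed-width types pay nothing extra, while the byte-array types add their out-of-line buffers.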

Contributor Author

@mzabaluev, Apr 16, 2026


I have accounted for the byte arrays' allocations and added some tests.

     }
 }
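The second hunk follows the reviewer's point about measuring memory in the structures that own it rather than in the wrapper: the encoder adds its own `indices` buffer to whatever the interner reports. A minimal, self-contained sketch of that layering follows. `Storage`, `Interner`, and `Encoder` here are hypothetical simplifications, not the types in `dict_encoder.rs`; in this sketch the interner accounts for its storage plus its own dedup table, which is keyed by a hash purely to keep the example short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::mem::size_of;

/// Hypothetical storage layer: owns the unique byte-array values.
#[derive(Default)]
struct Storage {
    uniques: Vec<Vec<u8>>,
}

impl Storage {
    /// Outer buffer slots plus each value's own heap buffer.
    fn estimated_memory_size(&self) -> usize {
        self.uniques.capacity() * size_of::<Vec<u8>>()
            + self.uniques.iter().map(|v| v.capacity()).sum::<usize>()
    }
}

/// Hypothetical interner: owns the storage plus a dedup table mapping a
/// value's hash to its index in `storage.uniques`.
#[derive(Default)]
struct Interner {
    storage: Storage,
    dedup: HashMap<u64, usize>,
}

impl Interner {
    fn intern(&mut self, value: &[u8]) -> usize {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let key = hasher.finish();
        if let Some(&idx) = self.dedup.get(&key) {
            if self.storage.uniques[idx] == value {
                return idx;
            }
            // Hash collision with a different value: store it anyway, just
            // without its own dedup entry (keeps the sketch short).
        }
        let idx = self.storage.uniques.len();
        self.storage.uniques.push(value.to_vec());
        self.dedup.entry(key).or_insert(idx);
        idx
    }

    /// Reports its own table and delegates the values to the storage layer.
    /// `capacity() * entry size` is a rough figure that ignores the table's
    /// control bytes and spare buckets.
    fn estimated_memory_size(&self) -> usize {
        self.storage.estimated_memory_size()
            + self.dedup.capacity() * size_of::<(u64, usize)>()
    }
}

/// Hypothetical encoder mirroring the updated `estimated_memory_size` in the second hunk.
#[derive(Default)]
struct Encoder {
    interner: Interner,
    indices: Vec<usize>,
}

impl Encoder {
    fn put(&mut self, value: &[u8]) {
        let idx = self.interner.intern(value);
        self.indices.push(idx);
    }

    fn estimated_memory_size(&self) -> usize {
        self.interner.estimated_memory_size() + self.indices.len() * size_of::<usize>()
    }
}

fn main() {
    let mut encoder = Encoder::default();
    for value in [&b"hello"[..], &b"world"[..], &b"hello"[..]] {
        encoder.put(value);
    }
    println!("indices: {:?}", encoder.indices);
    println!("estimated bytes: {}", encoder.estimated_memory_size());
}
```

Each layer only adds the allocations it owns directly, so a change in how values are stored (for example, byte arrays) stays local to the storage layer's estimate.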