Simplify flexible versions#1139

Open
vmaurin wants to merge 2 commits into aio-libs:master from vmaurin:simplify_flexible_versions

Conversation

@vmaurin
Contributor

@vmaurin vmaurin commented Nov 27, 2025

Flexible versions are a protocol feature of newer API versions. When an API version is flexible, it uses more compact structures and also allows additional "dynamic" fields that can be added without introducing a new API version.

This commit moves flexible-version support to the protocol layer, so that it is more transparent and easier to use when defining Struct classes and schemas.

When defining the schema, we can specify a tagged field with a tuple containing the field name and the field tag.
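
As a rough illustration of that declaration style, here is a minimal, self-contained sketch. It does not use aiokafka's classes: `SCHEMA`, `encode_struct`, and the encoder helpers are hypothetical names, the field names are only illustrative, and the `((name, tag), type)` shape is an assumption about how the tuple is nested. The wire layout of the tagged-fields section (count, then tag, size, and payload per field, all as unsigned varints) follows the Kafka protocol's description of flexible versions.

```python
import struct


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("unsigned varint cannot be negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


# Fixed-width types are encoded identically in flexible and non-flexible versions.
def encode_int32(value: int, flexible: bool) -> bytes:
    return struct.pack(">i", value)


def encode_int64(value: int, flexible: bool) -> bytes:
    return struct.pack(">q", value)


# Hypothetical schema: a plain field is (name, encoder),
# a tagged field is ((name, tag), encoder).
SCHEMA = [
    ("throttle_time_ms", encode_int32),
    (("finalized_features_epoch", 1), encode_int64),
]


def encode_struct(values: dict, flexible: bool) -> bytes:
    body = b""
    tagged = []
    for key, encoder in SCHEMA:
        if isinstance(key, tuple):
            name, tag = key
            # Tagged fields only exist in flexible versions and may be omitted.
            if flexible and values.get(name) is not None:
                tagged.append((tag, encoder(values[name], flexible)))
        else:
            body += encoder(values[key], flexible)
    if flexible:
        # Tagged-fields section: count, then (tag, size, payload) in tag order.
        body += encode_uvarint(len(tagged))
        for tag, payload in sorted(tagged, key=lambda item: item[0]):
            body += encode_uvarint(tag) + encode_uvarint(len(payload)) + payload
    return body
```

With this shape, the caller sets `finalized_features_epoch` like any other field, and the encoder decides whether it ends up in the tagged-fields section.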

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> (e.g. 588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.

@vmaurin vmaurin force-pushed the simplify_flexible_versions branch from 9289e11 to 8622716 Compare November 27, 2025 20:35
  @classmethod
  @abc.abstractmethod
- def encode(cls, value: T) -> bytes: ...
+ def encode(cls, value: T, flexible: bool) -> bytes: ...

Check notice

Code scanning / CodeQL

Statement has no effect Note

This statement has no effect.
  @classmethod
  @abc.abstractmethod
- def decode(cls, data: BytesIO) -> T: ...
+ def decode(cls, data: BytesIO, flexible: bool) -> T: ...

Check notice

Code scanning / CodeQL

Statement has no effect Note

This statement has no effect.
@vmaurin vmaurin marked this pull request as draft November 27, 2025 20:37
@codecov

codecov Bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 97.81022% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.23%. Comparing base (00b099f) to head (7c67eba).

Files with missing lines        Patch %   Lines
aiokafka/protocol/message.py    75.00%    2 Missing ⚠️
aiokafka/protocol/types.py      98.68%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1139      +/-   ##
==========================================
+ Coverage   94.97%   95.23%   +0.26%     
==========================================
  Files          89       89              
  Lines       16041    15987      -54     
  Branches     1397     1387      -10     
==========================================
- Hits        15235    15226       -9     
+ Misses        556      516      -40     
+ Partials      250      245       -5     
Flag          Coverage Δ
cext          95.20% <97.81%> (+0.26%) ⬆️
integration   95.12% <97.81%> (+0.27%) ⬆️
purepy        95.20% <97.81%> (+0.26%) ⬆️
unit          52.53% <96.35%> (+0.18%) ⬆️

@vmaurin vmaurin force-pushed the simplify_flexible_versions branch 3 times, most recently from 31b6d5a to b38742d Compare November 27, 2025 21:27
@vmaurin vmaurin marked this pull request as ready for review November 27, 2025 21:46
@vmaurin vmaurin force-pushed the simplify_flexible_versions branch from b38742d to c9d49dd Compare December 1, 2025 09:11
@vmaurin
Contributor Author

vmaurin commented Dec 1, 2025

@ods Let me know if you need additional info. The main reason I need this, in order to improve API version coverage, is to be able to support and specify tagged fields properly. I took inspiration from the Java client's JSON format, where you give the tag of a field along with its name, like here https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/ApiVersionsResponse.json#L64

@ods
Collaborator

ods commented Dec 2, 2025

@vmaurin Thank you for the contribution, and sorry for the delay. I’m not familiar enough with this code to give a quick answer, so I need to find some time for research.

@vmaurin
Contributor Author

vmaurin commented Dec 2, 2025

> @vmaurin Thank you for the contribution, and sorry for the delay. I’m not familiar enough with this code to give a quick answer, so I need to find some time for research.

No problem @ods. My overall goal here is to have something closer to the Java client for schema definitions. The Java client has an extended JSON format (one file per API request, one per API response) that is then used to generate Java classes. Being in Python, it is probably better to express schemas in Python, and we don't really need code generation since we have class-level facilities.
My issues with the current implementation of flexible versions/tagged fields:

  • Compact structures need to be explicitly declared, while a boolean saying "use compact structures" should be enough to properly encode and decode "normal" schema types (String, Array, Bytes).
  • Tagged fields are not meant to be passed as a dict. They should be treated as "normal" properties of the API, while still allowing API versions to be forward compatible by simply ignoring new tagged fields.
  • I am also fixing a bug in serializing tagged fields (the size was not being serialized).
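
To make the last two points concrete, here is a minimal sketch of the tagged-fields section on the wire. It assumes the layout from the Kafka protocol's flexible versions: an unsigned varint field count, then for each field an unsigned varint tag, an unsigned varint size, and the payload. The function names are hypothetical and are not aiokafka's API. The per-field size is what lets a decoder skip tags it does not know, which is why leaving it out breaks forward compatibility.

```python
from io import BytesIO


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if value < 0:
        raise ValueError("unsigned varint cannot be negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_uvarint(data: BytesIO) -> int:
    """Decode an unsigned varint from the stream."""
    result = 0
    shift = 0
    while True:
        (byte,) = data.read(1)  # raises ValueError if the stream is exhausted
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def encode_tagged_fields(fields: dict) -> bytes:
    """Encode {tag: already-serialized payload} as a tagged-fields section."""
    out = encode_uvarint(len(fields))
    for tag in sorted(fields):
        payload = fields[tag]
        # The size prefix is mandatory, even when the reader knows the type.
        out += encode_uvarint(tag) + encode_uvarint(len(payload)) + payload
    return out


def decode_tagged_fields(data: BytesIO, known_tags: set) -> dict:
    """Decode a tagged-fields section, silently skipping unknown tags."""
    result = {}
    for _ in range(decode_uvarint(data)):
        tag = decode_uvarint(data)
        size = decode_uvarint(data)
        payload = data.read(size)  # consumed whether or not the tag is known
        if tag in known_tags:
            result[tag] = payload
    return result
```

A reader that only knows tag 0 can still consume a message that also carries tag 5, and the stream stays aligned for whatever follows.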

Comment on lines +135 to +137
UnsignedVarInt32.encode(0)
if flexible and self.allow_flexible
else Int16.encode(-1, flexible)
Collaborator

The protocol always specifies either STRING or COMPACT_STRING. Why do we switch inside a single class based on a property that is not directly related?

Contributor Author

I took inspiration from the Java client's JSON schema files, like here https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/FindCoordinatorRequest.json#L36

When you specify the schema, it is easier and less work to just say "it is a String" and mark the flexible versions, rather than having to remember everywhere that it is the more compact variant. The same goes for not having to specify, at each level of the schema, that it can accept flexible fields.

For flexible fields, as in the Java client's JSON files, it is easier to declare them like other fields, with a name and type plus the additional tag id, rather than declaring a generic structure on every struct and then having an extra layer of serialization on top.
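
The trade-off under discussion can be sketched as a single string encoder that picks the STRING or COMPACT_STRING layout from a `flexible` flag. This is a standalone illustration with hypothetical function names, not the code in this PR. It assumes the layouts from the Kafka protocol: STRING uses an INT16 length with -1 for null, and COMPACT_STRING uses an unsigned varint of length + 1 with 0 for null.

```python
import struct
from typing import Optional


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if value < 0:
        raise ValueError("unsigned varint cannot be negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(value: Optional[str], flexible: bool) -> bytes:
    """Encode as COMPACT_STRING when flexible, else as STRING."""
    if value is None:
        # Null marker: varint 0 (compact) or INT16 -1 (classic).
        return encode_uvarint(0) if flexible else struct.pack(">h", -1)
    raw = value.encode("utf-8")
    if flexible:
        # Compact form stores length + 1 so that 0 can mean null.
        return encode_uvarint(len(raw) + 1) + raw
    return struct.pack(">h", len(raw)) + raw
```

The schema author then writes one string type, and the API version's flexibility selects the layout at encode time.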

("name", String("utf-8")),
(
"partitions",
CompactArray(
Collaborator

From the code here, it's not obvious whether the correct (compact) form will be used.

Contributor Author

Related to #1139 (comment)

In the Java client's JSON, you can see they just say "it is an array": https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/AlterPartitionReassignmentsResponse.json#L36

It is then because the version is marked "flexible" that the more compact serialization is used.
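
The same idea for arrays, as a minimal sketch with a hypothetical `encode_int32_array` helper (not aiokafka's API). It assumes the Kafka protocol layouts: ARRAY uses an INT32 element count with -1 for null, and COMPACT_ARRAY uses an unsigned varint of count + 1 with 0 for null. The elements here are fixed-width INT32 values to keep the example short.

```python
import struct
from typing import Optional, Sequence


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if value < 0:
        raise ValueError("unsigned varint cannot be negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_int32_array(items: Optional[Sequence[int]], flexible: bool) -> bytes:
    """Encode a list of INT32 as COMPACT_ARRAY when flexible, else as ARRAY."""
    if items is None:
        return encode_uvarint(0) if flexible else struct.pack(">i", -1)
    if flexible:
        header = encode_uvarint(len(items) + 1)  # count + 1, 0 is reserved for null
    else:
        header = struct.pack(">i", len(items))
    return header + b"".join(struct.pack(">i", item) for item in items)
```

Only the length header changes between the two forms, so the schema can say "array" and leave the choice to the version's flexible flag.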

Comment thread aiokafka/protocol/api.py
@vmaurin vmaurin force-pushed the simplify_flexible_versions branch 2 times, most recently from ac55e29 to 7880f3d Compare December 12, 2025 08:43
@vmaurin vmaurin force-pushed the simplify_flexible_versions branch from 7880f3d to 0655add Compare January 5, 2026 14:27
@sheinbergon

@vmaurin @ods This is a blocker for Kafka 4.x compatibility, right? Is there anything I can do to help here?

@vmaurin
Contributor Author

vmaurin commented Jan 27, 2026

> @vmaurin @ods This is a blocker for Kafka 4.x compatibility, right? Is there anything I can do to help here?

Yes and no. The current version should be compatible with 4.x, as the main Kafka project rolled back its deprecation plans. Still, they might plan to deprecate some message versions in the future, so we should try to stay up to date.

About this MR:
There is already a flexible fields implementation in the current master branch, but it has a bug and it is not very convenient for defining message schemas. My idea with this MR is to make flexible fields easy to define and use, similar to what is done in the official Java client's JSON "schemas". It was "ready" to go, but there seem to have been some test failures on the latest rebase I did (I am not sure whether these are flaky tests or a real issue).

@vmaurin vmaurin force-pushed the simplify_flexible_versions branch from 0655add to 3a7003e Compare February 2, 2026 11:15
@vmaurin
Contributor Author

vmaurin commented Apr 23, 2026

@ods A small reminder about this one. Let me know if I should make it ready to merge again (it is a bit outdated).

About this MR: There is already a flexible fields implementation in the current master branch, but it has a bug and it is not very convenient for defining message schemas. My idea with this MR is to make flexible fields easy to define and use, similar to what is done in the official Java client's JSON "schemas".


4 participants