KAFKA-20514: kraft observers should use previous fetch response to decide where to send the next fetch #22111
Open
kevin-wu24 wants to merge 4 commits into apache:trunk from
Background
Currently, a timing issue can leave a KRaft observer stuck fetching from the leader. It occurs when the next poll happens after the previous fetch's backoff has completed and the previous request did not time out. One example is a leader whose advertised endpoints are not routable: the bootstrap servers may include routable endpoints for the leader, but the observer stays stuck fetching from the unroutable ones.
Previously, there was an issue where observers could be stuck fetching from the bootstrap servers even if they discovered the leader endpoints from a bootstrap fetch. This is because the fetch timeout is not reset on the observer.
What changed
Observer fetching logic should ensure that, within the same epoch, all the bootstrap server endpoints and the leader have a chance to serve fetch requests. This logic should be independent of the request manager's state.
A voter has similar functionality: a fetch timeout expiration and a failed pre-vote election result in a reset of the fetch timer for the same leader in the same epoch.
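The intended behavior can be illustrated with a minimal sketch. This is not the code in this PR: the class `ObserverFetchTargetSketch`, its `Target` enum, and all method names are hypothetical, and it assumes that a successful fetch resets the fetch timer, that an expired timer flips the destination, and that a new epoch starts again from the leader.

```java
/**
 * Hypothetical sketch only; not KafkaRaftClient code.
 * Picks the destination of an observer's next fetch from a fetch timer,
 * without consulting any per-connection request state.
 */
final class ObserverFetchTargetSketch {
    public enum Target { LEADER, BOOTSTRAP }

    private final long fetchTimeoutMs;
    private int epoch;
    private Target target;
    private long deadlineMs;

    public ObserverFetchTargetSketch(long fetchTimeoutMs, int epoch, long nowMs) {
        this.fetchTimeoutMs = fetchTimeoutMs;
        this.epoch = epoch;
        this.target = Target.LEADER;
        this.deadlineMs = nowMs + fetchTimeoutMs;
    }

    /** Assumption: a successful fetch response resets the fetch timer. */
    public void onSuccessfulFetch(long nowMs) {
        deadlineMs = nowMs + fetchTimeoutMs;
    }

    /** Assumption: a newer epoch starts over with the leader and a fresh timer. */
    public void onNewEpoch(int newEpoch, long nowMs) {
        if (newEpoch > epoch) {
            epoch = newEpoch;
            target = Target.LEADER;
            deadlineMs = nowMs + fetchTimeoutMs;
        }
    }

    /** If the fetch timer has expired, flip the destination and restart the timer. */
    public Target nextTarget(long nowMs) {
        if (nowMs >= deadlineMs) {
            target = (target == Target.LEADER) ? Target.BOOTSTRAP : Target.LEADER;
            deadlineMs = nowMs + fetchTimeoutMs;
        }
        return target;
    }
}
```

With a 1000 ms timeout and no successful responses, this sketch alternates between `LEADER` and `BOOTSTRAP` on each expiry, so within one epoch both destinations get a chance to serve fetches.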
Testing
Added a unit test to KafkaRaftClientFetchTest showing that fetches oscillate between the leader and bootstrap endpoints based on the fetch timer.