KAFKA-20456: Bound waitForFuture() timeout to less than max.poll.interval #22092
bbejeck wants to merge 3 commits into apache:trunk
UladzislauBlok left a comment
LGTM overall, but I have one question:
Don't you think max.poll.interval.ms / 2 is too strict? It could be a percentage instead, e.g. max.poll.interval.ms * 0.9 with conversion back to long.
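For comparison, the two bounds discussed here can be sketched as below. This is only an illustration: the class and helper names are hypothetical and not part of the PR, and the 20_000 ms input is an assumed value chosen for the example.

```java
public class TimeoutBoundSketch {

    // Hypothetical helper: half of max.poll.interval.ms (integer division).
    static long halfOf(final long maxPollIntervalMs) {
        return maxPollIntervalMs / 2;
    }

    // Hypothetical helper: 90% of max.poll.interval.ms.
    // The multiplication is done in double, then truncated back to long.
    static long ninetyPercentOf(final long maxPollIntervalMs) {
        return (long) (maxPollIntervalMs * 0.9);
    }

    public static void main(final String[] args) {
        final long maxPollIntervalMs = 20_000L; // assumed value, for illustration only

        System.out.println(halfOf(maxPollIntervalMs));          // 10000
        System.out.println(ninetyPercentOf(maxPollIntervalMs)); // 18000
    }
}
```

Either bound stays strictly below max.poll.interval.ms; the percentage variant simply leaves less headroom for the thread to get back to polling.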
    final StreamsConfig config = new StreamsConfig(properties);
    thread = createStreamThread(CLIENT_ID, config);

    assertThat(thread.taskManager().waitForFutureTimeoutMs(), equalTo(10_000L));
minor: shouldn't we use assertj?
It's not a dependency in AK right now; we use JUnit + Hamcrest. Plus, IMHO, what's there now is clear enough.
I'm not sure. One thing: this PR is part of another stacked one, #22094. This fix isn't meant to go in by itself, so I think giving up early is OK as the task gets cleaned up. My concern would be not giving the
TaskManager.waitForFuture() previously used a hardcoded 5-minute
timeout when waiting for the StateUpdater to process a REMOVE action.
When the StateUpdater is blocked (e.g., RocksDB write stall during
changelog restoration), the StreamThread blocks for the full timeout
duration and cannot poll, exceeding max.poll.interval.ms and getting
kicked from the consumer group. This triggers a rebalance cascade
that can lead to a crash loop.
This PR bounds the waitForFuture() timeout to half of the
max.poll.interval.ms config (maxPollIntervalMs / 2), ensuring the
StreamThread regains control and can poll before the consumer is
removed from the group.
The remaining issue (waitForFuture() times out and the task is
silently dropped) is addressed in a follow-up PR.
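
The bounded wait described above can be sketched roughly as follows. This is a minimal illustration, not the TaskManager code: a plain CompletableFuture stands in for the StateUpdater's future, and the class name, the waitBounded helper, and its boolean return value are all assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {

    // Hypothetical helper: waits at most maxPollIntervalMs / 2 for the future.
    // Returns true if the future completed in time, false if the wait gave up.
    static boolean waitBounded(final CompletableFuture<?> future, final long maxPollIntervalMs) {
        final long timeoutMs = maxPollIntervalMs / 2;
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (final TimeoutException e) {
            // The caller regains control here and can return to polling.
            return false;
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } catch (final ExecutionException e) {
            throw new IllegalStateException("future completed exceptionally", e);
        }
    }

    public static void main(final String[] args) {
        // A future that never completes models a blocked StateUpdater.
        final CompletableFuture<Void> blocked = new CompletableFuture<>();
        System.out.println(waitBounded(blocked, 200L)); // false, after about 100 ms

        // An already-completed future returns immediately.
        System.out.println(waitBounded(CompletableFuture.completedFuture("removed"), 200L)); // true
    }
}
```

Because the wait is capped at half the poll interval, the calling thread gets control back with time left before the interval expires, whatever state the future is in.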
Reviewers: Uladzislau Blok <123193120+UladzislauBlok@users.noreply.github.com>