
HBASE-29912 Codel lifoThreshold should be applied on soft queue limit… #7864

Open
Umeshkumar9414 wants to merge 4 commits into apache:master from Umeshkumar9414:HBASE-29912


@Umeshkumar9414
Contributor

Umeshkumar9414 commented Mar 5, 2026

… instead of hard limit

I think that after HBASE-16089 the CoDel target delay was changed to 100 ms, which is not standard for CoDel (patch). I have changed it back.
To visualize why we should use a 5 ms target delay with a 100 ms interval, I used the model. With a target delay of 100 and an interval of 100, the queue is more prone to filling up. CoDel still helps in LIFO mode, but a full queue is a problem because it can cause an OOM.
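To make the target/interval trade-off concrete, here is a minimal, single-threaded Java sketch of CoDel-style overload detection. It is a simplified model, not HBase's AdaptiveLifoCoDelCallQueue; the names `CoDelSketch`, `Detector`, `onDequeue` and `run` are hypothetical. A call's delay is the time it spent queued. At the end of each interval the queue is flagged as overloaded if even the smallest delay seen in that interval exceeded the target, and in this sketch, while overloaded, calls waiting longer than twice the target are reported as droppable.

```java
/**
 * Simplified, single-threaded model of CoDel-style overload detection.
 * Hypothetical names; not HBase's AdaptiveLifoCoDelCallQueue.
 */
class CoDelSketch {

  static final class Detector {
    private final long targetDelayMs;
    private final long intervalMs;
    private long intervalEndMs;
    // Smallest queueing delay observed in the current interval.
    private long minDelayMs = Long.MAX_VALUE;
    private boolean overloaded;

    Detector(long targetDelayMs, long intervalMs, long startMs) {
      this.targetDelayMs = targetDelayMs;
      this.intervalMs = intervalMs;
      this.intervalEndMs = startMs + intervalMs;
    }

    /** Called when a call leaves the queue; returns true if it should be dropped. */
    boolean onDequeue(long nowMs, long receiveTimeMs) {
      long delayMs = nowMs - receiveTimeMs;
      minDelayMs = Math.min(minDelayMs, delayMs);
      if (nowMs >= intervalEndMs) {
        // If even the best-case delay of the interval exceeded the target,
        // the queue never drained: treat it as a standing queue (overload).
        overloaded = minDelayMs > targetDelayMs;
        minDelayMs = Long.MAX_VALUE;
        intervalEndMs = nowMs + intervalMs;
      }
      return overloaded && delayMs > 2 * targetDelayMs;
    }

    boolean isOverloaded() {
      return overloaded;
    }
  }

  /** Feeds a steady stream of calls (one every 10 ms) that each waited delayMs. */
  static Detector run(long targetDelayMs, long intervalMs, long delayMs) {
    Detector d = new Detector(targetDelayMs, intervalMs, 0);
    for (long now = 0; now <= 300; now += 10) {
      d.onDequeue(now, now - delayMs);
    }
    return d;
  }

  public static void main(String[] args) {
    // A standing queue where every call waits 50 ms.
    System.out.println("target=5:   overloaded=" + run(5, 100, 50).isOverloaded());   // true
    System.out.println("target=100: overloaded=" + run(100, 100, 50).isOverloaded()); // false
  }
}
```

With a 50 ms standing delay, the 5 ms target flags overload at the end of the first interval, while the 100 ms target never does, so under that setting nothing pushes back as the queue keeps growing toward its limit.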


// so we can calculate actual threshold to switch to LIFO under load
private int maxCapacity;
private int currentQueueLimit;
Contributor


Because this can be updated by onConfigurationChange, should this be volatile? Just a nit.
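As background for the nit: a field written by one thread (for example a configuration-change callback) and read by others (for example RPC handler threads) needs `volatile`, or some other happens-before edge, for readers to be guaranteed to see the new value. A minimal sketch follows; the `QueueLimits` class and its methods are hypothetical, not HBase code.

```java
/** Hypothetical holder for a limit that is updated at runtime. */
class QueueLimits {
  // Written by the config-change thread, read by handler threads.
  // volatile guarantees that readers observe the latest write.
  private volatile int currentQueueLimit;

  QueueLimits(int initialLimit) {
    this.currentQueueLimit = initialLimit;
  }

  /** Hypothetical hook invoked when the configuration is reloaded. */
  void onConfigurationChange(int newLimit) {
    this.currentQueueLimit = newLimit;
  }

  int currentQueueLimit() {
    return currentQueueLimit;
  }

  public static void main(String[] args) throws InterruptedException {
    QueueLimits limits = new QueueLimits(100);
    Thread reader = new Thread(() -> {
      // Spin until the update becomes visible. Without volatile, the JIT may
      // hoist the read out of the loop and this thread could spin forever.
      while (limits.currentQueueLimit() == 100) {
        Thread.onSpinWait();
      }
      System.out.println("reader saw limit " + limits.currentQueueLimit()); // 250
    });
    reader.start();
    limits.onConfigurationChange(250);
    reader.join();
  }
}
```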

Contributor


This is also confusing in that it seems to shadow a variable of the same name that is declared volatile?

Please don't shadow. Why do we need this separately?

Contributor Author

Umeshkumar9414 Mar 6, 2026


I made currentQueueLimit volatile.

Please don't shadow. Why do we need this separately?

Generally, a queue should only enforce the hard limit, and the layer above it can handle the soft limit; that is how it worked until now. For CoDel, however, the queue needs to know the soft limit as well, and I didn't see a way to use the same variable in both places.
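Why the queue needs the soft limit can be shown with a small calculation. Assume hypothetical numbers: a hard limit of 1000 calls, a soft limit of 250 calls (calls beyond it are rejected before they reach the queue), and a `lifoThreshold` of 0.8. If the LIFO switch compares the queue size against the hard limit, it only engages above 800 queued calls, which the queue never reaches, so LIFO mode can never trigger. Measured against the soft limit, it engages above 200 calls. The class and method names below are hypothetical; the ratio check is a sketch of the size-over-limit comparison the PR title describes.

```java
/** Hypothetical sketch: when should a CoDel queue switch from FIFO to LIFO? */
class LifoSwitchSketch {

  /** True when the queue is filled beyond lifoThreshold of the given limit. */
  static boolean shouldUseLifo(int queueSize, int limit, double lifoThreshold) {
    return (double) queueSize / limit > lifoThreshold;
  }

  /** Smallest queue size (up to maxSize) at which LIFO engages, or -1 if never. */
  static int lifoSwitchPoint(int limit, double lifoThreshold, int maxSize) {
    for (int size = 0; size <= maxSize; size++) {
      if (shouldUseLifo(size, limit, lifoThreshold)) {
        return size;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    int hardLimit = 1000;       // hypothetical hard limit
    int softLimit = 250;        // hypothetical soft limit; larger queues are rejected
    double lifoThreshold = 0.8;

    // The queue never holds more than softLimit calls, so search only up to it.
    System.out.println("vs hard limit: " + lifoSwitchPoint(hardLimit, lifoThreshold, softLimit)); // -1
    System.out.println("vs soft limit: " + lifoSwitchPoint(softLimit, lifoThreshold, softLimit)); // 201
  }
}
```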

Contributor

apurtell Mar 9, 2026


At least let's not shadow the parent class's variable with the same name. That is a code smell. Pick another one.
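To illustrate why shadowing the parent class's field is a code smell: in Java, a subclass field with the same name hides the superclass field rather than replacing it, so code in the parent and code in the child silently read two different variables. A minimal sketch with hypothetical class names:

```java
/** Hypothetical classes demonstrating Java field hiding. */
class ShadowingSketch {

  static class BaseExecutor {
    protected volatile int currentQueueLimit = 100;

    // Code in the parent always reads BaseExecutor.currentQueueLimit.
    int limitSeenByParent() {
      return currentQueueLimit;
    }
  }

  static class CoDelExecutor extends BaseExecutor {
    // Hides (does not override) the parent's field: a second, independent variable.
    private int currentQueueLimit = 100;

    void updateLimit(int newLimit) {
      // Only updates the subclass's copy; the parent's field keeps its old value.
      currentQueueLimit = newLimit;
    }

    int limitSeenByChild() {
      return currentQueueLimit;
    }
  }

  public static void main(String[] args) {
    CoDelExecutor executor = new CoDelExecutor();
    executor.updateLimit(250);
    System.out.println("child sees " + executor.limitSeenByChild());   // 250
    System.out.println("parent sees " + executor.limitSeenByParent()); // 100
  }
}
```

Giving the queue-side field a distinct name removes this ambiguity without changing behavior.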

// Multiple calls would incorrectly use the hard limit as the soft limit.
// As all the queues have the same initArgs and queueClass, there should be no need to call this again.
protected void initializeQueues(final int numQueues) {
  if (!queues.isEmpty()) {
Contributor


Do we still need this guard?

Contributor Author


No, we don't need it now.

Contributor


Why not? Doesn't it change the early-out behavior?

Contributor Author


We added this guard because we used to initialize the soft limit from queueInitArgs[0] and then change queueInitArgs[0] to the hard limit. If initializeQueues were called again, it would set the soft limit from queueInitArgs[0], which by then holds the hard limit.

With this change, we move the calculation of both the soft limit and the hard limit to RpcExecutor.
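The hazard the guard protected against can be reduced to a few lines. This is a hypothetical model of the old flow as described in the reply, not HBase code: without the guard, `initializeQueues` reads the soft limit from `initArgs[0]` and then overwrites `initArgs[0]` with the hard limit, so a second call picks up the hard limit as the soft limit. Computing both limits up front and passing them in separately is one way to remove that order dependence; the exact shape of the HBase change may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the re-initialization hazard; not HBase code. */
class InitArgsSketch {

  static final class QueueSpec {
    final int softLimit;
    final int hardLimit;

    QueueSpec(int softLimit, int hardLimit) {
      this.softLimit = softLimit;
      this.hardLimit = hardLimit;
    }
  }

  // Old style, without the guard: soft limit read from a shared, mutable args array.
  static final class OldExecutor {
    final Object[] initArgs;
    final List<QueueSpec> queues = new ArrayList<>();

    OldExecutor(int softLimit) {
      initArgs = new Object[] { softLimit };
    }

    void initializeQueues(int numQueues, int hardLimit) {
      int soft = (Integer) initArgs[0]; // assumes initArgs[0] still holds the soft limit...
      initArgs[0] = hardLimit;          // ...but then overwrites it with the hard limit
      queues.clear();
      for (int i = 0; i < numQueues; i++) {
        queues.add(new QueueSpec(soft, hardLimit));
      }
    }
  }

  // New style: both limits computed by the caller and passed in explicitly.
  static List<QueueSpec> initializeQueues(int numQueues, int softLimit, int hardLimit) {
    List<QueueSpec> queues = new ArrayList<>();
    for (int i = 0; i < numQueues; i++) {
      queues.add(new QueueSpec(softLimit, hardLimit));
    }
    return queues;
  }

  public static void main(String[] args) {
    OldExecutor old = new OldExecutor(250);
    old.initializeQueues(2, 1000);
    System.out.println("first init soft limit:  " + old.queues.get(0).softLimit); // 250
    old.initializeQueues(2, 1000);
    System.out.println("second init soft limit: " + old.queues.get(0).softLimit); // 1000
    System.out.println("explicit soft limit:    " + initializeQueues(2, 250, 1000).get(0).softLimit); // 250
  }
}
```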

int numCalls = 48;
for (int i = 0; i < numCalls; i++) {
  final int callId = i;
  CallRunner call = createMockTask(HConstants.NORMAL_QOS);
Contributor


Here, and in two other places, createMockTask mocks a ServerCall without stubbing getReceiveTime(), so it defaults to 0. Doesn't this break the call-delay calculation that CoDel does? The delay would be something like callDelay = System.currentTimeMillis() - 0. I guess your tests are passing, but CoDel will immediately detect overload. The LIFO switch counter will still increment, but other aspects are probably wrong.
You should look at this.
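A quick calculation shows the reviewer's point. If the receive time is left at 0, the computed delay equals the current epoch time in milliseconds, which dwarfs any CoDel target, so every call looks overloaded from the start. In the sketch below, the `TimedCall` interface is a hypothetical stand-in for the mocked ServerCall, and the 5 ms target and the "2x target" comparison are illustrative assumptions.

```java
/** Hypothetical sketch of the call-delay computation; not HBase or Mockito code. */
class ReceiveTimeSketch {

  /** Stand-in for the mocked call; only the receive time matters here. */
  interface TimedCall {
    long getReceiveTime();
  }

  static long callDelayMs(TimedCall call, long nowMs) {
    return nowMs - call.getReceiveTime();
  }

  public static void main(String[] args) {
    long targetDelayMs = 5;
    long now = System.currentTimeMillis();

    TimedCall unstubbed = () -> 0L;    // like an unstubbed mock: receive time defaults to 0
    TimedCall stubbed = () -> now - 2; // received 2 ms ago

    long badDelay = callDelayMs(unstubbed, now);
    long goodDelay = callDelayMs(stubbed, now);

    // With receive time 0, the "delay" is the whole epoch offset, far beyond any target.
    System.out.println("unstubbed delay exceeds 2x target: " + (badDelay > 2 * targetDelayMs)); // true
    System.out.println("stubbed delay (ms): " + goodDelay); // 2
  }
}
```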

Contributor Author


Fixed it. Thanks

@Umeshkumar9414
Contributor Author

The Yetus JDK17 Hadoop3 unit check failed on the TestNamespaceReplication test case with a timeout. I checked locally and it passes, and this PR makes no related change, so it looks like a flaky test.

@Umeshkumar9414
Contributor Author

cc @saintstack
