puller(ticdc): cap resolve lock target ts by PD tso #12741
Fetch the current PD TSO before scheduling stale-lock resolution and cap the ScanLock target timestamp by that value. This prevents local clock skew from producing a target timestamp ahead of PD time. Add unit coverage for the cap and freshness fence conditions.

Close pingcap#12740

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
Code Review
This pull request introduces a mechanism to cap the target timestamp for resolving locks by the current PD TSO, so that a local clock running ahead of PD cannot produce a target timestamp beyond the latest TSO. It adds a GetTS method to SharedClient, updates the resolveLock logic in MultiplexingPuller to use this timestamp, and includes corresponding unit tests. The reviewer suggests handling context cancellation gracefully in runResolveLockChecker to avoid spammy warning logs when the context is canceled, and adding namespace and changefeed fields to the warning log for better observability.
```go
physical, logical, err := p.client.GetTS(ctx)
if err != nil {
	log.Warn("get ts from pd failed", zap.Error(err))
	continue
}
```
When the context is canceled (e.g., during changefeed shutdown or pause), p.client.GetTS(ctx) will fail with a context cancellation error. Logging this as a warning can be misleading and spammy. We should check ctx.Err() and return early if the context is canceled.
Additionally, to improve observability and make troubleshooting easier in production environments with multiple changefeeds, we should include the namespace and changefeed fields in the warning log.
```go
physical, logical, err := p.client.GetTS(ctx)
if err != nil {
	if ctx.Err() != nil {
		return ctx.Err()
	}
	log.Warn("get ts from pd failed",
		zap.String("namespace", p.changefeed.Namespace),
		zap.String("changefeed", p.changefeed.ID),
		zap.Error(err))
	continue
}
```
close tikv/tikv#19755
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: asddongmen, hongyunyan
…patibility
- TiDB classic rejects starter-only FULLTEXT indexes, so the index is commented out.
- This ensures the test DDL can be executed on TiDB without errors.

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
/retest-required
In response to a cherrypick label: new pull request created to branch
What problem does this PR solve?
Issue Number: close #12740
TiCDC stale-lock resolving can derive the `ScanLock` target timestamp from resolved-time calculations based on the local clock. If the local CDC clock is ahead of PD time, the target timestamp can exceed the latest PD TSO, which may advance TiKV's local `MaxTS` and cause async-commit residual locks to fail to resolve with `commit_ts_expired`.

This ports the same fix as pingcap/ticdc#5419 to TiFlow.
What is changed and how it works?
This PR changes the TiCDC multiplexing puller resolve-lock checker to fetch the current TSO from PD before scanning stale locks, and to cap the stale-lock target timestamp by that PD `currentTs`.

The freshness fence for `resolvedTsUpdated` still uses the local clock, because `resolvedTsUpdated` is written with `time.Now()`. The PD-derived timestamp is used only as the upper bound for the `ScanLock` target timestamp.

It also adds unit coverage for clock skew, target timestamp capping, and the existing fence conditions.
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No compatibility impact is expected. The resolve-lock checker now performs one PD `GetTS` call per resolve-lock tick before scheduling stale-lock scans.

Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note