puller(ticdc): cap resolve lock target ts by PD tso (#12741)#12743
Conversation
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
There was a problem hiding this comment.
Code Review
This pull request caps the target timestamp for lock resolution using the current PD TSO, preventing potential issues with uncapped timestamps. It also adds corresponding unit tests and a new integration test DDL. The reviewer suggested returning a nil error when the PD client is nil in GetTS to prevent log spamming during unit tests.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func (s *SharedClient) GetTS(ctx context.Context) (int64, int64, error) { | ||
| if s.pd == nil { | ||
| return 0, 0, errors.New("pd client is nil") | ||
| } | ||
| return s.pd.GetTS(ctx) | ||
| } |
There was a problem hiding this comment.
In unit tests (such as TestMultiplexingPullerResolvedForward), s.pd is often nil. Returning an error in this case causes runResolveLockChecker to log a warning (get ts from pd failed) every 2 seconds, which spams the test logs.
Since currentTs being 0 naturally disables lock resolution (as targetTs becomes 0), returning 0, 0, nil when s.pd is nil is a clean way to silence these warnings in tests without affecting production (where s.pd is always initialized).
| func (s *SharedClient) GetTS(ctx context.Context) (int64, int64, error) { | |
| if s.pd == nil { | |
| return 0, 0, errors.New("pd client is nil") | |
| } | |
| return s.pd.GetTS(ctx) | |
| } | |
| func (s *SharedClient) GetTS(ctx context.Context) (int64, int64, error) { | |
| if s.pd == nil { | |
| return 0, 0, nil | |
| } | |
| return s.pd.GetTS(ctx) | |
| } |
Codecov Report❌ Patch coverage is Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. @@ Coverage Diff @@
## release-8.5 #12743 +/- ##
================================================
Coverage ? 53.5572%
================================================
Files ? 1006
Lines ? 138224
Branches ? 0
================================================
Hits ? 74029
Misses ? 58655
Partials ? 5540 🚀 New features to boost your workflow:
|
|
@tenfyzhong: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test pull-cdc-integration-mysql-test |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lidezhu, tenfyzhong The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required |
|
/test pull-cdc-integration-mysql-test |
This is an automated cherry-pick of #12741
What problem does this PR solve?
Issue Number: close #12740
TiCDC stale-lock resolving can derive the
ScanLocktarget timestamp from local clock based resolved-time calculations. If the local CDC clock is ahead of PD time, the target timestamp can exceed the latest PD TSO, which may advance TiKV localMaxTSand make async-commit residual locks fail to resolve withcommit_ts_expired.This ports the same fix as pingcap/ticdc#5419 to TiFlow.
What is changed and how it works?
This PR changes the TiCDC multiplexing puller resolve-lock checker to fetch the current TSO from PD before scanning stale locks and cap the stale-lock target timestamp by that PD
currentTs.The freshness fence for
resolvedTsUpdatedstill uses the local clock becauseresolvedTsUpdatedis written withtime.Now(). The PD-derived timestamp is used only as the upper bound for theScanLocktarget timestamp.It also adds unit coverage for clock skew, target timestamp capping, and the existing fence conditions.
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No compatibility impact is expected. The resolve-lock checker now performs one PD
GetTScall per resolve-lock tick before scheduling stale-lock scans.Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note