puller(ticdc): cap resolve lock target ts by PD tso (#12741) #12743

Merged
ti-chi-bot[bot] merged 1 commit into pingcap:release-8.5 from ti-chi-bot:cherry-pick-12741-to-release-8.5
Jun 29, 2026

@ti-chi-bot

Member

This is an automated cherry-pick of #12741

What problem does this PR solve?

Issue Number: close #12740

TiCDC stale-lock resolving can derive the ScanLock target timestamp from resolved-time calculations that are based on the local clock. If the local CDC clock is ahead of PD time, the target timestamp can exceed the latest PD TSO. That can advance TiKV's local MaxTS and cause residual async-commit locks to fail to resolve with commit_ts_expired.
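
To make the failure mode concrete, here is a minimal sketch of how a fast local clock yields a target timestamp above the PD TSO. It assumes the usual TSO layout (physical milliseconds shifted left by 18 bits, plus a logical counter). The helper names, the 30-second skew, and the 10-second stale age are hypothetical values chosen for illustration, not taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// logicalBits is the width of the logical counter in a TSO.
const logicalBits = 18

// composeTS builds a TSO from physical milliseconds and a logical counter.
// Hypothetical helper for this sketch.
func composeTS(physicalMs, logical int64) uint64 {
	return uint64(physicalMs<<logicalBits + logical)
}

// extractPhysical returns the physical (millisecond) part of a TSO.
func extractPhysical(ts uint64) int64 {
	return int64(ts >> logicalBits)
}

func main() {
	pdNow := time.Date(2026, time.June, 25, 12, 0, 0, 0, time.UTC)
	cdcNow := pdNow.Add(30 * time.Second) // assumption: CDC clock runs 30s fast
	staleAge := 10 * time.Second          // assumption: locks older than this are stale

	pdTs := composeTS(pdNow.UnixMilli(), 0)
	// Target derived purely from the (skewed) local clock.
	localTarget := composeTS(cdcNow.Add(-staleAge).UnixMilli(), 0)

	fmt.Println("target exceeds PD TSO:", localTarget > pdTs)
	fmt.Println("excess (ms):", extractPhysical(localTarget)-extractPhysical(pdTs))
}
```

With these assumed values, the locally derived target lands 20 seconds ahead of the PD TSO, which is the situation the PR guards against.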

This ports the same fix as pingcap/ticdc#5419 to TiFlow.

What is changed and how it works?

This PR changes the TiCDC multiplexing puller resolve-lock checker to fetch the current TSO from PD before scanning stale locks and cap the stale-lock target timestamp by that PD currentTs.

The freshness fence for resolvedTsUpdated still uses the local clock because resolvedTsUpdated is written with time.Now(). The PD-derived timestamp is used only as the upper bound for the ScanLock target timestamp.

It also adds unit coverage for clock skew, target timestamp capping, and the existing fence conditions.

Check List

Tests

  • Unit test
      go test ./cdc/puller -run TestGetResolveLockTargetTs -count=1
      go test -tags=intest ./cdc/puller -count=1
      go test ./cdc/kv -run TestResolveLock -count=1

Questions

Will it cause performance regression or break compatibility?

No compatibility impact is expected. The resolve-lock checker now performs one PD GetTS call per resolve-lock tick before scheduling stale-lock scans.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix a TiCDC stale-lock resolving issue where the ScanLock target timestamp could advance TiKV local MaxTS beyond the latest PD TSO and make async-commit residual locks fail to resolve.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added the lgtm, release-note, size/M, and type/cherry-pick-for-release-8.5 labels Jun 25, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request caps the target timestamp for lock resolution using the current PD TSO, preventing potential issues with uncapped timestamps. It also adds corresponding unit tests and a new integration test DDL. The reviewer suggested returning a nil error when the PD client is nil in GetTS to prevent log spamming during unit tests.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread cdc/kv/shared_client.go
Comment on lines +280 to +285
    func (s *SharedClient) GetTS(ctx context.Context) (int64, int64, error) {
        if s.pd == nil {
            return 0, 0, errors.New("pd client is nil")
        }
        return s.pd.GetTS(ctx)
    }


medium

In unit tests (such as TestMultiplexingPullerResolvedForward), s.pd is often nil. Returning an error in this case causes runResolveLockChecker to log a warning (get ts from pd failed) every 2 seconds, which spams the test logs.

Since currentTs being 0 naturally disables lock resolution (as targetTs becomes 0), returning 0, 0, nil when s.pd is nil is a clean way to silence these warnings in tests without affecting production (where s.pd is always initialized).

Suggested change
    - func (s *SharedClient) GetTS(ctx context.Context) (int64, int64, error) {
    -     if s.pd == nil {
    -         return 0, 0, errors.New("pd client is nil")
    -     }
    -     return s.pd.GetTS(ctx)
    - }
    + func (s *SharedClient) GetTS(ctx context.Context) (int64, int64, error) {
    +     if s.pd == nil {
    +         return 0, 0, nil
    +     }
    +     return s.pd.GetTS(ctx)
    + }

@codecov

codecov Bot commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 26.31579% with 14 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-8.5@12921db). Learn more about missing BASE report.

Additional details and impacted files
| Components | Coverage | Δ |
|---|---|---|
| cdc | 57.5472% <0.0000%> | (?) |
| dm | 49.4553% <0.0000%> | (?) |
| engine | 50.6547% <0.0000%> | (?) |

| Flag | Coverage | Δ |
|---|---|---|
| cdc | 57.5472% <26.3157%> | (?) |
| unit | 53.5572% <26.3157%> | (?) |


@@               Coverage Diff                @@
##             release-8.5     #12743   +/-   ##
================================================
  Coverage               ?   53.5572%           
================================================
  Files                  ?       1006           
  Lines                  ?     138224           
  Branches               ?          0           
================================================
  Hits                   ?      74029           
  Misses                 ?      58655           
  Partials               ?       5540           

@ti-chi-bot

ti-chi-bot Bot commented Jun 25, 2026

Contributor

@tenfyzhong: adding LGTM is restricted to approvers and reviewers in OWNERS files.


@tenfyzhong

Contributor

/test pull-cdc-integration-mysql-test

@ti-chi-bot

ti-chi-bot Bot commented Jun 25, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lidezhu, tenfyzhong


@ti-chi-bot ti-chi-bot Bot added the approved label Jun 25, 2026
@tenfyzhong

Contributor

/retest-required

@tenfyzhong

Contributor

/test pull-cdc-integration-mysql-test

@ti-chi-bot ti-chi-bot Bot added the cherry-pick-approved label and removed the do-not-merge/cherry-pick-not-approved label Jun 29, 2026
@ti-chi-bot ti-chi-bot Bot merged commit 21bdf3f into pingcap:release-8.5 Jun 29, 2026
27 checks passed
@ti-chi-bot ti-chi-bot Bot deleted the cherry-pick-12741-to-release-8.5 branch June 29, 2026 13:46

Labels

  • approved
  • cherry-pick-approved: Cherry pick PR approved by release team.
  • lgtm
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • type/cherry-pick-for-release-8.5: This PR is cherry-picked to release-8.5 from a source PR.


3 participants