puller(ticdc): cap resolve lock target ts by PD tso (#12741) by ti-chi-bot · Pull Request #12742 · pingcap/tiflow

ti-chi-bot · 2026-06-25T01:56:17Z

This is an automated cherry-pick of #12741

What problem does this PR solve?

Issue Number: close #12740

TiCDC stale-lock resolving can derive the ScanLock target timestamp from local clock based resolved-time calculations. If the local CDC clock is ahead of PD time, the target timestamp can exceed the latest PD TSO, which may advance TiKV local MaxTS and make async-commit residual locks fail to resolve with commit_ts_expired.

This ports the same fix as pingcap/ticdc#5419 to TiFlow.

What is changed and how it works?

This PR changes the TiCDC multiplexing puller resolve-lock checker to fetch the current TSO from PD before scanning stale locks and cap the stale-lock target timestamp by that PD currentTs.

The freshness fence for resolvedTsUpdated still uses the local clock because resolvedTsUpdated is written with time.Now(). The PD-derived timestamp is used only as the upper bound for the ScanLock target timestamp.

It also adds unit coverage for clock skew, target timestamp capping, and the existing fence conditions.

Check List

Tests

Unit test

go test ./cdc/puller -run TestGetResolveLockTargetTs -count=1
go test -tags=intest ./cdc/puller -count=1
go test ./cdc/kv -run TestResolveLock -count=1

Questions

Will it cause performance regression or break compatibility?

No compatibility impact is expected. The resolve-lock checker now performs one PD GetTS call per resolve-lock tick before scheduling stale-lock scans.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix a TiCDC stale-lock resolving issue where the ScanLock target timestamp could advance TiKV local MaxTS beyond the latest PD TSO and make async-commit residual locks fail to resolve.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot · 2026-06-25T01:56:20Z

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

It must be LGTMed and approved by the reviewers firstly.
For pull requests to TiDB-x branches, it must have no failed tests.
AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-06-25T01:56:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 5kbpers for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-06-25T01:56:22Z

@tenfyzhong This PR has conflicts, I have hold it.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

ti-chi-bot · 2026-06-25T01:56:24Z

@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

gemini-code-assist

Code Review

This pull request aims to cap the resolve lock target timestamp by the PD TSO in the multiplexing puller. However, the changes introduce unresolved git conflict markers, duplicate/broken definitions, and compilation errors (such as referencing the non-existent p.pdClock and p.subscriptionIDs). Additionally, the feedback highlights a correctness and testability issue in getResolveLockTargetTs where the real system clock is used instead of the passed-in currentTime parameter, which should also be addressed in the corresponding unit tests to ensure determinism.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-25T01:57:42Z

+<<<<<<< HEAD
 type tableProgressWithSubID struct {
+=======
+func (p *tableProgress) handleResolvedSpans(ctx context.Context, e *model.ResolvedSpans) (err error) {
+	for _, resolvedSpan := range e.Spans {
+		if !spanz.IsSubSpan(resolvedSpan.Span, p.spans...) {
+			log.Panic("the resolved span is not in the table spans",
+				zap.String("namespace", p.changefeed.Namespace),
+				zap.String("changefeed", p.changefeed.ID),
+				zap.String("tableName", p.tableName),
+				zap.Any("spans", p.spans))
+		}
+		p.tsTracker.Forward(resolvedSpan.Region, resolvedSpan.Span, e.ResolvedTs)
+		if e.ResolvedTs > p.maxIngressResolvedTs.Load() {
+			p.maxIngressResolvedTs.Store(e.ResolvedTs)
+		}
+	}
+	resolvedTs := p.tsTracker.Frontier()
+
+	if resolvedTs > 0 && p.initialized.CompareAndSwap(false, true) {
+		log.Info("puller is initialized",
+			zap.String("namespace", p.changefeed.Namespace),
+			zap.String("changefeed", p.changefeed.ID),
+			zap.String("tableName", p.tableName),
+			zap.Any("tableID", p.spans),
+			zap.Uint64("resolvedTs", resolvedTs),
+			zap.Duration("duration", time.Since(p.start)),
+		)
+	}
+	if resolvedTs > p.resolvedTs.Load() {
+		p.resolvedTs.Store(resolvedTs)
+		p.resolvedTsUpdated.Store(time.Now().Unix())
+		raw := &model.RawKVEntry{CRTs: resolvedTs, OpType: model.OpTypeResolved}
+		err = p.consume.f(ctx, raw, p.spans)
+	}
+
+	return
+}
+
+func (p *tableProgress) getResolveLockTargetTs(currentTime time.Time, currentTs uint64) uint64 {
+	resolvedTsUpdated := time.Unix(p.resolvedTsUpdated.Load(), 0)
+	if !p.initialized.Load() || time.Since(resolvedTsUpdated) < resolveLockFence {
+		return 0
+	}
+	resolvedTs := p.resolvedTs.Load()
+	resolvedTime := oracle.GetTimeFromTS(resolvedTs)
+	if currentTime.Sub(resolvedTime) < resolveLockFence {
+		return 0
+	}
+
+	return min(currentTs, oracle.GoTimeToTS(resolvedTime.Add(resolveLockFence)))
+}
+
+func (p *tableProgress) resolveLock(currentTime time.Time, currentTs uint64) {
+	targetTs := p.getResolveLockTargetTs(currentTime, currentTs)
+	if targetTs == 0 {
+		return
+	}
+
+	for _, subID := range p.subscriptionIDs {
+		p.client.ResolveLock(subID, targetTs)
+	}
+}
+
+type subscription struct {
+>>>>>>> 42ad8a6dc4 (puller(ticdc): cap resolve lock target ts by PD tso (#12741))


There are unresolved git conflict markers in this file, and the cherry-pick has introduced duplicate/broken definitions of handleResolvedSpans and resolveLock.

Additionally, there is a correctness/testability bug in getResolveLockTargetTs: it uses time.Since(resolvedTsUpdated) which depends on the real system clock, instead of using the passed-in currentTime parameter (i.e., currentTime.Sub(resolvedTsUpdated)). This makes the function non-deterministic and can cause test flakiness or incorrect behavior under clock skew.

Please resolve the conflict markers and apply the following fixes:

Keep tableProgressWithSubID and define getResolveLockTargetTs using currentTime.Sub(resolvedTsUpdated).

Define the updated resolveLock taking currentTs and using p.subIDs (instead of the undefined p.subscriptionIDs).

Note: Since Go does not support method overloading, you must also delete the old, duplicate definitions of handleResolvedSpans (at line 486) and resolveLock (at line 519) that were left outside of the conflict markers during the cherry-pick.

type tableProgressWithSubID struct { *tableProgress subID kv.SubscriptionID } func (p *tableProgress) getResolveLockTargetTs(currentTime time.Time, currentTs uint64) uint64 { resolvedTsUpdated := time.Unix(p.resolvedTsUpdated.Load(), 0) if !p.initialized.Load() || currentTime.Sub(resolvedTsUpdated) < resolveLockFence { return 0 } resolvedTs := p.resolvedTs.Load() resolvedTime := oracle.GetTimeFromTS(resolvedTs) if currentTime.Sub(resolvedTime) < resolveLockFence { return 0 } return min(currentTs, oracle.GoTimeToTS(resolvedTime.Add(resolveLockFence))) } func (p *tableProgress) resolveLock(currentTime time.Time, currentTs uint64) { targetTs := p.getResolveLockTargetTs(currentTime, currentTs) if targetTs == 0 { return } for _, subID := range p.subIDs { p.client.ResolveLock(subID, targetTs) } }

gemini-code-assist · 2026-06-25T01:57:42Z

+<<<<<<< HEAD
 		currentTime := p.client.GetPDClock().CurrentTime()
+=======
+		physical, logical, err := p.client.GetTS(ctx)
+		if err != nil {
+			log.Warn("get ts from pd failed", zap.Error(err))
+			continue
+		}
+		currentTs := oracle.ComposeTS(physical, logical)
+		currentTime := p.pdClock.CurrentTime()
+>>>>>>> 42ad8a6dc4 (puller(ticdc): cap resolve lock target ts by PD tso (#12741))


There are unresolved git conflict markers here. Additionally, p.pdClock is used, but MultiplexingPuller does not have a pdClock field. It should use p.client.GetPDClock() instead.

physical, logical, err := p.client.GetTS(ctx) if err != nil { log.Warn("get ts from pd failed", zap.Error(err)) continue } currentTs := oracle.ComposeTS(physical, logical) currentTime := p.client.GetPDClock().CurrentTime()

gemini-code-assist · 2026-06-25T01:57:42Z

+	progress.initialized.Store(true)
+	progress.resolvedTsUpdated.Store(time.Now().Unix())
+	require.Zero(t, progress.getResolveLockTargetTs(localClockNow, currentTs))


Since getResolveLockTargetTs is updated to use currentTime.Sub(resolvedTsUpdated) instead of the system clock-dependent time.Since(resolvedTsUpdated), we should update the test to set resolvedTsUpdated relative to localClockNow (the mocked current time) instead of time.Now().Unix(). This ensures the test remains deterministic and correctly tests the fence condition.

Suggested change

progress.initialized.Store(true)

progress.resolvedTsUpdated.Store(time.Now().Unix())

require.Zero(t, progress.getResolveLockTargetTs(localClockNow, currentTs))

progress.initialized.Store(true)

progress.resolvedTsUpdated.Store(localClockNow.Unix())

require.Zero(t, progress.getResolveLockTargetTs(localClockNow, currentTs))

ti-chi-bot · 2026-06-25T02:03:59Z

@ti-chi-bot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-verify	`fed69e7`	link	true	`/test pull-verify`
pull-cdc-integration-mysql-test	`fed69e7`	link	true	`/test pull-cdc-integration-mysql-test`
pull-cdc-integration-storage-test	`fed69e7`	link	true	`/test pull-cdc-integration-storage-test`
pull-cdc-integration-pulsar-test	`fed69e7`	link	true	`/test pull-cdc-integration-pulsar-test`
pull-cdc-integration-kafka-test	`fed69e7`	link	true	`/test pull-cdc-integration-kafka-test`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

This is an automated cherry-pick of pingcap#12741

fed69e7

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot mentioned this pull request Jun 25, 2026

puller(ticdc): cap resolve lock target ts by PD tso #12741

Merged

ti-chi-bot Bot added the do-not-merge/cherry-pick-not-approved label Jun 25, 2026

ti-chi-bot assigned tenfyzhong Jun 25, 2026

ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 25, 2026

gemini-code-assist Bot reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

puller(ticdc): cap resolve lock target ts by PD tso (#12741)#12742

puller(ticdc): cap resolve lock target ts by PD tso (#12741)#12742
ti-chi-bot wants to merge 1 commit into
pingcap:release-7.5from
ti-chi-bot:cherry-pick-12741-to-release-7.5

ti-chi-bot commented Jun 25, 2026

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

ti-chi-bot commented Jun 25, 2026

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ti-chi-bot commented Jun 25, 2026

What problem does this PR solve?

What is changed and how it works?

Check List

Tests

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

ti-chi-bot commented Jun 25, 2026

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot Bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants