Skip to content

sync_diff_inspector: avoid full-table scan for limit iterator progress#12727

Draft
joechenrh wants to merge 1 commit into
pingcap:masterfrom
joechenrh:sync-diff-limit-no-full-scan
Draft

sync_diff_inspector: avoid full-table scan for limit iterator progress#12727
joechenrh wants to merge 1 commit into
pingcap:masterfrom
joechenrh:sync-diff-limit-no-full-scan

Conversation

@joechenrh

@joechenrh joechenrh commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

The limit splitter runs SELECT COUNT(*) on every table just to compute a chunk total for the progress bar, and it does so even when chunk-size is configured. On large tables this scan is very slow. The limit iterator is designed to produce chunks lazily without knowing the total upfront, so the scan defeats its purpose.

That count was only used to choose a chunk size when none is configured, and to compute Len() for the progress bar denominator. Normal diff mode already drives the progress bar dynamically and never uses Len(); only global-checksum mode consumes it.

What is changed and how it works?

The limit splitter no longer scans the table. It reads an estimated row count from information_schema.tables.TABLE_ROWS and uses it only to seed the progress bar. As chunks are produced the denominator is kept at max(estimate, produced) and corrected to the exact count on completion, so progress never exceeds 100%. When no estimate is available (table not analyzed, query failure, or checkpoint resume with a WHERE clause) it falls back to the existing dynamic total, and estimated figures are shown with a ~ marker.

This also fixes a latent bug in the progress bar: an underestimated total could make the coefficient exceed 1, passing a negative count to strings.Repeat (panic) and printing above 100%. The coefficient is now clamped to [0, 1].

Global-checksum mode uses the same estimate-and-self-correct logic. The random and bucket iterators are unchanged: random needs an exact count to split, and bucket already relies on statistics instead of a scan.

Check List

Tests

  • Unit test

Added/updated tests cover: the limit iterator uses the estimate and issues no COUNT(*); an unavailable estimate falls back to a dynamic total; the progress bar clamps an overrun and renders the ~ marker.

Questions

Will it cause performance regression or break compatibility?

No regression; it removes a full-table scan. No compatibility change.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

sync-diff-inspector: the limit splitter no longer runs a full-table COUNT(*) scan to drive the progress bar, speeding up comparisons on large tables.

The limit iterator ran SELECT COUNT(*) on every table to derive a chunk
total for the progress bar, scanning the whole table even when chunk-size
was configured. The limit iterator is meant to produce chunks lazily, so
this upfront scan defeated its purpose and was very slow on large tables.

Replace the exact COUNT(*) with a cheap estimated row count from table
statistics (information_schema.tables.TABLE_ROWS) used only to seed the
progress bar. The producer keeps the bar denominator ahead of the real
chunk count and corrects it to the exact value on completion, so progress
never overflows. When no estimate is available (unanalyzed table, query
error, or checkpoint resume with a WHERE clause) it falls back to a purely
dynamic total. Estimated figures are rendered with a ~ marker.

Also clamp the progress bar coefficient so an underestimate can never push
strings.Repeat to a negative count (a latent panic) or show above 100%,
and apply the same estimate-seed + self-correct logic to global-checksum
mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ti-chi-bot

ti-chi-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 17, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wangxiangustc for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 17, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the progress bar initialization for the limit iterator by retrieving estimated row counts from table statistics (information_schema.tables) instead of performing expensive COUNT(*) scans. It introduces dynamic adjustments to the progress bar total as chunks are produced, clamps the progress percentage to prevent panics from negative counts, and adds an estimation marker (~) to the output. The review feedback suggests refining the estimation marker logic so it only applies to active tables, preventing premature estimation indicators, and removing a redundant time.Sleep in the progress test since p.Close() already blocks synchronously.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +454 to +459
for _, e := range tpp.tableMap {
if !e.Value.(*TableProgress).totalStopUpdate {
mark = "~"
break
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Currently, this checks totalStopUpdate for all tables in tpp.tableMap. However, tables that are registered but not yet started also reside in tpp.tableMap and have totalStopUpdate set to false by default. This causes the progress bar to display the estimation marker ~ even when using exact splitters (like bucket or random) until the very last table starts. We should only check totalStopUpdate for tables that are actually active (i.e., in tableStatePrestart or tableStateComparing state).

Suggested change
for _, e := range tpp.tableMap {
if !e.Value.(*TableProgress).totalStopUpdate {
mark = "~"
break
}
}
for _, e := range tpp.tableMap {
tp := e.Value.(*TableProgress)
if (tp.state&(tableStatePrestart|tableStateComparing)) != 0 && !tp.totalStopUpdate {
mark = "~"
break
}
}

for i := 0; i < 10; i++ {
p.Inc("t")
}
time.Sleep(300 * time.Millisecond)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The time.Sleep here is redundant and can be removed. p.Close() internally sends a close operator to the queue and synchronously blocks on <-tpp.finishCh until all pending progress updates (including the 10 Inc calls) are fully processed and flushed to the output buffer. Removing this sleep avoids unnecessary test delays and potential flakiness.

@ti-chi-bot

ti-chi-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant