sync_diff_inspector: avoid full-table scan for limit iterator progress#12727
sync_diff_inspector: avoid full-table scan for limit iterator progress#12727joechenrh wants to merge 1 commit into
Conversation
The limit iterator ran SELECT COUNT(*) on every table to derive a chunk total for the progress bar, scanning the whole table even when chunk-size was configured. The limit iterator is meant to produce chunks lazily, so this upfront scan defeated its purpose and was very slow on large tables. Replace the exact COUNT(*) with a cheap estimated row count from table statistics (information_schema.tables.TABLE_ROWS) used only to seed the progress bar. The producer keeps the bar denominator ahead of the real chunk count and corrects it to the exact value on completion, so progress never overflows. When no estimate is available (unanalyzed table, query error, or checkpoint resume with a WHERE clause) it falls back to a purely dynamic total. Estimated figures are rendered with a ~ marker. Also clamp the progress bar coefficient so an underestimate can never push strings.Repeat to a negative count (a latent panic) or show above 100%, and apply the same estimate-seed + self-correct logic to global-checksum mode. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request optimizes the progress bar initialization for the limit iterator by retrieving estimated row counts from table statistics (information_schema.tables) instead of performing expensive COUNT(*) scans. It introduces dynamic adjustments to the progress bar total as chunks are produced, clamps the progress percentage to prevent panics from negative counts, and adds an estimation marker (~) to the output. The review feedback suggests refining the estimation marker logic so it only applies to active tables, preventing premature estimation indicators, and removing a redundant time.Sleep in the progress test since p.Close() already blocks synchronously.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| for _, e := range tpp.tableMap { | ||
| if !e.Value.(*TableProgress).totalStopUpdate { | ||
| mark = "~" | ||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
Currently, this checks totalStopUpdate for all tables in tpp.tableMap. However, tables that are registered but not yet started also reside in tpp.tableMap and have totalStopUpdate set to false by default. This causes the progress bar to display the estimation marker ~ even when using exact splitters (like bucket or random) until the very last table starts. We should only check totalStopUpdate for tables that are actually active (i.e., in tableStatePrestart or tableStateComparing state).
| for _, e := range tpp.tableMap { | |
| if !e.Value.(*TableProgress).totalStopUpdate { | |
| mark = "~" | |
| break | |
| } | |
| } | |
| for _, e := range tpp.tableMap { | |
| tp := e.Value.(*TableProgress) | |
| if (tp.state&(tableStatePrestart|tableStateComparing)) != 0 && !tp.totalStopUpdate { | |
| mark = "~" | |
| break | |
| } | |
| } |
| for i := 0; i < 10; i++ { | ||
| p.Inc("t") | ||
| } | ||
| time.Sleep(300 * time.Millisecond) |
There was a problem hiding this comment.
The time.Sleep here is redundant and can be removed. p.Close() internally sends a close operator to the queue and synchronously blocks on <-tpp.finishCh until all pending progress updates (including the 10 Inc calls) are fully processed and flushed to the output buffer. Removing this sleep avoids unnecessary test delays and potential flakiness.
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
What problem does this PR solve?
Issue Number: close #xxx
The limit splitter runs
SELECT COUNT(*)on every table just to compute a chunk total for the progress bar, and it does so even whenchunk-sizeis configured. On large tables this scan is very slow. The limit iterator is designed to produce chunks lazily without knowing the total upfront, so the scan defeats its purpose.That count was only used to choose a chunk size when none is configured, and to compute
Len()for the progress bar denominator. Normal diff mode already drives the progress bar dynamically and never usesLen(); only global-checksum mode consumes it.What is changed and how it works?
The limit splitter no longer scans the table. It reads an estimated row count from
information_schema.tables.TABLE_ROWSand uses it only to seed the progress bar. As chunks are produced the denominator is kept atmax(estimate, produced)and corrected to the exact count on completion, so progress never exceeds 100%. When no estimate is available (table not analyzed, query failure, or checkpoint resume with a WHERE clause) it falls back to the existing dynamic total, and estimated figures are shown with a~marker.This also fixes a latent bug in the progress bar: an underestimated total could make the coefficient exceed 1, passing a negative count to
strings.Repeat(panic) and printing above 100%. The coefficient is now clamped to[0, 1].Global-checksum mode uses the same estimate-and-self-correct logic. The random and bucket iterators are unchanged: random needs an exact count to split, and bucket already relies on statistics instead of a scan.
Check List
Tests
Added/updated tests cover: the limit iterator uses the estimate and issues no
COUNT(*); an unavailable estimate falls back to a dynamic total; the progress bar clamps an overrun and renders the~marker.Questions
Will it cause performance regression or break compatibility?
No regression; it removes a full-table scan. No compatibility change.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note