Skip to content

syncer(dm): centralize row image layout mapping#12750

Draft
joechenrh wants to merge 1 commit into
pingcap:masterfrom
joechenrh:draft-row-image-layout-stage1
Draft

syncer(dm): centralize row image layout mapping#12750
joechenrh wants to merge 1 commit into
pingcap:masterfrom
joechenrh:draft-row-image-layout-stage1

Conversation

@joechenrh

@joechenrh joechenrh commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

This is a follow-up draft for centralizing row image layout handling after the hidden/generated column fixes. The current code has similar source-column/visible-row/writable-column mapping logic in WhereHandle, RowChange, expression filters, and validator code, which makes hidden-column layout handling easy to drift.

Issue Number: ref #12747

What is changed and how it works?

This PR adds sqlmodel.RowImageLayout as the stage-1 shared representation for row image layout metadata:

  • source table columns
  • visible columns present in binlog row images
  • writable downstream columns
  • source column offset to row value offset mapping
  • expansion from visible-only row image to full source-column layout

WhereHandle and RowChange.dmlRowMapping() now build their DML/WHERE value mapping from this layout. DM expression filter and validator paths also reuse the same layout helpers instead of maintaining local hidden-column filtering/mapping logic.

Check List

Tests

  • Unit test

Verified locally:

go test ./pkg/sqlmodel
go test ./dm/syncer -run 'TestTableInfoForVisibleColumnCount|TestValidatorWorkerUsesVisibleColumns|TestSkipDMLByExpressionWithHiddenColumnBeforeVisibleColumn'
git diff --check

Questions

Will it cause performance regression or break compatibility?

No expected compatibility impact. This is intended as an internal refactor. Performance impact should be neutral: the new layout replaces existing mapper/helper construction and removes duplicated mapping logic.

Do you need to update user documentation, design documentation or monitoring documentation?

No. This is an internal code structure change.

Release note

None

@ti-chi-bot

ti-chi-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed area/dm Issues or PRs related to DM. labels Jul 1, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gmhdbjd for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jul 1, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request centralizes the mapping of binlog row images to source table columns by introducing the RowImageLayout struct in pkg/sqlmodel/row_image_layout.go, replacing duplicated helper functions and the rowValueMapper struct. The review feedback highlights several critical safety improvements to prevent potential panics. Specifically, it points out potential out-of-bounds slice indexing in NewRowImageLayoutFromColumns, FullValues, and valueOffset when handling sliced columns with original offsets, as well as a missing nil-pointer check in tableInfoForVisibleColumnCount.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +32 to +51
func NewRowImageLayoutFromColumns(sourceColumns, targetColumns []*model.ColumnInfo) RowImageLayout {
visibleColumns := VisibleColumns(sourceColumns)
visibleOffsetByColumnOffset := make([]int, len(sourceColumns))
for i := range visibleOffsetByColumnOffset {
visibleOffsetByColumnOffset[i] = -1
}
for i, column := range visibleColumns {
visibleOffsetByColumnOffset[column.Offset] = i
}

layout := RowImageLayout{
columns: sourceColumns,
visibleColumns: visibleColumns,
visibleOffsetByColumnOffset: visibleOffsetByColumnOffset,
}
if targetColumns != nil {
layout.writableColumns = writableSourceColumns(visibleColumns, targetColumns)
}
return layout
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In NewRowImageLayoutFromColumns, visibleOffsetByColumnOffset is sized to len(sourceColumns). However, if sourceColumns is a sliced columns list (e.g., clone.Columns[:sourceColumnCount]), len(sourceColumns) will be smaller than the original column count, but the individual column.Offset values will still retain their original values. This can cause visibleOffsetByColumnOffset[column.Offset] = i to panic with an out-of-bounds index. Sizing the slice based on the maximum offset found in sourceColumns avoids this panic.

func NewRowImageLayoutFromColumns(sourceColumns, targetColumns []*model.ColumnInfo) RowImageLayout {\n\tvisibleColumns := VisibleColumns(sourceColumns)\n\tmaxOffset := -1\n\tfor _, column := range sourceColumns {\n\t\tif column.Offset > maxOffset {\n\t\t	maxOffset = column.Offset\n\t\t}\n\t}\n\tvisibleOffsetByColumnOffset := make([]int, maxOffset+1)\n\tfor i := range visibleOffsetByColumnOffset {\n\t\tvisibleOffsetByColumnOffset[i] = -1\n\t}\n\tfor i, column := range visibleColumns {\n\t\tif column.Offset >= 0 && column.Offset <= maxOffset {\n\t\t	visibleOffsetByColumnOffset[column.Offset] = i\n\t\t}\n\t}\n\n\tlayout := RowImageLayout{\n\t\tcolumns:                     sourceColumns,\n\t\tvisibleColumns:              visibleColumns,\n\t\tvisibleOffsetByColumnOffset: visibleOffsetByColumnOffset,\n\t}\n\tif targetColumns != nil {\n\t\tlayout.writableColumns = writableSourceColumns(visibleColumns, targetColumns)\n\t}\n\treturn layout\n}

Comment on lines 860 to 873
func tableInfoForVisibleColumnCount(tableInfo *model.TableInfo, columnCount int) (*model.TableInfo, bool) {
visibleCount := 0
stripColumnCount := len(tableInfo.Columns)
for i, col := range tableInfo.Columns {
if col.Hidden {
continue
}
visibleCount++
if visibleCount > columnCount {
stripColumnCount = i
break
}
}
if visibleCount < columnCount {
layout := sqlmodel.NewRowImageLayoutFromColumns(tableInfo.Columns, nil)
sourceColumnCount, ok := layout.SourceColumnCountForVisibleColumnCount(columnCount)
if !ok {
return nil, false
}
if visibleCount == columnCount {
if sourceColumnCount == len(tableInfo.Columns) {
return tableInfo, true
}

clone := tableInfo.Clone()
clone.Columns = clone.Columns[:stripColumnCount]
clone.Columns = clone.Columns[:sourceColumnCount]
return clone, true
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function tableInfoForVisibleColumnCount does not check if the input tableInfo is nil. If a nil pointer is passed, calling tableInfo.Columns will cause a nil pointer dereference panic. Adding a defensive nil check at the beginning of the function will make it more robust.

func tableInfoForVisibleColumnCount(tableInfo *model.TableInfo, columnCount int) (*model.TableInfo, bool) {\n\tif tableInfo == nil {\n\t	return nil, false\n\t}\n\tlayout := sqlmodel.NewRowImageLayoutFromColumns(tableInfo.Columns, nil)\n\tsourceColumnCount, ok := layout.SourceColumnCountForVisibleColumnCount(columnCount)\n\tif !ok {\n\t	return nil, false\n\t}\n\tif sourceColumnCount == len(tableInfo.Columns) {\n\t	return tableInfo, true\n\t}\n\n\tclone := tableInfo.Clone()\n\tclone.Columns = clone.Columns[:sourceColumnCount]\n\treturn clone, true\n}

Comment on lines +91 to +104
func (l RowImageLayout) FullValues(row []any) ([]any, bool) {
if len(row) == len(l.columns) {
return row, true
}
if len(row) != len(l.visibleColumns) {
return nil, false
}

fullValues := make([]any, len(l.columns))
for i, col := range l.visibleColumns {
fullValues[col.Offset] = row[i]
}
return fullValues, true
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In FullValues, fullValues is sized to len(l.columns). If col.Offset is greater than or equal to len(l.columns) (which can happen if the columns slice is sliced or has non-contiguous offsets), assigning fullValues[col.Offset] = row[i] will panic with an out-of-bounds index. Adding a defensive bounds check prevents this panic.

func (l RowImageLayout) FullValues(row []any) ([]any, bool) {\n\tif len(row) == len(l.columns) {\n\t	return row, true\n\t}\n\tif len(row) != len(l.visibleColumns) {\n\t	return nil, false\n\t}\n\n\tfullValues := make([]any, len(l.columns))\n\tfor i, col := range l.visibleColumns {\n\t\tif col.Offset >= 0 && col.Offset < len(fullValues) {\n\t\t	fullValues[col.Offset] = row[i]\n\t\t}\n\t}\n\treturn fullValues, true\n}

Comment on lines +164 to +169
func (l RowImageLayout) valueOffset(columnOffset int, values []any) int {
if l.isFullValues(values) {
return columnOffset
}
return l.visibleOffsetByColumnOffset[columnOffset]
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In valueOffset, accessing l.visibleOffsetByColumnOffset[columnOffset] directly can panic if columnOffset is out of bounds (e.g., if it is negative or greater than or equal to the length of the slice). Adding a defensive bounds check makes the method safer.

func (l RowImageLayout) valueOffset(columnOffset int, values []any) int {\n\tif l.isFullValues(values) {\n\t	return columnOffset\n\t}\n\tif columnOffset < 0 || columnOffset >= len(l.visibleOffsetByColumnOffset) {\n\t	return -1\n\t}\n\treturn l.visibleOffsetByColumnOffset[columnOffset]\n}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/dm Issues or PRs related to DM. do-not-merge/needs-triage-completed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant