Add pre commit and CI checks by jjomier · Pull Request #5 · NVIDIA-Medtech/NV-Reason-CXR

jjomier · 2026-02-25T19:17:42Z

Add pre-commit script
Add CI checks
Fix linting

greptile-apps · 2026-02-25T19:20:50Z

Greptile Summary

This PR adds pre-commit hooks and CI checks to enforce code quality standards, and applies auto-formatting fixes across the codebase.

Major changes:

Added GitHub Actions CI workflow to run pre-commit checks on push/PR
Configured pre-commit with ruff (Python linting/formatting), markdownlint, and file checks
Applied ruff auto-formatting to all Python files (import sorting, whitespace cleanup, line length fixes)
Fixed markdown formatting issues in README and config files

Issues found:

Critical bug in train/vlm_sft_train.py:184: The mapped dataset is immediately discarded by calling .cast() on the original dataset instead of the mapped one, losing all image path transformations
Minor: Project name "NV-Generate-CTMR" in comments should be "NV-Reason-CXR" (.markdownlint.yaml, pyproject.toml)

Confidence Score: 2/5

Critical bug in SFT training will cause dataset transformations to be silently discarded
The PR adds valuable tooling infrastructure (pre-commit, CI) and applies consistent formatting. However, train/vlm_sft_train.py:184 contains a critical logic error where the mapped dataset is immediately overwritten by calling .cast() on the original unmapped dataset. This will cause image path updates and field transformations to be lost during SFT training, potentially breaking the training pipeline. The formatting changes are safe, but this bug must be fixed before merge.
train/vlm_sft_train.py requires immediate attention due to the dataset mapping bug on line 184

Important Files Changed

Filename	Overview
.github/workflows/ci.yml	Adds standard CI workflow for running pre-commit checks on push/PR to main
.pre-commit-config.yaml	Configures pre-commit hooks for file checks, ruff linting/formatting, and markdown linting
.markdownlint.yaml	Configures markdown linting with relaxed line length; contains wrong project name in comment
pyproject.toml	Adds ruff configuration for Python linting and formatting; contains wrong project name in comment
train/vlm_sft_train.py	Ruff auto-formatting applied, but critical bug remains: line 184 discards mapped dataset from line 183

_{Last reviewed commit: b405a44}

greptile-apps · 2026-02-25T19:20:53Z

Greptile Summary

This PR adds a pre-commit framework and CI pipeline for code quality enforcement, along with lint/format fixes applied by ruff across the codebase.

Adds .pre-commit-config.yaml with hooks for trailing whitespace, YAML/JSON validation, ruff linting/formatting, and markdownlint
Adds .github/workflows/ci.yml to run pre-commit checks on push/PR to main
Adds pyproject.toml with ruff configuration (Python 3.11, line-length 150, standard lint rules)
Adds .markdownlint.yaml with relaxed line-length settings for existing docs
Applies ruff auto-formatting across all Python files (vlm_grpo_train.py, vlm_rewards.py, vlm_sft_train.py): import sorting, whitespace cleanup, quote normalization, type hint modernization
Cleans up trailing whitespace in README.md, configs/grpo_config.yaml, configs/sft_config.yaml, and images/README.md

Issues found:

Pre-existing bug in train/vlm_sft_train.py:183-184: the .map() result is immediately overwritten by .cast() on the original dataset, discarding the data transformations
Duplicate check-merge-conflict hook in .pre-commit-config.yaml
All four new config files reference "NV-Generate-CTMR" instead of the correct project name "NV-Reason-CXR"
.markdownlint.yaml is missing a trailing newline, which would be caught by its own end-of-file-fixer hook

Confidence Score: 4/5

This PR is safe to merge — the changes are predominantly formatting/linting with no new logic, though a pre-existing dataset bug should be addressed.
The vast majority of changes are whitespace/formatting fixes auto-applied by ruff, plus new CI/pre-commit configuration files. No new logic is introduced. The one logic issue (discarded .map() result in vlm_sft_train.py) is pre-existing and was not introduced by this PR, but is in the diff due to reformatting. The remaining issues (duplicate hook, wrong project name, missing newline) are minor.
train/vlm_sft_train.py has a pre-existing bug where the dataset .map() result is discarded on line 184. .pre-commit-config.yaml has a duplicate hook entry.

Important Files Changed

Filename	Overview
.github/workflows/ci.yml	New CI workflow running pre-commit on push/PR to main. References wrong project name in comment ("NV-Generate-CTMR").
.markdownlint.yaml	New markdownlint config with relaxed line-length rule. Missing trailing newline; wrong project name in comment.
.pre-commit-config.yaml	New pre-commit config with general hooks, ruff, and markdownlint. Has duplicate `check-merge-conflict` hook entry.
README.md	Whitespace cleanup and markdown formatting fixes (trailing spaces, blank lines, heading syntax). No functional changes.
configs/grpo_config.yaml	Trailing whitespace removed throughout. No functional changes to GRPO training configuration.
configs/sft_config.yaml	Trailing whitespace removed throughout. No functional changes to SFT training configuration.
images/README.md	Added proper markdown heading and trailing newline to placeholder README.
pyproject.toml	New ruff configuration targeting Python 3.11 with standard lint rules. Wrong project name in comment.
train/vlm_grpo_train.py	Ruff formatting applied: import reordering, whitespace fixes, quote normalization, type hint modernization. No logical changes.
train/vlm_rewards.py	Ruff formatting applied: whitespace, blank line cleanup, list reformatting. No logical changes to reward functions.
train/vlm_sft_train.py	Ruff formatting applied. Pre-existing bug preserved: `.map()` result on line 183 is discarded by immediate overwrite on line 184.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Push/PR to main] --> B[CI Workflow Triggered]
    B --> C[Checkout Code]
    C --> D[Setup Python 3.11]
    D --> E[Install pre-commit]
    E --> F[Run pre-commit hooks]
    F --> G[pre-commit-hooks]
    F --> H[ruff lint + format]
    F --> I[markdownlint]
    G --> G1[trailing-whitespace]
    G --> G2[end-of-file-fixer]
    G --> G3[check-yaml / check-json]
    G --> G4[check-merge-conflict x2]
    G --> G5[check-added-large-files]
    G --> G6[debug-statements]
    H --> H1[ruff --fix]
    H --> H2[ruff-format]
    I --> I1[markdownlint with .markdownlint.yaml]

_{Last reviewed commit: 7a456b5}

greptile-apps

_{11 files reviewed, 4 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-25T19:20:57Z

    train_dataset = my_dataset.map(vlm_data_format_dict, fn_kwargs={"image_dir": script_args.image_dir})
-    train_dataset = my_dataset.cast(features)  
+    train_dataset = my_dataset.cast(features)


Mapped dataset is silently discarded

Line 183 maps my_dataset with vlm_data_format_dict and stores the result in train_dataset, but line 184 immediately overwrites train_dataset by calling my_dataset.cast(features) on the original (unmapped) dataset. This discards all transformations from the .map() call, meaning image paths are never updated and extra columns are never set to None.

This should likely be:

Suggested change

train_dataset = my_dataset.map(vlm_data_format_dict, fn_kwargs={"image_dir": script_args.image_dir})

train_dataset = my_dataset.cast(features)

train_dataset = my_dataset.cast(features)

train_dataset = my_dataset.map(vlm_data_format_dict, fn_kwargs={"image_dir": script_args.image_dir})

train_dataset = train_dataset.cast(features)

greptile-apps

_{11 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-25T19:35:06Z

+    )
    train_dataset = my_dataset.map(vlm_data_format_dict, fn_kwargs={"image_dir": script_args.image_dir})
-    train_dataset = my_dataset.cast(features)  
+    train_dataset = my_dataset.cast(features)


Line 184 calls .cast() on the original my_dataset, discarding the mapped train_dataset from line 183. The image path updates from vlm_data_format_dict are lost.

Should be:

Suggested change

train_dataset = my_dataset.cast(features)

train_dataset = train_dataset.cast(features)

greptile-apps · 2026-02-25T19:35:07Z

@@ -0,0 +1,7 @@
+# Markdownlint config for NV-Generate-CTMR


Comment references NV-Generate-CTMR but this repo is NV-Reason-CXR

Suggested change

# Markdownlint config for NV-Generate-CTMR

# Markdownlint config for NV-Reason-CXR

greptile-apps · 2026-02-25T19:35:08Z

@@ -0,0 +1,13 @@
+# Minimal config for tooling (pre-commit, ruff). NV-Generate-CTMR has no installable package.


Comment references NV-Generate-CTMR but this repo is NV-Reason-CXR

Suggested change

# Minimal config for tooling (pre-commit, ruff). NV-Generate-CTMR has no installable package.

# Minimal config for tooling (pre-commit, ruff). NV-Reason-CXR has no installable package.

jjomier added 2 commits February 25, 2026 14:16

Added pre-commit and CI

407db33

Fix lint

7a456b5

greptile-apps Bot reviewed Feb 25, 2026

View reviewed changes

jjomier added 2 commits February 25, 2026 14:29

Fix more linting

61d97fd

More fixes

b405a44

greptile-apps Bot reviewed Feb 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add pre commit and CI checks#5

Add pre commit and CI checks#5
jjomier wants to merge 4 commits intomainfrom
jjomier/add_pre_commit

jjomier commented Feb 25, 2026

Uh oh!

greptile-apps Bot commented Feb 25, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Feb 25, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

Uh oh!

greptile-apps Bot Feb 25, 2026

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot left a comment

Uh oh!

greptile-apps Bot Feb 25, 2026

Uh oh!

greptile-apps Bot Feb 25, 2026

Uh oh!

greptile-apps Bot Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	train_dataset = my_dataset.cast(features)
	train_dataset = train_dataset.cast(features)

	# Markdownlint config for NV-Generate-CTMR
	# Markdownlint config for NV-Reason-CXR

		@@ -0,0 +1,13 @@
		# Minimal config for tooling (pre-commit, ruff). NV-Generate-CTMR has no installable package.

Conversation

jjomier commented Feb 25, 2026

Uh oh!

greptile-apps Bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 2/5

Important Files Changed

Uh oh!

greptile-apps Bot commented Feb 25, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps Bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Feb 25, 2026 •

edited

Loading