jjomier commented Feb 25, 2026
- Add pre-commit script
- Add CI checks
- Fix linting
Greptile Summary: This PR adds pre-commit hooks and CI checks to enforce code quality standards, and applies auto-formatting fixes across the codebase.
Confidence Score: 2/5
Last reviewed commit: b405a44
Greptile Summary: This PR adds a pre-commit framework and CI pipeline for code quality enforcement, along with lint/format fixes applied by ruff across the codebase.
Confidence Score: 4/5
Flowchart:

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Push/PR to main] --> B[CI Workflow Triggered]
    B --> C[Checkout Code]
    C --> D[Setup Python 3.11]
    D --> E[Install pre-commit]
    E --> F[Run pre-commit hooks]
    F --> G[pre-commit-hooks]
    F --> H[ruff lint + format]
    F --> I[markdownlint]
    G --> G1[trailing-whitespace]
    G --> G2[end-of-file-fixer]
    G --> G3[check-yaml / check-json]
    G --> G4[check-merge-conflict x2]
    G --> G5[check-added-large-files]
    G --> G6[debug-statements]
    H --> H1[ruff --fix]
    H --> H2[ruff-format]
    I --> I1[markdownlint with .markdownlint.yaml]
```
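For reference, a hook set like the one in the flowchart could be declared in a `.pre-commit-config.yaml` along these lines. This is a hypothetical sketch, not the PR's actual file: the `rev` pins are placeholders, and the real config may differ in structure or arguments.

```yaml
# Hypothetical sketch of a .pre-commit-config.yaml matching the hooks shown
# in the flowchart; rev values are placeholders, not the PR's actual pins.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-merge-conflict   # the flowchart shows this hook appearing twice
      - id: check-added-large-files
      - id: debug-statements
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.41.0
    hooks:
      - id: markdownlint
        args: [--config, .markdownlint.yaml]
```

Running `pre-commit run --all-files` locally would then apply the same checks the CI workflow enforces.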
Last reviewed commit: 7a456b5
```python
train_dataset = my_dataset.map(vlm_data_format_dict, fn_kwargs={"image_dir": script_args.image_dir})
train_dataset = my_dataset.cast(features)
```

Mapped dataset is silently discarded

Line 183 maps my_dataset with vlm_data_format_dict and stores the result in train_dataset, but line 184 immediately overwrites train_dataset by calling my_dataset.cast(features) on the original (unmapped) dataset. This discards all transformations from the .map() call, meaning image paths are never updated and extra columns are never set to None.

This should likely be:

```diff
 train_dataset = my_dataset.map(vlm_data_format_dict, fn_kwargs={"image_dir": script_args.image_dir})
-train_dataset = my_dataset.cast(features)
+train_dataset = train_dataset.cast(features)
```
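The bug pattern is worth seeing in isolation: transforms on a dataset return a new object rather than mutating in place, so the second assignment must chain off the first. The toy class below is a hypothetical stand-in, not the real Hugging Face `datasets` API, but it shares the return-a-new-object property that makes the overwrite lose the mapped result.

```python
# Toy stand-in illustrating the overwrite bug (NOT the real `datasets` API;
# both share the property that map/cast return NEW objects).
class ToyDataset:
    def __init__(self, rows):
        self.rows = rows

    def map(self, fn):
        # Returns a new dataset with fn applied; self is untouched.
        return ToyDataset([fn(r) for r in self.rows])

    def cast(self, features):
        # Returns a new dataset; modeled here as a plain copy.
        return ToyDataset(list(self.rows))


def add_image_dir(row, image_dir="/data/images"):
    # Hypothetical path-rewriting transform, standing in for vlm_data_format_dict.
    return {**row, "image": f"{image_dir}/{row['image']}"}


my_dataset = ToyDataset([{"image": "scan.png"}])

# Buggy pattern from the PR: the second line starts again from my_dataset,
# so the mapped result is discarded.
train_dataset = my_dataset.map(add_image_dir)
train_dataset = my_dataset.cast(None)
assert train_dataset.rows[0]["image"] == "scan.png"  # path was NOT rewritten

# Fixed pattern: chain off the mapped result.
train_dataset = my_dataset.map(add_image_dir)
train_dataset = train_dataset.cast(None)
assert train_dataset.rows[0]["image"] == "/data/images/scan.png"  # rewritten
```

The same fix applies anywhere a pipeline of immutable transforms is split across assignments: each step must consume the previous step's result.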
```python
)
train_dataset = my_dataset.map(vlm_data_format_dict, fn_kwargs={"image_dir": script_args.image_dir})
train_dataset = my_dataset.cast(features)
```

Line 184 calls .cast() on the original my_dataset, discarding the mapped train_dataset from line 183. The image path updates from vlm_data_format_dict are lost.

Should be:

```diff
-train_dataset = my_dataset.cast(features)
+train_dataset = train_dataset.cast(features)
```
```diff
@@ -0,0 +1,7 @@
+# Markdownlint config for NV-Generate-CTMR
```

Comment references NV-Generate-CTMR but this repo is NV-Reason-CXR

```diff
-# Markdownlint config for NV-Generate-CTMR
+# Markdownlint config for NV-Reason-CXR
```
```diff
@@ -0,0 +1,13 @@
+# Minimal config for tooling (pre-commit, ruff). NV-Generate-CTMR has no installable package.
```

Comment references NV-Generate-CTMR but this repo is NV-Reason-CXR

```diff
-# Minimal config for tooling (pre-commit, ruff). NV-Generate-CTMR has no installable package.
+# Minimal config for tooling (pre-commit, ruff). NV-Reason-CXR has no installable package.
```