26 changes: 26 additions & 0 deletions .github/workflows/ci.yml
```yaml
# CI for NV-Reason-CXR: lint and format checks via pre-commit
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  pre-commit:
    name: pre-commit
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install pre-commit
        run: pip install pre-commit

      - name: Run pre-commit
        run: pre-commit run --all-files
```
7 changes: 7 additions & 0 deletions .markdownlint.yaml
```yaml
# Markdownlint config for NV-Reason-CXR
# Relaxed for existing docs (READMEs with tables, HTML, long lines).
# Re-enable rules as you clean up docs or for new files.

# Line length: allow long lines common in docs (tables, code, links)
MD013:
  line_length: 700
```
35 changes: 35 additions & 0 deletions .pre-commit-config.yaml
```yaml
# Pre-commit hooks for NV-Reason-CXR
# Install: pip install pre-commit && pre-commit install
# Run manually: pre-commit run --all-files

repos:
  # General file checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-merge-conflict
      - id: check-added-large-files
        args: [--maxkb=1000]
      - id: check-case-conflict
      - id: debug-statements

  # Python linting and formatting (ruff); fixes applied locally
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  # Markdown linting
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.38.0
    hooks:
      - id: markdownlint

ci:
  autoupdate_commit_msg: "chore: pre-commit autoupdate"
```
92 changes: 41 additions & 51 deletions README.md
# NV-Reason-CXR-3B

## Description

NV-Reason-CXR-3B is a specialized vision-language model designed for medical reasoning and interpretation of chest X-ray images, with detailed explanations. The model combines visual understanding with medical reasoning capabilities, enabling healthcare professionals to access comprehensive analyses and engage in follow-up discussions about radiological findings. NV-Reason-CXR-3B provides step-by-step reasoning that mirrors clinical thinking patterns, making it valuable for educational and research applications in medical imaging.

This model is for research and development only. It is intended to empower developers to extend this work in their tasks and to provide practical examples of applying the methodology across medical domains.

## Table of Contents

1. [Overview](#overview)
2. [Introduction](#introduction)
3. [Installation](#installation)
4. [Training models](#training-models)
- [SFT](#sft)
- [GRPO](#grpo)
5. [Data](#data)

## Overview

The goal of this repo is to provide examples for inference and training of the [NV-Reason-CXR-3B](https://huggingface.co/nvidia/NV-Reason-CXR-3B) model.

## Introduction

Vision–language models (VLMs) have shown strong promise for medical image analysis, but most remain opaque, offering predictions without the transparent, stepwise reasoning clinicians rely on. We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation.
Our approach is designed to learn how experts reason—not just what they conclude—by aligning intermediate steps with observable image evidence and radiology workflow. Beyond accuracy, the explicit reasoning traces support clinical auditability: they reveal why a conclusion was reached, which alternatives were considered, and where uncertainty remains—enabling quality assurance, error analysis, and safer human–AI collaboration.

Inspired by reasoning-first training (DeepSeek-R1 and Open-R1), our approach combines a radiologist-style supervised fine-tuning (SFT) warm start with GRPO reinforcement learning (RL) and verifiable rewards defined over a list of chest X-ray abnormalities.
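As an illustration of what a verifiable reward over a list of abnormalities could look like, here is a minimal sketch that scores a predicted list against the reference with a set-based F1. This is an assumption for illustration only, not the reward function used in training:

```python
def abnormality_reward(predicted: list[str], reference: list[str]) -> float:
    """Set-based F1 between predicted and reference abnormality lists."""
    pred = {p.strip().lower() for p in predicted}
    ref = {r.strip().lower() for r in reference}
    if not pred and not ref:
        return 1.0  # both agree the study is normal
    if not pred or not ref:
        return 0.0  # one side found abnormalities, the other did not
    tp = len(pred & ref)
    precision = tp / len(pred)
    recall = tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the reward depends only on the final abnormality list, it is cheap to verify, which is what makes GRPO training on label-only data (such as MIMIC-CXR abnormality lists) practical.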

We enlisted several experienced radiologists to annotate their internal reasoning while reading chest X-ray cases. To support this, we developed an internal web platform that makes thought capture as seamless as possible. The platform provides automated voice recording, transcription, error correction, and optional translation into English. We used the collected human reasoning data and synthetic reasoning data for SFT training, and abnormality lists only (from MIMIC-CXR) for GRPO training.

In an expert reader study, AI-assisted reasoning increased confidence, supported targeted error auditing, and reduced time to finalize reports—particularly for abnormal cases. On out-of-distribution (OOD) evaluation using the CheXpert test set, the model attains competitive multi-label classification while providing faithful rationales.

NV-Reason-CXR-3B is designed to respond in the style of a teacher (a senior radiologist explaining the problem and the solution) and offers:

- Chain-of-thought processing
  - The reasoning engine generates step-by-step diagnostic analysis
  - Systematic anatomical review
  - Identification of normal and abnormal findings
  - Differential diagnosis consideration
- Clinical output generation
  - Main findings
  - Step-by-step reasoning pathway
  - Differential diagnoses and their likelihood
  - Recommendations for follow-up or clinical correlation
- Clarification multi-step follow-up chat
- Structured report generation
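When post-processing such output programmatically, the step-by-step reasoning can be separated from the final answer. Here is a minimal sketch, assuming DeepSeek-R1-style `<think>...</think>` delimiters; the actual output format of NV-Reason-CXR-3B may differ, so verify against real model output:

```python
import re


def split_reasoning(output: str) -> tuple[str, str]:
    """Split a generation into (reasoning, answer).

    Assumes the reasoning trace is wrapped in <think>...</think> tags;
    this delimiter convention is an assumption, not the documented format.
    """
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if m is None:
        # No delimiters found: treat the whole output as the answer.
        return "", output.strip()
    return m.group(1).strip(), output[m.end():].strip()
```

This keeps downstream code (report rendering, auditing of reasoning traces) independent of how much reasoning the model emits.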

An example of the model output:

![Model output example](docs/d1.png)

You can try the 🩻 [Web Demo](https://huggingface.co/spaces/nvidia/nv-reason-cxr) for examples of the model output, where you can also ask follow-up questions such as "provide differentials" and "write a structured report".

## Preliminary subjective evaluation

- Full AI reasoning: Full AI reasoning output and the structured report.
Readers were instructed to behave as in routine practice.


| ![1](docs/se1.png) | ![2](docs/se2.png) |
| - | - |
| ![3](docs/se3.png) | ![4](docs/se4.png) |

Overall, experts rated the reasoning traces as accurate, appropriately qualified, and practically useful; full reasoning notably improved trust and confidence and yielded substantial time savings—especially for abnormal studies.

### Use Case

Radiologists, medical students, and medical researchers would be expected to use this system for chest X-ray interpretation with detailed reasoning, educational training with AI-generated explanations, and research applications requiring explainable medical AI analyses.

**Important Medical AI Considerations:**
This model is designed for research and educational purposes only and should not be used for clinical diagnosis or treatment decisions. All outputs should be reviewed by qualified medical professionals. The model's reasoning capabilities are intended to support medical education and research, not replace clinical judgment.

## Model Architecture

- **Architecture Type:** Transformer
- **Network Architecture:** Vision-Language Model based on [Qwen2.5-VL-3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) architecture with medical reasoning capabilities

This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) for enhanced medical reasoning.


## Quick start / Inference

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

# Load the model
model_name = "nvidia/NV-Reason-CXR-3B"
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # these kwargs were collapsed in the diff; typical values shown
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)

# Load a chest X-ray image and build the chat prompt
# (this section was collapsed in the diff; a typical setup is shown)
image = Image.open("chest_xray.jpg")  # path to your chest X-ray image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the findings in this chest X-ray."},
        ],
    }
]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=[image], return_tensors="pt")
inputs = inputs.to(model.device)

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=2048)

# Trim and decode
generated_ids_trimmed = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)
]
generated_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True
)[0]
print(generated_text)
```


## Installation

For inference only, the minimal set of required dependencies is:

```shell
pip install torch==2.7.1 torchvision==0.22.1 transformers==4.56.1
```
For training, we recommend creating a Python virtual environment with `uv`.
To install `uv`, follow instructions [here](https://docs.astral.sh/uv/getting-started/installation/).

```shell
uv venv --seed --python 3.11 nvreasoncxr && source nvreasoncxr/bin/activate
```

Then, install dependencies:
```shell
uv pip install vllm==0.10.1.1
uv pip install flash-attn==2.8.3 --no-build-isolation
uv pip install accelerate bitsandbytes datasets peft wandb deepspeed einops flake8 hf_transfer huggingface-hub isort liger-kernel packaging parameterized safetensors pandas numpy scikit-learn qwen-vl-utils
uv pip install trl==0.22.2 transformers==4.56.1

```

Optionally, log into your WANDB account to view training progress later:
wandb login
```


## Training models

The training configuration assumes a node with 8 NVIDIA A100 GPUs (80 GB). You'll need to download the images for the examples first and place them into the `images` folder; see the [Data](#data) section below.


### SFT

```shell
accelerate launch --config_file accelerate/zero2.yaml \
--output_dir data/output_sft_model \
--dataset_path datalists/sft.jsonl \
--num_train_epochs 1 \
--dataset_streaming false \
--gradient_accumulation_steps 8
```
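The `--gradient_accumulation_steps 8` flag above, combined with the 8-GPU node, determines the effective global batch size. A quick sketch of the arithmetic (the per-device batch size of 1 is an assumption; check the training script's defaults):

```python
# Effective global batch size = per-device batch x GPUs x accumulation steps.
per_device_batch = 1   # assumption; check the training script's default
num_gpus = 8           # one node of 8 A100 GPUs
grad_accum_steps = 8   # --gradient_accumulation_steps 8

effective_batch = per_device_batch * num_gpus * grad_accum_steps
print(effective_batch)  # 64
```

Raising `--gradient_accumulation_steps` trades step frequency for a larger effective batch without increasing per-GPU memory.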


### GRPO

```shell
accelerate launch --config_file accelerate/zero2.yaml \
... \
--gradient_accumulation_steps 8
```

## Data

The model was trained on both internally collected human reasoning data and synthetic data. The small datasets provided below are examples only, intended to demonstrate the training code. The full training dataset is currently not provided.

### Data for SFT training example

Download the x-ray images of the MIMIC-CXR-JPG dataset from [here](https://physionet.org/content/mimic-cxr-jpg/2.1.0/). You'll need to comply with the data Terms and Conditions. The training example uses only a small subset of 256 cases, so you can download only the images listed [here](datalists/sft.jsonl). In these examples, the radiology thinking process was synthetically generated with an LLM by rewriting the x-ray report text. This small subset is intended only as an example. Extract the image files (ignoring any subfolders) into the `images/mimic-cxr-jpg/images_512` folder.

### Data for GRPO training example

Download the x-ray images of the test set of the CheXpert dataset from [here](https://stanfordaimi.azurewebsites.net/datasets/23c56a0d-15de-405b-87c8-99c30138950c). You'll need to comply with the data Terms and Conditions. Extract the test subset (CheXpert/test) into the `images/CheXpert/test` directory of this repo.

We provide a data manifest [file](datalists/grpo.jsonl) formatted for GRPO training. It lists image names and solutions for each case, where the "solution" is a list of abnormalities present in each image. The task of GRPO training is to learn the thinking process based solely on the provided list of abnormalities.
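To iterate over such a manifest, a minimal loader sketch follows. The field names `image` and `solution` are hypothetical; check the actual schema in `datalists/grpo.jsonl`:

```python
import json


def load_manifest(path: str) -> list[dict]:
    """Read a JSONL manifest: one JSON object per non-empty line."""
    cases = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                cases.append(json.loads(line))
    return cases


# Hypothetical usage; the field names are assumptions, not the documented schema:
# for case in load_manifest("datalists/grpo.jsonl"):
#     print(case["image"], case["solution"])
```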

This CheXpert test set was used for evaluation during model development. Here, we use it for the training example instead, since the subset is very small and you should observe accuracy improvements quickly (check the WANDB accuracy graphs).


## Acknowledgements

This project uses a number of Hugging Face libraries, including TRL, Transformers, and Accelerate, as well as implementation ideas from the [open-r1](https://github.com/huggingface/open-r1) project: "Open R1: A fully open reproduction of DeepSeek-R1", Hugging Face, Jan 2025.
- **Base Model**: [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
- **Datasets**: [MIMIC-CXR](https://physionet.org/content/mimic-cxr-jpg/2.1.0/) • [CheXpert](https://stanfordaimi.azurewebsites.net/datasets/8cbd9ed4-2eb9-4565-affc-111cf4f7ebe2)


## License

NV-Reason-CXR-3B model weights are released under the [NVIDIA OneWay Noncommercial License Agreement](https://huggingface.co/nvidia/NV-Reason-CXR-3B/blob/main/LICENSE).


## Citation

If you find our work helpful, please consider citing the [paper](https://arxiv.org/abs/2510.23968):

```bibtex
@misc{myronenko2025reasoning,
title={Reasoning Visual Language Model for Chest X-Ray Analysis},
author={Andriy Myronenko and Dong Yang and Baris Turkbey and Mariam Aboian and Sena Azamat and Esra Akcicek and Hongxu Yin and Pavlo Molchanov and Marc Edgar and Yufan He and Pengfei Guo and Yucheng Tang and Daguang Xu},
year={2025},
eprint={2510.23968},
url={https://arxiv.org/abs/2510.23968}
}
```

