26 changes: 26 additions & 0 deletions .github/workflows/ci.yml
```yaml
# CI for NV-Reason-CXR: lint and format checks via pre-commit
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  pre-commit:
    name: pre-commit
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install pre-commit
        run: pip install pre-commit

      - name: Run pre-commit
        run: pre-commit run --all-files
```
7 changes: 7 additions & 0 deletions .markdownlint.yaml
```yaml
# Markdownlint config for NV-Reason-CXR
# Relaxed for existing docs (READMEs with tables, HTML, long lines).
# Re-enable rules as you clean up docs or for new files.

# Line length: allow long lines common in docs (tables, code, links)
MD013:
  line_length: 700
```
35 changes: 35 additions & 0 deletions .pre-commit-config.yaml
```yaml
# Pre-commit hooks for NV-Reason-CXR
# Install: pip install pre-commit && pre-commit install
# Run manually: pre-commit run --all-files

repos:
  # General file checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-merge-conflict
      - id: check-added-large-files
        args: [--maxkb=1000]
      - id: check-case-conflict
      - id: debug-statements

  # Python linting and formatting (ruff); fixes applied locally
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  # Markdown linting
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.38.0
    hooks:
      - id: markdownlint

ci:
  autoupdate_commit_msg: "chore: pre-commit autoupdate"
```
92 changes: 41 additions & 51 deletions README.md
# NV-Reason-CXR-3B

## Description

NV-Reason-CXR-3B is a specialized vision-language model designed for medical reasoning and interpretation of chest X-ray images, with detailed explanations. The model combines visual understanding with medical reasoning capabilities, enabling healthcare professionals to access comprehensive analyses and engage in follow-up discussions about radiological findings. NV-Reason-CXR-3B provides step-by-step reasoning that mirrors clinical thinking patterns, making it valuable for educational and research applications in medical imaging.

This model is for research and development only. It is intended to empower developers to extend this work in their tasks and to provide practical examples of applying the methodology across medical domains.

## Table of Contents

1. [Overview](#overview)
2. [Introduction](#introduction)
3. [Installation](#installation)
4. [Training models](#training-models)
- [SFT](#sft)
- [GRPO](#grpo)
5. [Data](#data)

## Overview

The goal of this repo is to provide examples for inference and training of the [NV-Reason-CXR-3B](https://huggingface.co/nvidia/NV-Reason-CXR-3B) model.

## Introduction

Vision–language models (VLMs) have shown strong promise for medical image analysis, but most remain opaque, offering predictions without the transparent, stepwise reasoning clinicians rely on. We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation.
Our approach is designed to learn how experts reason—not just what they conclude—by aligning intermediate steps with observable image evidence and radiology workflow. Beyond accuracy, the explicit reasoning traces support clinical auditability: they reveal why a conclusion was reached, which alternatives were considered, and where uncertainty remains—enabling quality assurance, error analysis, and safer human–AI collaboration.

Inspired by reasoning-first training (DeepSeek-R1 and Open-R1), our approach combines a radiologist-style supervised fine-tuning (SFT) warm start with GRPO reinforcement learning (RL) and verifiable rewards defined over a list of chest X-ray abnormalities.
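As an illustration of what a verifiable reward over a list of abnormalities could look like, here is a minimal sketch that scores a predicted list against the reference with a set-based F1. This is an assumption for illustration only, not the reward function used in training:

```python
def abnormality_reward(predicted: list[str], reference: list[str]) -> float:
    """Set-based F1 between predicted and reference abnormality lists."""
    pred = {p.strip().lower() for p in predicted}
    ref = {r.strip().lower() for r in reference}
    if not pred and not ref:
        return 1.0  # both agree the study is normal
    if not pred or not ref:
        return 0.0  # one side found abnormalities, the other did not
    tp = len(pred & ref)
    precision = tp / len(pred)
    recall = tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the reward depends only on the final abnormality list, it is cheap to verify, which is what makes GRPO training on label-only data (such as MIMIC-CXR abnormality lists) practical.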

We enlisted several experienced radiologists to annotate their internal reasoning while reading chest X-ray cases. To support this, we developed an internal web platform that makes thought capture as seamless as possible. The platform provides automated voice recording, transcription, error correction, and optional translation into English. We used the collected human reasoning data and synthetic reasoning data for SFT training, and abnormality lists only (from MIMIC-CXR) for GRPO training.

In an expert reader study, AI-assisted reasoning increased confidence, supported targeted error auditing, and reduced time to finalize reports—particularly for abnormal cases. On out-of-distribution (OOD) evaluation using the CheXpert test set, the model attains competitive multi-label classification while providing faithful rationales.

NV-Reason-CXR-3B is designed to respond in the style of a teacher (a senior radiologist explaining the problem and the solution) and offers:

- Chain-of-thought processing
  - The reasoning engine generates step-by-step diagnostic analysis
  - Systematic anatomical review
  - Identification of normal and abnormal findings
  - Differential diagnosis consideration
- Clinical output generation
  - Main findings
  - Step-by-step reasoning pathway
  - Differential diagnoses and their likelihood
  - Recommendations for follow-up or clinical correlation
- Clarification multi-step follow-up chat
- Structured report generation
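When post-processing such output programmatically, the step-by-step reasoning can be separated from the final answer. Here is a minimal sketch, assuming DeepSeek-R1-style `<think>...</think>` delimiters; the actual output format of NV-Reason-CXR-3B may differ, so verify against real model output:

```python
import re


def split_reasoning(output: str) -> tuple[str, str]:
    """Split a generation into (reasoning, answer).

    Assumes the reasoning trace is wrapped in <think>...</think> tags;
    this delimiter convention is an assumption, not the documented format.
    """
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if m is None:
        # No delimiters found: treat the whole output as the answer.
        return "", output.strip()
    return m.group(1).strip(), output[m.end():].strip()
```

This keeps downstream code (report rendering, auditing of reasoning traces) independent of how much reasoning the model emits.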

An example of the model output:

![Model output example](docs/d1.png)

You can try the 🩻 [Web Demo](https://huggingface.co/spaces/nvidia/nv-reason-cxr) for examples of the model output, where you can also ask follow-up questions such as "provide differentials" and "write a structured report".

## Preliminary subjective evaluation

- Full AI reasoning: Full AI reasoning output and the structured report.
Readers were instructed to behave as in routine practice.


| ![1](docs/se1.png) | ![2](docs/se2.png) |
| - | - |
| ![3](docs/se3.png) | ![4](docs/se4.png) |

Overall, experts rated the reasoning traces as accurate, appropriately qualified, and practically useful; full reasoning notably improved trust and confidence and yielded substantial time savings—especially for abnormal studies.

### Use Case

Radiologists, medical students, and medical researchers would be expected to use this system for chest X-ray interpretation with detailed reasoning, educational training with AI-generated explanations, and research applications requiring explainable medical AI analyses.

**Important Medical AI Considerations:**
This model is designed for research and educational purposes only and should not be used for clinical diagnosis or treatment decisions. All outputs should be reviewed by qualified medical professionals. The model's reasoning capabilities are intended to support medical education and research, not replace clinical judgment.

## Model Architecture

- **Architecture Type:** Transformer
- **Network Architecture:** Vision-Language Model based on [Qwen2.5-VL-3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) architecture with medical reasoning capabilities

This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) for enhanced medical reasoning.


## Quick start / Inference

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

# Load the model
model_name = "nvidia/NV-Reason-CXR-3B"
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # these kwargs were collapsed in the diff; typical values shown
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)

# Load a chest X-ray image and build the chat prompt
# (this section was collapsed in the diff; a typical setup is shown)
image = Image.open("chest_xray.jpg")  # path to your chest X-ray image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the findings in this chest X-ray."},
        ],
    }
]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=[image], return_tensors="pt")
inputs = inputs.to(model.device)

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=2048)

# Trim and decode
generated_ids_trimmed = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)
]
generated_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True
)[0]
print(generated_text)
```


## Installation

For inference only, the minimal set of required dependencies is:

```shell
pip install torch==2.7.1 torchvision==0.22.1 transformers==4.56.1
```
For training, we recommend creating a Python virtual environment with `uv`.
To install `uv`, follow instructions [here](https://docs.astral.sh/uv/getting-started/installation/).

```shell
uv venv --seed --python 3.11 nvreasoncxr && source nvreasoncxr/bin/activate
```

Then, install dependencies:
```shell
uv pip install vllm==0.10.1.1
uv pip install flash-attn==2.8.3 --no-build-isolation
uv pip install accelerate bitsandbytes datasets peft wandb deepspeed einops flake8 hf_transfer huggingface-hub isort liger-kernel packaging parameterized safetensors pandas numpy scikit-learn qwen-vl-utils
uv pip install trl==0.22.2 transformers==4.56.1

```

Optionally, log into your WANDB account to view training progress later:
wandb login
```


## Training models

The training configuration assumes a node with 8 NVIDIA A100 GPUs (80 GB). You'll need to download the images for the examples first and place them into the `images` folder; see the [Data](#data) section below.


### SFT

```shell
accelerate launch --config_file accelerate/zero2.yaml \
--output_dir data/output_sft_model \
--dataset_path datalists/sft.jsonl \
--num_train_epochs 1 \
--dataset_streaming false \
--gradient_accumulation_steps 8
```
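The `--gradient_accumulation_steps 8` flag above, combined with the 8-GPU node, determines the effective global batch size. A quick sketch of the arithmetic (the per-device batch size of 1 is an assumption; check the training script's defaults):

```python
# Effective global batch size = per-device batch x GPUs x accumulation steps.
per_device_batch = 1   # assumption; check the training script's default
num_gpus = 8           # one node of 8 A100 GPUs
grad_accum_steps = 8   # --gradient_accumulation_steps 8

effective_batch = per_device_batch * num_gpus * grad_accum_steps
print(effective_batch)  # 64
```

Raising `--gradient_accumulation_steps` trades step frequency for a larger effective batch without increasing per-GPU memory.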


### GRPO

```shell
accelerate launch --config_file accelerate/zero2.yaml \
... \
--gradient_accumulation_steps 8
```

## Data

The model was trained on both internally collected human reasoning data and synthetic data. The small datasets provided below are examples only, intended to demonstrate the training code. The full training dataset is currently not provided.

### Data for SFT training example

Download the x-ray images of the MIMIC-CXR-JPG dataset from [here](https://physionet.org/content/mimic-cxr-jpg/2.1.0/). You'll need to comply with the data Terms and Conditions. The training example uses only a small subset of 256 cases, so you can download only the images listed [here](datalists/sft.jsonl). In these examples, the radiology thinking process was synthetically generated with an LLM by rewriting the x-ray report text. This small subset is intended only as an example. Extract the image files (ignoring any subfolders) into the `images/mimic-cxr-jpg/images_512` folder.

### Data for GRPO training example

Download the x-ray images of the test set of the CheXpert dataset from [here](https://stanfordaimi.azurewebsites.net/datasets/23c56a0d-15de-405b-87c8-99c30138950c). You'll need to comply with the data Terms and Conditions. Extract the test subset (CheXpert/test) into the `images/CheXpert/test` directory of this repo.

We provide a data manifest [file](datalists/grpo.jsonl) formatted for GRPO training. It lists image names and solutions for each case, where the "solution" is a list of abnormalities present in each image. The task of GRPO training is to learn the thinking process based solely on the provided list of abnormalities.
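To iterate over such a manifest, a minimal loader sketch follows. The field names `image` and `solution` are hypothetical; check the actual schema in `datalists/grpo.jsonl`:

```python
import json


def load_manifest(path: str) -> list[dict]:
    """Read a JSONL manifest: one JSON object per non-empty line."""
    cases = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                cases.append(json.loads(line))
    return cases


# Hypothetical usage; the field names are assumptions, not the documented schema:
# for case in load_manifest("datalists/grpo.jsonl"):
#     print(case["image"], case["solution"])
```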

This CheXpert test set was used for evaluation during model development. Here, we use it for the training example instead, since the subset is very small and you should observe accuracy improvements quickly (check the WANDB accuracy graphs).


## Acknowledgements

This project uses a number of Hugging Face libraries, including TRL, Transformers, and Accelerate, as well as implementation ideas from the [open-r1](https://github.com/huggingface/open-r1) project: "Open R1: A fully open reproduction of DeepSeek-R1", Hugging Face, Jan 2025.
- **Base Model**: [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
- **Datasets**: [MIMIC-CXR](https://physionet.org/content/mimic-cxr-jpg/2.1.0/) • [CheXpert](https://stanfordaimi.azurewebsites.net/datasets/8cbd9ed4-2eb9-4565-affc-111cf4f7ebe2)


## License

NV-Reason-CXR-3B model weights are released under the [NVIDIA OneWay Noncommercial License Agreement](https://huggingface.co/nvidia/NV-Reason-CXR-3B/blob/main/LICENSE).


## Citation

If you find our work helpful, please consider citing the [paper](https://arxiv.org/abs/2510.23968):

```bibtex
@misc{myronenko2025reasoning,
title={Reasoning Visual Language Model for Chest X-Ray Analysis},
author={Andriy Myronenko and Dong Yang and Baris Turkbey and Mariam Aboian and Sena Azamat and Esra Akcicek and Hongxu Yin and Pavlo Molchanov and Marc Edgar and Yufan He and Pengfei Guo and Yucheng Tang and Daguang Xu},
year={2025},
eprint={2510.23968},
url={https://arxiv.org/abs/2510.23968}
}
```

