GeoT optimization 3/4: Add fused batched Muon optimizer by coreyjadams · Pull Request #1743 · NVIDIA/physicsnemo

coreyjadams · 2026-06-22T21:59:33Z

PhysicsNeMo Pull Request

Cursor made this implementation, and I want to clean it up into a tighter integration with torch before we merge. The key point is that the overhead of looping over params is significant for models like GeoT, so this is a first draft of that fusion.

We won't merge it in this state, but I wanted a branch as a placeholder for putting all the pieces together.

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

Add physicsnemo.optim.Muon, a fused/batched drop-in replacement for torch.optim.Muon that groups 2-D parameters by (shape, dtype, device) and runs batched Newton-Schulz via torch.bmm/baddbmm with torch._foreach_* momentum/weight-decay updates. Matches torch.optim.Muon hyperparameters, momentum_buffer state, and LR-adjustment modes. Export it from physicsnemo.optim and switch the unified external aero recipe's build_muon_optimizer to use it via CombinedOptimizer.
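As a rough illustration of the batched Newton-Schulz step described above, the sketch below orthogonalizes a stack of same-shaped gradient matrices with `torch.bmm` and `torch.baddbmm`. It is not the code in `physicsnemo/optim/muon.py`: the function name `batched_newton_schulz`, the quintic coefficients (taken from commonly published Muon implementations), and the Frobenius normalization are all assumptions made for the example.

```python
import torch


def batched_newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize every matrix in a (B, m, n) stack at once."""
    # Assumed coefficients: the quintic iteration used in public Muon write-ups.
    a, b, c = 3.4445, -4.7750, 2.0315
    # Work on the wide orientation so X @ X^T is the smaller square matrix.
    transposed = G.size(-2) > G.size(-1)
    X = G.transpose(-2, -1) if transposed else G
    # Scale each matrix by its Frobenius norm so its singular values are <= 1.
    X = X / (torch.linalg.norm(X, dim=(-2, -1), keepdim=True) + eps)
    for _ in range(steps):
        A = torch.bmm(X, X.transpose(-2, -1))          # A = X X^T
        B = torch.baddbmm(A, A, A, beta=b, alpha=c)    # B = b*A + c*(A @ A)
        X = torch.baddbmm(X, B, X, beta=a, alpha=1.0)  # X = a*X + B @ X
    return X.transpose(-2, -1) if transposed else X


torch.manual_seed(0)
# Two hypothetical gradients with identical (shape, dtype, device): one group.
grads = [torch.randn(8, 4), torch.randn(8, 4)]
updates = batched_newton_schulz(torch.stack(grads))  # shape (2, 8, 4)
```

Each loop iteration issues a fixed number of batched kernels regardless of how many matrices are in the stack, which is where the saving over a per-parameter loop would come from.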

copy-pr-bot · 2026-06-22T21:59:36Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


copy-pr-bot · 2026-06-29T19:52:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-29T19:58:56Z

Greptile Summary

This PR introduces physicsnemo.optim.Muon, a subclass of torch.optim.Muon that batches the Newton-Schulz orthogonalization across same-shaped parameters using torch.bmm/torch.baddbmm, reducing kernel launches from O(num_params × ns_steps) to O(num_shape_groups × ns_steps). The author explicitly marks this as a WIP placeholder that won't be merged as-is.
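The launch-count claim can be made concrete with a small counting sketch. The parameter list below is hypothetical (it is not GeoT's actual layout), and the count is in units of Newton-Schulz step invocations rather than individual CUDA kernels, since each step issues a fixed handful of kernels either way.

```python
from collections import defaultdict


def count_ns_invocations(param_signatures, ns_steps):
    """Bucket parameters by signature and compare looped vs. batched NS step counts."""
    groups = defaultdict(list)
    for index, signature in enumerate(param_signatures):
        # signature = (shape, dtype, device), the grouping key named in the PR
        groups[signature].append(index)
    per_param = len(param_signatures) * ns_steps  # one NS loop per parameter
    per_group = len(groups) * ns_steps            # one NS loop per bucket
    return groups, per_param, per_group


# Hypothetical model: 24 square projections plus 12 wider MLP weights.
signatures = (
    [((256, 256), "float32", "cuda:0")] * 24
    + [((1024, 256), "float32", "cuda:0")] * 12
)
groups, per_param, per_group = count_ns_invocations(signatures, ns_steps=5)
print(per_param, per_group)  # 180 10
```

The ratio depends entirely on how many parameters share a signature; a model whose 2-D weights all differ in shape would see no reduction.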

physicsnemo/optim/muon.py: New fused optimizer with batched NS iteration, DTensor/FSDP2 rejection guards, and a full docstring. Relies on torch.optim._muon._adjust_lr (a private PyTorch internal) and the Nesterov formula uses lerp(grad, buf, momentum) whose equivalence to torch.optim.Muon needs confirming on a PyTorch 2.10 build.
test/optim/test_muon.py: Good test coverage for grouping, state-dict roundtrip, and DTensor rejection; the key numerical-equivalence test is gated on torch.optim.Muon availability and may not have run yet.
examples/.../utils.py and physicsnemo/optim/__init__.py: Minimal wiring changes to surface the new optimizer.
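The size of the Nesterov discrepancy flagged above follows from the definition of `lerp`. The sketch below uses plain floats and assumes the alternative form being compared against is the additive update `grad + momentum * buf`; the review does not say which form upstream uses, so this only shows how far apart the two candidates are.

```python
def lerp(start, end, weight):
    """Same arithmetic as torch.lerp: start + weight * (end - start)."""
    return start + weight * (end - start)


momentum = 0.95
grad, buf = 2.0, 0.5  # arbitrary scalar stand-ins for gradient and momentum buffer

# lerp form: (1 - momentum) * grad + momentum * buf
lerp_update = lerp(grad, buf, momentum)
# Additive form (assumed alternative): grad + momentum * buf
additive_update = grad + momentum * buf

# The two differ by exactly momentum * grad.
gap = additive_update - lerp_update
```

Because the gap scales with the gradient itself, a numerical-equivalence test against the reference optimizer would detect a mismatch after a single step.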

Important Files Changed

| Filename | Overview |
| --- | --- |
| physicsnemo/optim/muon.py | New fused Muon optimizer subclass: two correctness concerns (private `_adjust_lr` import that will hard-fail if PyTorch refactors the internal module, and an unverified Nesterov formula that may diverge from the upstream by `momentum*grad`). |
| test/optim/test_muon.py | Good test coverage for grouping, state-dict roundtrip, and DTensor rejection; the critical numerical-equivalence test (`test_matches_torch_muon`) is guarded by `skipif` and may not have been run against a real `torch.optim.Muon` build yet. |
| physicsnemo/optim/__init__.py | Exports new `Muon` class; change is minimal and correct. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/utils.py | Switches call sites from `torch.optim.Muon` to `physicsnemo.optim.Muon`; straightforward drop-in replacement with no other changes. |

Reviews (1): Last reviewed commit: "Merge branch 'main' into geoT-opt-muon-o..."

peterdsharpe

Great job with this!

The bucketing logic ("grouping") is a particularly nice touch that I think will really benefit kernel-launch-bound training. TBH, this might be worth upstreaming to PyTorch's Muon impl too (unless they do it first).

coreyjadams · 2026-06-29T21:22:25Z

/ok to test fdc0b5a

coreyjadams added 2 commits June 22, 2026 18:48

Apply pre-commit

0d20603

coreyjadams added 2 commits June 29, 2026 13:15

Merge branch 'main' into geoT-opt-muon-opt-fusion

413cf35

Update muon: inherit, not reimplement. Catch sharded weights and parameters.

4ea5813

coreyjadams marked this pull request as ready for review June 29, 2026 19:53

coreyjadams requested a review from peterdsharpe as a code owner June 29, 2026 19:53

Merge branch 'main' into geoT-opt-muon-opt-fusion

34e0207

greptile-apps Bot reviewed Jun 29, 2026


peterdsharpe approved these changes Jun 29, 2026


Clean up muon PR, purge DTensor checks

fdc0b5a

coreyjadams wants to merge 6 commits into main from geoT-opt-muon-opt-fusion
