Skip to content

Unified Aero Recipe: Fix for manifest-based path without augmentation#1755

Merged
coreyjadams merged 25 commits into
NVIDIA:mainfrom
peterdsharpe:psharpe/unified-recipe-manifest-validation-fix
Jun 29, 2026
Merged

Unified Aero Recipe: Fix for manifest-based path without augmentation#1755
coreyjadams merged 25 commits into
NVIDIA:mainfrom
peterdsharpe:psharpe/unified-recipe-manifest-validation-fix

Conversation

@peterdsharpe

Copy link
Copy Markdown
Collaborator

PhysicsNeMo Pull Request

Description

Stack on top of #1753, #1754

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

…ither `inverse()` or `inverse_td()`. And never used
- Removed the `_all_reduce` method from `MetricCalculator` as it was redundant.
- Updated the `__call__` method to directly return the metrics without reduction.
- Introduced `_reduce_and_average_epoch` function in `train.py` to handle averaging of epoch loss and metrics across distributed processes.
- Added unit tests for the new `_reduce_and_average_epoch` function to ensure correctness in both single-process and distributed scenarios.
- Added `_build_manifest_val_dataset` function to create a separate validation dataset when augmentations are enabled, ensuring validation does not inherit training augmentations.
- Updated `build_dataloaders` to utilize the new validation dataset logic, allowing for consistent evaluation behavior across manifest and directory modes.
- Introduced unit tests for `_build_manifest_val_dataset` to verify correct behavior for both augmented and non-augmented scenarios.
@copy-pr-bot

copy-pr-bot Bot commented Jun 25, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@peterdsharpe peterdsharpe changed the title Psharpe/unified recipe manifest validation fix Unified Aero Recipe: Fix for manifest-based path without augmentation Jun 25, 2026
@peterdsharpe peterdsharpe requested a review from melo-gonzo June 25, 2026 16:06

@melo-gonzo melo-gonzo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look over the changes in dataset.py - all LGTM! Thanks @peterdsharpe

@peterdsharpe peterdsharpe marked this pull request as ready for review June 25, 2026 19:44
@peterdsharpe peterdsharpe enabled auto-merge June 25, 2026 19:45
@peterdsharpe peterdsharpe disabled auto-merge June 25, 2026 19:45
@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a bug in manifest mode where validation data was being passed through the augmented transform chain when augment=True, unlike directory mode which always builds its val dataset with augment=False. The fix introduces _build_manifest_val_dataset and wires it into build_dataloaders so manifest-mode validation now matches directory-mode behavior.

  • Adds _build_manifest_val_dataset: returns None (val shares train dataset) when augment=False, or a fresh un-augmented dataset over the same directory when augment=True, relying on the same deterministic reader glob order.
  • Updates build_dataloaders to initialise manifest_val_dataset = None before the loop and use it as val_dataset in the manifest branch, with the existing multi-dataset limitation note updated to cover the new variable.
  • Adds TestManifestValDataset with two tests that lock down both branches, including a guard assertion that the train dataset actually carries stochastic transforms so the no-augmentation check on the val dataset is non-vacuous.

Important Files Changed

Filename Overview
examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/datasets.py Adds _build_manifest_val_dataset helper and wires it into build_dataloaders so validation in manifest mode is always un-augmented when augment=True, matching directory-mode behavior. Logic, variable initialization, and existing NOTE comments around the multi-dataset limitation are all updated consistently.
examples/cfd/external_aerodynamics/unified_external_aero_recipe/tests/test_manifest.py Adds TestManifestValDataset with two targeted tests: one confirming augment=False returns the None sentinel, and one verifying that augment=True produces a separate dataset object with no stochastic transforms but the deterministic CenterMesh still present. The guard assertion prevents a vacuous pass.

Reviews (2): Last reviewed commit: "Merge branch 'main' into psharpe/unified..." | Re-trigger Greptile

Comment thread examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/train.py Outdated
@peterdsharpe

Copy link
Copy Markdown
Collaborator Author

@greptileai

@coreyjadams coreyjadams left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me!

@coreyjadams coreyjadams added this pull request to the merge queue Jun 29, 2026
Merged via the queue into NVIDIA:main with commit 314177c Jun 29, 2026
6 checks passed
@peterdsharpe peterdsharpe deleted the psharpe/unified-recipe-manifest-validation-fix branch July 1, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants