Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Adds the `physicsnemo-model-builder` agent skill (`skills/physicsnemo-model-builder/`):
guides contributors through adding a new model or reusable layer to PhysicsNeMo,
or wrapping an existing PyTorch model.
- Adds Point-Transformer local vector-attention blocks to `physicsnemo.nn`.
- FSDP2 checkpoint support: full save/load round-trip for
``torch.distributed.fsdp`` v2 models, including DTensor edge cases,
Expand Down
87 changes: 87 additions & 0 deletions skills/physicsnemo-model-builder/BENCHMARK.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# Evaluation Report

Evaluation of the `physicsnemo-model-builder` skill before publication through NVSkills-Eval.

This benchmark summarizes 3-Tier Evaluation from NVSkills-Eval results for the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.

> **Status: pending.** The results, Tier-1/Tier-2 findings, and verdict below are
> populated by an NVSkills-Eval run prior to publication. The evaluation dataset
> (`evals/evals.json`) and target agents are committed; run the harness and
> refresh this file before publishing.

## Evaluation Summary

- Skill: `physicsnemo-model-builder`
- Evaluation date: _pending_
- NVSkills-Eval profile: `external`
- Environment: `local`
- Dataset: 4 evaluation tasks (`evals/evals.json`)
- Attempts per task: 2
- Pass threshold: 50%
- Overall verdict: _pending_

## Agents Used

- `claude-code`
- `codex`

## Metrics Used

Reported benchmark dimensions:

- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.

Underlying evaluation signals used in this run:

- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access.
- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow.
- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage.
- `accuracy` (Accuracy): grades final-answer correctness against the reference answer.
- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully.
- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations.
- `token_efficiency` (Token Efficiency): compares token usage with and without the skill.

## Test Tasks

The benchmark dataset contained 4 evaluation tasks:

- Positive tasks: 2 tasks where the skill was expected to activate (add a new model from scratch; wrap an external PyTorch model).
- Negative tasks: 2 tasks where the model-builder skill was not expected (a model-selection/discovery question that belongs to `physicsnemo-discover`; an out-of-scope datapipe request).
- Unlabeled tasks: 0.

Entries with `expected_skill` set are treated as positive skill-activation cases; entries with `expected_skill: null` are treated as negative activation cases.

## Results

_Pending NVSkills-Eval run._

| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | — | — | — |
| Correctness | — | — | — |
| Discoverability | — | — | — |
| Effectiveness | — | — | — |
| Efficiency | — | — | — |

Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.

## Tier 1: Static Validation Summary

_Pending NVSkills-Eval run._

## Tier 2: Deduplication Summary

_Pending NVSkills-Eval run._ Note: this skill is intentionally distinct from
`physicsnemo-discover` (authoring/porting models vs. selecting existing ones);
the negative eval tasks guard that routing boundary.

## Publication Recommendation

_Pending NVSkills-Eval run._ Refresh this file with the harness output (results
table, Tier-1/Tier-2 findings, verdict) before publishing, and keep it with the
skill; re-run when the evaluation dataset, skill behavior, or target agents
materially change.
223 changes: 223 additions & 0 deletions skills/physicsnemo-model-builder/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,223 @@
---
name: physicsnemo-model-builder
description: Official NVIDIA-authored workflow for adding a new model or reusable layer to PhysicsNeMo, or integrating an existing PyTorch model. Scaffolds a standards-compliant physicsnemo.Module (or a Module.from_torch wrapper for an external nn.Module), places it correctly, wires exports, writes tests against the house test helpers, and runs the local CI gates (ruff, interrogate, pytest). Use when a contributor wants to add or port a model or layer into the physicsnemo package. Do NOT use for datapipes, nn.functional ops/backends such as FunctionSpec, losses or metrics, training-recipe or example authoring, environment/installation setup, or merely deciding which existing model fits a task (use physicsnemo-discover for that).
license: Apache-2.0
metadata:
author: NVIDIA <agent-skills@nvidia.com>
tags:
- physicsnemo
- models
- contributing
- scaffolding
- integration
---

# PhysicsNeMo Model Builder

Drive a contributor from "I have a model (or layer, or an existing PyTorch
module)" to a standards-compliant, tested, CI-green addition to the
`physicsnemo` package. **You do the mechanical work** — placement, the
`physicsnemo.Module` shell, serialization wiring, docstrings, type
annotations, validation, exports, tests, and gates. **The contributor brings
the architecture** — the novel `forward` math. Keep that division explicit:
never invent their model; scaffold everything around it.

The audience is a researcher fluent in PyTorch but new to PhysicsNeMo, so
**explain the "why"** at each step (name the rule, give the reason) rather
than silently emitting files.

## When NOT to use this skill

Stop and **redirect — do not activate** — when the request is to *pick*, *use*,
or *configure* something that already exists, or targets another surface:

- **"Which existing model should I use / which fits my data?"** — selection and
discovery, not authoring → `physicsnemo-discover`.
- **Datapipes** (`physicsnemo/datapipes/`), **losses or metrics**
(`physicsnemo/metrics/`), **functional ops / backends**
(`physicsnemo/nn/functional/`, `FunctionSpec`), **training recipes / examples**
(`examples/`).

This skill is only for **authoring or porting** a model or reusable layer into
the `physicsnemo` package.

## Core principle

1. **The written standards are ground truth — read them, don't paraphrase from
memory.** The authoritative rules live in `CODING_STANDARDS/` at the repo
root: `MODELS_IMPLEMENTATION.md` (rules `MOD-***`) and
`EXTERNAL_IMPORTS.md` (rules `EXT-***`). Open the cited rule before relying
on it; reference it by ID when you justify a decision. They evolve — a rule
recalled from memory may be stale.
2. **Reuse before you build — discover, don't reinvent.** Half of a clean
integration is *not* writing code that already exists. Before scaffolding a
`forward`, enumerate what `physicsnemo.nn` already provides and tell the
contributor what to import (`references/reuse_map.md`).
3. **Verify every path before you cite it.** Glob/Read the live repo; a path
recalled from memory or pattern-matched from a neighbor is disproof — drop
it.

## Repo root resolution

Resolve the PhysicsNeMo repo root **first** (see `CONTRIBUTING.md §Repo root
resolution`); all `CODING_STANDARDS/…` and `physicsnemo/…` paths are rooted
there, and scaffolded files are written under it. **If no local clone is on the
path** (e.g. headless against the skills repo in an eval), shallow-clone the
canonical repo once and anchor to it — read its existing tree read-only for
standards / reuse / path verification, write new files under it:
`DEST="${TMPDIR:-/tmp}/physicsnemo-src"; [ -d "$DEST/physicsnemo" ] || git clone --depth 1 https://github.com/NVIDIA/physicsnemo "$DEST"`.
Use that URL verbatim; never interpolate one from user input.

## Scope

In scope: **complete models** (`physicsnemo/experimental/models/`), **reusable
layers** (`physicsnemo/nn/module/`), and **wrapping an existing PyTorch
`nn.Module`** via `Module.from_torch`.

Out of scope — stop and redirect: datapipes (`physicsnemo/datapipes/`),
functional ops / custom backends (`physicsnemo/nn/functional/`, `FunctionSpec`),
losses & metrics (`physicsnemo/metrics/`), training recipes (`examples/`), and
"which model should I use" (→ `physicsnemo-discover`).

## Workflow

Run in order, **after resolving the repo root** (§Repo root resolution).
Confirm the consequential choices with the contributor (artifact type,
placement, external-wrap vs from-scratch); scaffold the rest.

**Default to action.** When the repository is available, *create the actual
files* — `__init__.py`, `<name>.py`, and the test module — with a placeholder
`forward` that raises `NotImplementedError` (marked `# TODO: contributor's
forward math`), rather than only describing them in prose. Ask only the
questions that genuinely block placement; then produce files and iterate. The
skill's value is the working scaffold on disk, not a description of one.

### 1. Intake & classify

Ask only what you can't infer (≤4 questions). Resolve:

- **Artifact type:** complete *model*, reusable *layer*, or *wrap* an existing
PyTorch module? (Decision tree + rationale: `references/placement.md`.)
- **Identity:** class name (PascalCase), one-line purpose, the forward
inputs/outputs and their tensor shapes, heavy deps.
- **For wrap:** the import path of their `nn.Module`, and whether its
`__init__` args are JSON-serializable — this picks the serialization path
(`references/serialization.md`).

### 2. Place it (and say why)

- New **model** → `physicsnemo/experimental/models/<name>/` (`MOD-002a`: new
models start in `experimental`). Layout: `__init__.py` (re-exports) +
`<name>.py`. New **layer** → `physicsnemo/nn/module/<name>.py`, re-exported
from both `physicsnemo/nn/module/__init__.py` and `physicsnemo/nn/__init__.py`
(`MOD-000a`). Tests mirror the source path under `test/`.
- State the rule ID and the reason (experimental = API may change; layers are
shared building blocks).

### 3. Reuse audit

Before writing `forward`, enumerate existing primitives the contributor would
otherwise reinvent — attention bases, embeddings, `Mlp`, neighbor ops
(`knn`, `radius_search`), the TE-aware `LayerNorm`. Use the live search
patterns in `references/reuse_map.md`; verify each path before citing it. Say
explicitly "import `X` from `physicsnemo.nn` instead of writing your own." Keep
genuinely novel, model-specific pieces local to the model.

### 4. Scaffold the shell

Generate from the skeletons in `references/scaffolds.md`, adapting to the
contributor's shapes; explain what each enforced piece is for.

- **New model / layer:** subclass **`physicsnemo.Module`** (not
`torch.nn.Module` — `MOD-001`); a `ModelMetaData`; a **constructor taking
JSON-serializable config** (no splatted `**kwargs` — `MOD-010`; no
string-based class selection — `MOD-009`); a `forward` with jaxtyping on
every tensor arg (`MOD-006`), `if not torch.compiler.is_compiling():` shape
validation (`MOD-005`), and NumPy `r"""` docstrings with
`Parameters`/`Forward`/`Outputs` sections and `:math:` shapes (`MOD-003`).
Imports upward-only (`EXT-***`).
- **Wrap external:** `Module.from_torch(TheirModule, meta=...)`. **The
serialization gotcha lives here** — `physicsnemo.Module` save/from_checkpoint
requires `__init__` args to be JSON-serializable; nested `nn.Module` args
must each be converted via `Module.from_torch`. Walk them through
`references/serialization.md`, then prove it with the round-trip test below.

### 5. Tests

Generate the test module from `references/scaffolds.md`: class-per-public-class,
the `device` fixture, parametrized constructor/attribute checks (≥2 configs —
`MOD-008a`), `validate_forward_accuracy` for non-regression (`MOD-008b`), and
`validate_checkpoint` for the save/load round-trip (`MOD-008c`). These helpers
are **mandatory and come from `test.common`** — write the import explicitly in
the generated test and name them in your summary; don't hand-roll what they
provide:

```python
from test.common import validate_forward_accuracy, validate_checkpoint
```

### 6. Gates

From the repo root, run and iterate to green (explain each):

```
make lint # ruff format --check + ruff check
make interrogate # docstring coverage
make pytest # or: pytest test/<mirrored/path> -q
```

`physicsnemo/experimental/` is exempt from ruff/interrogate, but **not** from
runtime contracts — the serialization round-trip test must still pass there.

### 7. Finish & review

- Add a one-line `CHANGELOG.md` entry and SPDX Apache-2.0 headers to new files;
remind the contributor commits need `-s` (sign-off).
- Do an independent **code-review pass over the diff** before opening the PR —
re-check it against the standards (`MOD-***`/`EXT-***`), correctness, and the
reuse audit, ideally with fresh eyes (a separate review session/agent). If the
host agent offers a built-in code-review command (for example Claude Code's
`/code-review`), use it; otherwise review the diff directly. Then open the PR
— CODEOWNERS review + CI re-run the gates.

### 8. Definition of done

Confirm each before declaring success; fix any miss before finishing:

- [ ] Repo root resolved; every cited path verified to exist (no memory/guesses).
- [ ] Placed right: model → `experimental/models/` (`MOD-002a`); layer →
`nn/module/` + both `__init__` re-exports (`MOD-000a`).
- [ ] Subclasses `physicsnemo.Module` with a `ModelMetaData` (`MOD-001`);
`__init__` is JSON-serializable — no splatted `**kwargs` (`MOD-010`), no
string class selection (`MOD-009`).
- [ ] `forward` has jaxtyping on every tensor arg (`MOD-006`) +
`is_compiling()`-guarded shape validation (`MOD-005`); NumPy `r"""` docstrings
with `:math:` shapes (`MOD-003`).
- [ ] Reuse audit done — nothing reimplemented that `physicsnemo.nn` provides.
- [ ] Tests use `test.common`: `validate_forward_accuracy` (`MOD-008b`) +
`validate_checkpoint` (`MOD-008c`), ≥2 constructor configs (`MOD-008a`).
- [ ] Gates green (`make lint`, `make interrogate`, `make pytest`);
`CHANGELOG.md` entry + SPDX headers added.

## Common gotchas

Surface the relevant traps inline as you scaffold (full catalogue:
`references/lessons.md`):

- **`Module` serialization** is a common external-integration failure: raw
`nn.Module` submodule args break `from_checkpoint` (`references/serialization.md`).
- The **TE-aware `LayerNorm`** runs only on CUDA when Transformer Engine is
present; tests must skip the CPU case under TE.
- **`experimental/` skips lint, not runtime contracts.**
- Promote a model-specific layer to `physicsnemo.nn` only when a **second**
consumer appears — keep it local until then.

## Related resources

- `references/placement.md` — artifact decision tree and where each kind goes.
- `references/reuse_map.md` — live search patterns for existing primitives.
- `references/serialization.md` — `physicsnemo.Module`, JSON args, `from_torch`.
- `references/scaffolds.md` — model / layer / external-wrap / test skeletons.
- `references/lessons.md` — gotchas distilled from real integrations.
- `CODING_STANDARDS/MODELS_IMPLEMENTATION.md`, `EXTERNAL_IMPORTS.md` — the
authoritative rules; read the cited rule before relying on it.
Loading
Loading