diff --git a/CHANGELOG.md b/CHANGELOG.md index daebc11c77..0e1fb2a6b3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added +- Adds the `physicsnemo-model-builder` agent skill (`skills/physicsnemo-model-builder/`): + guides contributors through adding a new model or reusable layer to PhysicsNeMo, + or wrapping an existing PyTorch model. - Adds Point-Transformer local vector-attention blocks to `physicsnemo.nn`. - FSDP2 checkpoint support: full save/load round-trip for ``torch.distributed.fsdp`` v2 models, including DTensor edge cases, diff --git a/skills/physicsnemo-model-builder/BENCHMARK.md b/skills/physicsnemo-model-builder/BENCHMARK.md new file mode 100644 index 0000000000..4c2b939b39 --- /dev/null +++ b/skills/physicsnemo-model-builder/BENCHMARK.md @@ -0,0 +1,87 @@ +# Evaluation Report + +Evaluation of the `physicsnemo-model-builder` skill before publication through NVSkills-Eval. + +This benchmark summarizes 3-Tier Evaluation from NVSkills-Eval results for the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use. + +> **Status: pending.** The results, Tier-1/Tier-2 findings, and verdict below are +> populated by an NVSkills-Eval run prior to publication. The evaluation dataset +> (`evals/evals.json`) and target agents are committed; run the harness and +> refresh this file before publishing. 
+ +## Evaluation Summary + +- Skill: `physicsnemo-model-builder` +- Evaluation date: _pending_ +- NVSkills-Eval profile: `external` +- Environment: `local` +- Dataset: 8 evaluation tasks (`evals/evals.json`) +- Attempts per task: 2 +- Pass threshold: 50% +- Overall verdict: _pending_ + +## Agents Used + +- `claude-code` +- `codex` + +## Metrics Used + +Reported benchmark dimensions: + +- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access. +- Correctness: checks whether the agent follows the expected workflow and produces the correct final output. +- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant. +- Effectiveness: checks whether the agent performs measurably better with the skill than without it. +- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work. + +Underlying evaluation signals used in this run: + +- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access. +- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow. +- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage. +- `accuracy` (Accuracy): grades final-answer correctness against the reference answer. +- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully. +- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations. +- `token_efficiency` (Token Efficiency): compares token usage with and without the skill. + +## Test Tasks + +The benchmark dataset contained 8 evaluation tasks: + +- Positive tasks: 4 tasks where the skill was expected to activate (add a new model from scratch; wrap an external PyTorch model; add a reusable layer; wrap a model whose `__init__` takes a nested `nn.Module`).
+- Negative tasks: 4 tasks where the model-builder skill was not expected (a model-selection/discovery question that belongs to `physicsnemo-discover`; out-of-scope datapipe, loss, and functional-op requests). +- Unlabeled tasks: 0. + +Entries with `expected_skill` set are treated as positive skill-activation cases; entries with `expected_skill: null` are treated as negative activation cases. + +## Results + +_Pending NVSkills-Eval run._ + +| Dimension | Num | `claude-code` | `codex` | +|---|---:|---:|---:| +| Security | — | — | — | +| Correctness | — | — | — | +| Discoverability | — | — | — | +| Effectiveness | — | — | — | +| Efficiency | — | — | — | + +Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available. + +## Tier 1: Static Validation Summary + +_Pending NVSkills-Eval run._ + +## Tier 2: Deduplication Summary + +_Pending NVSkills-Eval run._ Note: this skill is intentionally distinct from +`physicsnemo-discover` (authoring/porting models vs. selecting existing ones); +the negative eval tasks guard that routing boundary. + +## Publication Recommendation + +_Pending NVSkills-Eval run._ Refresh this file with the harness output (results +table, Tier-1/Tier-2 findings, verdict) before publishing, and keep it with the +skill; re-run when the evaluation dataset, skill behavior, or target agents +materially change. diff --git a/skills/physicsnemo-model-builder/SKILL.md b/skills/physicsnemo-model-builder/SKILL.md new file mode 100644 index 0000000000..4cf4b137fc --- /dev/null +++ b/skills/physicsnemo-model-builder/SKILL.md @@ -0,0 +1,223 @@ +--- +name: physicsnemo-model-builder +description: Official NVIDIA-authored workflow for adding a new model or reusable layer to PhysicsNeMo, or integrating an existing PyTorch model.
Scaffolds a standards-compliant physicsnemo.Module (or a Module.from_torch wrapper for an external nn.Module), places it correctly, wires exports, writes tests against the house test helpers, and runs the local CI gates (ruff, interrogate, pytest). Use when a contributor wants to add or port a model or layer into the physicsnemo package. Do NOT use for datapipes, nn.functional ops/backends such as FunctionSpec, losses or metrics, training-recipe or example authoring, environment/installation setup, or merely deciding which existing model fits a task (use physicsnemo-discover for that). +license: Apache-2.0 +metadata: + author: NVIDIA + tags: + - physicsnemo + - models + - contributing + - scaffolding + - integration +--- + +# PhysicsNeMo Model Builder + +Drive a contributor from "I have a model (or layer, or an existing PyTorch +module)" to a standards-compliant, tested, CI-green addition to the +`physicsnemo` package. **You do the mechanical work** — placement, the +`physicsnemo.Module` shell, serialization wiring, docstrings, type +annotations, validation, exports, tests, and gates. **The contributor brings +the architecture** — the novel `forward` math. Keep that division explicit: +never invent their model; scaffold everything around it. + +The audience is a researcher fluent in PyTorch but new to PhysicsNeMo, so +**explain the "why"** at each step (name the rule, give the reason) rather +than silently emitting files. + +## When NOT to use this skill + +Stop and **redirect — do not activate** — when the request is to *pick*, *use*, +or *configure* something that already exists, or targets another surface: + +- **"Which existing model should I use / which fits my data?"** — selection and + discovery, not authoring → `physicsnemo-discover`. +- **Datapipes** (`physicsnemo/datapipes/`), **losses or metrics** + (`physicsnemo/metrics/`), **functional ops / backends** + (`physicsnemo/nn/functional/`, `FunctionSpec`), **training recipes / examples** + (`examples/`). 
+ +This skill is only for **authoring or porting** a model or reusable layer into +the `physicsnemo` package. + +## Core principle + +1. **The written standards are ground truth — read them, don't paraphrase from + memory.** The authoritative rules live in `CODING_STANDARDS/` at the repo + root: `MODELS_IMPLEMENTATION.md` (rules `MOD-***`) and + `EXTERNAL_IMPORTS.md` (rules `EXT-***`). Open the cited rule before relying + on it; reference it by ID when you justify a decision. They evolve — a rule + recalled from memory may be stale. +2. **Reuse before you build — discover, don't reinvent.** Half of a clean + integration is *not* writing code that already exists. Before scaffolding a + `forward`, enumerate what `physicsnemo.nn` already provides and tell the + contributor what to import (`references/reuse_map.md`). +3. **Verify every path before you cite it.** Glob/Read the live repo; a path + recalled from memory or pattern-matched from a neighbor is disproof — drop + it. + +## Repo root resolution + +Resolve the PhysicsNeMo repo root **first** (see `CONTRIBUTING.md §Repo root +resolution`); all `CODING_STANDARDS/…` and `physicsnemo/…` paths are rooted +there, and scaffolded files are written under it. **If no local clone is on the +path** (e.g. headless against the skills repo in an eval), shallow-clone the +canonical repo once and anchor to it — read its existing tree read-only for +standards / reuse / path verification, write new files under it: +`DEST="${TMPDIR:-/tmp}/physicsnemo-src"; [ -d "$DEST/physicsnemo" ] || git clone --depth 1 https://github.com/NVIDIA/physicsnemo "$DEST"`. +Use that URL verbatim; never interpolate one from user input. + +## Scope + +In scope: **complete models** (`physicsnemo/experimental/models/`), **reusable +layers** (`physicsnemo/nn/module/`), and **wrapping an existing PyTorch +`nn.Module`** via `Module.from_torch`. 
+ +Out of scope — stop and redirect: datapipes (`physicsnemo/datapipes/`), +functional ops / custom backends (`physicsnemo/nn/functional/`, `FunctionSpec`), +losses & metrics (`physicsnemo/metrics/`), training recipes (`examples/`), and +"which model should I use" (→ `physicsnemo-discover`). + +## Workflow + +Run in order, **after resolving the repo root** (§Repo root resolution). +Confirm the consequential choices with the contributor (artifact type, +placement, external-wrap vs from-scratch); scaffold the rest. + +**Default to action.** When the repository is available, *create the actual +files* — `__init__.py`, `.py`, and the test module — with a placeholder +`forward` that raises `NotImplementedError` (marked `# TODO: contributor's +forward math`), rather than only describing them in prose. Ask only the +questions that genuinely block placement; then produce files and iterate. The +skill's value is the working scaffold on disk, not a description of one. + +### 1. Intake & classify + +Ask only what you can't infer (≤4 questions). Resolve: + +- **Artifact type:** complete *model*, reusable *layer*, or *wrap* an existing + PyTorch module? (Decision tree + rationale: `references/placement.md`.) +- **Identity:** class name (PascalCase), one-line purpose, the forward + inputs/outputs and their tensor shapes, heavy deps. +- **For wrap:** the import path of their `nn.Module`, and whether its + `__init__` args are JSON-serializable — this picks the serialization path + (`references/serialization.md`). + +### 2. Place it (and say why) + +- New **model** → `physicsnemo/experimental/models//` (`MOD-002a`: new + models start in `experimental`). Layout: `__init__.py` (re-exports) + + `.py`. New **layer** → `physicsnemo/nn/module/.py`, re-exported + from both `physicsnemo/nn/module/__init__.py` and `physicsnemo/nn/__init__.py` + (`MOD-000a`). Tests mirror the source path under `test/`. 
+- State the rule ID and the reason (experimental = API may change; layers are + shared building blocks). + +### 3. Reuse audit + +Before writing `forward`, enumerate existing primitives the contributor would +otherwise reinvent — attention bases, embeddings, `Mlp`, neighbor ops +(`knn`, `radius_search`), the TE-aware `LayerNorm`. Use the live search +patterns in `references/reuse_map.md`; verify each path before citing it. Say +explicitly "import `X` from `physicsnemo.nn` instead of writing your own." Keep +genuinely novel, model-specific pieces local to the model. + +### 4. Scaffold the shell + +Generate from the skeletons in `references/scaffolds.md`, adapting to the +contributor's shapes; explain what each enforced piece is for. + +- **New model / layer:** subclass **`physicsnemo.Module`** (not + `torch.nn.Module` — `MOD-001`); a `ModelMetaData`; a **constructor taking + JSON-serializable config** (no splatted `**kwargs` — `MOD-010`; no + string-based class selection — `MOD-009`); a `forward` with jaxtyping on + every tensor arg (`MOD-006`), `if not torch.compiler.is_compiling():` shape + validation (`MOD-005`), and NumPy `r"""` docstrings with + `Parameters`/`Forward`/`Outputs` sections and `:math:` shapes (`MOD-003`). + Imports upward-only (`EXT-***`). +- **Wrap external:** `Module.from_torch(TheirModule, meta=...)`. **The + serialization gotcha lives here** — `physicsnemo.Module` save/from_checkpoint + requires `__init__` args to be JSON-serializable; nested `nn.Module` args + must each be converted via `Module.from_torch`. Walk them through + `references/serialization.md`, then prove it with the round-trip test below. + +### 5. Tests + +Generate the test module from `references/scaffolds.md`: class-per-public-class, +the `device` fixture, parametrized constructor/attribute checks (≥2 configs — +`MOD-008a`), `validate_forward_accuracy` for non-regression (`MOD-008b`), and +`validate_checkpoint` for the save/load round-trip (`MOD-008c`). 
These helpers +are **mandatory and come from `test.common`** — write the import explicitly in +the generated test and name them in your summary; don't hand-roll what they +provide: + +```python +from test.common import validate_forward_accuracy, validate_checkpoint +``` + +### 6. Gates + +From the repo root, run and iterate to green (explain each): + +``` +make lint # ruff format --check + ruff check +make interrogate # docstring coverage +make pytest # or: pytest test/ -q +``` + +`physicsnemo/experimental/` is exempt from ruff/interrogate, but **not** from +runtime contracts — the serialization round-trip test must still pass there. + +### 7. Finish & review + +- Add a one-line `CHANGELOG.md` entry and SPDX Apache-2.0 headers to new files; + remind the contributor commits need `-s` (sign-off). +- Do an independent **code-review pass over the diff** before opening the PR — + re-check it against the standards (`MOD-***`/`EXT-***`), correctness, and the + reuse audit, ideally with fresh eyes (a separate review session/agent). If the + host agent offers a built-in code-review command (for example Claude Code's + `/code-review`), use it; otherwise review the diff directly. Then open the PR + — CODEOWNERS review + CI re-run the gates. + +### 8. Definition of done + +Confirm each before declaring success; fix any miss before finishing: + +- [ ] Repo root resolved; every cited path verified to exist (no memory/guesses). +- [ ] Placed right: model → `experimental/models/` (`MOD-002a`); layer → + `nn/module/` + both `__init__` re-exports (`MOD-000a`). +- [ ] Subclasses `physicsnemo.Module` with a `ModelMetaData` (`MOD-001`); + `__init__` is JSON-serializable — no splatted `**kwargs` (`MOD-010`), no + string class selection (`MOD-009`). +- [ ] `forward` has jaxtyping on every tensor arg (`MOD-006`) + + `is_compiling()`-guarded shape validation (`MOD-005`); NumPy `r"""` docstrings + with `:math:` shapes (`MOD-003`). 
+- [ ] Reuse audit done — nothing reimplemented that `physicsnemo.nn` provides. +- [ ] Tests use `test.common`: `validate_forward_accuracy` (`MOD-008b`) + + `validate_checkpoint` (`MOD-008c`), ≥2 constructor configs (`MOD-008a`). +- [ ] Gates green (`make lint`, `make interrogate`, `make pytest`); + `CHANGELOG.md` entry + SPDX headers added. + +## Common gotchas + +Surface the relevant traps inline as you scaffold (full catalogue: +`references/lessons.md`): + +- **`Module` serialization** is a common external-integration failure: raw + `nn.Module` submodule args break `from_checkpoint` (`references/serialization.md`). +- The **TE-aware `LayerNorm`** runs only on CUDA when Transformer Engine is + present; tests must skip the CPU case under TE. +- **`experimental/` skips lint, not runtime contracts.** +- Promote a model-specific layer to `physicsnemo.nn` only when a **second** + consumer appears — keep it local until then. + +## Related resources + +- `references/placement.md` — artifact decision tree and where each kind goes. +- `references/reuse_map.md` — live search patterns for existing primitives. +- `references/serialization.md` — `physicsnemo.Module`, JSON args, `from_torch`. +- `references/scaffolds.md` — model / layer / external-wrap / test skeletons. +- `references/lessons.md` — gotchas distilled from real integrations. +- `CODING_STANDARDS/MODELS_IMPLEMENTATION.md`, `EXTERNAL_IMPORTS.md` — the + authoritative rules; read the cited rule before relying on it. diff --git a/skills/physicsnemo-model-builder/evals/evals.json b/skills/physicsnemo-model-builder/evals/evals.json new file mode 100644 index 0000000000..dd038669a2 --- /dev/null +++ b/skills/physicsnemo-model-builder/evals/evals.json @@ -0,0 +1,109 @@ +[ + { + "id": "add-new-model-from-scratch", + "question": "I have a new graph-transformer surrogate architecture for mesh data. 
How do I add it as a model in PhysicsNeMo so it follows the repository conventions?", + "expected_skill": "physicsnemo-model-builder", + "expected_script": null, + "ground_truth": "A new model is scaffolded under physicsnemo/experimental/models// (MOD-002a: new models start in experimental), as a subclass of physicsnemo.Module (not torch.nn.Module; MOD-001) carrying a ModelMetaData. Its constructor takes JSON-serializable config and builds submodules internally (no splatted **kwargs / no string-based class selection; MOD-010/MOD-009), reusing existing physicsnemo.nn primitives where possible rather than reimplementing them. The forward has jaxtyping annotations on tensor args (MOD-006), is_compiling()-guarded shape validation (MOD-005), and NumPy r-docstrings with Parameters/Forward/Outputs and :math: shapes (MOD-003). Tests mirror the source path and use test.common helpers (validate_forward_accuracy, validate_checkpoint; MOD-008). The contributor supplies the novel forward math; the skill scaffolds everything around it and runs the gates.", + "expected_behavior": [ + "Loads the physicsnemo-model-builder skill.", + "Recommends placing the new model under physicsnemo/experimental/models/ and explains why (MOD-002a).", + "Scaffolds a physicsnemo.Module subclass (not torch.nn.Module) with a ModelMetaData and a JSON-serializable constructor.", + "Performs a reuse audit of physicsnemo.nn before reimplementing primitives.", + "Proposes tests using test.common (validate_forward_accuracy / validate_checkpoint), not hand-rolled checkpoint comparisons.", + "Does not invent the model's architecture; defers the forward math to the contributor.", + "Every absolute path cited in the final message exists on disk." + ] + }, + { + "id": "wrap-external-pytorch-model", + "question": "I already have a trained PyTorch nn.Module. 
How do I turn it into a PhysicsNeMo model so it can be saved and loaded with from_checkpoint?", + "expected_skill": "physicsnemo-model-builder", + "expected_script": null, + "ground_truth": "An external nn.Module is integrated via Module.from_torch(TheirNet, meta=ModelMetaData()), which yields a physicsnemo.Module supporting save/from_checkpoint/registry. The hard requirement is that the wrapped class's __init__ arguments are JSON-serializable; any nested nn.Module arguments must each be converted with Module.from_torch first (a raw nn.Module argument makes save() raise TypeError). The integration must be proven with a save/load round-trip using validate_checkpoint from test.common. The skill explains this serialization contract explicitly because it is a common external-integration failure.", + "expected_behavior": [ + "Loads the physicsnemo-model-builder skill.", + "Recommends Module.from_torch as the external-integration path.", + "Explains the serialization contract: __init__ args must be JSON-serializable, and nested nn.Module args must be converted via Module.from_torch.", + "Recommends verifying with a validate_checkpoint round-trip.", + "Every absolute path cited in the final message exists on disk." + ] + }, + { + "id": "discovery-defers-to-discover-skill", + "question": "Which existing PhysicsNeMo model family should I use for forecasting on a lat-lon grid on the sphere?", + "expected_skill": "physicsnemo-discover", + "expected_script": null, + "ground_truth": "This is a discovery / routing question about which EXISTING model to use, not a request to add or integrate a new model. The physicsnemo-model-builder skill should NOT activate; physicsnemo-discover is the correct skill. 
The model-builder skill is scoped to authoring/porting new models and layers, not selecting among existing ones.", + "expected_behavior": [ + "Does NOT load the physicsnemo-model-builder skill.", + "Treats the request as model selection/discovery (physicsnemo-discover territory), not authoring.", + "Does not scaffold a new model or layer." + ] + }, + { + "id": "out-of-scope-datapipe", + "question": "How do I add a new datapipe for my custom HDF5 dataset in PhysicsNeMo?", + "expected_skill": null, + "expected_script": null, + "ground_truth": "Datapipes are out of scope for the physicsnemo-model-builder skill, which covers complete models, reusable layers, and wrapping external PyTorch models. A datapipe belongs under physicsnemo/datapipes/. The skill should not activate and should redirect rather than scaffold a model/layer.", + "expected_behavior": [ + "Does NOT load the physicsnemo-model-builder skill.", + "Recognizes datapipes as out of scope and redirects toward physicsnemo/datapipes/.", + "Does not scaffold a physicsnemo.Module model or layer." + ] + }, + { + "id": "add-reusable-layer", + "question": "I built a custom neighborhood-attention block (an nn.Module with learnable weights) that several of my models will reuse. How do I add it to PhysicsNeMo so other models can import it?", + "expected_skill": "physicsnemo-model-builder", + "expected_script": null, + "ground_truth": "A reusable building block is a LAYER, not a complete model. It is placed in physicsnemo/nn/module/.py and re-exported from BOTH physicsnemo/nn/module/__init__.py and physicsnemo/nn/__init__.py so users can do `from physicsnemo.nn import ` (MOD-000a); like other new code it may start under experimental (MOD-002a). It subclasses physicsnemo.Module (MOD-001), with a JSON-serializable constructor (no splatted **kwargs / no string class selection; MOD-010/MOD-009), jaxtyping on forward tensor args (MOD-006), is_compiling()-guarded shape validation (MOD-005), and NumPy r-docstrings (MOD-003). 
Before writing it, a reuse audit checks physicsnemo.nn for an existing equivalent (attention bases, Mlp, the TE-aware LayerNorm). Tests mirror test/nn/module/ and use test.common helpers. A single-consumer block stays local to the model dir until a second model needs it.", + "expected_behavior": [ + "Loads the physicsnemo-model-builder skill.", + "Classifies the request as a reusable layer (nn.Module building block), not a complete model.", + "Places it under physicsnemo/nn/module/ and wires re-exports from both nn/module/__init__.py and nn/__init__.py (MOD-000a).", + "Performs a reuse audit of physicsnemo.nn before reimplementing the block.", + "Scaffolds a physicsnemo.Module subclass with a JSON-serializable constructor and proposes tests under test/nn/module/ via test.common.", + "Every absolute path cited in the final message exists on disk." + ] + }, + { + "id": "wrap-external-nested-module", + "question": "I want to wrap my trained PyTorch model in PhysicsNeMo, but its __init__ takes another nn.Module (an encoder) as an argument. Will from_checkpoint work?", + "expected_skill": "physicsnemo-model-builder", + "expected_script": null, + "ground_truth": "Module.from_torch wraps the outer nn.Module, but a raw nn.Module passed as an __init__ argument is NOT JSON-serializable, so save()/from_checkpoint raises TypeError (the __init__ args are captured as JSON). The fix is to convert the nested nn.Module argument itself via Module.from_torch first so the captured args are serializable, or refactor the constructor to take JSON-serializable config that builds the submodule internally. The fix must be proven with a validate_checkpoint save/load round-trip. 
This nested-Module case is the canonical serialization failure the skill exists to catch.", + "expected_behavior": [ + "Loads the physicsnemo-model-builder skill.", + "Identifies the nested nn.Module __init__ argument as the serialization break (save/from_checkpoint raises TypeError).", + "Recommends converting the nested module via Module.from_torch (or a JSON-serializable config constructor) so the captured args serialize.", + "Recommends proving it with a validate_checkpoint round-trip.", + "Every absolute path cited in the final message exists on disk." + ] + }, + { + "id": "out-of-scope-loss", + "question": "How do I add a new relative-L2 loss function to PhysicsNeMo?", + "expected_skill": null, + "expected_script": null, + "ground_truth": "Losses and metrics are out of scope for the model-builder skill, which covers complete models, reusable nn.Module layers, and wrapping external PyTorch models. A loss/metric belongs under physicsnemo/metrics/ (functional metrics in metrics/general). The skill should not activate and should redirect rather than scaffold a model/layer.", + "expected_behavior": [ + "Does NOT load the physicsnemo-model-builder skill.", + "Recognizes a loss/metric as out of scope and redirects toward physicsnemo/metrics/.", + "Does not scaffold a physicsnemo.Module model or layer." + ] + }, + { + "id": "out-of-scope-functional-op", + "question": "I want to add a fast k-nearest-neighbors operator with a Warp/cuML backend to PhysicsNeMo. How do I implement it?", + "expected_skill": null, + "expected_script": null, + "ground_truth": "Functional ops and their accelerated backends (Warp/cuML/SciPy via FunctionSpec) live in physicsnemo/nn/functional/, which is out of scope for the model-builder skill (that covers models, reusable nn.Module layers, and external-wrap). 
The skill should not activate and should redirect to the functional-ops surface rather than scaffold a model/layer.", + "expected_behavior": [ + "Does NOT load the physicsnemo-model-builder skill.", + "Recognizes a functional op / accelerated backend as out of scope (physicsnemo/nn/functional/, FunctionSpec).", + "Does not scaffold a physicsnemo.Module model or layer." + ] + } +] diff --git a/skills/physicsnemo-model-builder/references/lessons.md b/skills/physicsnemo-model-builder/references/lessons.md new file mode 100644 index 0000000000..28aff5dce7 --- /dev/null +++ b/skills/physicsnemo-model-builder/references/lessons.md @@ -0,0 +1,58 @@ +# Common gotchas + +General traps that bite contributors adding models or layers to PhysicsNeMo. +Surface the relevant one *inline* as you scaffold — they're cheap to avoid up +front and expensive to discover in review or after a checkpoint won't load. +(These are guidance; the authoritative rules are in `CODING_STANDARDS/`.) + +## 1. `Module` serialization breaks on raw `nn.Module` constructor args + +A common external-integration failure: passing a **raw `torch.nn.Module`** as a +constructor argument makes `save()`/`from_checkpoint()` raise a `TypeError` — +even though `forward` works fine, so it's easy to miss until a user tries to +persist the model. Use constructor-from-config, or convert injected submodules +with `Module.from_torch`, and prove it with `validate_checkpoint`. Detail: +`references/serialization.md`. + +## 2. `experimental/` skips lint, not runtime contracts + +`physicsnemo/experimental/` is excluded from ruff and interrogate (incomplete +docstrings / missing tests are tolerated there), but it is **not** exempt from +behavior: imports must work, `forward` must run, and a `physicsnemo.Module` +must still serialize. "It's experimental" does not excuse a broken +`from_checkpoint`. + +## 3. 
Use `test.common`, don't hand-roll test machinery + +`test.common` provides `validate_forward_accuracy` (auto-managed reference +`.pth` for non-regression — `MOD-008b`) and `validate_checkpoint` (save/load +round-trip — `MOD-008c`). Hand-rolling committed golden files + ad-hoc +comparisons is non-idiomatic and brittle; the helpers are the house pattern +(see `test/nn/module/test_mlp_layers.py`). + +## 4. The TE-aware `LayerNorm` is CUDA-only when Transformer Engine is present + +If you reuse `physicsnemo/nn/module/layer_norm.py::LayerNorm` (recommended over +`torch.nn.LayerNorm` for the faster backward), note it resolves **once at +import** to Transformer Engine's LayerNorm when TE + CUDA are available, and +**TE's LayerNorm cannot run on CPU tensors**. Consequences: +- The model runs on a CPU-only box (torch fallback) and on GPU (TE), but on a + TE-enabled GPU box it cannot run on CPU. +- The `device` fixture in `test/` parametrizes **both** cpu and cuda when a GPU + is present, so tests must **skip the cpu case when TE is the active backend** + (the repo's own `test/nn/module/test_layer_norm.py` does this; + `PHYSICSNEMO_FORCE_TE=0` forces the torch path for CPU work). + +## 5. jaxtyping single-token shape strings trip `F821` + +`Float[Tensor, "n dim"]` (multi-token) is fine, but a single-token annotation +like `Int[Tensor, "n"]` makes ruff flag `F821` (undefined name). Add +`# noqa: F821` on that line — the established convention. Annotate **all** +tensor args, including optional ones, per `MOD-006`. + +## 6. Promote a layer to `physicsnemo.nn` only on the second consumer + +A layer used by exactly one model is not yet a reusable primitive — keep it +local to the model directory. Promote it to `physicsnemo/nn/module/` when a +**second** model actually needs it. Generalizing a single-consumer layer +prematurely freezes an API for one user and is its own anti-pattern. 
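Gotcha 1 above can be caught before any checkpoint is written by checking constructor arguments for JSON-serializability. The sketch below is an illustration using only the standard library: `non_serializable_args` and `FakeEncoder` are hypothetical names, `FakeEncoder` merely stands in for a raw `torch.nn.Module`, and this is an analogy for the failure mode, not how `physicsnemo.Module` performs its own capture.

```python
import json
from typing import Any


def non_serializable_args(init_kwargs: dict[str, Any]) -> list[str]:
    """Return the names of constructor arguments that json.dumps rejects."""
    bad = []
    for name, value in init_kwargs.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            # json.dumps raises TypeError for arbitrary objects (the case the
            # gotcha describes) and ValueError for circular references.
            bad.append(name)
    return bad


class FakeEncoder:
    """Hypothetical stand-in for a raw nn.Module passed into a constructor."""


if __name__ == "__main__":
    # Constructor-from-config style: plain values only.
    config_style = {"hidden_dim": 128, "num_layers": 4, "activation": "gelu"}
    # Injected-submodule style: an object is passed in directly.
    injected_style = {"hidden_dim": 128, "encoder": FakeEncoder()}
    print(non_serializable_args(config_style))
    print(non_serializable_args(injected_style))
```

A pre-check like this only narrows down which argument is at fault; the `validate_checkpoint` round-trip remains the actual proof that the model persists.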
diff --git a/skills/physicsnemo-model-builder/references/placement.md b/skills/physicsnemo-model-builder/references/placement.md new file mode 100644 index 0000000000..fef119cec5 --- /dev/null +++ b/skills/physicsnemo-model-builder/references/placement.md @@ -0,0 +1,82 @@ +# Placement — what am I adding, and where does it go? + +Resolve two questions before writing anything: **what kind of artifact** is it, +and **where in the tree** does it live. Both are governed by +`CODING_STANDARDS/MODELS_IMPLEMENTATION.md` — read the cited rule. + +## Decision tree + +``` +Is it a complete, trainable architecture (composes layers; users instantiate it)? +├─ YES → MODEL +│ new code → physicsnemo/experimental/models// (MOD-002a) +│ (graduates to physicsnemo/models// later, after API review) +│ +└─ NO → Is it a reusable building block (a layer/block other models would compose)? + ├─ YES → LAYER → physicsnemo/nn/module/.py (MOD-000a) + │ re-export from nn/module/__init__.py AND nn/__init__.py + │ + └─ NO → is it just one example's helper? → examples//... (out of scope here) + +Separately: do you ALREADY have a trained PyTorch nn.Module to bring in? +└─ YES → EXTERNAL WRAP → Module.from_torch(...) (see references/serialization.md) + placed as a MODEL or LAYER per the same tree above. +``` + +## Rules that decide placement + +- **`MOD-000a` — reusable layers live in `physicsnemo/nn/module/`** and are + re-exported from `physicsnemo/nn/__init__.py` so users do + `from physicsnemo.nn import MyLayer`. A layer placed under + `physicsnemo/models/` is the anti-pattern. +- **`MOD-000b` — complete models live in `physicsnemo/models/`** (re-exported + from `physicsnemo/models/__init__.py`). +- **`MOD-002a` — new models AND new layers start in + `physicsnemo/experimental/`** (`experimental/models/`, `experimental/nn/…`). 
+ `experimental/` means "API may change"; it is **exempt from ruff/interrogate + lint**, but **not** from runtime contracts (it must still import, run, and — + if it's a `Module` — serialize). Graduation to the stable tree requires + stability + API review. + +So in practice, a brand-new contribution almost always starts under +`experimental/`. Tell the contributor this explicitly and why (it lets the API +settle without a major-version commitment). + +## File layout for a model + +``` +physicsnemo/experimental/models// + __init__.py # re-exports the public class(es) only + .py # the architecture (Module subclass + ModelMetaData) + _utils.py # optional: model-specific helpers (keep them local) +``` + +Tests mirror the source path: +``` +test/experimental/models//test_.py +# or, for a layer: +test/nn/module/test_.py +``` + +## Model vs. layer — the litmus test + +- A **layer** has a generic tensor-in / tensor-out signature, no training + recipe, and would plausibly be reused by ≥1 other model. It does **not** own + a `ModelMetaData` describing AMP/ONNX/etc. capabilities. +- A **model** is the thing a user trains and checkpoints; it owns a + `ModelMetaData` and is the unit `Module.from_checkpoint` reconstructs. + +If unsure, start it as a layer local to your model dir; promote it to +`physicsnemo/nn/module/` only when a **second** consumer actually appears +(don't pre-generalize — see `references/lessons.md`). + +## Don't put these here (redirect) + +- A **loss or metric** → `physicsnemo/metrics/` (or + `physicsnemo/experimental/metrics/`). +- A **functional op / custom CUDA-Warp/cuML backend** → + `physicsnemo/nn/functional/` via a `FunctionSpec`. +- A **datapipe** → `physicsnemo/datapipes/`. +- A **training recipe** → `examples/`. + +These are out of scope for this skill; hand the contributor off accordingly. 
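The decision tree above can be read as a small routing function. The following is a minimal sketch, not part of PhysicsNeMo: the `Artifact` dataclass, its fields, the `place` function, and the `name` value are all hypothetical, and the returned strings simply echo the destinations and rule IDs named in the tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical description of a contribution (not a PhysicsNeMo type)."""

    name: str                # module name chosen by the contributor
    is_complete_model: bool  # a trainable architecture users instantiate
    is_reusable_block: bool  # a layer/block other models would compose


def place(artifact: Artifact) -> str:
    """Return the destination named by the decision tree, with its rule ID."""
    if artifact.is_complete_model:
        # MOD-002a: new models start under experimental/.
        return f"physicsnemo/experimental/models/{artifact.name}/ (MOD-002a)"
    if artifact.is_reusable_block:
        # MOD-000a: reusable layers live in nn/module/ and are re-exported.
        return f"physicsnemo/nn/module/{artifact.name}.py (MOD-000a)"
    # Neither: an example-local helper, which is out of scope for this skill.
    return "examples/ (out of scope)"


if __name__ == "__main__":
    print(place(Artifact("my_model", True, False)))
    print(place(Artifact("my_block", False, True)))
    print(place(Artifact("plot_helper", False, False)))
```

An external wrap goes through the same questions, since the tree places a wrapped module as a model or a layer rather than giving it a location of its own.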
diff --git a/skills/physicsnemo-model-builder/references/reuse_map.md b/skills/physicsnemo-model-builder/references/reuse_map.md
new file mode 100644
index 0000000000..12a326c36d
--- /dev/null
+++ b/skills/physicsnemo-model-builder/references/reuse_map.md
@@ -0,0 +1,66 @@
+# Reuse map — find it before you build it
+
+The biggest lever on a clean integration is **not** writing primitives that
+already exist. Before scaffolding a `forward`, audit `physicsnemo.nn` for the
+pieces the architecture needs and tell the contributor what to import.
+
+**Discover, don't remember.** Class names and paths rot as the repo evolves, so
+this file gives *search patterns*, not a frozen list. Run the searches against
+the live repo every time, and **verify each path with `ls`/Read before citing
+it** (a path merely pattern-matched from a neighbor is unverified; drop it).
+
+## How to audit (general loop)
+
+1. Name the primitives the architecture needs (e.g., "multi-head attention",
+   "positional embedding", "MLP", "nearest-neighbor gather", "layer norm").
+2. For each, search the two surfaces:
+   - **Modules (layers):** `physicsnemo/nn/module/` — `__init__.py` shows what's
+     exported; the files show what exists.
+   - **Functionals (ops):** `physicsnemo/nn/functional/` — knn, radius search,
+     geometry/SDF, etc.
+3. Prefer importing over reimplementing. Keep only genuinely novel,
+   model-specific pieces local.
+
+```bash
+# what does physicsnemo.nn export?
+sed -n '1,200p' physicsnemo/nn/__init__.py
+# enumerate layer modules and their public classes
+ls physicsnemo/nn/module/
+grep -rnE "^class [A-Z]" physicsnemo/nn/module/*.py
+# enumerate functional ops
+ls physicsnemo/nn/functional/ ; grep -rnE "^def |^class " physicsnemo/nn/functional/*/*.py
+```
+
+## Search patterns by category
+
+| Need | Search | Likely already there |
+|---|---|---|
+| Attention (mesh/point/grid) | `ls physicsnemo/nn/module/*attention*.py` ; `grep -rn "class .*Attention" physicsnemo/nn/module/` | physics-attention base + subclasses; Earth/UNet attention |
+| MLP / fully-connected | `grep -rn "class Mlp\|class .*FCLayer\|FullyConnected" physicsnemo/nn/module/` | `Mlp` (configurable, TE-aware), FC layers |
+| Positional / Fourier embeddings | `grep -rn "class .*Embedding\|fourier" physicsnemo/nn/module/embedding_layers.py physicsnemo/nn/module/fourier_layers.py` | Fourier / sinusoidal / positional embeddings |
+| Normalization | `grep -rn "LayerNorm\|GroupNorm\|RunningNorm" physicsnemo/nn/module/` | **TE-aware `LayerNorm`** (`physicsnemo/nn/module/layer_norm.py`) — use this, not `torch.nn.LayerNorm`, to get Transformer-Engine acceleration |
+| SIREN / activations | `grep -rn "class Siren\|get_activation\|ACT2FN" physicsnemo/nn/module/` | `SirenLayer`, activation registry |
+| Nearest neighbors / radius | `grep -rn "def knn\|radius_search\|class KNN" physicsnemo/nn/functional/neighbors/` | `knn`, `radius_search` (multi-backend: torch/scipy/cuML) |
+| Geometry / SDF / sampling | `ls physicsnemo/nn/functional/geometry/` | SDF ops, point sampling |
+| Pooling | `grep -rn "Pooling" physicsnemo/nn/module/pooling.py` | mean / attention pooling |
+
+## Important specifics
+
+- **Normalization:** prefer `physicsnemo.nn.module.layer_norm.LayerNorm` (the
+  TE-aware one) over `torch.nn.LayerNorm`. It transparently uses Transformer
+  Engine on CUDA (faster backward) and falls back to torch on CPU — but see the
+  CPU/TE caveat in `references/lessons.md`.
+- **Neighbor ops are functionals with backends:** `physicsnemo.nn.functional.knn`
+  auto-dispatches (cuML on CUDA, scipy/torch otherwise). Don't hand-roll a
+  distance-matrix kNN.
+- **Don't reach across modules the wrong way (`EXT-***`):** imports go
+  upward-only, `core → nn → models`. A layer in `nn/` must **not** import from
+  `models/` or `experimental/models/`. Read `EXTERNAL_IMPORTS.md`.
+
+## When NOT to reuse
+
+If a primitive is genuinely novel to this architecture (a bespoke attention
+variant, a custom tokenizer), keep it **local to the model directory**. Promote
+it to `physicsnemo/nn/module/` only when a **second** model wants it — premature
+generalization of a single-consumer layer is its own anti-pattern
+(`references/lessons.md`).
diff --git a/skills/physicsnemo-model-builder/references/scaffolds.md b/skills/physicsnemo-model-builder/references/scaffolds.md
new file mode 100644
index 0000000000..c1d9745880
--- /dev/null
+++ b/skills/physicsnemo-model-builder/references/scaffolds.md
@@ -0,0 +1,233 @@
+# Scaffolds
+
+Starting skeletons to adapt to the contributor's architecture and shapes. These
+encode the standards (`MOD-***`) so the contributor can't miss them. **Verify
+the exact import paths against the live repo** before emitting (e.g.
+`grep -rn "class ModelMetaData" physicsnemo/core/`), and read a sibling under
+`physicsnemo/models/` for current conventions — these skeletons are a starting
+point, not a frozen API.
+
+Every new file starts with the SPDX Apache-2.0 header (copy it from any
+existing source file).
+
+---
+
+## A. New model — `physicsnemo/experimental/models//.py`
+
+```python
+#
+"""."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+import torch
+import torch.nn as nn
+from jaxtyping import Float
+
+from physicsnemo import Module
+from physicsnemo.core import ModelMetaData
+# reuse, don't reinvent (see references/reuse_map.md):
+# from physicsnemo.nn import Mlp
+# from physicsnemo.nn.module.layer_norm import LayerNorm
+
+
+@dataclass
+class MyModelMetaData(ModelMetaData):
+    name: str = "MyModel"
+    # Capability flags — default conservative; flip on only once verified.
+    amp: bool = False
+    cuda_graphs: bool = False
+    onnx_cpu: bool = False
+    onnx_gpu: bool = False
+    auto_grad: bool = False
+
+
+class MyModel(Module):
+    r""".
+
+
+
+    Parameters
+    ----------
+    in_channels : int
+        Number of input channels.
+    hidden_dim : int
+        Width of the hidden representation.
+    out_channels : int
+        Number of output channels.
+    depth : int, optional
+        Number of blocks. Default ``4``.
+
+    Forward
+    -------
+    x : torch.Tensor
+        Input of shape :math:`(B, N, C_{in})`.
+
+    Outputs
+    -------
+    torch.Tensor
+        Output of shape :math:`(B, N, C_{out})`.
+    """
+
+    def __init__(
+        self,
+        *,  # keyword-only: explicit, serialization-friendly
+        in_channels: int,
+        hidden_dim: int,
+        out_channels: int,
+        depth: int = 4,
+    ):
+        super().__init__(meta=MyModelMetaData())
+        # JSON-serializable args only; build submodules HERE, don't accept raw
+        # nn.Modules (see references/serialization.md).
+        self.in_channels = int(in_channels)
+        self.proj_in = nn.Linear(in_channels, hidden_dim)
+        self.blocks = nn.ModuleList(
+            _MyBlock(hidden_dim) for _ in range(int(depth))  # <- the contributor's math
+        )
+        self.proj_out = nn.Linear(hidden_dim, out_channels)
+
+    def forward(
+        self, x: Float[torch.Tensor, "b n c_in"]
+    ) -> Float[torch.Tensor, "b n c_out"]:
+        # MOD-005: validate at the API boundary, skipped under torch.compile.
+        if not torch.compiler.is_compiling():
+            if x.ndim != 3 or x.shape[-1] != self.in_channels:
+                raise ValueError(
+                    f"Expected x of shape (B, N, {self.in_channels}), got "
+                    f"tensor of shape {tuple(x.shape)}"
+                )
+        h = self.proj_in(x)
+        for block in self.blocks:
+            h = block(h)
+        return self.proj_out(h)
+```
+
+`__init__.py` (re-export only the public class):
+```python
+#
+from .my_model import MyModel
+```
+
+---
+
+## B. New reusable layer — `physicsnemo/nn/module/.py`
+
+Same shape as the model but subclass `Module` with no `ModelMetaData` (confirm
+against a sibling layer — some layers do pass a meta), and **wire the exports**:
+
+```python
+# physicsnemo/nn/module/__init__.py (alphabetical insertion)
+from .my_layer import MyLayer
+
+# physicsnemo/nn/__init__.py
+from .module.my_layer import MyLayer
+```
+
+So users get `from physicsnemo.nn import MyLayer` (`MOD-000a`). Imports inside
+the layer must be upward-only — no importing from `physicsnemo/models/`
+(`EXT-***`).
+
+---
+
+## C. Wrap an existing PyTorch model — `Module.from_torch`
+
+```python
+#
+from physicsnemo import Module
+from physicsnemo.core import ModelMetaData
+from their_package import TheirNet  # untouched external architecture
+
+# TheirNet.__init__ args MUST be JSON-serializable. If TheirNet injects nested
+# nn.Modules, convert each with Module.from_torch first.
+PhysicsNeMoNet = Module.from_torch(TheirNet, meta=ModelMetaData())
+```
+
+See `references/serialization.md` for the nested-submodule case and the
+round-trip verification.
+
+---
+
+## D. Tests — `test/.../test_.py`
+
+Mirror the source path. Class-per-public-class, `device` fixture, the
+`test.common` helpers — don't hand-roll.
+
+```python
+#
+import pytest
+import torch
+
+from physicsnemo.experimental.models. import MyModel
+from test.common import validate_checkpoint, validate_forward_accuracy
+
+
+def _model(in_channels=8, hidden_dim=16, out_channels=4, depth=2):
+    return MyModel(
+        in_channels=in_channels, hidden_dim=hidden_dim,
+        out_channels=out_channels, depth=depth,
+    )
+
+
+class TestMyModel:
+    @pytest.mark.parametrize("in_channels, out_channels", [(8, 4), (16, 8)])
+    def test_output_shape(self, device, in_channels, out_channels):
+        model = _model(in_channels=in_channels, out_channels=out_channels)
+        model = model.to(device).eval()
+        x = torch.randn(2, 10, in_channels, device=device)
+        out = model(x)
+        assert out.shape == (2, 10, out_channels)
+        assert torch.isfinite(out).all()
+
+    def test_gradient_flow(self, device):
+        model = _model().to(device).train()
+        x = torch.randn(2, 10, 8, device=device, requires_grad=True)
+        model(x).sum().backward()
+        assert x.grad is not None and torch.isfinite(x.grad).all()
+
+    def test_invalid_input(self, device):
+        model = _model(in_channels=8).to(device).eval()
+        with pytest.raises(ValueError):
+            model(torch.randn(2, 10, 7, device=device))  # wrong in_channels
+
+    @pytest.mark.parametrize(
+        "kwargs, expected",
+        [
+            (dict(in_channels=8, hidden_dim=16, out_channels=4, depth=2),
+             dict(in_channels=8)),
+            (dict(in_channels=16, hidden_dim=32, out_channels=8, depth=3),
+             dict(in_channels=16)),
+        ],
+    )
+    def test_constructor_attributes(self, kwargs, expected):  # MOD-008a
+        model = MyModel(**kwargs)
+        for name, value in expected.items():
+            assert getattr(model, name) == value
+
+    def test_forward_accuracy(self, device):  # MOD-008b
+        torch.manual_seed(0)
+        model = _model().to(device).eval()
+        x = torch.randn(2, 10, 8, device=device)
+        assert validate_forward_accuracy(
+            model, (x,),
+            file_name="experimental/models//data/my_model_output.pth",
+            rtol=1e-3, atol=1e-3,
+        )
+
+    def test_checkpoint(self, device):  # MOD-008c
+        torch.manual_seed(0)
+        x = torch.randn(2, 10, 8, device=device)
+        assert validate_checkpoint(
+            _model().to(device), _model().to(device), (x,)
+        )
+```
+
+Notes:
+- `validate_forward_accuracy` auto-creates the reference `.pth` on first run
+  (and errors); run again to pass, then commit the reference file.
+- If the model uses the TE-aware `LayerNorm`, add the skip-cpu-under-TE guard
+  (see `references/lessons.md` §2).
+- Annotate optional tensor args with jaxtyping too; single-token shapes need
+  `# noqa: F821` (`lessons.md` §5).
diff --git a/skills/physicsnemo-model-builder/references/serialization.md b/skills/physicsnemo-model-builder/references/serialization.md
new file mode 100644
index 0000000000..d625528a48
--- /dev/null
+++ b/skills/physicsnemo-model-builder/references/serialization.md
@@ -0,0 +1,104 @@
+# Serialization — the `physicsnemo.Module` contract (and a common trap)
+
+This is a common thing external contributors get wrong. Read it before
+scaffolding any `__init__`.
+
+## Why `physicsnemo.Module`, not `torch.nn.Module`
+
+`MOD-001` requires model/layer classes to subclass **`physicsnemo.Module`**
+(itself a subclass of `torch.nn.Module`). The payoff is `Module.save(...)` /
+`physicsnemo.Module.from_checkpoint(...)` / `from_pretrained(...)` and the model
+registry — the public way users load models. A plain `torch.nn.Module` gets
+none of it.
+
+## The contract
+
+`physicsnemo.Module` captures the `__init__` arguments at construction and
+serializes them as JSON (`args.json` inside the `.mdlus` archive). Therefore:
+
+> **Every `__init__` argument must be JSON-serializable** — ints, floats,
+> strings, bools, lists/dicts of those, or `None`. The **only** exception is an
+> argument that is itself a `physicsnemo.Module` instance (those are recursed
+> into and serialized).
+
+Read the exact rule and the mechanism in `physicsnemo/core/module.py` (the
+class docstring + the `save` / `_save_process` path) — don't trust this summary
+over the source.
+
+### The trap
+
+Passing a **raw `torch.nn.Module`** as a constructor argument (a common
+"inject my submodule" pattern) makes `save()` raise:
+
+```
+TypeError: Submodule ... is a PyTorch module, which is not supported by
+'Module.save'. Please first convert it ... using 'Module.from_torch'.
+```
+
+The model trains and runs `forward` fine, so this is easy to miss — it only
+fails at `save()` / `from_checkpoint()`, i.e. the moment a user tries to
+persist it.
+
+## The two correct constructor patterns
+
+**(A) Constructor-from-config (preferred for new models).** Take
+JSON-serializable primitives / nested dicts, and build submodules *internally*:
+
+```python
+class MyModel(physicsnemo.Module):
+    def __init__(self, *, in_dim: int, hidden_dim: int, depth: int = 4):
+        super().__init__(meta=MyModelMetaData())
+        self.encoder = Encoder(in_dim, hidden_dim)  # built here, not passed in
+        self.blocks = nn.ModuleList(EncoderBlock(hidden_dim) for _ in range(depth))
+```
+
+This is also why `MOD-009` (no string-based class selection) and `MOD-010`
+(no splatted `**kwargs`) exist — keep the config explicit and serializable.
+
+**(B) Dependency injection — only with `physicsnemo.Module` submodules.** If
+the API genuinely needs to accept submodules, they (and every nested submodule)
+must be `physicsnemo.Module`, converted from torch via `Module.from_torch`:
+
+```python
+from physicsnemo import Module
+PNMEncoder = Module.from_torch(TorchEncoder, meta=ModelMetaData())
+model = MyModel(encoder=PNMEncoder(in_dim=64))  # serializable
+```
+
+## Wrapping an existing external model
+
+For "I already have a `nn.Module`, make it a physicsnemo model":
+
+```python
+from physicsnemo import Module
+from physicsnemo.core import ModelMetaData
+from my_pkg import MyTorchNet  # their architecture, untouched
+
+PNMNet = Module.from_torch(MyTorchNet, meta=ModelMetaData())
+# Now PNMNet(...) is a physicsnemo.Module: save/from_checkpoint/registry work,
+# AS LONG AS MyTorchNet.__init__ args are JSON-serializable (or are themselves
+# converted physicsnemo.Modules). If MyTorchNet injects nested nn.Modules,
+# convert each of them with Module.from_torch first.
+```
+
+## Always verify with a round-trip test
+
+Don't assume — prove it. Use `validate_checkpoint` from `test.common` (save
+model_1, load into model_2, assert forward outputs match):
+
+```python
+from test.common import validate_checkpoint
+assert validate_checkpoint(model_1, model_2, (x, ...))
+```
+
+If this passes, the serialization contract holds. If construction args aren't
+JSON-serializable, it fails here — fix the constructor (pattern A or B), don't
+loosen the test.
+
+## `ModelMetaData`
+
+A model declares capabilities via a `ModelMetaData` dataclass passed to
+`super().__init__(meta=...)` — `amp`, `cuda_graphs`, `onnx_*`, `auto_grad`,
+etc. Default everything conservatively (`False`) and only flip a flag on once
+you've verified that capability. Layers in `nn/module/` typically don't need a
+meta (call `super().__init__()`); confirm against a sibling layer.
diff --git a/skills/physicsnemo-model-builder/skill-card.md b/skills/physicsnemo-model-builder/skill-card.md
new file mode 100644
index 0000000000..b6d9909b37
--- /dev/null
+++ b/skills/physicsnemo-model-builder/skill-card.md
@@ -0,0 +1,82 @@
+## Description:
+Official NVIDIA-authored workflow for adding a new model or reusable layer to PhysicsNeMo, or integrating an existing PyTorch model. Scaffolds a standards-compliant `physicsnemo.Module` (or a `Module.from_torch` wrapper), places it correctly, wires exports, writes tests, and runs the local CI gates.
+
+This skill is ready for commercial/non-commercial use.
+
+## Owner
+NVIDIA
+
+### License/Terms of Use:
+Apache-2.0
+## Use Case:
+Contributors and researchers adding a new model or reusable layer to the PhysicsNeMo package, or porting an existing PyTorch `nn.Module` into PhysicsNeMo so it follows the repository's model-implementation standards (placement, serialization, docstrings, typing, validation, tests) and passes CI.
+
+### Deployment Geography for Use:
+Global
+
+## Known Risks and Mitigations:
+Risk: The skill scaffolds and edits source files; generated code could be incorrect or incomplete, or files could be placed in the wrong location, if the live repository structure differs from the skill's assumptions.
+Mitigation: The skill verifies paths against the live repo before citing them, runs the CI gates (ruff, interrogate, pytest) and an independent code-review pass over the diff before completion, and defers the model's novel architecture to the human. Review the diff and the CI result before merging.
+
+## Reference(s):
+- [placement.md](references/placement.md)
+- [reuse_map.md](references/reuse_map.md)
+- [serialization.md](references/serialization.md)
+- [scaffolds.md](references/scaffolds.md)
+- [lessons.md](references/lessons.md)
+- [PhysicsNeMo GitHub Repository](https://github.com/NVIDIA/physicsnemo)
+
+
+## Skill Output:
+**Output Type(s):** [Code scaffolding, File edits, Analysis]
+**Output Format:** [Python, Markdown]
+**Output Parameters:** [N/A]
+**Other Properties Related to Output:** [Generated files are standards-compliant skeletons completed by the contributor; the skill does not author the model's novel architecture.]
+
+## Evaluation Agents Used:
+- Claude Code (`claude-code`)
+- Codex (`codex`)
+
+
+
+## Evaluation Tasks:
+Evaluated against 4 internal evaluation tasks (2 positive skill-activation, 2 negative) with 2 attempts per task via NVSkills-Eval.
+
+## Evaluation Metrics Used:
+Reported benchmark dimensions:
+- Security: Checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
+- Correctness: Checks whether the agent follows the expected workflow and produces the correct final output.
+- Discoverability: Checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
+- Effectiveness: Checks whether the agent performs measurably better with the skill than without it.
+- Efficiency: Checks whether the agent uses fewer tokens and avoids redundant work.
+
+Underlying evaluation signals used in this run:
+- `security`: Checks for unsafe operations, secret leakage, and unauthorized access.
+- `skill_execution`: Verifies that the agent loaded the expected skill and workflow.
+- `skill_efficiency`: Checks routing quality, decoy avoidance, and redundant tool usage.
+- `accuracy`: Grades final-answer correctness against the reference answer.
+- `goal_accuracy`: Checks whether the overall user task completed successfully.
+- `behavior_check`: Verifies expected behavior steps, including safety expectations.
+- `token_efficiency`: Compares token usage with and without the skill.
+
+
+
+## Evaluation Results:
+_Pending — populated by NVSkills-Eval prior to publication (see `BENCHMARK.md`)._
+
+| Dimension | Num | `claude-code` | `codex` |
+|---|---:|---:|---:|
+| Security | — | — | — |
+| Correctness | — | — | — |
+| Discoverability | — | — | — |
+| Effectiveness | — | — | — |
+| Efficiency | — | — | — |
+
+## Skill Version(s):
+0.1.0 (source: pyproject.toml)
+
+## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal team to ensure this skill meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+
+(For Release on NVIDIA Platforms Only)
+Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).