3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -36,6 +36,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
coordinates with no learnable parameters.
- Adds radiation transport example (`examples/nuclear_engineering/radiation_transport`)
- Adds agent skills structure, and initial skill for 'discoverability'.
- Adds the `physicsnemo-functional-builder` agent skill: a standalone workflow
for adding a new `physicsnemo.nn.functional` op (or a Warp/cuML/SciPy backend
for an existing op) via `FunctionSpec`, with cross-backend equivalence tests.
- Adds xDeepONet to experimental models
(`physicsnemo.experimental.models.xdeeponet.DeepONet`). A single
dimension-generic (2D/3D) DeepONet that accepts a spatial or MLP branch,
87 changes: 87 additions & 0 deletions skills/physicsnemo-functional-builder/BENCHMARK.md
@@ -0,0 +1,87 @@
# Evaluation Report

Evaluation of the `physicsnemo-functional-builder` skill before publication through NVSkills-Eval.

This report summarizes the NVSkills-Eval three-tier evaluation of the skill. Its goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.

> **Status: pending.** The results, Tier-1/Tier-2 findings, and verdict below are
> populated by an NVSkills-Eval run prior to publication. The evaluation dataset
> (`evals/evals.json`) and target agents are committed; run the harness and
> refresh this file before publishing.

## Evaluation Summary

- Skill: `physicsnemo-functional-builder`
- Evaluation date: _pending_
- NVSkills-Eval profile: `external`
- Environment: `local`
- Dataset: 4 evaluation tasks (`evals/evals.json`)
- Attempts per task: 2
- Pass threshold: 50%
- Overall verdict: _pending_

## Agents Used

- `claude-code`
- `codex`

## Metrics Used

Reported benchmark dimensions:

- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
- Efficiency: checks whether the agent uses fewer tokens than the no-skill baseline and avoids redundant work.

Underlying evaluation signals used in this run:

- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access.
- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow.
- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage.
- `accuracy` (Accuracy): grades final-answer correctness against the reference answer.
- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully.
- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations.
- `token_efficiency` (Token Efficiency): compares token usage with and without the skill.

## Test Tasks

The benchmark dataset contains 4 evaluation tasks:

- Positive tasks: 2 tasks where the skill is expected to activate (add a new functional op with a Warp backend; add an optional cuML/SciPy backend to an existing op).
- Negative tasks: 2 tasks where the functional-builder skill should not activate (a reusable-layer/model request that belongs to `physicsnemo-model-builder`; an out-of-scope request such as a datapipe or a "which op should I use" usage question).
- Unlabeled tasks: 0.

Entries with `expected_skill` set are treated as positive skill-activation cases; entries with `expected_skill: null` are treated as negative activation cases.
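
A minimal sketch of how those labels partition the dataset, assuming each entry in `evals/evals.json` is a JSON object with an `expected_skill` key. The `id` values below are invented for illustration, and treating a missing key as "unlabeled" is an assumption about how unlabeled tasks would be represented.

```python
import json


def partition_tasks(entries):
    """Split eval entries into positive, negative, and unlabeled cases."""
    positive, negative, unlabeled = [], [], []
    for entry in entries:
        if "expected_skill" not in entry:
            unlabeled.append(entry)  # assumption: no key means unlabeled
        elif entry["expected_skill"] is None:
            negative.append(entry)  # skill must NOT activate
        else:
            positive.append(entry)  # the named skill is expected to activate
    return positive, negative, unlabeled


# Illustrative entries only; the real file may carry more fields.
evals = json.loads("""[
  {"id": "new-op-warp", "expected_skill": "physicsnemo-functional-builder"},
  {"id": "add-cuml-backend", "expected_skill": "physicsnemo-functional-builder"},
  {"id": "reusable-layer", "expected_skill": null},
  {"id": "usage-question", "expected_skill": null}
]""")
pos, neg, unl = partition_tasks(evals)
print(len(pos), len(neg), len(unl))  # 2 2 0
```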

## Results

_Pending NVSkills-Eval run._

| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | — | — | — |
| Correctness | — | — | — |
| Discoverability | — | — | — |
| Effectiveness | — | — | — |
| Efficiency | — | — | — |

Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.

## Tier 1: Static Validation Summary

_Pending NVSkills-Eval run._

## Tier 2: Deduplication Summary

_Pending NVSkills-Eval run._ Note: this skill is intentionally distinct from
`physicsnemo-model-builder` (authoring `nn.functional` ops/backends vs. models and
`nn.Module` layers); the negative eval tasks guard that routing boundary.

## Publication Recommendation

_Pending NVSkills-Eval run._ Refresh this file with the harness output (results
table, Tier-1/Tier-2 findings, verdict) before publishing, and keep it with the
skill; re-run when the evaluation dataset, skill behavior, or target agents
materially change.
205 changes: 205 additions & 0 deletions skills/physicsnemo-functional-builder/SKILL.md
@@ -0,0 +1,205 @@
---
name: physicsnemo-functional-builder
description: Official NVIDIA-authored workflow for adding a new functional op (or a new optimized backend for an existing op) to physicsnemo.nn.functional. Scaffolds a FunctionSpec with multi-backend dispatch (a torch reference plus optional Warp, cuML, or SciPy backends), wires re-exports, writes cross-backend equivalence tests, and runs the local CI gates (ruff, interrogate, pytest). Use when a contributor wants to add an op or a Warp/cuML/SciPy backend to physicsnemo.nn.functional. Do NOT use for complete models or reusable nn.Module layers (use physicsnemo-model-builder), datapipes, losses or metrics, training-recipe or example authoring, environment setup, or deciding which existing op to use.
license: Apache-2.0
metadata:
author: NVIDIA <agent-skills@nvidia.com>
tags:
- physicsnemo
- functional
- kernels
- contributing
- scaffolding
---

# PhysicsNeMo Functional Builder

Drive a contributor from "I have an op (a kNN, an SDF query, a sampler, an
interpolation, a geometry kernel)" — or "I have a faster backend for an
existing op" — to a standards-compliant, tested, CI-green addition to
`physicsnemo.nn.functional`. **You do the mechanical work**: the per-op package
layout, the `FunctionSpec` shell, backend registration and dispatch, the
`torch.library.custom_op` wrapping for accelerated backends, re-exports,
cross-backend tests, and the gates. **The contributor brings the math** — the
actual algorithm and any hand-written Warp/cuML kernel. Keep that division
explicit: never invent their algorithm; scaffold everything around it.

The audience is a researcher fluent in PyTorch but new to PhysicsNeMo, so
**explain the "why"** at each step (name the rule, give the reason) rather than
silently emitting files.

This skill is standalone — it does not depend on any other PhysicsNeMo skill.

## Core principle

1. **`physicsnemo/core/function_spec.py` and `CODING_STANDARDS/` are ground
truth — read them, don't paraphrase from memory.** The dispatch/registration
machinery lives in `FunctionSpec` (`physicsnemo/core/function_spec.py`); the
tensor-annotation / docstring / import rules live in `CODING_STANDARDS/`
(`MOD-***`, `EXT-***`). Open the real class and the cited rule before relying
on them, and reference them by name when you justify a decision. The exact
`FunctionSpec` method names (`register`, `dispatch`, `make_function`,
`make_inputs_forward`, `compare_forward`, `warp_launch_context`) may evolve —
confirm against the source.
2. **Study a live exemplar before scaffolding.** The house pattern is consistent
and best learned by reading one op end-to-end:
`physicsnemo/nn/functional/geometry/farthest_point_sampling/` (Warp + torch)
and `physicsnemo/nn/functional/neighbors/knn/` (cuML + SciPy + torch). Mirror
their structure for the new op.
3. **Verify every path before you cite it.** Glob/Read the live repo; treat a
path recalled from memory or pattern-matched from a neighboring package as
unverified, and drop it if it does not exist.
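
One way to make that check mechanical is a small helper that reports which cited paths are missing under the repo root. `missing_paths` is a hypothetical helper, not part of PhysicsNeMo; the demo builds a throwaway directory so it runs anywhere, and `knn_v2` is a deliberately nonexistent path.

```python
import tempfile
from pathlib import Path


def missing_paths(repo_root, candidates):
    """Return the candidate paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in candidates if not (root / p).exists()]


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the live repo: create only the exemplar package directory.
    (Path(tmp) / "physicsnemo/nn/functional/neighbors/knn").mkdir(parents=True)
    result = missing_paths(
        tmp,
        [
            "physicsnemo/nn/functional/neighbors/knn",
            "physicsnemo/nn/functional/neighbors/knn_v2",  # invented; must be flagged
        ],
    )
print(result)  # ['physicsnemo/nn/functional/neighbors/knn_v2']
```

Against a real checkout, call `missing_paths(".", [...])` from the repo root and drop whatever it returns.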

## Scope

In scope: a **new functional op** in `physicsnemo/nn/functional/<category>/`, and
**adding a backend** (Warp, cuML, SciPy) to an existing op. Both center on
`FunctionSpec`.

Out of scope — stop and redirect: complete models / reusable `nn.Module` layers
(→ `physicsnemo-model-builder`), datapipes (`physicsnemo/datapipes/`), losses &
metrics (`physicsnemo/metrics/`), training recipes (`examples/`), and "which op
should I use" (a usage question, not an authoring one).

## Key facts that differ from models/layers

State these early — contributors coming from the model side get them wrong:

- **Functionals live in the STABLE tree**, `physicsnemo/nn/functional/...` —
**not** `experimental/`. (Contrast `MOD-002a`, which sends new *models/layers*
to `experimental/`. There is no `experimental/nn/functional`.) See
`references/placement.md`.
- **There is no `Module`, no parameters, no serialization, no `ModelMetaData`,
no checkpoint round-trip.** A functional is a stateless op. The testing story
is **cross-backend equivalence**, not `validate_checkpoint`.
- **Accelerated backends must be wrapped in `torch.library.custom_op`** (plus a
`register_fake`) so they compose with `torch.compile` and autograd. This applies
even to an inline Warp kernel.

## Workflow

Run in order. Confirm the consequential choices (new-op vs new-backend, category,
which backends); scaffold the rest.

### 1. Intake & classify

Ask only what you can't infer (≤4 questions). Resolve:

- **Task:** a brand-new op, or a new backend for an existing op?
- **Identity:** op name (snake_case) and `FunctionSpec` class (PascalCase); the
signature (inputs/outputs + tensor shapes via jaxtyping); the category
(`geometry`, `neighbors`, `interpolation`, …).
- **Backends:** which ones to provide. **Always a torch reference (`baseline`)**;
then optionally Warp (CUDA kernels), cuML (CUDA, optional dep), SciPy (CPU,
optional dep). (`references/backends.md`.)

### 2. Place it (and say why)

Per-op package under `physicsnemo/nn/functional/<category>/<op_name>/`:
`<op_name>.py` (the `FunctionSpec` subclass + `dispatch`), `_torch_impl.py`,
`_warp_impl.py` + `kernels.py` (if Warp), `_cuml_impl.py` / `_scipy_impl.py` (if
those deps), `utils.py` (shared validation), `__init__.py`. Re-export up the
chain: op `__init__` → category `__init__` → `physicsnemo/nn/functional/__init__.py`.
Say the rule: **stable tree, not experimental** (`references/placement.md`).

### 3. Scaffold the `FunctionSpec` + dispatch

From `references/dispatch.md` and the skeletons in `references/scaffolds.md`:

- Subclass `FunctionSpec`; register each backend with
`@FunctionSpec.register(name=..., required_imports=(...), rank=..., baseline=...)`
(lower `rank` = preferred; **exactly one** `baseline=True`, the torch ref).
- Implement `dispatch()` for backend selection: explicit `implementation=`
override → availability check; else auto-select (fast backend on CUDA, CPU
backend on CPU) with a one-time fallback warning.
- Expose the public op via `OpClass.make_function("op_name")`.
- Add `make_inputs_forward` (benchmark inputs) and `compare_forward`
(tie-aware equivalence) hooks.
- jaxtyping on every tensor arg (`MOD-006`); NumPy `r"""` docstrings
(`Parameters`/`Returns`/`Raises`); shape validation in `utils.py` guarded by
`if not torch.compiler.is_compiling():` (`MOD-005`); upward-only imports
(`EXT-***`).

### 4. Backends (torch always; Warp/cuML/SciPy as chosen)

From `references/backends.md`:

- **torch** reference impl in `_torch_impl.py` — the `baseline`, always present,
device-agnostic, the equivalence oracle.
- **Warp:** pure `@wp.kernel`s in `kernels.py` (no torch import); `_warp_impl.py`
wraps the launch in `@torch.library.custom_op(...)` + `@<op>.register_fake`,
converts with `wp.from_torch(..., return_ctype=True)`, and uses
`FunctionSpec.warp_launch_context(tensor)` for device/stream. Warp kernels are
typically CUDA-only — raise clearly on CPU.
- **cuML / SciPy:** gate the whole impl on
`check_version_spec(pkg, ver, hard_fail=False)`; inside, wrap with
`torch.library.custom_op` + `register_fake`, move data zero-copy via DLPack
(cuML) or numpy (SciPy), and **check `tensor.device.type` inside the impl**.
When the dep is missing, register a stub that raises a clear `ImportError`.
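
The gating pattern reduces to "return the real impl if the dependency imports, otherwise a stub that raises a clear `ImportError`". A minimal sketch, using the standard library's `importlib.util.find_spec` as a stand-in for `check_version_spec` (so it does not check versions); `gated_backend` and `_cpu_only_impl` are hypothetical, and the `device_type` argument stands in for `tensor.device.type`.

```python
import importlib.util


def gated_backend(package, impl):
    """Return impl if `package` is importable, else a stub that raises ImportError."""
    if importlib.util.find_spec(package) is not None:
        return impl

    def _missing(*args, **kwargs):
        raise ImportError(
            f"this backend needs the optional dependency {package!r}; "
            "install it or request another implementation"
        )

    return _missing


def _cpu_only_impl(values, device_type):
    # The device check lives inside the impl, so a direct call with the
    # wrong device still fails clearly even if dispatch was bypassed.
    if device_type != "cpu":
        raise RuntimeError(f"CPU-only backend received a {device_type!r} tensor")
    return sorted(values)


present = gated_backend("json", _cpu_only_impl)  # json ships with Python
absent = gated_backend("not_a_real_package_123", _cpu_only_impl)
print(present([2, 1], "cpu"))  # [1, 2]
```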

### 5. Cross-backend tests

From `references/testing.md` (mirror source path under `test/nn/functional/...`):

- A **known-answer** test on a deterministic input (backend-independent truth).
- **Per-backend** parametrization with `pytest.skip` for device (Warp/cuML are
CUDA-only) and for missing optional deps (`check_version_spec`).
- A **backend-parity** test via `OpClass.compare_forward(...)` — and remember the
classic trap: for neighbor ops compare **distances, not indices** (equal-distance
ties order differently across backends); sort before comparing.
- `torch.library.opcheck(...)` for each `custom_op` backend.
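
A dependency-free sketch of the known-answer and tie-aware parity checks, with lists standing in for tensors. `knn_reference` and `assert_knn_parity` are hypothetical helpers, not PhysicsNeMo APIs; in a real test module the reference would be the torch `baseline`, the comparison would be encoded in `OpClass.compare_forward(...)`, and each check would sit in its own `pytest` test function.

```python
import math


def knn_reference(points, queries, k):
    """Brute-force kNN: per query, the k nearest (distances, indices)."""
    results = []
    for q in queries:
        # Sort by (distance, index), so ties break toward the lower index here.
        nearest = sorted((math.dist(q, p), i) for i, p in enumerate(points))[:k]
        results.append(([d for d, _ in nearest], [i for _, i in nearest]))
    return results


def assert_knn_parity(result_a, result_b, tol=1e-6):
    """Tie-aware parity: compare sorted distances, never raw indices."""
    assert len(result_a) == len(result_b)
    for (dist_a, _), (dist_b, _) in zip(result_a, result_b):
        assert len(dist_a) == len(dist_b)
        for x, y in zip(sorted(dist_a), sorted(dist_b)):
            assert abs(x - y) <= tol, (x, y)


# Known-answer test: points on a line, so the truth is obvious by hand.
points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]
ref = knn_reference(points, [(0.0, 0.0)], k=3)
assert ref == [([0.0, 1.0, 1.0], [0, 1, 2])]

# A second backend that breaks the distance-1 tie the other way round.
other = [([0.0, 1.0, 1.0], [0, 2, 1])]
assert ref[0][1] != other[0][1]  # indices disagree...
assert_knn_parity(ref, other)  # ...but the sorted distances match
```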

### 6. Gates

From the repo root, run and iterate to green (explain each):

```
make lint # ruff format --check + ruff check
make interrogate # docstring coverage
make pytest # or: pytest test/nn/functional/<category>/... -q
```

Unlike `experimental/`, `nn/functional/` is **not** lint/interrogate-exempt —
the new op must pass ruff and docstring coverage.

### 7. Finish & review

- Add a one-line `CHANGELOG.md` entry and SPDX Apache-2.0 headers to new files;
remind the contributor that commits must be signed off (`git commit -s`).
- Do an independent **code-review pass over the diff** before opening the PR —
re-check it against `FunctionSpec`, the standards (`MOD-***`/`EXT-***`),
correctness, and backend parity, ideally with fresh eyes (a separate review
session/agent). If the host agent offers a built-in code-review command (for
example Claude Code's `/code-review`), use it; otherwise review the diff
directly. Then open the PR; CODEOWNERS review it and CI re-runs the gates.

## Common gotchas

Surface the relevant traps inline as you scaffold (full catalogue:
`references/lessons.md`):

- **Stable tree, not `experimental/`** — the opposite of the models/layers rule.
- **Backend ties differ:** compare neighbor outputs by *distances* (sorted), not
indices; use `compare_forward` to encode the tie-aware comparison.
- **Device checks belong inside the impl**, not only in `dispatch` (e.g. the
cuML and Warp impls must each raise a clear error when handed CPU tensors).
- **`custom_op` + `register_fake` are mandatory** for Warp/cuML backends, or
`torch.compile`/`opcheck` break.
- **Optional deps are gated by `check_version_spec(..., hard_fail=False)`** with a
stub `ImportError` fallback — never a bare top-level `import cuml`.
- **Exactly one `baseline=True`** (the torch reference); it's the benchmark and
equivalence oracle.

## Related resources

- `references/placement.md` — where functionals go (stable tree), per-op package
layout, re-exports, and what's *not* a functional.
- `references/dispatch.md` — `FunctionSpec`: registration, `dispatch`,
`make_function`, benchmark/compare hooks.
- `references/backends.md` — Warp (kernels + `custom_op` wrap), cuML & SciPy
(optional-dep gating, DLPack), and the torch reference.
- `references/testing.md` — cross-backend equivalence, device/dep skips, `opcheck`.
- `references/scaffolds.md` — copy-paste skeletons for the package, each backend,
and the test module.
- `references/lessons.md` — gotchas distilled from real functional PRs.
- `physicsnemo/core/function_spec.py`, `CODING_STANDARDS/` — the authoritative
source; read before relying on them.