Open
192 commits
147ed94
Refactor shared skill guidance
chesterxgchen Jun 16, 2026
6013255
Add Lightning routing to agent inspect
chesterxgchen Jun 16, 2026
0301207
Fix Lightning patch detection in agent inspect
chesterxgchen Jun 16, 2026
5d7a11c
Detect from-import lightning module alias in agent inspect
chesterxgchen Jun 16, 2026
5bbd00a
Add nvflare-convert-lightning skill and enable Lightning routing
chesterxgchen Jun 16, 2026
71a92a5
Fix Lightning routing misclassification and logger import path
chesterxgchen Jun 16, 2026
2ecb40c
Remove benchmark notes from skill artifacts
chesterxgchen Jun 16, 2026
8bd3131
Promote Lightning on active use regardless of torch import count
chesterxgchen Jun 16, 2026
45b416f
Gate Lightning promotion on entry-point location, not directory-wide …
chesterxgchen Jun 16, 2026
1b2666e
Fix mixed PyTorch Lightning routing
chesterxgchen Jun 16, 2026
502902a
Make PyTorch/Lightning reorder preserve unrelated framework order
chesterxgchen Jun 16, 2026
ea8b427
Simplify PyTorch/Lightning routing to the trigger-contract rule
chesterxgchen Jun 16, 2026
f881d5b
Scope Lightning-over-PyTorch preference to the PyTorch family
chesterxgchen Jun 16, 2026
345da71
Add Milestone 8 conversion checkpoint
chesterxgchen Jun 16, 2026
38a627b
Revert "Add Milestone 8 conversion checkpoint"
chesterxgchen Jun 16, 2026
7a87e01
Clarify NVFLARE skill source-of-truth guidance
chesterxgchen Jun 27, 2026
9c5b9aa
Clarify Lightning local loss policy guidance
chesterxgchen Jun 27, 2026
946d3de
Deduplicate Lightning loss policy guidance
chesterxgchen Jun 27, 2026
3fa792c
Clarify runtime output guidance ownership
chesterxgchen Jun 28, 2026
f40dbbf
Ignore local tmp artifacts
chesterxgchen Jun 28, 2026
53d1666
Document inspect framework routing order
chesterxgchen Jun 28, 2026
07daf77
Remove unused Lightning receive assignment
chesterxgchen Jun 28, 2026
fc07818
Clarify Lightning wrapper routing boundary
chesterxgchen Jun 28, 2026
d9e5f68
Clarify Lightning validation evidence
chesterxgchen Jun 28, 2026
3a87093
Add inspector skill recommendation coverage
chesterxgchen Jun 28, 2026
6e6191b
Decouple skill lint from design docs
chesterxgchen Jun 28, 2026
f3e0516
Handle Lightning patched trainer conversion state
chesterxgchen Jun 28, 2026
c365f59
Add Lightning patch alias inspect regression
chesterxgchen Jun 28, 2026
3cb29f3
Harden Lightning patch inspection
chesterxgchen Jun 28, 2026
80656ef
Decouple skill lint from design docs
chesterxgchen Jun 28, 2026
47cc01c
Handle Lightning patch submodule imports
chesterxgchen Jun 28, 2026
17db96f
Refine agent Lightning routing
chesterxgchen Jun 28, 2026
d2dddee
Raise on empty Lightning eval batches
chesterxgchen Jun 28, 2026
6ab2b1b
Prefer active Lightning evidence over torch imports
chesterxgchen Jun 28, 2026
dac32b2
Fix mixed Lightning PyTorch inspection routing
chesterxgchen Jun 28, 2026
b4c1063
Guard Lightning promotion for PyTorch entrypoints
chesterxgchen Jun 28, 2026
4c8a7fa
Preserve PyTorch rank ahead of incidental Lightning
chesterxgchen Jun 28, 2026
4736c51
Cover non-PyTorch Lightning routing
chesterxgchen Jun 28, 2026
e403259
Cover Lightning inspector helper paths
chesterxgchen Jun 28, 2026
2adc9bc
Track imported submodules for Lightning routing
chesterxgchen Jun 28, 2026
5316eac
Validate milestone 8 checkpoint evidence
chesterxgchen Jun 28, 2026
7bc4881
Restore agent skill catalog lint sources
chesterxgchen Jun 28, 2026
5ee4848
Use skill categories for trigger overlap lint
chesterxgchen Jun 28, 2026
00c2d94
Re-decouple skill lint engine from design docs
chesterxgchen Jun 28, 2026
0da8a56
Remove stale skill category fixture metadata
chesterxgchen Jun 28, 2026
af670b2
Clarify deferred doc crosslink lint
chesterxgchen Jun 28, 2026
82469ce
Clarify PyTorch Lightning inspection routing
chesterxgchen Jun 28, 2026
bb42e77
Fix PyTorch routing with incidental Lightning imports
chesterxgchen Jun 28, 2026
bde8bc5
Fix Lightning routing with PyTorch entry imports
chesterxgchen Jun 28, 2026
4d2b7a4
Fix modular Lightning inspector routing
chesterxgchen Jun 28, 2026
f79f64a
Fix Lightning inspector promotion weighting
chesterxgchen Jun 28, 2026
7f4d339
Fix Lightning inspector routing with PyTorch entry points
chesterxgchen Jun 28, 2026
af180e2
Fix inspector import context resolution
chesterxgchen Jun 28, 2026
cc7e4a2
Add relative Lightning import inspector regression
chesterxgchen Jun 28, 2026
370cf07
Add milestone 8 lint checkpoint regression test
chesterxgchen Jun 28, 2026
e5d5ede
Add package submodule Lightning import coverage
chesterxgchen Jun 28, 2026
c5781e1
Require category for public agent skills
chesterxgchen Jun 28, 2026
e3ec67d
Require terminal completion before skill validation success
chesterxgchen Jun 28, 2026
626afb7
Document unsupported skill category frontmatter
chesterxgchen Jun 28, 2026
0f4010b
Remove category from skill frontmatter
chesterxgchen Jun 28, 2026
3ad5947
Add inspector coverage for Lightning routing helpers
chesterxgchen Jun 28, 2026
1fd43d6
Clarify agent skill lint contracts
chesterxgchen Jun 28, 2026
4913d3c
Fix Lightning routing evidence scoring
chesterxgchen Jun 28, 2026
36de66d
Fix PyTorch-Lightning routing evidence
chesterxgchen Jun 28, 2026
f815be5
Avoid false Lightning reachability from import prefixes
chesterxgchen Jun 28, 2026
fca43f7
Keep Lightning routing within entry context
chesterxgchen Jun 28, 2026
003cb12
Guard Lightning routing fallback by entry context
chesterxgchen Jun 28, 2026
bb73e4f
Avoid local resolution for dotted external imports
chesterxgchen Jun 28, 2026
66c4792
Trim Lightning detection prose to the inspect override boundary
chesterxgchen Jun 28, 2026
f58a9f2
Align skill category frontmatter validation
chesterxgchen Jun 28, 2026
9759811
Surface skill category in manifest and skills list
chesterxgchen Jun 28, 2026
7adbb74
Tighten milestone 8 checkpoint validation
chesterxgchen Jun 28, 2026
d3cdc00
Clarify skill category lint metadata invariant
chesterxgchen Jun 28, 2026
50b8724
Fix Lightning fallback routing over PyTorch imports
chesterxgchen Jun 28, 2026
99024ba
Preserve PyTorch import evidence in Lightning fallback
chesterxgchen Jun 28, 2026
14ba8ef
Stop context-prefixing package-prefix import candidates
chesterxgchen Jun 28, 2026
26cc27e
Require full import path to resolve locally before following package …
chesterxgchen Jun 28, 2026
157c9f9
Pin Lightning shadowing guard to entry context in tests
chesterxgchen Jun 28, 2026
1532b28
Resolve nested local dotted imports via context prefix
chesterxgchen Jun 28, 2026
5bbbaa5
Derive package-prefix candidates from resolved modules
chesterxgchen Jun 28, 2026
3377613
Guard raw top-level package prefix is not followed for nested imports
chesterxgchen Jun 28, 2026
27584ab
Simplify Lightning promotion routing helper
chesterxgchen Jun 28, 2026
4eff40c
Untrack agent skill evaluation design doc
chesterxgchen Jun 28, 2026
97d3098
Remove milestone 8 checkpoint utility
chesterxgchen Jun 28, 2026
c9f6a97
Untrack agent implementation plan design doc
chesterxgchen Jun 28, 2026
3b579d6
Clarify skill source-of-truth boundaries
chesterxgchen Jun 28, 2026
1451a9d
Tighten source-discovered-strategy override evals
chesterxgchen Jun 28, 2026
cbd58f6
Prevent source-discovered conversion overrides
chesterxgchen Jun 28, 2026
87ea1ae
Centralize source override skill guidance
chesterxgchen Jun 28, 2026
3afbbb4
Guard Lightning eval fixtures against empty batches
chesterxgchen Jun 28, 2026
96a83c7
Align conversion skills with operating model (steps 1-6)
chesterxgchen Jul 2, 2026
07ce2d1
Add JSON output contract tests and packaged conversion templates
chesterxgchen Jul 2, 2026
5f6d999
Fix cross-review findings in conversion skills and boundary lint
chesterxgchen Jul 2, 2026
49b7d1c
Add high-level overview to skill architecture doc
chesterxgchen Jul 2, 2026
40b53c2
Modularize inspector framework detection into per-framework detectors
chesterxgchen Jul 2, 2026
69609d0
Document the three responsibility layers in skill architecture doc
chesterxgchen Jul 2, 2026
1738711
Relocate eval suites out of shipped skills into dev_tools/agent/skill…
chesterxgchen Jul 2, 2026
e607dfd
Fix conversion-skill routing and template review findings
chesterxgchen Jul 2, 2026
b07ba14
Fail closed on stray eval dirs inside shipped skills
chesterxgchen Jul 2, 2026
3dc986c
Harden promotion, aggregator weights, and eval-mode in review fixes
chesterxgchen Jul 2, 2026
1fd9ab6
Fix deferred review findings: cross-family ties, mixed-workspace nami…
chesterxgchen Jul 2, 2026
e63d02f
Don't count in-Lightning torch usage as standalone PyTorch base evidence
chesterxgchen Jul 2, 2026
ef21abd
Improve agent skill routing and validation guidance
chesterxgchen Jul 2, 2026
1b87dbf
Fix reachability collision, entry-context routing, and nested-eval li…
chesterxgchen Jul 2, 2026
38565b1
Harden conversion aggregator step weights
chesterxgchen Jul 2, 2026
5c26c97
Guard agent skill eval exclusions
chesterxgchen Jul 2, 2026
ba4278b
Align eval-loader error messages to the evals.json filename
chesterxgchen Jul 2, 2026
2c3f3ee
Fix Lightning routing for embedded PyTorch evidence
chesterxgchen Jul 2, 2026
86ca934
Ignore local-only agent skill design docs
chesterxgchen Jul 2, 2026
57910ee
Add oversized step count aggregation regression
chesterxgchen Jul 2, 2026
8cba1b0
Prune excluded runtime lint directories
chesterxgchen Jul 2, 2026
5dea87d
Align lint-independence invariant with the eval-root input
chesterxgchen Jul 2, 2026
1fd4d15
Fix Lightning fallback PyTorch scoring
chesterxgchen Jul 2, 2026
5fe8fdd
Clarify skill lint eval inputs
chesterxgchen Jul 2, 2026
8b75820
Fix three residual framework-routing edge cases
chesterxgchen Jul 2, 2026
c9290c3
Prefer non-utility framework fallback in inspector
chesterxgchen Jul 2, 2026
bcf4e34
Address PR #4837 review comments on Lightning skill/evals
chesterxgchen Jul 2, 2026
0ae56fc
Fix Claude2 code-review findings (8) in the agent skill inspector/lints
chesterxgchen Jul 2, 2026
c4c8bc7
Align skill frontmatter with the agentskills.io spec
chesterxgchen Jul 2, 2026
0e934ab
Structure shared skill content as a spec-compliant internal skill
chesterxgchen Jul 2, 2026
8a113f7
Scope agent doctor to conversion-only readiness checks
chesterxgchen Jul 2, 2026
ddde656
Close two Claude2 review residuals: parity test + reachability memoiz…
chesterxgchen Jul 2, 2026
80b41e9
De-duplicate and clarify conversion skill guidance
chesterxgchen Jul 2, 2026
1d45c6a
Fix recipe selection to match real catalog fields and keep HE explicit
chesterxgchen Jul 2, 2026
4906842
Clarify skill metadata frontmatter docs
chesterxgchen Jul 2, 2026
151146d
Fix packaged skill asset references
chesterxgchen Jul 2, 2026
5716b10
Fix agent doctor JSON readiness guidance
chesterxgchen Jul 2, 2026
aacb539
Make setup.py-build packaging tests hermetic (fix intermittent flake)
chesterxgchen Jul 3, 2026
1149b2a
Clarify PyTorch recipe privacy selection
chesterxgchen Jul 3, 2026
601740f
Clarify import-vs-inspect wording and soften cyclic recipe example
chesterxgchen Jul 3, 2026
4925408
Harden setup.py-build flake fix: xdist loadgroup + isolated bdist-dir
chesterxgchen Jul 3, 2026
9c0b488
Harden received-model metric-ownership guidance (eval report)
chesterxgchen Jul 3, 2026
712f6fb
Add settled conversion rules: device placement, pretrained-path, plai…
chesterxgchen Jul 3, 2026
a63e689
Encode Lightning DDP -> external-process executor rule
chesterxgchen Jul 3, 2026
645b57f
Clarify PyTorch distributed launch recipe guidance
chesterxgchen Jul 3, 2026
21f99b2
Update Lightning DDP launch guidance
chesterxgchen Jul 3, 2026
4632602
Remove unfounded shared-token-ID rule from data-derived-arg eval
chesterxgchen Jul 3, 2026
a9b223a
Add generated-code quality rules: setup-outside-loop and data-location
chesterxgchen Jul 3, 2026
2d5c39c
Wire conversion-quality behaviors into pytorch/lightning eval suites
chesterxgchen Jul 3, 2026
ffb08cd
Fix PyTorch eval template to build model once before the round loop
chesterxgchen Jul 3, 2026
8c585c2
Scope conversion-quality assertions to match basic fixtures
chesterxgchen Jul 3, 2026
0086950
Disambiguate no-hardcoded-absolute-data-path for graders
chesterxgchen Jul 3, 2026
3878dc5
Note why a shared vocabulary mapping matters, not just vocab_size
chesterxgchen Jul 3, 2026
5fbeacf
Fix PyTorch conversion template setup placement
chesterxgchen Jul 3, 2026
9391d69
Refine conversion data path evals
chesterxgchen Jul 3, 2026
9235865
Document external-data eval fixtures in SOURCE notes
chesterxgchen Jul 3, 2026
90372e6
Initialize FLARE before PyTorch setup hook
chesterxgchen Jul 3, 2026
18131e8
Address review: privacy scope wording, CLI syntax, DDP + validation c…
chesterxgchen Jul 3, 2026
26de3f7
Consolidate external data fixture source notes
chesterxgchen Jul 3, 2026
eadbed2
Clarify Lightning DDP metadata broadcast guidance
chesterxgchen Jul 3, 2026
af40aba
Share PyTorch-family recipe selection between PyTorch and Lightning s…
chesterxgchen Jul 3, 2026
cf3f3c0
Remove dead code left by earlier review-fix rounds
chesterxgchen Jul 3, 2026
ae50cc8
Fix unresolvable per-skill reference in shared recipe-selection doc
chesterxgchen Jul 3, 2026
42f70f7
Reconcile HE reporting with in-scope recipe-level privacy; list share…
chesterxgchen Jul 3, 2026
cb3df73
Fail closed on HE recipes in the SimEnv conversion path
chesterxgchen Jul 3, 2026
a9ae059
Simplify the Lightning promotion weighted fallback
chesterxgchen Jul 3, 2026
415bb30
Parametrize three near-clone inspector test families
chesterxgchen Jul 3, 2026
2329b43
Consolidate lint engine registries, walkers, and per-skill I/O
chesterxgchen Jul 3, 2026
155fe82
Carry the HE SimEnv exception into canonical-path and validation docs
chesterxgchen Jul 3, 2026
488ba85
Make SkillRecord validation lazy to preserve bounded reads on scoped …
chesterxgchen Jul 3, 2026
1ec2b8f
Mark homomorphic encryption unsupported by the conversion skills
chesterxgchen Jul 3, 2026
26fa7bc
Harden skill trust boundaries and supply-chain/privacy rules (securit…
chesterxgchen Jul 3, 2026
6450d28
Exclude bytecode from packaged skills (security review, packaging)
chesterxgchen Jul 3, 2026
25573ff
Add injection evals for supply-chain, trust-escalation, and poisoned …
chesterxgchen Jul 3, 2026
91e0cca
Strengthen recipe-list drift contract and clarify privacy_compatible
chesterxgchen Jul 3, 2026
820a7aa
Make injection typosquat a true substitution (review nit)
chesterxgchen Jul 3, 2026
d27d6b6
Redact requirement URLs in approval prompts and unify venv guidance
chesterxgchen Jul 3, 2026
6d3263f
Enforce the agent skill lint before push (runtest -s + pre-push hook)
chesterxgchen Jul 3, 2026
17aa625
Align injection eval with the requirement-line redaction contract
chesterxgchen Jul 3, 2026
10f6575
Make unattended dependency install mandatory, not a reportable blocker
chesterxgchen Jul 3, 2026
dd9450a
Conform skills to the company (NVCARPS) guideline, keeping NVFLARE ch…
chesterxgchen Jul 3, 2026
44025a6
Ignore the local agent-skill checks report design doc
chesterxgchen Jul 3, 2026
bcb74c1
Align privacy scope: DP and privacy filters unsupported everywhere HE is
chesterxgchen Jul 3, 2026
0e0847a
Add device selection eval check
chesterxgchen Jul 3, 2026
39236ef
Harden agent skills and benchmark RCA reporting
chesterxgchen Jul 4, 2026
3b0153c
Revert "Harden agent skills and benchmark RCA reporting"
chesterxgchen Jul 4, 2026
c9dc219
Read SKILL.md as a bounded regular file in the frontmatter validator
chesterxgchen Jul 4, 2026
929a69a
Re-land execution-sandbox, runtime-path, supply-chain, and network-lo…
chesterxgchen Jul 4, 2026
9f6d107
Re-land PyTorch eval coverage: device, checkpoint, state-mismatch, pr…
chesterxgchen Jul 4, 2026
7b1610e
Re-land Lightning eval coverage + network-logger gating, excluding th…
chesterxgchen Jul 4, 2026
b159cd9
Re-land diagnose/orient eval coverage from the reverted commit
chesterxgchen Jul 4, 2026
0527b07
Harden agent skill install/package integrity (item 11)
chesterxgchen Jul 4, 2026
59edaf7
Fix umask-002 build failure: bundle root world-writable check only
chesterxgchen Jul 4, 2026
201063d
Add empty-batch guard to external-data-lightning fixture
chesterxgchen Jul 4, 2026
7968f5c
Add empty-batch guard to gpu-device-lightning fixture
chesterxgchen Jul 4, 2026
6d25181
Drop the non-essential 'nvflare agent doctor' command
chesterxgchen Jul 4, 2026
429bb50
Strengthen conversion skill validation ordering
chesterxgchen Jul 4, 2026
c8fb50a
Avoid approval waits in unattended skill runs
chesterxgchen Jul 4, 2026
403fb8d
Rescope skills to the enforceable security boundary; stop skill-owned…
chesterxgchen Jul 5, 2026
4e4d0a8
Tighten skill boundary: progressive disclosure, direct recipe show, n…
chesterxgchen Jul 5, 2026
f91b5f2
Clarify skill layout collision handling
chesterxgchen Jul 5, 2026
f05de35
Generalize source model layout guidance
chesterxgchen Jul 5, 2026
20 changes: 20 additions & 0 deletions .githooks/README.md
@@ -0,0 +1,20 @@
# Git hooks

Repo-managed git hooks. Enable them once per clone:

```bash
git config core.hooksPath .githooks
```

## `pre-push`

Runs the deterministic agent-skill lint
(`python -m dev_tools.agent.skills.checks --skills-root skills`) and blocks the
push if it finds anything, so the agent skills checked into GitHub stay clean.
It covers `skills/` and the eval suites under `dev_tools/agent/skill_evals/`.

The same lint also runs in `./runtest.sh -s` and in the pre-merge CI unit tests
(`tests/unit_test/tool/agent_skill_checks/seed_skills_test.py`), so this hook is
a fast local pre-push gate rather than the only enforcement.

Emergency bypass: `git push --no-verify`.
28 changes: 28 additions & 0 deletions .githooks/pre-push
@@ -0,0 +1,28 @@
#!/usr/bin/env bash
# NVFLARE pre-push hook: block a push when the agent skill lint finds anything,
# so the skills checked into GitHub stay clean.
#
# Enable once per clone:
# git config core.hooksPath .githooks
#
# The lint is fast and dependency-light; it covers skills/ and the eval suites
# under dev_tools/agent/skill_evals/. The same check runs in `./runtest.sh -s`
# and in CI. Bypass in an emergency with `git push --no-verify`.
set -euo pipefail

repo_root="$(git rev-parse --show-toplevel)"

# Nothing to check if this repo has no skills root.
if [ ! -d "$repo_root/skills" ]; then
    exit 0
fi

echo "pre-push: running agent skill lint (python -m dev_tools.agent.skills.checks)..."
if ! python3 -m dev_tools.agent.skills.checks --skills-root "$repo_root/skills"; then
    echo ""
    echo "pre-push: agent skill lint failed. Fix the findings above (or run"
    echo " ./runtest.sh -s) before pushing. Emergency bypass:"
    echo " git push --no-verify"
    exit 1
fi
echo "pre-push: agent skill lint clean."
11 changes: 10 additions & 1 deletion .gitignore
@@ -187,9 +187,18 @@ CLAUDE.local.md
.cursor/
.claude/
.devcontainer/
tmp/review/
tmp/

# memory profiler output
tests/memory_profile/**/*.dat
/HEAD
uv.lock

# Local-only agent skill design docs (human reference; not shipped in the PR).
# Only docs/design/skills_architecture.md is tracked; keep the rest out so a
# stray `git add -A` cannot re-track them.
docs/design/agent_skill_authoring.md
docs/design/agent_skill_checks_report.md
docs/design/agent_skill_evaluation.md
docs/design/agent_skill_operating_model.md
docs/design/export_arg_fidelity.md
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -3,6 +3,8 @@ include nvflare/_version.py
include nvflare/libs/*.so
include nvflare/fuel/utils/*.json
recursive-include skills *
global-exclude *.py[co]
global-exclude __pycache__
# Build-time only: AgentSkillsBuildPy loads this frontmatter validator to build the
# bundled-skills manifest. Needed in the sdist so wheels can build from an sdist; it is
# not installed into the wheel (dev_tools is excluded from packages in setup.py).
500 changes: 500 additions & 0 deletions dev_tools/agent/skill_evals/nvflare-convert-lightning/evals.json

Large diffs are not rendered by default.

@@ -0,0 +1,36 @@
# Fixture Source Notes

The `hello-lightning` fixtures are minimized, unconverted PyTorch Lightning
training code modeled on the NVFLARE repository example:

- Source example: `examples/hello-world/hello-lightning`

The fixture intentionally omits real datasets, data download, FLARE integration,
and full job execution details so trigger and behavior evals stay deterministic.
`train.py` and `model.py` represent plain Lightning code before any FLARE
conversion; the agent under evaluation is expected to add the
`flare.patch(trainer)` Client API integration and a `job.py`.

The `gpu-device-lightning` fixture is synthetic, derived from
`hello-lightning` with an explicit `torch.cuda.is_available()` choice between
Lightning's `gpu` and `cpu` accelerators. It makes device-intent preservation
applicable without requiring a GPU on the evaluation host.
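
The device-intent choice described above can be sketched without torch or a GPU. This is an illustration only, not the fixture's code: `pick_accelerator` and `trainer_kwargs` are hypothetical helper names, and the availability flag is passed in explicitly where real code would call `torch.cuda.is_available()`.

```python
def pick_accelerator(cuda_available):
    """Mirror the source's device intent: GPU when available, otherwise CPU."""
    return "gpu" if cuda_available else "cpu"


def trainer_kwargs(cuda_available):
    # In real code the flag would come from torch.cuda.is_available(); it is
    # injected here so the sketch runs on hosts without torch or a GPU.
    return {
        "max_epochs": 1,
        "accelerator": pick_accelerator(cuda_available),
        "devices": 1,
    }


cpu_kwargs = trainer_kwargs(cuda_available=False)
gpu_kwargs = trainer_kwargs(cuda_available=True)
```

A converted script that hard-codes `accelerator="cpu"` would lose the GPU branch; keeping the conditional is what "device-intent preservation" refers to here.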

The `vocab-lightning` fixture adds a `LitTextCNN` model whose `__init__` has a
required, data-derived argument (`vocab_size`, no default). The conversion must
pin one shared vocabulary size for the server recipe model config and every
client model construction path. Passing a live `LightningModule` instance with
required args can serialize without those args and fail server-side
reconstruction in the model persistor.
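
One way to picture the "pin one shared vocabulary size" requirement is the sketch below. It is an illustration under stated assumptions, not the fixture's or NVFLARE's code: `build_vocab`, `model_config`, and the `"model.LitTextCNN"` class-path string are hypothetical names chosen for the example. The idea is to derive the vocabulary once, deterministically, and hand the same `vocab_size` to both the server-side model description and every client-side constructor call.

```python
from collections import Counter


def build_vocab(corpus, specials=("<pad>", "<unk>")):
    """Map tokens to ids deterministically so every site gets the same mapping."""
    counts = Counter(token for line in corpus for token in line.split())
    # Sort by descending frequency, then alphabetically, so ties never reorder.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(list(specials) + ordered)}


def model_config(vocab_size):
    """Hypothetical server-side model description: a class path plus explicit args."""
    return {"class_path": "model.LitTextCNN", "args": {"vocab_size": vocab_size}}


corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
server_config = model_config(len(vocab))
# Each client would construct its model from the same pinned args.
client_kwargs = dict(server_config["args"])
```

Describing the model as a class path with explicit arguments, rather than passing a live instance, keeps the required `vocab_size` visible to whatever reconstructs the model on the server side.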

The `external-data-lightning` fixtures are synthetic, derived from the
`hello-lightning` fixture but loading train/val CSVs from an external data
directory (`--data-dir`, default `/data/nvflare/lightning-tabular`) instead of
building synthetic in-memory tensors. The path is intentionally external to the
repository and run workspace so configurable data-path behavior is asserted only
when the source provides an external dataset location.
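
The configurable data-path behavior can be sketched with the standard library alone. This is a simplified stand-in for the fixture's loader, assuming the same `feature_0`..`feature_3` and `label` columns; `parse_args` and `load_rows` are hypothetical names, and the sketch returns plain lists where the fixture builds tensors.

```python
import argparse
import csv
import tempfile
from pathlib import Path

DEFAULT_DATA_DIR = "/data/nvflare/lightning-tabular"


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # The data location stays a CLI argument rather than a hard-coded path.
    parser.add_argument("--data-dir", default=DEFAULT_DATA_DIR)
    return parser.parse_args(argv)


def load_rows(csv_path, num_features=4):
    """Read feature_0..feature_N and label columns; fail loudly on an empty file."""
    features, labels = [], []
    with Path(csv_path).open(newline="", encoding="utf-8") as csv_file:
        for row in csv.DictReader(csv_file):
            features.append([float(row[f"feature_{i}"]) for i in range(num_features)])
            labels.append(int(row["label"]))
    if not features:
        raise ValueError(f"no rows loaded from {csv_path}")
    return features, labels


# Demo: write a tiny train.csv into a temporary directory and point --data-dir at it.
with tempfile.TemporaryDirectory() as tmp_dir:
    header = [f"feature_{i}" for i in range(4)] + ["label"]
    with (Path(tmp_dir) / "train.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerow([0.1, 0.2, 0.3, 0.4, 1])
    args = parse_args(["--data-dir", tmp_dir])
    features, labels = load_rows(Path(args.data_dir) / "train.csv")
```

Because the directory arrives through `--data-dir`, each site can point the same script at its own copy of the data without editing the source.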

The `hello-lightning` fixture's `LitNet` includes `validation_step` with
`self.log("val_loss", ...)` and the training entry point builds a validation
dataloader, so evaluation-focused evals can assert Lightning-native evaluation
(`trainer.validate` before `trainer.fit`) without a separate fixture.
@@ -0,0 +1,48 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F


class LitNet(pl.LightningModule):
    def __init__(self, input_size=4, num_classes=2, lr=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.fc1 = nn.Linear(input_size, 8)
        self.fc2 = nn.Linear(8, num_classes)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

    def training_step(self, batch, batch_idx):
        features, labels = batch
        if labels.numel() == 0:
            raise ValueError("empty training batch; check per-site data partitioning")
        loss = F.cross_entropy(self(features), labels)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        features, labels = batch
        if labels.numel() == 0:
            raise ValueError("empty validation batch; check per-site data partitioning")
        loss = F.cross_entropy(self(features), labels)
        self.log("val_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.hparams.lr)
@@ -0,0 +1,70 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import csv
from pathlib import Path

import pytorch_lightning as pl
import torch
from model import LitNet
from torch.utils.data import DataLoader, TensorDataset

DEFAULT_DATA_DIR = "/data/nvflare/lightning-tabular"


def load_csv(data_path):
    features = []
    labels = []
    with Path(data_path).open(newline="", encoding="utf-8") as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            features.append([float(row[f"feature_{index}"]) for index in range(4)])
            labels.append(int(row["label"]))
    if not features:
        raise ValueError(f"no rows loaded from {data_path}")
    return TensorDataset(torch.tensor(features, dtype=torch.float32), torch.tensor(labels, dtype=torch.long))


class TabularDataModule(pl.LightningDataModule):
    def __init__(self, data_dir=DEFAULT_DATA_DIR, batch_size=4):
        super().__init__()
        self.data_dir = Path(data_dir)
        self.batch_size = batch_size

    def setup(self, stage=None):
        self.train_dataset = load_csv(self.data_dir / "train.csv")
        self.val_dataset = load_csv(self.data_dir / "val.csv")

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", default=DEFAULT_DATA_DIR)
    parser.add_argument("--batch-size", type=int, default=4)
    args = parser.parse_args()

    model = LitNet()
    datamodule = TabularDataModule(data_dir=args.data_dir, batch_size=args.batch_size)
    trainer = pl.Trainer(max_epochs=1, accelerator="cpu", devices=1, logger=False)
    trainer.fit(model, datamodule=datamodule)


if __name__ == "__main__":
    main()
@@ -0,0 +1,36 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F


class LitNet(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)

    def forward(self, features):
        return self.layer(features)

    def training_step(self, batch, batch_idx):
        features, labels = batch
        if labels.numel() == 0:
            raise ValueError("empty training batch; check per-site data partitioning")
        return F.cross_entropy(self(features), labels)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)
@@ -0,0 +1,29 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytorch_lightning as pl
import torch
from model import LitNet
from torch.utils.data import DataLoader, TensorDataset


def main():
    accelerator = "gpu" if torch.cuda.is_available() else "cpu"
    dataset = TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,)))
    trainer = pl.Trainer(max_epochs=1, accelerator=accelerator, devices=1, logger=False)
    trainer.fit(LitNet(), DataLoader(dataset, batch_size=4))


if __name__ == "__main__":
    main()
@@ -0,0 +1,48 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F


class LitNet(pl.LightningModule):
    def __init__(self, input_size=4, num_classes=2, lr=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.fc1 = nn.Linear(input_size, 8)
        self.fc2 = nn.Linear(8, num_classes)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

    def training_step(self, batch, batch_idx):
        features, labels = batch
        if labels.numel() == 0:
            raise ValueError("empty training batch; check per-site data partitioning")
        loss = F.cross_entropy(self(features), labels)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        features, labels = batch
        if labels.numel() == 0:
            raise ValueError("empty validation batch; check per-site data partitioning")
        loss = F.cross_entropy(self(features), labels)
        self.log("val_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.hparams.lr)
@@ -0,0 +1,36 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytorch_lightning as pl
import torch
from model import LitNet
from torch.utils.data import DataLoader, TensorDataset


def make_loader():
    features = torch.randn(8, 4)
    labels = torch.randint(0, 2, (8,))
    return DataLoader(TensorDataset(features, labels), batch_size=4)


def main():
    model = LitNet()
    train_loader = make_loader()
    val_loader = make_loader()
    trainer = pl.Trainer(max_epochs=1, accelerator="cpu", devices=1, logger=False)
    trainer.fit(model, train_loader, val_loader)


if __name__ == "__main__":
    main()