fix(nrl-k8s): remove SA impersonation from dev pod RBAC check #2655

Open
terrykong wants to merge 2 commits into main from tk/fix-dev-rbac-check

@terrykong
Collaborator

@terrykong terrykong commented Jun 1, 2026

Summary

  • The nrl-k8s dev connect RBAC preflight check used kubectl auth can-i --as=system:serviceaccount:... which requires the impersonate verb on serviceaccounts
  • Most users on EKS with SSO-based roles lack this permission, so for them the check always fails, even when they have full edit access in the namespace
  • Removed the --as flag so the check tests the current user's own permissions instead
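The difference between the two forms of the check can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `build_can_i_cmd` and its parameters are hypothetical, and only the `kubectl auth can-i` invocation and the `--as` flag come from the description above.

```python
from typing import List, Optional


def build_can_i_cmd(
    verb: str,
    resource: str,
    namespace: str,
    impersonate: Optional[str] = None,
) -> List[str]:
    """Build the argv for a `kubectl auth can-i` preflight check.

    Hypothetical helper for illustration; the real nrl-k8s code may differ.
    """
    cmd = ["kubectl", "auth", "can-i", verb, resource, "-n", namespace]
    if impersonate is not None:
        # Old behaviour: ask on behalf of another identity. The caller
        # needs the `impersonate` verb for this to work at all.
        cmd.append(f"--as={impersonate}")
    return cmd


# New behaviour: check the current user's own permissions.
own_check = build_can_i_cmd("get", "pods", "default")

# Old behaviour: impersonate the namespace's default service account.
sa_check = build_can_i_cmd(
    "get", "pods", "default",
    impersonate="system:serviceaccount:default:default",
)
```

Either list could be passed to `subprocess.run`. Only the second depends on the caller holding impersonation rights.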

Test plan

  • kubectl auth can-i get pods -n default returns yes (previously failed with impersonation)
  • nrl-k8s dev connect passes the RBAC check and creates the dev pod successfully

Validation

Install from this branch:

uv tool install --reinstall "nrl-k8s @ git+https://github.com/NVIDIA-NeMo/RL.git@tk/fix-dev-rbac-check#subdirectory=infra/nrl_k8s"

Then run:

nrl-k8s dev connect

Previously this failed with:

error: the default service account in default lacks edit permissions — kubectl won't work inside the dev pod

The root cause was kubectl auth can-i --as=system:serviceaccount:default:default returning a Forbidden error (cannot impersonate serviceaccounts), which the CLI misinterpreted as missing RBAC.

After the fix, the RBAC check passes and the dev pod creates and connects successfully.
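The misinterpretation described above can be avoided by treating "the API server answered no" and "the check itself could not run" as separate outcomes. The sketch below is one possible way to do that and is not part of this PR. The names `CanIResult` and `interpret_can_i` are hypothetical, and it assumes `kubectl auth can-i` prints `yes` or `no` on stdout and reports failures such as a Forbidden impersonation on stderr with empty stdout.

```python
from enum import Enum


class CanIResult(Enum):
    ALLOWED = "allowed"
    DENIED = "denied"
    ERROR = "error"  # the check could not be evaluated


def interpret_can_i(returncode: int, stdout: str, stderr: str) -> CanIResult:
    """Classify the outcome of a `kubectl auth can-i` call.

    Hypothetical helper; assumes a yes/no answer on stdout and
    errors (e.g. Forbidden impersonation) on stderr.
    """
    answer = stdout.strip().lower()
    if returncode == 0 and answer.startswith("yes"):
        return CanIResult.ALLOWED
    if answer.startswith("no"):
        # An explicit "no" is a real RBAC denial.
        return CanIResult.DENIED
    # Anything else (empty stdout, error text on stderr) means the check
    # itself failed and should not be reported as missing RBAC.
    return CanIResult.ERROR
```

With this split, a caller could surface the stderr text for `ERROR` results instead of reporting missing edit permissions.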

Generated with Claude Code

The preflight RBAC check used kubectl auth can-i --as=system:serviceaccount:...
which requires the impersonate verb on serviceaccounts. Most users on
EKS with SSO-based roles lack this permission, causing nrl-k8s dev connect
to fail even when the user has full edit access in the namespace.

Check the current user's own permissions instead.

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong requested a review from a team as a code owner June 1, 2026 19:36
@copy-pr-bot

copy-pr-bot Bot commented Jun 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@terrykong terrykong enabled auto-merge (squash) June 1, 2026 19:42
@terrykong terrykong added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Jun 1, 2026
@terrykong
Collaborator Author

/ok to test 6664ec1

@terrykong
Collaborator Author

/ok to test 20633ab
