feat: CommunityToolkit.Aspire.Hosting.K3s — k3s Kubernetes cluster hosting integration by edmondshtogu · Pull Request #1322 · CommunityToolkit/Aspire

edmondshtogu · 2026-05-14T10:45:04Z

Closes #1321

Overview of changes

Adds CommunityToolkit.Aspire.Hosting.K3s, a hosting integration that runs a lightweight Kubernetes cluster as an Aspire resource tree. Developers can declare a local Kubernetes cluster in Program.cs — with Helm charts, manifests, and exposed service endpoints — the same way they add Redis or PostgreSQL. No external tooling beyond a compatible container runtime (Docker or Podman) is required.

What's included

New package: src/CommunityToolkit.Aspire.Hosting.K3s/

File	Responsibility
K3sClusterResource.cs	ContainerResource — k3s server; holds kubeconfig directory path and image settings
K3sClusterOptions.cs	Configuration (pod/service CIDR, disabled components, k3s image tag, helm/kubectl image overrides)
K3sBuilderExtensions.cs	AddK3sCluster, WithDataVolume, WithLifetime, WithReference(cluster), WithK3sVersion, …
K3sReadinessHealthCheck.cs	File-based health check — polls cluster/kubeconfig.yaml, writes local/ + container/ variants, probes nodes via KubernetesClient
HelmReleaseResource.cs	ContainerResource — runs alpine/helm; child of cluster; exits 0 on success
K3sBuilderExtensions.Helm.cs	AddHelmRelease, WithHelmValue, WithHelmValuesFile
K8sManifestResource.cs	ContainerResource — runs alpine/k8s; child of cluster; exits 0 on success
K3sBuilderExtensions.Manifest.cs	AddK8sManifest with auto-detected Kustomize support
K3sServiceEndpointResource.cs	Resource — in-process port-forward; M1 passive health via IsReady flag
K3sBuilderExtensions.ServiceEndpoint.cs	AddServiceEndpoint, WithReference(endpoint)
K3sInProcessPortForwarder.cs	KubernetesClient WebSocket TCP forwarder; binds 0.0.0.0:{port}
K3sAgentResource.cs	Worker node support (K3sClusterOptions.AgentCount)
HelmContainerImageTags.cs / KubectlContainerImageTags.cs	Pinned image defaults; overridable via K3sClusterOptions

New Tests & Examples:

Unit Tests: tests/CommunityToolkit.Aspire.Hosting.K3s.Tests/ — 87 unit tests covering resource registration, script generation, kubeconfig variants, Kustomize detection, values file injection, and public API null guards.
Examples: examples/k3s/CommunityToolkit.Aspire.Hosting.K3s.AppHost/
TypeScript playground: playground/polyglot/TypeScript/CommunityToolkit.Aspire.Hosting.K3s/ValidationAppHost/

Key design decisions

Health check via bind-mount, not docker exec. k3s writes its kubeconfig to K3S_KUBECONFIG_OUTPUT=/tmp/k3s-kubeconfig/kubeconfig.yaml, bind-mounted to AppHostDirectory/.k3s/{name}/cluster/ on the host. The health check polls File.Exists, rewrites server URLs into local/ and container/ variants, then confirms node readiness via IKubernetes.CoreV1.ListNodeAsync. No shell access, no docker exec, works with any container runtime.
Helm and kubectl run as containers. HelmReleaseResource and K8sManifestResource extend ContainerResource and are shown as children of the cluster in the Aspire dashboard. The install/apply script is injected via WithContainerFiles. They cannot use WaitFor(cluster) (Aspire forbids a child waiting for its parent), so their scripts poll for /root/.kube/kubeconfig.yaml — which only appears after the cluster health check passes — before proceeding. Consumers use WaitForCompletion(helmRelease) since these are run-to-completion containers.
Kubeconfig delivered via bind-mount to all containers. WithReference(cluster) on containers, the helm installer, and the kubectl applier all bind-mount AppHostDirectory/.k3s/{name}/container/ (server: https://{name}:6443) at a known in-container path and set KUBECONFIG. Bind-mount is used uniformly so the kubeconfig updates automatically if the cluster is recreated without restarting dependent containers. No KUBECONFIG_DATA base64 encoding — all standard Kubernetes tooling (kubectl, helm, KubernetesClient SDK) works without custom bootstrap code.
Kustomize auto-detected. AddK8sManifest checks for kustomization.yaml at configuration time: if present, it bind-mounts the directory (preserving relative base references) and uses kubectl apply -k; otherwise it uses an async WithContainerFiles callback to copy only the YAML files at container-start time and applies with kubectl apply -f --server-side. The callback approach avoids Aspire's build-time path validation on the string overload. The script auto-detects the mode at runtime.
Service exposure without NodePort. K3sServiceEndpointResource starts an in-process KubernetesClient WebSocket port-forward bound to 0.0.0.0:{hostPort}. Host resources receive services__{name}__url=http(s)://localhost:{port}; container resources receive services__{name}__url=http(s)://host.docker.internal:{port} with --add-host=host.docker.internal:host-gateway injected automatically via ContainerRuntimeArgsCallbackAnnotation (DCP does not inject this on Linux). The forwarder resolves targetPort from the service spec (not the service port) before opening the pod WebSocket, and only signals ready after a running pod is confirmed — not when the TCP listener starts.
Image overrides via K3sClusterOptions. The helm and kubectl container images are configurable via HelmImage/HelmTag/HelmRegistry and KubectlImage/KubectlTag/KubectlRegistry on K3sClusterOptions. Defaults: docker.io/alpine/helm:3.17.3 and docker.io/alpine/kubectl:1.36.0.
Robustness fixes from review. WithK3sVersion propagates the image tag to all agent nodes (prevents server/agent version skew). AddServiceEndpoint validates the port is in the range 1–65535. Helm values files are indexed as {i}-{filename} so declaration order is preserved and basename collisions are safe. All helm --set values and --values paths are POSIX single-quote escaped via ShellEscape().

Usage

var cluster = builder.AddK3sCluster("k8s")
    .WithDataVolume()
    .WithK3sVersion("v1.36.0-k3s1");

var widgetCrd = cluster.AddK8sManifest("widget-crd", "./k8s/crds/");

var argocd = cluster.AddHelmRelease("argocd", "argo-cd",
    repo: "https://argoproj.github.io/argo-helm",
    version: "7.8.0",
    @namespace: "argocd")
    .WithHelmValuesFile("./deploy/argocd-values.yaml");

var ui = cluster.AddServiceEndpoint("argocd-ui", "argocd-server", 443, "argocd")
    .WaitForCompletion(argocd);

builder.AddProject<Projects.WidgetOperator>("operator")
    .WaitForCompletion(widgetCrd)
    .WithReference(cluster);

builder.AddProject<Projects.Api>("api")
    .WaitFor(ui)
    .WithReference(ui);

Example

PR Checklist

Other information

Security Assumptions
This change assumes local development only. k3s runs in privileged mode (required by k3s/containerd). The bind-mounted kubeconfig directory is readable only by the AppHost user. KubernetesClientConfiguration uses the embedded CA cert from the kubeconfig; no DangerousAcceptAnyServerCertificateValidator is used.

Remaining Follow-up Work

Publish-time diagnostic when a K3sClusterResource has no production counterpart configured.

github-actions · 2026-05-14T10:45:13Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/CommunityToolkit/Aspire/main/eng/scripts/dogfood-pr.sh | bash -s -- 1322

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/CommunityToolkit/Aspire/main/eng/scripts/dogfood-pr.ps1) } 1322"

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

edmondshtogu · 2026-05-14T11:16:38Z

@dotnet-policy-service agree

Copilot

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 13 comments.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 15 comments.

Comments suppressed due to low confidence (4)

src/CommunityToolkit.Aspire.Hosting.K3s/K3sBuilderExtensions.Manifest.cs:95

This async callback has no await, which will produce CS1998 and be treated as an error in this repo. Use a completed Task return (or a synchronous overload) so the project builds cleanly.

            resourceBuilder.WithContainerFiles("/k8s-manifests", async (ctx, ct) =>
            {
                if (Directory.Exists(absolutePath))

src/CommunityToolkit.Aspire.Hosting.K3s/K3sBuilderExtensions.Helm.cs:91

This async expression-bodied callback has no await, so CS1998 will be emitted and treated as an error. Return a completed task (or use a synchronous overload) instead of marking the lambda async.

            .WithContainerFiles("/helm-values", async (ctx, ct) =>
                release.ValuesFiles
                    .Select((hostPath, i) => (ContainerFileSystemItem)new ContainerFile

src/CommunityToolkit.Aspire.Hosting.K3s/K3sInProcessPortForwarder.cs:152

The same empty-selector case here can select the first ready pod in the namespace and forward traffic to an unrelated workload. Services without selectors are valid in Kubernetes, so the forwarder should not treat an empty selector as "all pods".

            var selector = string.Join(",",
                (svc.Spec.Selector ?? new Dictionary<string, string>()).Select(kv => $"{kv.Key}={kv.Value}"));

            var pods = await k8sClient.CoreV1
                .ListNamespacedPodAsync(@namespace, labelSelector: selector, cancellationToken: ct)

tests/CommunityToolkit.Aspire.Hosting.K3s.Tests/K3sClusterResourceTests.cs:253

This test name says it verifies the pod subnet argument, but the assertion only checks that a cluster resource exists. It would still pass if WithPodSubnet stopped adding --cluster-cidr, so it should assert the command-line args.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 7 comments.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 5 comments.

edmondshtogu · 2026-05-18T18:47:50Z

@aaronpowell, @IEvangelist, @marshalhayes or @tommasodotNET can one of you please review the PR (sorry for pulling you in the comment, but you are the most active contributors here)

edmondshtogu · 2026-05-19T11:19:51Z

@davidfowl do you know who can help with the review here?

IEvangelist

Thorough code review

13 findings total: 1 blocking, 7 major, 5 minor/medium. Overall verdict: request changes. The design is sound — the execution needs a careful pass on lifecycle, idempotency, and atomicity.

🚨 Blocking

1. Unresolved Git merge conflict markers in `.gitignore` (lines 26–38)

Literal <<<<<<< main, =======, >>>>>>> main markers were committed. git grep "<<<<<<<" confirms .gitignore:26. The conflict was resolved locally but the markers were never deleted. This alone should keep the PR out of main.

🔴 Major

2. `K3sReadinessHealthCheck._cachedClient` is dead state — `Kubernetes` client leaks every tick

K3sBuilderExtensions.cs:211-215 registers the check with sp => new K3sReadinessHealthCheck(resource, resource.ApiEndpoint). DefaultHealthCheckService.RunCheckAsync invokes the factory on every check, so _cachedClient always starts null and the cache is dead code. Every tick allocates a fresh Kubernetes/HttpClient (line 116) which is never disposed (the class is not IDisposable). The "stale credentials" recovery in the catch block (lines 70–84) can only ever dispose the just-created instance, never a stale one across calls — it's a no-op pattern.

Fix: register as a singleton (var hc = new K3sReadinessHealthCheck(...); sp => hc) and make the class IDisposable, or drop the cache and using the client locally within each CheckHealthAsync call.

3. Non-atomic kubeconfig overwrite on every health check

K3sReadinessHealthCheck.cs:99-113. Because of #2, EnsureClientAsync runs every tick and File.WriteAllTextAsync truncates both local/kubeconfig.yaml and container/kubeconfig.yaml in place. Readers hitting the race window can observe a partial/empty file — host projects with KUBECONFIG=…/local/kubeconfig.yaml, the helm/kubectl scripts polling for the file, and any guest BuildConfigFromConfigFile(...) at startup that does not retry.

Fix: write to a temp file in the same directory then File.Move(tempPath, finalPath, overwrite: true); only rewrite when the source raw YAML's hash has changed since last write.

4. `AllocatePort()` TOCTOU race + interface mismatch

K3sBuilderExtensions.ServiceEndpoint.cs:200-207. Two distinct defects:

Window race: the probe listener is stopped before K3sInProcessPortForwarder rebinds on the same port. Any other process can claim the port in the gap, causing the subsequent new TcpListener(IPAddress.Any, localPort).Start() to throw SocketException(AddressAlreadyInUse). The exception is caught by the outer catch (Exception ex) (lines 79–85), logged, and the loop backs off and retries with the same already-stolen port — it never picks a fresh one. The endpoint appears in the dashboard as perpetually unhealthy with no obvious cause.
Interface mismatch: the probe binds on IPAddress.Loopback, but the forwarder binds on IPAddress.Any (line 43 of K3sInProcessPortForwarder.cs). A port free on 127.0.0.1 can still conflict on 0.0.0.0 (another service bound on a non-loopback interface) — same failure mode.

Fix: probe on IPAddress.Any (match the actual bind). Better: pass the live listener instance into K3sInProcessPortForwarder and keep it bound through the handoff so there is no release-rebind window. On retry inside the forwarder loop, allocate a fresh port rather than reusing the same one.

5. `WithDataVolume` is not idempotent — duplicate volume mounts

K3sBuilderExtensions.cs:327-335. Each call appends a fresh volume annotation targeting /var/lib/rancher/k3s. Two calls (a common error when composing builder extensions) produce two ContainerMountAnnotations at the same target → Docker rejects with Duplicate mount point: /var/lib/rancher/k3s. The unit test WithDataVolumeAddsSingleVolumeMount only exercises a single call; there is no test asserting idempotency.

Fix: remove the previous data-volume annotation before adding, or use ResourceAnnotationMutationBehavior.Replace semantics consistent with WithLifetime (line 410-412).

6. `WithReference(cluster)` adds a duplicate bind-mount on repeat invocation

K3sBuilderExtensions.cs:368-388. Same defect as #5: the container branch unconditionally appends a new ContainerMountAnnotation for /var/k3s. Two calls — easy when composing helper extension methods — create two bind-mounts to the same target, which Docker rejects at start with the duplicate-mount error buried in DCP logs.

Fix: check destination.Resource.Annotations.OfType<ContainerMountAnnotation>().Any(m => m.Target == "/var/k3s") and skip (or replace).

7. Helm `--set` key injection — only the value is escaped against Helm's parser

K3sBuilderExtensions.Helm.cs:214-228:

foreach (var (key, value) in release.HelmValues)
    sb.Append($" --set {ShellEscape($"{key}={HelmEscape(value)}")}");

The shell layer is correct — POSIX '\'' single-quoting is robust. But HelmEscape is only applied to the value, never the key. Helm's own --set parser splits on ,, treats {/} as list literals, and uses \ as the escape character. A key containing a comma — e.g. WithHelmValue("a.b,c.d=evil", "true") — produces a single shell-quoted argument that Helm then parses as two assignments: a.b=true and c.d=evil. For local dev this is a foot-gun rather than a vulnerability (the AppHost is the trust boundary), but a real bug. None of the tests exercise a key with commas/braces/backslashes.

Fix: apply HelmEscape to both key and value, or validate keys against a conservative pattern (e.g. ^[A-Za-z_][A-Za-z0-9_.\[\]]*$) and throw ArgumentException otherwise.

8. Port-forwarder has no shutdown/dispose path — fire-and-forget leak risk

K3sBuilderExtensions.ServiceEndpoint.cs:179 + the whole of K3sInProcessPortForwarder.cs. _ = Task.Run(() => forwarder.RunAsync(logger, ct), ct); — fire-and-forget with no handle retained. K3sInProcessPortForwarder exposes neither IDisposable/IAsyncDisposable nor a Stop method. The only way to terminate the listener on 0.0.0.0:{port} is the ct passed in from the ResourceReadyEvent subscription.

Two concerns:

Cancellation token lifetime: ct is the event-handler's token. It is application-scoped in most Aspire versions, but the contract is not "AppHost stopping" — if the platform ever changes the semantics, the 0.0.0.0 socket leaks for the rest of the AppHost session with no IAsyncDisposable to fall back to.
Per-connection task orphan: inside RunAsync, each accepted connection spawns _ = Task.Run(() => ForwardConnectionAsync(tcp, logger, ct), CancellationToken.None) — uses CancellationToken.None, not ct. Unhandled exceptions in those tasks are caught and LogDebug'd only, so a thrown WebSocket exception during shutdown is invisible to operators.

Also note: ForwardConnectionAsync creates a fresh Kubernetes client (and thus a fresh TLS handshake) per accepted TCP connection (line 168-169) — wasteful but each is using-disposed.

Fix: make K3sInProcessPortForwarder implement IAsyncDisposable, store the listener and a linked CancellationTokenSource as fields, retain the forwarder reference on the endpoint resource (or in a registry keyed by resource name), and dispose on AppHost stop via builder.Eventing.Subscribe<BeforeStopEvent> or similar.

🟡 Medium

9. Kustomize directory and helm/kubectl kubeconfig bind-mounts are read-write

K3sBuilderExtensions.Manifest.cs:81, 88 and K3sBuilderExtensions.Helm.cs:99. None of these bind-mounts pass isReadOnly: true:

Kustomize overlay dir → /k8s-manifests (Manifest.cs:88) — the user's source tree under their checkout.
container/ (kubeconfig) → /root/.kube (Manifest.cs:81 and Helm.cs:99) — host-side kubeconfig directory.

The kubectl/helm scripts don't write to these paths today, so this is latent. But any plugin (krew, helm-secrets, etc.) the user adds via a custom image override (opts.HelmImage, opts.KubectlImage) gains write access to the host filesystem at those paths. WithReference(cluster) for user containers correctly uses isReadOnly: true (line 385) — the helm/kubectl/manifest paths should match.

Fix: add an overload that passes isReadOnly: true, or build ContainerMountAnnotation directly as WithReference does.

10. `K3sServiceEndpointResource.IsReady` can be stuck `false` forever with no recovery surface

K3sBuilderExtensions.ServiceEndpoint.cs:134-188 + K3sServiceEndpointResource.cs:56. RunEndpointAsync is fire-and-forget. The outer try/catch covers the synchronous setup, but the spawned forwarder.RunAsync(...) Task is not awaited — any unhandled exception inside RunAsync that escapes its inner catch (Exception ex) (e.g. an exception thrown by notifications.PublishUpdateAsync during the onReadyChanged callback) terminates the task silently. IsReady stays false, the dependent K3sServiceEndpointHealthCheck.CheckHealthAsync keeps returning Unhealthy, and any consumer using WaitFor(serviceEndpoint) blocks forever with no diagnostic.

The _ = notifications.PublishUpdateAsync(...) call on line 172 is also fire-and-forget; if it throws, the exception escapes the synchronous part of the callback (since Action<bool> cannot await), reaches TaskScheduler.UnobservedTaskException, and disappears.

Fix: make onReadyChanged an async-aware delegate, await PublishUpdateAsync inside it, and surface failures via the resource notification with State = FailedToStart. Add a watchdog: if the forwarder Task faults, publish FailedToStart on the endpoint resource so the dashboard surfaces the problem.

🟢 Minor

11. PR description references an integration test project that doesn't exist

The PR overview claims tests/CommunityToolkit.Aspire.Hosting.K3s.IntegrationTests/ — ubuntu-latest only exists. Only the unit-test project tests/CommunityToolkit.Aspire.Hosting.K3s.Tests/ (87 facts, no [RequiresDocker]) is present. Given the scope of dangerous code (port forwarder lifecycle, kubeconfig race, privileged containers, shell-script injection), the absence of any integration test that exercises an actual k3s container is a real gap — but the discrepancy with the description is what concerns me here.

Fix: add the integration test project (it would catch #2, #3, #5, #6, #8 trivially) or correct the PR description.

12. `WithHttpsDeveloperCertificate()` on the k3s container is misuse

K3sBuilderExtensions.cs:141. WithHttpsDeveloperCertificate() configures the ASP.NET Core dev cert for JavaScript/Python apps. k3s manages its own server certificate (the very cert whose SANs are extended via --tls-san on lines 90-93). Injecting the ASP.NET dev cert into the k3s container is at best a no-op, at worst pollutes the container with unused files. The build still succeeds.

Fix: remove the .WithHttpsDeveloperCertificate() call; the WithHttpsEndpoint declaration alone is sufficient.

13. `AddServiceEndpoint` missing whitespace validation on `serviceName`/`@namespace`

K3sBuilderExtensions.ServiceEndpoint.cs:32-47. The port range validation is correctly added (lines 44-46). But serviceName and @namespace only get ArgumentNullException.ThrowIfNull — empty or whitespace strings pass validation and produce an unhelpful runtime failure inside the Kubernetes API call. The convention elsewhere in this same file (e.g. WithHelmValuesFile line 133, WithPodSubnet line 281, WithServiceSubnet line 293) is ArgumentException.ThrowIfNullOrWhiteSpace.

Fix: use ArgumentException.ThrowIfNullOrWhiteSpace(serviceName) and (@namespace).

✅ Categories with no findings

Aspire resource semantics for parent-child — correctly modeled via IResourceWithParent<K3sClusterResource>.
Helm shell injection via ShellEscape — the POSIX '\'' technique is correctly implemented and properly applied to release name, chart ref, namespace, version, values-file paths, and --set values. The keys are not Helm-escaped (#7) but the shell layer itself is safe.
Repo conventions on namespaces — extension method classes correctly use namespace Aspire.Hosting; file-scoped; resources use Aspire.Hosting.ApplicationModel;; internal helpers use CommunityToolkit.Aspire.Hosting;. is null/is not null is used consistently. [ResourceName] is applied to name parameters in public extension methods. The CI workflow (tests.yaml) was updated.
Container runtime annotation usage — ContainerRuntimeArgsCallbackAnnotation(Action<IList<object>>) matches the documented Aspire API. Multiple invocations append redundant --add-host=... flags but Docker handles duplicates idempotently.
api/CommunityToolkit.Aspire.Hosting.K3s.cs — this is a new file, not an edit to an existing generated file, so the convention is not violated.

Summary

Top 3 most important findings:

Blocking: .gitignore contains literal Git merge conflict markers.
Major: K3sReadinessHealthCheck._cachedClient is dead state — the registration-factory pattern creates a fresh instance per check, so the cache never carries over. Every health-check tick allocates a new Kubernetes/HttpClient that is never disposed, and rewrites both kubeconfig variants non-atomically (#2 + #3 combined).
Major: AllocatePort() + K3sInProcessPortForwarder lifecycle. The port-allocation TOCTOU + interface-mismatch race (#4), the fire-and-forget forwarder with no dispose path (#8), and the IsReady-stuck-false failure mode (#10) together mean the service-endpoint feature has no reliable failure path: a transient startup race can leave the resource permanently unhealthy with no operator-visible diagnostic.

Design assessment:

The overall design — modelling the cluster as a privileged container with Helm/manifest/service-endpoint children — is sound for the stated local-dev inner-loop use case. The decisions to (a) run helm/kubectl as containers rather than requiring host binaries, (b) rewrite kubeconfig into local/ and container/ variants and bind-mount the appropriate one per consumer type, (c) make child containers poll for the kubeconfig file rather than WaitFor-ing the parent (avoiding the cluster-vs-node Ready deadlock), and (d) use an in-process KubernetesClient WebSocket port-forward bound on 0.0.0.0 for both host and container reach — are all well-considered and individually correct. The k3d-inspired init entrypoint and the deliberate avoidance of --cluster-init (with the explanatory comment about etcd peer IP changes) show evidence of real operational testing.

The execution falls short on resource lifecycle management (no IDisposable surfaces, fire-and-forget tasks with no observability, broken cache, leaked HttpClients), idempotency of builder operations (WithDataVolume, WithReference are not safe to call twice), and atomicity of shared state (kubeconfig file rewrites). These are all fixable without touching the design — the core architecture is good; the implementation needs a careful pass on "what happens on the second call / on shutdown / on transient failure".

IEvangelist

A few other things to consider.

IEvangelist · 2026-05-20T15:06:23Z

+                Mode = UnixFileMode.UserRead | UnixFileMode.UserWrite | UnixFileMode.UserExecute
+                     | UnixFileMode.GroupRead | UnixFileMode.GroupExecute
+                     | UnixFileMode.OtherRead | UnixFileMode.OtherExecute,


This exact UnixFileMode combination (effectively 0755 / rwxr-xr-x) is repeated four times across the integration:

K3sBuilderExtensions.cs:71-73 (server entrypoint, here)

K3sBuilderExtensions.cs:182-184 (agent entrypoint)

K3sBuilderExtensions.Helm.cs:79-81 (helm-install.sh)

K3sBuilderExtensions.Manifest.cs:74-76 (kubectl-apply.sh)

Let's encapsulate it into a single internal constant with a doc comment explaining why these specific bits were chosen as the default for injected shell scripts (UID assumptions inside the various base images, why OtherExecute is required, etc.) — that context is non-obvious from the call sites and would help future maintainers reason about whether 0750 or 0700 would be safer for a given script.

Something like:

/// <summary> /// Default Unix file mode (0755 / rwxr-xr-x) for shell scripts injected /// into k3s/helm/kubectl containers via <c>WithContainerFiles</c>. /// </summary> /// <remarks> /// // …explain the rationale here: why these exact bits, what fails without /// // them, and the trust assumptions about the container's runtime UID. /// </remarks> internal const UnixFileMode ExecutableScriptMode = UnixFileMode.UserRead | UnixFileMode.UserWrite | UnixFileMode.UserExecute | UnixFileMode.GroupRead | UnixFileMode.GroupExecute | UnixFileMode.OtherRead | UnixFileMode.OtherExecute;

Each call site then becomes Mode = K3sFileModes.ExecutableScriptMode,.

Thanks for the detailed review @IEvangelist, in the last two commits I addressed what you found.

# Finding Status

1 🚨 .gitignore conflict markers ✅ Resolved

2 🔴 Health check factory leaks client every tick ✅ Singleton + cache dropped

3 🔴 Non-atomic kubeconfig overwrite ✅ WriteAtomicAsync

4 🔴 AllocatePort TOCTOU + interface mismatch ✅ IPAddress.Any, fresh port on retry

5 🔴 WithDataVolume not idempotent ✅ Removes existing annotation first

6 🔴 WithReference duplicate mount ✅ Idempotency guard on target path

7 🔴 Helm --set key not Helm-escaped ✅ HelmEscape on both key and value

8 🔴 Port-forwarder fire-and-forget, no dispose ✅ IAsyncDisposable, linked CTS, ResourceStoppedEvent

9 🟡 Bind-mounts read-write ✅ isReadOnly: true on all kubeconfig mounts

10 🟡 IsReady stuck false, PublishUpdateAsync fire-and-forget ✅ await using ensures cleanup; PublishUpdateAsync at line 185 is fire-and-forget inside the synchronous callback — acceptable for local dev since the notification system is internal and well-tested — if you want the full async-delegate refactor, let me know and I will change it

11 🟢 PR description references missing integration test project ✅ Removed (I created the project initially under tests/CommunityToolkit.Aspire.Hosting.K3s.IntegrationTests/, but the tests locally took to long to run, therefore I removed)

12 🟢 WithHttpsDeveloperCertificate() misuse ✅ Removed

13 🟢 Missing whitespace validation on serviceName/@namespace ✅ ThrowIfNullOrWhiteSpace

…isolation - Resolve .gitignore merge conflict markers (keep both sides) - Register K3sReadinessHealthCheck as a singleton to prevent per-tick client leaks - Write kubeconfig variants atomically via temp-file-then-rename - Fix stale credential recovery to not delete the raw kubeconfig k3s will not rewrite - Fix AllocatePort to probe on IPAddress.Any and re-allocate a fresh port on retry - Make WithDataVolume and WithReference(cluster) idempotent against duplicate calls - Apply HelmEscape to --set keys as well as values - Mark kubeconfig bind-mounts as read-only - Throw InvalidOperationException on no-selector services instead of silently marking ready - Add whitespace validation on serviceName/namespace in AddServiceEndpoint - Remove WithHttpsDeveloperCertificate from k3s container (no-op for k3s) - Extract K3sFileHelpers constant for 0755 script mode and /tmp/k3s-kubeconfig.yaml path - Switch kubeconfig container mount from directory to file-level to isolate kubectl cache

Health check (K3sReadinessHealthCheck): - Drop _cachedClient — registration was already fixed to singleton so the cache was live, but a fresh Kubernetes client per check is simpler, removes all stale-client invalidation complexity, and the TLS overhead every 5 s is negligible for local dev - Extract EnsureKubeconfigVariantsAsync; stale detection now only deletes the derived local/ and container/ files — rawPath is never touched Port forwarder (K3sInProcessPortForwarder): - Implement IAsyncDisposable; own a CancellationTokenSource and TcpListener as fields so DisposeAsync can cancel and release the socket independently of the outer application-lifetime token - RunAsync links _cts.Token with the outer ct so either side can stop the loop - Per-connection Task.Run now uses the linked token instead of CancellationToken.None — connections can be cancelled on shutdown - On listener failure a fresh port is re-allocated for the retry Service endpoint wiring: - K3sServiceEndpointResource retains the Forwarder reference - RunEndpointAsync wraps forwarder in await using so DisposeAsync runs on any exit path - Subscribe to ResourceStoppedEvent on the cluster to dispose the forwarder immediately when the cluster container stops, releasing the host port without waiting for the application-lifetime token

IEvangelist · 2026-05-20T17:34:05Z

Is the name of this file common, or is this a play on words leaning on the k from Kubernetes?

yes, check here https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/

IEvangelist · 2026-05-20T17:35:36Z

+    }
+  },
+  "packages": {
+    "CommunityToolkit.Aspire.Hosting.K3s": ""


Hey @sebastienros, is this correct? I always forget..."" is valid, right?

IEvangelist

This LGTM, I'm not going to merge it though as this really belongs to @aaronpowell — I'll leave that to him. Thank you for the PR though, @edmondshtogu.

edmondshtogu · 2026-05-20T17:54:22Z

Thanks @IEvangelist, also test I added are passing, the failing ones are not related with my changes.

edmondshtogu added 3 commits May 13, 2026 18:50

feat: CommunityToolkit.Aspire.Hosting.K3s

983b1cd

feat: proper support for service endpoints, helm and manifests resources

4dda24b

feat: support files for helm/manifest

a2ce723

Copilot AI review requested due to automatic review settings May 14, 2026 10:45

Copilot started reviewing on behalf of edmondshtogu May 14, 2026 10:45 View session

Copilot AI reviewed May 14, 2026

View reviewed changes

edmondshtogu mentioned this pull request May 14, 2026

feat: K3s Kubernetes cluster hosting integration #1321

Open

edmondshtogu added 4 commits May 14, 2026 14:07

Merge branch 'main' into main

2729a69

Merge branch 'main' into main

bc271ec

Merge branch 'main' into main

0e5c9de

compliting left work

d343ee3

edmondshtogu requested a review from Copilot May 18, 2026 12:52

Copilot started reviewing on behalf of edmondshtogu May 18, 2026 12:52 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

edmondshtogu and others added 3 commits May 18, 2026 14:59

upgrade example aspire version

0a4d13a

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

fix: copilot reviews

f9a50d1

chore: exluding integration tests for win runners

da4bf9d

edmondshtogu requested a review from Copilot May 18, 2026 14:59

Copilot started reviewing on behalf of edmondshtogu May 18, 2026 15:00 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

edmondshtogu and others added 2 commits May 18, 2026 17:10

chore: update example comment

ff66be4

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

fix: copilot reviews

37c11e3

edmondshtogu requested a review from Copilot May 18, 2026 17:57

Copilot started reviewing on behalf of edmondshtogu May 18, 2026 17:58 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

edmondshtogu and others added 3 commits May 18, 2026 20:15

fix: WaitForServiceReadyAsync service health check

99297cd

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

chore: addressing reviews

7d34ddc

fix: copilot reviews

7473170

edmondshtogu requested a review from Copilot May 18, 2026 18:25

Copilot started reviewing on behalf of edmondshtogu May 18, 2026 18:26 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

fix: copilot reviews

98bf921

edmondshtogu added 2 commits May 20, 2026 13:00

fix: issues with the integration test build and k3s api port proxying

8c11cc5

chore: excluding the integration test since take too long to run

e33ae5f

IEvangelist requested changes May 20, 2026

View reviewed changes

IEvangelist reviewed May 20, 2026

View reviewed changes

edmondshtogu added 2 commits May 20, 2026 17:47

edmondshtogu requested a review from IEvangelist May 20, 2026 16:48

fix: manifest apply exit code when no kustomize present

06a7b0f

IEvangelist reviewed May 20, 2026

View reviewed changes

IEvangelist approved these changes May 20, 2026

View reviewed changes

#	Finding	Status
1 🚨	`.gitignore` conflict markers	✅ Resolved
2 🔴	Health check factory leaks client every tick	✅ Singleton + cache dropped
3 🔴	Non-atomic `kubeconfig` overwrite	✅ `WriteAtomicAsync`
4 🔴	`AllocatePort` TOCTOU + interface mismatch	✅ `IPAddress.Any`, fresh port on retry
5 🔴	`WithDataVolume` not idempotent	✅ Removes existing annotation first
6 🔴	`WithReference` duplicate mount	✅ Idempotency guard on target path
7 🔴	Helm `--set` key not Helm-escaped	✅ `HelmEscape` on both key and value
8 🔴	Port-forwarder fire-and-forget, no dispose	✅ `IAsyncDisposable`, linked CTS, `ResourceStoppedEvent`
9 🟡	Bind-mounts read-write	✅ `isReadOnly: true` on all `kubeconfig` mounts
10 🟡	`IsReady` stuck false, `PublishUpdateAsync` fire-and-forget	✅ `await using` ensures cleanup; `PublishUpdateAsync` at line 185 is fire-and-forget inside the synchronous callback — acceptable for local dev since the notification system is internal and well-tested — if you want the full async-delegate refactor, let me know and I will change it
11 🟢	PR description references missing integration test project	✅ Removed (I created the project initially under `tests/CommunityToolkit.Aspire.Hosting.K3s.IntegrationTests/`, but the tests locally took to long to run, therefore I removed)
12 🟢	`WithHttpsDeveloperCertificate()` misuse	✅ Removed
13 🟢	Missing whitespace validation on `serviceName`/`@namespace`	✅ `ThrowIfNullOrWhiteSpace`

Conversation

edmondshtogu commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview of changes

What's included

Key design decisions

Usage

Example

PR Checklist

Other information

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

edmondshtogu commented May 14, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

edmondshtogu commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edmondshtogu commented May 19, 2026

Uh oh!

IEvangelist left a comment

Choose a reason for hiding this comment

Thorough code review

🚨 Blocking

1. Unresolved Git merge conflict markers in .gitignore (lines 26–38)

🔴 Major

2. K3sReadinessHealthCheck._cachedClient is dead state — Kubernetes client leaks every tick

3. Non-atomic kubeconfig overwrite on every health check

edmondshtogu commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading

edmondshtogu commented May 18, 2026 •

edited

Loading

1. Unresolved Git merge conflict markers in `.gitignore` (lines 26–38)

2. `K3sReadinessHealthCheck._cachedClient` is dead state — `Kubernetes` client leaks every tick

4. `AllocatePort()` TOCTOU race + interface mismatch

5. `WithDataVolume` is not idempotent — duplicate volume mounts

6. `WithReference(cluster)` adds a duplicate bind-mount on repeat invocation

7. Helm `--set` key injection — only the value is escaped against Helm's parser

10. `K3sServiceEndpointResource.IsReady` can be stuck `false` forever with no recovery surface

12. `WithHttpsDeveloperCertificate()` on the k3s container is misuse

13. `AddServiceEndpoint` missing whitespace validation on `serviceName`/`@namespace`

edmondshtogu May 20, 2026 •

edited

Loading