Add Supervisor CSI CreateVolume support for vSAN File Service volumes via FVS#3972
Conversation
b0c1488 to
541c876
Compare
9266949 to
bd0d8b0
Compare
|
Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete |
|
Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete |
|
FAILED --- Jenkins Build #1250 |
bd0d8b0 to
71161bd
Compare
|
Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete |
|
Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete |
|
FAILED --- Jenkins Build #1019 |
|
SUCCESS --- Jenkins Build #1252 |
|
Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete |
|
SUCCESS --- Jenkins Build #1020 |
| // Phase 2: no existing FileVolume — pick one candidate namespace and create the CR. | ||
| if instanceNS == "" { | ||
| var targetNS string | ||
| for _, ns := range candidateNamespaces { |
There was a problem hiding this comment.
Would it be possible/reasonable to randomize candidateNamespaces each time? In the current approach one of the namespace could become a hotspot;
There was a problem hiding this comment.
done. added random logic.
// Randomize order so repeated creates do not always probe the same instance namespace first (hotspot).
rand.Shuffle(len(candidateNamespaces), func(i, j int) {
candidateNamespaces[i], candidateNamespaces[j] = candidateNamespaces[j], candidateNamespaces[i]
})
| Spec: fvv1alpha1.FileVolumeSpec{ | ||
| PvcUID: pvcUID, | ||
| Size: *qty, | ||
| Protocols: []fvv1alpha1.FileVolumeProtocol{fvv1alpha1.FileVolumeProtocolNFSv4}, |
There was a problem hiding this comment.
Is it always v4 protocol? do mount command change if its v4?
There was a problem hiding this comment.
we will always use nfs v4 protocol.
Devops does not have the option to request file volume with specific network protocol - nfs3, nfs4 etc.
| err = wait.PollUntilContextTimeout(ctx, fvsWaitStep, fvsWaitMax, true, func(ctx context.Context) (bool, error) { | ||
| if err := c.fileVolumeClient.Get(ctx, ctrlclient.ObjectKey{Namespace: instanceNS, Name: fvName}, fv); err != nil { | ||
| return false, err | ||
| } | ||
| phase := fv.Status.Phase | ||
| if phase == "" { | ||
| return false, nil | ||
| } | ||
| if strings.EqualFold(string(phase), string(fvv1alpha1.FileVolumePhaseError)) { | ||
| return false, fmt.Errorf("FileVolume %s/%s entered Error phase", instanceNS, fvName) | ||
| } | ||
| if !strings.EqualFold(string(phase), string(fvv1alpha1.FileVolumePhaseReady)) { | ||
| return false, nil | ||
| } | ||
| exportPath = fv.Status.ExportPath | ||
| endpoint = fv.Status.Endpoint | ||
| return exportPath != "" && endpoint != "", nil | ||
| }) |
There was a problem hiding this comment.
I was thinking about this, won't this be mostly static? why not cache this?
Btw, vcfa support 1.5k namesapce, and it could keep increasing.
There was a problem hiding this comment.
added namespace informer cache
| const ( | ||
| fvsGroup = "fvs.vcf.broadcom.com" | ||
| fvsVersion = "v1alpha1" | ||
| fvsWaitStep = 5 * time.Second |
There was a problem hiding this comment.
why are the waits needed? add in comment
| "no instance namespace with a healthy FileVolumeService for requested zones %v", requestedZones) | ||
| } | ||
|
|
||
| fvTyped := &fvv1alpha1.FileVolume{ |
There was a problem hiding this comment.
done printed
log.Infof("Creating FileVolume CR for PVC %s/%s: %+v", pvcNamespace, pvcName, fvTyped)
|
|
||
| fv := &fvv1alpha1.FileVolume{} | ||
| var exportPath, endpoint string | ||
| err = wait.PollUntilContextTimeout(ctx, fvsWaitStep, fvsWaitMax, true, func(ctx context.Context) (bool, error) { |
There was a problem hiding this comment.
Can we add some profiling here? we can know how long these creation takes
There was a problem hiding this comment.
We already have metric to measure time it take to create file volume. For now just to align with csi-provisioner sidecar timeout for createvolume, we have set max wait timeout to 5 minutes.
prometheus.CsiControlOpsHistVec.WithLabelValues(volumeType, prometheus.PrometheusCreateVolumeOpType,
prometheus.PrometheusPassStatus, faultType).Observe(time.Since(start).Seconds())
|
|
||
| candidateNamespaces, err := c.listFVSCandidateInstanceNamespaces(ctx, pvcNamespace, requestedZones) | ||
| if err != nil { | ||
| return nil, csifault.CSIInternalFault, logger.LogNewErrorCodef(log, codes.FailedPrecondition, |
There was a problem hiding this comment.
What does FailedPrecondition mean here? This error code is not documented in CSI spec for CreateVolume. Maybe INVALID_ARGUMENT is more appropriate. It means "Source incompatible or not supported".
There was a problem hiding this comment.
FailedPrecondition is defined here
https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
FailedPrecondition: The operation was rejected because the system is not in a state required for the operation's execution.
We have used this code at other places in CSI.
Since User provided PVC spec is valid and internal state of the system is not Ready (candidate namespaces not available to provision filevolume CR), returning FailedPrecondition here.
| break | ||
| } | ||
| if targetNS == "" { | ||
| return nil, csifault.CSIInternalFault, logger.LogNewErrorCodef(log, codes.FailedPrecondition, |
| return "" | ||
| } | ||
|
|
||
| // findVPCNetworkConfigurationForNamespace loads the supervisor Namespace, reads the |
There was a problem hiding this comment.
Can you add a note to follow up to see if it is possible not to go through all the namespaces?
There was a problem hiding this comment.
added TODO
// TODO: FileVolumeService: filter namespaces by labels for instance namespaces
nsObjs, err := c.namespaceLister.List(labels.Everything())
There was a problem hiding this comment.
label for instance namespace is finalized.
added namespace filtering with label fvs_instance_namespace = true in 5721ee7
| } | ||
|
|
||
| func TestNamespaceHasAnyRequestedZone_Intersection(t *testing.T) { | ||
| patches := gomonkey.ApplyMethod(reflect.TypeOf(&unittestcommon.FakeK8SOrchestrator{}), "GetZonesForNamespace", |
There was a problem hiding this comment.
This still uses gomonkey. There is a PR #3980 that replaces gomonkey with injectable clients. See if these can be replaced. This comment is non-blocking.
There was a problem hiding this comment.
fixed. Verified running unit tests locally on my mac
% go test -count=1 ./pkg/csi/service/wcp/ \
-run 'Test(IsVsanFileServicePolicyStorageClass|VPCPathFromVPCNetworkConfiguration|TopologyListFromZoneMap|NamespaceHasAnyRequestedZone|FvsAccessibleTopology|FindVPCNetworkConfigurationForNamespace|GetVPCPathForNamespace|ListFVSCandidateInstanceNamespaces|InstanceNamespaceHasReadyFileVolumeService|CreateFileVolumeViaFVS_ValidationAndPVC|ShouldProvisionVsanFileVolumeViaFVS)$'
ok sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp 1.398s
dp024984@MY756TCN94 vsphere-csi-driver % go test -count=1 ./pkg/csi/service/wcp/ \
-run 'Test(IsVsanFileServicePolicyStorageClass|VPCPathFromVPCNetworkConfiguration|TopologyListFromZoneMap|NamespaceHasAnyRequestedZone|FvsAccessibleTopology|FindVPCNetworkConfigurationForNamespace|GetVPCPathForNamespace|ListFVSCandidateInstanceNamespaces|InstanceNamespaceHasReadyFileVolumeService|CreateFileVolumeViaFVS_ValidationAndPVC|ShouldProvisionVsanFileVolumeViaFVS)$' -v
=== RUN TestIsVsanFileServicePolicyStorageClass
--- PASS: TestIsVsanFileServicePolicyStorageClass (0.00s)
=== RUN TestVPCPathFromVPCNetworkConfiguration
--- PASS: TestVPCPathFromVPCNetworkConfiguration (0.00s)
=== RUN TestTopologyListFromZoneMap
--- PASS: TestTopologyListFromZoneMap (0.00s)
=== RUN TestFvsAccessibleTopology
--- PASS: TestFvsAccessibleTopology (0.00s)
=== RUN TestFindVPCNetworkConfigurationForNamespace
--- PASS: TestFindVPCNetworkConfigurationForNamespace (0.23s)
=== RUN TestGetVPCPathForNamespace
--- PASS: TestGetVPCPathForNamespace (0.20s)
=== RUN TestListFVSCandidateInstanceNamespaces
{"level":"info","time":"2026-04-21T17:35:07.836147-07:00","caller":"wcp/fvs_filevolume.go:162","msg":"listFVSCandidateInstanceNamespaces: consumer namespace \"pvc-ns\" uses VPC path \"/orgs/default/projects/default/vpcs/same-vpc\", requested zones [zone-a]"}
{"level":"info","time":"2026-04-21T17:35:07.837972-07:00","caller":"wcp/fvs_filevolume.go:193","msg":"listFVSCandidateInstanceNamespaces: namespace \"inst-ns\" matches consumer VPC path \"/orgs/default/projects/default/vpcs/same-vpc\" (VPCNetworkConfiguration \"inst-ns-44444444-4444-4444-4444-444444444444\")"}
{"level":"info","time":"2026-04-21T17:35:07.838003-07:00","caller":"wcp/fvs_filevolume.go:202","msg":"listFVSCandidateInstanceNamespaces: adding namespace \"inst-ns\" to FVS candidate list (VPC path \"/orgs/default/projects/default/vpcs/same-vpc\", requested zones [zone-a])"}
--- PASS: TestListFVSCandidateInstanceNamespaces (0.10s)
=== RUN TestInstanceNamespaceHasReadyFileVolumeService
--- PASS: TestInstanceNamespaceHasReadyFileVolumeService (0.00s)
=== RUN TestCreateFileVolumeViaFVS_ValidationAndPVC
=== RUN TestCreateFileVolumeViaFVS_ValidationAndPVC/missing_PVC_parameters
{"level":"error","time":"2026-04-21T17:35:07.838632-07:00","caller":"wcp/fvs_filevolume.go:300","msg":"missing PVC name or namespace in CreateVolume parameters","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.(*controller).createFileVolumeViaFVS\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume.go:300\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.TestCreateFileVolumeViaFVS_ValidationAndPVC.func1\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume_test.go:356\ntesting.tRunner\n\t/Users/dp024984/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.26.2.darwin-arm64/src/testing/testing.go:2036"}
=== RUN TestCreateFileVolumeViaFVS_ValidationAndPVC/missing_accessibility_requirements
{"level":"error","time":"2026-04-21T17:35:07.83877-07:00","caller":"wcp/fvs_filevolume.go:305","msg":"accessibility requirements are required for FVS file volume provisioning","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.(*controller).createFileVolumeViaFVS\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume.go:305\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.TestCreateFileVolumeViaFVS_ValidationAndPVC.func2\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume_test.go:368\ntesting.tRunner\n\t/Users/dp024984/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.26.2.darwin-arm64/src/testing/testing.go:2036"}
=== RUN TestCreateFileVolumeViaFVS_ValidationAndPVC/volume_name_without_extractable_PVC_UID
{"level":"error","time":"2026-04-21T17:35:07.838911-07:00","caller":"common/util.go:482","msg":"failed to extract UID from volumeName name: \"invalid-volume-name\"","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common.ExtractVolumeIDFromPVName\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/common/util.go:482\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.(*controller).createFileVolumeViaFVS\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume.go:309\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.TestCreateFileVolumeViaFVS_ValidationAndPVC.func3\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume_test.go:384\ntesting.tRunner\n\t/Users/dp024984/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.26.2.darwin-arm64/src/testing/testing.go:2036"}
{"level":"error","time":"2026-04-21T17:35:07.838982-07:00","caller":"wcp/fvs_filevolume.go:311","msg":"failed to extract PVC UID from volume name \"invalid-volume-name\": failed to extract UID from volumeName name: \"invalid-volume-name\"","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.(*controller).createFileVolumeViaFVS\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume.go:311\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.TestCreateFileVolumeViaFVS_ValidationAndPVC.func3\n\t/Users/dp024984/go/src/github.com/vsphere-csi-driver/pkg/csi/service/wcp/fvs_filevolume_test.go:384\ntesting.tRunner\n\t/Users/dp024984/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.26.2.darwin-arm64/src/testing/testing.go:2036"}
--- PASS: TestCreateFileVolumeViaFVS_ValidationAndPVC (0.00s)
--- PASS: TestCreateFileVolumeViaFVS_ValidationAndPVC/missing_PVC_parameters (0.00s)
--- PASS: TestCreateFileVolumeViaFVS_ValidationAndPVC/missing_accessibility_requirements (0.00s)
--- PASS: TestCreateFileVolumeViaFVS_ValidationAndPVC/volume_name_without_extractable_PVC_UID (0.00s)
=== RUN TestShouldProvisionVsanFileVolumeViaFVS
=== RUN TestShouldProvisionVsanFileVolumeViaFVS/non-VPC_network_provider_returns_FailedPrecondition
=== RUN TestShouldProvisionVsanFileVolumeViaFVS/VPC_provider_FSS_disabled
=== RUN TestShouldProvisionVsanFileVolumeViaFVS/VPC_provider_FSS_enabled
--- PASS: TestShouldProvisionVsanFileVolumeViaFVS (0.00s)
--- PASS: TestShouldProvisionVsanFileVolumeViaFVS/non-VPC_network_provider_returns_FailedPrecondition (0.00s)
--- PASS: TestShouldProvisionVsanFileVolumeViaFVS/VPC_provider_FSS_disabled (0.00s)
--- PASS: TestShouldProvisionVsanFileVolumeViaFVS/VPC_provider_FSS_enabled (0.00s)
PASS
ok sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp 1.151s
71161bd to
355836b
Compare
… (storage classes vsan-file-service-policy and vsan-file-service-policy-latebinding) through the File Volume Service (FVS) when NSX VPC networking and the supports_vsan_fileservice capability are enabled. Instance namespace selection - Resolve the consumer supervisor namespace's VPC using the namespace annotation nsx.vmware.com/vpc_network_config → cluster-scoped VPCNetworkConfiguration CR. - Match peer namespaces by status.vpcs[0].name; skip namespaces without the annotation. - Enumerate Namespaces and retain those with matching requested topology zones (namespaceHasAnyRequestedZone). When creating a FileVolume, choose the first such candidate namespace that has a FileVolumeService with status.healthState Ready (instanceNamespaceHasReadyFileVolumeService). FileVolume lifecycle - Create or reuse a FileVolume CR named by PVC UID in a chosen instance namespace; wait for Ready phase, export path, and endpoint from status. Volume identity and classification - Return CSI volume IDs as fv:<instance-namespace>:<filevolume-name> using common.FVSVolumeIDPrefix. RBAC for filevolumes, filevolumeservices (fvs.vcf.broadcom.com), and vpcnetworkconfigurations (crd.nsx.vmware.com) in supervisor manifests. Unit tests for FVS helpers, fake dynamic client list kinds, and gomonkey where needed.
355836b to
a840175
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: divyenpatel, xing-yang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Add the supervisor CSI path that provisions vSAN file service volumes for PVCs created using storage classes vsan-file-service-policy and vsan-file-service-policy-latebinding through the File Volume Service (FVS) when NSX VPC networking and the supports_vsan_fileservice capability are enabled.
Instance namespace selection
FileVolume lifecycle
Volume identity and classification
RBAC for filevolumes, filevolumeservices (fvs.vcf.broadcom.com), and vpcnetworkconfigurations (crd.nsx.vmware.com) in supervisor manifests.
Unit tests for FVS helpers, fake dynamic client list kinds, and gomonkey where needed.
Testing done:
Pre-checkins
https://jenkins-vcf-csifvt.devops.broadcom.net/view/instapp/job/wcp-instapp-e2e-pre-checkin/1252/ - Pass
https://jenkins-vcf-csifvt.devops.broadcom.net/view/instapp/job/vks-instapp-e2e-pre-checkin/1019/ - Failed.
This PR is not related to VKS and only contains supervisor changes.
Verified creating file volume using legacy architecture to ensure no regression happening for CreateVolume workflow.
Unit tests
Special notes for your reviewer:
Release note: