[2/3] Integrate XSK maps into the core XDP datapath #6003
ProjectsByJackHe wants to merge 43 commits into
Codecov Report
❌ Your patch check has failed because the patch coverage (60.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6003      +/-   ##
==========================================
+ Coverage   84.88%   85.51%   +0.62%
==========================================
  Files          60       60
  Lines       18797    18846      +49
==========================================
+ Hits        15956    16116     +160
+ Misses       2841     2730     -111
Planning on waiting until #6006 gets resolved to address the next wave of feedback.
It's not clear to me why we don't have end-to-end tests exercising everything from providing the XSKMAP before creating a registration through actual XDP data paths flowing. Since we intend to officially support this feature, it is risky for both us and our partners to supply/depend on an API that does not have regression test coverage and a paved pattern to follow.
That's in the works. I took the approach of building up an E2E flow manually first to see what the datapath integration would look like. There are likely going to be some limitations in the CI environment that prevent some scenarios from being exercised, but I am currently adding coverage for the parts that can be automated now that the skeleton of the general changes is mapped out in the current PR iteration.
Let's ensure we add tests with each chunk of work we do. I understand it may be a lot of work to add tests, but the best time to protect the code with automation and review the coverage is when making the change.
XDP needs to release a new driver version consumable by MsQuic in the CI before I can push my tests. I have them working right now on a local VM with the latest XDP drivers installed, and everything is passing.
Either we merge #6034 (or wait until the next XDP release) to add automated coverage, or we agree that a one-off manual dispatch for the tests plus manual local VM runs is good enough to give confidence to merge this integration. Let's decide on this first and align timelines. Currently, the map tests are pushed for review, but disabled.
|
|
||
| _IRQL_requires_max_(PASSIVE_LEVEL) | ||
| QUIC_STATUS | ||
| CxPlatDpRawInsertXskByMapConfigs( |
Is there a corresponding cleanup?
Are you asking where in MsQuic we close/clean up the map handles given to MsQuic? If so, the ownership of the map handles is external; the app should own creation/deletion. When XSKs get destroyed upon MsQuic cleanup, it is the responsibility of the app to clean up the map handles as well. Otherwise it will just have a bunch of stale entries in the map, and XDP will be directing traffic to dead sockets (basically dropping it).
"When XSKs get destroyed upon msquic cleanup, it is the responsibility of the app to handle cleaning up the map handles as well"
I'm not sure I follow this. Usually symmetry for create/delete is preferred. If MsQuic is inserting entries into the maps, it seems ideal for it to revert that during its teardown.
+1 to this.
We recently had to do something similar for rules. We should have a best-effort removal from the map on cleanup and failure.
The app owns the map, but MsQuic owns the XSK, so let's make sure we don't leave them in the map.
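The symmetry being asked for here can be sketched as follows. This is a minimal illustration, not the PR's implementation: `DEMO_XSK_MAP` and every `Demo*` function are hypothetical stand-ins for an app-owned XSKMAP and the MsQuic insert/teardown paths, and no real XDP API is called. The sketch assumes a failed insert should revert the entries already added, and that teardown should only clear slots that still hold MsQuic's own socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAP_CAPACITY 8

/* Hypothetical stand-in for an app-owned XSKMAP; not the XDP API. */
typedef struct DEMO_XSK_MAP {
    uintptr_t Slots[DEMO_MAP_CAPACITY]; /* 0 means the slot is empty */
} DEMO_XSK_MAP;

/* Fails if the key is out of range or the slot is already occupied. */
bool DemoMapInsert(DEMO_XSK_MAP* Map, uint32_t Key, uintptr_t Xsk)
{
    assert(Map != NULL);
    if (Key >= DEMO_MAP_CAPACITY || Map->Slots[Key] != 0) {
        return false;
    }
    Map->Slots[Key] = Xsk;
    return true;
}

/* Best effort: clears the slot only if it still holds our socket. */
void DemoMapRemove(DEMO_XSK_MAP* Map, uint32_t Key, uintptr_t Xsk)
{
    assert(Map != NULL);
    if (Key < DEMO_MAP_CAPACITY && Map->Slots[Key] == Xsk) {
        Map->Slots[Key] = 0;
    }
}

/* Inserts Xsk into every map; on failure, reverts the earlier inserts. */
bool DemoInsertXskIntoMaps(
    DEMO_XSK_MAP** Maps, size_t MapCount, uint32_t Key, uintptr_t Xsk)
{
    for (size_t i = 0; i < MapCount; i++) {
        if (!DemoMapInsert(Maps[i], Key, Xsk)) {
            while (i > 0) {
                DemoMapRemove(Maps[--i], Key, Xsk);
            }
            return false;
        }
    }
    return true;
}

/* Teardown counterpart: removes our socket from every map. */
void DemoRemoveXskFromMaps(
    DEMO_XSK_MAP** Maps, size_t MapCount, uint32_t Key, uintptr_t Xsk)
{
    for (size_t i = 0; i < MapCount; i++) {
        DemoMapRemove(Maps[i], Key, Xsk);
    }
}
```

Comparing the stored value before clearing keeps the removal safe if the app has already replaced or deleted the entry, which fits the split where the app owns the map and MsQuic owns the XSK.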
| } | ||
|
|
||
| const uint32_t FakeIfIndex = 0xDEAD; | ||
| const QUIC_XDP_MAP_HANDLE FakeHandle = (QUIC_XDP_MAP_HANDLE)(intptr_t)-1; |
You might be wondering: why -1? It was previously 0x1234 (which I thought was out of bounds for Windows handles).
Why not NULL or INVALID_HANDLE (0)? Indeed, 0 is invalid on Windows but not on POSIX.
Given that the general abstraction of raw-only-datapath mode is cross-platform, POSIX runs these tests too, and so we can add that coverage.
That's a good point. Maybe we need a cross-plat abstraction for "invalid [socket|map|file] handle" which can either be a value we know must be invalid at compile time, or open some dynamically created object (e.g., an event object) that we expect will mismatch the expected handle type.
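The compile-time variant of that abstraction might look like the sketch below. `TEST_OS_HANDLE`, `TEST_INVALID_OS_HANDLE`, and `TestFdIsRejectedByOs` are hypothetical names, not MsQuic or test-framework identifiers. The value -1 is chosen because it matches `INVALID_HANDLE_VALUE` on Windows and can never be an open file descriptor on POSIX, whereas 0 is a valid descriptor there. The dynamic alternative mentioned above (opening an object of the wrong type) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef _WIN32
#include <errno.h>
#include <fcntl.h>
#endif

/* Hypothetical test-only handle type, wide enough for a Windows HANDLE. */
typedef intptr_t TEST_OS_HANDLE;
static_assert(sizeof(TEST_OS_HANDLE) >= sizeof(void*), "handle must fit a pointer");

/*
 * Hypothetical constant: -1 equals INVALID_HANDLE_VALUE on Windows and is
 * never a valid file descriptor on POSIX. Zero would not work on POSIX,
 * where descriptor 0 is normally stdin.
 */
#define TEST_INVALID_OS_HANDLE ((TEST_OS_HANDLE)-1)

#ifndef _WIN32
/* POSIX-only probe: true if the OS reports the descriptor as not open. */
bool TestFdIsRejectedByOs(int Fd)
{
    errno = 0;
    return fcntl(Fd, F_GETFD) == -1 && errno == EBADF;
}
#endif
```

A test could then pass `TEST_INVALID_OS_HANDLE` wherever a bad map handle is needed, without a per-platform literal in each test case.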
mtfriesen left a comment
We need to solve the testing problem - map mode should run all existing MsQuic tests that are not intrinsically incompatible with an XDP map.
Nearly all MsQuic protocol level tests should be compatible with XDP maps, so getting those exercised is a min bar.
| CxPlatInitialize(); | ||
|
|
||
| CXPLAT_DATAPATH* Datapath; | ||
| CXPLAT_DATAPATH* Datapath = NULL; |
nitnit: this is C++, nullptr or default init should be preferred.
| CXPLAT_DATAPATH* Datapath = NULL; | |
| CXPLAT_DATAPATH* Datapath{}; |
| if (Source->IsSet.XdpEnabled && !Source->XdpEnabled && MsQuicLib.XdpMapConfigCount > 0) { | ||
| QuicTraceLogError( | ||
| SettingXdpDisabledInMapMode, | ||
| "[ lib] Error: XdpEnabled cannot be set to FALSE when XDP map mode is active."); |
nit: let's make logs slightly more actionable for users. "xdp map mode" is our current name for this feature, but we don't expose it in the API. Let's also avoid double negatives.
| "[ lib] Error: XdpEnabled cannot be set to FALSE when XDP map mode is active."); | |
| "[ lib] Error: Xdp must be enabled when an XDP map was configured."); |
| // | ||
| // N.B. Currently only supported for Windows user-mode. | ||
| // | ||
| const struct QUIC_XDP_MAP_CONFIG* XdpMapConfigs; |
nit: consider SAL to annotate the length.
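For illustration, a length annotation on the pointer could look like the sketch below. `_Field_size_` is the standard SAL annotation for a struct field whose element count is given by a sibling field; everything else is assumed: `DEMO_XDP_MAP_CONFIG` stands in for `QUIC_XDP_MAP_CONFIG` with an invented layout, `DEMO_INIT_CONFIG` is a hypothetical container, and the fallback macro only exists so the sketch builds without MSVC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <sal.h>
#elif !defined(_Field_size_)
#define _Field_size_(Count) /* SAL is a no-op outside MSVC in this sketch */
#endif

/* Hypothetical layout; the real QUIC_XDP_MAP_CONFIG fields are not shown here. */
typedef struct DEMO_XDP_MAP_CONFIG {
    uint32_t InterfaceIndex;
    void* MapHandle;
} DEMO_XDP_MAP_CONFIG;

/* Hypothetical container showing the annotated pointer/count pair. */
typedef struct DEMO_INIT_CONFIG {
    uint32_t XdpMapConfigCount;
    _Field_size_(XdpMapConfigCount)
    const DEMO_XDP_MAP_CONFIG* XdpMapConfigs;
} DEMO_INIT_CONFIG;

/* Returns nonzero when the pointer and the count agree with each other. */
int DemoInitConfigIsValid(const DEMO_INIT_CONFIG* Config)
{
    assert(Config != NULL);
    return (Config->XdpMapConfigCount == 0) == (Config->XdpMapConfigs == NULL);
}
```

With the annotation in place, static analysis can flag reads past `XdpMapConfigCount` elements; the runtime check is only a companion sanity test.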
| // | ||
| // N.B. Currently only supported for Windows user-mode. | ||
| // | ||
| const struct QUIC_XDP_MAP_CONFIG* XdpMapConfigs; |
style: We don't repeat the "struct" keyword on use:
| const struct QUIC_XDP_MAP_CONFIG* XdpMapConfigs; | |
| const QUIC_XDP_MAP_CONFIG* XdpMapConfigs; |
| // The map configs must remain valid for the lifetime of the datapath. | ||
| // | ||
| // N.B. Currently only supported for Windows user-mode. | ||
| // |
nit: This comment is a bit overly detailed for the config parameter; it is likely to fall out of sync.
The details would be better in a design doc file or at the relevant point in the code.
| // | |
| // | |
| // External XDP map configurations. When present, the datapath insert XSK sockets in | |
| // the provided maps at instead of configuring per-connection rules. | |
| // The map configs must remain valid for the lifetime of the datapath. | |
| // | |
| // N.B. Currently only supported for Windows user-mode. | |
| // |
| // since we are skipping OS platform specific initializations. | ||
| // | ||
| CXPLAT_DBG_ASSERT(InitConfig->XdpMapConfigs != NULL); | ||
| if (NewDataPath == NULL) { |
| if (NewDataPath == NULL) { | ||
| return QUIC_STATUS_INVALID_PARAMETER; | ||
| } | ||
| if (UdpCallbacks != NULL) { |
If UdpCallbacks == NULL, shouldn't we fail? Or is it intentional to support a scenario with no callbacks?
I suspect this is an artifact of the "TCP or UDP callbacks are needed" logic, but we only have UDP here.
| DataPathInitialize( | ||
| if (InitConfig->XdpMapConfigCount > 0) { | ||
| // | ||
| // Raw-only datapath: the raw datapath must initialize successfully |
The logic in that first if branch would be better factored into a function.
Either having a RawOnlyDataPathInitialize function, or extracting the logic from DataPathInitialize that is always needed, so you can do a DataPathCommonInitialize followed by RawDataPathInitialize here.
CxPlatDataPathInitialize should only deal with dispatching, not have low level logic.
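One possible shape for that factoring is sketched below. All types, status codes, and `Demo*` functions are hypothetical and only mirror the names suggested in the comment; none of this is the PR's code. It assumes, as the diff comment states, that raw init failure is fatal in raw-only mode, and it additionally assumes that in the regular path a raw init failure is tolerated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef int DEMO_STATUS;
#define DEMO_STATUS_SUCCESS           0
#define DEMO_STATUS_INVALID_PARAMETER 1
#define DEMO_STATUS_OUT_OF_MEMORY     2

typedef struct DEMO_INIT_CONFIG { uint32_t XdpMapConfigCount; } DEMO_INIT_CONFIG;
typedef struct DEMO_DATAPATH { int RawOnly; int OsSocketsReady; int RawReady; } DEMO_DATAPATH;

/* State every mode needs: allocation and shared bookkeeping. */
static DEMO_STATUS DemoDataPathCommonInitialize(DEMO_DATAPATH** NewDataPath)
{
    *NewDataPath = calloc(1, sizeof(DEMO_DATAPATH));
    return *NewDataPath != NULL ? DEMO_STATUS_SUCCESS : DEMO_STATUS_OUT_OF_MEMORY;
}

/* Stand-in for the raw (XDP) datapath bring-up. */
static DEMO_STATUS DemoRawDataPathInitialize(DEMO_DATAPATH* Datapath)
{
    Datapath->RawReady = 1;
    return DEMO_STATUS_SUCCESS;
}

/* Raw-only mode: skips OS socket setup, and raw init failure is fatal. */
static DEMO_STATUS DemoRawOnlyDataPathInitialize(DEMO_DATAPATH** NewDataPath)
{
    DEMO_STATUS Status = DemoDataPathCommonInitialize(NewDataPath);
    if (Status != DEMO_STATUS_SUCCESS) {
        return Status;
    }
    (*NewDataPath)->RawOnly = 1;
    Status = DemoRawDataPathInitialize(*NewDataPath);
    if (Status != DEMO_STATUS_SUCCESS) {
        free(*NewDataPath);
        *NewDataPath = NULL;
    }
    return Status;
}

/* Regular mode: OS sockets plus raw datapath (failure tolerated, assumed). */
static DEMO_STATUS DemoFullDataPathInitialize(DEMO_DATAPATH** NewDataPath)
{
    DEMO_STATUS Status = DemoDataPathCommonInitialize(NewDataPath);
    if (Status != DEMO_STATUS_SUCCESS) {
        return Status;
    }
    (*NewDataPath)->OsSocketsReady = 1;
    (void)DemoRawDataPathInitialize(*NewDataPath);
    return DEMO_STATUS_SUCCESS;
}

/* Entry point only validates and dispatches; no low-level logic here. */
DEMO_STATUS DemoDataPathInitialize(
    const DEMO_INIT_CONFIG* InitConfig, DEMO_DATAPATH** NewDataPath)
{
    if (InitConfig == NULL || NewDataPath == NULL) {
        return DEMO_STATUS_INVALID_PARAMETER;
    }
    if (InitConfig->XdpMapConfigCount > 0) {
        return DemoRawOnlyDataPathInitialize(NewDataPath);
    }
    return DemoFullDataPathInitialize(NewDataPath);
}
```

Keeping the entry point down to parameter checks and a single branch makes the raw-only path testable on its own and leaves one place to read when a new mode is added.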
|
|
||
| CXPLAT_DBG_ASSERT(Datapath->UdpHandlers.Receive != NULL || Config->Flags & CXPLAT_SOCKET_FLAG_PCP); | ||
| CXPLAT_DBG_ASSERT(IsServerSocket || Config->PartitionIndex < Datapath->PartitionCount); | ||
| CXPLAT_DBG_ASSERT(CxPlatDpRawIsRawDatapathOnly(Datapath->RawDataPath) || IsServerSocket || Config->PartitionIndex < Datapath->PartitionCount); |
Question: Why do we need this change? It isn't clear to me why XDP maps would interfere with partitioning.
Description
Full E2E demo using tools from this PR: DEMO
Part 2 of the plan: #5982
Fixes #5972
Adds the core msquic datapath integrations for the new API contract defined in #5983
Key design decisions:
Testing
Added datapath unit tests. See the E2E demo.
Additionally, the plan is to leverage some supporting PRs that include code to ingest the latest XDP version, and to kick off a couple of manual CI runs based on that: https://github.com/microsoft/msquic/actions/runs/27313685557/job/80690420168
Documentation
Will come as part 3 in the plan: #5982