[CELEBORN-2317] Validate applicationId to prevent worker path traversal#3674
afterincomparableyum wants to merge 3 commits into
I'll fix CI issues and push.
1a54265 to 6e65966
Thanks for the fix! The path traversal issue is real and the approach is solid. One suggestion: I think the So the checks added in That said, the canonical path containment check in
Reviewed with Claude Code
ping @RexXiong I have addressed your comment.
05aa4ca to 1c79a19
ping @SteNicholas @RexXiong could you please review this when you get the chance.
1d92a40 to cf8d472
Review: Validate applicationId to prevent worker path traversal
Good catch on the underlying issue, and the fix targets the right sink. The worker builds
What works well
Issues
Nice, well-scoped security fix overall. The main blocker is item 1 (make the description and code agree), and item 2 is a quick hardening of the validator itself.
The worker builds local shuffle paths by concatenating the applicationId received over RPC: `<workingDir>/<appId>/<shuffleId>/<fileName>`. The shuffleId is an int32 and fileName is built from int id/epoch/mode, but applicationId was never validated against a charset. With auth disabled, any client on the network could supply `appId = "../foo"` and have the worker mkdir, create, or delete files outside its working dir. With auth enabled, the SASL clientId == applicationId equality check in RpcEndpoint did not constrain format, so a tenant whose registered id contained `..` could still escape.
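To make the escape concrete, here is a minimal sketch of how a naive join of the path components behaves. The class `TraversalDemo`, the helper `naiveShufflePath`, the sample working directory, and the `0-0-0` file name are hypothetical and not taken from the Celeborn code base; only the `<workingDir>/<appId>/<shuffleId>/<fileName>` layout comes from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class TraversalDemo {
    // Hypothetical helper: joins the components in the documented layout
    // without validating appId, then collapses any ".." segments.
    static Path naiveShufflePath(String workingDir, String appId, int shuffleId, String fileName) {
        return Paths.get(workingDir, appId, String.valueOf(shuffleId), fileName).normalize();
    }

    public static void main(String[] args) {
        // Assumed working directory, for illustration only.
        Path workingDir = Paths.get("/data/celeborn-worker").normalize();

        Path ok = naiveShufflePath(workingDir.toString(), "application_1700000000000_0001", 3, "0-0-0");
        Path bad = naiveShufflePath(workingDir.toString(), "../foo", 3, "0-0-0");

        // A well-formed id stays under the working dir; "../foo" resolves to a sibling of it.
        System.out.println(ok.startsWith(workingDir));   // true
        System.out.println(bad.startsWith(workingDir));  // false
    }
}
```

Because the int-derived components cannot carry separators or `..`, the applicationId is the only segment an attacker can use to steer the path.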
This change:
- Adds Utils.validateAppId enforcing `^[A-Za-z0-9_-]+$`, which matches
Spark (`application_<ts>_<n>`, `local-<ts>`), Flink, and MR formats.
- Calls it at every worker RPC entry point that takes an applicationId
or shuffleKey from the wire: Controller (ReserveSlots, CommitFiles,
DestroyWorkerSlots), PushDataHandler.handleCore, and the two
checkAuth sites in FetchHandler.
- Adds a canonical path containment check in
StorageManager.createDiskFile (local disk branch) as defense in
depth, before mkdirs() runs.
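The two layers listed above (charset validation plus a canonical path containment check) could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: `AppIdGuard`, `resolveInside`, and the exception types are hypothetical, and the real `Utils.validateAppId` and `StorageManager.createDiskFile` may differ in signature and behavior. Only the regex is taken from the description.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.regex.Pattern;

public class AppIdGuard {
    // Charset quoted in the description: letters, digits, underscore, hyphen.
    private static final Pattern APP_ID = Pattern.compile("^[A-Za-z0-9_-]+$");

    // Hypothetical stand-in for Utils.validateAppId.
    static void validateAppId(String appId) {
        // matches() requires the whole input to match, so a trailing newline is rejected too.
        if (appId == null || !APP_ID.matcher(appId).matches()) {
            throw new IllegalArgumentException("Invalid applicationId: " + appId);
        }
    }

    // Defense in depth: canonicalize the target and require it to stay under workingDir.
    // Intended to run before any mkdirs() or file creation.
    static File resolveInside(File workingDir, String appId, int shuffleId, String fileName)
            throws IOException {
        File base = workingDir.getCanonicalFile();
        File shuffleDir = new File(new File(base, appId), String.valueOf(shuffleId));
        File target = new File(shuffleDir, fileName).getCanonicalFile();
        if (!target.toPath().startsWith(base.toPath())) {
            throw new IOException("Path escapes working dir: " + target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        validateAppId("application_1700000000000_0001"); // accepted
        validateAppId("local-1700000000000");            // accepted
        try {
            validateAppId("../foo");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected by validator: " + e.getMessage());
        }

        File workingDir = Files.createTempDirectory("worker-demo").toFile();
        System.out.println(resolveInside(workingDir, "app_1", 3, "0-0-0"));
        try {
            resolveInside(workingDir, "../foo", 3, "0-0-0");
        } catch (IOException e) {
            System.out.println("rejected by containment check: " + e.getMessage());
        }
    }
}
```

The containment check is independent of the validator, so it still catches an escape if some future call site forgets to validate the id.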
Single auth checkpoint for every current and future RPC. Dropped the redundant calls in Controller. Add comment for remote storages. Harden validateAppId
f4c68ba to 1924c26
Addressed your comment @SteNicholas
What changes were proposed in this pull request?
This change:
- Adds Utils.validateAppId enforcing `^[A-Za-z0-9_-]+$`, which matches Spark (`application_<ts>_<n>`, `local-<ts>`), Flink, and MR formats.
Why are the changes needed?
The worker builds local shuffle paths by concatenating the applicationId received over RPC: `<workingDir>/<appId>/<shuffleId>/<fileName>`. The shuffleId is an int32 and fileName is built from int id/epoch/mode, but applicationId was never validated against a charset. With auth disabled, any client on the network could supply `appId = "../foo"` and have the worker mkdir, create, or delete files outside its working dir. With auth enabled, the SASL clientId == applicationId equality check in RpcEndpoint did not constrain format, so a tenant whose registered id contained `..` could still escape.
Does this PR resolve a correctness bug?
Yes
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests/CI