[ExecuTorch] Partitioner: emit target_device CompileSpec by shoumikhin · Pull Request #4272 · pytorch/TensorRT

shoumikhin · 2026-05-16T01:30:31Z

What

Mirror executorch/backends/cuda/cuda_partitioner.py's pattern in TensorRTPartitioner: append a target_device="cuda:0" CompileSpec if not already present.

Why

ExecuTorch's PropagateDevicePass (which runs automatically in to_executorch()) reads the target_device CompileSpec from each delegate and tags the delegate's I/O TensorSpec.device. The tag is then serialized into extra_tensor_info.device_type in the .pte.
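A minimal sketch of the tagging step. It assumes the target_device value is a "<device>[:<index>]" byte string such as b"cuda:0", and that CUDA maps to device_type 1 (the value the test plan reports). CompileSpec here is a local stand-in with the same key/value shape as ExecuTorch's, and device_tag_from_specs is a hypothetical helper, not PropagateDevicePass itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CompileSpec:
    # Local stand-in: a string key plus a bytes value, like ExecuTorch's CompileSpec.
    key: str
    value: bytes


# Assumed numbering; the test plan reports device_type=1 for CUDA-tagged I/O.
DEVICE_TYPE_IDS = {"cpu": 0, "cuda": 1}


def device_tag_from_specs(specs: List[CompileSpec]) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: (device_type, device_index) from a target_device spec."""
    for spec in specs:
        if spec.key != "target_device":
            continue
        # b"cuda:0" -> ("cuda", ":", "0"); a bare b"cpu" has an empty index.
        name, _, index = spec.value.decode("utf-8").partition(":")
        device_index = int(index) if index else 0
        return DEVICE_TYPE_IDS[name], device_index
    # No target_device spec: nothing to propagate, so no device tag is produced.
    return None


print(device_tag_from_specs([CompileSpec("target_device", b"cuda:0")]))  # (1, 0)
```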

Today the tag is metadata only; ExecuTorch's default memory planner (enable_non_cpu_memory_planning=False) still allocates every tensor on CPU. Once a future ExecuTorch release wires up CUDA-aware memory planning and a CUDA allocator to back that memory, the tag becomes load-bearing: TRT subgraph I/O will be allocated directly on CUDA, eliminating the per-call host↔device staging.
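The gap between "tagged" and "allocated" can be modeled in a few lines. This is a toy model of the behavior described above, not ExecuTorch's memory planner: planned_device and needs_staging are hypothetical names, and only the enable_non_cpu_memory_planning flag name comes from ExecuTorch.

```python
def planned_device(tagged_device: str, enable_non_cpu_memory_planning: bool = False) -> str:
    """Toy model: where a tensor's buffer is allocated, given its device tag."""
    if not enable_non_cpu_memory_planning:
        # Current default: the tag is recorded in the .pte, but every buffer is planned on CPU.
        return "cpu"
    # Future path: the tag is honored (this also needs a CUDA allocator at runtime).
    return tagged_device


def needs_staging(buffer_device: str, delegate_device: str = "cuda:0") -> bool:
    """A TRT delegate must copy host<->device whenever its I/O lives elsewhere."""
    return buffer_device != delegate_device


for flag in (False, True):
    device = planned_device("cuda:0", enable_non_cpu_memory_planning=flag)
    print(flag, device, needs_staging(device))
# Prints:
# False cpu True
# True cuda:0 False
```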

This PR emits the metadata at .pte build time so users get the future fast path automatically, without needing a new torch-tensorrt release.

Pattern precedent

Direct copy of executorch/backends/cuda/cuda_partitioner.py:42-52.

Changes

py/torch_tensorrt/executorch/partitioner.py: append CompileSpec(TARGET_DEVICE_COMPILE_SPEC_KEY, b"cuda:0") to compile_specs in TensorRTPartitioner.__init__ unless the user already provided a target_device spec.

Also: defensive copy of caller-supplied compile_specs list (was aliased; now list(compile_specs) if compile_specs else []).
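A minimal sketch of the constructor logic described above. CompileSpec is a local stand-in shaped like ExecuTorch's (key: str, value: bytes) so the snippet runs without ExecuTorch installed; TensorRTPartitionerSketch is hypothetical and shows only the spec handling, not partitioning. The key string "target_device" is assumed to be the value of TARGET_DEVICE_COMPILE_SPEC_KEY.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompileSpec:
    # Local stand-in with the same shape as ExecuTorch's CompileSpec.
    key: str
    value: bytes


# Assumed value of TARGET_DEVICE_COMPILE_SPEC_KEY.
TARGET_DEVICE_COMPILE_SPEC_KEY = "target_device"


class TensorRTPartitionerSketch:
    """Hypothetical: only the compile_specs handling from TensorRTPartitioner.__init__."""

    def __init__(self, compile_specs: Optional[List[CompileSpec]] = None) -> None:
        # Defensive copy, so the append below never mutates the caller's list.
        specs = list(compile_specs) if compile_specs else []
        # Keep a user-supplied target_device; otherwise default to cuda:0.
        if not any(s.key == TARGET_DEVICE_COMPILE_SPEC_KEY for s in specs):
            specs.append(CompileSpec(TARGET_DEVICE_COMPILE_SPEC_KEY, b"cuda:0"))
        self.compile_specs = specs


user_specs = [CompileSpec("target_device", b"cuda:1")]
partitioner = TensorRTPartitionerSketch(user_specs)
print(partitioner.compile_specs is user_specs)  # False: the stored list is a copy
print(partitioner.compile_specs[0].value)       # b'cuda:1': the user's device is kept
```

With no specs, the sketch stores [CompileSpec("target_device", b"cuda:0")]; with a list that lacks target_device, the default is appended to the copy while the caller's list keeps its original contents.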

Notes

No backend code change. An earlier draft of this PR added a "trust the device tag → skip the cudaPointerGetAttributes call" fast path in TensorRTBackend::execute. That was incorrect: the tag is metadata, not a guarantee of pointer provenance. ExecuTorch's cuda_backend.cpp:498-502 documents this exact distinction. The backend continues to use cudaPointerGetAttributes as the source of truth.
No regression: old .pte files without extra_tensor_info are still parsed as CPU by the runtime, and the backend's existing auto-detect path handles them unchanged.
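The backend-side rule in these notes (the tag is a hint, the pointer query is ground truth) can be sketched as follows. extra_tensor_info is modeled as an optional dict and the pointer query is passed in as a callable; in the real backend that role is played by cudaPointerGetAttributes. Both modeling choices are simplifying assumptions, and all function names are hypothetical.

```python
from typing import Callable, Optional

CPU, CUDA = 0, 1  # assumed device_type values; the test plan reports 1 for CUDA


def tagged_device_type(extra_tensor_info: Optional[dict]) -> int:
    """Device tag as read from a .pte; missing info means CPU."""
    # Older .pte files carry no extra_tensor_info, so they are read as CPU.
    if extra_tensor_info is None:
        return CPU
    return extra_tensor_info.get("device_type", CPU)


def must_stage_to_device(ptr: int, is_device_pointer: Callable[[int], bool]) -> bool:
    """Decide on a host->device copy from the pointer itself, never from the tag."""
    return not is_device_pointer(ptr)


# Illustration only: pretend this single address is device memory.
DEVICE_ADDRESSES = {0x7F0000000000}


def fake_is_device_pointer(ptr: int) -> bool:
    return ptr in DEVICE_ADDRESSES


host_ptr = 0x1000
# A CUDA tag on a host pointer still triggers staging: prints "1 True".
print(tagged_device_type({"device_type": CUDA}), must_stage_to_device(host_ptr, fake_is_device_pointer))
```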

Test plan

Verified with a TRT-delegated .pte that extra_tensor_info.device_type=1 is emitted on delegate I/O tensors (portable runtime).
Verified that runtime numerical outputs are unchanged (backend still stages CPU→CUDA via cudaMemcpyAsync as before).
Verified that the caller's compile_specs list is not mutated after construction.

lanluo-nvidia

Could you please raise this against main? No more changes are going into 2.12. I will add a release note stating that the device always defaults to cuda:0.
Also, please make sure to change py/torch_tensorrt/compile.py:1245: the public ExecuTorch save path currently constructs TensorRTPartitioner() without any compile specs.


shoumikhin · 2026-05-19T22:15:13Z

Thanks @lanluo-nvidia for the review!

Rebased onto main and switched the base branch. Also added the compile.py change you asked for: _save_as_executorch now accepts a compile_specs= kwarg and forwards it to TensorRTPartitioner. So the public ExecuTorch save path gets the default target_device="cuda:0" automatically, and callers who want a different device can pass compile_specs=[CompileSpec("target_device", b"cuda:N")] to torch_tensorrt.save(...).
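A minimal sketch of the forwarding described above. save_as_executorch_sketch and PartitionerStub are hypothetical stand-ins for _save_as_executorch and TensorRTPartitioner; lowering and serialization are omitted, and the stub repeats only the default-device rule so the snippet is self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompileSpec:
    # Local stand-in with the same key/value shape as ExecuTorch's CompileSpec.
    key: str
    value: bytes


class PartitionerStub:
    """Stand-in for TensorRTPartitioner: defaults target_device to cuda:0."""

    def __init__(self, compile_specs: Optional[List[CompileSpec]] = None) -> None:
        specs = list(compile_specs) if compile_specs else []
        if not any(s.key == "target_device" for s in specs):
            specs.append(CompileSpec("target_device", b"cuda:0"))
        self.compile_specs = specs


def save_as_executorch_sketch(compile_specs: Optional[List[CompileSpec]] = None) -> PartitionerStub:
    # Previously the save path built the partitioner with no arguments, so callers
    # had no way to set target_device. Forwarding the kwarg lets them, and
    # passing nothing still yields the cuda:0 default.
    return PartitionerStub(compile_specs=compile_specs)


print(save_as_executorch_sketch().compile_specs)
print(save_as_executorch_sketch([CompileSpec("target_device", b"cuda:1")]).compile_specs)
```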

Docstring on save() and the unused-kwarg warning are both updated to mention compile_specs=. PTAL.

meta-cla Bot added the cla signed label May 16, 2026

github-actions Bot added the component: core, component: api [Python], and component: runtime labels May 16, 2026

github-actions Bot requested a review from cehongwang May 16, 2026 01:30

narendasan requested a review from lanluo-nvidia May 16, 2026 01:32

shoumikhin force-pushed the shoumikhin/executorch-device-tag-groundwork branch from 2de19fd to ac698c2 May 16, 2026 06:58

shoumikhin changed the title from "[ExecuTorch] Device-tag groundwork: emit CUDA target_device + honor it in backend" to "[ExecuTorch] Partitioner: emit target_device CompileSpec" May 16, 2026

shoumikhin force-pushed the shoumikhin/executorch-device-tag-groundwork branch 4 times, most recently from 64705aa to f940de1 May 17, 2026 05:12

lanluo-nvidia requested changes May 19, 2026


shoumikhin force-pushed the shoumikhin/executorch-device-tag-groundwork branch from f940de1 to 5b50cbf May 19, 2026 22:14

shoumikhin changed the base branch from release/2.12 to main May 19, 2026 22:15

shoumikhin wants to merge 1 commit into pytorch:main from shoumikhin:shoumikhin/executorch-device-tag-groundwork
