Feat/tasks mrtr extension#262
Open
panyam wants to merge 16 commits into modelcontextprotocol:main from
Adds the first scenario for the SEP-2663 io.modelcontextprotocol/tasks extension: a single TasksLifecycleScenario covering sync vs async dispatch, DetailedTask shape on tasks/get, tool errors vs protocol errors, and cancellation semantics. 8 ConformanceCheck records, all passing against a SEP-2663-conformant Go fixture.
Why "tasks" (not "tasks-v2"): SEP-2663 IS the tasks surface once it lands; the v2 suffix is only meaningful in implementations that maintain a v1 surface alongside, which the conformance suite does not.
Layout:
- src/scenarios/server/tasks/lifecycle.ts - scenario class
- src/scenarios/server/tasks/helpers.ts - raw-fetch escape hatch (the SDK's typed schemas strip resultType/inputRequests/...)
- src/scenarios/server/tasks/lifecycle.test.ts - fork-local vitest runner. Two modes: spawn a fixture binary via MCPKIT_TASKS_BINARY, or point at an already-running server via MCPKIT_TASKS_SERVER_URL. Skips when neither is set so it doesn't break upstream CI runs that go through everything-server (which doesn't yet implement io.modelcontextprotocol/tasks).
Scenario is registered in pendingClientScenariosList so all-scenarios.test.ts skips it; promote to active once the upstream fixture grows extension support. Tagged ['extension', DRAFT_PROTOCOL_VERSION] - selectable via --suite extensions and --spec-version draft.
Builds out the rest of the tasks scenarios (atop the lifecycle canary)
and adds the SEP-2322 ephemeral MRTR scenario in a sibling folder.
Both target their own fixtures; both runners are brand-neutral and
language-agnostic (TASKS_SERVER_URL / TASKS_SERVER_CMD,
MRTR_SERVER_URL / MRTR_SERVER_CMD; readiness via TCP polling).
Tasks ClientScenario classes:
- TasksLifecycleScenario (8 checks; v2-01..v2-08)
- TasksCapabilityNegotiationScenario (4 checks; v2-11/22/23/25, SEP-2575)
- TasksWireFieldsScenario (3 checks; v2-12/13/21)
- TasksRequestStateScenario (3 checks; v2-14/15/28)
- TasksMRTRInputScenario (3 checks; v2-16/17/29 partial fulfillment)
- TasksRequestHeadersScenario (3 checks; SEP-2243 request-header tolerance)
- TasksDispatchScenario (8 checks; v2-09/10/19/20/26/27/30/31)
- TasksStatusNotificationsScenario (1 check; SEP-2663 §notifications, optional)
MRTR ClientScenario class:
- MrtrEphemeralFlowScenario (7 checks + 1 SKIPPED; mrtr-01..07,
mrtr-08 deferred for spec terminology +
reference-impl reasons)
Both runners spawn the fixture via a shell command and detect readiness
by TCP-polling the URL's host/port — no log-line scanning, no
language-specific assumptions. The same env vars work for any server
implementation.
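The TCP-polling readiness check described above can be sketched roughly as follows. This is a minimal illustration, not the fork's actual helper: the function signature, the default timeouts, and the `canConnect` helper are assumptions made for the example.

```typescript
import * as net from "node:net";

// Hypothetical sketch: resolves once a TCP connection to the URL's host/port
// succeeds, and rejects once the deadline has passed. Defaults are assumptions.
export async function waitForServerReady(
  url: string,
  timeoutMs = 10_000,
  intervalMs = 100,
): Promise<void> {
  const { hostname, port, protocol } = new URL(url);
  // Fall back to the scheme's default port when the URL omits one.
  const portNum = port ? Number(port) : protocol === "https:" ? 443 : 80;
  const deadline = Date.now() + timeoutMs;

  while (true) {
    if (await canConnect(hostname, portNum)) return;
    if (Date.now() >= deadline) {
      throw new Error(
        `server at ${hostname}:${portNum} not ready after ${timeoutMs}ms`,
      );
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// One connection attempt: true on connect, false on socket error or timeout.
function canConnect(host: string, port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(1_000, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}
```

Because the check only opens and closes a socket, it makes no assumption about what the server logs or which language it is written in, which is the property the runners rely on.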
Scenarios are tagged ['extension', DRAFT_PROTOCOL_VERSION] and registered
in pendingClientScenariosList so all-scenarios.test.ts (which targets
the upstream everything-server) skips them until the fixture grows
SEP-2322 / SEP-2663 support.
Restructured around ClientScenario classes (one row per class with check-list under it) rather than per-numbered-test slugs. Documents fixture requirements, env vars, open spec questions, and the wire-format diff for each suite. Per AGENTS.md, severity follows spec keyword (MUST/MUST NOT → FAILURE, SHOULD/SHOULD NOT → WARNING). The READMEs explain why some checks emit INFO rather than FAILURE (optional emission paths per SEP-2322).
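As a rough illustration of the severity rule above, the mapping from spec keyword to check status might look like the sketch below. The `Keyword` and `Status` types and the `statusFor` function are hypothetical names, and treating an unobserved MAY behavior as INFO is an assumption based on the "optional emission paths" note.

```typescript
// Hypothetical types for illustration; the suite's real types may differ.
type Keyword = "MUST" | "MUST NOT" | "SHOULD" | "SHOULD NOT" | "MAY";
type Status = "SUCCESS" | "FAILURE" | "WARNING" | "INFO";

// Maps the governing spec keyword and the observed outcome to a check status.
function statusFor(keyword: Keyword, passed: boolean): Status {
  if (passed) return "SUCCESS";
  // Assumption: optional behavior that was not observed is informational only.
  if (keyword === "MAY") return "INFO";
  // MUST / MUST NOT violations fail; SHOULD / SHOULD NOT violations warn.
  return keyword.startsWith("MUST") ? "FAILURE" : "WARNING";
}
```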
panyam added a commit to panyam/mcpkit that referenced this pull request on May 5, 2026
The bulk of the v2 tasks + MRTR conformance lives in the upstream-bound fork now (panyam/mcpconformance, branch feat/tasks-mrtr-extension; upstream Draft PR modelcontextprotocol/conformance#262). Updates the README/WALKTHROUGH/walkthrough.go references in examples/tasks-v2 + examples/mrtr to point at the fork, the migration guide (docs/TASKS_V2_MIGRATION.md) likewise, and the matching Go test skip (server/mrtr_test.go) to point at the new conformance scenario path. No runtime changes.
panyam added a commit to panyam/mcpkit that referenced this pull request on May 6, 2026
Compress CLAUDE.md's Conformance section to a one-liner roll-up + add a Gotchas bullet for MCPCONFORMANCE_PATH (the env var the new testconf-tasks-v2 / testconf-mrtr targets shell into). The detailed fork-vs-local layout already lives in CAPABILITIES.md mcp-tasks-v2-conformance. Add a "Final disposition" footer to docs/SEP_2663_TASKS_CONFORMANCE_PLAN.md recording the graduation upstream (panyam/mcpconformance fork branch feat/tasks-mrtr-extension, Draft PR modelcontextprotocol/conformance#262) and noting the mcpkit-local folders are now vitest sentinels reserved for future mcpkit-stricter scenarios. No memory pruning — the four feedback notes are still working guidance, not duplicates of checked-in docs.
Two reviewer-driven additions:
1. SEP-2663 createdAt / lastUpdatedAt ISO-8601 assertion in `tasks-server-task-creation` (per Luca's PR modelcontextprotocol#262 review feedback). The check now flags servers that emit non-ISO timestamps (epoch seconds, RFC-2822, etc.) on TaskInfoV2 envelopes.
2. Factor cross-cutting test-harness helpers into _shared/:
   - `_shared/test-runner.ts` - `waitForServerReady` (renamed from `waitForTcpReady`; the call site cares about server readiness, not the TCP-poll mechanism). Imported by tasks/ and mrtr/ all-scenarios.test.ts; replaces ~30 LOC of inline duplication in each.
   - `_shared/wire-format.ts` - `ISO_8601_PATTERN` constant + `isIso8601(s)` predicate. Documented rationale for choosing a regex over `Date.parse` (too permissive), `new Date(s).toISOString()` (too strict), or `Temporal.Instant.from` (Node 24+ experimental). Future wire-shape predicates (data URI, percent-encoded filename, etc.) can land here.
Cherry-pick footprint when graduating to upstream PR is the SEP folder + the imported `_shared/` files. First PR through carries them upstream; subsequent feat branches inherit via standard upstream-sync flow. All 9 scenario tests still pass against the Go reference fixtures.
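A regex-based ISO-8601 predicate of the kind described above could look like the following sketch. The names `ISO_8601_PATTERN` and `isIso8601` come from the commit message, but the pattern itself is an assumption (date, `T`, time with optional fractional seconds, and a mandatory `Z` or `±hh:mm` offset); the fork's actual pattern may be stricter or looser.

```typescript
// Assumed pattern for illustration only; not copied from _shared/wire-format.ts.
const ISO_8601_PATTERN =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// True only for strings matching the assumed ISO-8601 timestamp shape.
function isIso8601(s: unknown): boolean {
  return typeof s === "string" && ISO_8601_PATTERN.test(s);
}

// Shapes the commit message says the check should flag:
const rejected = ["1746534896", "Wed, 06 May 2026 12:34:56 GMT"];
// Shapes this sketch accepts, including an explicit offset:
const accepted = ["2026-05-06T12:34:56Z", "2026-05-06T12:34:56.789+02:00"];
```

A shape check like this only validates the layout of the string; it does not verify that the calendar date itself exists.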
panyam added a commit to panyam/mcpconformance that referenced this pull request on May 6, 2026
Pure rename — the call site cares about server readiness, not the TCP-poll implementation detail. Matches the rename now landed on feat/tasks-mrtr-extension (PR modelcontextprotocol#262).
… helpers Drops initRawSession/rawRequest/rawRequestFull from tasks/helpers.ts in favor of the SDK's Client + StreamableHTTPClientTransport, paired with a Zod passthrough schema (AnyResult) that preserves SEP-2663 / SEP-2322 draft fields the SDK's typed schemas would strip. headers.ts and notifications.ts keep a small inline fetch where the SDK can't reach: per-request HTTP headers (SEP-2243) and SSE notification observation. Both reuse the SDK session via transport.sessionId. All SEP-2663 + MRTR ephemeral-flow scenarios pass against the Go fixture.
PR 2663 commit 62758914 standardised every duration field on the Ms suffix, integer milliseconds. wire-fields.ts now asserts ttlMs and pollIntervalMs are present on CreateTaskResult, the legacy v1 ttl and pollInterval keys are absent (already covered), and the interim ttlSeconds / pollIntervalMilliseconds keys are also absent on a post-2026-05-07 server. lifecycle.ts and the scenario README pick up matching prose updates. Verified by make testconf-tasks-v2 (8/8) against a renamed mcpkit fixture, and make testconf-mrtr (7/7 + 1 SKIPPED) against the paired MRTR surface.
SEP-2322 merged on 2026-05-06 with the variant renamed from IncompleteResult to InputRequiredResult and the resultType discriminator from "incomplete" to "input_required" (commit de6d76fb, per dsp-ant request). The MRTR_INCOMPLETE_RESULT_TYPE constant was specifically designed as a one-line flip point for this scenario.
Renames
- MRTR_INCOMPLETE_RESULT_TYPE = "incomplete" -> MRTR_INPUT_REQUIRED_RESULT_TYPE = "input_required"
- isIncompleteResult -> isInputRequiredResult
- All "IncompleteResult" -> "InputRequiredResult" in scenario prose and check descriptions (ephemeral-flow.ts, README.md)
SEP-2663 had not yet flipped its discriminator literal as of PR head 82fb2c4d (5/7 21:52 UTC). Caitie's 5/15 RC commitment (issue comment 4384052694 on PR 2322) tracks the alignment to "input_required" both sides. The constant remains the one-line flip point in case the 2663 follow-up surprises us.
Tested via mcpkit's make testconf-mrtr (7/7 + 1 SKIPPED green against a renamed mcpkit fixture) and make testconf-tasks-v2 (8/8 still green, no regressions on the paired surface).
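The "one-line flip point" idea can be illustrated with a short sketch. The constant and predicate names are taken from the rename list above; the `InputRequiredResult` interface shape and the predicate body are assumptions written for this example, not the scenario's actual code.

```typescript
// Single source of truth for the discriminator literal; changing this one
// line retargets every check that goes through the predicate below.
const MRTR_INPUT_REQUIRED_RESULT_TYPE = "input_required";

// Assumed minimal shape: only the discriminator is modeled here.
interface InputRequiredResult {
  resultType: typeof MRTR_INPUT_REQUIRED_RESULT_TYPE;
  [key: string]: unknown;
}

// Type guard that narrows an arbitrary JSON-RPC result by its discriminator.
function isInputRequiredResult(value: unknown): value is InputRequiredResult {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as Record<string, unknown>).resultType ===
      MRTR_INPUT_REQUIRED_RESULT_TYPE
  );
}
```

With this layout, a server still emitting the pre-merge `"incomplete"` literal is simply not recognized by the predicate, so the mismatch surfaces in the check rather than being silently tolerated.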
Lefthook prettier reformatted column alignment on first push attempt; README also had a stale "renamed from InputRequiredResult" — should read "renamed from IncompleteResult". Fix both.
Two stale references in the mrtr-tasks-composition SKIPPED check:
- Comment block + errorMessage framed blocker (a) as "spec authors disagree" / "input_required vs incomplete". SEP-2322 merged 2026-05-06 with "input_required" (commit de6d76fb). The blocker now reads as "SEP-2663 has not yet aligned to the merged 2322 literal"; Caitie's 5/15 RC commitment (PR 2322 issue comment 4384052694) tracks the alignment.
- errorMessage referenced "IsIncomplete signal"; that field was renamed to IsInputRequired on the mcpkit side in lockstep with the SEP-2322 wire-variant rename. Updated to match.
Status stays SKIPPED because blocker (b), the mcpkit middleware refactor (issue 347), is still open.
…sage After SEP-2322 merged with "input_required", the only blocker that actually keeps mrtr-08 SKIPPED is the eager-task-creation pattern in reference-server middleware (panyam/mcpkit issue 347). The earlier two-blocker framing read as if the test were waiting on both, but blocker (a) is effectively resolved for any server that emits the merged-2322 literal — leaving (b) as the sole gate. Tighten the comment block + description + errorMessage to lead with the middleware refactor and demote the discriminator history to a parenthetical aside.
LucaButBoring suggested changes on May 8, 2026
  if (!('ttlMs' in result)) {
    errs.push(
-     'CreateTaskResult MUST carry ttlSeconds (renamed from v1 `ttl`)'
+     'CreateTaskResult MUST carry ttlMs (renamed from v1 `ttl` and from the interim `ttlSeconds`)'
We should be sure to clean up these interim comments before finalizing this (they're useful for keeping up with the SEP changes though)
Author
Done - time to start cleaning up anyway
-   result.ttlSeconds !== null &&
-   (typeof result.ttlSeconds !== 'number' || result.ttlSeconds <= 0)
+   result.ttlMs !== null &&
+   (typeof result.ttlMs !== 'number' || result.ttlMs <= 0)
We should add a Number.isInteger() check to this and pollIntervalMs.
Two requested changes from the SEP-2663 author's review pass on
modelcontextprotocol/conformance PR 262
(pullrequestreview-4254601106).
(1) Drop the interim-key absence checks. The transition window between
"ttlSeconds / pollIntervalMilliseconds" and the final
"ttlMs / pollIntervalMs" wording is over now that the merged
spec settled on the Ms suffix. Useful while the spec was in
flight, noise once it stabilized. Removes the absence checks
plus the surrounding comment + description + details fields.
(2) Add Number.isInteger() to ttlMs and pollIntervalMs validation.
Spec says integer milliseconds; the previous typeof + range check
would have allowed fractional values. Now both fields fail if
they're not integers.
README scenario table tightened: "ttlMs + pollIntervalMs present and
integer-valued; legacy ttl / pollInterval keys absent".
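The tightened validation from item (2) might be sketched as below. The helper name `checkDurationMs` and its `allowNull` parameter are hypothetical; the null tolerance for `ttlMs` mirrors the reviewed diff, while rejecting null for `pollIntervalMs` in the usage lines is an assumption.

```typescript
// Hypothetical helper: returns a list of error strings (empty when valid).
function checkDurationMs(
  result: Record<string, unknown>,
  field: "ttlMs" | "pollIntervalMs",
  allowNull: boolean,
): string[] {
  if (!(field in result)) {
    return [`CreateTaskResult MUST carry ${field}`];
  }
  const value = result[field];
  if (value === null) {
    return allowNull ? [] : [`${field} must not be null`];
  }
  // typeof + range alone would accept 1500.5; Number.isInteger closes that gap.
  if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
    return [`${field} must be a positive integer number of milliseconds`];
  }
  return [];
}

// Usage sketch against a CreateTaskResult-like object.
const sample = { ttlMs: 60_000, pollIntervalMs: 1500.5 };
const errs = [
  ...checkDurationMs(sample, "ttlMs", true),
  ...checkDurationMs(sample, "pollIntervalMs", false), // assumption: no null here
];
```

In this usage, `errs` holds a single entry, for the fractional `pollIntervalMs`.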
Summary
Adds server-conformance scenarios for SEP-2663 (Tasks Extension), with incidental coverage of SEP-2575 (per-request capability override) and SEP-2243 (Mcp-Method/Mcp-Name request headers) in the parts of the surface where they bind to tasks. Plus one new MRTR-adjacent check (`mrtr-tasks-composition`, currently `SKIPPED`) for the SEP-2663 commit `451f5e1` MRTR→Tasks promotion flow. 8 ClientScenario classes / ~33 internal checks for tasks plus 1 class / 7 SUCCESS + 1 SKIPPED for the MRTR↔Tasks composition placeholder. Tagged `['extension', DRAFT_PROTOCOL_VERSION]` per #255 conventions and registered in `pendingClientScenariosList` so default `everything-server` runs stay green.
Motivation and Context
SEP-2663 (Tasks Extension), SEP-2575, and SEP-2243 are in active draft and currently have no conformance coverage in this repo. SDKs implementing them - including ones already shipping reference servers - have nothing to validate against, so wire-shape regressions and edge-case behavior (cancellation semantics, requestState handling, capability gating) slip through SDK-internal tests. This PR fills that gap. The new scenarios assert what the spec text says, not what any specific implementation does, so any SDK can run them.
The MRTR↔Tasks composition placeholder (`mrtr-tasks-composition`, `SKIPPED`) is a forward-looking marker for SEP-2663 commit `451f5e1`, which made the "MRTR rounds then promote to a task" flow normative on the wire - see the open spec questions below for why it's deferred.
How Has This Been Tested?
Run end-to-end against a reference Go fixture from the in-flight `panyam/mcpkit` SDK:
Branch results:
The runner is brand-neutral and language-agnostic - fixture wired via env vars, spawn via `sh -c`, readiness via TCP polling, no log-line scanning. Anyone's server in any language works. Reference fixtures:
`npm test` against the upstream `everything-server` continues to pass - the new scenarios live in `pendingClientScenariosList` so `all-scenarios.test.ts` skips them until `everything-server` grows extension support.
Breaking Changes
None. All new scenarios are additive and tagged as `'extension'` + `DRAFT_PROTOCOL_VERSION`, so they're invisible to dated `--spec-version` runs and only appear under `--suite extensions` or `--spec-version draft`. Default CI runs against `everything-server` are unaffected (the new scenarios are filtered out via `pendingClientScenariosList`).
Types of changes
Checklist
Additional context
Relationship to PR #188 (SEP-2322 MRTR)
Complementary, not overlapping. SEP-2663 builds on SEP-2322's base types, so a few of the tasks scenarios touch the MRTR shape (`inputRequests`, `requestState`, `resultType`) in their tasks-on-the-wire form (`status:"input_required"` on `tasks/get`, `tasks/update` resume path, partial inputResponses fulfillment). The standalone-ephemeral-MRTR coverage stays in #188.
The branch also contains a `src/scenarios/server/mrtr/` folder with ephemeral-flow scenarios mirroring some of #188's checks. Those exist because the tasks reference fixture exercises the full MRTR base, and running them locally caught a real bug. For this upstream merge:
- … `mrtr-ephemeral-flow.ts` checks in favor of #188's scenarios (Conformance Tests for SEP-2322 MRTR).
- … `mrtr-tasks-composition` check (currently `SKIPPED`) is the genuinely new contribution this PR makes to MRTR coverage.
Scope (8 ClientScenario classes, ~33 checks)
- `tasks-lifecycle` - sync vs task dispatch, DetailedTask shape, tool errors vs protocol errors, cancel ack, cancel-on-terminal -32602
- `tasks-capability-negotiation` - extension advertised under `capabilities.extensions`; `tasks/*` gated behind negotiation; SEP-2575 per-request opt-in
- `tasks-wire-fields` - `ttlSeconds` / `pollIntervalMilliseconds` renames, no early TTL expiry, no `related-task` `_meta` on inlined result
- `tasks-request-state` - optional emission, echo acceptance, stale-but-valid tolerance (tasks-surface form)
- `tasks-mrtr-input` - inputRequests on tasks/get, tasks/update resume, partial-fulfillment with multi-input fixture
- `tasks-request-headers` - SEP-2243 server tolerates routing headers; body authoritative when conflicting
- `tasks-dispatch-and-envelope` - removed v1 methods (-32601), legacy `task` param ignored, `resultType:"complete"` on every non-task response, strong-consistency immediate tasks/get, unknown taskId -32602
- `tasks-status-notifications` - optional INFO check (notifications are MAY per spec)
Plus `mrtr-ephemeral-flow` (1 class / 7 SUCCESS + 1 SKIPPED) under `src/scenarios/server/mrtr/`.
Design highlights
- … `TASKS_SERVER_URL` / `TASKS_SERVER_CMD` (and `MRTR_SERVER_URL` / `MRTR_SERVER_CMD`). Spawn via `sh -c`, readiness via TCP polling, no log-line scanning. Suite is `describe.skip`'d when env vars are unset.
- … `resultType`, `taskId`, `inputRequests`, `requestState`, inlined result/error). Helpers in `src/scenarios/server/tasks/helpers.ts` provide `initRawSession` + `rawRequest` / `rawRequestFull` so scenarios read those fields directly. When the SDK gains schemas for SEP-2663 shapes, the call sites switch back to `client.request(..., AnyResult)` and the helper shrinks. Similar in spirit to the raw-MCP additions in #188 (Conformance Tests for SEP-2322 MRTR) - could converge on a shared helper.
- … `pendingClientScenariosList` - `all-scenarios.test.ts` skips them since `everything-server` doesn't implement the extension yet. CLI lookup (`getClientScenario(name)`) still finds them.
Open spec questions
1. MRTR `resultType` discriminator value. SEP-2322's draft uses `"input_required"`; SEP-2663's draft uses `"incomplete"`. Centralized as `MRTR_INCOMPLETE_RESULT_TYPE` for a one-line flip when SEP authors converge. Tracked at modelcontextprotocol/modelcontextprotocol PR 2663 comment 4381885336 and PR 2322 comment 4381884825.
2. `mrtr-tasks-composition`. SEP-2663 commit `451f5e1` made the MRTR→Tasks promotion flow normative on the wire: a single `tools/call` MAY exchange one or more `IncompleteResult` rounds and then return `CreateTaskResult` on a subsequent round. Implementing this requires the server middleware to defer task creation until the handler signals async-promotion - the natural alternative (mint the task up-front the moment a tool advertises task support) doesn't fit, because by the time the handler's `IsIncomplete` signal is observable, the `CreateTaskResult` is already on the wire. This is a wire-contract requirement, not an SDK-specific implementation choice; existing SDKs across languages that took the up-front pattern will need refactoring before this check can pass anywhere. Combined with item 1 above, that's why the check is `SKIPPED` today.
Closes: #261