Python: Fix duplicate item error for tool-using agents under service-side storage (#3295)#5690
Python: Fix duplicate item error for tool-using agents under service-side storage (#3295)#5690moonbox3 wants to merge 1 commit intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR targets a 400 “Duplicate item found with id fc_* …” error encountered when tool-using agents run with service-side storage (e.g., previous_response_id / conversation_id), by changing how the OpenAI Responses client prepares outbound input items during continuation requests and adding a workflow-layer safeguard.
Changes:
- Update the OpenAI Responses chat client message preparation to (conditionally) strip server-issued response items from the outgoing
inputwhen service-side storage is in use. - Add a defense-in-depth change in
AgentExecutor.run()to clearservice_session_idwhen a workflow replays tool history viaAgentExecutorRequest. - Update/add unit tests across OpenAI/Foundry clients and workflow tests; add an issue-tracking markdown writeup.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| python/packages/openai/agent_framework_openai/_chat_client.py | Adds a request-level storage flag to message preparation and strips server-issued items (reasoning/function_call/approval/local-shell-call) on continuation requests. |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Updates existing tests for the new signature/behavior and adds regression tests for stripping behavior under storage. |
| python/packages/foundry/tests/foundry/test_foundry_chat_client.py | Updates Foundry test usage to pass the new request_uses_service_side_storage flag. |
| python/packages/core/agent_framework/_workflows/_agent_executor.py | Clears service_session_id when a run request includes tool items (full-history replay scenario). |
| python/packages/core/tests/workflow/test_full_conversation.py | Removes the previous xfail and validates service_session_id is cleared on full-history replay. |
| issues/done/3295-02-strip-rule-fix.md | Adds a tracking doc describing the intended fix/contract and acceptance criteria. |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 91%
✓ Correctness
The PR correctly fixes the duplicate item error by stripping server-issued response items (function_call, reasoning, approval, local-shell-call) from the wire input when service-side storage is active. The logic is sound: strip_server_items is correctly derived from uses_service_side_storage, the function_result/function_call_output is correctly preserved (keeping call_id for server-side pairing), and the executor defense-in-depth only fires for the coordinator-replay path (AgentExecutorRequest with tool items). The from_response path is properly left unchanged. Reasoning items becoming unconditionally dropped is intentional and correct—the old code that kept them under storage+function_call was actually vulnerable to the same rs_* duplicate issue. No correctness bugs found.
✓ Security Reliability
The PR implements a well-structured fix for duplicate item errors when using service-side storage with tool-calling agents. The primary fix is at the chat-client message preparation chokepoint, gated on the existing request_uses_service_side_storage flag. A defense-in-depth change clears service_session_id in the executor when replayed tool items are detected. No security vulnerabilities, resource leaks, or unhandled failure modes were identified. The Content.additional_properties is always a dict (never None per _types.py:528-530), Message.contents is always a list (per _types.py:1744-1747), and session state mutation is safe in Python's async model. The default parameter value of request_uses_service_side_storage=True is fail-safe (strips potential duplicates rather than causing 400 errors).
✓ Test Coverage
The PR has strong test coverage. New regression tests directly validate the core fix (strip function_call/reasoning/approval/shell-call items under service-side storage, keep them without). All four affected content types have dedicated storage-on/off test cases. The executor defense-in-depth clearing is validated via an integration test (previously xfail, now passing). Existing tests were properly updated to explicitly pass the new flag. No significant test coverage gaps found.
✗ Design Approach
The PR’s new “strip all reasoning under service-side storage” rule conflicts with existing, still-authoritative OpenAI chat-client tests that require reasoning items to be preserved on continuation turns, except for attributed replay history. I did not find other design-approach issues with comparable evidence.
Suggestions
- Keep the new strip rule for duplicated server-issued items (function_call, approval, local-shell-call results) but preserve the existing reasoning contract: continuation requests should still carry live reasoning items, while only attributed-replay reasoning is omitted.
Automated review by moonbox3's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
e7e73e4 to
11dce09
Compare
…t#3295) Fixes microsoft#3295. When the OpenAI Responses chat client sends a request that carries previous_response_id / conversation_id / conversation, the server already has the prior turn's response items and rejects duplicates with "Duplicate item found with id fc_xxx". The chat client was re-sending them inline whenever the input messages still carried the items in additional_properties (workflow replay, history providers, etc.), which broke any tool-using agent with persistent history. Decisions: - Single chokepoint: _prepare_message_for_openai. When the resulting request uses service-side storage, drop function_call, reasoning, approval-request/response, and local-shell-call items from the wire input. Keep function_result with its call_id; the server pairs it to the prior function_call via that key. - function_result is preserved unconditionally except for the local-shell variant, which carries its own server-issued item id. - No public API change. Wire format change is subtractive and only on requests that would otherwise 400. - Re-pointed the strict-xfail in test_full_conversation.py from microsoft#4047 to microsoft#3295. Kept xfail because the test asserts executor-level session-id clearing, which is the defense-in-depth half tracked by 3295-03; this slice closes the wire-level half. Files: - python/packages/openai/agent_framework_openai/_chat_client.py: strip rule applied alongside the existing reasoning-item branch. - python/packages/openai/tests/openai/test_openai_chat_client.py: four new tests pin the contract (function_call, approval, local-shell-call stripped under storage; everything kept without storage). Updated pre-existing tests that exercised the storage-on path to either pass request_uses_service_side_storage=False explicitly or assert the new strip behavior. - python/packages/foundry/tests/foundry/test_foundry_chat_client.py: same explicit storage-off opt-in for the inherited test. - python/packages/core/tests/workflow/test_full_conversation.py: re-pointed xfail reason to microsoft#3295 and the executor-level follow-up. Notes for next iteration: - 3295-01 (HITL wire-format validation against live OpenAI/Foundry) was not run; it requires the user's API credentials. The PRD design is locked but the empirical confirmation is still pending. If script 3 fails on either provider, this slice may need to be revisited. - 3295-03 (clear service_session_id in AgentExecutor on full-history replay) remains open. After it lands the xfail in test_full_conversation.py can be removed. - pytest was not run in this iteration because uv-based pytest commands required interactive approval. Validation rests on careful reading; next iteration should run the openai + core test suites.
Motivation and Context
Fixes #3295. When an agent uses tools and any form of service-side storage is in play (a session, a
RedisHistoryProvider, aCosmosHistoryProvider, or a workflow that replays prior conversation), the second turn fails against the OpenAI Responses API with:Three field reports converge on the same root cause:
RedisHistoryProviderand tools, two turns (pamelafox).CosmosHistoryProviderand tools on Foundry, two turns (CristinaStn, still reproducing on 1.1.0).In every case the prior turn's function-call item, persisted with its server-issued
fc_*id inadditional_properties, gets replayed inline in the next request whileprevious_response_id(orconversation) also points the server at the same prior response. The server sees the samefc_*id twice and rejects the request. The same shape reproduces withrs_*(reasoning), approval, and local-shell-call IDs.The current workaround is to drop
tools=from any agent that uses persistent history, which defeats the point.Description
The fix lives at one chokepoint: the OpenAI Responses chat client's outbound message preparation, gated on the existing
request_uses_service_side_storageflag. All three reported repro paths flow through this function, so a single change closes them all.Behavior contract. When the outgoing request carries
previous_response_id,conversation_id, orconversation:function_call, reasoning, approval-request/response, and local-shell-call items are dropped from the wireinputarray. The server already has them from the prior response.function_resultitems are kept untouched. They reference the prior call bycall_id, which the server resolves throughprevious_response_id. This is the load-bearing wire-format contract.When the outgoing request carries no storage marker, behavior is byte-identical to today.
This mirrors the existing reasoning-item treatment that was already in place at the same chokepoint, generalizing it to all server-issued item identities.
Defense in depth at the workflow layer. A second commit clears
service_session_idinAgentExecutor.run()when the incomingAgentExecutorRequestcarries function-call/result items. The wire-level "Duplicate item" error is already closed by the chat-client strip; this commit makes the executor wiring reflect intent (the executor is replaying a conversation it does not own, so it should not present continuation context from a prior response at all). Thefrom_responsehandoff path is unchanged and continues to preserveservice_session_idfor legitimate continuation.Live-API validation. Before merging, the wire-format contract was validated against live OpenAI and Foundry endpoints across four repro shapes:
RedisHistoryProviderwith tools, two turnsprevious_response_id+ onlyfunction_call_output(no inlinefunction_call)The third row is the load-bearing probe: it confirms the OpenAI Responses API does pair
function_call_output.call_idagainst the prior response's stored function_call whenprevious_response_idis set, with no inline function_call required.CosmosHistoryProviderwas not run separately because it shares the chat-client chokepoint with Redis; if Redis passes, Cosmos passes by the same code path.No public API change. No persistence-format change. No new configuration knobs.
Contribution Checklist