fix(oracle): emit tool_calls in streaming responses#1537
Conversation
Note on
|
Issue Discovery: OCI Streaming Tool CallsWhile testing portkey-strands (Strands Agents SDK integration with Portkey), I discovered a critical issue with OCI GenAI streaming responses for tool/function calling. The ProblemWhen using OCI GenAI models through the current production Portkey gateway, streaming responses do not properly return tool call data. The model indicates it wants to call a tool ( Test Results (before this PR):
Actual Error# Model returns end_turn with empty content instead of tool_use
AgentResult(
stop_reason='end_turn', # Should be 'tool_use'
message={'role': 'assistant', 'content': []}, # Tool call data missing
...
)The streaming response returns Why This PR Fixes ItThis PR includes proper streaming state management for tool calls:
VerificationAfter applying this PR's changes, all OCI models pass tool calling tests:
This enables full agentic workflows with OCI GenAI models through Portkey. |
|
cc @VisargD - This PR addresses a critical issue affecting agentic workflows with OCI GenAI models. The streaming tool call bug prevents Strands Agents (and likely other agent frameworks) from working properly with OCI models through Portkey. Would appreciate a review when you have time! |
|
Tracking issue created: #1538 This documents the critical streaming tool call bug and explains why keeping the Oracle GenAI integration up to date is important for enterprise customers. |
|
Rebased check done: branch is already up to date with Validation run:
Re-pinging for review. |
|
Friendly follow-up on this one when you have a moment: @VisargD @narengogi\n\nWould appreciate a review. Thank you! |
|
Maintainer note: keeping OCI support current is strategically important here, not just additive. OCI is increasingly exposing new model families and provider-specific behaviors behind a single enterprise surface, and that surface changes fast: model availability, request/response shapes, auth expectations, token parameters, streaming/tool-call behavior, and embedding support all evolve over time. If the OCI provider lags, Portkey users in Oracle environments end up with capability gaps or subtle incompatibilities even when the same models are available elsewhere. Keeping OCI up to date means Portkey stays credible as a gateway layer for Oracle-hosted GPT/Gemini/Llama/Grok/Cohere workloads, reduces integration drift for enterprise customers, and avoids pushing teams into provider-specific forks or direct OCI workarounds. This PR moves OCI much closer to parity with the broader provider matrix, which is exactly the kind of maintenance surface that compounds in value if it is kept current. |
|
Hey @VisargD @narengogi — friendly ping on this one. It's been open for about 3 weeks now. Beyond the technical additions (full OCI GenAI support, streaming tool call fix, 61 unit tests), there's a broader Portkey + Oracle partnership story here: Portkey becomes the gateway layer for enterprise teams running GenAI on OCI, and Oracle customers get unified API access to all their OCI-hosted models (Llama, Gemini, GPT, Grok, Cohere) through a gateway they may already be using for OpenAI/Anthropic. That's a compelling value prop for both sides — Portkey expands its enterprise reach, Oracle gets first-class representation in the provider matrix. Happy to jump on a call if it's easier to walk through the implementation. Would really appreciate a review when you get a chance! |
ed94379 to
63f0983
Compare
|
Friendly ping — this PR has been open for ~57 days without a review. Happy to address feedback or rebase further if needed. Let me know if there's anything I can clarify. Thanks! |
|
@fede-kamel I'm reviewing this now |
| }; | ||
|
|
||
| Object.keys(defaultValues ?? {}).forEach((key) => { | ||
| if (Object.hasOwn(baseParams, key) && !Array.isArray(baseParams[key])) { | ||
| baseParams[key].default = defaultValues?.[key]; | ||
| const param = baseParams[key]; |
There was a problem hiding this comment.
please revert these changes if they are not relevant to your PR
If baseParams[key] is an empty string "" or the number 0, the this change will skip it, whereas the first version would have processed it.
narengogi
left a comment
There was a problem hiding this comment.
too many cahnges, please trim to only necessary changes and raise a separate PR if cleaner
adding new providers, extending existing ones is rather simple as you can see here:
https://github.com/Portkey-AI/gateway/pull/1510/changes
| let fs: any; | ||
| let modulesLoaded = false; | ||
|
|
||
| async function loadNodeModules() { |
There was a problem hiding this comment.
what are these changes for?
can you please limit your changes to adding a new provider, these changes touch a lot of irrelevant code blocks, this will increase the testing scope and affects core logic in the gateway
|
Hi @narengogi — thanks for the review, this is really helpful. Inline comments — agreed, will revert:
On "too many changes" — quick clarification before I rework: I want to make sure I split this in the way that's most useful for you. The current PR bundles a few things that I think should each stand on their own — could you confirm the plan below works, or steer me differently?
Two specific questions:
Will hold off on the rework until you confirm direction. Thanks! |
The OCI streaming chunk transform discarded tool_calls when the provider sent them alongside the finishReason chunk, so agents using streamed tool-calling against OCI models received `finish_reason: "tool_calls"` with empty content and no tool data. Fix the streaming chunk transform to: - Track stable tool_call indices across chunks via OracleStreamState - Split the OCI tool_calls + finishReason chunk into a tool_calls delta followed by the finish-reason chunk and a [DONE] marker - Build the streaming delta object only with the fields actually present (role / content / tool_calls), matching OpenAI's wire shape Also propagate tool_calls and tool_call_id through the request transform and the non-streaming response transform so tool calling works end to end across stream and non-stream paths. Tests cover request shaping (toolCalls, toolCallId), non-stream extraction (with synthesised id fallback), and streaming chunk shape (tool_calls delta + finish chunk + DONE, plus index stability across multiple tool calls). Refs: OCI streaming tool-call bug surfaced via Strands Agents SDK.
63f0983 to
f985717
Compare
|
@narengogi — split per your review. This PR is now scoped to only the OCI streaming tool-call fix (the bug from #1538). I dropped:
What's left (force-pushed): 3 files, +387/−20, all under
The rest will land as separate PRs:
I'll open them after this one merges (or in parallel if you prefer — let me know). #1540 (rerank/guardrails) will rebase on top of whichever lands first. Re your inline comments on the previous version: yes, both Ready for review. |
|
Split is complete. Sibling PRs opened:
This PR (#1537) remains scoped to the streaming tool-call fix only (+387 / −20). The originally-bundled multi-vendor handler dispatch ( Sibling PRs are independent and can land in any order. |
… index Two streaming-tool-call bugs surfaced during e2e testing against live OCI: 1) The chunk transform returned `string[]` for the finish chunk, but the gateway's `readStream` only handles single-string returns. The array was coerced to a string with comma separators, producing a malformed `,data: [DONE]` tail and eating the SSE separators around the finish chunk. Concatenate the tool_calls delta + finish + DONE into a single string with proper `\n\n` separation instead. 2) OCI streams tool calls progressively: the first chunk per tool carries `id` + `name`, subsequent chunks carry argument fragments without an `id`. The previous index-tracking code treated each id-less continuation as a new tool call (incrementing the index), so a single tool call's argument fragments were spread across indices 0, 1, 1, 1 instead of staying at 0. Anchor on `id` when present; otherwise reuse the current index for continuation fragments. Also drop `id` / `name` from the emitted delta when the provider chunk omits them, matching OpenAI's streaming shape. Tests updated for the single-string return shape and a new case that verifies continuation chunks stay attached to the same tool index when `id` is omitted. Verified e2e against `meta.llama-3.3-70b-instruct` on `us-chicago-1.oci.oraclecloud.com/20231130/actions/chat`: tool_calls deltas + finish + [DONE] now arrive as 7 well-formed SSE events with the tool index pinned at 0 across all continuation fragments.
|
End-to-end test against live OCI surfaced two real bugs in this PR — both fixed in Bug 1 — Fixed by concatenating the (optional) tool_calls delta + finish chunk + Bug 2 — index-thrash on continuation chunks. OCI streams tool calls progressively: the first chunk for a tool carries Fixed by anchoring the index on E2E verification (live OCI,
Test added: The non-streaming path and request shaping were unaffected and continue to pass. |
|
@roh26it @narengogi @VisargD — flagging this for partner-level attention. Status across the OCI split:
All three are split per @narengogi's review on the original bundled PR, each diff is small and focused (#1537: +387/−20, #1634: +266/−0, #1635: +63/−1), all green on type-check / formatting / build, all changes scoped under The partner ask, plainly: The current published OCI provider in Portkey doesn't represent the quality bar of the OCI Generative AI product — streamed tool-calling is broken ( I'm coming at this as the Oracle-side engineer working on these integrations, and the goal here is straightforward: bring the OCI surface in Portkey up to the same parity bar the rest of the provider matrix gets. The PRs are deliberately small and focused so they're cheap to review — none of them touch shared gateway code, all the changes are inside the Oracle provider directory. Would really appreciate a review pass from the team. Happy to jump on a call to walk through the OCI streaming wire format and the GPT-5 / GPT-OSS family quirks if it's useful — there are a few more (verbosity, reasoning_effort, structured outputs) that would naturally follow these as a continued partnership. Thanks! |
|
@roh26it @narengogi @VisargD — gentle follow-up. This work has been in flight since late February. The original bundled PR sat for ~57 days before the first review, and after I split it per @narengogi's feedback into the three small focused PRs (#1537, #1634, #1635), we're back in a holding pattern. I'm not flagging this to be impatient — I'm flagging it because the cost of the delay is now landing on Oracle customers using Portkey for OCI Generative AI workloads. Streaming tool-calls don't work, GPT-5 returns 400, and embeddings aren't exposed. Every week these stay unmerged is another week those customers either work around the gateway or pick something else for OCI traffic, which doesn't serve either side of this. The PRs are deliberately easy to review:
All three are rebased on Would the team be able to commit to a review timeline this week? Even a "we'll get to this on X" would help me plan the follow-up work (verbosity / reasoning_effort / structured outputs) without piling more PRs into the same backlog. Happy to make any change you want to land these faster — including jumping on a 15-minute call to walk through the implementation if that's the lower-friction path. Really would like to get this over the line. Thanks! |
|
Full e2e verification across all three PRs — 23/25 PASS against live OCI ( Combined the three branches into a single test runner, built the gateway, and exercised every model family OCI exposes plus every code path the PRs touch: Group A — non-streaming chat (6 cases)
Group B — streaming text (5 cases)
Group C — non-stream tool calls (4 cases)
Group D — streaming tool calls (this PR's core fix)
Group E — embeddings (#1634)
Group F — GPT-5 maxCompletionTokens routing (#1635)
About the Cohere chat failures
This is not in scope for any of the three PRs. The bundled multi-vendor handler dispatch in the original #1537 would have addressed it, but per the trim-down request that's out, and Cohere chat parity should follow as its own focused PR once these three land. Cohere embeddings still work fine (Group E above) — it's only the chat endpoint shape that needs the separate fix. Bottom line23/25 across the live surface, with the only failures being a pre-existing Cohere chat-shape limitation that none of these three PRs change. All three PRs are independently mergeable and don't depend on each other. |
|
Thanks for the approval @narengogi! What's the next step from here to land this in a release? Happy to add a changelog entry, rebase, or anything else the merge process needs — just let me know what's missing. Also worth flagging: #1635 (GPT-5 |
Summary
Fixes the OCI streaming chunk transform so that
tool_callsare actually emitted to clients. Without this fix, agents using streamed tool calling against OCI models receivefinish_reason: "tool_calls"with no tool data — see #1538.What changed
src/providers/oracle/chatComplete.ts(+ supporting types intypes/ChatDetails.ts):toolCallstogether withfinishReasoninto:tool_callsdelta chunkdata: [DONE]markerOracleStreamState— tracks stable tool-call indices across streamed chunks (via a seen-id set) so the OpenAI-formattool_calls[].indexstays consistent.toolCalls(assistant) andtoolCallId(tool) onto the OCI request envelope so tool-call round-trips are correctly wired in.toolCallsinto OpenAI-shapetool_calls, with an id fallback for providers (e.g. Gemini) that omit it.Tests
src/providers/oracle/chatComplete.test.ts— focused on the new behaviour:toolCallson assistant,toolCallIdon tool, omitted otherwise)[DONE]and ping passthroughsScope note (re: prior review)
This PR has been trimmed per @narengogi's review on the previous version. The earlier branch bundled embeddings, multi-vendor handler dispatch, GPT-5
maxCompletionTokensrouting, multimodal content types, and unrelated changes toopen-ai-base/index.ts,utils/env.ts,requestBody.ts, andjest.config.js. All of that is out — this PR is now scoped solely to the streaming tool-call fix referenced in #1538 plus the minimum supporting types and tests.The split-off items now live in their own focused PRs:
maxCompletionTokensrouting → fix(oracle): route maxTokens to maxCompletionTokens for GPT-5 family #1635The older kitchen-sink branch (#1540) has been closed in favour of these focused PRs.
Verification
npm run build✓npm run format:check✓chatComplete.test.ts)inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/chat) for streaming tool calls on Llama and Grok families.Closes #1538