fix(google): handle content blocking and generation failures#1609
rosetta-livekit-bot[bot] wants to merge 3 commits into
🦋 Changeset detected. Latest commit: f4fd256. The changes in this PR will be included in the next version bump. This PR includes changesets to release 33 packages.
| if (finishReason === 'STOP' && !chunksYielded) { | ||
| throw new APIStatusError({ | ||
| message: 'Google LLM: no response generated', | ||
| options: { |
🔴 Removing && retryable guard causes spurious error on final streaming chunk after content was already yielded
The chunksYielded variable is local to each iteration of the for await loop (reset to false on every chunk). Previously, the && retryable condition prevented this check from firing when earlier chunks had already successfully yielded content (which sets retryable = false). Now, if the final streaming chunk has finishReason: 'STOP' but its parts are not parseable by #parsePart (e.g., empty parts array [], or parts containing only executableCode/codeExecutionResult/inlineData which return null from #parsePart at plugins/google/src/llm.ts:532-534), the code throws a non-retryable APIStatusError — discarding all previously-yielded content from the stream and failing the entire LLM request.
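The distinction between a per-chunk flag and a stream-level flag can be sketched as follows. This is a simplified illustration, not the plugin's code: `SketchChunk`, `collectText`, and `anyYielded` are hypothetical names, and `anyYielded` stands in for the role the comment attributes to `!retryable`.

```typescript
// Hypothetical, simplified chunk shape (assumption, not the plugin's real type).
interface SketchChunk {
  finishReason?: string;
  texts: string[]; // parts that parsed to text; empty when nothing was parseable
}

// Collects text from a stream. Throws only when STOP arrives and no chunk
// in the whole stream has yielded anything so far.
function collectText(chunks: SketchChunk[]): string[] {
  const out: string[] = [];
  let anyYielded = false; // stream-level: survives across loop iterations

  for (const chunk of chunks) {
    let chunkYielded = false; // per-chunk: reset on every iteration
    for (const text of chunk.texts) {
      out.push(text);
      chunkYielded = true;
    }
    anyYielded = anyYielded || chunkYielded;

    // Guarding on the stream-level flag keeps an empty final STOP chunk
    // from discarding content that earlier chunks already delivered.
    if (chunk.finishReason === 'STOP' && !anyYielded) {
      throw new Error('Google LLM: no response generated');
    }
  }
  return out;
}
```

With this guard, a stream of one text chunk followed by an empty `STOP` chunk returns the text, while a stream consisting only of an empty `STOP` chunk still throws.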
| throw new APIStatusError({ | ||
| message: 'Google LLM: no content in the response', | ||
| options: { | ||
| retryable, | ||
| requestId, | ||
| }, | ||
| }); |
🔴 Always-throw on missing content destroys valid streaming responses when a later chunk lacks candidates
The old code used continue when retryable was false (meaning content had already been successfully yielded), gracefully skipping chunks without candidates[0].content.parts. The new code unconditionally throws. In a multi-chunk streaming response, if an intermediate or final chunk arrives without candidates or without content.parts (e.g., a metadata-only chunk), but earlier chunks already put valid content into the queue (setting retryable = false), the new code throws a non-retryable APIStatusError — aborting a response that was otherwise completing successfully. The usageMetadata at line 449 that follows this check would also never be reached for such chunks.
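A minimal sketch of the skip-or-throw behaviour described above, under simplified assumptions. `SketchStreamChunk`, `SketchStatusError`, and `summarizeStream` are hypothetical names. Reading `usageMetadata` before `continue` is a design choice of this sketch, not something the comment says the old code did.

```typescript
// Hypothetical, simplified shapes (assumptions, not the plugin's real types).
interface SketchStreamChunk {
  candidates?: { content?: { parts?: string[] } }[];
  usageMetadata?: { totalTokenCount: number };
}

class SketchStatusError extends Error {
  retryable: boolean;
  constructor(message: string, retryable: boolean) {
    super(message);
    this.retryable = retryable;
  }
}

function summarizeStream(chunks: SketchStreamChunk[]): { parts: number; tokens: number } {
  let retryable = true; // flips to false once any content has been delivered
  let parts = 0;
  let tokens = 0;

  for (const chunk of chunks) {
    // Sketch-only choice: record usage first so metadata-only chunks still count.
    tokens = chunk.usageMetadata?.totalTokenCount ?? tokens;

    const chunkParts = chunk.candidates?.[0]?.content?.parts;
    if (!chunkParts) {
      if (retryable) {
        // Nothing delivered yet: surface a retryable error.
        throw new SketchStatusError('Google LLM: no content in the response', true);
      }
      continue; // content already delivered: tolerate the empty chunk
    }

    parts += chunkParts.length;
    if (chunkParts.length > 0) retryable = false;
  }
  return { parts, tokens };
}
```

Here a content chunk followed by a metadata-only chunk completes normally, whereas a stream whose first chunk has no candidates fails with `retryable` set to `true`.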
| if (!chunk.candidates || !chunk.candidates[0]?.content?.parts) { | ||
| this.logger.warn(`No content in the response: ${JSON.stringify(chunk)}`); | ||
| throw new APIStatusError({ | ||
| message: `Google LLM: generation blocked - ${chunk.candidates[0].finishReason}`, | ||
| message: 'Google LLM: no content in the response', | ||
| options: { | ||
| retryable: false, | ||
| retryable, | ||
| requestId, | ||
| }, | ||
| }); | ||
| } |
🔴 Safety-blocked responses are misidentified as retryable "no content" errors due to check reordering
The old code explicitly checked for blocked finish reasons (SAFETY, SPII, etc.) before the no-content guard, with a comment explaining why: "safety-blocked responses often lack content.parts, so this must run before the no-content guard to avoid wasting retries." The new code reverses this order — the no-content check at line 399 (!chunk.candidates || !chunk.candidates[0]?.content?.parts) now runs first. When Gemini blocks a response for safety and the candidate has no content.parts (which the old comment says is the common case), the no-content guard fires and throws with retryable: true (since no chunks have been yielded yet). The blocked-reason check at line 419 is never reached.
This causes two problems:
- The framework retries the same blocked prompt up to maxRetry times (see agents/src/llm/llm.ts:163-216), wasting time and API calls on a prompt that will always be blocked.
- The eventual error message says "no content in the response" instead of "generation blocked by Gemini: SAFETY", hiding the actual cause from users.
The second guard at line 429 (!candidate.content?.parts) is also dead code — if it's reached, the guard at line 399 already guaranteed content.parts is truthy.
Prompt for agents
The root cause is that the no-content check at line 399-408 runs before the blocked-reason check at line 419-427, which means safety-blocked responses without content.parts are caught by the no-content guard and incorrectly marked as retryable.
The fix is to restore the original check order: check for blocked finish reasons BEFORE checking for missing content. This ensures safety-blocked responses are immediately thrown as non-retryable with the correct error message.
Specifically, in the `run()` method of `LLMStream` (plugins/google/src/llm.ts), the blocked-reason check (currently lines 419-427) should be moved above the no-content check (currently lines 399-408). The blocked-reason check needs to use optional chaining since candidates may not exist: `chunk.candidates?.[0]?.finishReason` (as it was in the old code).
Additionally, the second no-content guard at lines 429-437 (`if (!candidate.content?.parts)`) is dead code because the guard at line 399 already ensures `content.parts` is truthy whenever execution reaches that point. Consider whether this guard is still needed or if it should be merged with the first one.
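The proposed ordering could look roughly like the sketch below. All names here (`OrderedChunk`, `OrderedCheckError`, `BLOCKED_REASONS`, `extractParts`) are hypothetical, and the blocked set contains only the two reasons named in the comment; the real list in the plugin is presumably longer.

```typescript
// Hypothetical, simplified shapes (assumptions, not the plugin's real types).
interface OrderedChunk {
  candidates?: { finishReason?: string; content?: { parts?: string[] } }[];
}

class OrderedCheckError extends Error {
  retryable: boolean;
  constructor(message: string, retryable: boolean) {
    super(message);
    this.retryable = retryable;
  }
}

// Only the reasons named in the review comment; assumed to be incomplete.
const BLOCKED_REASONS = new Set(['SAFETY', 'SPII']);

function extractParts(chunk: OrderedChunk, retryable: boolean): string[] {
  // 1. Blocked check first, with optional chaining because candidates may be absent.
  const finishReason = chunk.candidates?.[0]?.finishReason;
  if (finishReason !== undefined && BLOCKED_REASONS.has(finishReason)) {
    // Retrying a blocked prompt cannot succeed, so never mark it retryable.
    throw new OrderedCheckError(`generation blocked by Gemini: ${finishReason}`, false);
  }

  // 2. Single no-content guard; a second one after this point would be dead code.
  const parts = chunk.candidates?.[0]?.content?.parts;
  if (!parts) {
    throw new OrderedCheckError('Google LLM: no content in the response', retryable);
  }
  return parts;
}
```

With this order, a `SAFETY` candidate that lacks `content.parts` fails immediately as non-retryable with the blocking reason in the message, and only chunks that are not blocked fall through to the no-content guard.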
Ported from Python.