feat(plugin-openai): stream input_audio_transcription delta events by jjsquillante · Pull Request #1581 · livekit/agents-js

jjsquillante · 2026-05-22T02:10:50Z

Description

Adds support for conversation.item.input_audio_transcription.delta events on the OpenAI Realtime API. These are emitted by the new gpt-realtime-whisper transcription model (and any future delta-emitting STT model) as audio is processed. Today the plugin's RealtimeSession event switch only handles .completed and .failed, so deltas are silently dropped and user transcripts surface to the client only after the entire utterance finalizes.

Changes Made

plugins/openai/src/realtime/api_proto.ts: add ConversationItemInputAudioTranscriptionDeltaEvent to ServerEventType / ServerEvent.
plugins/openai/src/realtime/realtime_model.ts: add per-item delta accumulator (inputTranscriptAccumulators), wire .delta into the event switch, emit accumulated text as isFinal: false, and clear the accumulator on .completed. remoteChatCtx is intentionally only updated on .completed — partials shouldn't mutate persistent chat history.
plugins/openai/src/realtime/realtime_model.test.ts: 177 lines of new tests covering delta accumulation, ordering, accumulator cleanup on completion, mixed delta/non-delta sessions, and the back-compat path for models that only emit .completed.
examples/src/realtime_streaming_transcript.ts: runnable demo using gpt-realtime-whisper, with inline notes on how partials replace (not append to) prior text per turn.
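
The accumulator behaviour listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in realtime_model.ts: the class `TranscriptAccumulator`, its `emit` callback, the `pendingCount` helper, and the `history` array standing in for remoteChatCtx are all hypothetical. Only the event type strings, the `inputTranscriptAccumulators` name, and the `isFinal` flag come from the description; the payload field names (`item_id`, `delta`, `transcript`) are assumed from the Realtime API event payloads.

```typescript
// Hypothetical event shapes; field names are assumptions, not copied from api_proto.ts.
interface DeltaEvent {
  type: 'conversation.item.input_audio_transcription.delta';
  item_id: string;
  delta: string;
}

interface CompletedEvent {
  type: 'conversation.item.input_audio_transcription.completed';
  item_id: string;
  transcript: string;
}

type TranscriptionEvent = DeltaEvent | CompletedEvent;

interface TranscriptUpdate {
  itemId: string;
  transcript: string;
  isFinal: boolean;
}

class TranscriptAccumulator {
  // One running string per conversation item, keyed by item_id.
  private inputTranscriptAccumulators = new Map<string, string>();
  // Stand-in for persistent chat history (remoteChatCtx in the PR); receives final text only.
  readonly history: string[] = [];
  private emit: (update: TranscriptUpdate) => void;

  constructor(emit: (update: TranscriptUpdate) => void) {
    this.emit = emit;
  }

  handle(event: TranscriptionEvent): void {
    switch (event.type) {
      case 'conversation.item.input_audio_transcription.delta': {
        // Append the new fragment and emit the full text so far as a partial.
        const previous = this.inputTranscriptAccumulators.get(event.item_id) ?? '';
        const accumulated = previous + event.delta;
        this.inputTranscriptAccumulators.set(event.item_id, accumulated);
        this.emit({ itemId: event.item_id, transcript: accumulated, isFinal: false });
        break;
      }
      case 'conversation.item.input_audio_transcription.completed': {
        // Drop the partial state, record the final transcript, and emit it as final.
        // Models that never send deltas take this path with no accumulator entry.
        this.inputTranscriptAccumulators.delete(event.item_id);
        this.history.push(event.transcript);
        this.emit({ itemId: event.item_id, transcript: event.transcript, isFinal: true });
        break;
      }
    }
  }

  // Number of items that still have an in-progress partial.
  pendingCount(): number {
    return this.inputTranscriptAccumulators.size;
  }
}
```

Keying the map by item ID keeps interleaved deltas for different items from bleeding into each other, and writing to history only in the `.completed` branch mirrors the stated intent that partials should not mutate persistent chat state.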

Pre-Review Checklist

Build passes: All builds (lint, typecheck, tests) pass locally
AI-generated code reviewed: Removed unnecessary comments and ensured code quality
Changes explained: All changes are properly documented and justified above
Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
Video demo: A small video demo showing changes works as expected and did not break any existing functionality using Agent Playground (if applicable)

Testing

Automated tests added/updated (if applicable)
All tests pass
Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Manual repro: run `pnpm build && node ./examples/dist/realtime_streaming_transcript.js dev`, join the agent's room from the Playground, and speak a long sentence. Expect `[user transcript partial]` lines streaming to stdout as you speak, ending with one `[user transcript FINAL]`.

[21:06:59.936] INFO (47252): playout completed with interrupt
    speech_id: "speech_73d3d7b6-440"
    message: "Hello! Please say a long sentence, and you’ll see your words appear on the screen as you speak."
[21:06:59.937] DEBUG (47252): Task.runTask: task AgentActivity.realtimeReply done
[21:07:01.446] INFO (47252): [user transcript partial]
    transcript: " Hey, um"
[21:07:02.242] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so"
[21:07:03.231] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick"
[21:07:03.832] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown"
[21:07:04.433] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown fox jumped"
[21:07:05.031] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown fox jumped over the"
[21:07:06.229] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown fox jumped over the lazy dog"
[21:07:07.021] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown fox jumped over the lazy dog as the"
[21:07:08.225] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown fox jumped over the lazy dog as the text streamed"
[21:07:08.635] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown fox jumped over the lazy dog as the text streamed on the"
[21:07:09.024] INFO (47252): [user transcript partial]
    transcript: " Hey, um yes, so the quick brown fox jumped over the lazy dog as the text streamed on the screen"
[21:07:09.245] INFO (47252): onInputSpeechStopped
    userTranscriptionEnabled: true
[21:07:09.262] INFO (47252): Creating speech handle
    speech_id: "speech_73fc4af8-cf9"
[21:07:09.262] DEBUG (47252): Task.runTask: task AgentActivity.realtimeGeneration started
[21:07:09.263] DEBUG (47252): realtime generation started
    speech_id: "speech_73fc4af8-cf9"
    stepIndex: 1
[21:07:09.264] DEBUG (47252): Task.runTask: task AgentActivity.realtime_generation.read_messages started
[21:07:09.264] DEBUG (47252): Task.runTask: task AgentActivity.realtime_generation.read_tool_stream started
[21:07:09.264] DEBUG (47252): Task.runTask: task performToolExecutions started
[21:07:09.545] DEBUG (47252): Task.runTask: task performTextForwarding started
[21:07:09.546] DEBUG (47252): Task.runTask: task performAudioForwarding started
[21:07:09.704] INFO (47252): [user transcript FINAL]
    transcript: "Hey, um yes, so the quick brown fox jumped over the lazy dog as the text streamed on the screen"
[21:07:10.688] DEBUG (47252): Task.runTask: task performTextForwarding done
[21:07:10.689] DEBUG (47252): Closing generation channels in handleResponseDone
    messageCount: 1
[21:07:10.693] DEBUG (47252): Task.runTask: task AgentActivity.realtime_generation.read_tool_stream done
[21:07:10.698] DEBUG (47252): Task.runTask: task performToolExecutions done
[21:07:14.309] DEBUG (47252): Task.runTask: task performAudioForwarding done
[21:07:14.312] DEBUG (47252): Task.runTask: task AgentActivity.realtime_generation.read_messages done
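
As the log above shows, each partial carries the full text accumulated so far for the turn, so a consumer should replace its in-progress line rather than append to it. A minimal sketch of that consumer-side logic follows. The `TranscriptView` class and the `{ transcript, isFinal }` update shape are hypothetical; the actual event name and payload exposed by the agents framework are not shown in this thread. Trimming the leading space seen on partials is a display choice made here, not something the PR prescribes.

```typescript
// Hypothetical update shape; only the transcript text and isFinal flag are assumed.
interface UserTranscriptUpdate {
  transcript: string;
  isFinal: boolean;
}

class TranscriptView {
  // Finalized turns, one entry per completed utterance.
  private committed: string[] = [];
  // Latest partial for the turn currently being spoken.
  private inProgress = '';

  apply(update: UserTranscriptUpdate): void {
    if (update.isFinal) {
      // The final transcript supersedes whatever partial was on screen.
      this.committed.push(update.transcript.trim());
      this.inProgress = '';
    } else {
      // Replace, do not append: the partial already contains all prior text.
      this.inProgress = update.transcript;
    }
  }

  render(): string {
    const lines = [...this.committed];
    if (this.inProgress !== '') {
      lines.push(this.inProgress.trim() + ' ...');
    }
    return lines.join('\n');
  }
}
```

Appending each partial instead would duplicate the earlier words on every update, which is the failure mode the example's inline notes warn about.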

Additional Notes

Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

changeset-bot · 2026-05-22T02:10:54Z

🦋 Changeset detected

Latest commit: bd0bcca

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 33 packages

Name	Type
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-fishaudio	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hume	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-perplexity	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-sarvam	Patch
@livekit/agents-plugin-xai	Patch
@livekit/agents	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-liveavatar	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-minimax	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-mistralai	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-runway	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugin-tavus	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugins-test	Patch

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

jjsquillante · 2026-05-22T02:22:04Z

@toubatbrian

hey brian! 👋 hope all is well - i have a quick PR for streaming user transcript text within the OpenAI realtime voice model.

when you get a moment, can you review? let me know if anything looks off or you'd like me to update anything.

thanks!

toubatbrian · 2026-05-22T20:14:42Z

We had a comment in python:

                    elif event["type"] == "conversation.item.input_audio_transcription.delta":
                        # currently incoming transcripts are transcribed only after the user stops speaking
                        # it's not very useful to emit these as the transcribe process takes place within ~100ms
                        # when they handle streaming transcriptions, we'll handle it then.
                        pass

Is this not the case anymore based on your testing?

jjsquillante · 2026-05-23T02:42:38Z

> when they handle streaming transcriptions, we'll handle it then.

yea, fair question. but to that comment above -- gpt-realtime-whisper was released a few weeks ago so this PR now handles this exact case. The example in the PR description above shows how we're now able to handle partials as the user speaks, before the .completed event fires at the end of the turn.

https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/

lmk if i'm misinterpreting anything, though! thanks for the quick response

jjsquillante · 2026-05-26T19:50:32Z

@toubatbrian just following up here after a long weekend to get back on your radar 👍

toubatbrian

LGTM!

devin-ai-integration Bot reviewed May 22, 2026

toubatbrian approved these changes May 26, 2026

toubatbrian merged commit f21488d into livekit:main May 26, 2026
6 checks passed

github-actions Bot mentioned this pull request May 26, 2026

Version Packages #1580

Open

toubatbrian merged 1 commit into livekit:main from jjsquillante:feat/realtime-input-audio-transcription-delta
