fix(amd): defer no-speech timer until SIP call is answered #5848
Open
chenghao-mou wants to merge 5 commits into
Previously, AMD started all of its timers as soon as the SIP audio track was subscribed. Because audio tracks can be published during ringing or carrier early media (before SIP ANSWER), this poisoned the classifier with pre-answer audio and burned the no-speech budget.
This PR splits the trigger: the outer detection timeout still arms on track-subscribed (so AMD cannot hang if the call never connects), but the no-speech timer and all audio, transcript, and speech-event processing wait for sip.callStatus to become "active". For non-SIP participants, behavior is unchanged.
It also adds a reusable wait_for_participant_attribute helper in utils.participant with a dedicated ParticipantAttributeWaitAborted exception, and properly tracks and cleans up the deferred setup task.
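The split trigger described above can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the code in this PR: FakeParticipant, AttributeWaitAborted, and run_detection are hypothetical stand-ins, and the real helper's signature may differ.

```python
import asyncio


class AttributeWaitAborted(RuntimeError):
    """Hypothetical stand-in for ParticipantAttributeWaitAborted."""


class FakeParticipant:
    """Hypothetical stand-in for a remote participant with string attributes."""

    def __init__(self) -> None:
        self.attributes = {}
        self._changed = asyncio.Event()
        self._gone = False

    def set_attribute(self, key: str, value: str) -> None:
        self.attributes[key] = value
        self._changed.set()

    def disconnect(self) -> None:
        self._gone = True
        self._changed.set()

    async def wait_for_attribute(self, key: str, value: str) -> None:
        # Loop until the attribute matches; abort if the participant leaves first.
        while self.attributes.get(key) != value:
            if self._gone:
                raise AttributeWaitAborted(f"participant left before {key}={value}")
            self._changed.clear()
            await self._changed.wait()


async def run_detection(
    participant: FakeParticipant,
    *,
    is_sip: bool,
    detection_timeout: float,
    no_speech_timeout: float,
    speech_started: asyncio.Event,
) -> str:
    """Return "speech", "no_speech", "timeout", or "aborted"."""

    async def answered_phase() -> str:
        # SIP only: hold the no-speech timer until the call is answered.
        if is_sip:
            await participant.wait_for_attribute("sip.callStatus", "active")
        try:
            await asyncio.wait_for(speech_started.wait(), no_speech_timeout)
            return "speech"
        except asyncio.TimeoutError:
            return "no_speech"

    try:
        # The outer timeout is armed immediately, so a call that never
        # connects cannot hang detection.
        return await asyncio.wait_for(answered_phase(), detection_timeout)
    except asyncio.TimeoutError:
        return "timeout"
    except AttributeWaitAborted:
        return "aborted"


async def demo() -> str:
    participant = FakeParticipant()
    speech_started = asyncio.Event()

    async def caller() -> None:
        await asyncio.sleep(0.3)  # ringing lasts longer than the no-speech budget
        participant.set_attribute("sip.callStatus", "active")
        await asyncio.sleep(0.01)
        speech_started.set()

    task = asyncio.create_task(caller())
    result = await run_detection(
        participant,
        is_sip=True,
        detection_timeout=2.0,
        no_speech_timeout=0.2,
        speech_started=speech_started,
    )
    await task
    return result
```

In demo(), ringing outlasts the no-speech budget, yet the budget is not consumed because its timer only starts once sip.callStatus is "active".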
theomonnom
reviewed
May 26, 2026
Comment on lines +71 to +73

    class ParticipantAttributeWaitAborted(RuntimeError):
        """Raised by :func:`wait_for_participant_attribute` when the wait cannot
        complete (room/participant disconnected or never present)."""
Member
Should we just use RuntimeError?
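For context on the trade-off behind this question, here is a small sketch of what a dedicated subclass allows. The classify function is hypothetical and only illustrates exception matching; it is not part of the PR.

```python
class ParticipantAttributeWaitAborted(RuntimeError):
    """Same shape as the class under review: a RuntimeError subclass."""


def classify(exc_type: type) -> str:
    # Hypothetical caller: handle an aborted wait separately from other
    # runtime failures, which a bare RuntimeError would not allow.
    try:
        raise exc_type("boom")
    except ParticipantAttributeWaitAborted:
        return "wait aborted"
    except RuntimeError:
        return "other runtime error"
```

Callers that only write except RuntimeError still catch the subclass, so the dedicated type adds precision without breaking generic handlers.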
theomonnom
approved these changes
May 26, 2026
This PR waits for the SIP active state before starting the no-speech timer. The detection timeout is still armed when the track is subscribed.
It also adds a reusable wait_for_participant_attribute helper in utils.participant with a dedicated ParticipantAttributeWaitAborted exception, and properly tracks and cleans up the deferred setup task.
Now it maps to uncertain;
Previously, the verdict was emitted once the prediction had arrived and the silence threshold was satisfied, but that does not mean the user turn has ended. A new option, wait_until_finished (default False), makes AMD wait for both EOT and the silence threshold for machines.
This helps with a case where a new normal generation could be triggered in parallel with a generate_reply call after AMD had interrupted any preemptive generations, emitted the verdict, and released the hold for playout (the new generation arrives too late for AMD to interrupt, for example because of late STT).
If EOT is awaited in addition to the silence threshold, the parallel generation is interrupted when interrupt_on_machine=True.
This flag still respects the overall timeout value if no speech or transcript arrives.
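The gating rule for wait_until_finished can be sketched as a pure function. This is an illustration under assumptions, not the PR's implementation: AmdState, should_emit, and the field names are hypothetical, and treating the overall timeout as an unconditional release is an assumption about how the timeout is respected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AmdState:
    """Hypothetical snapshot of what the detector knows at one moment."""

    prediction: Optional[str]  # "human", "machine", or None if not yet classified
    silence_ok: bool           # silence threshold satisfied
    end_of_turn: bool          # EOT observed for the current user turn
    elapsed: float             # seconds since detection started


def should_emit(
    state: AmdState,
    *,
    wait_until_finished: bool = False,
    timeout: float = 20.0,     # placeholder value, not the library default
) -> bool:
    # Assumption: the overall timeout always releases the verdict.
    if state.elapsed >= timeout:
        return True
    # Previous behavior: need a prediction and a satisfied silence threshold.
    if state.prediction is None or not state.silence_ok:
        return False
    # New behavior: for machines, additionally wait for EOT when requested.
    if wait_until_finished and state.prediction == "machine":
        return state.end_of_turn
    return True
```

With wait_until_finished left at False, the function reduces to the earlier rule (prediction plus silence threshold), which matches the stated default.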