
fix(amd): defer no-speech timer until SIP call is answered #5848

Open
chenghao-mou wants to merge 5 commits into main from chenghao/fix/amd-start-timer-when-active


@chenghao-mou (Member) commented May 26, 2026

  1. Timeouts

Previously AMD started all its timers as soon as the SIP audio track was subscribed. Because audio tracks can be published during ringing or carrier early media (before SIP ANSWER), this poisoned the classifier with pre-answer audio and burned the no-speech budget.

This PR waits for the SIP call to reach the active state before starting the no-speech timer. The detection timeout is still armed as soon as the track is subscribed.

It also adds a reusable wait_for_participant_attribute helper in utils.participant with a dedicated ParticipantAttributeWaitAborted exception, and properly tracks and cleans up the deferred setup task.
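The description does not show the helper's signature, so the following is only a minimal, self-contained sketch of how a wait_for_participant_attribute-style helper could work. FakeParticipant and its add_listener/remove_listener/disconnect methods are hypothetical stand-ins for a real participant and its attribute-change events; the exception mirrors the ParticipantAttributeWaitAborted class shown in the review thread.

```python
import asyncio
from typing import Callable


class ParticipantAttributeWaitAborted(RuntimeError):
    """Raised when the wait cannot complete (e.g. participant disconnected)."""


class FakeParticipant:
    """Hypothetical stand-in for a remote participant (not a LiveKit class)."""

    def __init__(self) -> None:
        self.attributes: dict[str, str] = {}
        self.disconnected = False
        self._listeners: list[Callable[[], None]] = []

    def add_listener(self, cb: Callable[[], None]) -> None:
        self._listeners.append(cb)

    def remove_listener(self, cb: Callable[[], None]) -> None:
        self._listeners.remove(cb)

    def _notify(self) -> None:
        # Iterate over a copy so listeners may unsubscribe while notified.
        for cb in list(self._listeners):
            cb()

    def set_attribute(self, key: str, value: str) -> None:
        self.attributes[key] = value
        self._notify()

    def disconnect(self) -> None:
        self.disconnected = True
        self._notify()


async def wait_for_participant_attribute(
    participant: FakeParticipant, key: str, value: str
) -> None:
    """Return once participant.attributes[key] == value; abort on disconnect."""
    fut: asyncio.Future[None] = asyncio.get_running_loop().create_future()

    def check() -> None:
        if fut.done():
            return
        if participant.disconnected:
            fut.set_exception(
                ParticipantAttributeWaitAborted(
                    f"participant left before {key}={value!r}"
                )
            )
        elif participant.attributes.get(key) == value:
            fut.set_result(None)

    participant.add_listener(check)
    try:
        check()  # the attribute may already hold the target value
        await fut
    finally:
        # Always unsubscribe, whether the wait succeeded, aborted, or was cancelled.
        participant.remove_listener(check)
```

For the SIP case this would be called as wait_for_participant_attribute(participant, "sip.callStatus", "active") before starting the no-speech timer. Removing the listener in finally keeps the helper leak-free even when the surrounding setup task is cancelled.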

  2. No-speech timeout

A no-speech timeout now maps to uncertain.
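The timer split described above, together with the no-speech to uncertain mapping, can be sketched with two nested asyncio timeouts. This is an illustrative sketch, not the PR's code: run_amd_timers, the two asyncio.Event inputs, and the "speech_detected" and "detection_timeout" result strings are hypothetical; only "uncertain" comes from the description.

```python
import asyncio


async def run_amd_timers(
    call_active: asyncio.Event,
    speech_started: asyncio.Event,
    detection_timeout: float,
    no_speech_timeout: float,
) -> str:
    """Hypothetical sketch of the split trigger.

    Intended to be started when the SIP audio track is subscribed: the
    outer detection timeout runs from that moment, while the no-speech
    timer only starts once call_active is set (sip.callStatus == "active").
    """

    async def after_answer() -> str:
        # Ringing / early-media audio is ignored until the call is answered.
        await call_active.wait()
        try:
            await asyncio.wait_for(speech_started.wait(), no_speech_timeout)
        except asyncio.TimeoutError:
            return "uncertain"  # no-speech timeout maps to uncertain
        return "speech_detected"  # a real detector would go on to classify here

    try:
        # The outer timeout guarantees we never hang if the call never connects.
        return await asyncio.wait_for(after_answer(), detection_timeout)
    except asyncio.TimeoutError:
        return "detection_timeout"  # placeholder outcome for this sketch
```

Because the no-speech timer is created inside after_answer, time spent ringing no longer counts against it, while the outer wait_for still bounds the total time spent waiting for an answer.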

  3. End of turn (EOT)

Previously, the verdict was emitted as soon as the prediction arrived and the silence threshold was satisfied, but that does not mean the user's turn had ended. A new option, wait_until_finished (default False), makes AMD wait for both EOT and the silence threshold before emitting a machine verdict.

This fixes a race: a new regular generation could be triggered in parallel with a generate_reply call after AMD had already interrupted any preemptive generations, emitted the verdict, and released the playout hold. That generation started too late for AMD to interrupt it (for example, because of late STT).

When AMD waits for EOT in addition to the silence threshold, the parallel generation is interrupted when interrupt_on_machine=True.

The flag still respects the overall timeout if no speech or transcript arrives.
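A minimal asyncio sketch of the gating described above, assuming each condition (prediction available, silence threshold met, EOT) is signalled by an asyncio.Event. machine_verdict_gate and its return strings are hypothetical, not the PR's API. With wait_until_finished=False the gate opens on prediction plus silence, as before; with True it additionally waits for EOT, and the overall timeout still bounds the wait.

```python
import asyncio


async def machine_verdict_gate(
    prediction: asyncio.Event,
    silence_ok: asyncio.Event,
    end_of_turn: asyncio.Event,
    wait_until_finished: bool,
    timeout: float,
) -> str:
    """Hypothetical sketch: decide when a machine verdict may be emitted."""
    conditions = [prediction.wait(), silence_ok.wait()]
    if wait_until_finished:
        # Also require the user's turn to have ended before emitting.
        conditions.append(end_of_turn.wait())
    try:
        # gather waits for every condition; wait_for applies the overall timeout
        # and cancels the pending waits if it expires.
        await asyncio.wait_for(asyncio.gather(*conditions), timeout)
    except asyncio.TimeoutError:
        return "timed_out"  # placeholder; the real timeout outcome is not shown here
    return "emit_machine_verdict"
```

Holding the verdict until EOT keeps AMD in control for longer, which is what allows a late parallel generation to still be interrupted when interrupt_on_machine=True.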

Split the trigger: the outer detection timeout still arms on track-subscribed (so AMD cannot hang if the call never connects), but the no-speech timer and all audio, transcript, and speech-event processing wait for sip.callStatus to become "active". For non-SIP participants, behavior is unchanged.
@devin-ai-integration[bot] (Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


devin-ai-integration[bot]: this comment was marked as resolved.

Comment on lines +71 to +73
```python
class ParticipantAttributeWaitAborted(RuntimeError):
    """Raised by :func:`wait_for_participant_attribute` when the wait cannot
    complete (room/participant disconnected or never present)."""
```
Member

Should we just use RuntimeError?
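One trade-off relevant to this question: a subclass of RuntimeError remains compatible with any existing except RuntimeError handlers, while still letting callers single out the aborted-wait case. A tiny illustration (classify and caught_by_runtime_error are hypothetical helpers written only for this example):

```python
class ParticipantAttributeWaitAborted(RuntimeError):
    """Mirrors the class under review."""


def classify(exc: BaseException) -> str:
    try:
        raise exc
    except ParticipantAttributeWaitAborted:
        return "wait aborted"  # specific handling, e.g. participant left
    except RuntimeError:
        return "other runtime error"


def caught_by_runtime_error(exc: BaseException) -> bool:
    # A handler that only knows about RuntimeError still catches the subclass.
    try:
        raise exc
    except RuntimeError:
        return True
    except BaseException:
        return False
```

With a plain RuntimeError, the first branch of classify would have to inspect the message text instead of the type.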


Labels: None yet
Projects: None yet
2 participants