15 changes: 9 additions & 6 deletions examples/avatar/README.md
@@ -1,19 +1,22 @@
# LemonSlice Avatar

A voice agent with a talking-head avatar you can swap mid-conversation.
-Pick a persona from the dropdown — an influencer, a cat, a fox, a music
-teacher, Marilyn Monroe — and the agent's face, voice, and personality
+Pick a persona from the dropdown — Leila, Jess, a software engineer, a
+cat, a fox — and the agent's face, voice, and personality
all change without dropping the call.

Try it in the [LiveKit Playground](https://agents.livekit.io/?example=avatar).

## What's in here

-- **15 personas** to choose from — each has its own face, voice, system
+- **9 personas** to choose from — each has its own face, voice, system
prompt, and idle/speaking body-language hints.
- **Live persona switching** — the dropdown fires a `set_avatar` RPC; a
short hold tone plays while the avatar reconnects with the new face
and voice.
+- **Hero motions** — for **Leila** only, the LLM can trigger wave, dance,
+or turn via tool calls (one motion at a time, ~6 seconds each). She
+waves automatically when the session starts.
Comment on lines +17 to +19
🟡 README claims motions are "for Leila only" but code enables them for Leila, Jess, and Mr Fox

The newly added README text states "Hero motions — for Leila only" but the code at examples/avatar/actions.py:17 defines ACTION_PERSONAS = frozenset({"leila", "jess", "mr_fox"}), and both Jess and Mr Fox personas have action-describing system prompts in examples/avatar/personas.py:80-83 and examples/avatar/personas.py:169-172. Users reading the README would not know that Jess and Mr Fox also support wave/dance/turn motions.

Suggested change
-- **Hero motions** — for **Leila** only, the LLM can trigger wave, dance,
-or turn via tool calls (one motion at a time, ~6 seconds each). She
-waves automatically when the session starts.
+- **Hero motions** — for **Leila**, **Jess**, and **Mr Fox**, the LLM can trigger wave, dance,
+or turn via tool calls (one motion at a time, ~6 seconds each). They
+wave automatically when the session starts.
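
One way to keep the README wording from drifting away from `ACTION_PERSONAS` is to derive the phrase from the set itself. The sketch below is only an illustration: `ACTION_PERSONAS` and `supports_actions` are copied from `actions.py` in this PR, while `DISPLAY_NAMES` and `motion_persona_phrase` are hypothetical names that do not exist in the repository.

```python
# Copied from examples/avatar/actions.py in this PR.
ACTION_PERSONAS = frozenset({"leila", "jess", "mr_fox"})

# Hypothetical id -> display-name mapping; the real names live in personas.py.
DISPLAY_NAMES = {"leila": "Leila", "jess": "Jess", "mr_fox": "Mr Fox"}


def supports_actions(persona_id: str) -> bool:
    return persona_id in ACTION_PERSONAS


def motion_persona_phrase() -> str:
    """Render the persona list for the README bullet, sorted by persona id."""
    names = [f"**{DISPLAY_NAMES[p]}**" for p in sorted(ACTION_PERSONAS)]
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + f", and {names[-1]}"


print(motion_persona_phrase())
```

A check like this could run in CI against the README text, so a change to `ACTION_PERSONAS` fails loudly instead of leaving the docs stale.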

- **LiveKit Inference** for STT + LLM (Deepgram Nova-3 + Gemini 3.5
Flash), Cartesia for TTS, [LemonSlice](https://lemonslice.com) for
the avatar video.
@@ -36,8 +39,7 @@ python agent.py dev
```

Connect from any LiveKit client. The agent reads the starting persona
-from the job metadata; if no metadata is sent it defaults to the
-California influencer.
+from the job metadata; if no metadata is sent it defaults to Leila.
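
The fallback described above can be sketched as follows. This is not the code in `agent.py` (the diff does not show how `resolve_persona` parses metadata); it assumes the job metadata is a JSON object with a hypothetical `"persona"` key, and `KNOWN_PERSONA_IDS` lists only the ids visible in this PR.

```python
import json
from typing import Optional

DEFAULT_PERSONA_ID = "leila"  # the README says the default is Leila
KNOWN_PERSONA_IDS = {"leila", "jess", "mr_fox"}  # partial list, taken from actions.py


def starting_persona_id(metadata: Optional[str]) -> str:
    """Pick the starting persona id, falling back to the default.

    Assumes metadata looks like '{"persona": "jess"}'; the key name is a guess.
    """
    if not metadata:
        return DEFAULT_PERSONA_ID
    try:
        data = json.loads(metadata)
    except json.JSONDecodeError:
        return DEFAULT_PERSONA_ID
    persona_id = data.get("persona") if isinstance(data, dict) else None
    if isinstance(persona_id, str) and persona_id in KNOWN_PERSONA_IDS:
        return persona_id
    return DEFAULT_PERSONA_ID


print(starting_persona_id(None))
print(starting_persona_id('{"persona": "jess"}'))
```

Missing, malformed, or unrecognized metadata all land on the default, so a client that sends nothing still gets a working session.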

## Adding or editing personas

@@ -72,7 +74,8 @@ No reconnect, no page refresh — the same call, with a different face.

```
agent.py entry point + the set_avatar RPC
-personas.py the 15 personas and the shared prompt rules
+actions.py Leila pose controller (opening wave + LLM tool motions)
+personas.py the 9 personas and the shared prompt rules
hold_music.py the soft three-note "please wait" tone
Dockerfile for cloud deploys
```
140 changes: 140 additions & 0 deletions examples/avatar/actions.py
@@ -0,0 +1,140 @@
"""Avatar pose triggers via LLM tools (wave, dance, turn)."""

from __future__ import annotations

import asyncio
import logging
import os
import time
from dataclasses import dataclass

import aiohttp

logger = logging.getLogger("avatar.actions")

NONE = "none"

ACTION_PERSONAS = frozenset({"leila", "jess", "mr_fox"})

POSE_NAMES: dict[str, dict[str, str]] = {
    "leila": {
        "wave": "wave-2-leila",
        "turn": "turn-leila",
        "dance": "dance-leila",
    },
    "jess": {
        "wave": "jess_wave",
        "turn": "jess_turn",
        "dance": "jess_dance",
    },
    "mr_fox": {
        "wave": "fox2_wave",
        "turn": "fox2_turn",
        "dance": "fox2_dance",
    },
}

DEFAULT_POSE_DURATION_S = 6.0
OPENING_WAVE_DELAY_S = 0.5


def supports_actions(persona_id: str) -> bool:
    return persona_id in ACTION_PERSONAS


def _control_url(session_id: str) -> str:
    base = os.getenv("LEMONSLICE_API_BASE", "https://lemonslice.com/api").rstrip("/")
    return f"{base}/liveai/sessions/{session_id}/control"


async def trigger_pose(session_id: str, name: str) -> bool:
    url = _control_url(session_id)
    payload = {"event": "pose-trigger", "pose_trigger": {"name": name}}
    timeout = aiohttp.ClientTimeout(total=30.0)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(
            url,
            headers={
                "Content-Type": "application/json",
                "X-API-Key": os.environ["LEMONSLICE_API_KEY"],
            },
            json=payload,
        ) as response:
            return response.ok


@dataclass
class _PlayingSlot:
    ends_at: float


class ActionController:
    """Plays one LemonSlice pose at a time; each blocks others for ``DEFAULT_POSE_DURATION_S``."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._session_id: str | None = None
        self._persona_id: str | None = None
        self._slot: _PlayingSlot | None = None

    def set_session(self, session_id: str, persona_id: str) -> None:
        self._session_id = session_id
        self._persona_id = persona_id

    def clear_session(self) -> None:
        self._session_id = None
        self._persona_id = None

    def _current_slot(self) -> _PlayingSlot | None:
        if self._slot is None:
            return None
        if time.monotonic() >= self._slot.ends_at:
            self._slot = None
            return None
        return self._slot

    async def cancel(self) -> None:
        async with self._lock:
            self._slot = None
            sid = self._session_id
            self.clear_session()
            if sid is not None:
                await trigger_pose(sid, NONE)

    async def shutdown(self, _: str = "") -> None:
        await self.cancel()

    async def play(self, action_id: str) -> str:
        session_id = self._session_id
        persona_id = self._persona_id
        if session_id is None or persona_id is None:
            return "Motion unavailable — avatar session not ready."

        key = action_id.strip().lower()
        pose_name = POSE_NAMES.get(persona_id, {}).get(key)
        if pose_name is None:
            return f"Unknown motion {action_id!r}."

        async with self._lock:
            if self._current_slot() is not None:
                return "That motion is already playing; try again in a moment."

            ok = await trigger_pose(session_id, pose_name)
            if not ok:
                return "Could not trigger the motion on the avatar."

            self._slot = _PlayingSlot(
                ends_at=time.monotonic() + DEFAULT_POSE_DURATION_S,
            )
            logger.info(
                "pose playing: persona_id=%r action_id=%r pose_name=%r",
                persona_id,
                key,
                pose_name,
            )
            return f"Playing motion {key}."

    async def opening_wave(self) -> None:
        if OPENING_WAVE_DELAY_S > 0:
            await asyncio.sleep(OPENING_WAVE_DELAY_S)
        await self.play("wave")
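
The one-motion-at-a-time rule in `ActionController.play` can be exercised without the network. The sketch below is a simplified stand-in, not the PR's class: `PoseGate`, its injectable `clock`, and the `fake_trigger` stub are all hypothetical, introduced only so the 6-second gate can be stepped through deterministically.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional

POSE_DURATION_S = 6.0  # mirrors DEFAULT_POSE_DURATION_S in actions.py


class PoseGate:
    """Hypothetical, stripped-down version of the slot logic in ActionController."""

    def __init__(
        self,
        trigger: Callable[[str], Awaitable[bool]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._trigger = trigger
        self._clock = clock
        self._lock = asyncio.Lock()
        self._ends_at: Optional[float] = None

    async def play(self, pose_name: str) -> str:
        async with self._lock:
            # A slot is busy until the clock reaches its end time.
            if self._ends_at is not None and self._clock() < self._ends_at:
                return "busy"
            if not await self._trigger(pose_name):
                return "failed"  # a failed trigger does not reserve the slot
            self._ends_at = self._clock() + POSE_DURATION_S
            return "playing"


async def demo() -> List[str]:
    now = [0.0]  # fake clock, advanced by hand

    async def fake_trigger(name: str) -> bool:
        return True  # stands in for the HTTP call in trigger_pose

    gate = PoseGate(fake_trigger, clock=lambda: now[0])
    results = [await gate.play("wave")]
    now[0] = 3.0  # mid-motion: the second request is rejected
    results.append(await gate.play("dance"))
    now[0] = 6.0  # slot expired: the next request goes through
    results.append(await gate.play("dance"))
    return results


print(asyncio.run(demo()))
```

Injecting the clock is a testing convenience only; the PR reads `time.monotonic()` directly, which behaves the same way in production.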
73 changes: 63 additions & 10 deletions examples/avatar/agent.py
@@ -5,6 +5,7 @@
from contextlib import asynccontextmanager
from dataclasses import dataclass

from actions import ActionController, supports_actions
from dotenv import find_dotenv, load_dotenv
from hold_music import hold_beats
from personas import Persona, compose_instructions, resolve_persona
@@ -21,6 +22,7 @@
inference,
llm as agents_llm,
)
from livekit.agents.llm import function_tool
from livekit.plugins import lemonslice
from livekit.rtc import RpcError, RpcInvocationData

@@ -33,6 +35,32 @@
class State:
persona: Persona
avatar: lemonslice.AvatarSession
session_id: str


class MotionAgent(Agent):
def __init__(self, persona: Persona, actions: ActionController) -> None:
super().__init__(
instructions=compose_instructions(persona),
tts=inference.TTS("cartesia/sonic-3.5", voice=persona.voice_id),
chat_ctx=agents_llm.ChatContext.empty(),
)
self._actions = actions

@function_tool
async def wave(self) -> str:
"""Wave to the user. Only call when they explicitly ask you to wave."""
return await self._actions.play("wave")

@function_tool
async def dance(self) -> str:
"""Dance for the user. Only call when they explicitly ask you to dance."""
return await self._actions.play("dance")

@function_tool
async def turn(self) -> str:
"""Turn side to side. Only call when they explicitly ask you to turn."""
return await self._actions.play("turn")


@server.rtc_session()
@@ -55,21 +83,42 @@ def make_avatar(p: Persona) -> lemonslice.AvatarSession:
agent_prompt=p.speaking_prompt,
agent_idle_prompt=p.idle_prompt,
idle_timeout=120,
response_done_timeout=2,
)

actions = ActionController()
ctx.add_shutdown_callback(actions.shutdown)

def make_agent(p: Persona) -> Agent:
if supports_actions(p.id):
return MotionAgent(p, actions)
return Agent(
instructions=compose_instructions(p),
tts=inference.TTS("cartesia/sonic-3.5", voice=p.voice_id),
chat_ctx=agents_llm.ChatContext.empty(),
)

state = State(persona=initial, avatar=make_avatar(initial))
await state.avatar.start(session, room=ctx.room)
avatar = make_avatar(initial)
session_id = await avatar.start(session, room=ctx.room)
state = State(persona=initial, avatar=avatar, session_id=session_id)
await state.avatar.wait_for_join()

if supports_actions(initial.id):
actions.set_session(state.session_id, initial.id)

await session.start(agent=make_agent(initial), room=ctx.room)

if supports_actions(initial.id):
await actions.opening_wave()

session.generate_reply(
instructions=(
f"It's your turn to speak first. Open with a single short greeting in "
f"character as {initial.name} and then stop."
+ (" Do not call wave — you already waved." if supports_actions(initial.id) else "")
)
)

bg_audio = BackgroundAudioPlayer()
await bg_audio.start(room=ctx.room, agent_session=session)

@@ -101,31 +150,35 @@ async def set_avatar(data: RpcInvocationData) -> str:
session.interrupt()

async with hold_music():
await actions.cancel()
await state.avatar.aclose()
state.avatar = make_avatar(new_persona)
await state.avatar.start(session, room=ctx.room)
state.session_id = await state.avatar.start(session, room=ctx.room)
await state.avatar.wait_for_join()
session.update_agent(make_agent(new_persona))
state.persona = new_persona
# Lemonslice's video pipeline needs a beat after wait_for_join
# before it actually consumes audio + emits frames.
await asyncio.sleep(1.2)

if supports_actions(new_persona.id):
actions.set_session(state.session_id, new_persona.id)
await actions.opening_wave()

session.generate_reply(
instructions=(
f"It's your turn to speak first. Open with a single short line in "
f"character as {state.persona.name} (acknowledge that you're who they "
"just picked) and then stop."
+ (
" Do not call wave — you already waved."
if supports_actions(new_persona.id)
else ""
)
)
)
return json.dumps({"id": state.persona.id})

session.generate_reply(
instructions=(
f"It's your turn to speak first. Open with a single short greeting in "
f"character as {state.persona.name} and then stop."
)
)
return json.dumps({"id": state.persona.id})


if __name__ == "__main__":
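
The two `generate_reply` calls in this diff build nearly the same instruction string, differing only in the persona-switch wording and the optional "already waved" suffix. A small helper could remove the duplication. `opening_instructions` below is a hypothetical name that is not part of the PR, and the suffix wording is lightly paraphrased from the diff.

```python
def opening_instructions(name: str, *, waved: bool, switched: bool = False) -> str:
    """Build the first-turn instruction for a persona.

    waved:    True when the opening wave already played (motion-capable personas).
    switched: True when the persona was just picked via the set_avatar RPC.
    """
    if switched:
        text = (
            "It's your turn to speak first. Open with a single short line in "
            f"character as {name} (acknowledge that you're who they "
            "just picked) and then stop."
        )
    else:
        text = (
            "It's your turn to speak first. Open with a single short greeting in "
            f"character as {name} and then stop."
        )
    if waved:
        # Keeps the LLM from calling the wave tool a second time.
        text += " Do not call wave; you already waved."
    return text


print(opening_instructions("Leila", waved=True))
```

Both call sites would then pass `waved=supports_actions(persona.id)`, with `switched=True` only inside the RPC handler.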