Push to Talk | Phonic

Push to talk (PTT) gives you manual control over when Phonic listens to the user’s audio, instead of relying on automatic turn-taking detection. This is useful for walkie-talkie-style interfaces, button-driven UIs, or any scenario where you want explicit control over turn-taking.

How it works

In the default mode, Phonic automatically detects when the user starts and stops speaking. With push to talk enabled, you control this by sending unmute and mute messages over the WebSocket:

Unmute — signals that the user is about to speak. Interrupts the assistant and starts listening.
Mute — signals that the user has finished speaking. Stops listening and triggers the assistant’s reply.

You must continue sending audio_chunk messages at all times, even while muted. Send either the real microphone audio or silent frames (zero-filled PCM). If you stop sending audio entirely, the connection may time out and the assistant will not respond.

Setup

1. Enable push to talk in your config message

Set push_to_talk to true in the initial config message:

{
  "type": "config",
  "agent": "my-agent",
  "push_to_talk": true
}

2. Send mute and unmute messages during the conversation

When the user presses the talk button, send an unmute message:

{
  "type": "unmute"
}

When the user releases the button, send a mute message:

{
  "type": "mute"
}
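The press and release handling above could be wrapped in a small helper. This is a minimal sketch, not part of the Phonic SDK: `createPushToTalk` is a hypothetical name, and `socket` is assumed to be any object with a `send(string)` method, such as an open WebSocket connected to Phonic.

```javascript
// Hypothetical helper (not a Phonic API): turns talk-button events into
// the "unmute" and "mute" messages described above.
// Assumption: `socket` exposes send(string), like an open WebSocket.
function createPushToTalk(socket) {
  let talking = false;
  return {
    // Call when the talk button is pressed.
    press() {
      if (talking) return; // ignore key repeat or duplicate press events
      talking = true;
      socket.send(JSON.stringify({ type: "unmute" }));
    },
    // Call when the talk button is released.
    release() {
      if (!talking) return; // ignore a release without a matching press
      talking = false;
      socket.send(JSON.stringify({ type: "mute" }));
    },
    // True between press() and release().
    isTalking: () => talking,
  };
}
```

In a browser, `press` and `release` could be attached to a button's `pointerdown` and `pointerup` events. The `talking` flag keeps repeated events from sending duplicate messages.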

3. Keep streaming audio while muted

Even when the user is muted, you must keep sending audio_chunk messages. You can send either:

Silent frames: zero-filled PCM data in the same format and chunk size you normally send
Real microphone audio: the speech model ignores audio while muted, so what it contains has no effect on the conversation

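// Generate a silent audio chunk (example: ~100ms of 16-bit PCM at 44100 Hz).
// Adjust the sample rate and chunk size to match the audio you normally send.
const SILENT_CHUNK_SAMPLES = 4410; // 44100 * 0.1
const silentBuffer = Buffer.alloc(SILENT_CHUNK_SAMPLES * 2); // 2 bytes per 16-bit sample, zero-filled
const silentBase64 = silentBuffer.toString("base64");

// Send silent chunks on an interval while muted.
// phonicSocket is assumed to be an already-open WebSocket connection.
const silenceInterval = setInterval(() => {
  phonicSocket.send(JSON.stringify({
    type: "audio_chunk",
    audio: silentBase64,
  }));
}, 100);

// Call clearInterval(silenceInterval) once you switch to real microphone
// audio or the conversation ends.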
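The snippet above covers only the muted case. One way to avoid starting and stopping a separate timer is to build every outgoing `audio_chunk` through a single function that picks its source from the current state. The sketch below assumes Node.js, 16-bit PCM, and the `type` and `audio` fields shown above. `makeSilentChunk` and `buildAudioChunkMessage` are hypothetical names, not Phonic APIs.

```javascript
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

// Build a base64-encoded, zero-filled PCM chunk of the given duration.
function makeSilentChunk(sampleRate = 44100, chunkMs = 100) {
  const samples = Math.round((sampleRate * chunkMs) / 1000);
  return Buffer.alloc(samples * BYTES_PER_SAMPLE).toString("base64");
}

// Hypothetical helper: returns the JSON string for the next audio_chunk.
// Uses microphone audio only while the user is talking and a chunk is
// available; otherwise falls back to silence so the stream never pauses.
// Assumption: `micChunk` is a Buffer of raw PCM, or null if none is ready.
function buildAudioChunkMessage(isTalking, micChunk, silentBase64) {
  const audio =
    isTalking && micChunk ? micChunk.toString("base64") : silentBase64;
  return JSON.stringify({ type: "audio_chunk", audio });
}
```

Calling `buildAudioChunkMessage` on every audio tick keeps the message cadence constant across mute and unmute, so only the payload changes.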

Example flow

Here is a typical sequence of messages for a push-to-talk conversation:

Step	Direction	Message	Description
1	Client → Server	`config`	Connect with `push_to_talk: true`
2	Client → Server	`audio_chunk`	Stream silent frames (user hasn’t spoken yet)
3	Client → Server	`unmute`	User presses talk button
4	Client → Server	`audio_chunk`	Stream real microphone audio
5	Client → Server	`mute`	User releases talk button
6	Server → Client	`audio_chunk`	Assistant responds with audio
7	Client → Server	`audio_chunk`	Continue streaming silent frames while assistant speaks
8	Client → Server	`unmute`	User presses talk button again (interrupts assistant if still speaking)
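The client side of this sequence can be traced with a small simulation. This sketch sends nothing over the network. `traceClientMessages` and the event names `"tick"`, `"press"`, and `"release"` are hypothetical, and the `(silence)` and `(microphone)` suffixes are labels for the trace only, not part of any message schema.

```javascript
// Hypothetical simulation: lists the client-to-server messages produced by
// a sequence of UI events. "tick" means one audio interval has elapsed.
function traceClientMessages(events) {
  const trace = ["config"]; // the config message is always sent first
  let talking = false;
  for (const event of events) {
    if (event === "press" && !talking) {
      talking = true;
      trace.push("unmute");
    } else if (event === "release" && talking) {
      talking = false;
      trace.push("mute");
    } else if (event === "tick") {
      // Audio goes out on every tick; only its source changes.
      trace.push(talking ? "audio_chunk (microphone)" : "audio_chunk (silence)");
    }
  }
  return trace;
}
```

For example, `traceClientMessages(["tick", "press", "tick", "release", "tick", "press"])` yields the client rows of the table in order: `config`, `audio_chunk (silence)`, `unmute`, `audio_chunk (microphone)`, `mute`, `audio_chunk (silence)`, `unmute`.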

See the WebSocket API reference for full details on the mute and unmute message schemas.