Push to Talk
Push to talk (PTT) gives you manual control over when Phonic listens to the user's audio, instead of relying on automatic turn detection. This is useful for walkie-talkie-style interfaces, button-driven UIs, or any scenario where your application should decide when each turn starts and ends.
How it works
In the default mode, Phonic automatically detects when the user starts and stops speaking. With push to talk enabled, you control this by sending unmute and mute messages over the WebSocket:
- Unmute — signals that the user is about to speak. Interrupts the assistant and starts listening.
- Mute — signals that the user has finished speaking. Stops listening and triggers the assistant’s reply.
You must continue sending audio_chunk messages at all times, even while muted. Send either the real microphone audio or silent frames (zero-filled PCM). If you stop sending audio entirely, the connection may time out and the assistant will not respond.
Setup
1. Enable push to talk in your config message
Set push_to_talk to true in the initial config message.
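As a minimal sketch, the config message could be built as shown here. Only the push_to_talk field and the existence of a config message come from this guide; the JSON framing, the "type" field, and the helper name build_config_message are assumptions made for illustration, so check the WebSocket API reference for the actual schema.

```python
import json


def build_config_message(push_to_talk: bool = True) -> str:
    """Serialize a config message that enables push to talk.

    Assumption: messages are JSON objects tagged with a "type" field.
    Only the push_to_talk flag itself is taken from this guide.
    """
    message = {
        "type": "config",              # assumed message tag
        "push_to_talk": push_to_talk,  # flag described in this guide
    }
    return json.dumps(message)


# Send this string as the first message after the WebSocket opens.
config_json = build_config_message()
```

Any other fields your config message normally carries would be added to the same object.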
2. Send mute and unmute messages during the conversation
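When the user presses the talk button, send an unmute message. This interrupts the assistant and starts listening.
When the user releases the button, send a mute message. This stops listening and triggers the assistant's reply.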
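The press and release handling can be sketched as a small helper that wraps whatever function sends text over your WebSocket. The TalkButton class, its send callback, and the {"type": ...} message shape are hypothetical names chosen for this example, not part of the Phonic API; only the mute and unmute message names come from this guide.

```python
import json
from typing import Callable


class TalkButton:
    """Translate button presses into unmute/mute messages.

    `send` is any callable that transmits one text frame, for example
    the send method of your WebSocket client (assumed, not specified here).
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self.listening = False

    def press(self) -> None:
        # Ignore repeated press events (e.g. key auto-repeat).
        if not self.listening:
            self._send(json.dumps({"type": "unmute"}))  # assumed schema
            self.listening = True

    def release(self) -> None:
        # Only send mute if we previously unmuted.
        if self.listening:
            self._send(json.dumps({"type": "mute"}))  # assumed schema
            self.listening = False
```

Tracking the listening state keeps duplicate UI events from producing duplicate messages, which is a design choice of this sketch rather than a documented requirement.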
3. Keep streaming audio while muted
Even when the user is muted, you must keep sending audio_chunk messages. You can send either:
- Silent frames: zero-filled PCM data in the same format and chunk size you normally send
- Real microphone audio: the speech model ignores audio received while muted, so its content does not affect the conversation
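A silent frame can be generated without touching the microphone. The sketch below assumes signed 16-bit mono PCM at 16 kHz in 20 ms chunks, and assumes audio is sent base64-encoded in an "audio" field; all of these values and field names are placeholders, so substitute the format and schema your session actually uses.

```python
import base64
import json

# Assumed audio format; replace with the values your session uses.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # signed 16-bit PCM, where all-zero bytes mean silence
CHUNK_MS = 20


def silent_chunk(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    chunk_ms: int = CHUNK_MS,
) -> bytes:
    """Return one chunk of zero-filled PCM of the given duration."""
    num_samples = sample_rate_hz * chunk_ms // 1000
    return bytes(num_samples * bytes_per_sample)  # bytes(n) is n zero bytes


def audio_chunk_message(pcm: bytes) -> str:
    """Wrap raw PCM in an audio_chunk message (field names are assumed)."""
    payload = base64.b64encode(pcm).decode("ascii")
    return json.dumps({"type": "audio_chunk", "audio": payload})


# While muted, send this on the same cadence as real microphone chunks.
muted_message = audio_chunk_message(silent_chunk())
```

With the assumed format, each silent chunk is 320 samples, or 640 bytes, before encoding.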
Example flow
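A typical push-to-talk conversation proceeds as follows. The client sends the config message with push_to_talk set to true, then streams audio_chunk messages continuously for the rest of the session. When the user presses the talk button, the client sends unmute; when the user releases it, the client sends mute, which triggers the assistant's reply.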
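The ordering of client messages can be illustrated with a small simulation. This is a sketch under stated assumptions: ptt_session and its event names ("tick", "press", "release") are hypothetical, messages are shown as dictionaries with an assumed "type" field, and audio payloads are omitted for brevity.

```python
from typing import Iterator


def ptt_session(events: list[str]) -> Iterator[dict]:
    """Yield the client messages for a simulated push-to-talk session.

    Each event represents one audio interval and is "tick" (nothing
    happened), "press" (talk button pressed), or "release" (button released).
    """
    # The config message always comes first.
    yield {"type": "config", "push_to_talk": True}
    for event in events:
        if event == "press":
            yield {"type": "unmute"}
        elif event == "release":
            yield {"type": "mute"}
        # Audio keeps flowing on every interval, muted or not.
        yield {"type": "audio_chunk"}


timeline = ["tick", "press", "tick", "release", "tick"]
message_types = [message["type"] for message in ptt_session(timeline)]
```

For this timeline, message_types is config, audio_chunk, unmute, audio_chunk, audio_chunk, mute, audio_chunk, audio_chunk. The audio_chunk messages never stop, and unmute and mute are interleaved with them.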
See the WebSocket API reference for full details on the mute and unmute message schemas.