Text to Speech
Generate speech audio from a prompt when a trigger or event occurs and save locally or play via SIP.
This node generates an audio file from a text prompt when a trigger or event occurs. The audio is saved locally on the filesystem, and the node outputs metadata indicating whether a new audio file was generated.
Properties
| Property | Description | Type | Default | Required |
|---|---|---|---|---|
| trigger_source | Use built-in events from upstream nodes or a custom trigger expression. | enum | custom | Yes |
| trigger | Trigger condition expression, used when trigger_source is custom. | trigger-condition | null | Yes (custom trigger) |
| interval | Minimum time between consecutive generations, in seconds. | float | 10 | No |
| enabled | Enable or disable audio generation. | bool | true | No |
| prompt_source | Use a static prompt or a metadata path. | enum | static | Yes |
| prompt_text | Static prompt text, used when prompt_source is static. | text | "" | No |
| prompt_metadata_path | Comma-separated metadata paths from which to extract prompt text, used when prompt_source is metadata_path. Supports dot paths and * wildcards. | string | "" | No |
| tts_provider | TTS provider to use: Lumeo Cloud (lumeo) or ElevenLabs (elevenlabs). | enum | lumeo | Yes |
| lumeo_voice_id | Voice selection when using the Lumeo provider. | enum | male | Yes (Lumeo) |
| elevenlabs_api_key | ElevenLabs API key. | string | null | Yes (ElevenLabs) |
| elevenlabs_voice_id | ElevenLabs voice ID. | string | "" | Yes (ElevenLabs) |
| elevenlabs_model_id | ElevenLabs model ID. | string | eleven_multilingual_v2 | No |
| audio_format | Audio output format. | enum | mp3 | No |
| sip_enabled | Play generated audio to a SIP endpoint via RTP. | bool | false | No |
| sip_ip | SIP endpoint IP address. | string | "" | Yes (when SIP enabled) |
| sip_port | SIP port on the endpoint. | number | 5060 | No |
| sip_username | SIP user/extension on the endpoint. | string | 51 | No |
| sip_password | SIP password (optional, reserved for future digest auth). | string | "" | No |
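
The prompt_metadata_path property accepts comma-separated dot paths with * wildcards. The node's exact resolution rules are not documented here, so the following is only a minimal sketch of how such paths could be resolved against a metadata dictionary. The function names (resolve_path, build_prompt), the sample metadata keys, and the choice to join matches with spaces are all assumptions made for illustration.

```python
from typing import Any, List


def resolve_path(data: Any, path: str) -> List[Any]:
    """Return every value matching a dot path; '*' matches all dict values or list items."""
    current = [data]
    for part in path.split("."):
        next_level = []
        for item in current:
            if part == "*":
                if isinstance(item, dict):
                    next_level.extend(item.values())
                elif isinstance(item, list):
                    next_level.extend(item)
            elif isinstance(item, dict) and part in item:
                next_level.append(item[part])
        current = next_level
    return current


def build_prompt(metadata: dict, paths: str) -> str:
    """Join the scalar values found at each comma-separated path (assumed behavior)."""
    found = []
    for path in paths.split(","):
        path = path.strip()
        if not path:
            continue
        for value in resolve_path(metadata, path):
            if isinstance(value, (str, int, float)):
                found.append(str(value))
    return " ".join(found)


# Hypothetical metadata layout, used only to exercise the sketch.
sample = {
    "nodes": {
        "model1": {"label": "person detected"},
        "model2": {"label": "door open"},
    }
}
prompt = build_prompt(sample, "nodes.*.label")
```

With the sample above, `prompt` is `"person detected door open"`. Paths that match nothing contribute no text.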
When SIP output is enabled, audio format is automatically forced to WAV. The node establishes a SIP call to the endpoint, streams the audio as G.711 u-law over RTP, then terminates the call.
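
To illustrate what "G.711 u-law" means for the WAV audio, here is a minimal sketch of the standard u-law companding step applied to 16-bit PCM samples, followed by splitting the result into RTP-sized payloads. It is not the node's implementation. The 160-byte payload size (20 ms at G.711's 8 kHz sample rate) is a common packetization choice and is assumed here, as are the function names.

```python
import struct
from typing import Iterator

BIAS = 0x84   # bias added before finding the segment, per the usual u-law algorithm
CLIP = 32635  # largest magnitude that can be encoded after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, from bit 14 down.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit mono PCM bytes to u-law bytes."""
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)


def rtp_payloads(ulaw: bytes, size: int = 160) -> Iterator[bytes]:
    """Yield fixed-size payload chunks; 160 bytes is 20 ms at 8 kHz (assumed)."""
    for start in range(0, len(ulaw), size):
        yield ulaw[start:start + size]


encoded = pcm16_to_ulaw(struct.pack("<3h", 0, 32767, -32768))
```

Silence (a sample value of 0) encodes to the byte 0xFF, which is why idle u-law streams appear as runs of 0xFF.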
Output Metadata
This node appends metadata under nodes.<node_id>:
- audio_generated_delta: true when a new audio file was generated on this frame.
- audio_filepath: absolute path to the saved audio file when generation succeeds.
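
A downstream consumer might read these fields as in the sketch below. It assumes the frame metadata is available as a plain dictionary keyed as described above. The helper name new_audio_path, the node ID "tts1", and the sample file path are hypothetical.

```python
from typing import Optional


def new_audio_path(metadata: dict, node_id: str) -> Optional[str]:
    """Return the audio file path only when this frame produced a new file."""
    node_meta = metadata.get("nodes", {}).get(node_id, {})
    if node_meta.get("audio_generated_delta") is True:
        return node_meta.get("audio_filepath")
    return None


# Hypothetical frame metadata for a node with ID "tts1".
frame_meta = {
    "nodes": {
        "tts1": {
            "audio_generated_delta": True,
            "audio_filepath": "/tmp/example.mp3",
        }
    }
}
path = new_audio_path(frame_meta, "tts1")
```

Checking audio_generated_delta before reading audio_filepath avoids reprocessing the same file on frames where no new audio was generated.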
