From 9a4b7c787b78404a85d0bd12a5ffede0de4521b4 Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Mon, 23 Jun 2025 17:45:37 -0400
Subject: [PATCH] docs: voice mode websockets endpoints (#8430)

* initial-content

* cleanup

* Apply suggestions from code review

* full-example-paths

* Apply suggestions from code review

Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>

* path-not-required

---------

Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
---
 docs/docs/Concepts/concepts-voice-mode.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/docs/Concepts/concepts-voice-mode.md b/docs/docs/Concepts/concepts-voice-mode.md
index eb0c1c318..387120e85 100644
--- a/docs/docs/Concepts/concepts-voice-mode.md
+++ b/docs/docs/Concepts/concepts-voice-mode.md
@@ -51,3 +51,22 @@ Be aware of the following considerations when using voice mode:
 * Voice mode does not use the instructions in the Agent component's **Agent Instructions** field, because your spoken instructions override this value.
 * Voice mode only maintains context within the conversation session you are currently in. If you exit a conversation and close the **Playground**, your conversational context is not available in the next chat session.
+
+## Langflow voice mode endpoints
+
+Langflow exposes OpenAI Realtime API-compatible WebSocket endpoints for your flows. You can build voice applications against these endpoints the same way you would build against [OpenAI Realtime API WebSockets](https://platform.openai.com/docs/guides/realtime#connect-with-websockets).
+
+The WebSocket endpoints require an [OpenAI API key](https://platform.openai.com/docs/overview) for authentication, and they support an optional [ElevenLabs](https://elevenlabs.io) integration.
+
+Langflow exposes two WebSocket endpoints:
+
+* `/ws/flow_as_tool/{flow_id}` or `/ws/flow_as_tool/{flow_id}/{session_id}`: Establishes a connection to OpenAI Realtime voice, and then the [OpenAI Realtime model](https://platform.openai.com/docs/guides/realtime-conversations#handling-audio-with-websockets) invokes your flow as a tool.
+This approach is ideal for low-latency applications, but it is less deterministic because the OpenAI voice-to-voice model determines when to call your flow.
+
+* `/ws/flow_tts/{flow_id}` or `/ws/flow_tts/{flow_id}/{session_id}`: Converts audio to text with [OpenAI Realtime voice transcription](https://platform.openai.com/docs/guides/realtime-transcription), and then invokes your flow directly for each transcript.
+This approach is more deterministic but has higher latency.
+This is the mode used in the Langflow **Playground**.
+
+Path parameters:
+* `flow_id`: Required. The ID of the flow to invoke.
+* `session_id`: Optional. A unique identifier for the conversation session. If not provided, one is generated automatically.
\ No newline at end of file
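
A minimal client sketch can help illustrate how the documented endpoints might be called. The snippet below is not part of the patch: it assumes a local Langflow instance at `ws://localhost:7860`, a placeholder flow ID, and the third-party `websockets` package, and because the endpoints are described as OpenAI Realtime API-compatible, the `session.update` event and the way the OpenAI API key is passed are modeled on the Realtime API and should be treated as assumptions to verify against your Langflow version.

```python
# Minimal sketch: connect to a Langflow voice-mode WebSocket endpoint.
# Assumptions (verify against your Langflow version): a local instance at
# ws://localhost:7860, a placeholder flow ID, and Realtime-API-style JSON events.
import asyncio
import json
import os

import websockets  # third-party: pip install websockets


async def main() -> None:
    flow_id = "your-flow-id"          # hypothetical placeholder
    session_id = "demo-session-123"   # optional; omit this path segment to auto-generate one
    url = f"ws://localhost:7860/ws/flow_tts/{flow_id}/{session_id}"

    async with websockets.connect(url) as ws:
        # Assumption: the OpenAI API key is supplied in a Realtime-style
        # session.update event; the exact handshake may differ.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"openai_api_key": os.environ["OPENAI_API_KEY"]},
        }))

        # Print incoming event types (transcripts, audio deltas, flow responses).
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"), flush=True)


if __name__ == "__main__":
    asyncio.run(main())
```

The sketch uses the `/ws/flow_tts/` endpoint because its transcribe-then-invoke behavior is the more deterministic of the two; swapping the path for `/ws/flow_as_tool/` would exercise the tool-calling mode instead.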