# Dograh AI: Open-Source, Self-Hosted Voice AI with a Visual Workflow Builder

[Dograh AI](https://github.com/dograh-hq/dograh) is an open-source platform for building and deploying conversational voice agents. It **provides a node-based visual workflow builder, a voice engine that manages real-time audio streaming, and a platform layer with call tracing, recordings, and analytics**. The entire stack is self-hostable and model-agnostic: you connect your own LLM, TTS/STT, and telephony providers.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/xD9JEvfCH9k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>


## The problem with proprietary voice AI platforms

A voice interaction pipeline involves several distinct paid services: the LLM, speech-to-text, text-to-speech, telephony, and a platform fee on top. Proprietary platforms bundle and mark up all of these.

![Diagram showing stacked costs of a typical voice AI platform including LLM, Voice, Phone Call, and Platform Fee leading to a locked system](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/caea3dd4-26e4-4cf9-fe3f-780aaadf9200/public =1280x720)

Beyond cost, closed-source platforms mean limited visibility into why a call failed, no control over the underlying providers, and significant migration effort if pricing or features change. When an agent gives a bad response, you need to know whether it was a bad prompt, a slow API call, or an LLM error. Without access to the execution trace, debugging is guesswork.

![Graphic showing the four stages of a voice AI pipeline: Phone Call (Audio In), Speech-to-Text, LLM (Reasoning), and Text-to-Speech](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/91c7bbff-5173-4a28-6bbb-f730fbf4e900/lg1x =1280x720)

## Core components

![Slide showing the three pillars of Dograh: a Voice Engine, a Visual Workflow Builder, and The Platform Layer](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/8cc84afb-e0b8-48c2-c8cb-f7cf7cb24000/orig =1280x720)

**Voice engine.** Handles real-time audio streaming, connects the caller to the STT and TTS services and the LLM, and manages interruption detection and state throughout the conversation.

**Visual workflow builder.** A node-based canvas where conversation logic is mapped as connected nodes: prompts, branches, API tool calls, variable extraction, and call transfers. Drag-and-drop with per-node configuration in a side panel.

**Platform layer.** Call tracing, recordings, transcripts, and analytics built in. This is the layer that raw frameworks require you to build yourself.

## Local installation

Dograh runs via Docker Compose. Prerequisites: Git, Docker, and Docker Compose.

```command
git clone https://github.com/dograh-hq/dograh
```

```command
cd dograh
```

```command
REGISTRY=ghcr.io/dograh-hq ENABLE_TELEMETRY=true docker compose up --pull always
```

![Terminal showing docker compose up followed by Docker pulling containers including redis, cloudflared, api, ui, and postgres](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/aaf96152-fe28-436e-a120-9315e5b59900/lg2x =1280x720)

Docker starts the UI, backend API, Redis cache, and Postgres database. The dashboard is available at `http://localhost:3000` after startup.

## Building a lead qualification agent

The following walkthrough builds an inbound call agent that asks qualifying questions, extracts structured data, and transfers qualified leads to a human agent.

### Creating the agent

In the dashboard under **Voice Agents**, click **Create**. Set:

- **Call type:** Inbound
- **Use case:** Lead Qualifier
- **Activity description:** "Qualify inbound demo requests, create CRM lead, transfer to human."

Dograh generates a starter workflow from this description.

### Designing the workflow

![Dograh visual workflow builder canvas showing a generated flowchart of conversational nodes connected by lines](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/e5113793-8ac0-4f46-f6a5-58fca0217100/lg2x =1280x720)

The canvas shows the conversation as connected nodes. Clicking a node opens its configuration panel.

**Start Call node.** Configures the opening greeting and a system prompt with the agent's persona and rules. The prompt at each node scopes the LLM's behavior to that stage of the conversation, which makes the agent more predictable than a single global prompt.

**Main Agenda and Questions node.** The prompt instructs the agent to ask qualifying questions: "What are you looking to build today?" and "What is your company size and industry?" Enabling **Variable Extraction** tells the LLM to pull structured fields like `company_name`, `budget`, and `use_case` from the caller's responses and store them for downstream use.

![Configuration panel for a node showing the Prompt text area where system instructions for the LLM are written](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/7dd9d0db-b7c3-43f3-c331-5a5e97cd3500/md2x =1280x720)

**API Tool Call node.** After gathering information, a tool call node passes the extracted variables as arguments to an external API, such as creating a CRM lead.

**Branch and Transfer nodes.** A Branch node evaluates a condition (for example, whether `user_qualified` is `true`). A matching branch leads to a Transfer node that routes the live call to a human agent's phone number. A non-matching branch leads to an End Call node with a closing statement.

## Testing and debugging

### Web call testing

The builder includes a **Web Call** feature for testing in-browser. Speaking to the agent produces a live transcript on screen, with no phone number setup required.

### Agent Run Completed view

After each call, Dograh presents a detailed trace.

![Agent Run Completed screen showing the call transcript on the right and Initial Context and Gathered Context JSON data on the left](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/d5863430-84f0-475b-1b86-b7aec2dd3400/public =1280x720)

- **Transcript:** full turn-by-turn conversation log
- **Call trace:** the sequence of nodes activated, showing exactly which path the conversation took
- **Initial and gathered context:** raw JSON showing the agent's starting configuration and the state changes as the conversation progressed, including extracted variables and their values
- **Recording:** downloadable audio of the call

This trace answers the specific questions that matter during debugging: which node triggered, what variables were extracted, what was passed to the tool call, and where the conversation diverged from the expected path.

## Final thoughts

Dograh occupies the space between raw frameworks (full control, significant infrastructure work) and proprietary platforms (easy to start, opaque and expensive at scale). **The visual builder and built-in call tracing reduce the time-to-first-working-agent without hiding what is happening inside the system**.

The bring-your-own-provider model means you can optimize provider choices for cost or quality independently, and the self-hosted architecture means provider changes or platform pricing changes do not affect your deployment.

For teams building voice agents who have hit the limits of proprietary platforms or want to avoid those limits from the start, Dograh is worth evaluating. The Docker Compose setup takes a few minutes and the call trace view alone provides more debugging information than most closed-source platforms expose.

Source code and documentation are at [github.com/dograh-hq/dograh](https://github.com/dograh-hq/dograh).