The open-source AI voice studio

Clone, dictate and create.

Clone voices, generate speech across seven TTS engines, dictate into any app, and talk to agents in voices you own. A free and local alternative to ElevenLabs and WisprFlow, running entirely on your machine.

Download · View on GitHub

macOS, Windows, Linux

Avalanche VoiceBox

Jarvis

Dry wit, composed British AI assistant

Samuel L. Jackson

Commanding intensity with sharp, punchy delivery

Bob Ross

Gentle, soothing voice full of quiet encouragement

Sam Altman

Measured, thoughtful Silicon Valley cadence

Morgan Freeman

Rich, warm baritone with gravitas and calm authority

Linus Tech Tips

Enthusiastic, fast-paced tech explainer energy

Fireship

Rapid-fire, deadpan tech humor with zero filler

Scarlett Johansson

Smooth, low alto with understated warmth

Dario Amodei

Calm, precise articulation with academic depth

David Attenborough

Warm, reverent narration with wonder and precision

Zendaya

Relaxed, modern delivery with effortless cool

Barack Obama

Measured cadence with rhythmic pauses and gravitas

Generate speech using Jarvis...

English · Qwen 1.7B · Robot

Morgan Freeman

en · Qwen 1.7B · 0:08

2 minutes ago

The neural pathways of human speech contain more complexity than any language model can fully capture, yet we keep pushing the boundaries of what is possible.

Samuel L. Jackson

en · Qwen 1.7B · 0:07

15 minutes ago

In a world increasingly shaped by artificial intelligence, the human voice remains our most powerful tool for connection and storytelling.

Jarvis

en · Qwen 0.6B · 0:09

1 hour ago

The architecture of modern text-to-speech systems reveals an elegant interplay between transformer models and acoustic feature prediction.

Bob Ross

en · Chatterbox · 0:06

3 hours ago

Welcome to the next chapter. Every great story begins with a single voice, and today that voice can be yours.

Linus Tech Tips

en · Qwen 1.7B · 0:05

5 hours ago

Local inference gives you complete control over your voice data. No cloud, no subscriptions, no compromises.

Professional voice tools, zero compromise

Everything you need to clone voices, generate speech, and produce multi-voice content — running entirely on your machine.

Near-Perfect Voice Cloning

Multiple TTS engines for exceptional voice quality. Clone any voice from a few seconds of audio with natural intonation and emotion.

Stories Editor

Create multi-voice narratives with a timeline-based editor. Arrange tracks, trim clips, and mix conversations between characters.

Audio Effects Pipeline

Apply pitch shift, reverb, delay, compression, and more — then save as presets. Preview effects live and set defaults per voice profile.

Local or Remote

Run GPU inference locally with Metal, CUDA, ROCm, Intel Arc, or DirectML — or connect to a remote machine. One-click server setup with automatic discovery.

Audio Transcription

Unlimited Generation Length

Generate up to 50,000 characters in one go. Text is auto-split at sentence boundaries, generated per-chunk, and crossfaded seamlessly.
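The split-and-crossfade idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the app's actual implementation: the function names, the 400-character chunk size, the regex-based sentence splitter, and the linear fade are all hypothetical choices made for the example.

```python
from __future__ import annotations

import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Simplified sentence boundary: ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample lists, linearly blending `overlap` samples at the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    mixed = []
    for i in range(overlap):
        weight_b = (i + 1) / (overlap + 1)  # rises across the overlap
        mixed.append(tail[i] * (1 - weight_b) + b[i] * weight_b)
    return head + mixed + b[overlap:]
```

Each chunk would be generated separately and the resulting audio joined with `crossfade`, so the seam falls on a sentence boundary instead of mid-word.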

Any clip becomes a voice.

Three ways to get a sample in. Upload a clip, record from your microphone, or capture audio playing on your system. Avalanche VoiceBox clones the voice from as little as 3 seconds of audio.

Upload a clip

Drag and drop any audio file — WAV, MP3, FLAC, or WebM.

Record from microphone

Live waveform preview while you record. Up to 30 seconds.

System audio capture

Clone a voice from a YouTube video, podcast, or any app playing audio.
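Before uploading a clip, a script can check it against the limits stated here. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the duration check only handles WAV (via Python's standard `wave` module), and treating 3 seconds as a hard minimum is an interpretation of "as little as 3 seconds".

```python
import io
import wave
from pathlib import Path

# Formats named on this page for drag-and-drop upload.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".webm"}
MIN_SECONDS = 3.0  # assumption: treat "as little as 3 seconds" as a minimum


def has_supported_extension(filename: str) -> bool:
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_duration_seconds(source) -> float:
    """Duration of a WAV clip; `source` is a path string or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(source) -> bool:
    return wav_duration_seconds(source) >= MIN_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for trying the checks."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer
```

For example, `is_long_enough(make_silent_wav(4.0))` passes the check, while a 2-second clip does not.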

Dictate anywhere. Paste into any app.

Hold a shortcut anywhere on your machine, speak, release. The transcript lands in the focused text field of any app, or on your clipboard. Agents speak back through the same pill in any cloned voice.

Hold ⌘⌥ on macOS, or Ctrl+Alt on Windows, from anywhere on your machine.

Whisper Base · 74M · 99 langs

Whisper Small · 244M · 99 langs

Whisper Medium · 769M · 99 langs

Whisper Large · 1.5B · 99 langs

Whisper Turbo · 809M · 99 langs

Whisper, sized for every machine

Base, Small, Medium, Large, and Turbo. Pick the size that fits your hardware and quality bar — 99 languages across every tier, all running locally.
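As a rough guide to "fits your hardware", the parameter counts above translate into weight sizes with simple arithmetic. The sketch below assumes 2 bytes per parameter (16-bit weights) and counts weights only; real memory use is higher, and the function names are hypothetical. It also ranks models purely by parameter count, which ignores the speed advantage the page attributes to Turbo.

```python
from typing import Optional

# Parameter counts as listed on this page.
WHISPER_PARAMS = {
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "turbo": 809_000_000,
    "large": 1_500_000_000,
}


def weight_megabytes(params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return params * bytes_per_param / 1_000_000


def largest_model_within(budget_mb: float, bytes_per_param: int = 2) -> Optional[str]:
    """Name of the largest model whose weights fit the budget, or None."""
    fitting = [
        (params, name)
        for name, params in WHISPER_PARAMS.items()
        if weight_megabytes(params, bytes_per_param) <= budget_mb
    ]
    return max(fitting)[1] if fitting else None
```

Under these assumptions Base needs about 148 MB for weights and Large about 3,000 MB.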

raw: um so like i think we should ship it on friday, actually no wait, tuesday

clean: I think we should ship it on Tuesday.

Qwen3 · refining...

Refined transcripts

A local LLM cleans ums, self-corrections, and punctuation without rephrasing. Optional, toggleable, and never leaves your machine.

via MCP · Claude Code

Speaking · Morgan

“Tests passing. Ready to merge.”

Agents speak in voices you own

Any MCP-aware agent — Claude Code, Cursor, Cline — gets a voice with one tool call. The pill surfaces when an agent is speaking, so you always see what’s coming out of your machine.

MCP

Every agent gets a voice.

One tool call, avalanche-voicebox.speak, and any MCP-aware agent can talk to you in a voice you’ve cloned. Claude Code, Cursor, Cline, or anything that speaks MCP.

01 Add Avalanche VoiceBox to your MCP config

{
  "mcpServers": {
    "avalanche-voicebox": {
      "url": "http://127.0.0.1:17493/mcp"
    }
  }
}

02 The tool is now available

// In any MCP-aware agent (illustrative pseudocode: the hyphenated name is a tool identifier, not a JavaScript expression):
await avalanche-voicebox.speak({
  text: "Deploy complete.",
  profile: "Morgan",
})
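Under the hood, MCP clients send tool invocations as JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a message for the call above. The registered tool name (`speak` here) and the helper `build_tool_call` are assumptions; the page only shows the qualified form `avalanche-voicebox.speak`. A real client also performs the MCP initialization handshake and handles transport, which this sketch leaves out.

```python
import json


def build_tool_call(text: str, profile: str, request_id: int = 1,
                    tool_name: str = "speak") -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,  # assumed tool name, see note above
            "arguments": {"text": text, "profile": profile},
        },
    }
    return json.dumps(message)


print(build_tool_call("Deploy complete.", "Morgan"))
```

In practice an agent's MCP client generates this message for you; the sketch only shows what "one tool call" amounts to on the wire.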

Also exposed as POST /speak for anything that doesn’t speak MCP: ACP, A2A, shell scripts, or custom harnesses.
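For scripts that do not speak MCP, a plain HTTP call is enough. The following is a minimal sketch using only Python's standard library. The JSON body shape (`text` and `profile`, mirroring the tool-call arguments) is an assumption, since the page does not document the `/speak` request body, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:17493"  # local address shown in the MCP config


def build_speak_request(text: str, profile: str,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /speak request."""
    body = json.dumps({"text": text, "profile": profile}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/speak",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speak(text: str, profile: str) -> int:
    """Send the request; requires the app to be running. Returns the HTTP status."""
    with urllib.request.urlopen(build_speak_request(text, profile), timeout=30) as response:
        return response.status
```

Calling `speak("Deploy complete.", "Morgan")` at the end of a build script would then have the desktop app read the line aloud, assuming the body shape above is accepted.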

Claude Code

$ claude run

✓ Tests passing (42 files)

✓ Build succeeded in 12.4s

→ avalanche-voicebox.speak({ profile: "Morgan" })

On your desktop

Speaking · Morgan

“Tests passing. Ready to merge.”

Per-agent voice

Bind each MCP client to a voice profile. Claude Code in Morgan, Cursor in Scarlett — you know which agent is talking without looking.

Always visible

Every agent-initiated speech surfaces the pill. No silent background TTS — you always see what’s coming out of your machine.

Open protocols

MCP ships day one. ACP, A2A, and anything else built on a tool-call primitive slots into the same endpoint.

Personalities

Voices with a personality.

Give any voice profile a free-form personality. Then Rewrite your text in their voice, or let them Compose a fresh line of their own — your cloned voice, in full character.

Marlowe

Voice profile · cloned from a 12s sample

Personality

“1940s noir detective. World-weary, cynical, every situation a metaphor for the city's underbelly. Talks like he's seen one stack trace too many.”

Rewrite

Compose

Your text

the build is done and we shipped to production

In character

Marlowe, in character

“Build's wrapped, ship's left the dock. Another stack of code makes its way into prod, another row of green checks lining the wall.”

Rewrite

Restate your text in their voice while preserving every idea. Same content, their delivery — for scripts, dubs, and consistent character voice across long-form work.

Compose

No input needed — hit the button and the character improvises a fresh line of their own. Roll again for another take. Useful for game dialogue, narration cues, or character barks.

Built-in REST API

Your local voice API

Every engine you download becomes a REST endpoint on your machine. Build apps, games, and voice tools with full programmatic control — no API keys, no rate limits, no per-character fees.

API Reference

http://127.0.0.1:17493

POST  /generate               Generate speech
POST  /generate/{id}/cancel   Cancel a generation
GET   /profiles               List voice profiles
POST  /profiles               Create a new profile
GET   /models/status          Model catalog & state
GET   /history                Past generations
GET   /health                 Server health

See the full OpenAPI reference at /docs when Avalanche VoiceBox is running →
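The read-only endpoints can be explored with a few lines of standard-library Python. This is a sketch, not an official client: the helper names are hypothetical, and because the response schemas are not shown on this page, it returns whatever JSON the server sends without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:17493"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and a path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def cancel_path(generation_id: str) -> str:
    """Path for POST /generate/{id}/cancel."""
    return f"/generate/{generation_id}/cancel"


def get_json(path: str):
    """GET a JSON endpoint; requires the app to be running locally."""
    with urllib.request.urlopen(endpoint(path), timeout=10) as response:
        return json.load(response)


# With the app running, for example:
#   print(get_json("/health"))
#   print(get_json("/profiles"))
```

The `/docs` page remains the authoritative source for request and response shapes.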

Generate a line (curl)

curl -X POST http://127.0.0.1:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to the game, player one.",
    "profile_id": "b3f1c2d4-5e6f-4a7b-8c9d-0e1f2a3b4c5d",
    "engine": "qwen_custom_voice",
    "instruct": "warm, slow, cinematic"
  }' \
  --output line.wav
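The same request can be made from Python with only the standard library. The field names come from the curl example above. Treating `instruct` as optional and writing the raw response body to a file (as `--output line.wav` does) are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(text: str, profile_id: str, engine: str,
                           instruct: Optional[str] = None,
                           base_url: str = "http://127.0.0.1:17493") -> urllib.request.Request:
    """Build (but do not send) a POST /generate request."""
    payload = {"text": text, "profile_id": profile_id, "engine": engine}
    if instruct is not None:  # assumption: the field may be omitted
        payload["instruct"] = instruct
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and save the response body, like curl's --output."""
    with urllib.request.urlopen(request, timeout=300) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With the app running, `generate_to_file(build_generate_request("Welcome to the game, player one.", "b3f1c2d4-5e6f-4a7b-8c9d-0e1f2a3b4c5d", "qwen_custom_voice", "warm, slow, cinematic"), "line.wav")` mirrors the curl call.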

Games

Generate NPC dialogue on the fly, localize characters into new languages, or ship expressive voice lines without a studio.

Apps & agents

Give your app or AI agent a voice. Real-time narration, accessibility readouts, voice replies — all running on the user's machine.

Scripts & tools

Batch-generate audiobook chapters, automate podcast intros, or wire Avalanche VoiceBox into your Stream Deck. It's just a localhost URL.
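Batch work mostly comes down to preparing one request payload per item. The sketch below plans audiobook chapter jobs; the payload fields match the curl example on this page, while the function names and the numbered filename scheme are hypothetical choices for illustration.

```python
import re


def slugify(title: str) -> str:
    """Lowercase a title and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def chapter_jobs(chapters, profile_id: str, engine: str):
    """Turn (title, text) pairs into (output filename, /generate payload) pairs."""
    jobs = []
    for index, (title, text) in enumerate(chapters, start=1):
        filename = f"{index:02d}-{slugify(title)}.wav"
        payload = {"text": text, "profile_id": profile_id, "engine": engine}
        jobs.append((filename, payload))
    return jobs


jobs = chapter_jobs(
    [("Chapter One", "It began."), ("The Storm!", "Rain fell.")],
    profile_id="b3f1c2d4-5e6f-4a7b-8c9d-0e1f2a3b4c5d",
    engine="qwen_custom_voice",
)
for filename, payload in jobs:
    print(filename, len(payload["text"]))
```

Each payload would then be sent to POST /generate in turn and the response saved under the planned filename.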

No API keys · No rate limits · No per-character fees · Works offline · Your audio, your machine

Video tutorials

Learn by watching

Walkthroughs from the community covering setup, voice cloning, and production workflows.

YouTube

Free AI Voice Generator on Your PC (Clones Any Voice)

Kevin Stratvert

YouTube

NEW Voicebox DESTROYS ElevenLabs?

Julian Goldie SEO

YouTube

This Open-Source TTS App Sounds Scary Good (And It's Free)

Dave Swift

YouTube

The Best Voice Cloning Tool of 2026? Full Voicebox Review: From Download to API Calls, with Speed Comparison

Tech指南

YouTube

Get Started with Voicebox: Open-Source Alternative to ElevenLabs Tutorial

StinkyScrublet

YouTube

Free AI Voice Generator (Clones Any Voice)

mikbes

Supported models

Pick the right model for every job — TTS, transcription, refinement. All models run locally on your hardware. Download once, use forever.

TTS Engines

Text → speech. Voice cloning, preset voices, and delivery control.

07 models

Qwen3-TTS

by Alibaba

1.7B · 0.6B

High-quality multilingual cloning with natural prosody. The only engine with delivery instructions — control tone, pace, and emotion with natural language.

10 langs · Delivery instructions

Chatterbox

by Resemble AI

Production-grade voice cloning with the broadest language support. 23 languages with zero-shot cloning and emotion exaggeration control.

23 langs

Chatterbox Turbo

by Resemble AI

350M

Lightweight and fast. Supports paralinguistic tags — embed [laugh], [sigh], [gasp] directly in your text for expressive speech.

Fast · [tag] support
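When scripts are written by hand, it can help to check inline tags before generating. The sketch below is a small hypothetical helper: `KNOWN_TAGS` contains only the three tags named on this page, so the engine may accept others that this check would flag.

```python
import re

# Only the tags named on this page; the engine may support more.
KNOWN_TAGS = {"laugh", "sigh", "gasp"}
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> list:
    """All bracketed tag names in order of appearance."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list:
    """Tags that are not in KNOWN_TAGS, for example typos."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


def strip_tags(text: str) -> str:
    """Remove tags, for engines that would read them aloud as words."""
    without_tags = TAG_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


line = "Well [sigh] that was close [laugh]"
print(find_tags(line))
print(strip_tags(line))
```

`strip_tags` is handy when the same script is reused with an engine that does not interpret tags.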

LuxTTS

by ZipVoice

Ultra-fast, CPU-friendly cloning at 48kHz. Exceeds 150x realtime on CPU with ~1GB VRAM. The fastest engine for quick iterations.

150x realtime · 48kHz

Qwen CustomVoice

by Alibaba

1.7B · 0.6B

Nine premium preset speakers with natural-language style control. "Speak slowly with warmth", "authoritative and clear" — tone and pace adapt.

Instruct control · 10 langs

TADA

by Hume AI

3B · 1B

Speech-language model with text-acoustic dual alignment. Built for long-form — 700s+ coherent audio without drift. Multilingual at 3B.

10 langs · Long-form

Kokoro

by hexgrad · Apache 2.0

82M

Tiny 82M-parameter TTS that runs at CPU realtime with negligible VRAM. Pre-built voice styles — pick a voice, type, generate.

CPU realtime · Preset voices

Transcription

Speech → text. Multi-language STT for dictation and captures.

02 models

Whisper

by OpenAI

1.5B · 769M · 244M · 74M

The default. Mature multilingual ASR across a wide size range: pick Base for speed or Large for best accuracy.

99 langs

Whisper Turbo

by OpenAI

809M

Pruned Whisper Large v3. Near-best quality at roughly 8x the speed — the right default for real-time dictation.

99 langs · 8x faster

Language Models

Transcript refinement, persona replies, and on-device reasoning.

01 model

Qwen3

by Alibaba

4B · 1.7B · 0.6B

Powers transcript cleanup, persona voice replies, and the voice I/O loop. Shares its runtime with the TTS/STT stack — one model cache, one GPU story.

Refinement · Persona replies

Loved by users

What people are saying

Avalanche VoiceBox has passed 1M downloads. Here's a handful of notes from the people using it every day.

Cloning my own voice was a snap — and now I can hear my reminders and to-dos from my digital doppelgänger. Very cool.

jimzip

It's better than most other paid services.

Peiming Pai

I'm using this great tool for my multimedia course project — you made it into the classrooms!

theanoma.ly

This app is fantastic! I use it to learn languages and for my learning materials. Congratulations, it's great!

Kevin Serrano

Absolutely amazing! The learning curve was very short. Thank you for a great program and for making it free.

DJWhy

First engine I tried, zero config. It worked! Amazing.

Fitz

This is great for people who are uncomfortable with advocating for themselves in public. Thanks for making it.

creativeaction.ca

Thanks for this. It's a life-saver!

The Cowboy Movie Channel

Fantastic open-source app!

Mitja

From supporters on Buy Me a Coffee →

Download Avalanche VoiceBox

Available for macOS, Windows, and Linux. No dependencies required.

View all releases on GitHub

Official token

$VOICEBOX on Solana

An optional way to back the project and have some fun. Avalanche VoiceBox is and always will be free and open source — the token is not required to use anything here.

Learn about $VOICEBOX