Playbooks Tools Jun 07, 2026

Local Text-to-Speech With Piper

A fast, lightweight text-to-speech engine that turns text into a WAV file in under a second, on a plain CPU, with nothing leaving your machine. Here is the exact setup we run for voiceover and audio articles.

Written by Hillary

Read time: 9 min

Updated: Jun 12, 2026

Filed under: Playbooks · Tools

Most modern text-to-speech wants a GPU and a few seconds per sentence. Piper does not. It was built to run on a Raspberry Pi, which means it generates speech in sub-second times on a regular CPU, and it ships with pre-trained voices in more than 30 languages.

The quality is not ElevenLabs. It is “good radio narrator,” and that turns out to be enough for most of the work we actually do: audio versions of articles, voiceover for short videos, scratch tracks for editing, accessibility outputs, narration drafts to review before recording the real thing. We reach for Piper any time the job is “I need narration and don’t care whose voice it is.” Here is how to set it up.

What you will end up with

Piper installed in its own Python environment.
At least one voice downloaded to your disk.
One command that takes text on standard input and writes a WAV file, with nothing leaving your machine.

No subscription, no API latency, no per-character billing.

Before you start

You need a Mac, Windows, or Linux machine and Python 3. That is close to the whole list. Piper is genuinely light: the Atlas note pegs it at around 1GB of RAM, and it runs on CPU, so you do not need a GPU. Apple Silicon runs it at speed with no special setup.

Piper is command-line and library only. There is no desktop app to double-click. The commands are short, but you will be in a terminal.

One note on disk: the first voice you download is several hundred MB. Voices after that are much smaller.

Step 1: Install Piper in a virtual environment

Install Piper into its own Python environment so its dependencies stay out of everything else.

# create and activate a virtual environment
python3 -m venv piper-env
source piper-env/bin/activate

# install
pip install piper-tts

On Windows, activate with piper-env\Scripts\activate instead, then pip install piper-tts. If the python3 command is not recognized there, create the environment with python -m venv piper-env or py -m venv piper-env. If you would rather skip Python entirely on Windows or on a Raspberry Pi, standalone binaries are on the releases page at https://github.com/rhasspy/piper/releases.

Step 2: Download a voice

Piper does not ship a voice with the package. You pull one, once, and it stays on disk.

# see every voice available
python -m piper.download_voices --list

# download US English, medium quality
python -m piper.download_voices en_US-lessac-medium

The lessac English voices are among the better ones, which is why we start there. Voice quality varies a lot between voices, so audition before you commit to one. You can hear them all first at https://rhasspy.github.io/piper-samples/.

Want a second voice? Pull it the same way:

python -m piper.download_voices en_GB-alan-medium

Step 3: Generate your first audio

The basic flow is the whole tool: pipe text in, get a WAV out.

echo "Hello, this is Piper speaking on my local machine." | \
  piper --model en_US-lessac-medium --output_file hello.wav

Open hello.wav in any media player. That is the entire loop.

Reading from a file works the same way, which is how you turn a written article into audio:

cat article.txt | piper --model en_US-lessac-medium --output_file article.wav

And switching voices is just a different --model:

echo "British English now." | \
  piper --model en_GB-alan-medium --output_file british.wav
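If you would rather drive that same pipe-in, WAV-out flow from a script, it wraps up in a few lines of Python. This is a minimal sketch, not Piper's library API: it shells out to the piper command shown above with the same --model and --output_file flags. The helper names (piper_command, text_to_wav, audition) and the audition folder are ours for illustration, and it assumes piper is on your PATH inside the activated environment.

```python
import subprocess
import time
from pathlib import Path


def piper_command(model: str, output_file: str) -> list[str]:
    """Build the same command line used in the shell examples (hypothetical helper)."""
    return ["piper", "--model", model, "--output_file", output_file]


def text_to_wav(text: str, model: str, output_file: str) -> float:
    """Pipe text to piper on standard input and return the elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(
        piper_command(model, output_file),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if piper exits with an error
    )
    return time.perf_counter() - start


def audition(text: str, models: list[str], out_dir: str = "audition") -> dict[str, float]:
    """Render the same text once per voice so the voices can be compared by ear."""
    Path(out_dir).mkdir(exist_ok=True)
    timings = {}
    for model in models:
        wav_path = str(Path(out_dir) / f"{model}.wav")
        timings[model] = text_to_wav(text, model, wav_path)
    return timings
```

Calling audition("British or American?", ["en_US-lessac-medium", "en_GB-alan-medium"]) would write one WAV per voice into the audition folder and return how long each render took, which doubles as a rough speed check.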

Prove it works

Two things tell you the install is healthy.

First, the file exists and plays back as clear speech. Open the WAV and listen: you want a clean, neutral narrator voice, not static or silence.

Second, the speed. Generate a single sentence and watch the terminal. On a regular CPU the file is written in under a second, which is the point of Piper and the thing that feels different if you are used to waiting on a cloud TTS API.
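Listening is the real test, but a quick scripted check can catch an empty or silent file before you open a player. The sketch below uses only Python's standard wave module. The function name inspect_wav is ours, and the silence check assumes signed 16-bit PCM, which is our understanding of what Piper writes; treat that as an assumption. It cannot tell clear speech from static, so it complements your ears rather than replacing them.

```python
import wave


def inspect_wav(path: str) -> dict:
    """Read basic facts from a WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        raw = wav.readframes(frames)
    return {
        "sample_rate": rate,
        "seconds": frames / rate if rate else 0.0,
        # For signed 16-bit PCM, all-zero bytes means digital silence.
        "silent": not any(raw),
    }
```

A healthy hello.wav should report a duration of a few seconds and silent set to False. A duration of zero or silent set to True points at a failed synthesis rather than a playback problem.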

You can also confirm the voices you have downloaded are where Piper expects them:

# macOS / Linux
ls ~/.local/share/piper-tts/voices
# Windows: %USERPROFILE%\.local\share\piper-tts\voices
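The same check can be scripted. A Piper voice is an .onnx model file with a matching .onnx.json config next to it, so a half-finished download usually shows up as a model without its config. This is a sketch under assumptions: list_voices is a name we made up, and it assumes the voice files sit directly inside the folder you point it at.

```python
from pathlib import Path


def list_voices(voices_dir: Path) -> dict[str, bool]:
    """Map each voice name to whether its .onnx.json config is present."""
    voices = {}
    for model in sorted(voices_dir.glob("*.onnx")):
        config = model.with_name(model.name + ".json")
        voices[model.stem] = config.exists()
    return voices
```

Pointing it at Path.home() / ".local/share/piper-tts/voices" (the folder listed above) should return each downloaded voice mapped to True. A voice mapped to False is worth downloading again.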

When a short clip plays back as a real narrator voice, generated in a blink on your own CPU, you have what people pay a cloud TTS service for, running locally for free.

Trade-offs and gotchas

Piper is excellent at its lane and honest about its limits. A few things to know before you lean on it.

The voices are neutral. There is no prosody control beyond which voice you pick. You cannot tell Piper to “read this excitedly.” If you need emotional or character narration, this is the wrong tool, and the Atlas note points to Bark or a cloud option for that.
No voice cloning. Piper cannot recreate your voice or an actor’s from a sample. For that, the note recommends Coqui XTTS-v2 or F5-TTS.
No SSML support. The standard markup for pauses and emphasis is not available, so you work in plain text. The workaround is to chunk text manually and join the WAVs.
Technical terms can trip it. URLs, acronyms, and brand names sometimes get mispronounced. Pre-process or spell them phonetically when it matters.
Voice quality genuinely varies. Some voices are crisp, others noticeably more robotic. Always audition from the samples page before building anything around a voice.
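Two of those workarounds are easy to script: respelling troublesome terms before synthesis, and chunking text so the WAVs can be joined with a pause between them. The sketch below uses only the Python standard library. Every name in it (PRONUNCIATIONS, respell, split_paragraphs, join_wavs) is ours, the respellings are placeholders to tune by ear, and the silence padding assumes all chunks are signed 16-bit PCM from the same voice. Synthesizing each chunk is still the piper command from Step 3.

```python
import re
import wave

# Hypothetical respellings; adjust them by ear for the voice you use.
PRONUNCIATIONS = {
    "SSML": "S S M L",
    "CPU": "C P U",
}


def respell(text: str, table: dict[str, str] = PRONUNCIATIONS) -> str:
    """Swap whole-word terms for spellings that read better aloud."""
    for term, spoken in table.items():
        text = re.sub(rf"\b{re.escape(term)}\b", lambda match: spoken, text)
    return text


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines so each chunk can be synthesized on its own."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def join_wavs(paths: list[str], output_file: str, pause_seconds: float = 0.5) -> None:
    """Concatenate WAVs that share one format, with silence between them."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    gap_frames = int(params.framerate * pause_seconds)
    # Zero bytes are silence for signed 16-bit PCM (an assumption, see above).
    silence = b"\x00" * (gap_frames * params.nchannels * params.sampwidth)
    with wave.open(output_file, "wb") as out:
        out.setparams(params)
        for index, path in enumerate(paths):
            with wave.open(path, "rb") as part:
                fmt = (part.getnchannels(), part.getsampwidth(), part.getframerate())
                if fmt != (params.nchannels, params.sampwidth, params.framerate):
                    raise ValueError(f"{path} has a different audio format")
                if index:
                    out.writeframes(silence)
                out.writeframes(part.readframes(part.getnframes()))
```

The pause length is the one knob you get in place of SSML breaks: a longer pause_seconds between paragraph chunks reads as a section break, a shorter one as a breath.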

Our verdict, in short: Piper is our default for any text-to-speech that does not need a specific voice. Free, fast, light, offline, voices in dozens of languages. It covers the middle 70 percent of voiceover work, the part where you just need clear narration. For voice cloning we would use XTTS-v2 or F5-TTS; for the last mile of emotional nuance we would still pay for ElevenLabs. Everything else is Piper. It helps that the tool comes out of the Rhasspy smart-home project, which means it is actively maintained and battle-tested in real-time use.

Where to go next

The obvious pairing is with a local model. Use Ollama to draft the narration, a summary, or an audio version of an article, then pipe that text straight into Piper. The text generation and the speech generation both stay on your machine, and neither one bills you.
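As a rough sketch of that pairing, the script below asks a local Ollama server for a draft over its HTTP API and pipes the reply into the piper command from Step 3. The assumptions are worth stating loudly: it expects Ollama to be running on its default local port, the function names (build_request, extract_text, narrate) are ours, and any model name you pass, such as llama3.2, must be one you have already pulled.

```python
import json
import subprocess
import urllib.request

# Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def extract_text(body: bytes) -> str:
    """Pull the generated text out of Ollama's JSON reply."""
    return json.loads(body)["response"].strip()


def narrate(prompt: str, llm: str, voice: str, output_file: str) -> str:
    """Draft text with a local model, then speak it with Piper."""
    with urllib.request.urlopen(build_request(llm, prompt)) as reply:
        script = extract_text(reply.read())
    subprocess.run(
        ["piper", "--model", voice, "--output_file", output_file],
        input=script.encode("utf-8"),
        check=True,  # raise if piper exits with an error
    )
    return script
```

narrate returns the drafted script as well as writing the WAV, so you can read what the model wrote before trusting the audio. Models often emit Markdown symbols, so asking for plain prose in the prompt tends to produce cleaner narration.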

If you are mixing Piper voiceover into a video that already has audio, Demucs can help clean and separate the existing track first. The local model setup with Ollama is covered in a separate Playbook.

You now have text-to-speech running on your own CPU, fast enough to feel instant, with no per-character bill attached. Curious about these things. You should be too.

Harness your curiosity.

— Stridenote · № 015

