Voice Cloning
Voice cloning is the synthesis of a target speaker's voice from a short reference audio sample, allowing a TTS system to produce new speech in that speaker's timbre, accent, and (to a lesser extent) speaking style. Modern systems from vendors like ElevenLabs and Cartesia can produce a usable clone from seconds to minutes of clean reference audio, rather than the hours required by older speaker-adaptation approaches. The capability raises real consent and authentication concerns: deployments that clone real people's voices need explicit permission from the speaker, and organizations that rely on voice to verify callers (banks, support lines) increasingly cannot treat a familiar voice as proof of identity.
Example
An audiobook publisher records 30 minutes of a narrator reading reference passages, uploads the sample to ElevenLabs to create a custom voice, then generates the remaining 8 hours of the book in that voice without requiring further studio time. The narrator's contract specifies what the cloned voice may and may not be used for after the project ships.
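The contract terms in this example can be enforced in software before any speech is generated. The snippet below is a minimal sketch of such a consent gate, not any vendor's API: `VoiceLicense`, `may_synthesize`, and the use labels such as "audiobook:this-title" are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VoiceLicense:
    """Hypothetical record of what a speaker has agreed to."""
    speaker: str
    consent_on_file: bool
    allowed_uses: frozenset = field(default_factory=frozenset)


def may_synthesize(voice_license: VoiceLicense, requested_use: str) -> bool:
    """Allow synthesis only if consent exists and the use is explicitly listed."""
    return voice_license.consent_on_file and requested_use in voice_license.allowed_uses


# Illustrative license for the narrator in the example above (assumed labels).
narrator = VoiceLicense(
    speaker="narrator",
    consent_on_file=True,
    allowed_uses=frozenset({"audiobook:this-title"}),
)

# A second speaker with no consent on file: every request is refused.
unlicensed = VoiceLicense(speaker="someone-else", consent_on_file=False)
```

Under these assumptions, a request tagged "audiobook:this-title" passes for the narrator, while a request tagged "advertising" is refused because the license does not list it. Anything not explicitly allowed is denied by default.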