Text-to-Speech Voice AI: Meet ElevenLabs

Use its app to convert any book into an audio book; also use its tools for voice cloning, dubbing and speech-to-text.

May 16, 2025

This is a subscriber-only edition in the ‘Meet AI’ section of the newsletter. I launched it with the goal of introducing you each week to a new AI company and its AI tools that you could use for work, side hustles or play!

These articles are not meant to be reviews of the company or tool. My goal is to reduce the ‘overwhelm factor’ related to new AI tools and make it easier if you want to switch to using something that saves you time and increases productivity.

Consider upgrading to a paid subscription to access the entire article. Here is an example of a free one.

Meet AI posts like this one are not sponsored posts. In some cases, we might earn an affiliate commission at no extra cost to you if you click on links and make a purchase.

There are two sections to this article.
Part 1 covers the founding and evolution of the company based on research from publicly available sources.
Part 2 is focused on the company’s AI tools and ‘how it works’ so that you can have enough information to get started using them.
p.s. if you are in a rush to adopt an AI tool for your assignment, feel free to scroll and skip straight to Part 2. But do come back to the story when you have time. It is worth a read.

🔥TL;DR

What it does-

Provides AI-powered tools for converting text-to-speech or speech-to-text, voice cloning and dubbing. Generate human-like, natural-sounding AI audio.

Whom it’s for-

Individual content creators such as podcasters, animators, others.
Businesses that want to increase reach, accessibility and publish content in multiple languages using realistic-sounding voices. Examples include audiobook publishers, corporate videos, gaming, education, customer support and other content.

Selected common use cases for consumers -

Use their ElevenReader app to instantly convert any book into an audio book and any pdf or newsletter (such as this one) into an audio newsletter by uploading or linking it, selecting the best sounding voice from its library and listening instead of reading. Currently, this tool is free to use!
Transcribing anything - literally anything, whether a taped conversation, interview, podcast or those voice notes you took in class because you didn’t feel like writing.
Dubbing YouTube or other video content into multiple languages so that you can widen your audience as a content creator.

Price-

Free plan and paid plans ($5 - $1,320 per month when billed monthly plus custom-priced enterprise plans; as of May 2025).

👋Meet ElevenLabs

ElevenLabs is all over tech publications these days! For multiple reasons such as -

Becoming a unicorn; i.e. a startup with a valuation >$1 billion (currently at >$3 billion). Unlike most other unicorns, ElevenLabs reached this milestone in just 2 years. No Terrible Twos for this one!
Innovating fast with new voice-related features and speech-generating tech.
Partnerships with companies like Spotify which started accepting audiobooks that use ElevenLabs’ digital voice narration technology or German media company Bertelsmann which will use ElevenLabs’ audio tools to support production for its various units.
Licensing the voices of well-known celebrities such as Deepak Chopra for use in its tools.
Being used by famous podcasters such as Lex Fridman to translate interviews with leaders like Indian Prime Minister Modi and others into multiple languages. Or Andrew Huberman to dub his monologue in real time in different languages.
Click on the image below to see the video in action - it’s cool!

Origin Story in Brief

Like many AI startups, ElevenLabs is a very young company. It was started in 2022 by former Google employee Piotr Dabkowski and former Palantir employee Mati Staniszewski.

As the story goes, they met as children growing up in Poland. At the time, movie dubbing in Poland for foreign films contributed to a poor viewing experience. (As someone who also grew up in a foreign country and had to suffer through terrible dubbing for Hollywood movies, I can relate to it).

Fast forward to adulthood: the founders decided to take that problem on, with the goal of improving voice synthesis, making synthetic speech sound more natural and thus increasing the accessibility of all content.

In its introduction on its website, the company describes itself as an ‘AI audio research and development company’. With a significant portion of key functions such as product development and research based mostly in Europe and sales and partnership offices in the United States and other parts of the world, the company has a global footprint. In fact, its products have a lot of demand from markets whose residents use multiple languages such as those in Asia.

Finding success early on

ElevenLabs launched a prototype of its tool in Jan 2023. In just a few months, it had already grown to over 1 million users. In addition to individual consumers, business customers like The Washington Post and others started using its tools to generate audio versions of their newsletters and content. Although the list of enterprise (business) customers is growing, a significant part of its revenues comes from non-corporate consumers.

While exact numbers are unavailable, it is estimated that the company’s employee headcount now stands at over 250. Revenue is also hard to confirm since it is a privately held company, but its annual recurring revenue (ARR) is believed to be $80 million or more.

Startups that achieve stardom this quickly, in under 3 years in the case of ElevenLabs, do so because their product serves a real and urgent need in the market and serves it with quality.

ElevenLabs’ tool replicates human voices in multiple languages with high accuracy, which is what makes its product so sought-after. The company reports accuracy of over 96% for many languages, which corresponds to a very low word error rate (the share of words a model gets wrong). Its tools are also known to generate ‘context-aware speech’ and detect non-verbal sound effects.

What does the future look like for ElevenLabs?

Crunchbase, which tracks all types of startup activity, predicts that ElevenLabs will continue on its growth trajectory.

At this point in time, that prediction seems sound. There is certainly competition including that from Google and OpenAI’s voice tools. So far, per media reports, they haven’t been able to beat the quality of ElevenLabs’ voice cloning capabilities. However, there are some startups that are creating recent buzz. One of these is Dia, a text-to-speech model that seems to have more advanced controls and features.

There are some more ifs and buts here. If ElevenLabs continues to acquire corporate clients, which are known to be more sticky than consumer clients, it can signal more reliable revenues in the coming years. If it innovates at the same rate as in the past, it can attract more customers from all over the world. So far, with each new funding round, the company develops and tests new capabilities and features.

For now, the future is looking bright!

Let’s check out their AI tools in the next Section.

Part 2: Audio AI Tools

In this section, I will share more about its features and how it works so that you can have enough information to get started.

This is not meant to be a review of ElevenLabs per se. My goal is to reduce the ‘overwhelm factor’ related to new AI tools and hopefully make it easier if you want to switch to using something that saves you time and increases productivity.

Visit the ElevenLabs website and you can see six different tools that you can use depending on your context.

Text-to-Speech - converts text to speech
Speech-to-Text - converts speech to text transcript
Conversational AI - build AI Agents which can ‘converse’ and communicate with your clients or customers
Dubbing - as in movies, YouTube videos and other content in different voices and languages
Voice cloning - clone your own voice
ElevenReader - a way to listen to any content such as books, articles, etc.

While this overview on the home page gives you an idea of its product capabilities, you need to ‘go to app’ to start actually using them.

Testing out the tools

I created a free account and started testing out some of the tools. There are many options and customizations. It is not possible to cover all of them with limited time and space. But here’s some information below.

Text-to-Speech

I copy-pasted a paragraph from this article - see image. Before you generate speech from this text, you can select the voice (they have a large library of voices that you can filter by gender, age and even accent), the speed of the speech and other voice elements. You just need to play around with these to figure out the balance that works for you.

When you hit the ‘Generate’ speech button, the selected voice will read out the text at the speed and style you picked. You can download the final version as an mp3 file.

When you select a voice, you can also choose whether you’d like it to sound like a narration, conversation, confident and so on.

In fact, the ElevenLabs voice library has become so popular that there are subreddits with interesting comments such as these below.

Speech-to-Text

As the term suggests, this tool, called Scribe, helps you transcribe any speech, be it interviews or any other conversation or monologue. ElevenLabs claims that it is the most accurate model that exists and can transcribe speech in 99 languages. A researcher at ElevenLabs tweeted this about Scribe.

Studio

The Studio section in the workspace is where some of the magic happens. You can create an audio book by uploading content or creating it from scratch by chapter. You can also generate a podcast by choosing the host and guest voices, length of podcast, topic (or use your own content).

While I do prefer to create original content with live guests for my own podcast, I can see how someone looking to create podcasts for marketing purposes might be able to use this tool.

Dubbing

On that topic of podcasting, at some point in the future, I can see myself translating my podcast audio and video into different languages. As you see in the images below, ElevenLabs makes it as simple as importing your video (you don’t even need to upload it; you need only paste a link), selecting the source language and target language, a few more customizations and that’s it!

Conversational AI

This tool sticks out a bit from the rest because it caters more to businesses that want to build AI Agents. If you want to know more about Agents in non-technical language, check out our article on this topic.

To create an AI Agent that can answer customer questions for example, you would need to complete a few steps to get it up and running. ElevenLabs helps you deploy it but that is only after you have uploaded the content that you would like the Agent to use (or pointed to a knowledge base via website links), selected its persona and other such checklist items. p.s. if this is of interest, message me and I can help you navigate this section and create an Agent.

ElevenLabs’ voice chat function for customer support has been developed using its own Conversational AI tool…naturally!

I was pretty impressed with how human-like this voice agent sounded. I clicked the voice chat help button shown above. Instead of typing in what I was looking for, I had an actual conversation with the Conversational AI Agent which was able to resolve my issue. You can also change the language in which you want to communicate with the Agent.

ElevenReader

Most ElevenLabs AI tools are browser-based except for ElevenReader, which is available as an app on mobile phones. It is somewhat of a standalone product too. You can download it on your phone and use it to read books, newsletters or pretty much any other document. It’s basically a way to instantly convert any book into an audio book or any article into an audio article!

As with the other ElevenLabs tools, you can choose your preferred voice from its library and language. The company has also licensed the voices of celebrities like Deepak Chopra and the voices of some dead celebrities (licensed through their estates) like Judy Garland, which has created some ethics-related controversy.

The company also launched ElevenReader Publishing, a platform for books rights holders to convert their text-based books to AI-generated audiobooks and offer them through the ElevenReader app. For now, these audiobooks are available to users for free although the company plans to monetize it at a later date.

There are additional interesting features such as being able to add voiceovers and generate a variety of sound effects, to name a couple. The best way to find them is to get onto their platform and start playing with the various tools and features. Apropos of this, they refer to a part of their workspace as ‘Playground’. I found that amusing and clever because that is exactly how one might get started using their app.

For several how-to videos describing different capabilities, check out their YouTube channel.

APIs for those technical customers!

An API, or Application Programming Interface, in non-technical terms, is a defined set of rules that lets two pieces of software communicate and share information, usually across two businesses.

APIs come in handy when the volume of interaction between the entities is high and/or frequent. They are also useful for businesses that want to use a particular product, such as ElevenLabs’ tools, to create new applications without having to build their own. ElevenLabs makes APIs available to developers for a price.
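For the technically curious, here is a minimal Python sketch of what preparing a text-to-speech request might look like. The endpoint path and the `xi-api-key` header reflect my understanding of ElevenLabs’ public API documentation, so treat them as assumptions and verify them against the current docs. The `build_tts_request` helper, the voice ID and the API key are placeholders of my own. The sketch only builds the request; it does not send anything.

```python
import json
import urllib.request

# Assumed base URL; confirm it in the current ElevenLabs API docs.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a text-to-speech request.

    Hypothetical helper for illustration; not part of any ElevenLabs SDK.
    """
    # The text to be spoken travels in a JSON body.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={
            "xi-api-key": api_key,  # assumed name of the authentication header
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello from my newsletter!", "YOUR_VOICE_ID", "YOUR_API_KEY")
    # Show where the request would go, without making a network call.
    print(req.get_method(), req.full_url)
```

With a real key and voice ID, passing the request to `urllib.request.urlopen` would be the step that actually contacts the service, and each call would draw on the credits in your plan.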

So, what does it cost?

On that note of price, there are different pricing tiers available to customers. Like many other AI-powered tools (e.g. Gamma), prices are based on usage. Each plan has a fixed price for a certain number of credits. Paid plans range from $5 a month when billed monthly to $1,320 a month.

Some features are available only with higher priced plans, as is also the norm with tech products these days. The Creator plan is the minimum plan to start accessing some tools such as voice cloning and higher quality audio. If you are a company, you can also avail of a customized enterprise plan with custom pricing.

How are credits charged?

Each plan, whether free or paid, comes with a set amount of credits that you can use per month - kinda like a debit card! Each task you complete incurs credits which get deducted from your account.

The number of credits used depends on the amount of processing. For example, for text-to-speech, it depends on the number of characters; conversational AI is charged based on minutes of conversation and dubbing is based on the length of audio.

Once you reach the credit limit for your plan, you can either upgrade or purchase additional characters or minutes. These add-on prices vary by plan.
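If you like to see the math, here is a rough Python sketch of how credit usage could be estimated. The rates are made-up numbers for illustration only and are not ElevenLabs’ actual prices; the function names are my own. Check the pricing page for the real figures on your plan.

```python
# Hypothetical rates for illustration only; NOT ElevenLabs' real pricing.
CREDITS_PER_CHARACTER = 1          # text-to-speech: depends on number of characters
CREDITS_PER_CONVO_MINUTE = 1000    # conversational AI: depends on minutes of conversation
CREDITS_PER_DUBBING_MINUTE = 2000  # dubbing: depends on length of audio


def credits_used(tts_characters: int = 0, convo_minutes: int = 0, dubbing_minutes: int = 0) -> int:
    """Add up the credits consumed across the three kinds of tasks."""
    return (
        tts_characters * CREDITS_PER_CHARACTER
        + convo_minutes * CREDITS_PER_CONVO_MINUTE
        + dubbing_minutes * CREDITS_PER_DUBBING_MINUTE
    )


def credits_remaining(plan_credits: int, used: int) -> int:
    """Credits left this month; never drops below zero."""
    return max(0, plan_credits - used)


def needs_top_up(plan_credits: int, used: int) -> bool:
    """True once usage reaches the plan's monthly limit."""
    return used >= plan_credits


if __name__ == "__main__":
    # Example month: a 5,000-character script plus 10 minutes of dubbing
    # on a hypothetical plan with 30,000 credits.
    used = credits_used(tts_characters=5000, dubbing_minutes=10)
    print("used:", used)
    print("remaining:", credits_remaining(30000, used))
    print("top up needed:", needs_top_up(30000, used))
```

The point of the sketch is simply that different tools draw down the same pool of credits at different rates, so a few minutes of dubbing can cost as much as many pages of text-to-speech.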

Why use this tool?

Here are some reasons why I would use ElevenLabs’ voice AI tools.

High quality audio

The speech and sound quality of any ElevenLabs product is superior to others that I have tried; whether audio books or the conversational AI Agent. But don’t take my word for it. Get on their web platform and try one of the samples for yourself.

Creative freedom

As someone who hosts podcasts, I spend a considerable amount of time editing audio and sound. My podcast episodes are recorded in English. Previously, even the thought of getting them translated into the other two languages that I grew up with would have seemed costly, cumbersome and time consuming. With ElevenLabs’ dubbing tool, all it would take is clicking a few buttons on the web app.
The same applies to transcripts. The tools that I currently use for transcribing my podcast episodes do a decent job but I still need to spend time correcting errors. A quick test of ElevenLabs’ Scribe tool showed higher accuracy in converting speech to text.

Ease of use

The tools are simple to use. You can easily find what you need to take the next step. This reduces the cognitive burden of searching for the right buttons to push. That being said, unless you are already familiar with audio features, effects and terminology, there can be a bit of a learning curve. But the app does a good job of explaining what the terms mean in layman’s terms.

Affordability

The price points are affordable for infrequent use but could add up if you were to use the tools often. The tradeoff is that given how much you can broaden your potential audience with just a few clicks by making your content more accessible as a creator or a business, it can be worth it.

A lot of flexibility

If you are new to audio editing and manipulation, it can take some trial and error to get the output you desire - pretty similar to the first time you tried to maximize the beauty of your scenery picture on Instagram. But, ElevenLabs’ platform gives you a lot of flexibility to create the audio or voice experience you want.

A final note on tools misuse

There are many dangers to providing tools that can easily generate speech that sounds exactly like a human. ElevenLabs is familiar with these dangers as there have been many cases of its technology being misused to mislead people; for example faking celebrity voices and spreading hate speech. Unfortunately, the possibilities for nefarious activities are significant.

To combat this, the company launched detection tools that allow you to check whether an audio clip was generated using their technology, in addition to other methods to improve the traceability, moderation and safety of content. You can read more about their safety efforts here.

Thank you for reading, learning and rising!

Where Data meets Business