

Turn text into lifelike speech using deep learning

Amazon Polly is a Text-to-Speech (TTS) service that uses deep learning to synthesize speech that sounds like a human voice. With it, you can create applications that talk and build new categories of speech-enabled products.

With dozens of lifelike voices across a variety of languages, you can select the voice that best fits your audience and build speech-enabled applications that work in many different countries. In addition to Standard TTS voices, Polly offers Neural Text-to-Speech (NTTS) voices, which use a newer machine learning approach to improve speech quality and produce some of the most natural, human-like text-to-speech voices on the market. Neural TTS also supports a Newscaster speaking style tailored to news narration.
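
As a rough illustration of how a Neural voice and the Newscaster style might be requested, the sketch below builds the arguments for a `synthesize_speech` call. The helper name `newscaster_request` is hypothetical, and the default voice "Matthew" is an assumption: the Newscaster style is available only for some Neural voices, so check the current voice list before relying on it.

```python
from xml.sax.saxutils import escape


def newscaster_request(text, voice_id="Matthew"):
    """Build keyword arguments for a Neural, Newscaster-style synthesis request.

    Hypothetical helper: it only assembles the request and does not call AWS.
    """
    # Escape XML special characters so the text is safe inside SSML.
    ssml = (
        '<speak><amazon:domain name="news">'
        + escape(text)
        + "</amazon:domain></speak>"
    )
    return {
        "Engine": "neural",     # select the Neural TTS engine
        "TextType": "ssml",     # the Text field carries SSML, not plain text
        "Text": ssml,
        "OutputFormat": "mp3",
        "VoiceId": voice_id,    # assumption: this voice supports the news style
    }
```

With boto3 installed and AWS credentials configured, the result could be passed on as `boto3.client("polly").synthesize_speech(**newscaster_request("Markets closed higher today."))`.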

Use cases

Content Creation

Audio can be used as a complementary medium to written or visual communication. By voicing your content, you give your audience an alternative way to consume information and meet the needs of a larger pool of readers. Polly can generate speech in dozens of languages, making it easy to add speech to applications with a global audience, such as RSS feeds, websites, or videos.
Example: Convert an article to speech and download as MP3
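
A minimal sketch of the article-to-MP3 flow, assuming the boto3 Polly client and its `synthesize_speech` operation. The function name `article_to_mp3`, the default voice "Joanna", and the output path are illustrative choices, and the text is assumed to be short enough for a single request (long articles would need to be split or sent through an asynchronous synthesis task).

```python
from pathlib import Path


def article_to_mp3(polly_client, text, out_path, voice_id="Joanna"):
    """Synthesize `text` and write the resulting MP3 bytes to `out_path`.

    `polly_client` is expected to behave like boto3.client("polly").
    Returns the number of audio bytes written.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,  # assumption: any voice available in your region
    )
    # AudioStream is a file-like object; read it fully before saving.
    audio = response["AudioStream"].read()
    Path(out_path).write_bytes(audio)
    return len(audio)
```

With boto3 installed and credentials configured, a call might look like `article_to_mp3(boto3.client("polly"), article_text, "article.mp3")`. Passing the client in as an argument keeps the function easy to exercise with a stub.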


Polly enables developers to give their applications an enhanced visual experience, such as speech-synchronized facial animation or karaoke-style word highlighting. Polly makes it easy to request an additional stream of metadata describing when particular sentences, words, and sounds are pronounced. Using this metadata stream alongside the synthesized audio stream, customers can animate avatars and highlight text in their app as it is spoken.
Example: Play speech and highlight spoken text
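
The highlighting idea can be sketched as follows, assuming the boto3 Polly client, where the metadata is requested with `OutputFormat="json"` and `SpeechMarkTypes`. The helper names (`fetch_word_marks`, `word_at`, `highlight`) are hypothetical. The sketch also assumes that the metadata arrives as one JSON object per line, that `time` is in milliseconds from the start of the audio, and that `start`/`end` are byte offsets into the UTF-8 encoded input text. The audio itself would come from a separate request.

```python
import json


def fetch_word_marks(polly_client, text, voice_id="Joanna"):
    """Request word-level timing metadata and return it as a list of dicts."""
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="json",       # metadata instead of audio
        VoiceId=voice_id,
        SpeechMarkTypes=["word"],
    )
    raw = response["AudioStream"].read().decode("utf-8")
    # One JSON object per non-empty line.
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_at(marks, elapsed_ms):
    """Return the latest mark whose time is <= elapsed_ms, or None."""
    current = None
    for mark in marks:  # marks are assumed to be sorted by time
        if mark["time"] > elapsed_ms:
            break
        current = mark
    return current


def highlight(text, mark):
    """Wrap the marked word in brackets, slicing by byte offsets."""
    data = text.encode("utf-8")
    start, end = mark["start"], mark["end"]
    marked = data[:start] + b"[" + data[start:end] + b"]" + data[end:]
    return marked.decode("utf-8")
```

A player would call `word_at` with the current playback position on each tick and re-render the text with `highlight`; a real UI would apply styling rather than brackets.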


With Polly, your contact centers can engage customers with natural-sounding voices. You can cache and replay Polly’s speech output to prompt callers through interactive voice response (IVR) systems, such as Amazon Connect. You can also use Amazon Polly’s API to deliver automated real-time information such as service status, account and billing details, addresses, and contact information.

Example: Text-to-speech for telephony systems
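
The cache-and-replay pattern for IVR prompts might look like the sketch below. It assumes the boto3 Polly client and requests raw PCM at 8 kHz, a common telephony rate. The `PromptCache` class is hypothetical, and its in-memory dictionary stands in for whatever store (disk, object storage) a real system would use.

```python
import hashlib


class PromptCache:
    """Synthesize each distinct prompt once and replay the cached audio."""

    def __init__(self, polly_client, voice_id="Joanna", sample_rate="8000"):
        self._client = polly_client
        self._voice_id = voice_id
        self._sample_rate = sample_rate  # passed to the API as a string
        self._store = {}                 # cache key -> PCM bytes

    def _key(self, text):
        # Voice and sample rate are part of the key because they change the audio.
        material = f"{self._voice_id}|{self._sample_rate}|{text}"
        return hashlib.sha256(material.encode("utf-8")).hexdigest()

    def get(self, text):
        """Return PCM audio for `text`, calling the service only on a cache miss."""
        key = self._key(text)
        if key not in self._store:
            response = self._client.synthesize_speech(
                Text=text,
                OutputFormat="pcm",
                SampleRate=self._sample_rate,
                VoiceId=self._voice_id,
            )
            self._store[key] = response["AudioStream"].read()
        return self._store[key]
```

Static prompts such as menu options benefit most from caching; dynamic content such as an account balance changes per caller and would typically be synthesized on demand.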
