Wouldn’t it be convenient if your readers could engage with your content whilst on their morning jog, making breakfast or driving to work, all without disrupting their routine? That’s the thinking behind Amazon Polly, the text-to-speech engine unveiled in 2016. Sure, text-to-speech has been around for the best part of two decades, but it has struggled to catch on in the mainstream, mainly because synthetic voices have lacked natural expression. With a company the size of Amazon behind it, Polly represents a significant leap forward: it is among the closest we’ve come to a realistic human voice, and it works across multiple languages.

Publishers’ interest in text-to-speech stems from consumers now listening to an average of around 17 hours of audio per week:

Average time spent listening to audio

So while audio still doesn’t compete with text or video, it’s quickly becoming a contender. Its convenience serves as a real asset.

Amazon Polly uses “advanced deep learning technologies to synthesize speech that sounds like a human voice”, which makes consuming journalism as audio all the more appealing. It’s a flexible service, with dozens of realistic voices across a variety of languages and even multiple speaking styles: “a Newscaster reading style that is tailored to news narration use cases and a Conversational speaking style which can be used for many use cases including telephony applications.” This hints at the future possibilities of Polly. That it was developed by a company as large as Amazon suggests a serious degree of support and resource for the application, so it only stands to develop and improve from this point. With that in mind, publishers looking to implement audio on-site or efficiently produce a new podcast may have just found their dream tool.
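For teams curious what this looks like in practice, here is a minimal Python sketch of requesting the Newscaster style through the AWS SDK (boto3). It rests on several assumptions: boto3 is installed, AWS credentials are configured, and the chosen region and voice (Joanna here) support the neural engine and the news style. The function names are hypothetical and only for illustration. Treat it as a starting point, not a production implementation.

```python
from xml.sax.saxutils import escape


def build_news_ssml(text: str) -> str:
    """Wrap plain article text in SSML that requests the Newscaster style."""
    # Escape &, < and > so article text cannot break the SSML markup.
    body = escape(text)
    return f'<speak><amazon:domain name="news">{body}</amazon:domain></speak>'


def build_request(text: str, voice_id: str = "Joanna") -> dict:
    """Assemble the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": build_news_ssml(text),
        "TextType": "ssml",      # tell Polly the input is SSML, not plain text
        "VoiceId": voice_id,     # assumption: this voice supports the news style
        "Engine": "neural",      # speaking styles require the neural engine
        "OutputFormat": "mp3",
    }


def synthesize_to_file(text: str, path: str, voice_id: str = "Joanna") -> None:
    """Send the text to Polly and save the returned audio as an MP3 file."""
    import boto3  # imported here so the helpers above work without AWS set up

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling `synthesize_to_file("Your article text", "article.mp3")` would write the narrated audio to disk. Polly limits how much text a single request can contain, so longer articles would need to be split into sections and synthesized piece by piece.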

The Reuters Institute for the Study of Journalism recently published a report entitled The Future of Voice and the Implications for News, in which it reveals that purchases of voice-activated speakers such as Amazon’s Echo are growing faster than smartphones and tablets did at a similar stage. The use of voice-activated speakers in the US, UK and Germany has doubled in the last year.

Mathematics client The Face already integrates audio with its content. It boasts a bespoke audio player, currently used for specific features which are enhanced by audio, such as a transcription of songwriter and poet Arlo Parks’ ode to London:

The Face audio player

Amazon’s Polly could be an asset to your publication: rather than limiting audio to specific features, the tool could read any article aloud on request. And if you pride yourself on the breadth of your coverage, Polly’s support for multiple languages could be instrumental in expanding your readership.

Despite audio’s explosive growth, the most effective audio monetisation strategy for publishers is still contested. One solution involves “a sponsorship message at the beginning of the listening experience”. Esra Celebi for Purple Publish suggests “a 15 or 30-second ad could be one quick and easy way to generate revenue from audio content”, but in reality most publishers won’t be able to put much sales resource into this area yet. Programmatic audio advertising is here, however, with Google rolling out audio ads to DoubleClick Bid Manager last year.

In all likelihood, much of the text-to-speech innovation built on Amazon’s Polly is still ahead of us, but small-to-medium publishers would do well to start paying attention to the tool now. Audio is a tested and trusted way of consuming content, and consumption is growing as more people discover its convenience. Whether you work for a blog or an internationally renowned newsroom, audio is here to stay and deserves to be taken seriously.