Announcement: AI Narrations Available for All New LessWrong Posts

Ruby; Raemon; peter_hartree; TYPE III AUDIO

Awesome! My dyslexic friend may finally get to listen to my writing :)

A few suggestions for improvement:

Auto-start the narration when the speaker icon is clicked
Let the user set a default speed in the user settings, or alternatively, remember which speed the user used last and apply it next time they listen to a post.
Add the speaker button to posts previews in recent discussion
Have a hovering mini-player on the side so you can easily pause, play, rewind, forward, increase or decrease speed from anywhere in the page.
Have a visual indicator on the audio timeline to show you where section headings are are so you can jump to them like you can with the table of contents.

Thanks! We do have feature (2)—we remember whatever playback speed you last set. If you're not seeing this, please let me know what browser you're using.

[-]Yoav Ravid2y20

Oh, great! I didn't check if it exists before writing it down (whoops), so it probably works :)

[-]MondSemmel2y50

Feedback on this specific audio narration, and the feature in general: (I've also submitted this via the Feedback button.)

At 0:44, there's a line "That's the end of that list", which is not in the written text. Maybe there's some logic here which assumes that a colon is followed by a bunch of bullet points? In this case, there was no list (zero bullet points), and so the line "That's the end of that list" makes no sense. And besides, there never was a corresponding preceding line à la "Here's a list of bullet points".
At 0:52, there's a narration line "Here's a list of bullet points. Podcast feeds.". Here, there is a list of bullet points, but the narration line is inserted before the subheading of "Podcast feeds", rather than where it's supposed to be in the audio, namely afterwards.
At 1:19, the narration says "will continue narrating selected curated posts for now.", but the text says "will continue narrating most curated posts for now.". Presumably the text has been edited after the audio was generated. If the audio can get out of sync with the text, that's a conundrum that has to be solved somehow. Generating new audio for every edit is presumably prohibitively expensive. Although this would by no means be sufficient, the audio must at least indicate somehow that it's out of date with the post. But then we're still left with the problem where you might listen to the audio of an essay which has since been edited to say "I've changed my mind; everything I said here is wrong".

[-]Solenoid_Entity2y30

Thanks for the feedback!

The audio reflecting updates to the text is relatively easily fixed, and that feature is in the pipeline (though for now user reports are helpful for this.)

There's some fairly complex logic we use for lists — trying to prevent having too many repetitive audio notes, but also keeping those notes when they're helpful. We're still experimenting with it, so thanks for pointing out those formatting issues!

[-]Henry Prowbell2y*58

I really like the way it handles headlines and bullet point lists!

In an ideal world I'd like the voice to sound less robotic. Something like https://elevenlabs.io/ or https://www.descript.com/overdub. How much I enjoy listening to text-to-speech content depends a lot on how grating I find the voice after long periods of listening.

[-]peter_hartree2y50

Thanks! We're currently using Azure TTS. Our plan is to review every couple months and update to use better voices when they become available on Azure or elsewhere. Elevenlabs is a good candidate but unfortunately they're ~10x more expensive per hour of narration than Azure ($10 vs $1).

[-]Yoav Ravid2y20

I think the cost per million words measure from the previous version of your comment was also useful to know. Did you replace it because it's incorrect?

[-]peter_hartree2y30

I replaced it because it seemed like a less useful format.

Azure TTS cost per million characters = $16
Elevenlabs TTS cost per million characters = $180

1 million characters is roughly 200,000 words.

One hour of audio is roughly 9000 words.

[-]Chris Lakin2y30

Does the narration re-do when posts get edited?

[-]Solenoid_Entity2y10

Currently we can trigger this if someone requests it, and we have a feature in the pipeline to detect significant changes automatically and re-narrate.

[-][anonymous]2y30

This is really great, ty c:
Will it eventually be expanded to earlier posts?

[-]peter_hartree2y30

Yep, if the pilot goes well then I imagine we'll do all the >100 karma posts, or something like that.

We'll add narrations for all >100 karma posts on the EA Forum later this month.

[-]Yoav Ravid2y30

How much would it cost to narrate all the posts on Lesswrong? Or above various karma cutoffs? Cause there's a lot of good posts under 100 karma (including many from the sequences), so I wonder what's the tradeoff.

[-]Solenoid_Entity2y10

It's unlikely we'll ever actually GENERATE narrations for every post on LessWrong (distribution of listening time would be extremely long-tailed), but it's plausible if the service continues that we'll be able to enable the player on all LW posts above a certain Karma threshold, as well as certain important sequences.
If you have specific sequences or posts in mind, feel free to send them to us to be added to our list!

[-]Yoav Ravid2y20

Perhaps instead of, or in addition to, using a karma cutoff, it could be request based? So you'd have that Icon on all posts, and if someone clicks it on an old article that doesn't yet have a narration it will ask them whether they want it to be narrated.

[-]MondSemmel2y20

I forgot to say this in my previous comment, but nowadays I prefer to listen to nonfiction articles (via TTS) rather than reading them. So I listen to a ton of TTS stuff and thus very much appreciate any work that makes the experience of listening to TTS easier or higher quality.

[-]Misaligned-Semi-intelligence2y20

This is really great. As someone with pretty bad uncorrectable and constantly declining vision, a lot of my "reading" is listening. Lately I've often been thinking "Why can't I easily listen to everything I find on the internet yet?". When I tried to just use an existing service to convert things myself, I ran into a lot of the problems that the improvements listed here seem to solve.

[-]MondSemmel2y30

I've also looked into TTS recently, and discovered that the Microsoft Edge browser has decent TTS built into both the web and mobile browsers. It's not perfect by any means, but I found it surprisingly good, especially for a free feature. I guess it's not surprising that Microsoft's offering here is good, given that tons of other TTS services use Microsoft Azure's TTS.

[-]Solenoid_Entity2y20

This is great to hear, and please feel free to contact us with any other features or improvements you'd find helpful :)

[-]Yoav Ravid2y20

It seems to act funny when there's a code block in the post. See GPT-2's positional embedding matrix is a helix for example

[-]peter_hartree2y10

Thanks for the heads up. Each of those code blocks is being treated separately, so the placeholder is repeated several times. We'll release a fix for this next week.

Usually the text inside codeblocks is not suitable for narration. This is a case where ideally we would narrate them. We'll have a think about ways to detect this.

[-]Askwho2y10

If you are looking for any support on a more naturalistic version of those posts that make it to the podcast feed, I have been experimenting heavily and producing an ElevenLabs AI podcast of Yudkowsky's latest fiction. I have an API pipeline already built, and could assist in producing a test episode or two.

[-]Waldvogel2y11

Great new feature. Thank you! I will probably make use of this over the next few weeks.

But I did get a laugh out of "Specialist terminology, acronyms and idioms are handled gracefully" immediately being followed by a mispronunciation of "latex."

[-]Solenoid_Entity2y10

Ha, oops! Yeah, there's a lot of specialist terminology, we find feedback like this really helpful as often we're able to quickly fix this.

[-]PotteryBarn2y10

Maybe somewhat unrelated, but does anyone know if there's been an effort to narrate HP:MoR using AI? I have several friends that I think could really stand to enjoy it, but who can't get past the current audiobook narration. I mostly agree with them, although it's better on 1.5x.

[-]Raphaël2y10

HPMOR is ~4.4 million characters, which would cost around $800–$1000 to narrate with ElevenLabs being conservative.

[-]Solenoid_Entity2y31

You'd probably want to factor in some time for making basic corrections to pronunciation, too.
ElevenLabs is pretty awesome but in my experience can be a little unpredictable with specialist terminology, of which HPMOR has... a lot.
It wouldn't be crazy to do an ElevenLabs version of it with multiple voices etc., but you're looking at significant human time to get that all right.

[+][comment deleted]2y10

LESSWRONG
LW

LESSWRONG
LW

71

Announcement: AI Narrations Available for All New LessWrong Posts

71

71

How to Access

Send us your feedback.

Is this just text-to-speech on posts?