Awesome! My dyslexic friend may finally get to listen to my writing :)
A few suggestions for improvement:
Some feedback on this specific audio narration, and on the feature in general (I've also submitted this via the Feedback button):
Thanks for the feedback!
Having the audio reflect updates to the text is relatively easy to fix, and that feature is in the pipeline (though for now, user reports are helpful).
There's some fairly complex logic we use for lists — trying to prevent having too many repetitive audio notes, but also keeping those notes when they're helpful. We're still experimenting with it, so thanks for pointing out those formatting issues!
I really like the way it handles headlines and bullet point lists!
In an ideal world I'd like the voice to sound less robotic. Something like https://elevenlabs.io/ or https://www.descript.com/overdub. How much I enjoy listening to text-to-speech content depends a lot on how grating I find the voice after long periods of listening.
Thanks! We're currently using Azure TTS. Our plan is to review every couple of months and switch to better voices as they become available on Azure or elsewhere. ElevenLabs is a good candidate, but unfortunately they're ~10x more expensive per hour of narration than Azure ($10 vs. $1).
I think the cost per million words measure from the previous version of your comment was also useful to know. Did you replace it because it's incorrect?
I replaced it because it seemed like a less useful format.
1 million characters is roughly 200,000 words.
One hour of audio is roughly 9000 words.
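To connect the two measures discussed above, here is a rough sketch (a back-of-the-envelope calculation using only the figures quoted in this thread, not official pricing) that converts the $/hour rates into the earlier cost-per-million-characters format:

```python
# Convert a narration rate in $/hour into $/million characters,
# using the rough ratios quoted in this thread.
WORDS_PER_MILLION_CHARS = 200_000  # "1 million characters is roughly 200,000 words"
WORDS_PER_HOUR = 9_000             # "one hour of audio is roughly 9000 words"

# ~22.2 hours of audio per million characters
hours_per_million_chars = WORDS_PER_MILLION_CHARS / WORDS_PER_HOUR

def cost_per_million_chars(cost_per_hour: float) -> float:
    """Convert a $/hour narration rate into $/million characters."""
    return cost_per_hour * hours_per_million_chars

print(round(cost_per_million_chars(1), 2))   # Azure at ~$1/hour → ~$22/M chars
print(round(cost_per_million_chars(10), 2))  # ElevenLabs at ~$10/hour → ~$222/M chars
```

So the two formats are interchangeable given those ratios: ~$1/hour works out to roughly $22 per million characters, and ~$10/hour to roughly $222.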
Yep, if the pilot goes well then I imagine we'll do all the >100 karma posts, or something like that.
We'll add narrations for all >100 karma posts on the EA Forum later this month.
How much would it cost to narrate all the posts on LessWrong? Or those above various karma cutoffs? Because there are a lot of good posts under 100 karma (including many from the Sequences), I wonder what the tradeoff is.
It's unlikely we'll ever actually GENERATE narrations for every post on LessWrong (the distribution of listening time would be extremely long-tailed), but it's plausible that, if the service continues, we'll be able to enable the player on all LW posts above a certain karma threshold, as well as certain important sequences.
If you have specific sequences or posts in mind, feel free to send them to us to be added to our list!
Perhaps instead of, or in addition to, a karma cutoff, it could be request-based? You'd have the icon on all posts, and if someone clicks it on an old article that doesn't yet have a narration, it would ask whether they want it narrated.
I forgot to say this in my previous comment, but nowadays I prefer to listen to nonfiction articles (via TTS) rather than reading them. So I listen to a ton of TTS stuff and thus very much appreciate any work that makes the experience of listening to TTS easier or higher quality.
This is really great. As someone with pretty bad uncorrectable and constantly declining vision, a lot of my "reading" is listening. Lately I've often been thinking "Why can't I easily listen to everything I find on the internet yet?". When I tried to just use an existing service to convert things myself, I ran into a lot of the problems that the improvements listed here seem to solve.
I've also looked into TTS recently, and discovered that the Microsoft Edge browser has decent TTS built into both the web and mobile browsers. It's not perfect by any means, but I found it surprisingly good, especially for a free feature. I guess it's not surprising that Microsoft's offering here is good, given that tons of other TTS services use Microsoft Azure's TTS.
This is great to hear, and please feel free to contact us with any other features or improvements you'd find helpful :)
It seems to act funny when there's a code block in the post. See "GPT-2's positional embedding matrix is a helix" for an example.
Thanks for the heads up. Each of those code blocks is being treated separately, so the placeholder is repeated several times. We'll release a fix for this next week.
Usually the text inside code blocks is not suitable for narration, but this is a case where ideally we would narrate them. We'll have a think about ways to detect this.
If you're looking for support producing more naturalistic versions of the posts that make it to the podcast feed, I've been experimenting heavily and producing an ElevenLabs AI podcast of Yudkowsky's latest fiction. I have an API pipeline already built and could assist in producing a test episode or two.
Great new feature. Thank you! I will probably make use of this over the next few weeks.
But I did get a laugh out of "Specialist terminology, acronyms and idioms are handled gracefully" immediately being followed by a mispronunciation of "latex."
Ha, oops! Yeah, there's a lot of specialist terminology; feedback like this is really helpful, as we're often able to fix these cases quickly.
Maybe somewhat unrelated, but does anyone know if there's been an effort to narrate HP:MoR using AI? I have several friends that I think could really stand to enjoy it, but who can't get past the current audiobook narration. I mostly agree with them, although it's better on 1.5x.
HPMOR is ~4.4 million characters, which would cost around $800–$1,000 to narrate with ElevenLabs, on a conservative estimate.
You'd probably want to factor in some time for making basic corrections to pronunciation, too.
ElevenLabs is pretty awesome but in my experience can be a little unpredictable with specialist terminology, of which HPMOR has... a lot.
It wouldn't be crazy to do an ElevenLabs version of it with multiple voices etc., but you're looking at significant human time to get that all right.
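The $800–$1,000 figure above is consistent with the ratios quoted earlier in the thread. A quick sanity check (using only numbers from these comments, not official ElevenLabs pricing):

```python
# Sanity-check the HPMOR estimate from the thread's own ratios:
# ~4.4M characters, ~200,000 words per 1M characters,
# ~9,000 words per audio-hour, and ~$10/hour for ElevenLabs.
HPMOR_CHARS = 4_400_000
WORDS_PER_CHAR = 200_000 / 1_000_000
WORDS_PER_HOUR = 9_000
ELEVENLABS_COST_PER_HOUR = 10  # rough figure quoted earlier in the thread

words = HPMOR_CHARS * WORDS_PER_CHAR     # ~880,000 words
hours = words / WORDS_PER_HOUR           # ~98 hours of audio
cost = hours * ELEVENLABS_COST_PER_HOUR  # ~$978

print(f"{hours:.0f} hours, about ${cost:.0f}")
```

That lands at roughly 98 hours of audio and ~$978, squarely inside the $800–$1,000 range, before accounting for any human time spent on pronunciation corrections and multi-voice production.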
TYPE III AUDIO is running an experiment with the LessWrong team: for the next few weeks, all new LessWrong posts will be available as AI narrations.
You might have noticed the same feature recently on the EA Forum, where it is now an ongoing feature. Users there have provided excellent feedback and suggestions so far, and your feedback on this pilot will allow further improvements.
How to Access
On Post Pages
Click the speaker icon to listen to the AI narration:
Podcast Feeds
Perrin Walker (AKA Solenoid Entity) of TYPE III AUDIO will continue narrating most curated posts for now.
Send us your feedback.
Please send us your feedback! This is an experiment, and the software is improved and updated daily based on user feedback.
You could share what you find most useful; what's annoying, buggy, or difficult to understand; how this compares to human narration; and what additional features you'd like to see.
Is this just text-to-speech on posts?
It's an improvement on that.
We spoke with the Nonlinear Library team about their listeners' most-requested upgrades, and we hope our AI narrations will be clearer and more engaging than unimproved TTS. Some specific improvements:
We'd like to thank Kat Woods and the team at Nonlinear Library for their work, and for giving us helpful advice on this project.