For me, the physical act of scanning words takes active focus, whereas its analogue in listening is automatic. (To be clear, I don't think my comprehension or engagement is lower when listening.)

I've tried the naturalreaders.com 'pro' version, but experienced a few issues:

  • It separates the text into small chunks and pauses between each one. It aims to split them by sentence, but sometimes splits at a comma, and occasionally after a lone word (both of which make my intuition assume a new sentence has started).
  • The voice doesn't seem to know what it's reading: it fails to emphasize what should be emphasized, etc. It's nowhere near the quality of Solenoid Entity's readings of the Sequences, for example.

As a result, I think my brain doesn't register this AI-read text as 'something to listen to,' so it takes some active focus to continue listening, and eventually my focus shifts to something else while the audio keeps playing in the background. This does not happen with human-read text.

Anyone who can help me with this might have a high potential impact, since I'd be listening to text for a large portion of my day and am trying to do everything I can to help with alignment.


3 Answers

Garrett Baker

Jun 11, 2023

I like Voice Dream Reader. I don't know how its voice compares to Natural Reader's, but it does emphasize words and pronounce things differently based on context cues, though those cues are things like periods and commas.

I find I stay about as engaged listening to Voice Dream Reader as I do with an audiobook or someone reading aloud, but this could be an effect of having listened to several days' worth of content through it.

Nihal M

Jun 11, 2023

Have you tried https://play.ht/?

anonymous

I'm really liking this so far :) (using "larry - narrative")

Elizabeth

Jun 11, 2023

Just double-checking: did you use the "plus" voices on Natural Reader, and not just "premium"? Plus still has issues, but it's much better than premium.

anonymous

Thanks for the reply. I did use "plus." I also tried the "commercial" preview, and it's a bit better; I may end up settling for it if I can't find a better solution.

2 comments

Do you happen to have some samples handy of the kinds of text you typically read? At least a few pages from a few different sources. Try to pick samples that represent the spectrum of content you read.

I may be able to set you up with an open-source solution using Bark Audio, but it's impossible to know without poking at the Bark model to see whether I can find a spot where it works and starts producing samples that really sound like it understands. (For example, if you use an English Bark voice with a foreign-language text prompt, the English voice won't be able to speak it, or will have a horrific accent, even though the Bark TTS model knows the language. That's because Bark is, in a way, sort of modeling 'person-asked-to-speak-a-language-they-don't-know.' It's a bit like how GPT might behave if you switched languages mid-conversation. Well, pre-RLHF GPT.)

I don't want to make any promises: I have terrible focus, I don't visit this site often, and I'd give it a 50% chance that I forget about this comment entirely until I suddenly remember, three months from now, that I posted it. Also, while the Bark voices are wonderful (they sound like they understand what they are saying), the Bark audio quality (distortion, static) is not. You can stack another model on top to fix it, but that is annoying.

BUT it just so happens that the most recent source of my lack of focus, to some degree, has been poking at TTS stuff just for fun. Pure amateur hour over here. But the new models are so good that they make a lot of things easy. And I just happened to see this post after not visiting this site for weeks.

The best https://play.ht/ voices are maybe comparable, though, if you just want a quick solution. I actually prefer Bark, if you can ignore the audio quality, but it's super unreliable and fiddly.

anonymous

Thanks for the offer!

I'm trying to read through a lot of LW and astral codex posts right now. Here are some samples:
https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
https://astralcodexten.substack.com/p/janus-simulators 
https://www.lesswrong.com/posts/uyBeAN5jPEATMqKkX/lies-told-to-children-1
https://carado.moe/values-complex-not-objective.html

(If you meant audio as well: for example, the Sequences, the LW Curated Podcast, and the Astral Codex Ten podcast all have lots of audio of the associated text.)

I think I'd be able to ignore things like static. I've listened to some decades-old recordings before with no problem.

If you think you'll forget to check this site, we could continue on a platform you use more often. My email is kuiranya (at) proton.me, I could give you my discord (for example) from there. 

I'm looking into https://play.ht/ as well :)