Peak rate of speech is ~220 WPM. A 64-core TPUv4 can decode 1920 context tokens + 64 prompt tokens, then generate 64 response tokens, all within 1.9 s for a 540B-parameter model (roughly 1516 WPM).
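Back-of-envelope check of that figure, assuming the common rough conversion of ~0.75 words per token (the conversion factor is an assumption, not stated in the text):

```python
# Rough sanity check: 64 generated tokens in 1.9 s, at an assumed
# ~0.75 words/token, expressed as words per minute.
tokens = 64
seconds = 1.9
words_per_token = 0.75  # assumed conversion, not from the source

wpm = tokens * words_per_token / seconds * 60
print(round(wpm))  # 1516
```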

I think this will be possible soon:

  1. Wear a directional microphone during a conversation
  2. Filter out noise, transcribe speech into text with Whisper
  3. Predict completion of text based on context "chunks"
  4. Compress essential details into key points
  5. Use key points and context to infer suggested reply
  6. Wait till you reach a certain confidence threshold (or trigger signal)
  7. Play reply (or personalized summary) into earphones at 2-3x speed
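Steps 3-6 above could be sketched as a small control loop: accumulate transcript chunks, compress them into key points, and only surface a suggested reply once confidence crosses a threshold. This is a toy sketch; the model calls are stubs, and the `ReplyAssistant` class, its confidence heuristic, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ReplyAssistant:
    """Toy sketch of steps 3-6: accumulate transcript chunks, compress
    them into key points, and emit a suggested reply once confidence
    crosses a threshold. All model calls are stand-in stubs."""
    threshold: float = 0.8
    chunks: list = field(default_factory=list)

    def add_chunk(self, text: str):
        self.chunks.append(text)                       # step 3: context chunks
        key_points = self._compress(self.chunks)       # step 4 (stub)
        reply, confidence = self._suggest(key_points)  # step 5 (stub)
        if confidence >= self.threshold:               # step 6: threshold gate
            return reply                               # step 7 would TTS this
        return None

    def _compress(self, chunks):
        # Stand-in for an LLM summarizer: keep only the last few chunks.
        return chunks[-3:]

    def _suggest(self, key_points):
        # Stand-in for an LLM: confidence grows with accumulated context.
        confidence = min(1.0, 0.3 * len(self.chunks))
        return f"(reply based on {len(key_points)} key points)", confidence

assistant = ReplyAssistant()
assistant.add_chunk("Hi, quick question about the deadline.")   # conf ~0.3 -> None
assistant.add_chunk("Can we push the launch to Friday?")        # conf ~0.6 -> None
print(assistant.add_chunk("The vendor confirmed Thursday."))    # conf ~0.9 -> reply
```

The interesting design decision is step 6: holding the reply until confidence (or an explicit trigger signal) is reached keeps the system from interrupting with half-formed suggestions mid-sentence.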

Short-term memory lasts ~18 s without rehearsal, so the reply would need to fit into that length. Baseline memory capacity is 4 ± 1 items, so it's easy to lose track of things. Recording and replaying voice notes is possible but is generally only feasible for post-discussion review.
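That 18 s window plus the 2-3x playback speed from step 7 implies a word budget for the reply. Assuming a base TTS rate of ~150 WPM (an assumed figure, not from the text):

```python
# Word budget for a reply that fits the ~18 s rehearsal-free window
# when played back at 2-3x speed. 150 WPM base rate is an assumption.
base_wpm = 150
window_s = 18

for speedup in (2, 3):
    budget = base_wpm * speedup * window_s / 60
    print(f"{speedup}x playback: ~{budget:.0f} words")  # 90 and 135 words
```

So a usable reply is on the order of one to two short sentences, which also keeps it within the 4 ± 1 item capacity limit.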

With low-latency transformer inference, you can organize/predict/compress/synthesize information while talking to someone, which augments human intelligence. The value is not in the transcription that can be saved for later, but in the immediate processing into a simpler representation for live access.

Auditory signals reach central processing systems within 8-10 ms, while visual stimuli can take around 20-40 ms. We'd likely see latency improvements of 1-2 orders of magnitude before widespread adoption, even if that requires distillation into smaller models. Real-time translation has been a thing for a while, but LLM capabilities are now broader.
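To make the "1-2 orders of magnitude" concrete against the 1.9 s inference figure from the opening paragraph:

```python
# Scaling the 1.9 s end-to-end inference time down by 1-2 orders of
# magnitude puts it near the perceptual latencies cited above.
inference_s = 1.9

for factor in (10, 100):
    print(f"{factor}x faster: {inference_s / factor * 1000:.0f} ms")  # 190 ms, 19 ms
```

At 100x, inference lands around 19 ms, comparable to the 20-40 ms visual pathway and within a few multiples of the 8-10 ms auditory one.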

Baddeley's model of working memory splits it into the phonological loop and the visuo-spatial sketchpad; later revisions add the central executive and episodic buffer. Most people talking about VR think about vision, but it seems plausible that wearable tech might disrupt language first.

What does this look like?

Not sure yet. Some possibilities:

  • AirPods become AIPods.
  • Less notetaking in schools.
  • Initial outrage/controversy, concerns around free will, control, atrophy.
  • Eventually considered as benign as using glasses to correct myopia.
  • Reduced social stigma around temporary audio recordings, modulo wiretapping laws.
  • We say "Sorry, wasn't paying attention, could you repeat what you said?" less often.
  • A few people use this to improve their effective IQ/communication skills.
  • Some people parrot what the AI says verbatim, even when it's obviously wrong.
