The Next ChatGPT Moment: AI Avatars

southpaw

[-][anonymous]2y82

People used to imagine the internet working like a 3D game, with stores and online avatars for the other customers in the store. This turned out not to be useful: the additional information doesn't help the user.

Mobile apps used to look more like the PC desktop apps they came from; unnecessary elements have gradually been hidden through flat UI design.

While I also kinda imagine an AI collaborating with a human through a little avatar that emotes, jumps around, points to things, and looks distressed when there is no network connection... does this give the user true value?

Or will people find it annoying, so that we instead end up with "flat", where chatbot outputs become terse and are labeled with model confidence or whether a specific claim has been fact-checked?

[-]roha2y30

For collaboration on job-like tasks that assumption might hold. For companionship and playful interactions I think the visual domain, possibly in VR/AR, will be found to be relevant and kept. Given our psychological priors, I also think for many people it may feel like a qualitative change in what kind of entity we are interacting with - from lifeless machine, over uncanny human imitation, to believable personality on another substrate.

[-]kolmplex2y20

Yeah, I also doubt that it will be the primary way of using AI. I'm just saying that AI avatar tech could exist soon and that it will change how the public views AI.

ChatGPT itself is in a bit of a similar situation. It changed the way many people think of AI, even for those who don't find it particularly useful.

[-][anonymous]2y0-2

Absolutely. I kinda imagine Microsoft's Cortana putting her ghostly fingers through foreground apps in Windows, especially native Microsoft apps, to try to help the user out. She would seem to be actually physically helping you and/or actually existing on your computer's desktop.

But it's all vestigial, extra pixel rendering that isn't helping the user accomplish anything. Even the concept of a gender or a voice for the AI is vestigial.

[-]Stuart Johnson2y10

My bet is that conversational agents get buy-in in the early days because of skeuomorphism, but are eventually phased out in favour of more efficient interaction styles.

[-]RogerDearnaley2y10

Within the next 1-3 years, many people will have an interaction with an AI avatar that feels authentically human.

If you go look at digi.ai's website for their plans, they basically want to tick all the boxes on this, in a use case where it matters and will make money, and they have already put out a render of what they want it to look like. So I'd guess closer to 1 than 3 years.

[-]kolmplex2y10

Looks like they are focusing on animated avatars. I expect the realtime photorealistic video to be the main bottleneck, so I agree that removing that requirement will probably speed things up.

[-]RogerDearnaley2y20

Yes, they're going with a cute Pixar-like style (I gather they hired an ex-Pixar animator). Anime would likely also work for something like this. Both of those might reduce the psychological impact a little by adding an air of unreality, though I suspect a sufficiently interactive conversation would still have a good deal of impact.

[-]roha2y10

Empirical data point: In my experience, talking to Inflection's Pi on the phone covers the low-latency integration of "AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech" sufficiently well to pass some bar of "feels authentically human" to me, until you try to test its limits. I imagine that subjective experience is more likely to appear if you don't have background knowledge about LLMs / DL. Its main problems are 1) keeping track of context in a plausibly human-like way (e.g. playing a game of guessing capital cities of European countries leads to repetitive questions about the same few countries even if asked to take care in various ways) and 2) inconsistent refusal to talk about certain things depending on previous text (e.g. retelling dark jokes by real comedians).

I share your expectation that adding photorealistic video generation to it can plausibly lead to another "cultural moment", though it might depend on whether such avatars are adopted as rapidly as ChatGPT was or are phased in more gradually. (I have no overview of the entire space and stumbled over Inflection's product by chance after listening to a random podcast. If there are similar ones out there already I'd love to know.)

edit: Corrected link formatting.

[-]MattJ2y-10

I guess it could be a great tool to help people quickly learn to converse in a foreign language.

Speech-to-text is good enough (OpenAI Whisper), text-to-speech is nearly good enough (ElevenLabs), and conversation / language modeling is good enough (ChatGPT with a Character.ai-style personality). All this currently suffices for realistic audio conversation with an AI. Human video generation isn't quite good enough yet, but it's making progress (Audio to Photoreal, HeyGen, Metahuman). Based on the current rate of progress, a functional AI avatar seems attainable within 1-3 years.

Latency might be a problem in the near-term. In particular, it's unclear how fast the video generation will be.

This is already happening to a limited extent. Many people have formed significant emotional attachments through text-only interactions with relatively weak language models (e.g. Character.ai and Replika).

The shift could be more gradual than ChatGPT’s, though. AI avatar tech is improving gradually whereas ChatGPT was dropped sui generis on the world.

The Google Trends chart for "AI". ChatGPT came out on November 30, 2022.
