I lurk and tag stuff.




Soares, Tallinn, and Yudkowsky discuss AGI cognition

Something similar came up in the post:

If it has some sensory dominion over the world, it can probably estimate a pretty high mainline probability of no humans booting up a competing superintelligence in the next day; to the extent that it lacks this surety, or that humans actually are going to boot a competing superintelligence soon, the probability of losing that way would dominate in its calculations over a small fraction of materially lost galaxies, and it would act sooner.

Though rereading it, it's not addressing your exact question.

Good Explanations (Advice)

Removed this from the page itself now that talk pages exist:

[pre-talk-page note] I think this should maybe be merged with Distillation and Pedagogy – Ray

AI Safety Needs Great Engineers

That 80k guide seems aimed at people who don't yet have any software engineering experience. I'm curious what you think the path is from "Average software engineer with 5+ years experience" to the kind of engineer you're looking for, since that's the point I'm starting from.

Cognitive Reframes

I don't quite understand what this tag is supposed to be about from the title, nor does the single example clarify sufficiently.

Maybe delete?

Two Stupid AI Alignment Ideas

Another potential problem with the first scenario: the AI is indifferent about every long-term consequence of its actions, not just how many paperclips it gets long-term. If it finds a plan that creates a small number of paperclips immediately but results in the universe being destroyed tomorrow, it takes it.

"Summarizing Books with Human Feedback" (recursive GPT-3)

In the Romeo and Juliet example, the final summary gets a key fact disastrously wrong: 

Romeo buys poison to kill Juliet at her grave.

(In the original, he buys it to kill himself.)

It looks like a single snippet of the original got blown up until it was 10% of the final summary, and the surrounding context was not sufficient to fix it.

Come, cordial and not poison, go with me 

To Juliet’s grave; for there must I use thee.

Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

Low conversion rate to grabbiness is only needed in the model if you think there are non-grabby aliens nearby. High conversion rate is possible if the great filter is in our past and industrial civilizations are incredibly rare.

What is this tag for? I don't see how it applies to the one post tagged with it.

Feature Selection

The implied machine learning technique here seems to be a model which has been pretrained on the reward signal, and then during a single rollout it's given a many-shot prompt in an interactive way, with the reward signal on the early responses to the prompt used to shape the later responses to the prompt.

This means a single pretrained model can be finetuned on many different tasks with high sample efficiency and no additional gradient descent needed, as long as the task is something within the capabilities of the model to understand.

(I'm assuming that this story represents a single rollout because the AI remembers what happened earlier, and that the pleasure/pain is part of its input rather than part of a training process because it is experienced as a stimulus rather than as mind control.)

Is this realistic for a future AI? Would adding this sort of online finetuning to a GPT improve its performance? I guess you can already do it in a hacky way by adding something like "Human: 'That's not what I meant at all!'" to the prompt.
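The hacky version could be sketched as a prompt-assembly loop: feedback on earlier responses is appended to the growing context before the next generation, so the "finetuning" is all in-context. A minimal sketch, where `generate` stands in for any language-model call (the function names and transcript format here are hypothetical, not from any real API):

```python
def online_finetune(generate, system_prompt, interactions):
    """Shape later responses by appending feedback on earlier ones
    to the prompt -- no gradient descent, just a growing context."""
    prompt = system_prompt
    transcript = []
    for user_input, feedback in interactions:
        prompt += f"\nHuman: {user_input}\nAI:"
        response = generate(prompt)   # stand-in for a real model call
        transcript.append(response)
        prompt += f" {response}"
        if feedback:                  # reward signal delivered as plain text
            prompt += f"\nHuman: {feedback}"
    return prompt, transcript

# Toy stand-in "model" for illustration: echoes the last human line.
def toy_generate(prompt):
    human_lines = [l for l in prompt.splitlines() if l.startswith("Human: ")]
    return human_lines[-1].removeprefix("Human: ")

prompt, transcript = online_finetune(
    toy_generate,
    "You are a helpful assistant.",
    [("Summarize Act V.", "That's not what I meant at all!"),
     ("Summarize Romeo's final scene.", None)],
)
```

The point of the sketch is just that the "reward" enters as an ordinary input token sequence, which matches the stimulus-not-mind-control reading above.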

Hegel vs. GPT-3
