I wonder if giving lower rewards for correctly guessing common tokens, and higher rewards for correctly guessing uncommon tokens, would improve models. I don't think I've seen anyone try this.

Found: https://ar5iv.labs.arxiv.org/html/1902.09191 - "Improving Neural Response Diversity with Frequency-Aware Cross-Entropy Loss".
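
For concreteness, a minimal sketch of the idea in PyTorch: weight each target token's cross-entropy by inverse corpus frequency, so rare tokens earn a larger reward signal. This is my own illustration, not the FACE paper's exact formulation; `token_counts` is an assumed precomputed vector of corpus occurrence counts.

```python
import torch
import torch.nn.functional as F

def inverse_frequency_ce(logits, targets, token_counts, eps=1.0):
    # logits: (N, vocab_size); targets: (N,) token ids.
    # token_counts: (vocab_size,) corpus occurrence counts (assumed input).
    # Inverse-frequency weights, normalized so the average weight stays ~1,
    # which keeps the overall loss scale comparable to plain cross-entropy.
    weights = 1.0 / (token_counts.float() + eps)
    weights = weights / weights.mean()
    # Per-token cross-entropy, then scale each token's loss by the weight
    # of its target token: common tokens count less, rare tokens count more.
    per_token = F.cross_entropy(logits, targets, reduction="none")
    return (per_token * weights[targets]).mean()
```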

I'd also like an EPUB version that is as stripped down as possible. I guess it might be necessary to prepend each character's name to know who is saying what, but I find the rest very distracting; it makes the text hard to read.
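
As a sketch of what I mean by "stripped" (the element and class names here are hypothetical; the real markup of the EPUB chapters will differ):

```python
from bs4 import BeautifulSoup

def strip_chapter(html: str) -> str:
    """Reduce a chapter to character names plus plain paragraph text."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for post in soup.select("div.post"):          # hypothetical post container
        name = post.select_one("div.character")   # hypothetical name element
        if name is not None:
            lines.append(name.get_text(strip=True) + ":")
        for p in post.select("p"):                # keep only the prose itself
            lines.append(p.get_text(strip=True))
        lines.append("")                          # blank line between posts
    return "\n".join(lines)
```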

I was thinking specifically here of maximizing the value function (desires) across the agents interacting with each other. Or, more specifically, adapting the system in a way that it self-maintains the "maximizing the value function (desires) across the agents" property.
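
Roughly, in my own notation (a sketch, not a standard formalization), where R is the system's rules/mechanism and V_i is agent i's value function:

```latex
% Sketch in my own notation, not a standard formalization.
\[
  R^{*} \in \arg\max_{R} \sum_{i} V_i(R)
  \quad \text{(the rules maximize total value across agents)}
\]
\[
  \forall i :\; \operatorname{modify}_i(R^{*}) = R^{*}
  \quad \text{(self-maintenance: no agent's best feasible change to the rules improves its own position)}
\]
```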

An example is an economic system which seeks to maximize total welfare. Current systems, though, don't maintain themselves: more powerful agents take over the control mechanisms (or adjust the market rules) so that they are favoured (lobbying, cheating, ignoring the rules, weakening enforcement). Similar problems occur in other types of coalitions.

Postulating a more powerful agent that enforces this maximization property (an aligned super-AGI) is cheating, unless you can describe how this agent works and how it maintains both itself and this goal.

However, arriving at a system of agents that self-maintains this property with no "super agent" might lead to solutions for AGI alignment, or might prevent the creation of such a misaligned agent.

I read a while ago that the design/theory of corruption-resistant systems is an area that has not received much research.

How do you know they don't generalize? As far as I know, no one has solved these problems for coalitions of agents, whether human, theoretical, or otherwise.

What do you mean by "technical" here? 

I think solving the alignment problem for governments, corporations, and other coalitions would probably help solve the alignment problem for AGI.

I guess you are saying that even if we could solve the above alignment problems, it would still not go all the way to solving it for AGI? What particular gaps are you thinking of?

Yes, the positive reframing step from TEAM (a version of CBT) / Feeling Great by Dr. David Burns is missing from the above, as is the "Magic Dial" step.

A bit odd, as I would have guessed that the above lists were taken directly from Feeling Great, or from his website.

I think you can compare modern chess programs with each other to evaluate this.

Some comparisons have been made between different modern chess engines in TCEC (the Top Chess Engine Championship).

Stockfish is particularly well adapted to using many cores: it has a much larger advantage over other modern engines when many CPU cores are available, as its developers have optimized hash-table contention very well.

If you compare NNUE Stockfish to classical Stockfish, there is also the question of how much strength Stockfish NNUE loses when playing on hardware that does not support SIMD.

Similarly, you can compare Leela Chess Zero with and without a GPU.

On the old vs new:

Old chess engines will have been optimized to use much less memory, whereas modern chess engines use a very large hash table.

I.e. point 3, "software that is better adapted to new, bigger computers", is quite a large component.
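
As a concrete way to run such comparisons yourself, here is a rough sketch using the python-chess library: the same engine binary under "modern" settings (many threads, a big hash table) vs. settings closer to old-hardware constraints. The binary path and option values are assumptions to adjust for your own setup.

```python
import chess
import chess.engine

def play_match(path_a, options_a, path_b, options_b, seconds_per_move=1.0):
    """Play one game between two UCI engine configurations; return the result."""
    engine_a = chess.engine.SimpleEngine.popen_uci(path_a)
    engine_b = chess.engine.SimpleEngine.popen_uci(path_b)
    engine_a.configure(options_a)  # standard UCI options like Threads / Hash
    engine_b.configure(options_b)
    board = chess.Board()
    engines = [engine_a, engine_b]  # engine_a plays White
    try:
        while not board.is_game_over():
            engine = engines[0 if board.turn == chess.WHITE else 1]
            result = engine.play(board, chess.engine.Limit(time=seconds_per_move))
            board.push(result.move)
        return board.result()  # e.g. "1-0", "0-1", "1/2-1/2"
    finally:
        engine_a.quit()
        engine_b.quit()

# Modern settings vs. settings closer to what old hardware allowed
# (hash size in MB; both values are illustrative assumptions).
print(play_match("stockfish", {"Threads": 16, "Hash": 4096},
                 "stockfish", {"Threads": 1, "Hash": 32}))
```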

I am not sure exactly what you mean by predicting. You can tell the donor a different amount than you are internally expecting to obtain.
