Marius Adrian Nicoară

Comments
Alignment Research Field Guide
Marius Adrian Nicoară · 1mo

It's not quite clear to me what a MIRIx chapter is.
The only thing I can connect it to is TEDx: events that are planned and coordinated independently, on a community-by-community basis, under a free license from TED.
Is a MIRIx chapter similar to a TEDx event?

Absolute Zero: Alpha Zero for LLM
Marius Adrian Nicoară · 2mo

"If the human wants coffee, we want the AI to get the human a coffee. We don't want the AI to get itself a coffee." 
It's not clear to me that this is the only possible outcome. It's not a mistake that we humans routinely make. In fact, there is some evidence that if someone asks us to do them a favor, we might end up liking them more and continue doing favors for that person. Granted, there seem to have been no large-scale studies of this so-called Ben Franklin effect. And even if the effect does turn out to be robust, it's not clear to me how it would transfer to an AI. Then there's the issue of making sure the AI won't somehow get rid of this constraint that we imposed on it.

"The problem is that we don't know what we want the AI to do, certainly not with enough precision to turn it into code." 
I agree; that's backed up by the findings from the Moral Machine experiment about what we think autonomous cars should do.

PSA: The LessWrong Feedback Service
Marius Adrian Nicoară · 2mo

It would be great to have automated feedback on the epistemics of a piece of text: an LLM that reads the text and identifies reasoning errors or adds appropriate qualifiers. As a browser plugin, it would also be helpful when reading news articles. Perhaps it could be done using the Constitutional AI methodology, with Rationality: From A-Z (or something similar) as the constitution.
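As a toy illustration of the idea (not the Constitutional AI method itself, which critiques text with an LLM against a written constitution), here is a minimal, purely heuristic sketch that flags sentences making absolute claims without any hedging qualifier. All word lists and the function name are my own invented placeholders:

```python
import re

# Hypothetical hedge words and absolute terms; a real system would
# use an LLM critique prompt instead of keyword lists.
HEDGES = {"may", "might", "could", "perhaps", "possibly", "likely",
          "seems", "appears", "suggests", "i think", "arguably"}
ABSOLUTES = {"always", "never", "certainly", "definitely", "proves",
             "guaranteed", "undoubtedly"}

def flag_unhedged_claims(text: str) -> list[str]:
    """Return sentences that contain an absolute term but no hedge."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    flagged = []
    for s in sentences:
        low = s.lower()
        has_absolute = any(w in low for w in ABSOLUTES)
        has_hedge = any(h in low for h in HEDGES)
        if has_absolute and not has_hedge:
            flagged.append(s.strip())
    return flagged
```

A browser plugin could run something like this (or an LLM call in its place) over an article and highlight the flagged sentences, suggesting qualifiers for each.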

Absolute Zero: Alpha Zero for LLM
Marius Adrian Nicoară · 2mo

I only skimmed the post I'm linking to, but I'm curious whether the method of self-other overlap could help "keep AI meta-ethical evolution grounded to human preferences":

https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine

My own high-level, vaguely defined guess at a method would be something so central to the functioning of the AI that, if the AI goes against it, it can no longer make sense of the world. But that seems to carry the risk of the AI messing everything up as it goes crazy. So the method should also include a way of limiting the AI's capabilities while it's in that confused state.

Absolute Zero: Alpha Zero for LLM
Marius Adrian Nicoară · 2mo

"it's the distinction between learning from human data versus learning from a reward signal." That's an interesting distinction. The difference I currently see between the two is that currently a reward signal can be hacked by the AI, while human data cannot. Is that an accurate thing to say? 

Are there any resources you could recommend for alignment methods that take into account the distinction you mentioned?

Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Marius Adrian Nicoară · 2mo

I think editing should be possible; I'm not sure about deleting it entirely.

Absolute Zero: Alpha Zero for LLM
Marius Adrian Nicoară · 2mo

I think that from an AI alignment perspective, giving AI so much control over its training seems very problematic. What we are mostly left with is controlling the interface the AI has to physical reality, i.e. its sensors and actuators.

For now, it seems to me that AI mostly affects the virtual world. I think the moment when AI can competently and more directly influence physical reality will be a tipping point, because then it can cause many more changes to the world.

I would say that the ability to do continuous learning is required to adapt well to the complexity of physical reality. So a big improvement in continuous learning might be an important next goalpost to watch for.

Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Marius Adrian Nicoară · 2mo

Yes, and there's an interesting question in that post and an interesting answer in the comments there. It would be great to have everything in one place. @Matrice Jacobine and @alapmi, maybe you could try to come to some sort of agreement?

Ethical Deception: Should AI Ever Lie?
Marius Adrian Nicoară · 10mo

This seems like an introduction to the topic: enough to get the curiosity boiling.

AI Will Not Want to Self-Improve
Marius Adrian Nicoară · 1y

Looking at the convergent instrumental goals

  • self-preservation
  • goal preservation
  • resource acquisition
  • self-improvement

I think some are more important than others.

There is the argument that in order to predict the actions of a superintelligent agent, you need to be as intelligent as it is. It would follow that an AI might not be able to predict whether its goal will be preserved by self-improvement.

But I think it can have high confidence that self-improvement will help with self-preservation and resource acquisition, and those gains will be helpful with any new goal it might decide to have. So self-improvement would not seem to be such a bad idea.

Posts

Satire: Sam Altman gets grilled by the Financial Times for his kitchen and his cooking skills + what this might say about him · 2mo
An artistic illustration of Scalable Oversight - "A world apart, neither gods nor mortals" · 3mo
Hyppotherapy · 11mo
Some desirable properties of automated wisdom · 1y
What and how much makes a difference? · 1y