

Run them on examples such as frown-with-red-bar and smile-with-blue-bar.

That sounds like a black-box approach. 

Which problems are you thinking of?

Humans not knowing what goals we want AI to have, and the riggability of the reward learning process, both of which you stated were problems for CIRL in 2020.

This can be extended to arbitrarily many agents. Moreover, the valuable insight here is that cooperation is achieved when the evidence that the group cooperates exceeds each and every member's individual threshold for cooperation. It is a formalism of the intuitive strategy 'I will only cooperate if there are no defectors' (or perhaps 'we will only cooperate if there are no defectors').
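A minimal sketch of that condition, assuming the group's evidence of cooperation can be summarized as a single number and each member has a numeric threshold (both simplifications of mine, and `group_cooperates` is a hypothetical name, not part of the original setup):

```python
from typing import Sequence


def group_cooperates(evidence: float, thresholds: Sequence[float]) -> bool:
    """Return True only if the evidence exceeds every member's threshold.

    `evidence` and `thresholds` are illustrative stand-ins: the original
    formalism may measure evidence quite differently.
    """
    # A single member whose threshold is not met blocks cooperation.
    return all(evidence > t for t in thresholds)


# Evidence of 0.9 clears both thresholds, so the group cooperates.
print(group_cooperates(0.9, [0.5, 0.8]))
# Evidence of 0.7 fails the 0.8 threshold, so nobody cooperates.
print(group_cooperates(0.7, [0.5, 0.8]))
```

Under these assumptions the check is the same as comparing the evidence against the strictest member's threshold, which is why adding more agents does not change the shape of the condition.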

You should include the highlighted insight in your summary. Also, why does your setup not lead to inconsistencies when Abram Demski isn't sure his setup does? Is it just that you don't have ", then   "?

Goodhart problem.


One important feature of ACE is that it can overcome simplicity bias - even quite strong simplicity bias. In the following example, the labelled data consisted of smiling faces with a red bar under them, and frowning faces with a blue bar under them.

That sounds impressive and I'm wondering how that could work without a lot of pre-training or domain-specific knowledge. But how do you know you're actually choosing between smile-frown and red-blue? 

Also, this method seems superficially related to CIRL. How does it avoid the associated problems? 

General relativity for babies is a classic: 

I'm definitely going to vary the pause duration. 

But I'm curious how you practiced this skill. Are you one of those incomprehensible beings that can just set a trigger action plan for something like that? I've always struggled to create TAPs as sophisticated as that, so I'd be curious if you had some other method. Like, going on Omegle (RIP) with a checklist?

I just tried asking GPT-4 to respond to me, placing a question near the end of its response, and to roll a die at the end. If the die shows its maximal value, I need to spend a minute thinking before I respond; otherwise I can just respond normally. And it worked. GPT-4 asked good enough questions, which led to interesting enough places, that I'd recommend other people try this. Though please ask for, like, a three-sided or even two-sided die, as a ten-sided die is too much. 
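If you'd rather not rely on GPT-4 to roll fairly, the rule is easy to run yourself. This is a rough sketch; `pause_seconds` and its defaults are names I made up for illustration, not anything GPT-4 produced:

```python
import random


def pause_seconds(sides: int = 3, pause: int = 60, rng=random) -> int:
    """Roll a `sides`-sided die; return `pause` seconds on the maximal roll.

    Any other roll returns 0, meaning you can respond right away.
    """
    roll = rng.randint(1, sides)  # inclusive on both ends
    return pause if roll == sides else 0


# With a three-sided die you think for a minute about a third of the time.
wait = pause_seconds(sides=3)
print("Think for a minute first." if wait else "Respond normally.")
```

Fewer sides means more frequent pauses, which is the reason to prefer a two- or three-sided die over a ten-sided one.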

I chose 1 minute to make the feedback loops faster, and also because I think I understand why you recommended 5 minutes first and knew I didn't need to start at that value. For people whose minds are resistant to thinking at all on the spot, going blank or wandering off topic, 5 minutes gives them enough time to have a minor flash of insight relevant to whatever they're meant to think of. Then they can start improving that over time until they can think for several minutes if forced to. I've already got that skill, I just haven't made it habitual. So that's why I went with 1 minute. 

But you can also make feedback loops tighter by asking GPT-4 to give short responses, and sticking to short questions yourself so you can simulate answering questions under pressure and resisting that pressure when needed. 

Another way to improve things would be to notice what questions are important yourself, but being able to notice that you can just think about a question at all, and practicing that motion of thinking, is a useful subskill of its own. Then you can practice noticing when a question is worth seriously thinking about. Also, GPT-4 just asked a lot of questions worth thinking about, so I don't think you need to worry about it never asking you questions worth thinking about. 

In fact, you could ask it to just throw you some softball questions and that may well work. 

Right. Are you saying Grok may be impressive because of the sheer amount of resources being funnelled into it?

Isn't that effectively the same thing as using substantially less compute?

I find it very interesting that they managed to beat GPT-3.5 with only 2 months of training! This makes me think xAI might become a major player in AGI development.

Did they do it using substantially less compute as well or something? Because otherwise, I don't see what is so impressive about this. 
