Thanks for writing this.

Overall, I don't like the post much in its current form. There's ~0 evidence (e.g. from Chinese newspapers) and very little actual argumentation. I like that you give us a local view, but adding a few links to back your claims would be very much appreciated. Right now it's hard to update on your post, given that the claims are highly empirical and come without any external sources.

A more minor point: I also disagree with the statement that "A domestic regulation framework for nuclear power is not a strong signal for a willingness to engage in nuclear arms reduction". I think it's definitely a signal.

@beren in this post, we find that our method (Causal Direction Extraction) captures much of the gender difference with 2 dimensions in a linearly separable way. Skimming that post might be of interest to you and your hypothesis.

In the same post, though, we suggest that it's unclear how well the logit lens "works": the direction that best encodes a given concept likely changes by a small angle at each layer, which causes two directions that best encode the same concept 15 layers apart to have a cosine similarity below 0.5.

But what seems plausible to me is that ~all of the information relevant to a feature is encoded in a very small number of directions, which are slightly different at each layer.
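To make the rotation intuition above concrete, here is a minimal sketch (the per-layer angle of 4° is an assumption picked for illustration, not a measured value): if a unit concept direction drifts by a small fixed angle at each layer, adjacent layers stay almost perfectly aligned, yet directions 15 layers apart fall to a cosine similarity of 0.5.

```python
import math

# Assumed drift: the concept direction rotates by 4 degrees per layer.
# This value is hypothetical, chosen so the effect is visible over 15 layers.
ANGLE_PER_LAYER = math.radians(4.0)

def cos_sim_between_layers(layer_a: int, layer_b: int) -> float:
    """Cosine similarity of the (unit) concept direction at two layers,
    under the assumption of a constant rotation per layer."""
    return math.cos(abs(layer_b - layer_a) * ANGLE_PER_LAYER)

print(round(cos_sim_between_layers(0, 1), 3))   # adjacent layers: 0.998
print(round(cos_sim_between_layers(0, 15), 3))  # 15 layers apart: 0.5
```

So each adjacent pair looks nearly identical, while the endpoints disagree substantially, which is consistent with the logit lens working locally but degrading over many layers.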

I'd add that it's not an argument for making models agentic in the wild. It's just an argument for being worried already.

Thanks for writing that up, Charbel & Gabin. Below are some elements I want to add.

Over the last 2 months, I spent more than 20 hours talking with David and engaging with his ideas and plans, especially in technical contexts.
As I spent more time with David, I became extremely impressed by the breadth and depth of his knowledge. David has cached answers to a surprisingly high number of technically detailed questions about his agenda, which suggests that he has pre-computed a lot regarding it (even though it sometimes looks very weird at first sight). I don't think I have ever met anyone as smart as him.

Regarding his ability to devise a high-level plan that works in practice: David has built a technically impressive cryptocurrency (today ranked 22nd) following a similar methodology, i.e. devising the plan from first principles.

Finally, I'm excited that David seems to have a strong ability to build ambitious coalitions with researchers, which is a great upside for governance and for such an ambitious proposal. Indeed, he has a strong track record of convincing researchers to work on his projects after a couple of hours of conversation, because he often has very good ideas about their fields.

These elements, combined with my increasing worry that scaling LLMs at breakneck speed is not far from certain to kill us, make me want to back this proposal heavily and pour a lot of resources into it.

I'll thus personally dedicate, in my own capacity, time and resources to trying to speed that up, in the hope (10-20%) that in a couple of years it could become a credible alternative to scaled LLMs.

I'll focus on 2 first, given that it's the most important. 2. I would expect sim2real not to be too hard for foundation models, because they're trained over massive distributions which allow and force them to generalize to near neighbours. E.g. I think it wouldn't be too hard for an LLM to generalize some knowledge from stories to real life if it had an external memory, for instance. I'm not certain, but I feel like robotics is more sensitive to details than plans are (which is why I'm mentioning a simulation here). Finally, regarding long horizons, I agree that it seems hard, but I worry that at current capability levels you can already build ~any reward model, because LLMs, given enough inference, seem generally very capable at evaluating stuff.

  1. I agree that it's not very likely. But I disagree that "nobody would do that". People would do that if it were useful.

  2. I've asked some ML engineers, and it does happen that you don't look at it for a day. I don't think that deploying it in the real world changes much. Once again, you're also assuming a pretty advanced form of security mindset.

Yes, I definitely think that countries with strong deontologies will try harder to solve some narrow versions of alignment than those that tolerate failures.

I think that's quite reassuring, and it means it's reasonable to focus heavily on the US in our governance approaches.

I think it's misleading to state it that way. There were definitely dinners and discussions with people around the creation of OpenAI.
Months before OpenAI was created, there was a discussion including Chris Olah, Paul Christiano, and Dario Amodei about starting it: "Sam Altman sets up a dinner in Menlo Park, California to talk about starting an organization to do AI research. Attendees include Greg Brockman, Dario Amodei, Chris Olah, Paul Christiano, Ilya Sutskever, and Elon Musk."

Also, I think it's fine to have a lower chance of becoming an excellent alignment researcher for that reason. What matters is having impact, not being an excellent alignment researcher. E.g. I don't go all-in on a technical career myself essentially for that reason, combined with the fact that I have other features that might allow me to go further into the impact tail in other relevant subareas.

If I try to think about someone's IQ (which I don't normally do, except for the sake of the message above, where I tried to pick a specific number to make my claim precise), I feel like I can form an ordering I'm not too uncertain about, on a scale that includes me, some common reference classes (e.g. the median student of school X has IQ Y), and a few people around me who have taken IQ tests. By the way, I'd be happy to bet on anyone (e.g. from the list of SERI MATS mentors) who agreed to reveal their IQ, if you think my claim is wrong.

Thanks for writing that. 

Three thoughts that come to mind: 

  • I feel like a more accurate claim is something like "beyond a certain IQ, we don't know what makes a good alignment researcher". I think this is a substantially weaker claim than the one underlying your post. I also think the fact that the probability of being a good alignment researcher increases with IQ is relevant if true (and I think it's very likely to be true, as in most sciences, where Nobel laureates are usually outliers along that axis).
  • I also expect predictors from other research fields to roughly apply (e.g. conscientiousness).
  • In this post you don't cover what seems to me the most important consideration behind advice of the form "given features X and Y, you're more likely to be able to fruitfully contribute to Z" (which seems adjacent to the claims you're criticizing): the person's opportunity cost.