Comments

I'm glad to hear you got exposure to the Alignment field in SERI MATS! I still think that your writing reads as though your ideas misunderstand core alignment problems, so my best feedback is to share drafts of and discuss your ideas with others familiar with the field. My guess is that it would be preferable for you to find people who are critical of your ideas and try to understand why, since it seems like they are representative of the kinds of people who are downvoting your posts.

Answer by starship006, Nov 30, 2023

(preface: writing and communicating is hard, and i'm glad you are trying to improve)

i sampled two:

this post was hard to follow, and didn't seem to be very serious. it also reads as unfamiliar with the basics of the AI Alignment problem (the proposed changes to gpt-4 don't concretely address many/any of the core Alignment concerns, for reasons addressed by other commenters)

this post makes multiple (self-proclaimed controversial) claims that seem wrong or are not obvious, but doesn't try to justify them in-depth.

overall, i'm getting the impression that 1) your ideas are wrong and you haven't thought about them enough and/or 2) you aren't communicating them well enough. i think the former is more likely, but it could also be some combination of both. i think this means that:

  1. you should try to become more familiar with the alignment field, and common themes surrounding proposed alignment solutions and their pitfalls
  2. you should consider spending more time fleshing out your writing and/or getting more feedback (whether it be by talking to someone about your ideas, or sending out a draft idea for feedback)

Reverse engineering. Unclear if this is being pushed much anymore. 2022: Anthropic circuits, Interpretability In The Wild, Grokking mod arithmetic

 

FWIW, I was one of Neel's MATS 4.1 scholars, and I would classify 3/4 of Neel's scholars' outputs as reverse engineering some component of LLMs (for completeness, this is the other one, which doesn't nicely fit as 'reverse engineering' imo). I would also say that this is still an active direction of research (lots of ground to cover with MLP neurons, polysemantic heads, and more).

Quick feedback since nobody else has commented - I'm all for AI Safety appearing "not just a bunch of crazy lunatics, but an actually sensible, open and welcoming community."

But the spirit behind this post feels like it is just throwing in the towel, and I very much disapprove of that. I think this is why I and others downvoted, too.

Ehh... feels like your base rate of 10% for LW users who are willing to pay for a subscription is too high, especially seeing how the 'free' version would still offer everything I (and presumably others) care about. Generalizing to other platforms, this feels closest to Twitter's situation with Twitter Blue, whose rates appear to be far, far lower: if we're generous and say they have one million subscribers, then out of the 41.5 million monetizable daily active users they currently have, this would suggest a base rate of less than 3%.
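
For anyone who wants to check that arithmetic, here is a minimal sketch using only the two figures from the comment. The one-million subscriber count is the comment's deliberately generous assumption, not a reported number, and the variable names are just illustrative.

```python
# Figures come from the comment above. The subscriber count is a
# deliberately generous assumption, not a reported statistic.
subscribers = 1_000_000
monetizable_daily_active_users = 41_500_000

# Share of monetizable daily active users who would be paying subscribers.
base_rate = subscribers / monetizable_daily_active_users

print(f"Implied subscription base rate: {base_rate:.1%}")
```

Even under that generous assumption the implied rate stays under 3%, well below the 10% figure being questioned.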

Thanks for the writeup! 

Small nitpick: typo in "this indeed does not seem like an attitude that leads to go outcomes"

I'm not sure if you've seen it or not, but here's a relevant clip where he mentions that they aren't training GPT-5. I don't quite know how to update from it. It doesn't seem likely that they paused from a desire to conduct more safety work, but I would also be surprised if somehow they are reaching some sort of performance limit from model size.

However, as Zvi mentions, Sam did say:

“I think we're at the end of the era where it's going to be these, like, giant, giant models...We'll make them better in other ways”

The increased public attention towards AI Safety risk is probably a good thing. But, when stuff like this is getting lumped in with the rest of AI Safety, it feels like the public-facing slow-down-AI movement is going to be a grab-bag of AI Safety, AI Ethics, and AI... privacy(?). As such, I'm afraid that the public discourse will devolve into "Woah-there-Slow-AI" and "GOGOGOGO" tribal warfare; from the track record of American politics, this seems likely - maybe even inevitable? 

More importantly, though, what I'm afraid of is that this will translate into adversarial relations between AI Capabilities organizations and AI Safety orgs (more generally, that capabilities teams will become less inclined to incorporate safety concerns in their products). 

I'm not actually in an AI organization, so if someone is in one and has thoughts on this dynamic happening/not happening, I would love to hear.

Sheesh. Wild conversation. While I felt Lex was often missing the points Eliezer was making, I'm glad he gave him the space and time to speak. Unfortunately, it felt like the conversation would keep building towards a critically important insight that Eliezer wanted Lex to understand, and then Lex would just change the topic, and Eliezer would have to begin building towards a new insight. Regardless, I appreciate that Lex and Eliezer thoroughly engaged with each other; this will probably spark good dialogue and get more people interested in the field. I'm glad it happened.

For those who are time-constrained and wondering what is in it: Lex and Eliezer basically cover a whole bunch of high-level points related to AI not-kill-everyone-ism, delving into various thought experiments and concepts that make up Eliezer's worldview. Nothing super novel; you've probably heard most of it already if you've been following the field for some time.
