Comments

Does the median LW commenter believe that autoregressive LLMs will take us all the way to superintelligence?

Super cool stuff. Minor question: what does "Fraction of MLP progress" mean? Are you scaling down the MLP output values that get added to the residual stream? Thanks!
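To make the question concrete, here's a minimal sketch (in PyTorch) of the interpretation I'm asking about; `ScaledMLPBlock` and `mlp_fraction` are my own illustrative names, not anything from your post:

```python
import torch
import torch.nn as nn


class ScaledMLPBlock(nn.Module):
    """Residual block whose MLP contribution is scaled by a fraction.

    A hypothetical reading of "fraction of MLP progress": the MLP
    output is multiplied by mlp_fraction before being added back
    into the residual stream (attention omitted for brevity).
    """

    def __init__(self, d_model: int, d_hidden: int, mlp_fraction: float = 1.0):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )
        self.mlp_fraction = mlp_fraction

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # mlp_fraction = 1.0 recovers the standard residual update;
        # 0.0 removes the MLP's contribution entirely.
        return resid + self.mlp_fraction * self.mlp(resid)


# Example: halve the MLP's contribution to the residual stream.
block = ScaledMLPBlock(d_model=64, d_hidden=256, mlp_fraction=0.5)
out = block(torch.randn(1, 10, 64))  # (batch, seq, d_model)
```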

FWIW, I understand now what it's meant to do, but I have very little idea how your protocol/proposal delivers positive outcomes in the world by emitting performative speech acts. I think explaining your internal reasoning/hypothesis for how emitting performative speech acts leads to powerful AIs delivering positive outcomes would be helpful.

Is such a "channel" necessary to deliver positive outcomes? Is it supposed to make it more likely that AI delivers positive outcomes? More details on what a success looks like to you here, etc.

I skimmed The Snuggle/Date/Slap Protocol and Ethicophysics II: Politics is the Mind-Savior, two of your recent downvoted posts. I think they get negative karma because they are difficult to understand and it's hard to tell what readers are supposed to take away from them. They would probably be better received if they were written so that both the object-level message and the overall point of each post were easy to grasp.

 

I read the Snuggle/Date/Slap Protocol and feel confused about what you're trying to accomplish (is it solving AI Alignment?) and how the method is supposed to accomplish that. 

In the ethicophysics posts, I understand the object-level claims/material (like the homework/discussion questions) but fail to understand what the goal is. It seems like you are jumping straight to grounded mathematical theories for things like ethics/morality, which immediately makes me dubious. It's a too-much, too-grand, too-certain kind of reaction. Perhaps you're just spitballing/brainstorming some ideas, but that's not how it comes across, and I infer you feel deeply assured that it's correct, given statements like "It [your theory of ethics modeled on the laws of physics] therefore forms an ideal foundation for solving the AI safety problem."

 

I don't necessarily think you should change what you're doing, BTW; I'm just pointing out some likely reactions/impressions driving the negative karma.

This is terrific. One feature that would be great to have is a way to sort and categorize your predictions under various labels.

"Sexuality is, usually, a very strong drive which has a large influence over behaviour and long-term goals. If we could create an alignment drive as strong in our AGI, we would be in a good position."

I don't think we'd be in a good position even if we instilled an alignment drive this strong in AGI.

To me, the caveats section of this post highlights the limited scope from which language models will be able to learn human values and preferences, given that explicitly stated (and even implied-from-text) goals != human values as a whole.

Hi Cameron, nice to see you here : ) What are your thoughts on a critique like: human prosocial behavior/values only look the way they do, and hold stable within lifetimes, insofar as we evolved in and live in a world with loads of other agents whose power is roughly equal to our own? Do you disagree with that belief?

This was very insightful. It seems like a great thing to point to, for the many newish-to-alignment people ideating research agendas (like myself). Thanks for writing and posting!

This is a really cool idea and I'm glad you made the post! Here are a few comments/thoughts:

H1: "If you give a human absolute power, there is a small subset of humans that actually cares and will try to make everyone’s life better according to their own wishes"

How confident are you in this premise? Power and one's values/incentives/preferences may not be orthogonal (and my intuition is that they aren't). Also, I feel a little skeptical about the usefulness of thinking about how the trait shows up across various intelligence strata within humans. It seems like what we're worried about is in a different reference class. Not sure.

 

H4 is something I'm super interested in and would be happy to talk about it in conversations/calls if you want to : )
