

At its core, ethics really is an ongoing negotiation between which intuitions you want to call values, and which you want to dismiss as biases. 


Yep. See also the Fun Criterion, which I think sounds related to Reflective Equilibrium.


it’s entirely possible that we will be left with as many Reflective Equilibria as individuals. 

I think not only should we expect to end up with many reflective equilibria, one per individual; even a single individual shouldn't be expected to reach a reflective equilibrium, since they will keep evolving their moral self-understanding, at least if they are sufficiently reflective. I think moral and scientific progress, e.g. as described in The Beginning of Infinity, looks more like a "reflective disequilibrium", but one that is always flowing towards improvement.


However, upon further reflection, it becomes evident to you that your preference for your family over strangers is also influenced by evolution. Perhaps even here, you can decide to set this bias aside and treat all people equally. Good for you.

I think there's an interesting question of how many people would actually implement this change if we had brain editing.

Is it that our meta-morality (explicit, language-based moral principles) doesn't yet have enough control over the rest of the brain, and so we post-rationalize our inability to implement such changes as "us not actually wanting them"?

I don't know how, but I'm pretty sure brain editing tech would alter our reflective pseudo-equilibria.


Remember the Reflective Equilibrium. It relied on you to simulate an internal monologue to bring your conflicting intuitions and moral principles into coherence.

Yesss, I want AI assistants that feel like talking to yourself. Right now ChatGPT is not designed to work like this. David Holz (founder of Midjourney) recently talked in one of his office hours about how he doesn't like that ChatGPT is designed as an assistant rather than an extension of your own mind.


 CEV seeks to align an AGI with an aggregate extrapolation of society’s idealized wishes if we were more rational, lived further together, were smarter, and in short, acted more like the people we wish we were. 

If CEV is what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together", then we could approach the problem by making ourselves know more, think faster, be more the people we wished we were, and grow further together. Why not tackle these things, which it seems we agree on more, rather than the harder problem of finding our true values?


For humans, those boundaries are corporal, and with less of civilization’s interactions defined by physical violence, more voluntary interactions are possible. 

Some people still abuse others through text media/the internet though? I guess you could say this is because there's some psychological boundary that can be broken in some cases, but yeah, it seems to happen less commonly than with physical interactions. I guess we could generally define a "voluntary-request allowing-boundary" as "a boundary that only lets through messages that the receiver would want to receive". This hits the wall of our own fallibility again, whereby sometimes we don't know what's best for us, but it seems like a good starting point/definition, and maybe we can design these boundaries starting from that.

There is, however, also a pull in a different direction. One of the appeals of VR is that it allows people to connect more closely (in a way because you are exposing yourself more/reducing boundaries/being more vulnerable).

This leads me to ask: how do the benefits of merging minds fit into this framework?

It seems to me that depending on the people, the "healthy boundary" (lol this is sounding like relationship advice, but of course, that is very much related!), which is another name we could give to the "voluntary-request allowing-boundary", is quite different. I may open up a lot more to a partner or a friend than to a stranger.

It seems to me that the design of boundaries with AIs will follow a similar trend too, where we open up more to AIs we trust (plus some people are more willing to be open to messages overall than others, etc). But yeah, hmm, this seems to connect to many things about power dynamics and stuff.

Also, I think we need power/intelligence to not be too unequal if we want the game-theoretic equilibria to favor cooperation/ethics.


Cases in which AI systems have been able to improve your normative understanding of yourself. Perhaps they recommend an action that, even though strange at first sight, was actually a good move upon reflection?

I am playing with a more personalized/inner-monologue assistant by augmenting GPT-3 with retrieval over all my stream-of-consciousness notes from the last 3 years. I did observe this interesting effect where, if I asked it moral questions, it would mix stuff from my own messages with stuff that it thought was better.
It would sometimes also "tell me off" if I commented stuff that it thought was immoral, and it did make me go internally like "ok, you are right, I was letting myself get too loose". But the cool thing is that this didn't feel bad/imposing/oppressive, because it was all within voluntary boundaries. This is a system that atm feels like part of me, in part because it knows so much about me, but also because of the UX: I try having it reply as I would talk to myself, rather than as an external assistant assisting me. In this light, your AI debate system would be like Minsky's society of mind x3.

A friend also had a similar experience where ChatGPT convinced them that asking LMs about certain immoral stuff was itself immoral.
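A minimal sketch of what a retrieval-over-notes setup like the one described above could look like. This is not the actual system: it assumes a simple bag-of-words cosine similarity in place of whatever embedding search is really used, it stops at building the prompt rather than calling a language model, and the function names, prompt wording, and example notes are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(notes, query, k=2):
    """Return the k notes most similar to the query (hypothetical helper)."""
    q = Counter(tokenize(query))
    ranked = sorted(notes, key=lambda n: cosine(Counter(tokenize(n)), q), reverse=True)
    return ranked[:k]


def build_prompt(notes, query, k=2):
    """Frame the prompt as self-talk rather than as a request to an assistant."""
    context = "\n".join(f"- {n}" for n in retrieve(notes, query, k))
    return (
        "These are my own past notes:\n"
        f"{context}\n\n"
        f"Thinking to myself about: {query}\n"
        "My next thought:"
    )


# Made-up example notes, for illustration only.
notes = [
    "I felt guilty after skipping the gym",
    "Lying to friends always ends badly for me",
    "Pasta recipe with garlic",
]
print(build_prompt(notes, "is lying to a friend ever ok", k=1))
```

The "talking to yourself" feel comes mostly from the prompt framing in `build_prompt`: the retrieved notes are presented as the writer's own earlier thoughts, and the model is asked to continue the monologue.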

One can also start to realize, as in the example of Hofstadter, or of teaching a rock, that the fundamental beliefs are just the automatic dynamics of thought, so believing something completely different would just mean changing those dynamics, which would be equivalent to changing the "meaning" of things. So I don't think there's any sense in which "2+2=3" could be true that doesn't involve redefining things.

But doesn't this story about 2+2=3 also apply to belief in God, for example? If you are raised in the right circumstances, you will end up with a belief that you think is unconditional, even though it was conditional on your circumstances. Aren't all beliefs ultimately entangled with reality, by virtue of beliefs being encoded in the brain, which is a physical system entangled with reality? To not fall into a fallacy of gray, we can concede that some ways of entanglement are better than others, in that they lead to more accurate beliefs. Hmmm

In any case, in your story of 2+2=3, you have just learned a new definition of the symbols, not that 2+2=3 by our symbols? The latter would seem to require learning to believe in different logical axioms. Would you agree with that?

So if you cannot conceive of a situation where 2+2 actually equalled 3, then you can't conceive of a situation where you would learn the belief 2+2=3, with our semantics, using a correct belief-learning method?

It seems to me that the fundamental logical truths are only self-consistent, i.e. they are only facts about our brains, which we then use to fit models to the world. There's a lot of math that isn't used for that and doesn't fit the world, but still follows self-consistent rules. These beliefs don't seem to need to be entangled with reality except in the trivial sense of being part of reality. They are like the building blocks of our minds, which, when analyzed deeply, just say stuff about our own minds, and so don't need external empirical tests.

It's like, to even begin making models of the world, you need something, some mental substance to shape. That's there before any evidence. That's logic, and I guess a Bayesian prior also satisfies this.

I think this may also be hinting at the idea that there may be many mental structures that are equally good maps of reality*, and so beliefs about which of these to use can't really be about empirical evidence in the usual sense.

*in the sense of predictive power. If we weigh them by other metrics, then maybe there are other types of "evidence" that can select for one or the other, e.g. social acceptance/utility.

"f they say that they'd be emotionally disturbed by knowing, specify that they won't know about the torture."

Couldn't one argue that having preferences about things you assume you don't know wouldn't affect your actions?

When I'm deciding on an actual action, I can only take into account things I know, and nothing else?

So the preference of the case where I would never know about the person being tortured couldn't affect my actions, so in that sense doesn't matter?

I'd like to add some points to this interesting discussion:

As far as I understand, feature learning is not necessary for some standard types of transfer learning. E.g.: one can train an NNGP on a large dataset, and then use the learned posterior as prior for "fine-tuning" on some new dataset. This is hard to scale using actual GP techniques, but if wide neural nets (with random sampling or SGD) do approximate NNGPs, this could be a way they achieve transfer learning without feature learning.
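A minimal sketch of the posterior-as-prior idea, under stated assumptions: the kernel is the NNGP kernel of a one-hidden-layer ReLU network (the degree-1 arc-cosine kernel, with a constant feature appended as a bias), the data is a made-up 2-D regression task, and all function names are hypothetical. "Pretraining" conditions the GP on the large dataset, and "fine-tuning" conditions the resulting posterior on the new one. For an exact GP this is the same as conditioning on both datasets at once, which the last lines check numerically.

```python
import numpy as np


def relu_nngp_kernel(X1, X2):
    """NNGP kernel of a one-hidden-layer ReLU net (degree-1 arc-cosine kernel)."""
    # Append a constant feature so the kernel includes a bias term.
    A = np.hstack([X1, np.ones((len(X1), 1))])
    B = np.hstack([X2, np.ones((len(X2), 1))])
    norms = np.linalg.norm(A, axis=1)[:, None] * np.linalg.norm(B, axis=1)[None, :]
    cos = np.clip(A @ B.T / norms, -1.0, 1.0)
    theta = np.arccos(cos)
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)


def condition(mean, cov, idx_obs, y_obs, noise):
    """Condition a Gaussian over function values on noisy observations at idx_obs."""
    K_oo = cov[np.ix_(idx_obs, idx_obs)] + noise * np.eye(len(idx_obs))
    K_ao = cov[:, idx_obs]
    gain = np.linalg.solve(K_oo, K_ao.T).T  # K_ao @ inv(K_oo), using symmetry of K_oo
    return mean + gain @ (y_obs - mean[idx_obs]), cov - gain @ K_ao.T


def target(X):
    """Toy regression target (assumption, for illustration only)."""
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]


rng = np.random.default_rng(0)
X_big, X_new, X_test = (rng.normal(size=(n, 2)) for n in (40, 8, 5))
X_all = np.vstack([X_big, X_new, X_test])
i_big, i_new, i_test = np.arange(40), np.arange(40, 48), np.arange(48, 53)
noise = 1e-2

prior_mean = np.zeros(len(X_all))
prior_cov = relu_nngp_kernel(X_all, X_all)

# "Pretrain": the posterior after the large dataset ...
m_pre, c_pre = condition(prior_mean, prior_cov, i_big, target(X_big), noise)
# ... becomes the prior for "fine-tuning" on the new dataset.
m_ft, c_ft = condition(m_pre, c_pre, i_new, target(X_new), noise)

# Reference: condition the original prior on both datasets jointly.
i_joint = np.concatenate([i_big, i_new])
y_joint = np.concatenate([target(X_big), target(X_new)])
m_joint, c_joint = condition(prior_mean, prior_cov, i_joint, y_joint, noise)

print(np.allclose(m_ft[i_test], m_joint[i_test], atol=1e-6))
```

The kernel is fixed throughout, so no features are learned in this sketch; whatever transfer happens comes purely from carrying the posterior over to the new task.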

You say

In contrast, in the case of SGD, it's possible to do feature learning even in the infinite-width limit

That is true, but one of the points in Greg Yang's paper, as far as I remember, was also to say that people weren't using the scaling limit that would lead to that. That has made me wonder whether feature learning may be happening in our biggest models or not. The work on multimodal neurons in CLIP suggests there is feature learning. But what about GPT-3? In any case, I don't think it'll be happening by the mechanism Yang proposes, as people aren't using his initialization scheme. Perhaps, then, the mechanism by which finite randomly-sampled NNs could conceivably feature-learn could be the same as the one SGD is using. I am not sure either way. For me to evaluate the empirical evidence better, I'd need a sense of whether the evidence we have comes from sufficiently large models or not (as I do think that randomly-sampled NNs at infinite width won't do feature learning, though I'm not sure how to prove that without a better definition of feature learning).

Another point is in answer to your comment that NNGP often underperforms NTK. I think there's actually more evidence to the contrary (see ), even if there are instances going both ways.

Overall, I think the work in Jascha Sohl-Dickstein's group (e.g. the paper linked above) has been great for disentangling these issues, and it seems to point at a complex/nuanced picture, which really leads me to believe we don't have a clear answer about whether NNGPs will be a good model of SGD in practice (as of today; practice may also change). However, my general observation is that I'm not aware of any evidence that shows that SGD-trained nets beat architecture-equivalent NNGPs by a significant margin, consistently over a wide range of tasks in practice. Chris' work on the Bayesian picture of SGD tried to do this, but the problems are indeed not quite large enough to be confident. In here we also explore NNGPs (but through a different lens), over SOTA architectures, but still on small tasks. So I think the question still remains open as to how NNGPs would perform on more complex datasets.