On the contrary, I liked this post, and the latter half the most. It serves as a relatively direct parable about different levels of ability, and about the major problems with common arguments against AGI/ASI, which I think people still very often fail to make a point of. Spelling these out explicitly, without going into super-long detail as a full post, is good because it provides more concise argumentative handles. That is, people do not actually make the basic counterarguments enough.
(I also think those suggesting that this has already been argued out enough should link to alternative posts: posts with higher-quality and more concise argumentation, and posts written to be read by interlocutors.)
From my current stance, it is plausible, because we haven't settled how we think of aliens philosophically (especially those whose behavior falls significantly outside our own). I most likely don't respect arbitrary intelligent agents, as I'd be in favor of getting rid of a vulnerable paperclipper if we found one on the far edges of the galaxy.
Then, I think you're not mentally extrapolating how much that computronium would give you. From our current perspective the logic makes sense: you upload the aliens regardless, even if you respect their preferences beyond that, because it lets you simulate vastly more aliens or other humans at the same time.
I expect we care about their preferences. However, those... (read more)
(Note: I've only read a few pages so far, so perhaps this is already in the background)
I agree that if the parent comment's scenario holds, then it is a case of the upload being done improperly.
However, I also disagree that most humans naturally generalize our values out of distribution. I think it is very easy for many humans to get sucked into attractors (ideologies that are simplifications of what they truly want; easy lies; the sheer amount of effort ahead stalling their focus even when the gargantuan task would be worth it) that damage their ability to properly generalize, and just as importantly apply, their values. That is, humans have predictable flaws. Then when you add... (read more)
A core element is that you expect acausal trade among far more intelligent agents, such as AGIs or even ASIs, and that they'll be using approximations.
Problem 1: There isn't going to be much Darwinian selection pressure against a civilization that can rearrange stars and terraform planets. I'm of the opinion that such pressure has mostly stopped mattering now, and will only matter less over time, as long as we don't end up in an "everyone has an AI and competes in a race to the bottom" scenario. I don't think it is that odd that an ASI could resist selection pressures: it operates on a faster time-scale and can apply more intelligent optimization... (read more)
I think you're referring to their previous work? Or you might find it relevant if you haven't run into it: https://www.lesswrong.com/posts/ifechgnJRtJdduFGC/emergent-misalignment-narrow-finetuning-can-produce-broadly
If you were pessimistic about LLMs learning a general concept of good/bad, then yes, that should update you. However, I think the main core problems still remain. If you are doing a simple continual learning loop (LLM -> output -> retrain to accumulate knowledge; analogous to ICL), then we can ask how robust this process is. Do its values about how to behave drastically diverge? For instance, are there attractors, over a hundred days of output, that it gets dragged towards that aren't aligned at all? Can it be... (read more)
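(To make the loop concrete, here's a minimal sketch; `model.generate` and `model.finetune` are hypothetical stand-ins for a real LLM serving and retraining pipeline, not any actual API.)

```python
def continual_learning_loop(model, tasks, days=100):
    """Hypothetical sketch: the model acts, then is retrained on its own accumulated outputs."""
    accumulated_outputs = []
    for day in range(days):
        output = model.generate(tasks[day])           # the model produces its work/notes for the day
        accumulated_outputs.append(output)
        model = model.finetune(accumulated_outputs)   # fold the accumulated outputs back into training
        # The robustness question: over ~100 of these rounds, do the model's values
        # drift toward attractors that were never intended?
    return model
```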
I have some of the same feeling, but internally I've mostly pinned it on two prongs: repetition and ~status.
ChatGPT's writing is increasingly disliked by those who recognize it. The prose is poor in various ways, but I've certainly read worse and not been so put off. Nor am I as put off when I first use a new model; rather, I increasingly notice its flaws over the next few weeks. The main factor is that the generated prose is repetitive across writings, which ensures we can pick up on the pattern, such as by making its flaws easy to predict. Just as I avoid much generic power fantasy fiction, as much of it... (read more)
Anecdotally, I would perceive "Bowing out of this thread" as a more negative response, because it encapsulates both the topic and the quality of my response or my own behavior, while "not worth getting into" is mostly about the worth of the object-level matter. (Though remarking on the behavior of the person you're arguing with is a reasonable thing to do, I'm not sure that interpretation is what you intend.)
I disagree. Posts seem to have an outsized effect and will often be read a bunch before any solid criticisms appear. Then they spread even given high-quality rebuttals... if those ever materialize.
I also think you're referring to a group of people who typically write high-quality posts and handle criticism well, while others don't handle criticism well. Duncan is an example of this, despite my liking many of his posts.
As for Said specifically, I've been annoyed reading his argumentation a few times, but have then also found him saying something obvious and insightful that no one else pointed out anywhere in the comments. Losing that is unfortunate. I don't think there's enough... (read more)
Because Said is an important user who has provided criticism/commentary across many years. This is not about some random new user, which is why there is a long post in the first place rather than him being silently banned.
Alicorn is raising a legitimate point: that it is easy to get complaints about a user who is critical of others, that we don't have much information about the magnitude of those complaints, and that it is far harder to get information about the users who find his posts useful.
LessWrong isn't a democracy, but these are legitimate questions to ask because they are about what kind of culture (as Habryka talks about) LW is trying to create.
One minor thing I've noticed when thinking about interpretability is the distinction between in-distribution, out-of-distribution, and what I call out-of-representation data. I would assume this has been observed elsewhere, but I haven't seen it mentioned before.
In-distribution could be considered inputs with the same "structure" as what you trained the neural network on; out-of-distribution inputs are exotic ones, like an adversarially noisy image of a panda, or a picture of a building fed to an animal-recognizer NN.
Out-of-representation would be when the neural network takes inputs in a certain form/encoding that restricts the representable values. However, the network can theoretically take anything in between; it just shouldn't ever receive such inputs.
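A minimal sketch of the distinction, assuming a toy PyTorch classifier whose inputs are supposed to be one-hot vectors (the specific network and values are only illustrative):

```python
import torch
import torch.nn as nn

# Tiny classifier whose inputs are meant to be one-hot encodings of four categories.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

in_representation = torch.tensor([[0., 0., 1., 0.]])        # valid under the encoding
out_of_representation = torch.tensor([[3., -1., 0.5, 7.]])  # no encoder would ever emit this

print(net(in_representation))       # the behavior we actually train on and inspect
print(net(out_of_representation))   # still well-defined, but never exercised by the data pipeline
```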
The most... (read more)
I disagree. I don't see an increased focus on scheming; if anything, it is notably less common, in part due to updating on current-gen LLMs. I do think there is a tendency to think about scheming as a discrete thing, but that tendency is more common among the optimists who point at current-gen LLMs not really being 'schemers'.
I agree with the way Zvi talks about the topic. "Being a schemer" is not quite the right classification. The issue is that deception is a naturally convergent tool for all sorts of goals: anything that interfaces with reality intelligently will find that deception and manipulation are useful tools. So we'd naturally expect that RL and other fun... (read more)