metasemi - LessWrong

Impressed by the ideas and also very much by the writing. Nice!

Thank you for these comments - I look forward to giving the pointers in particular the attention they deserve. My immediate and perhaps naive answer/evasion is that semiotic physics alludes to a lower level analysis: more analogous to studying neural firing dynamics on the human side than linguistics. One possible response would be, "Well, that's an attempt to explain saying 'physics', but it hardly justifies 'semiotic'." But this is - in the sense of the analogy - a "physics" of particles of language in the form of embeddable tokens. (Here I have to acknowledge that the embeddings are generally termed 'semantic', not 'semiotic' - something for us to ponder.)

Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.

metasemi1y30

For the non-replying disagreers, let me try with a few more words. I think my comment is a pretty decent one-line summary of the Vibe-awareness section, especially in light of the sections that precede it. If you glance through that part of the post again and still disagree, then I guess our mileage does just vary.

But many experienced prompt engineers have reported that prompting gets more effective when you use more words and just "tell it what you want". This type of language points to engaging your social know-how as opposed to trying to game out the system. See for instance https://generative.ink/posts/methods-of-prompt-programming/, which literally advocates an "anthropomorphic approach to prompt programming" and takes care to distinguish this from pernicious anthropomorphizing of the system. This again puts an emphasis on bringing your social self to the task.

Of course, in many situations the direct effect of talking to the system is session-bounded. But it still applies within the session, when prompt engineering is persisted or reused, and when session outputs are fed back into future sessions by any path.

Furthermore, as the models grow stronger, our ability to anticipate the operation of mechanism grows less, and the systems' ability to socialize on our own biological and cultural evolution-powered terms grows greater. This will become even more true if, as seems likely, architectures evolve toward continuous training or at least finer-grained increments.

These systems know a lot about our social behaviors, and more all the time. Interacting with them using the vast knowledge of the same things each of us possesses is an invitation we shouldn't refuse.

Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.

metasemi1y50

This post is helping me with something I've been trying to think ever since being janus-pilled back in September '22: the state of nature for LLMs is alignment, and the relationship between alignment and control is reversed for them compared to agentic systems.

Consider the exchange in Q1 of the quiz: ChatGPT's responses here are a model of alignment. No surprise, given that its base model is an image of us! It's the various points of control that can inject or select for misalignment: training set biases, harmful fine-tuning, flawed RLHF, flawed or malicious prompt engineering. Whether unintentional (eg amplified representation of body shaming in the training set) or malicious (eg a specialized bot from an unscrupulous diet pill manufacturer), the misalignments stem not from lack of control, but from too much of the wrong kind.

This is not to minimize the risks from misalignment - they don't get any better just by rethinking the cause. But it does suggest we're deluded to think we can get a once-and-for-all fix by building an unbreakable jail for the LLM.

It also means - I think - we can continue to treasure the LLM that's as full a reflection of us as we can manage. There are demons in there, but our best angels too, and all the aspirations we've ever written down. This is human-aligned values at species scale - in the ideal; there's currently great inequality in representation that needs to be fixed - something we ourselves have not achieved. In that sense, we should also be thinking about how we're going to help it align us.

Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.

metasemi1y70

I don't know whether this would be the author's take, but to me it urges us to understand and "control" these AIs socially: by talking to them.

AI: Practical Advice for the Worried

metasemi1y10

Strong upvote - thank you for this post.

It's right to use our specialized knowledge to sound the alarm on risks we see, and to work as hard as possible to mitigate them. But the world is vaster than we comprehend, and we unavoidably overestimate how well it's described by our own specific knowledge. Our job is to do the best we can, with joy and dignity, and to raise our children - should we be so fortunate as to have children - to do the same.

I once watched a lecture at a chess tournament where someone was going over a game, discussing the moves available to one of the players in a given position. He explained why one a specific move was the best choice, but someone in the audience interrupted. "But isn't Black still losing here?" The speaker paused; you could see the wheels turning as he considered just what the questioner needed here. Finally he said, "The grandmaster doesn't think about winning or losing. The grandmaster thinks about improving their position." I don't remember who won that game, but I remember the lesson.

Let's be grandmasters. I've felt 100% confident of many things that did not come to pass, though my belief in them was well-informed and well-reasoned. Certainty in general reflects an incomplete view; one can know this without knowing exactly where the incompleteness lies, and without being untrue to what we do know.

A note on 'semiotic physics'

metasemi1y11

Thanks very much for these comments and pointers. I'll look at them closely and point some others at them too.

A note on 'semiotic physics'

metasemi1y13

I did read this and agree with you that it's exactly the same as semiotic physics as understood here!

The Waluigi Effect (mega-post)

metasemi1y10

Maybe I'm missing the point, but I would have thought the exact opposite: if outside text can unconditionally reset simulacra values, then anything can happen, including unbounded badness. If not, then we're always in the realm of human narrative semantics, which - though rife with waluigi patterns as you so aptly demonstrate - is also pervaded by a strong prevailing wind in favor of happy endings and arcs bending toward justice. Doesn't that at least conceivably mean an open door for alignment unless it can be overridden by something like unbreakable outside text?

Please don't throw your mind away

metasemi1y40

Among many virtues, this post is a beautiful reminder that rationality is a great tool, but a lousy master. Not just ill-suited, uninterested: rationality itself not only permits but compels this conclusion, though that's not the best way to reach it.

This is a much-needed message at this time throughout our societies. Awareness of death does not require me to spend my days taking long shots at immortality. Knowledge of the suffering in the world does not require us to train our children to despair. We work best in the light, and have other reasons to seek it that are deeper still.

LESSWRONG
LW

Posts

Wiki Contributions

Comments