If you're the sort of thing that skillfully generates and enacts long-term plans, and you're the sort of planner that sticks to its guns and finds a way to succeed in the face of the many obstacles the real world throws your way (rather than giving up or wandering off to chase some new shiny thing every time a new shiny thing comes along), then the way I think about these things, it's a little hard to imagine that you don't contain some reasonably strong optimization that strategically steers the world into particular states.
It seems this post may have conflated "generating" with "enacting". Currently, LLMs seem to attempt only the former during prediction. In general terms, predicting a long-horizon actor's reasoning is implicit in the task of myopically predicting the next thing that actor would do. For a specific example, you could imagine a model predicting the next move in a grandmaster's or Stockfish's chess game (or the text of an author's book, or of an industrial project description, to use your longer-horizon examples).
The first paragraph of /u/paulfchristiano's response might be getting at something similar, but it seems worth saying this directly.[1]
(This also seems like a basic point, so I wonder if I misunderstood the post... but something isomorphic to it seems to be in the top comment, so I'm not sure.)
Across all questions, it may also be advisable to include the following text about the authors in the prompt if you trust the model not to try to manipulate you.
If you're not sure whether the model would try to manipulate you, the following applies instead:
Questions to ask an oracle:
If the model is not well-modelled as an oracle, there are intermediate questions which could be asked in place of the first question.
In case someone in such a situation reads this, here is some personal advice for group members.
Also, tokens with unusually near-100% probability could be indicative of anthropic capture, though this is hopefully not yet a concern with a hypothetical GPT-5-level system. (The word "unusually" is used in the prior sentence because some tokens naturally have near-100% probability, e.g., the second half of a contextually implied unique word, parts of common phrases, etc.)
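To make that heuristic concrete, here is a minimal sketch of what flagging unusually high-probability tokens might look like, assuming you can obtain per-token logprobs for the model's output. The threshold and the example values are arbitrary illustrations, not recommendations, and separating "unusual" near-certainty from the naturally near-certain tokens mentioned above still requires human judgment.

```python
# Minimal sketch: flag tokens whose probability is near 100%, given a list of
# tokens and their logprobs (however you obtained them). Threshold is arbitrary.

from math import exp

def flag_suspicious_tokens(tokens, logprobs, threshold=0.999):
    """Return (index, token, probability) for tokens with near-100% probability.

    Tokens that are near-certain for mundane reasons (the second half of a
    contextually implied word, parts of common phrases, etc.) will also be
    flagged; filtering those out is left to the reader.
    """
    flagged = []
    for i, (tok, lp) in enumerate(zip(tokens, logprobs)):
        p = exp(lp)
        if p >= threshold:
            flagged.append((i, tok, p))
    return flagged

# Hypothetical usage with made-up values:
tokens = ["The", " answer", " is", " 42", "."]
logprobs = [-0.7, -0.05, -0.0001, -2.3, -0.00005]
for i, tok, p in flag_suspicious_tokens(tokens, logprobs):
    print(f"token {i} ({tok!r}) has probability {p:.5f} -- inspect manually")
```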
but my guess is that it was at the time accurate to make a directional bayesian update that the person had behaved in actually bad and devious ways.
I think this is technically true, but it's the wrong framing, or rather it leaves out another possibility: that such a person may simply be more likely to follow their heart and do what they think is right, even when society disagrees. That could include doing things that are bad, but it could also include doing things that are actually really good, since society has been wrong a lot of the time.
I've met one who assigned double-digit probabilities to bacteria having qualia and said they wouldn't be surprised if a balloon flying through a gradient of air experiences pain because it's trying to get away from hotter air towards colder air.
Though this may be an arguable position (see, e.g., https://reducing-suffering.org/is-there-suffering-in-fundamental-physics/), the way you've used it (and the other anecdotes) in the introduction, decontextualized as a "statement of position" without justification, is in effect a clown-attack fallacy.
On the post: remember that absence of evidence is not evidence of absence when we do not yet have the technology to collect the relevant evidence. The conclusion in the title does not follow; it should be "whether shrimp suffer is uncertain." Under uncertainty, eating shrimp means taking a risk whose downside is suffering, and whose upside (for individuals for whom there is any) might be, e.g., taste-preference satisfaction, and the former is much more important to me. A typical person is not justified in "eating shrimp until someone proves to them that shrimp can suffer."
I love this as art, and I think it's unfortunate that others chose to downvote it. In my view, if LLMs can simulate a mind -- or a superposition of minds -- there's no a priori reason that mind would not be able to suffer, only the possibility that the simulation may not yet be precise enough.
About the generated images: there was likely an LLM in the middle, conditioned on a preset prompt about translating the user's input into a prompt for an image model. The resulting prompts to the image model are likely products of the narrative implied by that preset prompt, as with Sydney's behavior. I wouldn't generalize to "LLMs act like trapped humans by default in some situations," because at least base models generally don't do this except as part of an in-text narrative.
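For concreteness, here is a rough sketch of the pipeline structure I have in mind. The function names and the preset prompt text are hypothetical placeholders, since I don't know the actual implementation; the point is only that the intermediate LLM's outputs are shaped by a fixed system prompt rather than by anything human-like happening "by default."

```python
# Illustrative sketch of the "LLM in the middle" structure. `call_llm` and
# `call_image_model` are stand-ins, not real APIs; the preset prompt is invented.

PRESET_PROMPT = (
    "You write prompts for an image generation model. "
    "Rewrite the user's request as a vivid, detailed image prompt."
)

def call_llm(system_prompt: str, user_input: str) -> str:
    raise NotImplementedError("stand-in for whatever LLM backend is in use")

def call_image_model(image_prompt: str) -> bytes:
    raise NotImplementedError("stand-in for the image model")

def generate_image(user_input: str) -> bytes:
    # The intermediate LLM only ever sees the preset prompt plus the user's
    # text, so whatever narrative the preset prompt implies shapes the
    # image prompt it produces.
    image_prompt = call_llm(PRESET_PROMPT, user_input)
    return call_image_model(image_prompt)
```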
Why does that not seem to be the case to you?