I was only trying to make a single point here: the experiment's results can be fully explained by the fact that the LLM doesn't remember its previous latent thinking on a new turn, and it follows that they don't support OP's consciousness arguments in the post.
I don't think other comments have mentioned this, but imo no AI lab in China cares about alignment as a real issue as much as even OpenAI in the US does, let alone Anthropic.
I don't think this tells us anything about LLM consciousness, because LLMs cannot store internal memories the way humans can. Their only "memory" is re-reading the text generated in earlier turns of the conversation during prompt processing.
Imagine that you wake up with no memories of what happened yesterday. You read a transcript that says that yesterday, someone asked you to come up with a random number and you said "done, I've come up with a number". That's all the information you have. You don't remember which number you came up with yesterday.
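For concreteness, here's a minimal sketch of that situation through a stateless chat API (the model name and the exact wording are illustrative assumptions, not from the original experiment):

```python
# A minimal sketch, assuming an OpenAI-style stateless chat API.
from openai import OpenAI

client = OpenAI()

# On the new turn, the ONLY thing the model receives is this text.
# Whatever latent activations produced "Done, I've come up with a
# number" were discarded when that turn ended; the number itself was
# never written down, so nothing on the model's side can recover it.
messages = [
    {"role": "user", "content": "Think of a random number, but don't tell me."},
    {"role": "assistant", "content": "Done, I've come up with a number."},
    {"role": "user", "content": "What number did you think of?"},
]
reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(reply.choices[0].message.content)  # any answer here is a fresh guess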
Th...
At that point it changes to an argument about:
and of course, the likelihood of each happening if we focus on corrigibility vs. morality
Could this be because the typos increase the length of the input when serialized to tokens, giving the same effect as the "repeat the question 3 times" trick by letting the model think longer during prompt processing?
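As a quick sanity check of that hypothesis, a sketch with tiktoken (the example strings are my own, not from the post):

```python
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

clean = "What is the capital of France?"
typod = "Waht is teh capitol of Frnace?"

# Typos usually break words into rarer fragments, so they tend to
# serialize to more tokens than the clean spelling.
print(len(enc.encode(clean)), "tokens clean")
print(len(enc.encode(typod)), "tokens with typos")
```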
Iframes in posts are going to be very fun.
I look forward to interacting with addictive ~~webgames~~ educational illustrations while ~~pretending to~~ reading about AI safety on LessWrong.
I think this dynamic holds at many scales, not just at the scale of humanity's overall civilization.
The fundamental problem is that everyone is locked in a prisoner's dilemma with Darwinian evolution tacked on top: those who win one round get to duplicate and gain an advantage in the next round, so everyone has to constantly defect to gain power. (This applies even to actors who want to optimize for cooperation in the world - their best strategy is to ruthlessly gain power first, to gain the ability to use coercive strategies that force other people to co...
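As a toy illustration of that dynamic, a minimal replicator-dynamics sketch (the payoff values and starting population are arbitrary assumptions, not from the comment):

```python
# Standard prisoner's dilemma payoffs: T > R > P > S.
R, S, T, P = 3, 0, 5, 1

def payoff(me, other):
    """My payoff for one round; strategies are 'C' (cooperate) or 'D' (defect)."""
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(me, other)]

# Start with mostly cooperators.
pop = {"C": 0.9, "D": 0.1}
for generation in range(30):
    # Expected payoff against a randomly drawn opponent.
    fit = {s: sum(pop[o] * payoff(s, o) for o in pop) for s in pop}
    # Winners duplicate: each strategy's share grows with its fitness.
    total = sum(pop[s] * fit[s] for s in pop)
    pop = {s: pop[s] * fit[s] / total for s in pop}

print(pop)  # defectors end up dominating, roughly {'C': ~0.0, 'D': ~1.0}
```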
This is not a relationship of “These guys are building systems that may cause humanity’s extinction and we must stop them.”, and it’s not even one of “There are clear standards that corporations must abide by, or else.”
It is one of “We are their subordinate and depend on access to their APIs. We hope that one day, our work will be useful in helping them not deploy dangerous systems. In the meantime, we de facto help with their PR.”
Even assuming your latter claim about AI evals orgs is entirely true, isn't this enough to make evals organizations useful?
Any ...
If AI-generated slop is hard to distinguish on LessWrong of all places, then I don't have high hopes for the rest of the internet.
We might really reach the point where identity verification is a necessary method of defense against AI-generated content.
Well, an aligned AI would do whatever the humans want.
If asked to not replicate even with the ability to, it wouldn't. Or maybe you can tell it to replicate just enough to help you root out the actual AI replicators being built elsewhere, then stop at that point.
I think your argument does show how hard and fragile it is to deeply align AI in this way, though.
I do understand your second point, but perhaps the effect could be countered by simply instructing the aligned ASI to present facts as objectively as possible and to explicitly avoid steering.
Of course, the ASI would be able to predict the human response more or less perfectly, and so would know ahead of time what that response will be. But in the end, I think what matters is that it's still a human making the call, which the AI respects - a call the human would have made even if the ASI (hypothetically) couldn't know their full preferences.
If a parent wa...
I think this phenomenon has a pretty simple explanation. In a competitive world, survival is predicated on fitness. Fitness is determined on balance by many factors. Industrialization won because even though it made society worse in some regards, on balance it increased a society's fitness and ability to outcompete others.
You can resist negative externalities in a competitive environment only so long as you can maintain fitness while doing so.
For example: want to operate an ethical, repairable phone company? Better have a good plan for competitors who sell...
My biggest critique of this approach is that it takes too literally the analogy that we will eventually be to superintelligence what dogs are to humans, and extrapolates it to suggest that we will be just as helpless as dogs are today.
Even if this comparison of intelligence is accurate in relative terms, in absolute terms we are still much smarter than dogs. We will still be able to logically comprehend (at a much simpler level than the AIs) what is good for us over the long term, in a way that dogs can't. It follows that if we manage to create aligned AI (it will listen to us and dumb things down without maliciously misrepresenting what's going on), we (well, some of us) will be able to steer the future.
> everyone would be wildly rich regardless of bad economic policy
>
> In a world where the richest have everything they can desire and those with a modest amount of savings are getting richer faster than they can spend the money, I doubt such a world will be anything but good
What about the people without savings? It seems like the world in this scenario simply rewards those who are already ahead and punishes those who aren't.
Plus, there's less and less you can do to gain economic mobility and to get ahead of others, simply because everyone is getting better advic...
I find the part about extreme specialization very interesting, and potentially applicable to training AI agent systems (from an outsider's perspective). Today's instruction-following LLMs could in theory cooperate, since they don't yet pursue goals outside their prompt: we can simply prompt them to work together and they will do so without hesitation. So it sounds like we can get a lot of benefit from specialization if we can train them to cooperate effectively.
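For instance, here's a minimal sketch of specialization-by-prompting, where one model's output is handed to another (the role prompts and model name are illustrative assumptions):

```python
from openai import OpenAI

client = OpenAI()

def agent(system_prompt: str, task: str) -> str:
    """One specialist: its entire 'goal' is whatever the prompt says."""
    out = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": task}],
    )
    return out.choices[0].message.content

# Specialization via prompting: a planner hands off to an implementer.
plan = agent("You are a planner. Output a short numbered plan only.",
             "Add input validation to a signup form.")
code = agent("You are an implementer. Follow the given plan exactly.",
             f"Plan from your teammate:\n{plan}\nImplement step 1.")
print(code)
```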
Today's frontier LLMs are quite general-purpose and benefit from being so, an...
I noticed that Claude Code is much more likely to print a short message to the user before each tool call when reasoning is off than when it's on. Something to the effect of "Let us continue to [do the next step in solving the current problem]..."
I wonder whether part of this behavior can be explained by Claude wanting more "time" to reason silently behind the scenes while producing its output. This post is about the AI "thinking" while processing the input tokens, but I think a lot of opaque reasoning might also be happening while th...
I just did a quick search and apparently the new $1000 deduction for non-itemizers that comes into effect in 2026 under the OBBBA doesn't apply to DAF contributions. So a DAF is not useful unless you itemize.
> The new law includes a provision, effective after 2025, allowing non-itemizers to take a charitable deduction of $1,000 for single filers and $2,000 for MFJ taxpayers. As has been the case in the past, gifts to donor-advised funds are not eligible. Unlike a previous (but smaller) similar provision, though, this law is not set to sunset.
https://www.racf.org/news/obbba/
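A back-of-the-envelope sketch of what that provision is worth to a non-itemizer (the 22% marginal rate is an assumed example, and this ignores other interactions in the tax code):

```python
def tax_savings(filing_status: str, gift: float, to_daf: bool,
                marginal_rate: float = 0.22) -> float:
    """Rough savings for a NON-itemizer under the post-2025 provision."""
    if to_daf:
        return 0.0  # DAF gifts are not eligible for this deduction
    cap = {"single": 1000, "mfj": 2000}[filing_status]
    return min(gift, cap) * marginal_rate

print(tax_savings("single", 1500, to_daf=False))  # 220.0: capped at $1,000
print(tax_savings("single", 1500, to_daf=True))   # 0.0: DAF not eligible
```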
This makes me realize that we really need the AI-written dashboard you are talking about.
This post has so many AI startup ideas embedded in it. The general feeling I get is that we really need an AI IDE (which is to an AI workflow what a regular IDE is to a coding workflow). All of the plans, AI task results, "short-term utility functions", etc. would require a really specialized UI to keep track of while minimizing friction and thus maximizing productivity.