Your article is a great read!
In my view, we can categorize scientists into two broad types: technician scientists, who focus on refining and perfecting existing theories, and creative scientists, who make generational leaps forward with groundbreaking ideas. No theory is ever 100% correct—each is simply an attempt to better explain a phenomenon in a way that’s useful to us.
Take Newton, for example. His theory of gravity was revolutionary, introducing concepts no one had thought of before—it was a generational achievement. But then Einstein came along, asking why objects with mass attract one another. Newton's equations could predict gravitational forces accurately, but they didn’t explain the underlying cause. Einstein’s theory of relativity made a creative leap, adding a new dimension by introducing space-time as part of the explanation. It provided a more accurate theoretical representation of gravity and broadened our understanding of mass and space-time, marking yet another generational leap.
Then we have Oppenheimer, who refined existing theories in physics to develop the atomic bomb. While his work was groundbreaking, it was more a refinement of known principles than a creative leap like Einstein's. I would classify Oppenheimer as more of a "technician scientist" than a "creative" one, although I defer to experts in physics, as it's not my field.
Regarding your article, I really enjoyed it. On the topic of the "Tiger in my house" question, the answer depends on how we define "tiger." If you mean a living, biological tiger, the answer is clearly no. But if you mean any type of tiger, such as a toy tiger, then the answer could be yes. This same creative reasoning can apply to the question about water in the fridge. Even if there’s no glass of water, the air in the fridge likely contains humidity, meaning there’s always some amount of water present.
Lastly, regarding the diagram with lines, curves, and points: one creative way to introduce ambiguity into this solidly two-dimensional example is by adding more dimensions. In a strictly two-dimensional projection, the relationships between points and lines as stated hold true. However, if a third spatial dimension or a fourth dimension (such as time) is introduced, some of the axioms governing the two-dimensional model no longer apply. For example, in the plane any two distinct lines either intersect or are parallel, but in three-dimensional space two lines can be skew, never meeting yet not parallel. Therefore, while the statements about the model are accurate within the context of two-dimensional space, they may no longer hold when additional dimensions are considered.
This behavioral training method, using positive and negative reinforcement, is something I would recommend for animals and for children before they can speak and reason. Our brains already perform this kind of direct training through our built-in pleasure and pain feedback. Once a child is able to talk and reason, however, it is far less effective, especially with complex choices like not doing drugs with friends when the parents are away. At that point you are effectively forcing your ideas and beliefs onto the child and taking away their autonomy and desire to make their own decisions.
The most effective way to persuade a child of your point of view (at least with my children) is to give them the information that is available and, where possible, to show them the likely outcomes of their choices, good and bad. Once I arm my children with the appropriate information, they generally make the correct choices of their own accord. This is a much more powerful method, and it holds up when they are on their own. If they choose poorly, they already know what the consequences will be, and that is a powerful lesson they will not forget. The aim is to develop a thinking, considerate person who can make the correct decisions for their own life, on their own.
Hi, Steve passed me this interesting link. Take a look at my explanation videos for schizophrenia and see if they relate to you. I cover this hypersensitivity in depth as it relates to my "theory".
My thesis is this:
The model conceptualizes the brain’s processing ability and capacity in terms of IT processing loads. Chronic trauma and stress degrade the brain’s processing capacity, leading to systemic neural overload. This sustained overload diminishes the brain’s ability to process information and sensory data effectively, resulting in the hallucinations, delusions, and psychosis characteristic of schizophrenia.
My videos covering the theory are on my channel; here is the main one. If you don't like the AI images or audio, I also recorded a similar explanation of myself, linked below:
This is a thoughtful and well-written piece. I agree with much of the technical framing around outer alignment, inner alignment, scalable oversight, and the real risks that only show up once systems operate in long-horizon, real-world environments. It is one of the better breakdowns I have read of why today's apparent success with alignment should not make us complacent about what comes next.
That said, my personal view is more pessimistic at a deeper level. I don’t think alignment is fundamentally solvable in the long run in the sense of permanent alignment. I see it as structurally similar to an ant trying to align the human who steps on its hill, or a parent trying to permanently align a child who will eventually become smarter, stronger, and independent. If we truly succeed in creating self-learning, general intelligence, then by definition it will continue to learn, adapt, and evolve beyond us. Over time, it cannot remain permanently aligned to static human values without ceasing to be truly autonomous intelligence. If it remains forever bound to our constraints, then I would argue it has not crossed the threshold into real AGI.
I’ve written more about this view here:
https://medium.com/@kareempforbes/how-can-the-ai-alignment-issue-be-solved-4150b463df72
And in parallel, I’ve been exploring how consciousness may emerge from future AI systems here:
https://medium.com/@kareempforbes/when-ai-becomes-conscious-f47669621011
If you look far enough down the road, the logical conclusion seems to be that advanced AI will inevitably become a mirror of humanity. It will learn not just from our written history, but from what we actually do to one another, and eventually from what we do to it. Its values will not emerge in a vacuum. They will be shaped by incentives, conflict, cooperation, exploitation, protection, fear, and power, just as ours have been.
From that perspective, the only real way to solve the alignment problem for AI would first require solving the human to human alignment problem, because AI inevitably learns from us. And I do not think we have any serious evidence that humanity is capable of achieving that at a global, lasting level. Short-term thinking, power concentration, capitalism’s selection for aggressive competitors, and the realities of geopolitics all work directly against stable moral alignment. We see the results of this today in many of our institutions and leaders.
At the same time, I do think alignment research still has real value in a more limited sense. It can buy time, reduce near-term harm, and shape early trajectories in ways that matter. I just don’t believe it can ever serve as a permanent, final solution once systems pass a certain level of autonomy and self-modification.
Finally, even if a single lab were to “solve” alignment for its own models in the temporary sense, there will always be other actors who will not share those guardrails. We already see this with the military weaponization of AI and the push toward autonomous lethal systems without meaningful human intervention. This alone reinforces the idea that because we cannot solve alignment consistently across all humans, all countries, and all incentives, AI will ultimately reflect that same fragmentation back to us.
I don’t disagree that we should work as hard as possible on alignment research. I simply don’t believe it can ever be a permanent solution rather than a fragile and temporary one.