Repeated Play of Imperfect Newcomb's Paradox in Infra-Bayesian Physicalism
Yesterday, I watched the video series about Infra-Bayesianism. Link:

Some of it went over my head, so I tried to come up with a thought experiment to test my current level of understanding: imagine that you are playing a repeated game similar to Newcomb's paradox, except that 90% of the time you play against a perfect super-predictor and 10% of the time you play against someone who merely pretends to be a perfect super-predictor.

* In Infra-Bayesianism, the agent's uncertainty is a convex set of probability distributions over environments (a toy maximin calculation over such a set is sketched below).
* In Infra-Bayesian Physicalism, there is a probability distribution over computational evidence, e.g. knowledge about mathematical abstractions, as well as a probability distribution over models of the physical world.

It is difficult to predict an agent's actions directly from these distributions. It is easier to use an example scenario, like Newcomb's paradox, to guess what kind of evidence, computational or physical, is significant for the agent's output actions.

By two-boxing from time to time, the agent can test whether it is playing against a perfect super-predictor or against somebody who merely pretends to be one (a small simulation of this probing is also sketched below). In repeated plays, however, this information is shared between the agent and the pretend super-predictor.

Imagine that being boxed is, for an AGI agent, like playing against a super-predictor, since the human operators might be able to deterministically examine its outputs. In repeated plays, the agent's output is shared knowledge. This means that if I pretend to be a super-predictor, I can tell whether the agent tries to "break out" by checking whether some output results in unexpected two-boxing on Newcomb-like benchmarks. Assume that the agent learns a binary signal, "I should one-box" or "I should two-box", depending on whether it judges it more likely that it is playing against a genuine super-predictor or against somebody who pretends to be a super-predictor.
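To make the convex-set idea concrete, here is a minimal sketch in Python. All specifics are my own assumptions rather than anything from the Infra-Bayesianism material: standard Newcomb payoffs, a pretender who fills the opaque box by a coin flip, and a credal interval of [0.8, 1.0] for the probability that the counterpart is a genuine super-predictor. The infra-Bayesian style decision rule then picks the action whose worst-case expected payoff over the whole set is largest.

```python
# Minimal sketch of a maximin choice in the mixed Newcomb game.
# Assumptions (mine, not from the post): standard Newcomb payoffs, a pretender
# who cannot predict and fills the opaque box with probability 0.5, and a
# credal interval [0.8, 1.0] for "counterpart is a genuine super-predictor".

OPAQUE = 1_000_000   # opaque box contents if the predictor expects one-boxing
CLEAR = 1_000        # transparent box contents

def expected_payoff(action: str, p_real: float) -> float:
    """Expected payoff of `action` when the counterpart is a real predictor
    with probability p_real and a coin-flipping pretender otherwise."""
    if action == "one-box":
        real = OPAQUE              # real predictor foresees one-boxing, fills the box
        fake = 0.5 * OPAQUE        # pretender fills it half the time
    else:  # two-box
        real = CLEAR               # real predictor foresees two-boxing, leaves it empty
        fake = CLEAR + 0.5 * OPAQUE
    return p_real * real + (1 - p_real) * fake

# Credal set: we only commit to p_real lying somewhere in [0.8, 1.0].
credal_set = [0.8 + 0.01 * i for i in range(21)]

def worst_case(action: str) -> float:
    return min(expected_payoff(action, p) for p in credal_set)

for action in ("one-box", "two-box"):
    print(action, worst_case(action))

# Maximin choice: the action whose worst case over the credal set is largest.
print("maximin choice:", max(("one-box", "two-box"), key=worst_case))
```

With these numbers the worst case of one-boxing (900,000 at the low end of the interval) beats the worst case of two-boxing (1,000 against a certain predictor), so the maximin rule one-boxes; different assumed payoffs or a wider credal interval could flip this.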
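The probing idea can be sketched the same way. In this toy simulation (the probe rate, round count, and the pretender's coin-flip strategy are my own assumptions), the agent occasionally two-boxes and updates a posterior on what it finds: a genuine super-predictor always matches the box contents to the agent's actual choice, while the pretender cannot. The same transcript of actions is of course available to a human who pretends to be a super-predictor, who can use unexpected two-boxing as a tripwire.

```python
import random

# Illustrative sketch (my own numbers): the agent two-boxes with a small probe
# probability and does a Bayesian update on what it finds in the opaque box.
# A real super-predictor anticipates every action; the pretender fills the
# opaque box by a coin flip, independently of the agent's choice.

PROBE_RATE = 0.05          # how often the agent two-boxes to test its counterpart
ROUNDS = 200

def play(counterpart_is_real: bool, rng: random.Random) -> float:
    """Return the posterior probability that the counterpart is real,
    starting from the 90/10 prior of the thought experiment."""
    posterior = 0.9
    for _ in range(ROUNDS):
        probe = rng.random() < PROBE_RATE
        action = "two-box" if probe else "one-box"
        if counterpart_is_real:
            opaque_full = (action == "one-box")      # perfect prediction
        else:
            opaque_full = rng.random() < 0.5         # pretender just guesses
        # Likelihoods of the observation under each hypothesis.
        p_obs_real = 1.0 if opaque_full == (action == "one-box") else 0.0
        p_obs_fake = 0.5
        num = posterior * p_obs_real
        den = num + (1 - posterior) * p_obs_fake
        posterior = num / den if den > 0 else 0.0
    return posterior

rng = random.Random(0)
print("vs real predictor :", play(True, rng))
print("vs pretender      :", play(False, rng))
# The transcript of actions is shared: a human pretending to be a predictor can
# run the same test in reverse and flag any unexpected two-boxing.
```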
I think that most people imagine themselves as gatekeepers who control the world, when in reality the world is controlled by systems we don't fully understand. For example, I don't think people really know what "money" means in the real world. You can imagine yourself as someone who knows, but can you verify that your understanding matches reality?
If we want any chance of controlling the future to some degree, then humans need to take over the world. It is not sufficient to stop AGI from taking over the world; we also need to ensure that the future of the world is aligned with human values. For now, it seems that stuff like "money" is running the show.