cubefox


Two reviewers who worried about the weight: Norman Chan, Marques Brownlee.

There are at least two related theories in which "all sentient beings matter" may be true.

  • Sentient beings can experience things like suffering, and suffering is bad. So sentient beings matter insofar as it is better that they experience more rather than less well-being. That's hedonic utilitarianism.

  • Sentient beings have conscious desires/preferences, and those matter. That would be preference utilitarianism.

The concepts of mattering or being good or bad (simpliciter) are intersubjective generalizations of the subjective concepts of mattering or being good for someone, where something matters (simpliciter) more, ceteris paribus, if it matters for more individuals.

I am aware of just three methods of modifying GPTs: in-context learning (prompting), supervised fine-tuning, and reinforcement fine-tuning. The achievable effects seem rather similar.

I did read your post. The fact that something like predicting text requires superhuman capabilities of some sort does not mean that the task itself will result in superhuman capabilities. That's the crucial point.

It is much harder to imitate human text than to produce it as a human, but that doesn't mean the imitated human is any more capable than the original.

An analogy. The fact that building fusion power plants is much harder than building fission power plants doesn't at all mean that the former are better. They could even be worse. There is a fundamental disconnect between the difficulty of a task and the usefulness of that task.

This approach doesn't seem to work with in-context learning. So it is unclear whether fine-tuning would be more successful.

Being able to perfectly imitate a chimpanzee would probably also require superhuman intelligence. But such a system would still only be able to imitate chimpanzees. Effectively, it would be much less intelligent than a human. The same goes for imitating human text: it's very hard, but the result wouldn't yield large capabilities.

Thank you, this has many interesting points. The takeoff question is the heart of predicting x-risk: with a soft takeoff, catastrophe seems unlikely; with a hard takeoff, likely.

One point though. "Foom" was intended to be a synonym for "intelligence explosion" and "hard takeoff". But not for "recursive self-improvement", although EY perceived the latter to be the main argument for the former, though not the only one. He wrote:

[Recursive self-improvement] is the biggest, most interesting, hardest-to-analyze, sharpest break-with-the-past contributing to the notion of a "hard takeoff" aka "AI go FOOM", but it's nowhere near being the only such factor.

One reason (EY mentions this) AlphaGo Zero was a big update is that it "foomed" (within its narrow domain) without any recursive self-improvement. It used capability amplification by training on synthetic data, self-play in this case.

This is significant because it is actual evidence that a hard takeoff doesn't require recursive self-improvement. RSI arguably needs a very strong AI to even get off the ground, namely one that is able to do ML research. The base level for capability amplification seems much lower. So the existence of AlphaGo Zero is a direct argument for foom.

As you mention, not many systems so far have successfully used something like capability amplification, but AlphaGo Zero is at least a proof of concept. It demonstrates a possibility that wasn't clear before.
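The amplification loop can be caricatured in a few lines. The toy task and all names below are my own illustration, not AlphaGo Zero's actual algorithm; the point is only that a policy can improve by training on its own amplified (best-of-k) outputs, with no external expert data and no self-modification:

```python
import random

def amplified_choice(policy, reward, k=4):
    # "Amplification": sample k candidate actions from the current policy
    # and keep the one that evaluation/search scores highest.
    samples = random.choices(range(len(policy)), weights=policy, k=k)
    return max(samples, key=reward)

def train_step(policy, target, lr=0.1):
    # Distill the amplified (stronger) choice back into the policy.
    new = [p * (1 - lr) for p in policy]
    new[target] += lr
    return new

random.seed(0)
reward = lambda a: a   # toy task: the higher-indexed action is the better move
policy = [1 / 5] * 5   # start from a uniform, unskilled policy
for _ in range(200):
    policy = train_step(policy, amplified_choice(policy, reward))

# The policy ends up concentrated on the best action, trained only on its
# own amplified outputs -- synthetic data, not human demonstrations.
```

The "search" here is trivially best-of-k sampling; in AlphaGo Zero's case it is tree search, but the loop structure (amplify, then distill) is the same.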

Yes, current LLMs work in the opposite way, relying on massive amounts of human generated text. They essentially perform imitation learning. But it is exactly this that limits their potential. It is not clear how they could ever develop strongly superhuman intelligence by being superhuman at predicting human text. Even a perfect chimpanzee imitator wouldn't be as intelligent as a human. Just optimizing for imitating human text seems to necessarily lead to diminishing returns. Training on synthetic data doesn't have this limit of imitation learning.

(Moreover, animals, including humans, probably also do not primarily learn by imitation. The most popular current theory, predictive coding, says the brain predicts experiences rather than text. Experiences are not synthetic data, but they aren't human-generated either. Future experiences are directly grounded in and provided by physical reality, while text has only a very indirect connection to the external world. Text is always human-mediated. It's plausible that a superhuman predictive coder would be superhumanly intelligent. It could evaluate complex subjunctive conditionals. Predictive coding could also lead to something like foom: for example, via a model which learns with 1000 different remote robot bodies in parallel instead of just one, as animals do.)

Yeah. In logic it is usually assumed that sentences are atomic when they do not contain logical connectives like "and". And formal (Montague-style) semantics makes this more precise, since logic may be hidden in linguistic form. But of course humans don't start out with language. We have some sort of mental activity, which we somehow synthesize into language, and similar thoughts/propositions can be expressed alternatively with an atomic or a complex sentence. So atomic sentences seem definable, but not abstract atomic propositions as objects of belief and desire.

A bit late, a related point. Let me start with probability theory. Probability theory is considerably more magic than logic, since only the latter is "extensional" or "compositional"; the former is not. Which just means: the truth values of A and B determine the truth value of a complex statement like A∧B ("A and B"). The same is not the case for probability theory: the probabilities of A and B do not determine the probability of A∧B, they only constrain it to a certain range of values.

For example, if A and B have probabilities 0.6 and 0.5 respectively, the probability of the conjunction A∧B is merely restricted to be somewhere between 0.1 and 0.5. This is why we can't do much "probabilistic deduction" as opposed to logical deduction. In propositional logic, all the truth values of complex statements are determined by the truth values of the atomic statements.
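This interval is the Fréchet bound for a conjunction; a minimal sketch (the function name is my own):

```python
def conjunction_bounds(p_a, p_b):
    """Tightest possible range for P(A and B), given only P(A) and P(B)."""
    lower = max(0.0, p_a + p_b - 1.0)  # the two events must overlap at least this much
    upper = min(p_a, p_b)              # the conjunction can't exceed either conjunct
    return lower, upper

# The example from the text: P(A) = 0.6, P(B) = 0.5
# gives bounds of 0.1 and 0.5 (up to floating-point rounding).
lo, hi = conjunction_bounds(0.6, 0.5)
```

Any value inside the interval is consistent with the axioms, which is exactly why the probabilities of A and B alone license so little deduction.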

In probability theory we need much more given information than in logic: we require a "probability distribution" over all statements, including the complex ones (whose number grows exponentially with the number of atomic statements), and these probabilities only have to not violate the rather permissive axioms of probability theory. In essence, probability theory requires most inference questions to be already settled in advance. By magic.
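To make the blow-up concrete, a small sketch (the uniform distribution is chosen purely for illustration): with n atomic statements there are 2^n truth assignments ("possible worlds"), and a full distribution must weight every one of them before any probability can be read off:

```python
from itertools import product

n = 4
# Every truth assignment to the n atomic statements.
worlds = list(product([False, True], repeat=n))
assert len(worlds) == 2 ** n  # already 16 worlds for just 4 atoms

# A full distribution assigns a weight to each world; uniform here for simplicity.
dist = {w: 1 / len(worlds) for w in worlds}

# P(A1 and A2) = total weight of the worlds where both atoms are true.
p_conj = sum(p for w, p in dist.items() if w[0] and w[1])
```

The axioms only demand that the weights are non-negative and sum to 1; they say nothing about where those weights come from.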

This already means a purely "Bayesian" AI can't be built, as magic doesn't exist, and some other algorithmic means is required to generate a probability distribution in the first place. After all, probability distributions are not directly given by observation.

(Though while logic allows for inference, it ultimately also fails as an AI solution, partly because purely deductive logical inference is not sufficient, or even very important, for intelligence, and partly also because real world inputs and outputs of an AI do not usually come in form of discrete propositional truth values. Nor as probabilities, for that matter.)

The point about probability theory generalizes to utility theory. Utility functions (utility "distributions") are not extensional either. Nor are preference orderings extensional in any sense. A preference order between atomic propositions implies hardly anything about preferences between complex propositions. We (as humans) can easily infer that someone who likes lasagna better than pizza, and lasagna better than spaghetti, probably also likes lasagna better than pizza AND spaghetti. Utility theory doesn't allow for such "inductive" inferences.
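A toy illustration of this non-extensionality (all utility numbers are mine, purely for illustration): two agents can agree on every atomic comparison yet disagree about a conjunction:

```python
# Both agents rank the atomic options identically: lasagna > pizza > spaghetti.
u1 = {"lasagna": 3.0, "pizza": 2.0, "spaghetti": 1.0, "pizza and spaghetti": 2.5}
u2 = {"lasagna": 3.0, "pizza": 2.0, "spaghetti": 1.0, "pizza and spaghetti": 4.0}

# Yet the atomic ranking leaves the conjunction unconstrained:
agent1_prefers_lasagna = u1["lasagna"] > u1["pizza and spaghetti"]  # agent 1: yes
agent2_prefers_lasagna = u2["lasagna"] > u2["pizza and spaghetti"]  # agent 2: no
```

Both assignments satisfy the same atomic preference order, so utility theory alone cannot settle which agent is "right" about the bundle; the humanly plausible inference is an inductive one the theory doesn't provide.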

But while these theories are not theories that solve the general problem of inductive algorithmic inference (i.e., artificial intelligence), they at least set, for us humans, some weak coherence constraints on rational sets of beliefs and desires. They are useful for the study of rationality, if not for AI.

This is an interesting result!

  • It seems to support LeCun's argument against autoregressive LLMs more than "simulator theory".

  • One potential weakness of your method is that you didn't use a base (foundation) model, but apparently the heavily fine-tuned gpt-3.5-turbo. The different system prompts probably can't completely negate the effect of this common fine-tuning. It would be interesting to see how the results hold up when you use code-davinci-002, the GPT-3.5 base model, which has no instruction tuning or RLHF applied. Though this model is no longer available via OpenAI, it can still be accessed on Microsoft Azure.
