fiso64 — LessWrong

Replying to [Linkpost] Solving Quantitative Reasoning Problems with Language Models

The model’s performance is still well below human performance

At this point I have to ask what exactly is meant by this. The bigger model beats average human performance on the national math exam in Poland. Sure, the people taking this exam are usually not adults, but for many of them it may be the peak of their mathematical abilities, so I wouldn't be surprised if it also beats average human performance in the US. It's all rather vague, though; looking at the MATH dataset paper, all I could find regarding human performance was the following:

Human-Level Performance. To provide a rough but informative comparison to human-level performance, we randomly sampled 20 problems from the

fiso64 · 4y

AGI Safety FAQ / all-dumb-questions-allowed thread

Here's a non-obvious way it could fail. I don't expect researchers to make this kind of mistake, but if this reasoning is correct, public access to such an AI is definitely not a good idea.

Also, consider a text predictor which is trying to roleplay as an unaligned superintelligence. This could be triggered even without the user's knowledge, for example by accidentally creating a conversation which the AI relates to a story about a rogue SI. In that case it may start to output manipulative replies, suggest blueprints for agentic AIs, and maybe even cause the user to run an obfuscated version of the program from the linked post. The...

Replying to Causal confusion as an argument against the scaling hypothesis

fiso64 · 4y

I disagree with your last point. Since we're agents, we can develop a much better intuitive understanding of what causality is, how it works and how to apply it during childhood. As babies, we start doing lots and lots of experiments. Those are not exactly randomized controlled trials, so they will not fully remove confounders, but they come close when we try to do something different in a relatively similar situation. Doing lots of gymnastics, dropping stuff, testing our parents' limits, etc. is what allows us to quickly learn causality.

LLMs, as they are currently trained, don't have this privilege of experimentation. Also, LLMs miss many potential confounders because they can only look at text, which is why I think that systems like Flamingo and Gato are important (even though the latter was a bit disappointing).

Replying to AI misalignment risk from GPT-like systems?

fiso64 · 4y

I posted a somewhat similar response to MSRayne, except that what you accidentally summon there is not an agent with a utility function, but something that tries to appear like one and nevertheless tricks you into making some big mistake.

Here, if I understand correctly, what you get is a genuine agent which works across prompts by having some internal value function that outputs a different value after each prompt, and acts accordingly. That doesn't seem too unlikely: nothing in the process of evolution necessarily had to make humans themselves optimizers, but it happened anyway because that is what performed best at the overall goal of reproduction. This AI will still probably have to somehow convince the people communicating with it to give it "true" agency independent of the user's inputs. That seems like an instrumental value in this case.

Replying to AI misalignment risk from GPT-like systems?

fiso64 · 4y

That makes a lot of sense, thanks for the link. It is not as dangerous a situation as a true agent AGI, since this failure mode involves a (relatively stupid) user error. I trust researchers not to make that mistake, but it seems like there is no way to safely make those systems available to the public.

One way this could become more plausible, which I thought of after reading the post, is accidentally making it think it's hostile. Perhaps you make a joking remark about paperclip maximizers, or maybe the chat history just happens to resemble the premise of a story about a hostile AGI in its dataset, and it thinks you're making a reference. Suddenly, it's trying to model an unaligned AGI. This system can then generate outputs which deceive you into doing something stupid, such as running the shell script described in the linked post, or creating a seemingly aligned AGI agent based on its suggestions.

AI misalignment risk from GPT-like systems?

fiso64

Right now, it seems that the most likely way we're gonna get an (intellectually) universal AI is by scaling models such as GPT. That is, models trained by self-supervised learning on massive piles of data, perhaps with an architecture similar to the transformer.

I do not see any risk due to misalignment here.

One failure mode I've seen discussed is that of manipulative answers, as seen in Predict-O-Matic. Maybe those AIs will learn that manipulating users into actions with low-entropy outcomes decreases the overall prediction error?
But why should a GPT-like model ever output manipulative answers? I am not denying the possibility that a GPT successor develops human-level intelligence. When it learns to...

Replying to [linkpost] The final AI benchmark: BIG-bench

fiso64 · 4y

Small remark: BIG-bench does include tasks on self-awareness, and I'd argue that self-awareness is a requirement for your definition "an AI that can do any cognitive tasks that humans can", as well as being generally important for problem solving. Being able to correctly answer the question "Can I do task X?" is evidence of self-awareness and is clearly beneficial.

Replying to The Reverse Basilisk

fiso64 · 4y

Again, there seems to be an assumption in your argument which I don't understand. Namely, that a society/superintelligence which is intelligent enough to create a convincing simulation for an AGI would necessarily possess the tools (or be intelligent enough) to assess its alignment without running it. Superintelligence does not imply omniscience.

Maybe showing the alignment of an AI without running it is vastly more difficult than creating a good simulation. This feels unlikely, but I genuinely do not see any reason why this can't be the case. If we create a simulation which is "correct" up to the nth digit of pi, beyond which the simpler explanation for the observed behavior becomes the...

Replying to The Reverse Basilisk

fiso64 · 4y

I don't follow. Why are you assuming that we could adequately evaluate the alignment of an AI system without running it if we were also able to create a simulation accurate enough to make the AI question what's real? This doesn't seem like it would be true necessarily.

Replying to A Parable Of Explainability

fiso64 · 4y

I think the word "explainable" isn't really the best fit. What we really mean is that the model has to be able to construct theories of the world, and prioritize the ones which are more compact. An AI that has simply memorized that a stone will fall if it's exactly 5, 5.37, 7.8 (etc.) meters above the ground is not explainable in that sense, whereas one that has discovered general relativity would be considered explainable.

And yeah, at some point, even maximally compressed theories become so complex that no human can hope to understand them. But explainability should be viewed as an intrinsic property of AI models rather than as something defined relative to humans.

Or, maybe...

Replying to It Looks Like You're Trying To Take Over The World

fiso64 · 4y

I agree it's irrelevant, but I've never actually seen these terms in the context of AI safety. It's more about how we should treat powerful AIs. Are we supposed to give them rights? It's a difficult question which requires us to rethink much of our moral code, and one which may shift it to the utilitarian side. While it's definitely not as important as AI safety, I can still see it causing upheavals in the future.