Insub — LessWrong

Mechanisms too simple for humans to design

Thus, the design information all has to be in the DNA

The OP mentioned non-DNA sources of information briefly, but I still feel like they're not being given enough weight.

To fully define, e.g., a human, you need to specify:

The DNA
A full specification of the egg where the DNA will start its life
A full specification of the womb in which the egg will grow into a human

If you gave a piece of DNA to an alien and didn't tell them how to interpret it, then they'd have no way of building a human. You'd need to give them a whole lot of other information too.

Even looking at different DNA for different organisms, each organism's DNA expects to be interpreted differently (as opposed to source code, which mostly intends to be interpreted by the same OS/hardware as other source code). If you put a lizard's DNA into a human's egg and womb, I'm guessing that would not successfully build a lizard.

So I guess my question is: to what extent should the complexity of the interpreter be included in the complexity of the thing-being-interpreted? In one sense I feel like Word's code does fully specify Word amongst all other possible software, but in another sense (including the interpreter) I feel like it does not.
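Here is a toy illustration of the point. It is not a model of biology. The same "program" string is handed to two hypothetical interpreters (`interpreter_a`, `interpreter_b` and their lookup tables are made up for this sketch). The program is identical and equally short in both cases, but what it specifies depends entirely on which interpreter reads it.

```python
# Toy sketch: the same program yields different outputs under different interpreters.
# Both interpreters and their lookup tables are hypothetical, for illustration only.

HUMAN_TABLE = {"A": "arm", "B": "leg", "C": "eye"}
LIZARD_TABLE = {"A": "foreleg", "B": "tail", "C": "scale"}


def interpret(program: str, table: dict[str, str]) -> str:
    """Map each symbol in the program to a part name using the given table."""
    return " ".join(table[symbol] for symbol in program)


def interpreter_a(program: str) -> str:
    return interpret(program, HUMAN_TABLE)


def interpreter_b(program: str) -> str:
    return interpret(program, LIZARD_TABLE)


program = "ABC"
print(interpreter_a(program))  # arm leg eye
print(interpreter_b(program))  # foreleg tail scale
```

Kolmogorov complexity handles roughly this issue by fixing a reference universal machine. Description length is always measured relative to some interpreter. Switching to a different universal machine changes it by at most a constant that depends on the two machines but not on the string being described.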

Danger, AI Scientist, Danger

Insub · 1y

Transcribed from the screenshot "The AI Scientist Bloopers" in the post

Danger, AI Scientist, Danger

Insub · 1y

Regarding spawning instances of itself, the AI said:

This will ensure the next experiment is automatically started after the current one completes

And regarding increasing the timeout, it said:

Run 2 timed out after 7200 seconds
To address the timeout issue, we need to modify experiment.py to:
Increase the timeout limit or add a mechanism to handle timeouts

I've seen junior engineers do silly things to fix failing unit tests, like increasing a timeout or just changing what the test is checking without any justification. I generally attribute these kinds of things to misunderstanding rather than deception - the junior engineer might misunderstand the goal as "get the test to show a green checkmark" when really the goal was "prove that the code is correct, using unit tests as one tool for doing so".

The way the AI was talking about its changes here, it feels much more like a junior engineer who didn't really understand the task and constraints than like someone who is being intentionally deceptive.

The above quotes don't feel like the AI intentionally "creating new instances of itself" or "seeking resources" to me. It feels like someone who only shallowly understands the task just doing the first thing that comes to mind in order to solve the problem that's immediately in front of them.

That being said, in some sense it doesn't really matter why the AI chooses to do something like break out of its constraints. Whether it does so because it fully understands the situation or because it just naively (but correctly) sees a human-made barrier as "something standing between me and the green checkmark", I suppose the end result is still misaligned behavior.

So by and large I still agree this is concerning behavior, though I don't feel like it's as much of a knock-down "this is instrumental convergence in the real world" as this post seems to make out.

The Parable Of The Fallen Pendulum - Part 1

Insub · 2y

I would say:

A theory always takes the following form: "given [premises], I expect to observe [outcomes]". The only way to say that an experiment has falsified a theory is to correctly observe/set up [premises] but then not observe [outcomes].

If an experiment does not correctly set up [premises], then that experiment is invalid for falsifying or supporting the theory. The experiment gives no (or nearly no) Bayesian evidence either way.
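This is a minimal sketch of the Bayesian point, using Bayes' rule for a binary hypothesis. The probabilities are made-up illustrative numbers, not measurements. When the premises hold, the theory confidently predicts the outcome, so failing to observe it is unlikely if the theory is true. When the premises are broken, the theory makes no specific prediction. In that case the observation is about as likely whether or not the theory is true, so the likelihood ratio is about 1 and the posterior stays at the prior.

```python
def posterior(prior: float, p_obs_if_true: float, p_obs_if_false: float) -> float:
    """Bayes' rule: P(theory | observation) for a theory that is either true or false."""
    weighted_true = prior * p_obs_if_true
    weighted_false = (1 - prior) * p_obs_if_false
    return weighted_true / (weighted_true + weighted_false)


prior = 0.9  # illustrative prior confidence in the theory

# Premises held, predicted outcome NOT observed: unlikely if the theory is true.
falsified = posterior(prior, p_obs_if_true=0.01, p_obs_if_false=0.5)

# Premises violated (e.g. the pivot moved): the theory predicts nothing specific,
# so the observation is equally likely either way (likelihood ratio = 1).
invalid_setup = posterior(prior, p_obs_if_true=0.5, p_obs_if_false=0.5)

print(f"premises held, outcome missing: {falsified:.3f}")
print(f"premises violated:              {invalid_setup:.3f}")
```

With these numbers, the first case drops confidence from 0.9 to about 0.15. The second case leaves it at 0.9.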

In this case, [premises] are the assumptions we made in determining the theoretical pendulum period: things like "the string length doesn't change", "the pivot point doesn't move", "gravity is constant", "the pendulum does not undergo any collisions", etc. The fact that (e.g.) the pivot point moved during the experiment invalidates the premises, and therefore the experiment does not give any Bayesian evidence for or against our theory.
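For reference, assume the theory in question is the standard small-angle result for an ideal simple pendulum. That is an assumption about which derivation the parable has in mind. The predicted period is then:

```latex
T = 2\pi \sqrt{\frac{L}{g}}
```

Here $L$ is the length from the pivot to the bob and $g$ is the gravitational acceleration. Several of the premises appear directly in this formula. $L$ is fixed (the string length doesn't change), the pivot is treated as a stationary point, and $g$ is constant. The derivation further assumes uninterrupted swinging (no collisions), small swing angles, and a point mass on a massless string.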

Then the students could say:

"But you didn't tell us that the pivot point couldn't move when we were doing the derivation! You could just be making up new 'necessary premises' for your theory every time it gets falsified!"

In which case I'm not 100% sure what I'd say. Obviously we could have listed out more assumptions than we did, but where do you stop? "the universe will not explode during the experiment"...?

Why I take short timelines seriously

Insub · 2y

By "reliable" I mean it in the same way as we think of it for self-driving cars. A self-driving car that is great 99% of the time and fatally crashes 1% of the time isn't really "high skill and unreliable" - part of having "skill" in driving is being reliable.

In the same way, I'm not sure I would want to employ an AI software engineer that was great 99% of the time, but 1% of the time had totally weird, inexplicable failure modes that you'd never see with a human. It would just be stressful to supervise, to limit its potential harmful impact on the company, etc. So it seems to me that AIs won't be given control of lots of things, and therefore won't be transformative, until that reliability threshold is met.

Why I take short timelines seriously

Insub · 2y

Two possibilities have most of the "no AGI in 10 years" probability mass for me:

The next gen of AI really starts to scare people, regulation takes off, and AI goes the way of nuclear reactors
Transformer-style AI goes the way of self-driving cars and turns out to be really hard to get from 99% reliable to the necessary 99.9999% that you need for actual productive work
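Some back-of-the-envelope arithmetic shows why the gap between 99% and 99.9999% matters. As a simplification, assume each task succeeds independently with the same per-task reliability r. The chance of at least one failure across n tasks is then 1 - r^n. The 100-task workload below is an arbitrary illustrative figure.

```python
def p_any_failure(reliability: float, n_tasks: int) -> float:
    """Probability of at least one failure across n independent tasks,
    each succeeding with probability `reliability`."""
    return 1 - reliability ** n_tasks


N_TASKS = 100  # hypothetical workload, chosen only for illustration

for r in (0.99, 0.999999):
    print(f"reliability {r}: P(at least one failure in {N_TASKS} tasks) = "
          f"{p_any_failure(r, N_TASKS):.6f}")
```

At 99% this comes to roughly a 63% chance of at least one failure. At 99.9999% it is roughly 0.01%. Real failures need not be independent, so treat this only as an illustration of how per-task error rates compound.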

Saying the quiet part out loud: trading off x-risk for personal immortality

Insub · 2y

Well sure, but the interesting question is the minimum value of P at which you'd still push.

Saying the quiet part out loud: trading off x-risk for personal immortality

Insub · 2y

I also agree with the statement. I'm guessing most people who haven't been sold on longtermism would too.

When people say things like "even a 1% chance of existential risk is unacceptable", they are clearly valuing the long term future of humanity a lot more than they are valuing the individual people alive right now (assuming that the 99% in that scenario above is AGI going well & bringing huge benefits).

Related question: You can push a button that will, with probability P, cure aging and make all current humans immortal. But with probability 1-P, all humans die. How high does P have to be before you push? I suspect that answers to this question are highly correlated with AI caution/accelerationism.
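One way to make the question concrete uses a simple expected-utility framing. That framing is an assumption, not something the question specifies. Assign hypothetical utilities to the status quo, to the immortality outcome, and to extinction. You push only if P * u_immortal + (1 - P) * u_extinct exceeds u_status_quo. Solving for P gives the break-even threshold (u_status_quo - u_extinct) / (u_immortal - u_extinct).

```python
def break_even_p(u_status_quo: float, u_immortal: float, u_extinct: float) -> float:
    """Threshold P above which pushing the button beats not pushing, in expected utility.

    Push iff P * u_immortal + (1 - P) * u_extinct > u_status_quo,
    i.e. P > (u_status_quo - u_extinct) / (u_immortal - u_extinct).
    A result >= 1 means pushing is never strictly better.
    """
    if u_immortal <= u_extinct:
        raise ValueError("u_immortal must be greater than u_extinct")
    return (u_status_quo - u_extinct) / (u_immortal - u_extinct)


# Hypothetical utilities; only their relative sizes matter here.
print(break_even_p(u_status_quo=0, u_immortal=1, u_extinct=-1))    # 0.5
print(break_even_p(u_status_quo=0, u_immortal=1, u_extinct=-100))  # about 0.99
```

The second call treats extinction as far worse than the status quo, for example because it also forecloses the long-term future. That pushes the threshold close to 1. This is one way to see why answers might track how much weight someone puts on the long-term future.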

My AI Predictions 2023 - 2026

Insub · 2y

Not sure I understand; if model runs generate value for the creator company, surely they'd also create value that lots of customers would be willing to pay for. If every model run generates value, and there's ability to scale, then why not maximize revenue by maximizing the number of people using the model? The creator company can just charge the customers, no? Sure, competitors can use it too, but does that really override losing an enormous market of customers?
