
It took me a good while reading this to figure out whether it was a deconstruction of tabooing words. I would have been less unsure if the post didn't keep replacing terms with ones that are both no less charged and no more descriptive of the underlying system, and then drawing conclusions from the resulting terms' aesthetics.

With regard to Yudkowsky's takes, the key thing to keep in mind is that Yudkowsky started down his path by reasoning backwards from properties an ASI would have, not by reasoning forward from a particular implementation strategy. The key reason to be concerned that outer optimization doesn't define inner optimization isn't a specific hypothesis about whether some specific neural-network strategy will develop inner optimizers; it's that ASI will by necessity involve active optimization over things, and we want our alignment techniques to have at least some reason to work in that regime at all.

There is no ‘the final token’ for weights not at the final layer.

Because that is where all the gradients flow from, and why the dog wags the tail.

Aggregations of things need not be of the same kind as their constituent things? This is a lot like calling an LLM an activation optimizer. While strictly in some sense true of the pieces that make up the training regime, it's also kind of a wild way to talk about things in the context of ascribing motivation to the resulting network.

I think maybe you're intending ‘next token prediction’ to mean something more like ‘represents the data distribution, as opposed to some metric on the output’, but if you are this seems like a rather unclear way of stating it.

You're at token i in a non-final layer. Which token's output are you optimizing for? i+1?

By construction a decoder-only transformer is agnostic over what future token it should be informative to within the context limit, except in the sense that it doesn't need to represent detail that will be more cheaply available from future tokens.

As a transformer is also unrolled in the context dimension, the architecture itself is effectively required to be generic both in what information it gathers and in where that information is used. Bias towards next-token prediction is not so much a consequence of reward in isolation as of competitive advantage: at position i, the network has an advantage in predicting token i+1 over the network at earlier positions, by having more recent tokens, and over the network at later positions, by virtue of still needing to predict token i+1. However, if a token is more predictive of some abstract future token than of the next token specifically (say, a name that might be referenced later), one would expect the dominant learnt effect to be non-myopic optimization for that later use, in some timestamp-invariant way.

> If they appear to care about predicting future tokens, (which they do because they are not myopic and they are imitating agents who do care about future states which will be encoded into future tokens), it is solely as a way to improve the next-token prediction.

I think you're just fundamentally misunderstanding the backwards pass in an autoregressive transformer here. Only a very tiny portion of the model is exclusively trained on next token prediction. Most of the model is trained on what might be called instead, say, conditioned future informativity.
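To make the gradient-flow point concrete, here is a minimal, self-contained sketch; a toy causal mean-pooling layer stands in for attention, and all shapes and names are illustrative assumptions, not any real model. A finite-difference check shows that perturbing the hidden state at position i moves the losses at every position j ≥ i, so the weights processing token i are trained by all later positions' losses, not only by the prediction of token i+1.

```python
import math
import random

random.seed(0)
T, d = 4, 6  # sequence length, hidden size (arbitrary toy values)

# Toy hidden states entering a causal layer, plus a shared output transform.
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

def per_position_losses(x):
    """One 'loss' per position; position j can only see positions 0..j."""
    losses = []
    for j in range(T):
        # Causal mean pooling over positions 0..j (attention stand-in).
        mixed = [sum(x[i][k] for i in range(j + 1)) / (j + 1) for k in range(d)]
        out = [math.tanh(sum(mixed[k] * W[k][m] for k in range(d)))
               for m in range(d)]
        losses.append(sum(v * v for v in out))
    return losses

base = per_position_losses(x)

def affected(i, eps=1e-5):
    """Which positions' losses change when the state at position i changes?"""
    xp = [row[:] for row in x]
    xp[i] = [v + eps for v in xp[i]]
    return [abs(a - b) > 1e-12 for a, b in zip(per_position_losses(xp), base)]

print(affected(0))      # position 0's state feeds every position's loss
print(affected(T - 1))  # the final position's state only affects its own loss
```

Only the representation at the final position is trained exclusively on the next token; everything earlier is shaped by every loss downstream of it.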

I greatly appreciate the effort in this reply, but I think it's increasingly unclear to me how to make efficient progress on our disagreements, so I'm going to hop.

If you say “Indeed it's provable that you can't have a faster algorithm than those O(n^3) and O(n^4) approximations which cover all relevant edge cases accurately”, I am quite likely to go on a digression trying to figure out what proof you're pointing at and why you think it's a fundamental barrier. Per a couple of your comments, it now seems you don't believe it's a fundamental barrier, but at the same time it doesn't feel like any position has moved, so I'm left rather foggy about where progress has been made.

I think it's very useful that you say

> I'm not saying that AI can't develop useful heuristic approximations for the simulation of gemstone-based nano-mechanical machinery operating in ultra-high vacuum. I'm saying that it can't do so as a one-shot inference without any new experimental work.

since this seems like a narrower place to scope our conversation. I read this to mean:

  1. You don't know of any in principle barrier to solving this problem,
  2. You believe the solution is underconstrained by available evidence.

I find the second point hard to believe, and I don't see that you have provided evidence for it anywhere.

As a maybe-relevant aside to that, wrt.

> You're saying that AI could take the garbage and by mere application of thought turn it into something useful. That's not in line with the actual history of the development of useful AI outputs.

I think you're talking of ‘mere application of thought’ like it's not the distinguishing feature humanity has. I don't care what's ‘in line with the actual history’ of AI, I care what a literal superintelligence could do, and this includes a bunch of possibilities like:

  • Making inhumanly close observation of all existing data,
  • Noticing new, inhumanly-complex regularities in said data,
  • Proving new simplifying regularities from theory,
  • Inventing new algorithms for heuristic simulation,
  • Finding restricted domains where easier regularities hold,
  • Bifurcating problem space and operating over each plausible set,
  • Sending an interesting email to a research lab to get choice high-ROI data.

We can ignore the last one for this conversation. I still don't understand why the others are deemed unreasonable ways of making progress on this task.

I appreciated the comments on time complexity, but I'm skipping them because I don't expect at this point that they lie at the crux.

Thanks, I appreciate the attempt to clarify. I do though think there's some fundamental disagreement about what we're arguing over here that's making it less productive than it could be. For example,

> The fact that this has been an extremely active area of research for over 80 years with massive real-world implications, and we're no closer to finding such a simplified heuristic.

I think both:

  1. Lack of human progress doesn't necessarily mean the problem is intrinsically unsolvable by advanced AI. Humans often take a bunch of time before proving things.
  2. It seems not at all the case that algorithmic progress isn't happening, so it's hardly a given that we're no closer to a solution unless you first circularly assume that there's no solution to arrive at.

If you're starting out with an argument that we're not there yet, this makes me think the real issue is some fundamental disagreement about how we should reason about ASI, rather than that your belief is backed by a justification that would convince me if only I had succeeded at eliciting it. Claiming that a thing is hard is at most a reason not to rule out that it's impossible; it's not, on its own, a reason to believe that it is impossible.

With regard to complexity,

  • I failed to understand the specific difference with protein folding. Protein folding is NP-hard, which is significantly harder than O(n³).
  • I failed to find the source for the claim that O(n³) or O(n⁴) are optimal. Actually, I'm pretty confused about how this could even be a coherent claim: surely if the O(n³) algorithm is widely useful, then the O(n⁴) optimality proof can't be that strong a bound on practical usefulness? So why would this not be true of the O(n³) proof as well?
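For what it's worth, the gap between those two polynomial orders is small compared to the gap that makes NP-hardness scary. A toy back-of-the-envelope calculation, assuming nothing about any specific simulation algorithm:

```python
# Cost-growth factor when the problem size doubles. An O(n^3) -> O(n^4)
# step is a fixed factor per doubling (8x vs 16x), while an exponential
# 2^n cost (a stand-in for brute force on an NP-hard problem) grows by
# a factor that itself explodes with n.
def doubling_ratio(cost, n):
    return cost(2 * n) / cost(n)

cubic = lambda n: n ** 3
quartic = lambda n: n ** 4
brute = lambda n: 2.0 ** n

for n in (10, 20, 40):
    print(n, doubling_ratio(cubic, n),    # always 8.0
          doubling_ratio(quartic, n),     # always 16.0
          doubling_ratio(brute, n))       # 2**n: 1024, ~1e6, ~1e12
```

So in asymptotic terms, calling NP-hard protein folding tractable-in-practice while calling an O(n³)-to-O(n⁴) problem intractable needs some further argument.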

It's maybe true that protein folding is easier to computationally verify solutions to, but first, can you prove this, and second, on what basis are you claiming that existing knowledge is necessarily insufficient to develop better heuristics than the ones we already have? The claim doesn't seem to go through for me.

> It's magical thinking to assume that an AI will just one-shot this into existence.

Please note that I've not been making the claim that ASI could necessarily solve this problem. I have been making the claim that the arguments in this post don't usefully support the claim that it can't. It is true that largely on priors I expect it should be able to, but my priors also aren't particularly useful ones to this debate and I have tried to avoid making comments that are dependent on them.

And what reason do you have for thinking it can't be usefully approximated in some sufficiently productive domain, that wouldn't also invalidly apply to protein folding? I don't think it's useful to just restate that there exist reasons you know of; I'm aiming to actually elicit those arguments here.

Given Claude is not particularly censored in this regard (in the sense of refusing to discuss the subject), I expect the jailbreak here to only serve as priming.

Well yes, nobody thinks that existing techniques suffice to build de-novo self-replicating nano machines, but that means it's not very informative to comment on the fallibility of this or that package or the time complexity of some currently known best approach without grounding in the necessity of that approach.

One has to argue instead based on the fundamental underlying shape of the problem, and saying accurate simulation is O(n⁷) is not particularly more informative to that than saying accurate protein folding is NP-hard. I think if the claim is that you can't make directionally informative predictions via simulation for things meaningfully larger than helium, then the argument is being taken beyond where it can be validly applied. If the claim is not that, it would be good to hear it clearly stated.
