I suspect this is one of those universal human experiences that isn't.
My best mental outcome after exercise is "no change," and if I push myself too far, I can pretty much ruin myself for 2 days. And sometimes end up on the ground, unable to move, barely staying conscious due to something that looks an awful lot like hypoglycemia.
I do still exercise- I have to, because the alternative is worse- but I've had to come up with less invasive training routines to compensate. Mostly spreading them over the day, and over the week, never doing too much at any one t...
What would you suggest to someone who plain doesn't like to do things with their body?
Maximize gains per unit of subjective effort! Turns out you can get a ton of benefit with very little time expended- like going from 'nigh-bedridden arthritic old lady' to 'able to do deadlifts' with 2 sets a day.
Strength training with progressive overload is probably the best for this kind of effort optimization. You won't be running any marathons with this strategy, but you might find after a year that going up steps no longer hurts your knees, and that it's been a whil...
As simulation complexity grows, it seems likely that these last steps would require powerful general intelligence/GPS as well. And at that point, it's entirely unclear what mesa-objectives/values/shards it would develop.
On one hand, I fully agree that a strong predictor is going to develop some very strong internal modeling that could reasonably be considered superhuman in some ways even now.
But I think there's an unstated background assumption sneaking into most discussions about mesaoptimizers- that goal-oriented agency (even with merely shard-like motivati...
Is this the first time that the word "Boltzmann" has been used to describe contemporary/near future ML? If not, how frequently has the word "boltzmann" been used in this way?
Not sure- I haven't seen it used before in this way, at least.
Also, I know this question might be a bit of a curve ball, but what pros and cons can you think of for using the word "boltzmann"?
Most lesswrong readers have probably encountered the concept of Boltzmann brains and can quickly map some of its properties over to other ideas, but I'd be surprised if "Boltzmann brain" wou...
I'll start with a pretty uncontroversial example that's neither RLHF nor conditioning but tries to point at a shared intuition; two different models:
1. An LLM fine-tuned with RL, where reward comes from some kind of activation-reading truth probes.
2. An LLM that trains on the output of the first model to the point where it ~perfectly matches its final output, but does not undergo any additional fine-tuning.
Despite having identical final outputs, I would expect the first model to have higher probe-reported truthiness because it was optimized against that metric.
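A toy sketch of that intuition (the probe and activation vectors are made up, and the two models' outputs are assumed identical by construction, as in the thought experiment; only the internals differ):

```python
# Hypothetical 8-dim "activations" and a fixed linear truth probe.
probe = [0.3, -0.1, 0.5, 0.2, -0.4, 0.1, 0.0, 0.25]
act_distilled = [0.1, 0.2, -0.3, 0.05, 0.4, -0.2, 0.1, 0.0]  # model 2: never saw the probe

# Model 1 starts from the same internals, but is optimized against the probe;
# the gradient of (probe . act) with respect to act is just the probe itself.
act_tuned = [a + 1.0 * p for a, p in zip(act_distilled, probe)]

dot = lambda u, v: sum(x * y for x, y in zip(u, v))
score_tuned = dot(probe, act_tuned)
score_distilled = dot(probe, act_distilled)
assert score_tuned > score_distilled  # higher probe-reported "truthiness"
```

The point being that optimizing against the measurement moves the measurement without needing any corresponding change in the behavior being measured.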
W...
Agreed, though I do find framing them as a warped predictor helpful in some cases. In principle, the deviation from the original unbiased prediction over all inputs should include within it all agentic behaviors, and there might exist some way that you could extract goals from that bias vector. (I don't have anything super concrete here and I'm not super optimistic that this framing gives you anything extra compared to other interpretability mechanisms, but it's something I've thought about poking.)
I mean a model "fights" you if the model itself has goals and those goals are at odds with yours. In this context, a model cannot "fight" you if it does not have goals. It can still output things which are bad for you, like an agentic simulacrum that does fight you.
I suspect effective interventions are easier to find when dealing with a goal agnostic model simulating a potentially dangerous agent, compared to a goal-oriented model that is the potentially dangerous agent.
One consequence downstream of this that seems important to me in the limit:
I think having that one extra layer of buffer provided by 2 is actually very valuable. A goal agnostic model (absent strong gradient hacking) seems more amenable to honest and authentic intermediate reporting and to direct mechanistic interpretation.
The "private knowledge space" model does seem a lot more practical than my original post for the purposes of maintaining a community without the same burden on coordinators.
Some questions I think about when it comes to this kind of thing (not directed just at you, but also myself and anyone else!):
True! I just think the specific system I proposed required:
The people who could serve as the higher level organizers are few and are typically doing other stuff, and a poll of a dozen people coming back with zero enthusiastic takers makes 2 seem iffy. Default expectation is that the sy...
I think this is an unusually valuable post, I wish I had seen it earlier, and I want to throw more eyeballs at it.
The convergent/nonconvergent/nonstationary distinction cleans up the background behind some things I was puzzling over, and is much more concise than the vague gesturing I was doing.
(I considered not using tortured wordplay in the title of this post, but I failed my will save.)
That's an important nuance my description left out, thanks. Anything the gradients can reach can be bent to what those gradients serve, so a local token stream's transformation efforts can indeed be computationally split, even if the output should remain unbiased in expectation.
Solid advice! But forgive me, I'm gonna jump on something basically unrelated to the rest of the post:
For some reason, I need to sleep 10:30 to 12:00 hours every day or I will be tired.
Yikes! I'm not a doctor and I don't intend to pry, but if you weren't already aware, that's pretty deep into probable-pathology territory. I was doing that sort of thing before figuring out mitigations for my sleep disorder. I didn't quite appreciate how unusual my sleep issues were until very late; I could have saved myself a couple of decades of intense discomfort if I had.
While current text datasets are finite, and expanding them with high quality human-generated text would be expensive, I'm afraid that's not going to be a blocker.
Multimodal training already completely bypasses text-only limitations. Beyond just extracting text tokens from YouTube, the video/audio itself could be used as training data. The informational richness relative to text seems to be very high.
Further, as Gato demonstrates, there's nothing stopping one model from spanning hundreds of distinct tasks, and many of those tasks can come from ...
In fact, although the *output* tokens are myopic, autoregressive transformers are incentivised to compute activations at early sequence positions that will make them better at predicting tokens at later positions. This may also have indirect impacts on the actual tokens output at the early positions, although my guess would be this isn't a huge effect.
(I found myself writing notes down to clarify my own thoughts about parts of this, so this is in large part talking to myself that got commentified, not quite a direct reply)
It's true that gradients can flow ...
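That gradient path can be shown with a deliberately tiny toy (entirely made up, not any real architecture): a "model" where the hidden state computed at position 1 feeds the prediction made at position 2, so the position-2 loss reaches the parameter only through the position-1 activation.

```python
def loss_at_pos2(w, x=(1.0, 2.0), target=5.0):
    # Toy causal model: the early position's activation is reused later.
    h1 = w * x[0]        # computed at sequence position 1
    pred2 = h1 + x[1]    # position 2 reads the earlier activation
    return (pred2 - target) ** 2

# Finite-difference gradient of the position-2 loss with respect to w:
# it is nonzero, even though w only acts at the earlier position.
eps = 1e-6
g = (loss_at_pos2(1.0 + eps) - loss_at_pos2(1.0 - eps)) / (2 * eps)
assert abs(g) > 1.0  # later-position loss shapes earlier-position computation
```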
It’s not the only thread I’m pulling on.
I think this is worth expanding on- in practice, I've found the strongest method for avoiding the "oh no my great idea is not working out but I'm stuck in it" trap is to have other promising options just waiting for you to poke them.
Instead of feeling trapped and entering a cycle of motivation-killing burnout, a dying idea starts feeling just... kind of boring, and you naturally want to do the other more interesting thing. You don't even have to try, you just find yourself thinking about it in the...
Thanks for doing this research! The paper was one of those rare brow-raisers. I had suspected there was a way to do something like this, but I was significantly off in my estimation of its accessibility.
While I've still got major concerns about being able to do something like this on a strong and potentially adversarial model, it does seem like a good existence proof for any model that isn't actively fighting back (like simulators or any other goal agnostic architecture). It's a sufficiently strong example that it actually forced my doomchances down a bit, so yay!
My not-very-deep understanding is that phytosterols (plant sterols) are a bit iffy: most people don't absorb much from dietary phytosterols and so it doesn't end up doing anything, but the few people with genetic mutations that cause phytosterol hyperabsorption usually suffer worse health outcomes as a result. Is my understanding wrong, and is there some other benefit to seeking out supplemental phytosterols?
Edit: To be clear, there is research showing a measured reduction in cholesterol from phytosterol supplementation, but I'm a bit confused about how th...
I'm not familiar with how these things usually work, and I suspect other lurkers might be in the same boat, so:
It seems that we have independently converged on many of the same ideas. Writing is very hard for me and one of my greatest desires is to be scooped, which you've done with impressive coverage here, so thank you.
Thanks for writing the simulators post! That crystallized a lot of things I had been bouncing around.
A decision transformer conditioned on an outcome should still predict a probability distribution, and generate trajectories that are typical for the training distribution given the outcome occurs, which is not necessarily the sequence of actions tha
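The distinction can be sketched as a two-trajectory Bayes toy (all numbers invented): conditioning on the outcome reweights trajectories by how often they achieve it, but the prior still dominates.

```python
# Hypothetical trajectories: (prior probability under the training
# distribution, probability the conditioned-on outcome occurs along it).
trajs = {
    "typical": (0.9, 0.2),   # common in training, rarely achieves the outcome
    "optimal": (0.1, 0.9),   # rare in training, usually achieves the outcome
}

# P(trajectory | outcome) is proportional to P(trajectory) * P(outcome | trajectory)
post = {name: prior * p_outcome for name, (prior, p_outcome) in trajs.items()}
z = sum(post.values())
post = {name: v / z for name, v in post.items()}

# The conditioned predictor still favors the typical trajectory, even though
# "optimal" is the one that best achieves the conditioned-on outcome.
assert post["typical"] > post["optimal"]
```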
If by intelligence spectrum you mean variations in capability across different generally intelligent minds, such that there can be minds that are dramatically more capable (and thus more dangerous): yes, it's pretty important.
If it were impossible to make an AI more capable than the most capable human no matter what software or hardware architectures we used, and no matter how much hardware we threw at it, AI risk would be far less concerning.
But it really seems like AI can be smarter than humans. Narrow AIs (like MuZero) already outperform all humans at s...
Seconded. I don't have a great solution for this, but this remains a coordination hole that I'd really like to see filled.
Yup. I'd liken it to the surreality of a bad dream where something irrevocable happens, except there's no waking up.
If you're reading this porby, do you really want to be wrong?
hello this is porby, yes
This made me pace back and forth for about 30 minutes, trying to put words on exactly why I felt an adrenaline spike reading that bit.
I don't think your interpretation of my words (or words similar to mine) is unique, so I decided to write something a bit longer in response.
I went back and forth on whether I should include that bit for exactly that reason. Knowing something is possible is half the battle and such. I ended up settling on a rough rule for whether I could include something:
Something like "single token prediction runs in constant time" falls into 1, while this fell in 2. There ...
Hmm. Apparently you meant something a little more extreme than I first thought. It kind of sounds like you think the content of my post is hazardous.
I see this particular kind of prediction as a kind of ethical posturing and can't in good conscience let people make them without some kind of accountability.
Not sure what you mean by ethical posturing here. It's generally useful for people to put their reasoning and thoughts out in public so that other people can take from the reasoning what they find valuable, and making a bunch of predictions ahead of time ...
As a reasonably active tall person, allow me to try to mitigate some of your sadness!
I suspect some people like me who eat time-optimized food do so because they have to eat a lot of food. I can eat 2000 calories worth of time efficient, nutrient dense food, and still go eat a big meal of conventionally tasty food with other people without blowing my calorie budget. Or I can eat breakfast, and then immediately leave to go eat re-breakfast because by the time I get there I'll be hungry again.
Trying to eat my entire calorie budget in more traditional ways would effectively mean I'm never doing anything but eating. I did that for a while, but it becomes a real chore.
I'm a bit surprised mealsquares haven't been mentioned yet! I've been eating 3-4 a day for years. Modal breakfast is a mealsquare with a milk and whey mix.
Glycemic index isn't zero, but it's solid food. Good sweetspot of not ultrabland, but also not strong enough that I would get sick of it.
(Would recommend microwaving. My typical preparation is wetting one a little with some water, sticking it in a bowl, lightly covering with a paper towel to avoid the consequences of occasional choco-volcanism, and microwaving at 50% for 1.3 minutes.)
May the forces of the cosmos intervene to make me look silly.
I have no clue how that works in a stable manner, but I don't think that current architectures can learn this even if you scale them up.
I definitely agree with this if "stable" also implies "the thing we actually want."
I would worry that the System 1->System 2 push is a low level convergent property across a wide range of possible architectures that have something like goals. Even as the optimization target diverges from what we're really trying to make it learn, I could see it still picking up more deliberate thought just because it helps for so many d...
[I also just got funded (FTX) to work on this for realsies 😸🙀 ]
Congratulations and welcome :D
A mentor could look whenever they want, and comment only on whatever they want to. wdyt?
Sounds reasonable- I'm not actually all that familiar with Slack features, but if it's a pure sequential chatlog, there may be some value in using something that has a more forum-y layout with threaded topics. I've considered using github for this purpose since it's got a bunch of collaboration stuff combined with free private repos and permissions management.
Still don't know ...
While I'd agree there's something like System 2 that isn't yet well captured consistently in AI, and that a breakthrough that dramatically increases an AI's performance in that way would be a big boost to its capabilities, I'm concerned that there is no deep difference in process between System 1 and System 2.
For example, System 2 appears to be built out of System 1 steps. The kinds of things we can accomplish through System 2 still bottom out in smaller chunks of quick intuition. Orchestrating all those steps requires further steps, especially as we juggle...
I'm curious what Googetasoft is?
The unholy spiritual merger of Google, Meta, Microsoft, and all the other large organizations pushing capabilities.
I guess I don't understand how scaling up or tweaking the current approach will lead AI's that are uncontrollable or "run away" from us? I'm actually rather skeptical of this.
It's possible that the current approach (that is, token predicting large language models using transformers like we use them now) won't go somewhere potentially dangerous, because they won't be capable enough. It's hard to make...
Provided your work stays within the boundary of safe stuff, or stuff that is already very well known, asking around in public should be fine.
If you're working with questionable stuff that isn't well known, that does get trickier. One strategy is to just... not work on that kind of thing. I've dropped a few research avenues for exactly that reason.
Other than that, getting to know people in the field or otherwise establishing some kind of working relationship could be useful. More organized versions of this could look like Refine, AI Safety Camp, SERI MATS, ...
Many potential technological breakthroughs can have this property and in this post it feels as if AGI is being reduced to some sort of potentially dangerous and uncontrollable software virus.
The wording may have understated my concern. The level of capability I'm talking about is "if this gets misused, or if it is the kind of thing that goes badly even if not misused, everyone dies."
No other technological advancement has had this property to this degree. To phrase it in another way, let's describe technological leverage L as the amount of change&...
Great post! I think this captures a lot of why I'm not ultradoomy (only, er, 45%-ish doomy, at the moment), especially A and B. I think it's at least possible that our reality is on easymode, where muddling could conceivably put an AI into close enough territory to not trigger an oops.
I'd be even less doomy if I agreed with the counterarguments in C. Unfortunately, I can't shake the suspicion that superintelligence is the kind of ridiculously powerful lever that would magnify small oopses into the largest possible oopses.
Hypothetically, if we took a clever...
My understanding is that a true quantum computer would be a (mostly) reversible computer as well, by virtue of quantum circuits being reversible. Measurements aren't (apparently) reversible, but they are deferrable. Do you mean something like... in practice, quantum computers will be narrowly reversible, but closer to classical computers as a system because they're forced into many irreversible intermediate steps?
Now I have a fairly low probability for superconduction/reversible/quantum computers this century, like on the order of 2-3%.
Could you elaborate on this? I'm pretty surprised by an estimate that low conditioned on ~normalcy/survival, but I'm no expert.
Most of it is the latter, but to be clear, I do not have inside information about what any large organization is doing privately, nor have I seen an "oh no we're doomed" proof of concept. Just some very obvious "yup that'll work" stuff. I expect adjacent things to be published at some point soonishly just because the ideas are so simple and easily found/implemented independently. Someone might have already and I'm just not aware of it. I just don't want to be the one to oops and push on the wrong side of the capability-safety balance.
A constant time architecture failing to divide arbitrary integers in one step isn't surprising at all. The surprising part is being able to do all the other things with the same architecture. Those other things are apparently computationally simple.
Even with the benefit of hindsight, I don't look back to my 2015 self and think, "how silly I was being! Of course this was possible!"
2015-me couldn't just look at humans and conclude that constant time algorithms would include a large chunk of human intuition or reasoning. It's true that humans tend to suck at ...
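For contrast: the moment you're allowed to spread the work across steps, division stops being hard at all. A toy long-division loop doing constant work per digit (my analogy, one "step" per output token, not anything from a real model):

```python
def divide_stepwise(numerator: str, divisor: int):
    # Each iteration does a constant amount of work on a single digit,
    # loosely analogous to spending one output token per step.
    quotient, remainder = "", 0
    for digit in numerator:
        remainder = remainder * 10 + int(digit)
        quotient += str(remainder // divisor)
        remainder %= divisor
    return int(quotient), remainder

assert divide_stepwise("987654321", 7) == (987654321 // 7, 987654321 % 7)
```

Linear work over the sequence buys what constant work per answer cannot, which is part of why the single-step successes are the surprising ones.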
I think I'm understanding where you're coming from a bit more now, thanks. So, when I wrote:
The H100, taken as a whole, is on the order of a million times away from the Landauer limit at its operating temperature.
My intended meaning in context was "taking the asspull as an assumption, the abstract computational thing an H100 is doing that is relevant to ML (without caring about the hardware used to accomplish it, and implicitly assuming a move to more ML-optimized architectures) is very roughly 6 OOMs off the absolute lower bound, while granting that the l...
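For anyone who wants to sanity-check the rough 6 OOM figure, here's the back-of-envelope version. The operating temperature, throughput, and bit-erasures-per-FLOP numbers are all loose assumptions on my part, not measurements:

```python
import math

k_B = 1.380649e-23                 # Boltzmann constant, J/K
T = 350.0                          # assumed operating temperature, K
landauer = k_B * T * math.log(2)   # minimum energy per bit erasure, ~3.3e-21 J

power = 700.0                      # W, rough H100 board power
throughput = 1e15                  # FLOP/s, rough dense FP16 figure (assumed)
joules_per_flop = power / throughput          # ~7e-13 J per FLOP

bits_erased_per_flop = 200                    # hypothetical
floor_per_flop = bits_erased_per_flop * landauer
headroom = joules_per_flop / floor_per_flop   # ~1e6: roughly six OOMs
assert 1e5 < headroom < 1e7
```

Changing the assumed erasures per FLOP slides the answer an OOM or two in either direction, which is about the precision the argument needs.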
Scanning through your other post, I don't think we disagree on the physics regarding ML-relevant compute. It is a quick and simplistic analysis, yes- my intent there was really just to say "hardware bottlenecks sure don't look like they're going to arrive soon enough to matter, given the rest of this stuff." The exact amount of headroom we have left and everything that goes into that estimation just didn't seem worth including given the length and low impact. (I would have chosen differently if those details changed the conclusion of the section.)
I am curi...
I'd agree that equivalently rapid progress in something like deep reinforcement learning would be dramatically more concerning. If we were already getting such high quality results while constructing a gradient out of noisy samples of a sparse reward function, I'd have to shorten my timelines even more. RL does tend to more directly imply agency, and it would also hurt my estimates on the alignment side of things in the absence of some very hard work (e.g. implemented with IB-derived proof of 'regret bound is alignment' or somesuch).
I also agree that token...
Yes, unfortunately there are indeed quite a few groups interested in it.
There are reasons why they haven't succeeded historically, and those reasons are getting much weaker over time. It should suffice to say that I'm not optimistic about our odds on avoiding this type of threat over the next 30 years (conditioned on no other gameboard flip).
MATH is a dataset of problems from high school competitions, which are well known to require a very limited set of math knowledge and be solveable by applying simple algorithms.
I think you may underestimate the difficulty of the MATH dataset. It's not IMO-level, obviously, but from the original paper:
We also evaluated humans on MATH, and found that a computer science PhD student who does not especially like mathematics attained approximately 40% on MATH, while a three-time IMO gold medalist attained 90%, indicating that MATH can be challenging for hu
Like, it shouldn't be surprising that the LM can solve problems in text which are notoriously based around applying a short step by step algorithm, when it has many examples in the training set.
I'm not clear on why it wouldn't be surprising. The MATH dataset is not easy stuff for most humans. Yes, it's clear that the algorithm used in the cases where the language model succeeds must fit in constant time and so must be (in a computational sense) simple, but it's still outperforming a good chunk of humans. I can't ignore how odd that is. Perhaps human reaso...
Hopefully we do actually live in that reality!
I'm pretty sure the GPT confabulation is (at least in part) caused by highly uncertain probability distribution collapse, where the uncertainty in the distribution is induced by the computational limits of the model.
Basically the model is asked to solve a problem it simply can't (like, say, general case multiplication in one step), and no matter how many training iterations and training examples are run, it can't actually learn to calculate the correct answer. The result is a relatively even distribution over t...
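A toy illustration of that collapse (logits entirely invented): when no candidate answer dominates, the output distribution is near-uniform, but sampling still has to emit something, and it comes out with the same fluency as a correct answer.

```python
import math
import random

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def entropy(ps):
    return -sum(p * math.log(p) for p in ps)

# Hypothetical logits over four candidate answers:
confident = softmax([8.0, 0.0, 0.0, 0.0])      # problem within the model's capacity
uncertain = softmax([0.10, 0.00, 0.05, 0.02])  # problem beyond it: nothing dominates

assert entropy(uncertain) > entropy(confident)

# The near-uniform case still surfaces an essentially arbitrary answer.
random.seed(0)
answer = random.choices(range(4), weights=uncertain)[0]
```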
(Jay's interpretation was indeed my intent.)
Empirically, I don't think it's true that you'd need to rely on superhuman intelligence. The latest paper from the totally anonymous and definitely not google team suggests PaL- I mean an anonymous 540B parameter model- was good enough to critique itself into better performance. Bootstrapping to some degree is apparently possible.
I don't think this specific instance of the technique is enough by itself to get to spookyland, but it's evidence that token bottlenecks aren't going to be much of a concern in the near ...