All of porby's Comments + Replies

I suspect this is one of those universal human experiences that isn't.

My best mental outcome after exercise is "no change," and if I push myself too far, I can pretty much ruin myself for 2 days. And sometimes end up on the ground, unable to move, barely staying conscious due to something that looks an awful lot like hypoglycemia.

I do still exercise- I have to, because the alternative is worse- but I've had to come up with less invasive training routines to compensate. Mostly spreading them over the day, and over the week, never doing too much at any one t... (read more)

2Nanda Ale3d
Lately I also have changed to very long "zone 2" cardio. Because of specific joint and back problems, some injuries, some congenital. But the exertion itself still feels good mentally if I separate it from my aching body. Luckily zone 2 still works for mental effects, it just takes hours to have the same effect. Basically you only exert yourself below the threshold where your body would start building up lactic acid. So if you feel muscle soreness the next day, you're pushing too hard. Unless you live in a lab you have to use proxies and trial and error to estimate where zone 2 is. Usually people say something like, "You should still be able to have a good conversation at this effort level." The time is annoying but my Netflix addiction has never felt so useful.

What would you suggest to someone who plain doesn't like to do things with their body?

Maximize gains per unit of subjective effort! Turns out you can get a ton of benefit with very little time expended- like going from 'nigh-bedridden arthritic old lady' to 'able to do deadlifts' with 2 sets a day.

Strength training with progressive overload is probably the best for this kind of effort optimization. You won't be running any marathons with this strategy, but you might find after a year that going up steps no longer hurts your knees, and that it's been a whil... (read more)

As simulation complexity grows, it seems likely that these last steps would require powerful general intelligence/GPS as well. And at that point, it's entirely unclear what mesa-objectives/values/shards it would develop.

On one hand, I fully agree that a strong predictor is going to develop some very strong internal modeling that could reasonably be considered superhuman in some ways even now.

But I think there's an unstated background assumption sneaking into most discussions about mesaoptimizers- that goal oriented agency (even with merely shard-like motivati... (read more)

Is this the first time that the word "Boltzmann" has been used to describe contemporary/near future ML? If not, how frequently has the word "boltzmann" been used in this way? 

Not sure- I haven't seen it used before in this way, at least.

Also, I know this question might be a bit of a curve ball, but what pros and cons can you think of for using the word "boltzmann"?

Most lesswrong readers have probably encountered the concept of Boltzmann brains and can quickly map some of its properties over to other ideas, but I'd be surprised if "Boltzmann brain" wou... (read more)

4the gears to ascenscion5d
It's probably fine-ish to allocate another reference to the concept, though I personally might suggest expanding it all the way out to "Boltzmann brain mesaoptimizer". Are you familiar with restricted Boltzmann machines? I think Hinton has described them as the other branch besides backprop that actually works, though I'm not finding the citation for that claim right now. In any case, they're a major thread in machine learning research, and are what machine learning researchers will think of first. That said, Boltzmann brains have a Wikipedia page which does not mention LessWrong; I don't think they're a LessWrong-specific concept in any way.

I'll start with a pretty uncontroversial example that's neither RLHF nor conditioning but tries to point at a shared intuition; two different models:
1. LLM fine tuned with RL, where reward comes from some kind of activation-reading truth probes.
2. LLM that trains on the output of the first model to the point where it ~perfectly matches its final output, but does not undergo any additional fine tuning.

Despite having identical final outputs, I would expect the first model to have higher probe-reported truthiness because it was optimized against that metric.

W... (read more)
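
A toy sketch of why I'd expect that probe gap (my own construction with made-up numbers, nothing like a real RLHF pipeline): two hidden states that produce identical outputs, where only one was nudged to please an activation probe.

```python
# Toy illustration, not a real training setup. The "output" is what both models agree on;
# the "probe" reads the hidden state. Model 1's hidden state drifts along a direction that
# leaves the output unchanged but raises the probe score; model 2's hidden state just
# reproduces the output and never sees the probe.
import numpy as np

def output(h):   # the externally visible behavior
    return h[0] + h[1]

def probe(h):    # "truth probe" reading internal activations
    return h[0] - h[1]

h_imitator = np.array([1.0, 1.0])      # model 2: matches outputs, never probe-tuned
h_tuned = h_imitator.copy()            # model 1: starts the same...
for _ in range(100):
    h_tuned += np.array([0.1, -0.1])   # ...then drifts along an output-preserving direction

print(output(h_tuned), output(h_imitator))  # ~2.0 and 2.0 -- identical outputs
print(probe(h_tuned), probe(h_imitator))    # ~20.0 vs 0.0 -- very different probe readings
```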

Agreed, though I do find framing them as a warped predictor helpful in some cases. In principle, the deviation from the original unbiased prediction over all inputs should include within it all agentic behaviors, and there might exist some way that you could extract goals from that bias vector. (I don't have anything super concrete here and I'm not super optimistic that this framing gives you anything extra compared to other interpretability mechanisms, but it's something I've thought about poking.)

I mean a model "fights" you if the model itself has goals and those goals are at odds with yours. In this context, a model cannot "fight" you if it does not have goals. It can still output things which are bad for you, like an agentic simulacrum that does fight you.

I suspect effective interventions are easier to find when dealing with a goal agnostic model simulating a potentially dangerous agent, compared to a goal-oriented model that is the potentially dangerous agent.

4paulfchristiano11d
In both cases the model produces actions that are expected to have certain kinds of effects. Could you spell out what kind of "fighting" happens, or what kind of "intervention" is possible when you are merely conditioning your model and not fine-tuning it? I haven't engaged much with this kind of thinking on LW or the broader safety community, but right now I don't really get it and it feels like anthropomorphizing or magical thinking.

One consequence downstream of this that seems important to me in the limit:

  1. Nonconditioning fine-tuned predictor models make biased predictions. If those biases happen to take the form of a misaligned agent, the model itself is fighting you.
  2. Conditioned predictor models make unbiased predictions. The conditioned sequence could still represent a misaligned agent, but the model itself is not fighting you.

I think having that one extra layer of buffer provided by 2 is actually very valuable. A goal agnostic model (absent strong gradient hacking) seems more amenable to honest and authentic intermediate reporting and to direct mechanistic interpretation.
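
To make the 1/2 distinction concrete, here's a toy of my own (invented names and numbers; the exponential reweighting is how idealized KL-regularized RL fine-tuning behaves, not a claim about any particular model):

```python
# Toy numbers, mine. "Conditioning" restricts the base distribution to sequences consistent
# with the condition and renormalizes, preserving relative probabilities. "Fine-tuning"
# reweights the distribution itself toward a reward -- the model-level bias in point 1.
import math

base = {"honest_helpful": 0.2, "honest_unhelpful": 0.5, "sycophantic": 0.3}
reward = {"honest_helpful": 1.0, "honest_unhelpful": 0.2, "sycophantic": 0.9}

# Point 2: condition on "the reply is honest" -- drop inconsistent sequences, renormalize.
consistent = {k: p for k, p in base.items() if k.startswith("honest")}
z = sum(consistent.values())
conditioned = {k: round(p / z, 3) for k, p in consistent.items()}

# Point 1: fine-tune toward the reward (idealized KL-regularized RL tilts the base this way).
beta = 5.0
tilted = {k: p * math.exp(beta * reward[k]) for k, p in base.items()}
z = sum(tilted.values())
tuned = {k: round(p / z, 3) for k, p in tilted.items()}

print(conditioned)  # {'honest_helpful': 0.286, 'honest_unhelpful': 0.714} -- same ratios as base
print(tuned)        # {'honest_helpful': 0.511, 'honest_unhelpful': 0.023, 'sycophantic': 0.465}
```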

4cubefox11d
Just a note here: I would not interpret fine-tuned GPTs as still "predicting" tokens. Base models predict tokens by computing a probability distribution conditional on the prompt, but for fine-tuned models this distribution no longer represents probabilities, but some "goodness" relative to the fine-tuning, how good the continuation is. Tokens with higher numbers are then not necessarily more probable continuations of the prompt (though next token probability may also play a role) but overall "better" in some opaque way. We hope that what the model thinks is a better token for the continuation of the prompt corresponds to the goals of being helpful, harmless and honest (to use the Anthropic terminology), but whether the model has really learned those goals, or merely something which looks similar, is ultimately unknown. So RLHF (and equally supervised fine-tuning) also leads to a lack of interpretability. It is unknown what exactly an instruction model like ChatGPT or text-davinci-003 optimizes for. In contrast to this, we know pretty exactly what a base model optimized for: Next token prediction.
2Evan R. Murphy11d
What do you mean when you say the model is or is not "fighting you"?

The "private knowledge space" model does seem a lot more practical than my original post for the purposes of maintaining a community without the same burden on coordinators.

Some questions I think about when it comes to this kind of thing (not directed just at you, but also myself and anyone else!):

  1. How is access to the space managed? What degree of due diligence is involved? (e.g. punt to "passed LTFF's due diligence" or do its own to avoid correlating failures? Rely on personal connections and references? "Wrote some good lesswrong posts?")
  2. What are the con
... (read more)

True! I just think the specific system I proposed required:

  1.  significant time investments on the part of organizers (requiring intrinsic interest or funding for individuals with the requisite knowledge and trustworthiness)
  2. a critical mass of users (requiring that a nontrivial fraction of people would find some value in the system)

The people who could serve as the higher level organizers are few and are typically doing other stuff, and a poll of a dozen people coming back with zero enthusiastic takers makes 2 seem iffy. Default expectation is that the sy... (read more)

1Roman Leventov13d
The system that I proposed is simpler: it doesn't have fine-grained and selective access, and therefore doesn't require continuous effort on the part of some people for "connecting the dots". It's just a single space, basically like the internal Notion + Slack space + Google Drive of the AI safety lab that would lead this project. On this space, people can share research, ideas, have "mildly" infohazardous discussions such as regarding the pros and cons of different approaches to building AGI. I cannot imagine that system would end up unused. At least three people (you, me, and another person) felt enough frustration to commit time to writing on LW about this problem. All three of these posts were well-received with comments like "yes, I agree this is a problem". Another AI safety researcher said to me in private communication he feels this problem, too. So, I suspect a large fraction of all AI safety researchers stumble into capability ideas regularly now and spend a significant portion of their mental cycles trying to manage this and still publish something in public. As Nate Soares wrote in his post from 2018 where he announced the nondisclosure-by-default strategy, "researchers shouldn't have walls inside their minds".

I think this is an unusually valuable post, I wish I had seen it earlier, and I want to throw more eyeballs at it.

The convergent/nonconvergent/nonstationary distinction cleans up the background behind some things I was puzzling over, and is much more concise than the vague gesturing I was doing.

(I considered not using tortured wordplay in the title of this post, but I failed my will save.)

That's an important nuance my description left out, thanks. Anything the gradients can reach can be bent to what those gradients serve, so a local token stream's transformation efforts can indeed be computationally split, even if the output should remain unbiased in expectation.

Solid advice! But forgive me, I'm gonna jump on something basically unrelated to the rest of the post:

For some reason, I need to sleep 10:30 to 12:00 hours every day or I will be tired.

Yikes! I'm not a doctor and I don't intend to pry, but if you weren't already aware, that's pretty deep into probable-pathology territory. I was doing that sort of thing before figuring out mitigations for my sleep disorder. I didn't quite appreciate how unusual my sleep issues were until very late; I could have saved myself a couple of decades of intense discomfort if I had.

While there is a limit to the current text datasets, and expanding that with high quality human-generated text would be expensive, I'm afraid that's not going to be a blocker.

Multimodal training already completely bypasses text-only limitations. Beyond just extracting text tokens from youtube, the video/audio itself could be used as training data. The informational richness relative to text seems to be very high.

Further, as gato demonstrates, there's nothing stopping one model from spanning hundreds of distinct tasks, and many of those tasks can come from ... (read more)

3Antb1mo
Very insightful, thanks for the clarification, as dooming as it is.

In fact, although the *output* tokens are myopic, autoregressive transformers are incentivised to compute activations at early sequence positions that will make them better at predicting tokens at later positions. This may also have indirect impacts on the actual tokens output at the early positions, although my guess would be this isn't a huge effect.

(I found myself writing notes down to clarify my own thoughts about parts of this, so this is in large part talking to myself that got commentified, not quite a direct reply)

It's true that gradients can flow ... (read more)

2Adam Scherlis1mo
I agree with the myopic action vs. perception (thinking?) distinction, and that LMs have myopic action. I don't think it has to be in service of predicting the current token. It sometimes gives lower loss to make a halfhearted effort at predicting the current token, so that the model can spend more of its weights and compute on preparing for later tokens. The allocation of mental effort isn't myopic. As an example, induction heads make use of previous-token heads. The previous-token head isn't actually that useful for predicting the output at the current position; it mostly exists to prepare some handy activations so that induction head can look back from a later position and grab them. So LMs won't deliberately give bad predictions for the current token if they know a better prediction, but they aren't putting all of their effort into finding that better prediction.
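
A hand-coded toy of the relay Adam describes (not real attention; the function names are mine): the "previous-token" pass is nearly useless at its own position, but a later "induction" pass reads what it stored to continue a repeated pattern.

```python
# Toy analogue of a previous-token head feeding an induction head (A B ... A -> B).
def previous_token_pass(tokens):
    # At position i, stash the token from position i-1. This barely helps predict
    # position i itself; it exists so that later positions can look it up.
    return [None] + tokens[:-1]

def induction_pass(tokens, prev_info, position):
    # From a later position, find an earlier spot whose stashed "previous token"
    # matches the current token, and predict whatever followed it last time.
    current = tokens[position]
    for j in range(position, 0, -1):
        if prev_info[j] == current:
            return tokens[j]
    return None

tokens = ["the", "cat", "sat", ".", "the", "cat", "ate", ".", "the"]
prev_info = previous_token_pass(tokens)
print(induction_pass(tokens, prev_info, len(tokens) - 1))  # "cat"
```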

It’s not the only thread I’m pulling on.

I think this is worth expanding on- in practice, I've found the strongest method for avoiding the "oh no my great idea is not working out but I'm stuck in it" trap is to have other promising options just waiting for you to poke them.

Instead of feeling trapped and entering a cycle of motivation-killing burnout, a dying idea starts feeling just... kind of boring, and you naturally want to do the other more interesting thing. You don't even have to try, you just find yourself thinking about it in the... (read more)

Thanks for doing this research! The paper was one of those rare brow-raisers. I had suspected there was a way to do something like this, but I was significantly off in my estimation of its accessibility.

While I've still got major concerns about being able to do something like this on a strong and potentially adversarial model, it does seem like a good existence proof for any model that isn't actively fighting back (like simulators or any other goal agnostic architecture). It's a sufficiently strong example that it actually forced my doomchances down a bit, so yay!

My not-very-deep understanding is that phytosterols (plant sterols) are a bit iffy: most people don't absorb much from dietary phytosterols and so it doesn't end up doing anything, but the few people with genetic mutations that cause phytosterol hyperabsorption usually suffer worse health outcomes as a result. Is my understanding wrong, and is there some other benefit to seeking out supplemental phytosterols?

Edit: To be clear, there is research showing a measured reduction in cholesterol from phytosterol supplementation, but I'm a bit confused about how th... (read more)

I'm not familiar with how these things usually work, and I suspect other lurkers might be in the same boat, so:

  1. What kind of lodging is included? Would attendees just have their own hotel rooms near the venue, or is this more of an 'immersion' thing where everyone's under one roof for a weekend?
  2. How are expenses handled? Are there prepaid services, or would attendees submit expenses after the fact for reimbursement?
  3. About how many people are expected (rough order of magnitude)?
4GradientDissenter2mo
1. Lodging is on-site. It's at a renovated former sorority house. Everyone should get their own bedroom, and if getting a private bathroom is a crux for you, we can make sure you get that too, though by default bathrooms will be shared.
2. You'd submit a reimbursement request for expenses. If that poses a challenge, we can work something else out. I expect the only significant expense you'd need reimbursement for to be travel -- we'll provide food and lodging.
3. I expect 20 people, but it wouldn't surprise me if it turned out to be significantly more or fewer. In general, it will be small.

It seems that we have independently converged on many of the same ideas. Writing is very hard for me and one of my greatest desires is to be scooped, which you've done with impressive coverage here, so thank you.

Thanks for writing the simulators post! That crystallized a lot of things I had been bouncing around.

A decision transformer conditioned on an outcome should still predict a probability distribution, and generate trajectories that are typical for the training distribution given the outcome occurs, which is not necessarily the sequence of actions tha

... (read more)
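
To spell that out with toy numbers of my own: conditioning on the outcome gives you actions that are typical given that outcome in the training distribution, not the actions most likely to cause it.

```python
# Toy Bayes calculation, not from any real dataset: a common mediocre plan vs. a rare
# strong plan. A predictor conditioned on "success" still mostly produces the common plan.
p_action = {"common_plan": 0.9, "strong_plan": 0.1}   # frequency in the training data
p_success = {"common_plan": 0.5, "strong_plan": 0.9}  # how well each actually works

joint = {a: p_action[a] * p_success[a] for a in p_action}
total = sum(joint.values())
posterior = {a: round(p / total, 3) for a, p in joint.items()}
print(posterior)  # {'common_plan': 0.833, 'strong_plan': 0.167}
```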

If by intelligence spectrum you mean variations in capability across different generally intelligent minds, such that there can be minds that are dramatically more capable (and thus more dangerous): yes, it's pretty important.

If it were impossible to make an AI more capable than the most capable human no matter what software or hardware architectures we used, and no matter how much hardware we threw at it, AI risk would be far less concerning.

But it really seems like AI can be smarter than humans. Narrow AIs (like MuZero) already outperform all humans at s... (read more)

Seconded. I don't have a great solution for this, but this remains a coordination hole that I'd really like to see filled.

Yup. I'd liken it to the surreality of a bad dream where something irrevocable happens, except there's no waking up.

-1Omid3mo
Don't worry, as soon as AGI goes live we'll all have a peaceful, eternal rest.

If you're reading this porby, do you really want to be wrong?

hello this is porby, yes

This made me pace back and forth for about 30 minutes, trying to put words on exactly why I felt an adrenaline spike reading that bit.

I don't think your interpretation of my words (or words similar to mine) is unique, so I decided to write something a bit longer in response.

I went back and forth on whether I should include that bit for exactly that reason. Knowing something is possible is half the battle and such. I ended up settling on a rough rule for whether I could include something:

  1. It is trivial, or
  2. it is already covered elsewhere, that coverage goes into more detail, and the audience of that coverage is vastly larger than my own post's reach.
  3. The more potentially dangerous an idea is, the stronger the requirements are.

Something like "single token prediction runs in constant time" falls into 1, while this fell in 2. There ... (read more)

Hmm. Apparently you meant something a little more extreme than I first thought. It kind of sounds like you think the content of my post is hazardous.

I see this particular kind of prediction as a kind of ethical posturing and can't in good conscience let people make them without some kind of accountability.

Not sure what you mean by ethical posturing here. It's generally useful for people to put their reasoning and thoughts out in public so that other people can take from the reasoning what they find valuable, and making a bunch of predictions ahead of time ... (read more)

-2Sen4mo
I did say I think making wrong predictions can be dangerous, but i would have told you explicitly to stop if I thought yours was particularly dangerous (moreso just a bit ridiculous, if I'm being honest). I think you should see the value in keeping a record of what people say, without equating it to anti-science mobbing.  Sure, you will be paid in respect and being taken seriously, because it wasn't a bet like you said. That's why I'm also not asking you to pay anything if you are wrong, you're not one of the surprisingly many people asking for millions to work on this problem. I don't expect them to pay anything either, but it would be nice. I'm not going to hold Nuremberg trials for AGI doomers or anything ridiculous like that. 

As a reasonably active tall person, allow me to try to mitigate some of your sadness!

I suspect some people like me who eat time-optimized food do so because they have to eat a lot of food. I can eat 2000 calories worth of time efficient, nutrient dense food, and still go eat a big meal of conventionally tasty food with other people without blowing my calorie budget. Or I can eat breakfast, and then immediately leave to go eat re-breakfast because by the time I get there I'll be hungry again.

Trying to eat my entire calorie budget in more traditional ways would effectively mean I'm never doing anything but eating. I did that for a while, but it becomes a real chore.

2Edward Pascal4mo
Thank you for this Data Point. I'm 6'1" and age 43 and still have these issues. I thought by now I would not need as much food, but it's still there. I'm still rail thin, and I can easily eat two breakfasts and elevensies before 1pm lunch. One thing I love is my instant pot. It can get me a porridge of maple syrup, buckwheat groats, sprouted brown rice, and nuts and dried fruit within 20 minutes by just dumping in ingredients. Yeah, it only lasts 90 minutes or so, but I have enough to eat it again in 90 minutes. Later, for lunch, I can combine some more with a 12" subway sandwich or something.

I'm a bit surprised mealsquares haven't been mentioned yet! I've been eating 3-4 a day for years. Modal breakfast is a mealsquare with a milk and whey mix.

Glycemic index isn't zero, but it's solid food. Good sweetspot of not ultrabland, but also not strong enough that I would get sick of it.

(Would recommend microwaving. My typical preparation is wetting one a little with some water, sticking it in a bowl, lightly covering with a paper towel to avoid the consequences of occasional choco-volcanism, and microwaving at 50% for 1.3 minutes.)

May the forces of the cosmos intervene to make me look silly.

I have no clue how that works in a stable manner, but I don't think that current architectures can learn this even if you scale them up.

I definitely agree with this if "stable" also implies "the thing we actually want."

I would worry that the System 1->System 2 push is a low level convergent property across a wide range of possible architectures that have something like goals. Even as the optimization target diverges from what we're really trying to make it learn, I could see it still picking up more deliberate thought just because it helps for so many d... (read more)

[I also just got funded (FTX) to work on this for realsies 😸🙀 ]

Congratulations and welcome :D

A mentor could look whenever they want, and comment only on whatever they want to. wdyt?

Sounds reasonable- I'm not actually all that familiar with Slack features, but if it's a pure sequential chatlog, there may be some value in using something that has a more forum-y layout with threaded topics. I've considered using github for this purpose since it's got a bunch of collaboration stuff combined with free private repos and permissions management.

Still don't know ... (read more)

While I'd agree there's something like System 2 that isn't yet well captured consistently in AI, and that a breakthrough that dramatically increases an AI's performance in that way would be a big boost to its capabilities, I'm concerned that there is no deep difference in process between System 1 and System 2.

For example, System 2 appears to be built out of System 1 steps. The kinds of things we can accomplish through System 2 still bottom out in smaller chunks of quick intuition. Orchestrating all those steps requires further steps especially as we juggle... (read more)

2Florian_Dietz4mo
I agree that System 2 is based on System 1 and there is probably no major architectural difference. To me it seems like the most important question is about how the system is trained. Human reasoning does not get trained with a direct input/output mapping most of the time. And when it does, we have to infer what that mapping should be on our own. Some part of our brain has to translate the spoken words "good job!" into a reward signal, and this mechanism in itself must have been learned at some point. So the process that trains the brain and applies the reward signal is in itself subject to training. I have no clue how that works in a stable manner, but I don't think that current architectures can learn this even if you scale them up.

You say that as a joke, but it would cost us very little and it might actually work. I mean, it arguably does work for humanity: "There is a bearded man in the sky who is testing your morality and will punish you if you do anything wrong." Obviously this could also backfire tremendously if you are not very careful about it, but it still seems better than the alternative of doing nothing at all.

I'm curious what Googetasoft is?  

The unholy spiritual merger of Google, Meta, Microsoft, and all the other large organizations pushing capabilities.

I guess I don't understand how scaling up or tweaking the current approach will lead to AIs that are uncontrollable or "run away" from us? I'm actually rather skeptical of this.

It's possible that the current approach (that is, token predicting large language models using transformers like we use them now) won't go somewhere potentially dangerous, because they won't be capable enough. It's hard to make... (read more)

Provided your work stays within the boundary of safe stuff, or stuff that is already very well known, asking around in public should be fine.

If you're working with questionable stuff that isn't well known, that does get trickier. One strategy is to just... not work on that kind of thing. I've dropped a few research avenues for exactly that reason.

Other than that, getting to know people in the field or otherwise establishing some kind of working relationship could be useful. More organized versions of this could look like Refine, AI Safety Camp, SERI MATS, ... (read more)

2Yonatan Cale4mo
[I also just got funded (FTX) to work on this for realsies 😸🙀 ] I'm still in "learn the field" mode, I didn't pick any direction to dive into, but I am asking myself questions like "how would someone armed with a pretty strong AI take over the world?". Regarding commitment from the mentor: My current format is "live blogging" in a Slack channel. A mentor could look whenever they want, and comment only on whatever they want to. wdyt? (But I don't know who to add to such a channel which would also contain the potentially harmful ideas)

Many potential technological breakthroughs can have this property and in this post it feels as if AGI is being reduced to some sort of potentially dangerous and uncontrollable software virus.

The wording may have understated my concern. The level of capability I'm talking about is "if this gets misused, or if it is the kind of thing that goes badly even if not misused, everyone dies."

No other technological advancement has had this property to this degree. To phrase it in another way, let's describe technological leverage as the amount of change... (read more)

1exkn4mo
Interesting and useful concept, technological leverage. I'm curious what Googetasoft is?

OK, I can see a strong AI algorithm being able to do many things we consider intelligence, and I can see how the technological leverage it would have in our increasingly digital / networked world would be far greater than many previous technologies. This is the story of all new technological advancements: bigger benefits as well as bigger problems and dangers that need to be addressed or solved, or else bigger bad things can happen. There will be no end to these types of problems going forward if we are to continue to progress, and there is no guarantee we can solve them, but there is no law of physics saying we can't. The efforts on this front are good, necessary, and should demand our attention, but I think this whole effort isn't really about AGI.

I guess I don't understand how scaling up or tweaking the current approach will lead to AIs that are uncontrollable or "run away" from us? I'm actually rather skeptical of this. I agree regular AI can generate new knowledge, but only an AGI will do so creatively and recognize it as such. I don't think we are close to creating that kind of AGI yet with the current approach, as we don't really understand how creativity works. That being said, it can't be that hard if evolution was able to figure it out.

Great post! I think this captures a lot of why I'm not ultradoomy (only, er, 45%-ish doomy, at the moment), especially A and B. I think it's at least possible that our reality is on easymode, where muddling could conceivably put an AI into close enough territory to not trigger an oops.

I'd be even less doomy if I agreed with the counterarguments in C. Unfortunately, I can't shake the suspicion that superintelligence is the kind of ridiculously powerful lever that would magnify small oopses into the largest possible oopses.

Hypothetically, if we took a clever... (read more)

-1awg4mo
Agreed that superhuman intelligence seems like the kind of thing that could be a very powerful lever. What gets me is that we don't seem to know how orthogonal or non-orthogonal intelligence and empathy are to one another.[1] If we were capable of creating a superhumanly intelligent AI and we were to be able to give it superhuman empathy, I might be inclined to trust ceding over a large amount of power and control to that system (or set of systems whatever). But a sociopathic superhuman intelligence? Definitely not ceding power over to that system. The question then becomes to me, how confident are we that we are not creating dangerously sociopathic AI?

[1] If I were to take a stab, I would say they were almost entirely orthogonal, as we have perfectly intelligent yet sociopathic humans walking around today who lack any sort of empathy. Giving any of these people superhuman ability and control would seem like an obviously terrible idea to me.

Thanks!

My understanding is that a true quantum computer would be a (mostly) reversible computer as well, by virtue of quantum circuits being reversible. Measurements aren't (apparently) reversible, but they are deferrable. Do you mean something like... in practice, quantum computers will be narrowly reversible, but closer to classical computers as a system because they're forced into many irreversible intermediate steps?

3Noosphere894mo
Not really. I'm focused on fully reversible systems here, as they theoretically allow you to reverse errors without dissipating any energy, so the energy stored there can keep on going. It's a great advance, and it's stronger than you think since we don't need intermediate steps anymore, and I'll link to the article here: https://www.quantamagazine.org/computer-scientists-eliminate-pesky-quantum-computations-20220119/ But I'm focused on full reversibility, i.e. the measurement step can't be irreversible.

Now I have a fairly low probability for superconduction/reversible/quantum computers this century, like on the order of 2-3%.

Could you elaborate on this? I'm pretty surprised by an estimate that low conditioned on ~normalcy/survival, but I'm no expert.

3Noosphere894mo
Admittedly this is me thinking worst case scenario, where no technology can reliably improve the speed of getting to those technologies. If I had to compute an average case, I'd operationalize the following predictions:

  1. Will a quantum computer be sold to 10,000+ customers with a qubit count of at least 1,000 by 2100? Probability: 15-25%.
  2. Will superconductors be used in at least 1 grid in Europe, China or the US by 2100? Probability: 10-20%.
  3. Will reversible computers be created by a company with at least $100 million in market cap by 2100? Probability: 1-5%.

Now I'm somewhat pessimistic about reversible computers, as they may not exist, but I think there's a fair chance of superconductors and quantum computers by 2100.

Most of it is the latter, but to be clear, I do not have inside information about what any large organization is doing privately, nor have I seen an "oh no we're doomed" proof of concept. Just some very obvious "yup that'll work" stuff. I expect adjacent things to be published at some point soonishly just because the ideas are so simple and easily found/implemented independently. Someone might have already and I'm just not aware of it. I just don't want to be the one to oops and push on the wrong side of the capability-safety balance.

A constant time architecture failing to divide arbitrary integers in one step isn't surprising at all. The surprising part is being able to do all the other things with the same architecture. Those other things are apparently computationally simple.

Even with the benefit of hindsight, I don't look back to my 2015 self and think, "how silly I was being! Of course this was possible!"

2015-me couldn't just look at humans and conclude that constant time algorithms would include a large chunk of human intuition or reasoning. It's true that humans tend to suck at ... (read more)

I think I'm understanding where you're coming from a bit more now, thanks. So, when I wrote:

The H100, taken as a whole, is on the order of a million times away from the Landauer limit at its operating temperature.

My intended meaning in context was "taking the asspull as an assumption, the abstract computational thing an H100 is doing that is relevant to ML (without caring about the hardware used to accomplish it, and implicitly assuming a move to more ML-optimized architectures) is very roughly 6 OOMs off the absolute lower bound, while granting that the l... (read more)

1jacob_cannell4mo
Hmm actually the 0.5 would assume full bright silicon, all 100% in use, because they only switch about half the time on average. So really it should be 0.5*a, where a is some activity factor, and I do think we are entering dark silicon era to some degree. Consider the nvidia tensorcores, and all the different bit pathways they have. Those may share some sub parts, but seems unlikely they share everything. Also CPUs tend to be mostly SRAM cache, which has much lower activity level.

Scanning through your other post, I don't think we disagree on the physics regarding ML-relevant compute. It is a quick and simplistic analysis, yes- my intent there was really just to say "hardware bottlenecks sure don't look like they're going to arrive soon enough to matter, given the rest of this stuff." The exact amount of headroom we have left and everything that goes into that estimation just didn't seem worth including given the length and low impact. (I would have chosen differently if those details changed the conclusion of the section.)

I am curi... (read more)

2jacob_cannell4mo
Yeah it was the asspull part, which I mostly noticed as Landauer, and this: Well instead of using the asspull math, you can look at the analysis in the engineering literature. At a really high level, you can just look at the end of the ITRS roadmap. The scaling physics for CMOS are reasonably well understood and the endpoint has been known for a decade. A good reference is this [https://scholar.google.com/scholar?cluster=10773536632504446573&hl=en&as_sdt=2005&sciodt=0,5], which lists minimal transition energy around 6e-19J, and minimal switch energy around ~2e-18J (after including local interconnect) for the end of CMOS scaling. The transition energy of around 6e-19J is a few OOM larger than the minimal Landauer bound, but that bound only applies for computations that take infinite time and or have a useless failure rate of 50%. For reliable digital logic, the minimal energy is closer to the electronvolt or 1e-19J (which is why chip voltages are roughly around 1V, whereas neurons compute semi-reliably at just a few times the minimal Landauer voltage). So then if we do a very rough calculation for the upcoming RTX 4090, assuming 50% transistor activity rate, we get: (450W / (0.5 * 7.6e10 * 2.2e9)) = 5.3e-18J, so only a few times above the predicted end-of-CMOS scaling energy, not a million times above. This is probably why all TSMC's future nodes are all just 3X with some new letter, why Jensen (nvidia ceo) says moore's law is dead, etc. (Intel meanwhile says it's not dead yet, but they are 4 or 5 years behind TSMC, so it's only true for them) Now maybe there will be future miracles, but they seem to buy at best only a few OOM, which is the remaining gap to the brain, which really is pushing at the energy limit [https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know#Energy].
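
Spelling out the arithmetic in that last estimate (same numbers as quoted above; the 50% activity factor is the stated assumption, not a measured value):

```python
# Rough per-switch energy for the RTX 4090 figure quoted above:
# energy per transition ~= power / (activity * transistor_count * clock_rate).
power_w = 450.0        # board power, watts
transistors = 7.6e10   # ~76 billion transistors
clock_hz = 2.2e9       # ~2.2 GHz
activity = 0.5         # assumed fraction of transistors switching per cycle

energy_per_switch = power_w / (activity * transistors * clock_hz)
print(f"{energy_per_switch:.1e} J")  # ~5.4e-18 J, a few times the ~2e-18 J end-of-CMOS estimate
```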

I'd agree that equivalently rapid progress in something like deep reinforcement learning would be dramatically more concerning. If we were already getting such high quality results while constructing a gradient out of noisy samples of a sparse reward function, I'd have to shorten my timelines even more. RL does tend to more directly imply agency, and it would also hurt my estimates on the alignment side of things in the absence of some very hard work (e.g. implemented with IB-derived proof of 'regret bound is alignment' or somesuch).

I also agree that token... (read more)

2julkopki4mo
Maybe there is some consolation in that if humanity were to arrive at something approaching AGI, it would be better for it to do so using an architecture that's limited in its ultimate capability, demonstrates as little natural agency as possible, and ideally is a bit of a dead end in terms of further AI development. It could serve as a sort of vaccine if you will. Running with the singularity scenario for a moment, I have very serious doubts that purely theoretical research performed largely in a vacuum will yield any progress on AI safety. The history of science certainly doesn't imply that we will solve this problem before it becomes a serious threat. So the best case scenario we can hope for is that the first crisis caused by the AGI will not be fatal due to the underlying technology's limitations and manageable speed of improvement.

Yes, unfortunately there are indeed quite a few groups interested in it.

There are reasons why they haven't succeeded historically, and those reasons are getting much weaker over time. It should suffice to say that I'm not optimistic about our odds on avoiding this type of threat over the next 30 years (conditioned on no other gameboard flip).

MATH is a dataset of problems from high school competitions, which are well known to require a very limited set of math knowledge and be solvable by applying simple algorithms. 

I think you may underestimate the difficulty of the MATH dataset. It's not IMO-level, obviously, but from the original paper:

We also evaluated humans on MATH, and found that a computer science PhD student who does not especially like mathematics attained approximately 40% on MATH, while a three-time IMO gold medalist attained 90%, indicating that MATH can be challenging for hu

... (read more)

Like, it shouldn't be surprising that the LM can solve problems in text which are notoriously based around applying a short step by step algorithm, when it has many examples in the training set.

I'm not clear on why it wouldn't be surprising. The MATH dataset is not easy stuff for most humans. Yes, it's clear that the algorithm used in the cases where the language models succeeds must fit in constant time and so must be (in a computational sense) simple, but it's still outperforming a good chunk of humans. I can't ignore how odd that is. Perhaps human reaso... (read more)

1Hyperion4mo
I don't think it's odd at all - even a terrible chess bot can outplay almost all humans, because most humans haven't studied chess. MATH is a dataset of problems from high school competitions, which are well known to require a very limited set of math knowledge and be solvable by applying simple algorithms.

I know chain of thought prompting well - it's not a way to lift a fundamental constraint, it just is a more efficient targeting of the weights which represent what you want in the model.

You don't provide any proof of this, just speculation, much of it based on massive oversimplifications (if I have time I'll write up a full rebuttal). For example, RWKV is more of a nice idea that is better for some benchmarks, worse for others, than some kind of new architecture that unlocks greater overall capabilities.

Hopefully we do actually live in that reality!

I'm pretty sure the GPT confabulation is (at least in part) caused by highly uncertain probability distribution collapse, where the uncertainty in the distribution is induced by the computational limits of the model.

Basically the model is asked to solve a problem it simply can't (like, say, general case multiplication in one step), and no matter how many training iterations and training examples are run, it can't actually learn to calculate the correct answer. The result is a relatively even distribution over t... (read more)
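
A minimal illustration of that story (toy numbers of mine, not taken from any real model):

```python
# When the network can't actually compute 7829 * 4163 within its fixed budget, its
# distribution over candidate answer tokens ends up nearly flat -- but decoding still
# has to commit to one of them, and the sampled answer is stated with full confidence.
import numpy as np

rng = np.random.default_rng(0)
candidates = ["32592127", "32594127", "32589127", "32601127"]  # only the first is correct
logits = np.array([1.02, 1.00, 0.99, 1.01])                    # the model can't tell them apart

probs = np.exp(logits) / np.exp(logits).sum()
print({c: round(float(p), 3) for c, p in zip(candidates, probs)})  # roughly 0.25 each
print("sampled answer:", rng.choice(candidates, p=probs))
```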

(Jay's interpretation was indeed my intent.)

Empirically, I don't think it's true that you'd need to rely on superhuman intelligence. The latest paper from the totally anonymous and definitely not google team suggests PaL- I mean an anonymous 540B parameter model- was good enough to critique itself into better performance. Bootstrapping to some degree is apparently possible.

I don't think this specific instance of the technique is enough by itself to get to spookyland, but it's evidence that token bottlenecks aren't going to be much of a concern in the near ... (read more)
