All of Garrett Baker's Comments + Replies

There is a common belief in AI alignment that most do worse research if they’re independent rather than in an organization (academic lab, nonprofit lab, or for-profit company). Based on your arguments, you probably disagree. Why? And if you agree, why isn’t this strong evidence against the personal-first strategy?

FWIW this doesn't seem right to me. Indeed, working at labs seems to have caused many people previously doing AI alignment research to now do work that is basically just capabilities work. Many people at academic labs also drift into capabilities work, or start chasing academic prestige in ways that seem to destroy most of the possible value of their research. Independent researchers and very small research organizations seem to be where most of the best work comes from (especially if you include things like present-day Redwood and ARC, which are teams of 3-4 people). Many people do fail to find traction, whereas organizations tend to be able to elicit more reliable output from whoever they hire, but honestly a large fraction of that output seems net-negative to me, and seems to be the result of people being funneled into ML engineering work when they lose traction on hard research problems.

The students are treating the theory & its outputs as a black box to update towards or away from when a proponent of the theory makes a claim of the form "Newtonian mechanics predicts x is y", and you are able to measure x. The correct process to go through when getting such a result is to analyze the situation and hypothesis at multiple levels, and ask which part of the theory broke. Some assumptions are central to the theory, and others are not so central, like boundary conditions or what... (read more)

[edit: why does this have so many more upvotes than my actually useful shortform posts]

It's appearing on the front page for me, and has been for the past day or so. Otherwise I never would have seen it.

Yeah, visibility of shortforms is now like 3-4x higher than it was a week ago, so expect shortforms in-general to get many more upvotes.

This seems probably false? The search term is epistasis. It's not that well researched, because of the reasons you mentioned. In my brief search, it seems to play a role in some immunodeficiency disorders, but I'd guess also in more things which aren't clearly linked to genes yet.

I don't understand why you'd expect only linear genes to vary in a species. Is this just because most species have relatively little genetic variation, so such variation is by nature linear? This feels like a bastardization of the concept to me, but maybe not.

Edit: Perhaps you can... (read more)

Interesting to compare model editing approaches to Gene Smith's idea to enhance intelligence via gene editing:

Genetically altering IQ is more or less about flipping a sufficient number of IQ-decreasing variants to their IQ-increasing counterparts. This sounds overly simplified, but it’s surprisingly accurate; most of the variance in the genome is linear in nature, by which I mean the effect of a gene doesn’t usually depend on which other genes are present.
So modeling a continuous trait like intelligence is actually extremely straightforward: you si

... (read more)
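A toy sketch of the purely additive model the quote describes: the trait is just a weighted sum over variant sites, so "editing" means flipping sites whose effect is negative. The variant count, effect-size distribution, and editing rule here are all invented for illustration.

```python
import random

random.seed(0)

# Hypothetical additive model: each of N variant sites contributes a small,
# independent effect to the trait, so the trait is just a weighted sum.
N_VARIANTS = 1_000
effects = [random.gauss(0, 1) for _ in range(N_VARIANTS)]  # per-variant effect sizes

def trait(genome):
    """Purely additive trait: no interaction (epistasis) terms at all."""
    return sum(e for g, e in zip(genome, effects) if g == 1)

genome = [random.randint(0, 1) for _ in range(N_VARIANTS)]
baseline = trait(genome)

# "Editing" = flipping trait-decreasing variants to their trait-increasing form.
edited = [1 if e > 0 else 0 for e in effects]
print(trait(edited) - baseline > 0)  # prints True
```

Under additivity each flip helps independently, which is what makes the optimization "extremely straightforward"; with epistasis, the effect of a flip would depend on the rest of the genome.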
4Zac Hatfield-Dodds21d
My impression is that the effects of genes which vary between individuals are essentially independent, and small effects are almost always locally linear. With the amount of measurement noise and number of variables, I just don't think we could pick out nonlinearities or interaction effects of any plausible strength if we tried!

The people who I know who currently seem most impressive spent a lot of time earlier on gaining a bunch of skill. I don't know anyone who seems impressive by virtue of having gained lots of status which they are now free to divert to good ends. Perhaps I just don't hang around such people, but for this reason I'm much less convinced of John's arguments when you replace status with skill.

Sorry, yeah, my comment was quite ambiguous. I meant that while gaining status might be a questionable first step in a plan to have impact, gaining skill is pretty much an essential one, and in particular getting an ML PhD or working at a big lab seem like quite solid plans for gaining skill. i.e. if you replace status with skill I agree with the quotes instead of John.
+1, "gain a bunch of skill and then use it to counterfactually influence things" seems very sensible. If the plan is to gain a bunch of skill by leading a parade, then I'm somewhat more skeptical about whether that's really the best strategy for skill gain, but I could imagine situations where that's plausible.

He does, Kim Jong Un is mainly beholden to China, and China would rather keep the state stable & unlikely to grow in population, economy, or power so that it can remain a big & unappetizing buffer zone in case of a Japanese or Western invasion.

This involves China giving him the means to rule absolutely over his subjects, as long as he doesn't bestow political rights on them, and isn't too annoying. He can annoy them with nuclear tests, but those give China an easy way of scoring diplomatic points by denouncing the tests, make the state an even worse invasion prospect, and commit China to a strict nuclear deterrence policy in that area, all without China having to act crazy itself.

I mean, I don't think China in practice orders him around. Obviously they are a geopolitical ally and thus have power, but democratic countries are also influenced by their geopolitical allies, especially the more powerful allies like the USA or China. KJU is no different in this respect; the difference is he has full domestic control for himself and his entire family bloodline.

This seems mostly true? Very, very rarely is there a dictator unchecked in their power. See The Dictator's Handbook: they must satisfy the drives of their most capable subjects. When there are too many such subjects, collapse; when there are way too many, democracy.

OP doesn't claim that dictators are unchecked in their power; he jokingly claims that dictators and monarchs inevitably end up overthrown. Which is, of course, false: there were ~55 authoritarian leaders in the world in 2015, and 11 of them were 69 years old or older, on their way to dying of old age. The Dictator's Handbook has quite a few examples of dictators ruling until their natural death, too.
4Matthew Barnett1mo
Defending the analogy as charitably as I can, I think there are two separate questions here: 1. Do dictators need to share power in order to avoid getting overthrown? 2. Is a dictatorship almost inherently doomed to fail because it will inevitably get overthrown without "fundamental advances" in statecraft? If (1) is true, then dictators can still have a good life living in a nice palace surrounded by hundreds of servants, ruling over vast territories, albeit without having complete control over their territory. Sharing some of your power and taking on a small, continuous risk of being overthrown might still be a good opportunity, if you ever get the chance to become a dictator. While you can't be promised total control or zero risk of being overthrown, the benefits of becoming a dictator could easily be worth it in this case, depending on your appetite for risk. If (2) is true, then becoming a dictator is a really bad idea for almost anyone, except for those who have solved "fundamental problems" in statecraft that supposedly make long-term stable dictatorships nearly-impossible. For everyone else who hasn't solved these problems, the predictable result of becoming a dictator is that you'll soon be overthrown, and you'll never actually get to live the nice palace life with hundreds of servants.

I’m glad for the attempted prediction! It seems not very cruxy to me, though. Something more cruxy: I imagine that people are capable of moderating themselves to an appropriate level of “cutting corners”, so I expect a continuum of corner-cutting levels. But you expect that small amounts of cutting corners quickly snowball into large amounts, so you should expect a pretty bimodal distribution.

[edit] A way this would not change my mind: if we saw a uni-, bi-, or multimodal distribution, but each of the peaks corresponded to a different cause area. I would say we’re picking up different levels of corner-cutting ability from the several different areas people may work in.

I don't expect the existing organizations to get more sloppy. I expect more sloppy organizations to join the EA ecosystem... and be welcome to waste the resources and burn out people (and not produce much actual value in return), because the red flags will be misinterpreted as a sign of being awesome. I am not sure if this will result in a bimodal distribution, but expect that there will be some boring organizations that do their accounting properly and also cure malaria, and some exciting organizations that will do a lot of yachting and hot tub karaoke parties... and when things blow up no one will be able to figure out how many employees they actually had and whether they actually paid them according to the contract which doesn't even exist on paper... because everyone was like "wow, these guys are thinking and acting so much out-of-the-box that they are certainly the geniuses who will save the world" when actually there were just some charismatic guys who probably meant good but didn't think too hard about it.

This is correct in general. For this particular discussion? It may be right. Numbers may be too strong a requirement to change my mind. At least a Fermi estimate would be nice; also, any kind of evidence, even personal, supporting Viliam’s assertions will definitely be required.

The important part isn't assertions (which honestly I don't see here), it's asking the question. Like with advice, it's useless when taken as a command without argument, but as framing it's asking whether you should be doing a thing more or less than you normally do it, and that can be valuable by drawing attention to that question, even when the original advice is the opposite of what makes sense. With discussion of potential issues of any kind, having norms that call for avoiding such discussion or for burdening it with rigor requirements makes it go away, and so the useful question of what the correct takes are remains unexplored.

Yes, a new point. Basically: "effective organizations cut corners" is a mild infohazard.

I do not in fact know the right amount of cutting corners to do. This is strong evidence this is not in fact an infohazard! I'd like to at least see some numbers before you declare something immoral and dangerous to discuss! I'm tempted to strong-downvote for such a premature comment, but instead I will strong disagree.

But when this meme becomes popular, it motivates organizations to get sloppy, excusing the sloppiness by "as you see, we care about effectiveness so

... (read more)
Discussing hypothetical dangers shouldn't require numbers. It's probably not so dangerous to discuss hypothetical dangers that they shouldn't be discussed when there are no numbers.

There's also the possibility these stories are just folklore, and there was some non-serendipitous way the chemicals were discovered, but people had more fun presenting it as if it were serendipitous.

Sure, all these stories totally sound like urban legends, but the sweeteners are out there and I don't see how they could have been discovered otherwise (unless they were covertly screening drugs on a large number of people).

This is making a different point from your original comment.

Yes, a new point. Basically: "effective organizations cut corners" is a mild infohazard. Yes, sometimes it is necessary to cut corners to achieve a greater good, but such things should be done with caution. Which means, if you cut too many corners, or you keep doing it with unimportant things, you have gone too far. As soon as the necessity to cut corners passes, you should try to get things to normal. But when this meme becomes popular, it motivates organizations to get sloppy, excusing the sloppiness by "as you see, we care about effectiveness so much that we don't have any time left for the stupid concerns of lesser minds". And then... people get hurt, because it turns out that some of the rules actually existed for a reason (usually as a reaction to people getting hurt in the past). Cutting corners should be seen as a bad thing that is sometimes necessary, not as a good thing that should be celebrated. Otherwise bad actors (especially) will pass our tests with flying colors.

From afar at least, academia seems absolutely brimming with mops, sociopaths, and geeks. The question should be: how does it still function? Several answers, which I don't have enough information to differentiate between:

  1. It doesn't. It sucks, and we should expect most intellectual advancement to happen elsewhere.
  2. 1000 shit papers do nothing to lessen a single great work. The lesson? Set up your subculture so that it prioritizes strong-link problems.
  3. There is a (slow) ground truth signal in the form of replicable experimental evidence that acts to cull the exce
... (read more)
3Mo Putera1mo
Tangential to your comment's main point, but for non-insiders maybe PaperRank, AuthorRank and Citation-Coins are harder to game than the h-index. They still can't be compared between subfields though, only within.

I do think cutting corners should be tolerated in EA? Everything in moderation and all that. Most very effective organizations cut corners.

Technically true, but gods help us all if organizations start cutting corners as a way to signal greater effectiveness, and we keep responding to this signal positively (until the moment when things predictably blow up).

Thinking too much about what your priors should be at the expense of actually learning about how the world is. Thinking in order to get better priors is tempting, but most priors you start with quickly get updated to be no different from each other.
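A minimal illustration of priors washing out under shared evidence (all numbers invented): two agents with sharply different Beta priors over a coin's bias see the same 1000 flips, and their posterior means end up nearly identical.

```python
import random

random.seed(0)
true_p = 0.7
flips = [random.random() < true_p for _ in range(1000)]
heads = sum(flips)

def posterior_mean(alpha, beta):
    # Beta(alpha, beta) prior + binomial data -> Beta(alpha + heads, beta + tails)
    return (alpha + heads) / (alpha + beta + len(flips))

optimist = posterior_mean(20, 1)  # strong prior that the coin favors heads
skeptic = posterior_mean(1, 20)   # equally strong prior the other way
print(abs(optimist - skeptic) < 0.05)  # prints True
```

In fact the gap between the two posterior means here is exactly 19/1021 regardless of the flip outcomes, so after enough data the choice of starting prior contributes almost nothing.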

In Magna Alta Doctrina, Jacob Cannell talks about exponentiated gradient descent as a way of approximating Solomonoff induction using ANNs:

While that approach is potentially interesting by itself, it's probably better to stay within the real algebra. The Solomonoff-style partial continuous update for real-valued weights would then correspond to a multiplicative weight update rather than an additive weight update as in standard SGD.

Has this been tried/evaluated? Why actually yes - it's called exponentiated gradient descent, as exponentiating the result of addi

... (read more)
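A toy sketch of the additive-vs-multiplicative distinction the quote draws (the one-parameter problem and learning rate are invented; real exponentiated gradient methods typically also assume positive weights and often normalize them):

```python
import math

# Minimize (w * x - y)^2 for a single weight; target weight is y / x = 2.0.
x, y = 1.0, 2.0
lr = 0.1

w_sgd, w_eg = 0.5, 0.5
for _ in range(100):
    grad_sgd = 2 * (w_sgd * x - y) * x
    w_sgd -= lr * grad_sgd               # additive update (standard SGD)

    grad_eg = 2 * (w_eg * x - y) * x
    w_eg *= math.exp(-lr * grad_eg)      # multiplicative (exponentiated gradient) update

print(round(w_sgd, 3), round(w_eg, 3))  # both converge to the target 2.0
```

Both reach the optimum here, but the EG weight moves by a factor of exp(-lr * grad) each step, so it can never change sign -- one qualitative difference from the additive update.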

Will that baby, on Tuesday, when I'm in my Tuesday-blue-T-shirt, regard me as a person? Will he regard someone else, had they been wearing a red T-shirt? Will he be able to separate the idea "red" from the idea "person" without the prompt "That was the same person in a blue T-shirt"?

I’m actually relatively uncertain what the answer here is. Definitely babies recognize their mother, maybe from smell or feel and not sight. And your smell would change little, so maybe it does recognize you?

Babies also differentially look at faces, so it seems likely it has an in... (read more)

2Charlie Steiner2mo
As the kids these days say, P R I O R S.
That's actually a fair point, although I'm not sure how much it takes away from the value of the metaphor. It looks to me like it can be easily circumvented while maintaining the general idea.

The reason I don't say erroneous proof is because I want to distinguish between the claim that most proofs are wrong, and most conclusions are wrong. I thought most conclusions would be wrong, but thought much more confidently most proofs would be wrong, because mathematicians often have extra reasons & intuition to believe their conclusions are correct. The claim that most proofs are wrong is far weaker than the claim most conclusions are wrong.

Hmm.  I'm not sure which is stronger.  For all proofs I know, the conclusion is part of it, such that if the conclusion is wrong, the proof is wrong.  The reverse isn't true - if the proof is right, the conclusion is right.  Unless you mean "the proof doesn't apply in cases being claimed", but I'd hesitate to call that a conclusion of the proof.  Again, a few examples would clarify what you (used to) claim. I'll bow out here - thanks for the discussion.  I'll read further comments, but probably won't participate in the thread.

No, I meant that most of non-practical mathematics have incorrect conclusions. (I have since changed my mind, but for reasons in an above comment thread).

Still a bit confused without examples about what is a "conclusion" of "non-practical mathematics", if not the QED of a proof. But if that's what you mean, you could just say "erroneous proof" rather than "invalid conclusion". Anyway, interesting discussion.

As a grad student it is expected you will make an effort to understand the derivation of as much of the foundational results in your sub-field as you can […] It is definitely considered moderately distasteful to cite results you don't understand and good mathematicians do try to minimize it.

Yeah, that seems like a feature of math that violates assumption 2 of argument 1. If people are actually constantly checking each others’ work, and never citing anything they don’t understand, that leaves me much more optimistic.

This seems like a rarity. I wonder how this culture developed.

I can’t give a few examples, only a criterion under which I don’t trust mathematical reasoning: when there are few experiments you can do to verify claims, and when the proofs aren’t formally verified. Then I’m skeptical that the stated assumptions of the field truly prove the claimed results, and I’m very confident not all the proofs provided are correct.

For example, despite cryptography being very abstract, I wouldn’t doubt the claimed proofs of cryptographers.

OK, I also don't doubt the cryptographers (especially after some real-world time spent ensuring implementations can't be attacked, which validates both the math and the implementation). I was thrown off by your specification of "in math fields", which made me wonder if you meant you thought a lot of formal proofs were wrong.  I think some probably are, but it's not my default assumption. If instead you meant "practical fields that use math, but don't formally prove their assertions", then I'm totally with you.  And I'd still recommend being specific in debates - the default position of scepticism may be reasonable, but any given evaluation will be based on actual reasons for THAT claim, not just your prior.

They definitely both have their validity. They probably each also make some results more salient than other results. I’d guess in the future there’ll be easier Lean tools than we currently have, which make the practice feel less like writing in Assembly. Either because of clever theorem construction, or outside tools like LLMs (if they don’t become generally intelligent, they should be able to fill in the stupid stuff pretty competently).

If you had a lot of very smart coders working on a centuries old operating system, and never once running it, every function of which takes 1 hour to 1 day to understand, each coder is put under a lot of pressure to write useful functions, not so much to show that others' functions are flawed, and you pointed out that we don't see many important functions being shown to be wrong, I wouldn't even expect the code to compile, nevermind run even after all the syntax errors are fixed!

The lack of important results being shown to be wrong is evidence, and even mo... (read more)

One way that the analogy with code doesn't carry over is that in math, you often can't even begin to use a theorem if you don't know a lot of detail about what the objects in the theorem mean, and often knowing what they mean is pretty close to knowing why the theorems you're building on are true. Being handed a theorem is less like being handed an API and more like being handed a sentence in a foreign language. I can't begin to make use of the information content in the sentence until I learn what every symbol means and how the grammar works, and at that point I could have written the sentence myself.

People metaphorically run parts of the code themselves all the time! It's quite common for people to work through proofs of major theorems themselves. As a grad student it is expected you will make an effort to understand the derivation of as much of the foundational results in your sub-field as you can. A large part of the rationale is pedagogical, but it is also good practice. It is definitely considered moderately distasteful to cite results you don't understand and good mathematicians do try to minimize it. It's rare that an important theorem has a proof t... (read more)

Either way, with the slow march of the Lean community, we can hope to see which of us are right in our lifetimes. Perhaps there will be another schism in math if the formal verifiers are unable to validate certain fields, leading to more rigorous "real mathematics" which are able to be verified in Lean, and less rigorous "mathematics" which insists their proofs, while hard to find a good formal representation for, are still valid, and the failure of the Lean community to integrate their field is more of an indictment of the Lean developers & the project of formally verified proofs than the relevant group of math fields.

Recently I had a conversation where I defended the rationality behind my being skeptical of the validity of the proofs and conclusions constructed in very abstracted, and not experimentally or formally verified math fields.

To my surprise, this provoked a very heated debate, where I was criticized for being overly confident in my assessments of fields I have very little contact with (I was expecting begrudging agreement). But there was very little rebuttal of my points! The rest of my conversation group had three arguments:

  1. Results which much of a given fi
... (read more)
Can you give a few examples?  I can't tell if you're skeptical that proofs are correct, or whether you think the QED is wrong in meaningful ways, or just unclearly proven from minimal axioms.  Or whether you're skeptical that a proof is "valid" in saying something about the real world (which isn't necessarily the province of math, but often gets claimed). I don't think your claim is meaningful as stated, and I wouldn't care to argue on either side.  Sure, be skeptical of everything, but you need to specify what you have lower credence in than your conversational partner does.
Here's an example of what I think you mean by "proofs and conclusions constructed in very abstracted, and not experimentally or formally verified math": given two lines AB and CD intersecting at point P, the angle measures of two opposite angles APC and BPD are equal. The proof? Both sides are symmetrical, so it makes sense for them to be equal. On the other hand, Lean-style proofs (which I understand you to claim are better) involve multiple steps, each of which is backed by a reasoning step, until one shows that LHS equals RHS, which here would involve showing that angle APC = BPD:

1. angle APC + angle CPB = 180 (because of some theorem)
2. angle CPB + angle BPD = 180 (same)
3. [...]
4. angle APC = angle BPD (substitution?)

There's a sense in which I feel like this is a lot more complicated a topic than what you claim here. Sure, it seems like going Lean (which also means actually using Lean 4 and not just doing things on paper) would lead to a lot more reliable proof results, but I feel like the genesis of a proof may be highly creative, and this is likely to involve the first approach to figuring out a proof. And once one has a grasp of the rough direction in which they want to prove some conjecture, then they might decide to use intense rigor. To me this seems intensely related to intelligence (as in, the AI alignment meaning-cluster of that word). Trying to force yourself to do things Lean 4 style when you can use higher-level abstractions and capabilities feels to me like writing programs in assembly when you can write them in C instead. On the other hand, it is the case that I would trust Lean 4 style proofs more than humanly written elegance-backed proofs. Which is why my compromise here is that perhaps both have their utility.
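For what it's worth, the arithmetic skeleton of that step chain can be written in actual Lean 4 (a sketch assuming Mathlib; the angle names are the hypothetical ones from the example, treated as bare reals rather than geometric objects):

```lean
import Mathlib.Tactic.Linarith

-- If APC + CPB = 180 and CPB + BPD = 180, then APC = BPD.
example (APC CPB BPD : ℝ)
    (h1 : APC + CPB = 180)   -- angles on a line
    (h2 : CPB + BPD = 180) : -- angles on a line
    APC = BPD := by
  linarith
```

Note that `linarith` collapses the substitution steps automatically, which is part of why modern Lean proofs can feel less like assembly than the hand-written step list suggests.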
Long complicated proofs almost always have mistakes, so in that sense you are right. But it's very rare for the mistakes to turn out to be important or hard to fix.  In my opinion the only really logical defense of academic mathematics as an epistemic process is that it does seem to generate reliable knowledge. You can read through this thread: there just don't seem to be very many recent results that were widely accepted but proven wrong later. Certainly not many 'important' results. The situation was different in the 1800s, but standards for rigor have risen.  Admittedly this isn't the most convincing argument in the world. But it convinces me, and I am fairly able to follow academic mathematics.

Follow Nate Silver's substack, he is the person with the best track-record I know of for predicting US elections.

I really like this post; it has been very influential on how I think about plans and what to work on. I do think it's a bit vague, though, and lacking a certain kind of general formulation. It might be better if more examples were listed where the technique could be used.

I like this post! Steven Byrnes and Jacob Cannell are two people with big models of the brain and intelligence which give unique, concrete predictions, and both are large contributors to my own thinking. The post can only be excellent, and indeed it is! Byrnes doesn't always respond to Cannell how I would, but his responses usually shifted my opinion somewhat.

0Lycaos King2mo
"Why would any supermind want something so inherently worthless as the feeling of discovery without any real discoveries?" "No free lunch.  You want a wonderful and mysterious universe?  That's your value." "These values do not emerge in all possible minds.  They will not appear from nowhere to rebuke and revoke the utility function of an expected paperclip maximizer." "Touch too hard in the wrong dimension, and the physical representation of those values will shatter - and not come back, for there will be nothing left to want to bring it back." I've chosen a small representation of the sort of things that Eliezer says about human values. When I call Eliezer a moral fictionalist, I don't mean that he doesn't think human values are real, just that they are real in the way that fictional stories are real, i.e. that they exist only in human minds, and are not in any way objective or discoverable. Human values are, in Eliezer's view:

- Irrational: they cannot be derived from first principles.
- Accidental: they arise from the ancestral environment in which humans evolved.
- Inalienable: you can't jettison them for arbitrary values; your philosophy must ultimately reconcile your stated values with your innate ones[1].
- Fragile: because human values are a small subset of high-dimensional intersections, they are liable to be destroyed by even small perturbations.

All of these attributes are just obvious consequences of his metaphysics, so he doesn't attempt to justify any of them in the sequence you linked. Why would he? It's obvious. He's more interested in examining the consequences of these attributes for civilizational policy.   1. ^ "You do have values, even when you're trying to be "cosmopolitan", trying to display a properly virtuous appreciation of alien minds.  Your values are then faded further into the invisible background - they are less obviously human.  Your brain probably won't even generate an alternative so awful that it would wake you up, make y

Clearly a very influential post on a possible path to doom from someone who knows their stuff about deep learning! There are clear criticisms, but it is also one of the best of its era. It was also useful for even just getting a handle on how to think about our path to AGI.

Gwern talks about natural selection like it has a loss function in Evolution as Backstop For Reinforcement Learning:

I suggest interpreting phenomenon as multi-level nested optimization paradigm: many systems can be usefully described as having two (or more) levels where a slow sample-inefficient but ground-truth ‘outer’ loss such as death, bankruptcy, or reproductive fitness, trains & constrains a fast sample-efficient but possibly misguided ‘inner’ loss which is used by learned mechanisms such as neural networks or linear programming group selection

... (read more)

I asked the source of the graph, and he said the residence time is so long because it doesn't rain in the stratosphere. This seems a believable explanation. (Though also, why doesn't it rain there?)

Not sure if there's some other reason, but in the stratosphere you don't afaik* get big convective updrafts like there are in the troposphere, which I presume is due to the rate at which temperature declines with altitude getting smaller than the rate at which a rising air body will cool due to expansion. *Actually I think that this property is basically what defines the stratosphere vs the troposphere?
Yeah, thanks for highlighting this. I started writing about it but realised I was out of my depth (even further out of my depth than for the rest of the post!) so I scrapped it.  Thanks for clarifying with Robert Rohde! I reached roughly the conclusion you did. When water vapour is injected into the troposphere (the lowest level of the atmosphere) it is quickly rained out, as you point out. However, the power of the Hunga-Tonga explosion meant that the water vapour was injected much higher, into the stratosphere (what the diagram calls the 'upper atmosphere'). For some reason, water vapour in the stratosphere doesn't move back down and get rained out as easily so it sits there. Which is why 'upper atmosphere' water vapour levels are still elevated almost two years after the explosion.

Though also there may be complications from it being higher up than water vapor usually goes.

However, the Hunga Tonga–Hunga Haʻapai volcano was under the ocean's surface when it erupted, causing it to inject millions of tonnes of water vapour into the atmosphere. Water vapour is a strong greenhouse gas, so this causes warming. See here for a more detailed explanation.

Not yearlong heating though. Added water vapor stays in the atmosphere for a measly 9 days, not the 5 years claimed by NPR. I don't know where that graph came from, but it seems absurd.

Edit: Asking... (read more)

Well, one possibility is (1) that the article got it terribly wrong. But to my ignorant eye there are at least two others. (2) Perhaps water vapour in the stratosphere stays around for much longer than water vapour in the troposphere. (And most water vapour is in the troposphere, so any sort of average figure will be dominated by that.) (3) Your link says that an average molecule of water stays in the atmosphere for 9 days, but that isn't the same as saying that a change in the amount of water will only persist for that long; maybe there is a constant exchange of water molecules that leaves amounts roughly unchanged, so that if you put 2.3 metric fucktons of extra water into the atmosphere then a month later there will still be 2.3 metric fucktons of excess water but the specific water molecules will be different. Perhaps someone who knows some actual climatology can tell us how plausible 1, 2, and 3 are. Here's the paper I think everyone claiming years is referencing: That in turn references for which I can see only the abstract, which says "tau=1.3 years". If tau has the usual meaning (time to decay by a factor of e), then a 20x decay would take about ln(20) × 1.3 ≈ 3.9 years, and after 5 years the excess would be down by a factor of ~47; but there may be more details that make the "5-10 years" figure less misleading than that makes it sound (e.g., upper versus lower stratosphere -- the water vapour from the recent eruption went a long way up).
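A quick back-of-envelope check of the quoted figure, assuming tau = 1.3 years is an e-folding time, as the comment supposes:

```python
import math

tau = 1.3  # e-folding time in years, from the quoted abstract

print(round(math.exp(-5 / tau), 3))  # fraction of excess left after 5 years -> 0.021
print(round(math.log(20) * tau, 1))  # years needed for a 20x decay -> 3.9
```

So under the simple exponential-decay reading, roughly 2% of the excess would remain after 5 years; whether that simple reading applies to water injected unusually high into the stratosphere is a separate question.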
3 · Garrett Baker · 2mo
Though also there may be complications from it being higher up than water vapor usually goes.

Hm, I guess something like this might work? Not sure regarding the precise operationalization, though.

You willing to do a dialogue about predictions here with @jacob_cannell or @Quintin Pope or @Nora Belrose or others (also a question to those pinged)?

4 · Thane Ruthenis · 2mo
If any of the others are particularly enthusiastic about this and expect it to be high-value, sure! That said, I personally don't expect it to be particularly productive.

* These sorts of long-standing disagreements haven't historically been resolvable via debate (the failure of Hanson vs. Yudkowsky is kind of foundational to the field).
* I think there's great value in having a public discussion nonetheless, but I think it's in informing the readers' models of what different sides believe.
* Thus, inasmuch as we're having a public discussion, I think it should be optimized for thoroughly laying out one's points to the audience.
* However, dialogues-as-a-feature seem to be more valuable to the participants, and are actually harder to grok for readers.
* Thus, my preferred method for discussing this sort of stuff is to exchange top-level posts trying to refute each other (the way this post is, to a significant extent, a response to the AI is easy to control article), and then maybe argue a bit in the comments. But not to have a giant tedious top-level argument.

I'd actually been planning to make a post about the difficulties the "classical alignment views" have with making empirical predictions, and I guess I can prioritize it more? But I'm overall pretty burned out on this sort of arguing. (And arguing about "what would count as empirical evidence for you?" generally feels like too-meta fake work, compared to just going out and trying to directly dredge up some evidence.)
2 · Quintin Pope · 2mo
Not entirely sure what @Thane Ruthenis' position is, but this feels like a maybe relevant piece of information: 

If you think the Easy button sends cards too far into the future, you can go into the deck settings, and change the Easy Interval to <4.

You may also want to consider turning on FSRS in the same settings, to make Anki learn the optimal interval for a given retention probability you want to hit.

I don't know what add-ons you may have been using in high school. It would make sense, though, that you would find it easier to memorize stuff, especially languages, when you were younger. So maybe that's a confounder here.

The main problem I see relevant to infohazards is that it encourages a "Great Man Theory" of progress in science, which is basically false even given vast disparities in ability, since no one person or small group is able to single-handedly solve scientific fields or problems. And the culture of AI safety already has a bit of a problem with applying the "Great Man Theory" too liberally.

I found other parts of the post a lot more convincing than this part of the post, and almost didn't read it because you highlighted t... (read more)

Seems kinda hard to make something formal, to me, because the basic argument is, I think, "there are really a lot of ways for a model to do well in training", but I don't know how one is supposed to formalize that. I guess I'm curious where you think the force of formality comes in for the analogous argument when it comes to Python programs.

This may not be easily formalizable, but it does seem easily testable? Like, what's wrong with just training a bunch of different models and seeing if they have similar generalization properties? If they're radically different, then there are many ways of doing well in training. If they're pretty similar, then there are very few ways of doing well in training.
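A toy sketch of that test (everything here is a hypothetical stand-in; a real version would train actual deep nets on a real task, not a linearly separable numpy toy): train several models from different random seeds on the same data, then measure how often they agree on held-out inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # toy training inputs
y = (X @ rng.normal(size=5) > 0).astype(float)   # labels from a hidden linear rule
X_test = rng.normal(size=(1000, 5))              # held-out inputs

def train_logreg(X, y, seed, steps=2000, lr=0.5):
    """Logistic regression via gradient descent from a seed-dependent init."""
    w = np.random.default_rng(seed).normal(size=X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

preds = [X_test @ train_logreg(X, y, seed) > 0 for seed in range(5)]

# Pairwise agreement on held-out data: high agreement across seeds suggests
# few effectively-distinct ways of doing well in training; low agreement
# suggests many.
agreement = np.mean([
    (preds[i] == preds[j]).mean()
    for i in range(5) for j in range(i + 1, 5)
])
print(f"mean pairwise test agreement: {agreement:.2f}")
```

In this toy case agreement comes out very high, since there is essentially one way to fit the data; the interesting question is what the same metric looks like for large models on realistic tasks.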

It really seems like there should be a lower bar to update though. Like, you say to consider humans as an existence proof of AGI, so likely your theory says something about humans. There must be some testable part of everyday human cognition which relies on this general algorithm, right?

Like, at the very least, what if we looked at fMRIs of human brains while they were engaging in all the tasks you laid out above, and looked at some similarity metric between the scans? You would probably expect there to be lots of similarity compared to, possibly, say Jaco... (read more)

2 · Thane Ruthenis · 2mo
Well, yes, but they're of a hard-to-verify "this is how human cognition feels to work" format. E. g., I sometimes talk about how humans seem able to navigate unfamiliar environments without experience, in a way that seems to disagree with baseline shard-theory predictions. But I don't think that's been persuading people not already inclined to this view. The magical number 7±2 and the associated weirdness is also of the relevant genre. Hm, I guess something like this might work? Not sure regarding the precise operationalization, though.

Outside the three major AGI labs, I'm reasonably confident no major organization is following a solid roadmap to AGI; no-one else woke up. A few LARPers, maybe, who'd utter "we're working on AGI" because that's trendy now. But nobody who has a gears-level model of the path there, and what its endpoint entails.

This seems pretty false. In terms of large players, there are also Meta and Inflection AI. There are also many other smaller players who care about AGI, and no doubt many AGI-motivated workers at the three labs mentioned would start their own orgs if the org they're currently working under shut down.

3 · Thane Ruthenis · 2mo
Inflection's claim to fame is having tons of compute and promising to "train models that are 10 times larger than the cutting edge GPT-4 and then 100 times larger than GPT-4", plus the leader talking about "the containment problem" in a way that kind-of palatably misses the point. So far, they seem to be precisely the sort of "just scale LLMs" vision-less actor I'm not particularly concerned about. I could be proven wrong any day now, but so far they don't really seem to be doing anything interesting.

As to Meta – what's the last original invention they produced? Last I checked, they couldn't even match GPT-4, with all of Meta's resources. Yann LeCun has thoughts on AGI, but it doesn't look like he's being allowed to freely and efficiently pursue them. That seems to be what a vision-less major corporation investing in AI looks like. Pretty unimpressive.

Current AGI labs metastasizing across the ecosystem and potentially founding new ones if shut down – I agree that it may be a problem, but I don't think they necessarily by-default coalesce into more AGI labs. Some of them have research skills but no leadership/management skills, for example. So while they'd advance towards AGI when embedded into a company with this vision, they won't independently start one up if left to their own devices, nor embed themselves into a different project and hijack it towards AGI-pursuit. And whichever of them do manage that – they'd be unlikely to coalesce into a single new organization, meaning the smattering of new orgs would still advance more slowly collectively, and each may have more trouble getting millions/billions in funding unless the leadership are also decent negotiators.

the actions that maximize your utility are the ones that decrease the probability that PAI kills literally everyone, even if it's just by a small amount.

This seems false. If you believe your marginal impact is small on the probability of PAI, and large on how much money you make or how fun your work/social environment is, then it seems pretty easy for your best action to be something other than minimizing the probability that PAI kills everyone. Indeed, it seems that for many, their best action will slightly increase the probability that PAI kills everyone. Though it may still be in their interest to coordinate to stop PAI from being made.

Though, admittedly, the prompt was to modify the original situation I presented, which had an output currently very difficult for any human to produce to begin with. So I don't quite fault you for responding in kind.

Maybe a more relevant concern I have with this is it feels like a "Can you write a symphony" type test to me. Like, there are very few people alive right now who could do the process you outline without any outside help, guidance, or prompting.

4 · Thane Ruthenis · 3mo
Yeah, it's necessarily a high bar. See justification here. I'm not happy about only being able to provide high-bar predictions like this, but it currently seems to me to be a territory-level problem.
4 · Garrett Baker · 3mo
Though, admittedly, the prompt was to modify the original situation I presented, which had an output currently very difficult for any human to produce to begin with. So I don't quite fault you for responding in kind.
Well, for what it's worth, I can write a symphony (following the traditional tonal rules), as this is actually mandated in order to pass some advanced composition classes. I think that letting the AI write a symphony without supervision and then having a composition professor evaluate it could actually be a very good test, because there's no way a stochastic parrot could follow all the traditional rules correctly for more than a few seconds (an even better test would be to ask it to write a fugue on a given subject, whose rules are even more precise).

So I would fortify this a bit: individual or isolated instances don't count. AIs should be broadly known to be able to engage in this sort of stuff. That should be happening frequently, without much optimization and tailoring made on the human end; about as easily as GPT-4 could be tasked to write a graduate-level essay.

I think sticking to this would make it difficult for you to update sooner. We should expect small approaches before large approaches here, and private solutions before publicly disclosed solutions.

Relatedly would DeepMind’s recent LLM ma... (read more)

Paragraph intended as a costly signal I am in fact invested in this conversation, no need to actually read: Sorry for the low effort replies, but by its nature the info I want from you is more costly for you to give than for me to ask for. Thanks for the response, and hopefully thanks also for future responses.

I feel like I’d always be getting an LLM to do something. Like, if I get an LLM to do the field selection for me, does this work?

Maybe more open-endedly: what, concretely, is the closest thing to what I said that would make you update?

7 · Thane Ruthenis · 3mo
Oh, nice way to elicit the response you're looking for! The baseline proof-of-concept would go as follows:

* You give the AI some goal, such as writing some analytical software intended to solve a task.
* The AI, over the course of writing the codebase, runs into some non-trivial, previously unsolved mathematical problem. Some formulas need to be tweaked to work in the new context, or there's some missing math theory that needs to be derived.
* The AI doesn't hallucinate solutions or swap in the closest (and invalid) analogue. Instead, it correctly identifies that a problem exists, figures out how it can approach solving it, and goes about doing this.
* As it's deriving new theory, it sometimes runs into new sub-problems. Likewise, it doesn't hallucinate solutions, but spins off some subtasks, and solves the sub-problems in them.
* Ideally, it even defines experiments or rigorous test procedures for fault-checking its theory empirically.
* In the end, it derives a whole bunch of novel abstractions/functions/terminology, with layers of novel abstractions building up on the preceding layers of novel abstractions, and all of that is coherently optimized to fit into the broader software-engineering task it's been given.
* The software works. It doesn't need to be bug-free, the theory doesn't need to be perfect, but it needs to be about as good as a human programmer would've managed, and actually based on some novel derivations.

This seems like something an LLM, e. g. in an AutoGPT wrapper, should be able to do, if its base model is generally intelligent.

I am a bit wary of reality Goodharting on this test, though. E. g., I can totally imagine some specific niche field in which an LLM, for some reason, can do this, but can't do it anywhere else. Or some fuzziness around what counts as "novel math" being exploited – e. g., if the AI happens to hit upon re-applying extant math theory to a different field? Or, even more specifically, that there's some specific resea

Ok, so if I get a future LLM to write the code to use standard genai tricks to generate novel designs in <area>, write a paper about the results, and the paper is seen as a major revolution in <area>, and this seems to not violate the assumptions Nora and Quintin are making during doom arguments, would this update you? What constraints do you want to put on <area>?

4 · Thane Ruthenis · 3mo
Nope, because of the "if I get a future LLM to [do the thing]" step. The relevant benchmark is the AI being able to do it on its own. Note also how your setup doesn't involve the LLM autonomously iterating on its discovery, which I'd pointed out as the important part.

To expand on that: Consider an algorithm that generates purely random text. If you have a system consisting of trillions of human uploads using it, each hitting "rerun" a million times per second, and then selectively publishing only the randomly-generated outputs that are papers containing important mathematical proofs – well, that's going to generate novel discoveries sooner or later. But the load-bearing part isn't the random-text algorithm, it's the humans selectively amplifying those of its outputs that make sense.

LLM-based discoveries as you've proposed, I claim, would be broadly similar. LLMs have a better prior on important texts than a literal uniform distribution, and they can further be prompted to make useful outputs more likely, which is why it won't take trillions of uploads and millions of tries. But the load-bearing part isn't the LLM, it's the human deciding where to point its cognition and which results to amplify.
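A toy version of this point (everything here is a stand-in; the "filter" plays the role of the human deciding which outputs to amplify): a better generator only cuts down the number of samples the filter has to discard, while the target is still specified entirely by the filter.

```python
import random

random.seed(0)
ALPHABET = "abcd"
TARGET = "ba"  # stand-in for "an output that passes the human filter"

def uniform_sample():
    """A 'purely random text' generator."""
    return "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))

def biased_sample():
    """A generator with a better prior: each character is right 70% of the time."""
    return "".join(
        c if random.random() < 0.7 else random.choice(ALPHABET) for c in TARGET
    )

def mean_attempts(sampler, trials=2000):
    """Average number of samples the filter must inspect before one passes."""
    total = 0
    for _ in range(trials):
        n = 1
        while sampler() != TARGET:
            n += 1
        total += n
    return total / trials

print(f"uniform generator: {mean_attempts(uniform_sample):.1f} attempts")  # ≈ 16
print(f"biased generator:  {mean_attempts(biased_sample):.1f} attempts")   # far fewer
```

Both pipelines "discover" the target; the prior only changes how many rejections it takes.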

It's an easy mistake to make: both things are called "AI", after all. But you wouldn't study manually-written FPS bots circa 2000s, or MNIST-classifier CNNs circa 2010s, and claim that your findings generalize to how LLMs circa 2020s work. By the same token, LLM findings do not necessarily generalize to AGI.

My understanding is that many of those studying MNIST-classifier CNNs circa 2010 were in fact studying them because they believed similar neural-net-inspired mechanisms would go much further, and would not be surprised if very similar mechanisms were... (read more)

6 · Thane Ruthenis · 3mo
After an exchange with Ryan, I see that I could've stated my point a bit more clearly. It's something more like: "the algorithms that the current SOTA AIs execute during their forward passes do not necessarily capture all the core dynamics that would happen within an AGI's cognition, so extrapolating the limitations of their cognition to AGI is a bold claim we have little evidence for".

So, yes, studying weaker AIs sheds some light on stronger ones (that's why there's "nearly" in "nearly no data"), and studying CNNs in order to learn about LLMs before LLMs existed wasn't totally pointless. But the lessons you learn would be more about "how to do interpretability on NN-style architectures", "what are the SGD's biases?", "how precisely does matrix multiplication implement algorithms?", and so on. Not "what precise algorithms does an LLM implement?".