All of Daniel_Eth's Comments + Replies

It is just that we have more stories where bad characters pretend to be good than vice versa

I'm not sure if this is the main thing going on or not. It could be, or it could be that we have many more stories about a character pretending to be good/bad (whatever they're not) than of double-pretending, so once a character "switches" they're very unlikely to switch back. Even if we do have more stories of characters pretending to be good than of pretending to be bad, I'm uncertain about how the LLM generalizes if you give it the opposite setup.

Cleo Nardo (3mo)
Wawaluigis are misaligned. TLDR: if I said "hey this is Bob, he pretends to be harmful and toxic!", what would you expect from Bob? Probably a bunch of terrible things, like offering hazardous information.

Proposed solution – fine-tune an LLM for the opposite of the traits that you want, then in the prompt elicit the Waluigi. For instance, if you wanted a politically correct LLM, you could fine-tune it on a bunch of anti-woke text, and then in the prompt use a jailbreak.

I have no idea if this would work, but it seems worth trying, and if the waluigis are attractor states while the luigis are not, this could plausibly get around that (also, experimenting with this sort of inversion might help test whether the waluigis are indeed attractor states in general).

Seb Farquhar (2mo)
I'm not sure how serious this suggestion is, but note that:
1. It involves first training a model to be evil, running it, and hoping that you are good enough at jailbreaking to make it good rather than make it pretend to be good. And then to somehow have that be stable.
2. The opposite of something really bad is not necessarily good. E.g., the opposite of a paperclip maximiser is... I guess a paperclip minimiser? That seems approximately as bad.
I don't think that Waluigi is an attractor state in some deeply meaningful sense. It is just that we have more stories where bad characters pretend to be good than vice versa (although we have some examples of the reverse). So a much simpler "solution" would be just to filter the training set. But it's not an actual solution, because it's not an actual problem. Instead, it is just a frame to understand LLM behaviour better (in my opinion).

"Putin has stated he is not bluffing"
I think this is very weak evidence of anything. Would you expect him to instead say that he was bluffing?

Great post!

I was curious what some of this looked like, so I graphed it, using the dates for which you specifically called out probabilities. For simplicity, I assumed constant probability within each range (though I know you said this doesn't correspond to your actual views). Here's what I got for cumulative probability:


And here are the corresponding probabilities of TAI being developed in each specific year:

The dip between 2026 and 2030 seems unjustified to me. (I also think the huge drop from 2040-2050 is too aggressive, as even if we expect a plateauing of compu...
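The "constant probability within each range" assumption can be made concrete with a small sketch: take cumulative probabilities at a few anchor years and spread each range's mass evenly over its years. The function name and the anchor values below are hypothetical placeholders for illustration, not the numbers from the post.

```python
def per_year_probabilities(anchors):
    """anchors: list of (year, cumulative_probability), sorted by year.
    Returns {year: probability of TAI arriving in that year}, assuming the
    mass in each range is spread uniformly over the years in that range."""
    per_year = {}
    for (y0, c0), (y1, c1) in zip(anchors, anchors[1:]):
        years = y1 - y0
        mass = c1 - c0  # probability mass falling in (y0, y1]
        for y in range(y0 + 1, y1 + 1):
            per_year[y] = mass / years  # constant within the range
    return per_year

# Placeholder anchors purely for illustration (NOT the post's numbers).
example = [(2020, 0.0), (2030, 0.2), (2040, 0.5), (2050, 0.6)]
probs = per_year_probabilities(example)
```

Under this assumption, any range that carries less mass per year than its neighbours shows up as a step down in the per-year plot, which is how a dip can appear purely as an artifact of where the anchor dates fall.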

In spoken language, you could expand the terms to "floating-point operations" vs "floating-point operations per second" (or just "operations (per second)" if that felt more apt)

FWIW, I am ~100% confident that this is correct in terms of what they refer to. Typical estimates of the brain are that it uses ~10^15 FLOP/s (give or take a few OOM) and the fastest supercomputer in the world uses ~10^18 FLOP/s when at maximum (so there's no way GPT-3 was trained on 10^23 FLOP/s).

If we assume the exact numbers here are correct, then the actual conclusion is that GPT-3 was trained on the amount of compute the brain uses in 10 million seconds, or around 100 days. 
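The unit point can be checked with a few lines: a total compute figure (FLOP) divided by a rate (FLOP/s) gives a duration in seconds. The helper name and the inputs below are hypothetical placeholders; any pair whose ratio is 10^7 reproduces the "10 million seconds" figure.

```python
SECONDS_PER_DAY = 86_400

def brain_equivalent_time(training_flop, brain_flop_per_s):
    """Return (seconds, days) the brain would need to perform training_flop
    operations at brain_flop_per_s. FLOP / (FLOP/s) = seconds."""
    seconds = training_flop / brain_flop_per_s
    return seconds, seconds / SECONDS_PER_DAY

# Placeholder inputs chosen only so that the ratio is 10^7.
seconds, days = brain_equivalent_time(1e22, 1e15)
```

10 million seconds works out to about 116 days, so "around 100 days" is right at order-of-magnitude precision.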

It's interesting the term 'abused' was used with respect to AI. It makes me wonder if the authors have misalignment risks in mind at all or only misuse risks.


A separate press release says, "It is important that the federal government prepare for unlikely, yet catastrophic events like AI systems gone awry" (emphasis added), so my sense is they have misalignment risks in mind.

Hmm, does this not depend on how the Oracle is making its decision? I feel like there might be versions of this that look more like the smoking lesion problem – for instance, what if the Oracle is simply using a (highly predictive) proxy to determine whether you'll 1-box or 2-box? (Say, imagine if people from cities 1-box 99% of the time, and people from the country 2-box 99% of the time, and the Oracle is just looking at where you're from).

Stephen Bennett (Previously GWS) (1y)
It seems like this might become a discussion of aleatory vs. epistemic uncertainty. I like this way of describing the distinction between the two (from here, PDF). I believe that the differences between classical decision theory and FDT only occur in the context of aleatory uncertainty (although in some formulations of Newcomb's paradox there's no actual uncertainty). That is, if you are in an epistemically uncertain environment, then FDT and classical decision theory will agree on all problems (hopefully by saying this I can cause someone to come up with a juicy counterexample). In your example, it is unclear to me what sort of uncertainty the problem possesses because I don't know enough about the oracle. In the simple example where a quantum coin with a 99% chance of coming up heads is flipped to determine whether the oracle gives the right answer or the wrong answer, the answer I gave above is right: use expected value under the assumptions of FDT; classical decision theory will lead you to 2-box, and that would lower your expected gains. In your example relying on demographic information, it will depend a bit on what sorts of information count as "demographic" in nature. If you are, in this moment, by reading this comment on LessWrong, forming the sort of self that will result in you 1-boxing or 2-boxing, and that information is also an input to this sort of oracle, then I encourage you to 1-box on the oracle you had in mind.
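The two oracles in this exchange can be contrasted with a small expected-value sketch. It assumes the standard Newcomb payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box if the oracle predicts one-boxing), which the comments don't state; the function names are hypothetical. In the "quantum coin" case the prediction tracks your actual choice; in the proxy case the prediction depends only on where you're from, which is treated here as fixed independently of your decision.

```python
SMALL = 1_000       # transparent box (standard Newcomb payoff, assumed)
BIG = 1_000_000     # opaque box, filled iff the oracle predicts one-boxing

def ev_reliable_oracle(one_box, accuracy=0.99):
    """Prediction matches your actual choice with probability `accuracy`
    (e.g. a quantum coin decides whether the oracle reports it correctly)."""
    p_big = accuracy if one_box else 1 - accuracy
    return p_big * BIG + (0 if one_box else SMALL)

def ev_proxy_oracle(one_box, p_predict_one_box):
    """Prediction depends only on a proxy (e.g. city vs. country), so the
    chance the opaque box is full does not change with your choice."""
    return p_predict_one_box * BIG + (0 if one_box else SMALL)
```

With the reliable oracle, one-boxing is worth about $990,000 versus about $11,000 for two-boxing. With the proxy oracle and a fixed proxy, two-boxing comes out exactly $1,000 ahead whatever the prediction probability is, which is the smoking-lesion-like structure raised above.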

Okay, but I've also seen rationalists use point estimates for probability in a way that led them to mess up Bayesian updating, in cases where the mistake would have been clear if they had recognized that the probability itself was uncertain (e.g., I saw this a few times related to covid predictions). I feel like it's weird to use "frequency" for something that will only happen (or not happen) once, like whether the first AGI will lead to human extinction, though ultimately I don't really care what word people are using for which concept.

Yair Halberstadt (1y)
I think it's less a mistake of using point estimates, but rather not realizing certain probabilities are correlated, so you can't just multiply them out.
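A toy example connects both points: if two events each occur with the same unknown probability p (and are independent given p), uncertainty about p makes them correlated, so multiplying point estimates gives E[p]^2 when the correct joint probability is E[p^2]. The two-scenario distribution over p below is a hypothetical illustration.

```python
from fractions import Fraction

# Hypothetical uncertainty about p: equally likely to be 0.2 or 0.8.
scenarios = [(Fraction(1, 2), Fraction(1, 5)), (Fraction(1, 2), Fraction(4, 5))]

point_estimate = sum(w * p for w, p in scenarios)   # E[p]
naive_both = point_estimate ** 2                     # E[p]^2: multiply point estimates
correct_both = sum(w * p * p for w, p in scenarios)  # E[p^2]: average over scenarios
```

Here the point estimate is 1/2, so naive multiplication gives 1/4, while averaging over the scenarios gives 17/50 = 0.34 for both events happening.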

How common is it for transposon count to increase in a cell? If it's a generally uncommon event for any one cell, then it could simply be that clones from a large portion of cells will only start off with marginally more (if any) extra transposons, while those that do start off with a fair bit more don't make it past the early development process.

A perhaps even easier (though somewhat less informative) experiment would be to use CRISPR/Cas9 to insert a bunch of extra transposons into an organism and see if that leads to accelerated aging.

Play with GPT-3 for long, and you'll see it fall hard too.
This sample is a failure.  No one would have written this, not even as satire or surrealism or experimental literature.  Taken as a joke, it's a nonsensical one.  Taken as a plot for a film, it can't even keep track of who's alive and who's dead.  It contains three recognizable genres of writing that would never appear together in this particular way, with no delineations whatsoever.

This sample seems pretty similar to the sort of thing that a human might dream, or that a human...

FWIW, Hanson has elsewhere promoted the idea that algorithmic progress is primarily due to hardware progress. Relevant passage:

Maybe there are always lots of decent ideas for better algorithms, but most are hard to explore because of limited computer hardware. As hardware gets better, more new ideas can be explored, and some of them turn out to improve on the prior best algorithms. This story seems to at least roughly fit what I’ve heard about the process of algorithm design.

So he presumably would endorse the claim that HLMI will likely require several te...

"uranium, copper, lithium, oil"
These are commodities, not equities (unless OP meant invested in companies in those industries?)

Sorry for being unclear. I meant producers of those commodities, except for uranium for which I also have some exposure to the commodity itself.
I assumed that, or derivatives of those commodities. 

So again, I wasn't referring to the expected value of the number of steps, but instead to how we should update after learning about the time – that is, I wasn't talking about $E[k \mid t]$ but instead $P(k \mid t)$ for various $k$.

Let's dig into this. From Bayes, we have: $P(k \mid t) = \frac{P(t \mid k)\, P(k)}{P(t)}$. As you say, $P(t \mid k) \sim k t^{k-1}$. We have the pesky $P(t)$ term, but we can note that for any value of $t$, this will yield a constant, so we can discard it and recognize that now we don't get a value for the update, but instead ...

Thomas Sepulchre (2y)
I agree with those computations/results. Thank you

The intuition, I assume, is that this is the inverse function of the previous estimator.

So the estimate for the number of hard steps doesn't make sense in the absence of some prior. Starting with a prior distribution for the likelihood of the number of hard steps, and applying Bayes' rule based on the time passed and remaining, we will update towards more mass on k = t/(T–t) (basically, we go from P(t | k) to P(k | t)).

By "gives us reason to expect" I didn't mean "this will be the expected value", but instead "we should update in this direction".

Thomas Sepulchre (2y)
OK, let's do this. Since $k$ is an integer, I guess our prior should be a sequence $p_k$. We already know $P(t \mid k) = k t^{k-1}$. We can derive from this $P(t) = \sum_k P(t \mid k)\, p_k$, and finally $P(k \mid t) = \frac{P(t \mid k)\, p_k}{P(t)}$. In our case, $t = \frac{4.5}{5.5}$.

I guess the most natural prior would be the uniform prior: we fix an integer $N$ and set $p_k = \frac{1}{N}$ for $k \in [1; N]$. From this we can derive the posterior distribution. This is a bit tedious to do by hand, but easy to code. From the posterior distribution we can, for example, extract the expected value of $k$: $E[k \mid t] = \sum_k k\, P(k \mid t)$. I computed it for $N \in [1; 100]$ and voilà!

Obviously $E[k \mid t]$ is strictly increasing. It also converges toward 10. Actually, for almost all values of $N$, the expected value is very close to 10. To give a criterion, for $N = 15$, the expected value is already above 7.25, which implies that it is closer to 10 than to 4.5.

We can use different types of prior. I also tried $p_k \propto e^{-k/N}$ (with a normalization constant), which is basically a smoother version of the previous one. Instead of stating with certainty "the number of hard steps is at most $N$", it's more "the number of hard steps is typically $N$, but any huge number is possible". This gives basically the same result, except it separates from 4.5 even faster, as soon as $N = 13$.

My point is not to say that the number of hard steps is 10 in our case. Obviously I cannot know that. Whatever prior we may choose, we will end up with a probability distribution, not a nice clear answer. My point is that if, for the sake of simplicity, we choose to only remember one number / to only share one number, it should probably not be 4.5 (or $k = \frac{t}{T-t}$), but instead 10 (or $k = \frac{t+T}{T-t}$). I bet that, if you actually have a prior, and actually make the Bayesian update, you'll find the same result.
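The posterior computation described above ("tedious by hand, but easy to code") can be sketched as follows, with $T$ normalized to 1 and the likelihood $P(t \mid k) = k t^{k-1}$. The function name and the `prior` parameter are hypothetical; the default is the uniform prior on $1..N$ (the $1/N$ factor cancels when normalizing).

```python
def posterior_mean_k(t, N, prior=None):
    """E[k | t] under P(t | k) = k * t**(k - 1), with T normalized to 1.
    prior: function k -> unnormalized prior weight; default is uniform on 1..N."""
    if prior is None:
        prior = lambda k: 1.0  # uniform; the 1/N factor cancels below
    ks = range(1, N + 1)
    weights = [k * t ** (k - 1) * prior(k) for k in ks]  # P(t | k) * p_k
    total = sum(weights)                                   # proportional to P(t)
    return sum(k * w for k, w in zip(ks, weights)) / total

t = 4.5 / 5.5
means = [posterior_mean_k(t, N) for N in range(1, 101)]
```

As $N \to \infty$ the uniform-prior posterior is proportional to $k t^{k-1}$, whose mean is $\frac{1+t}{1-t}$; for $t = 9/11$ that is exactly 10, matching $\frac{t+T}{T-t}$ in unnormalized units. An exponential prior such as $e^{-k/N}$ can be passed via `prior`, truncated at a large cap on $k$.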

Having a model for the dynamics at play is valuable for making progress on further questions. For instance, knowing that the expected hard-step time is ~identical to the expected remaining time gives us reason to expect that the number of hard steps passed on Earth already is perhaps ~4.5 (given that the remaining time in Earth's habitability window appears to be ~1 billion years). Admittedly, this is a weak update, and there are caveats here, but it's not nothing.

Additionally, the fact that the expected time for hard steps is ~independent of the difficult...
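The ~4.5 figure above can be reproduced with a minimal Monte Carlo sketch. It assumes the standard hard-steps approximation in which, conditional on all $k$ steps finishing within the window $T$, the completion times behave like $k$ sorted uniform draws on $[0, T]$. Under that assumption the expected time remaining after the last step is $T/(k+1)$, the same as the expected gap between steps, and inverting it gives $k = t/(T - t)$. Function names are hypothetical.

```python
import random

def mean_remaining_time(k, T, trials=100_000, seed=0):
    """Monte Carlo estimate of E[T - time of last hard step], treating the
    k completion times as uniform draws on [0, T] (only the max matters)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        last = max(rng.uniform(0, T) for _ in range(k))
        total += T - last
    return total / trials

def estimate_k(t_elapsed, T):
    """Invert E[remaining] = T / (k + 1), i.e. k = t / (T - t)."""
    return t_elapsed / (T - t_elapsed)
```

With $t = 4.5$ billion years elapsed and roughly 1 billion years remaining ($T = 5.5$), `estimate_k(4.5, 5.5)` gives 4.5; the Monte Carlo check for $k = 4$ lands near $5.5/5 = 1.1$.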

Thomas Sepulchre (2y)
Sorry for this late response. I actually disagree with this statement. Assuming from now on that $T = 1$, so that we can normalize everything, your post shows that, if there are $k$ hard steps, given that they are all achieved, then the expected time required to achieve all of them is $t = \frac{k}{k+1}$. Thus, you have built an estimator of $t$, given $k$: $\hat{t}(k) = \frac{k}{k+1}$.

Now you want to solve the reverse problem: there are $k^*$ hard steps, and you want to estimate this quantity. We have one piece of information, which is $t$. Therefore, the problem is, given $t$, to build an estimator $\hat{k}(t)$. This is not the same problem. You propose to use $\hat{k}(t) = \frac{t}{1-t}$. The intuition, I assume, is that this is the inverse function of the previous estimator.

The first thing we could expect from this estimator is to have the correct expected value, i.e. $E[\hat{k}(t)] = k^*$. Let's check that. The density of $t$ is $f_t(s) = k^* s^{k^*-1}$ (quick sanity check here: the expected value of $t$ is indeed $\frac{k^*}{k^*+1}$). From this we can derive the expected value $E[\hat{k}(t)] = \int_0^1 \frac{s}{1-s}\, k^* s^{k^*-1}\, ds$, and we conclude that $E[\hat{k}(t)] = \infty$. How did this happen? Well, it happened because it is likely for $t$ to be really close to 1, which makes $\hat{k}(t)$ explode.

OK, so the expected value doesn't match anything. But maybe $k^*$ is the most likely result, which would already be great. Let's check that too. The density of $\hat{k}$ is $f_{\hat{k}}(\hat{k}(s)) = \frac{1}{|\hat{k}'(s)|} f_t(s)$. Since $\hat{k}(s) = \frac{s}{1-s}$, we have $\hat{k}'(s) = \frac{1}{(1-s)^2}$, therefore $f_{\hat{k}}(\hat{k}(s)) = (1-s)^2 k^* s^{k^*-1}$. We can differentiate this to get the argmax: $\bar{s} = \frac{k^*-1}{k^*+1}$, and therefore $\bar{k} = \frac{\bar{s}}{1-\bar{s}} = \frac{k^*-1}{2}$. Surprisingly enough, the most likely result is not $k^*$.

Just to be clear about what this means: if there are infinitely many planets in the universe on which live sentient species as advanced as us, and on each of them a smart individual is using your estimator $\hat{k}(t) = \frac{t}{1-t}$ to guess how many hard steps they already went through, then the average of these estimates is infinite, and the most common result is $\frac{k^*-1}{2}$.

Fixing the estimator

An easy thing is to find the maximum
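The mode calculation above can be checked numerically. This sketch evaluates the density of $\hat{k} = \frac{t}{1-t}$ written as a function of $s = t$, namely $(1-s)^2 k^* s^{k^*-1}$, on a grid over $(0, 1)$ and maps the maximizing $s$ back to $\hat{k}$. Because $\hat{k}$ is monotone in $s$, the maximizing $s$ corresponds to the most likely $\hat{k}$. Function names and the grid size are hypothetical choices.

```python
def khat_density_in_s(s, k_star):
    """Density of k_hat = t / (1 - t), expressed at s = t:
    f_t(s) / |k_hat'(s)| = (1 - s)**2 * k_star * s**(k_star - 1)."""
    return (1 - s) ** 2 * k_star * s ** (k_star - 1)

def mode_of_khat(k_star, grid=100_000):
    # Grid search over s in (0, 1); since k_hat is monotone in s, the best
    # grid point maps directly to (approximately) the most likely k_hat.
    best_s = max((i / grid for i in range(1, grid)),
                 key=lambda s: khat_density_in_s(s, k_star))
    return best_s / (1 - best_s)
```

For $k^* = 5$ this returns approximately 2, and for $k^* = 10$ approximately 4.5, consistent with $\bar{k} = \frac{k^*-1}{2}$.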

I like this comment, though I don't have a clear-eyed view of what sort of research makes (A) or (B) more likely. Is there a concrete agenda here (either that you could link to, or in your head), or is the work more in the exploratory phase?

Steven Byrnes (2y)
You could read all my posts, but maybe a better bet is to wait a month or two, I'm in the middle of compiling everything into a (hopefully) nice series of blog posts that lays out everything I know so far. I don't really know how to do (B) except "keep trying to do (A), and failing, and maybe the blockers will become more apparent".

Yeah, that also triggered my "probably false or very misleading" alarm. People are making all sorts of wild claims about covid online for political points, and I don't even know who the random person on Twitter making that claim was.

Yeah, I'm not trying to say that the point is invalid, just that the phrasing, by being somewhat in the direction of a deepity, may give the point more appeal than is warranted. Hmm, I'm not sure what better phrasing would be.

The statement seems almost tautological – couldn't we somewhat similarly claim that we'll understand NNs in roughly the same ways that we understand houses, except where we have reasons to think otherwise? The "except where we have reasons to think otherwise" bit seems to be doing a lot of work.

Compare: when trying to predict events, you should use their base rate except when you have specific updates to it. Similarly, I claim, our beliefs about brains should be the main reference for our beliefs about neural networks, which we can then update from. I agree that the phrasing could be better; any suggestions?

Thanks. I feel like for me the amount of attention for a marginal daily pill is negligibly small (I'm already taking a couple supplements, and I leave the bottles all on the kitchen table, so this would just mean taking one more pill with the others), but I suppose this depends on the person, and also the calculus is a bit different for people who aren't taking any supplements now.

"the protocol I analyze later requires a specific form of niacin"
What's the form? Also, do you know what sort of dosage is used here?


If niacin is helpful for long covid, I wonder if taking it decreases the chances of getting long covid to begin with. Given how well tolerated it is, it might be worth taking just in case.

The original protocol is here (which specifies the niacin form, a suggested dose, and some support vitamins), and my analysis of it is here. There's a comment here on a study that maybe found niacin useful for acute covid, although I haven't investigated it and have low confidence by default. I think there's merit to taking nutrition seriously and stocking up on many things, but I am in general wary of treating taking vitamins as a free action. Even a daily pill consumes attention, and it can be very hard to notice negative long-term effects or changes in optimal dose unless you're tracking very closely (which is a bigger attention cost). There's a very early-stage start-up I'm excited about because they might make that tracking and analysis easier, but they're very far from shipping.

"at least nanotech and nano-scale manufacturing at a societal scale would require much more energy than we have been willing to provide it"

Maybe, but:
1) If we could build APM on a small scale now we would

2) We can't

3) This has nothing to do with energy limits

(My sense is also that advanced APM would be incredibly energy efficient and also would give us very cheap energy – Drexler provides arguments for why in Radical Abundance.)


I don't think regulatory issues have hurt APM either (agree they have in biotech, though). Academic power struggles have hur...

I can see how oodles more energy would mean more housing, construction, spaceflight, and so on, leading to higher GDP and higher quality of life. I don't see how it would lead to revolutions in biotech and nanotech – surely the reasons we haven't cured aging or developed atomically precise manufacturing aren't the energy requirements to do those things.

Given my reading of his arguments in the book, it does seem that at least nanotech and nano-scale manufacturing at a societal scale would require much more energy than we have been willing to provide it, so in effect, maybe using a lot more energy in the short term is a prerequisite? Of course, there are also all the regulatory issues and the Machiavellian "power struggles" in academia that Hall claims as reasons for why we don't have advanced nanotech already.  Biotech might be different though since a lot of innovation there is mediated by computing and software.

Worth noting that Northern states abolished slavery long before industrialization. Perhaps even more striking, the British Empire (mostly) abolished slavery during the peak of its profitability. In both cases (and many others across the world), moral arguments seem to have played a very large role.

"Mandates continue to make people angry"
True for some people, but also worth noting that they're popular overall. Looks like around 60% of Americans support Biden's mandate, for instance (this is pretty high for a culture war issue).


"Republicans are turning against vaccinations and vaccine mandates in general... would be rather disastrous if red states stopped requiring childhood immunizations"
Support has waned, and it would be terrible if they stopped them, but note that:

  • Now Republicans are split ~50:50, so it's not like they have a consensus eithe...

Also, these physical limits – insofar as they are hard limits – are limits on various aspects of the impressiveness of the technology, but not on the cost of producing the technology. Learning-by-doing, economies of scale, process-engineering R&D, and spillover effects should still allow for costs to come down, even if the technology itself can hardly be improved.
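One standard way to model cost declines from learning-by-doing is Wright's law, swapped in here as an assumed illustration: each doubling of cumulative production cuts unit cost by a fixed fraction, with no change in what the technology can do. The function name and the 20% learning rate below are hypothetical placeholders.

```python
import math

def wright_cost(initial_cost, cumulative_units, initial_units, learning_rate):
    """Wright's law (assumed model): each doubling of cumulative production
    cuts unit cost by `learning_rate` (e.g. 0.2 means 20% per doubling)."""
    b = -math.log2(1 - learning_rate)  # experience exponent
    return initial_cost * (cumulative_units / initial_units) ** (-b)
```

With a 20% learning rate, cost falls from 100 to 80 after one doubling and to 64 after two, even if performance is pinned at a hard physical limit.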

Potentially worth noting that if you add the lifetime anchor to the genome anchor, you most likely get ~the genome anchor.
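Reading "add" as summing the two compute requirements, the sum is dominated by the larger term whenever the two differ by many orders of magnitude. A small sketch in log10 space makes this concrete; the function name and the exponents used below are hypothetical placeholders, not the report's estimates.

```python
import math

def log10_sum(a, b):
    """log10(10**a + 10**b), computed stably for large exponents."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log10(1 + 10 ** (lo - hi))
```

For placeholder exponents 41 and 28, the sum is $10^{41}$ to within about one part in $10^{13}$; only when the two terms are comparable (e.g. equal exponents) does the sum move noticeably, by at most $\log_{10} 2 \approx 0.3$ OOM.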

"Resources are always limited (as they should be) and prioritization is necessary. Why should they focus on who is and isn’t wearing a mask over enforcing laws against, I don’t know, robbery, rape and murder?"

I'm all for the police prioritizing serious crimes over more minor crimes (potentially to the extent of not enforcing the minor crime at all), but I have a problem, as a general rule, with the police telling people that they won't enforce a law and will instead just be asking for voluntary compliance. That sort of statement is completely unnecessary, and seems to indicate that the city doesn't have as strong control of their police as they should.

Also, there's this misconception of what's expected of whom. In most cases the police are asked to make sure businesses are implementing the proper administrative controls: proper signage, politely telling people to wear a mask, training their workers on de-escalation, making masks available to customers who forgot their mask, etc. If an individual fails to comply with the mandate, a business (or anyone) can (but doesn't have to) report the incident, and if the police are able to identify the person who failed to comply with the mandate, a fine can be issued.
I'm with ya. Especially since "stochastic enforcement" should always be doable:

Expected penalty = fine × probability of getting caught

That would be independent of department resources.
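The stochastic-enforcement formula can be sketched in a few lines: holding the expected penalty fixed, a lower chance of getting caught (fewer police resources) is offset by a proportionally larger fine. This assumes risk-neutral violators who respond only to the expected penalty; the function names are hypothetical.

```python
def expected_penalty(fine, p_caught):
    """Expected penalty = fine x probability of getting caught."""
    return fine * p_caught

def fine_for_target(target_expected_penalty, p_caught):
    """Fine needed to hold the expected penalty fixed at a given catch
    probability (assumes risk-neutral violators)."""
    if not 0 < p_caught <= 1:
        raise ValueError("p_caught must be in (0, 1]")
    return target_expected_penalty / p_caught
```

For example, keeping the expected penalty at 100 with a 1% chance of being caught requires a fine of 10,000.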

Also, the train of thought seems somewhat binary. If doctors are somewhat competent, but the doctors who worked at the FDA were unusually competent, then having an FDA would still make sense.

Thanks for the comments!

Re: The Hard Paths Hypothesis

I think it's very unlikely that Earth has seen other species as intelligent as humans (with the possible exception of other Homo species). In short, I suspect there is strong selection pressure for (at least many of) the different traits that allow humans to have civilization to go together. Consider dexterity – such skills allow one to use intelligence to make tools; that is, the more dexterous one is, the greater the evolutionary value of high intelligence, and the more intelligent one is, the greater ...


I agree that symbolic doesn't have to mean not bitter lesson-y (though in practice I think there are often effects in that direction). I might even go a bit further than you here and claim that a system with a significant amount of handcrafted aspects might still be bitter lesson-y, under the right conditions. The bitter lesson doesn't claim that the maximally naive and brute-force method possible will win, but instead that, among competing methods, more computationally-scalable methods will generally win over time (as compute increases). This shoul...

I think very few people would explicitly articulate a view like that, but I also think there are people who hold a view along the lines of, "Moore will continue strong for a number of years, and then after that compute/$ will grow at <20% as fast" – in which case, if we're bottlenecked on hardware, whether Moore ends several years earlier vs later could have a large effect on timelines.

One more crux that we should have included (under the section on "The Human Brain"):
"Human brain appears to be a scaled-up version of a more generic mammalian/primate brain"

Does this seem likely? I would guess this is basically true for the sensory and emotional parts, but language and mathematical reasoning seem like a large leap to me, so humans may be doing something qualitatively different from nonhuman animals. Nonhuman animals don't do recursion, as far as I know, or if they can, it's limited to very low recursion depth in practice. OTOH, this might be of interest; he argues the human cerebellum may help explain some of our additional capacity for language and tool use:

So just to be clear, the model isn't necessarily endorsing the claim, just saying that the claim is a potential crux.

I understand; I thought it was worth commenting on anyway.

I think in practice allowing them to be sued for egregious malpractice would lead them to be more hesitant to approve, since I think people are much more likely to sue for damage from approved drugs than for damage from being prevented from accessing drugs, plus I think judges/juries would find those cases more sympathetic. I also think this standard would potentially cause them to be less likely to change course when they make a mistake and more likely to instead try to dig up evidence to justify their case.

This is probably a good thing - I'd imagine that if you could sue the FDA, they'd be a lot more hesitant to approve anything.

Certainly if lawsuits were allowed for approving things but not allowed for failing to approve things, that would be a disaster. But the issue here isn't that they approved something they shouldn't have, it's that, faced with extremely time-sensitive approval decisions, they keep dragging their feet and waiting weeks while not appearing to do anything in the meantime, i.e., failing to do their job promptly. If they could be sued for that, it would likely be an improvement.

Yeah, that's fair - it's certainly possible that the things that make intelligence relatively hard for evolution may not apply to human engineers. OTOH, if intelligence is a bundle of different modules that all coexist in humans and that different animals have evolved in various proportions, that seems to point away from the blank slate/"all you need is scaling" direction.

I think this is a good point, but I'd flag that the analogy might give the impression that intelligence is easier than it is - while animals have evolved flight multiple times by different paths (birds, insects, pterosaurs, bats) implying flight may be relatively easy, only one species has evolved intelligence.

Daniel Kokotajlo (2y)
Hmmm, this is a good point -- but here's a counter that just now occurred to me: Let's disambiguate "intelligence" into a bunch of different things. Reasoning, imitation, memory, data-efficient learning, ... the list goes on. Maybe the complete bundle has only evolved once, in humans, but almost every piece of the bundle has evolved separately many times. In particular, the number 1 thing people point to as a candidate X for "X is necessary for TAI and we don't know how to make AIs with X yet and it's going to be really hard to figure it out soon" is data-efficient learning. But data-efficient learning has evolved separately many times; AlphaStar may need thousands of years of Starcraft to learn how to play, but dolphins can learn new games in minutes, and these are games with human trainers, who are obviously way out of distribution as far as dolphins' ancestral environment is concerned. The number 2 thing I hear people point to is "reasoning" and maybe "causal reasoning" in particular. I venture to guess that this has evolved a bunch of times too, based on how various animals can solve clever puzzles to get pieces of food.