I think a very common problem in alignment research today is that people focus almost exclusively on a specific story about strategic deception/scheming, and that story is a very narrow slice of the AI extinction probability mass. At some point I should probably write a proper post on this, but for now here are a few off-the-cuff example AI extinction stories which don't look like the prototypical scheming story. (These are copied from a Facebook thread.)
Also (separate comment because I expect this one to be more divisive): I think the scheming story has been disproportionately memetically successful largely because it's relatively easy to imagine hacky ways of preventing an AI from intentionally scheming. And that's mostly a bad thing; it's a form of streetlighting.
Most of the problems you discussed here more easily permit hacky solutions than scheming does.
True, but Buck's claim is still relevant as a counterargument to my claim about memetic fitness of the scheming story relative to all these other stories.
IMO the main argument for focusing on scheming risk is that scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful (as I discuss here). These other problems all seem like they require the models to be way smarter in order for them to be a big problem. Though as I said here, I'm excited for work on some non-scheming misalignment risks.
scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful...
Seems quite wrong. The main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful is that they cause more powerful AIs to be built which will eventually be catastrophic, but which have problems that are not easily iterable-upon (either because problems are hidden, or things move quickly, or ...).
And causing more powerful AIs to be built which will eventually be catastrophic is not something which requires a great deal of intelligent planning; humanity is already racing in that direction on its own, and it would take a great deal of intelligent planning to avert it. This story, for example:
- People try to do the whole "outsource alignment research to early AGI" thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they're already on the more-powerful next gen, so it's too late.
This story sounds clearly extremely plausible (do you disagree with that?), involves exactly the sort of AI you're talking about ("the first AIs that either pose substantial misalignment risk or that are extremely useful"), but the catastrophic risk does not come from that AI scheming.
This problem seems important (e.g. it's my last bullet here). It seems to me much easier to handle, because if this problem is present, we ought to be able to detect its presence by using AIs to do research on other subjects that we already know a lot about (e.g. the string theory analogy here). Scheming is the only reason why the model would try to make it hard for us to notice that this problem is present.
A few problems with this frame.
First: you're making reasonably-pessimistic assumptions about the AI, but very optimistic assumptions about the humans/organization. Sure, someone could look for the problem by using AIs to do research on other subjects that we already know a lot about. But that's a very expensive and complicated project - a whole field, and all the subtle hints about it, need to be removed from the training data, and then a whole new model trained! I doubt that a major lab is going to seriously take steps much cheaper and easier than that, let alone something that complicated.
One could reasonably respond "well, at least we've factored apart the hard technical bottleneck from the part which can be solved by smart human users or good org structure". Which is reasonable to some extent, but also... if a product requires a user to get 100 complicated and confusing steps all correct in order for the product to work, then that's usually best thought of as a product design problem, not a user problem. Making the plan at least somewhat robust to people behaving realistically less-than-perfectly is itself part of the problem.
Second: looking for the problem by testing on other f...
See also ‘The Main Sources of AI Risk?’ by Wei Dai and Daniel Kokotajlo, which puts forward 35 routes to catastrophe (most of which are disjunctive). (Note that many of the routes involve something other than intent alignment going wrong.)
In response to the Wizard Power post, Garrett and David were like "Y'know, there's this thing where rationalists get depression, but it doesn't present like normal depression because they have the mental habits to e.g. notice that their emotions are not reality. It sounds like you have that."
... and in hindsight I think they were totally correct.
Here I'm going to spell out what it felt/feels like from inside my head, my model of where it comes from, and some speculation about how this relates to more typical presentations of depression.
Core thing that's going on: on a gut level, I systematically didn't anticipate that things would be fun, or that things I did would work, etc. When my instinct-level plan-evaluator looked at my own plans, it expected poor results.
Some things which this is importantly different from:
... but importantly, the core thing is easy to confuse with all three of those. For instance, my intuitive plan-evaluator predicted that things which used to make me happy would not make me happy (like e.g. dancing), but if I actually did the things they still made me ha...
This seems basically right to me, yup. And, as you imply, I also think the rat-depression kicked in for me around the same time likely for similar reasons (though for me an at-least-equally large thing that roughly-coincided was the unexpected, disappointing and stressful experience of the funding landscape getting less friendly for reasons I don't fully understand.) Also some part of me thinks that the model here is a little too narrow but not sure yet in what way(s).
This matches the dual: mania. All plans, even terrible ones, seem like they'll succeed, and this has flow-through effects: elevated mood, hyperactivity, etc.
Whether or not this happens in all minds, the fact that people can alternate fairly rapidly between depression and mania with minimal trigger suggests there can be some kind of fragile "chemical balance" or something that's easily upset. It's possible that's just in mood disorders and more stable minds are just vulnerable to the "too many negative updates at once" thing without greater instability.
Like, when I hear you say "your instinctive plan-evaluator may end up with a global negative bias" I'm like, hm, why not just say "if you notice everything feels subtly heavier and like the world has metaphorically lost color"?
Because everything did not feel subtly heavier or like the world had metaphorically lost color. It was just, specifically, that most nontrivial things I considered doing felt like they'd suck somehow, or maybe that my attention was disproportionately drawn to the ways in which they might suck.
And to be clear, "plan predictor predicts failure" was not a pattern of verbal thought I noticed, it's my verbal description of the things I felt on a non-verbal level. Like, there is a non-verbal part of my mind which spits out various feelings when I consider doing different things, and that part had a global negative bias in the feelings it spit out.
I use this sort of semitechnical language because it allows more accurate description of my underlying feelings and mental motions, not as a crutch in lieu of vague poetry.
Epistemic status: I don't fully endorse all this, but I think it's a pretty major mistake to not at least have a model like this sandboxed in one's head and check it regularly.
Full-cynical model of the AI safety ecosystem right now:
What makes you confident that AI progress has stagnated at OpenAI? If you don’t have the time to explain why I understand, but what metrics over the past year have stagnated?
Could you name three examples of people doing non-fake work? Since towardsness to non-fake work is easier to use for aiming than awayness from fake work.
I do not necessarily disagree with this, coming from a legal / compliance background. If you see any of my profiles, I constantly complain about "performative compliance" and "compliance theatre". Painfully present across the legal and governance sectors.
That said: can you provide examples of activism or regulatory efforts that you do agree with? What does a "non-fake" regulatory effort look like?
I don't think it would be okay to dismiss your take entirely, but it would be great to see what solutions you'd propose too. This is why I disagree in principle: there are no specific points to contribute to.
In Europe, paradoxically, some of the people "close enough to the bureaucracy" that pushed for the AI Act to include GenAI providers, were OpenAI-adjacent.
But I will rescue this:
"(b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI"
BigTech is too powerful to lobby against. "Stopping advanced AI" per se would contravene many market regulations (unless we define exactly what you mean by advanced AI and the undeniable dangers to people's lives). Regulators can only ...
What domains of 'real improvement' exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
correctly guessing the true authors of anonymous text
See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmarks I am aware of, but which appears to correlate strongly with the pretraining loss, is not applied anywhere (I hope), is unobvious as something LLMs might do, and the capability does not naturally reveal itself in any standard use-cases (which is why people are shocked when it surfaces); it would have been easy for no one to have observed it up until now, or to have dismissed it, and even now, after a lot of publicizing (including by yours truly), only a few weirdos know much about it.
Why can't there be plenty of other things like inner-monologue or truesight? ("Wait, you could do X? Why didn't you tell us?" "You never asked.")
...What domains of 'real improvement' exist that are uncoupled to human perceptions
On o3: for what feels like the twentieth time this year, I see people freaking out, saying AGI is upon us, it's the end of knowledge work, timelines now clearly in single-digit years, etc, etc. I basically don't buy it, my low-confidence median guess is that o3 is massively overhyped. Major reasons:
I just spent some time doing GPQA, and I think I agree with you that the difficulty of those problems is overrated. I plan to write up more on this.
@johnswentworth Do you agree with me that modern LLMs probably outperform (you with internet access and 30 minutes) on GPQA diamond? I personally think this somewhat contradicts the narrative of your comment if so.
Ok, so sounds like given 15-25 mins per problem (and maybe with 10 mins per problem), you get 80% correct. This is worse than o3, which scores 87.7%. Maybe you'd do better on a larger sample: perhaps you got unlucky (extremely plausible given the small sample size) or the extra bit of time would help (though it sounds like you tried to use more time here and that didn't help). Fwiw, my guess from the topics of those questions is that you actually got easier questions than average from that set.
I continue to think these LLMs will probably outperform (you with 30 mins). Unfortunately, the measurement is quite expensive, so I'm sympathetic to you not wanting to get to ground here. If you believe that you can beat them given just 5-10 minutes, that would be easier to measure. I'm very happy to bet here.
I think that even if it turns out you're a bit better than LLMs at this task, we should note that it's pretty impressive that they're competitive with you given 30 minutes!
So I still think your original post is pretty misleading [ETA: with respect to how it claims GPQA is really easy].
I think the models would beat you by more at FrontierMath.
I think that how you talk about the questions being “easy”, and the associated stuff about how you think the baseline human measurements are weak, is somewhat inconsistent with you being worse than the model.
Generalizing the lesson here: the supposedly-hard benchmarks for which I have seen a few problems (e.g. GPQA, software eng) turn out to be mostly quite easy, so my prior on other supposedly-hard benchmarks which I haven't checked (e.g. FrontierMath) is that they're also mostly much easier than they're hyped up to be
Daniel Litt's account here supports this prejudice. As a math professor, he knew instantly how to solve the low/medium-level problems he looked at, and he suggests that each "high"-rated problem would be likewise instantly solvable by an expert in that problem's subfield.
And since LLMs have eaten ~all of the internet, they essentially have the crystallized-intelligence skills for all (sub)fields of mathematics (and human knowledge in general). So from their perspective, all of those problems are very "shallow". No human shares their breadth of knowledge, so math professors specialized even in slightly different subfields would indeed have to do a lot of genuine "deep" cognitive work; this is not the case for LLMs.
GPQA stuff is even worse, a literal advanced trivia quiz that seems moderately resistant to literal humans literally googling things, but not to the way the kno...
[...] he suggests that each "high"-rated problem would be likewise instantly solvable by an expert in that problem's subfield.
This is an exaggeration and, as stated, false.
Epoch AI made 5 problems from the benchmark public. One of those was ranked "High", and that problem was authored by me.
On the other hand, I don't think the problem is very hard insight-wise - I th...
I'm not confident one way or another.
I think my key crux is that in domains where there is a way to verify that the solution actually works, RL can scale to superhuman performance. Mathematics/programming are domains that are unusually easy to verify and to gather training data for, so with caveats RL can become rather good at those specific domains/benchmarks (like millennium prize evals). The important caveat is that I don't believe this transfers very well to domains where verifying isn't easy, like creative writing.
I'm bearish on that. I expect GPT-4 to GPT-5 to be palpably less of a jump than GPT-3 to GPT-4, the same way GPT-3 to GPT-4 was less of a jump than GPT-2 to GPT-3. I'm sure it'd show lower loss, and saturate some more benchmarks, and perhaps an o-series model based on it clears FrontierMath, and perhaps programmers and mathematicians would be able to use it in an ever-so-bigger number of cases...
I was talking about the 1 GW systems that would be developed in late 2026-early 2027, not GPT-5.
That's the opposite of my experience. Nearly all the papers I read vary between "trash, I got nothing useful out besides an idea for a post explaining the relevant failure modes" and "high quality but not relevant to anything important". Setting up our experiments is historically much faster than the work of figuring out what experiments would actually be useful.
There are exceptions to this, large projects which seem useful and would require lots of experimental work, but they're usually much lower-expected-value-per-unit-time than going back to the whiteboard, understanding things better, and doing a simpler experiment once we know what to test.
Actually, I've changed my mind: the reliability issue probably does need at least non-trivial theoretical insights to solve in order to make AIs work.
I am unconvinced that "the" reliability issue is a single issue that will be solved by a single insight, rather than AIs lacking procedural knowledge of how to handle a bunch of finicky special cases that will be solved by online learning or very long context windows once hardware costs decrease enough to make one of those approaches financially viable.
If I were to think about it a little, I'd suspect the big difference between LLMs and humans is state/memory: humans do have state/memory, but LLMs are more or less stateless today, and RNN training has not been solved to the extent that transformer training has.
One thing I will also say is that future AI winters will be shorter than previous AI winters, because AI products can now be made at least somewhat profitable, and this gives an independent base of money for AI research in ways that weren't possible pre-2016.
I agree with you on your assessment of GPQA. The questions themselves appear to be low quality as well. Take this one example, although it's not from GPQA Diamond:
In UV/Vis spectroscopy, a chromophore which absorbs red colour light, emits _____ colour light.
The correct answer is stated as yellow and blue. However, the question should read transmits, not emits; molecules cannot trivially absorb and re-emit light of a shorter wavelength without resorting to trickery (nonlinear effects, two-photon absorption).
This is, of course, a cherry-picked example, but it is exactly characteristic of the sort of low-quality science questions I saw in school (e.g. with a teacher or professor who didn't understand the material very well). Scrolling through the rest of the GPQA questions, they did not seem like questions that would require deep reflection or thinking, but rather the sort of trivia things that I would expect LLMs to perform extremely well on.
I'd also expect "popular" benchmarks to be easier/worse/optimized for looking good while actually being relatively easy. OAI et al. probably have the mother of all publication biases with respect to benchmarks, and are selecting very heavily for items within this collection.
About a month ago, after some back-and-forth with several people about their experiences (including on lesswrong), I hypothesized that I don't feel the emotions signalled by oxytocin, and never have. (I do feel some adjacent things, like empathy and a sense of responsibility for others, but I don't get the feeling of loving connection which usually comes alongside those.)
Naturally I set out to test that hypothesis. This note is an in-progress overview of what I've found so far and how I'm thinking about it, written largely to collect my thoughts and to see if anyone catches something I've missed.
Under the hypothesis, this has been a life-long thing for me, so the obvious guess is that it's genetic (the vast majority of other biological state turns over too often to last throughout life). I also don't have a slew of mysterious life-long illnesses, so the obvious guess is that it's pretty narrowly limited to oxytocin - i.e. most likely a genetic variant in either the oxytocin gene or receptor, or maybe the regulatory machinery around those two, but that's less likely as we get further away and the machinery becomes entangled with more other things.
So I got my genome sequenced, and went...
The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389 amino acid protein. That will induce a frameshift error, completely fucking up the rest of the protein.
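(For anyone who hasn't thought about frameshifts before, here's a minimal toy sketch - a made-up sequence, not the actual gene - of why deleting a single base scrambles every codon downstream of the deletion:)

```python
# Toy illustration with a made-up sequence: delete one base and every
# codon downstream of the deletion changes (here including a premature stop).
def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive 3-base codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

orf = "ATGGCTGAAACCTTAGGCTAA"   # hypothetical short ORF
mutant = orf[:10] + orf[11:]    # single-nucleotide deletion partway in

print(codons(orf))     # ['ATG', 'GCT', 'GAA', 'ACC', 'TTA', 'GGC', 'TAA']
print(codons(mutant))  # ['ATG', 'GCT', 'GAA', 'ACT', 'TAG', 'GCT']
```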
I'm kind of astonished that this kind of advance prediction panned out!
I admit I was somewhat surprised as well. On a gut level, I did not think that the very first things to check would turn up such a clear and simple answer.
This might be a bad idea right now, if it makes John's interests suddenly more normal in a mostly-unsteered way, eg because much of his motivation was coming from a feeling he didn't know was oxytocin-deficiency-induced. I'd suggest only doing this if solving this problem is likely to increase productivity or networking success; else, I'd delay until he doesn't seem like a critical bottleneck. That said, it might also be a very good idea, if depression or social interaction are a major bottleneck, which they are for many, many people. So this is not resolved advice, just a warning that this may be a high-variance intervention; and since John currently seems to be doing promising work, introducing high variance seems likely to have more downside.
I wouldn't say this to most people; taking oxytocin isn't known for being a hugely impactful intervention[citation needed], and on priors, someone who doesn't have oxytocin signaling happening is missing a lot of normal emotion, and is likely much worse off. Obviously, John, it's up to you whether this is a good tradeoff. I wouldn't expect it to completely distort your values or delete your skills. Someone who knows you better, such as yourself, would be much better equipped to predict if there's significant reason to believe downward variance isn't present. If you have experience with reward-psychoactive chemicals and yet are currently productive, it's more likely you already know whether it's a bad idea.
Didn't want to leave it unsaid, though.
Not directly related to your query, but seems interesting:
The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389 amino acid protein. That will induce a frameshift error, completely fucking up the rest of the protein.
Which, in turn, is pretty solid evidence for "oxytocin mediates the emotion of loving connection/aching affection" (unless there are some mechanisms you've missed). I wouldn't have guessed it's that simple.
Generalizing, this suggests we can study links between specific brain chemicals/structures and cognitive features by looking for people missing the same universal experience, checking if their genomes deviate from the baseline in the same way, then modeling the effects of that deviation on the brain. Alternatively, the opposite: search for people whose brain chemistry should be genetically near-equivalent except for one specific change, then exhaustively check if there's some blatant or subtle way their cognition differs from the baseline.
Doing a brief literature review via GPT-5, apparently this sort of thing is mostly done with regards to very "loud" conditions, rather th...
... and so at long last John found the answer to alignment
The answer was Love
and it always had been
I wouldn't have guessed it's that simple.
~Surely there's a lot of other things involved in mediating this aspect of human cognition, at the very least (/speaking very coarse-grainedly), having the entire oxytocin system adequately hooked up to the rest of everything.
I.e. it is damn strong evidence that oxytocin signaling is strictly necessary (and that there are no fallback mechanisms etc.) but not that it's simple.
Did your mother think you were unusual as a baby? Did you bond with your parents as a young child? I'd expect there to be some symptoms there if you truly have an oxytocin abnormality.
For my family this is much more of a "wow that makes so much sense" than a "wow what a surprise". It tracks extremely well with how I acted growing up, in a bunch of different little ways. Indeed, once the hypothesis was on my radar at all, it quickly seemed pretty probable on that basis alone, even before sequencing came back.
A few details/examples:
Those examples are relatively easy to explain, but most of my bits here come from less legible things. It's been very clear for a long time that I relate to other people unusually, in a way that intuitively matches being at the far low end of the oxytocin signalling axis.
Is that frame-shift error or those ~6 (?) SNPs previously reported in the literature for anything, or do they seem to be de novos? Also, what WGS depth did your service use? (Depending on how widely you cast your net, some of those could be spurious sequencing errors.)
I was a relatively late adopter of the smartphone. I was still using a flip phone until around 2015 or 2016 ish. From 2013 to early 2015, I worked as a data scientist at a startup whose product was a mobile social media app; my determination to avoid smartphones became somewhat of a joke there.
Even back then, developers talked about UI design for smartphones in terms of attention. Like, the core "advantages" of the smartphone were the "ability to present timely information" (i.e. interrupt/distract you) and always being on hand. Also it was small, so anything too complicated to fit in like three words and one icon was not going to fly.
... and, like, man, that sure did not make me want to buy a smartphone. Even today, I view my phone as a demon which will try to suck away my attention if I let my guard down. I have zero social media apps on there, and no app ever gets push notif permissions when not open except vanilla phone calls and SMS.
People would sometimes say something like "John, you should really get a smartphone, you'll fall behind without one" and my gut response was roughly "No, I'm staying in place, and the rest of you are moving backwards".
And in hindsight, boy howdy do...
I've updated marginally towards this (as a guy pretty focused on LLM-augmentation; I anticipated LLM brain rot, but it still was more pernicious/fast than I expected).
I do still think some-manner-of-AI-integration is going to be an important part of "moving forward" but probably not whatever capitalism serves up.
I have tried out using them pretty extensively for coding. The speedup is real, and I expect to get more real. Right now it's like a pretty junior employee that I get to infinitely micromanage. But it definitely does lull me into a lower agency state where instead of trying to solve problems myself I'm handing them off to LLMs much of the time to see if it can handle it.
During work hours, I try to actively override this, i.e. have the habit "send LLM off, and then go back to thinking about some kind of concrete thing (although often a higher-level strategy)." But, this becomes harder to do as it gets later in the day and I get more tired.
One of the benefits of LLMs is that you can do moderately complex cognitive work* while tired (*that a junior engineer could do). But, that means by default a bunch of time is spent specifically training the habit of using LLMs in...
(Disclaimer: only partially relevant rant.)
Outside of [coding], I don't know of it being more than a somewhat better google
I've recently tried heavily leveraging o3 as part of a math-research loop.
I have never been more bearish on LLMs automating any kind of research than I am now.
And I've tried lots of ways to make it work. I've tried telling it to solve the problem without any further directions, I've tried telling it to analyze the problem instead of attempting to solve it, I've tried dumping my own analysis of the problem into its context window, I've tried getting it to search for relevant lemmas/proofs in math literature instead of attempting to solve it, I've tried picking out a subproblem and telling it to focus on that, I've tried giving it directions/proof sketches, I've tried various power-user system prompts, I've tried resampling the output thrice and picking the best one. None of this made it particularly helpful, and the bulk of the time was spent trying to spot where it's lying or confabulating to me in its arguments or proofs (which it ~always did).
It was kind of okay for tasks like "here's a toy setup, use a well-known formula to compute the relationships between ...
(I feel sort of confused about how people who don't use it for coding are doing. With coding, I can feel the beginnings of a serious exoskeleton that can build structures around me with thought. Outside of that, I don't know of it being more than a somewhat better google).
There's common ways I currently use (the free version of) ChatGPT that are partially categorizable as “somewhat better search engine”, but where I feel like that's not representative of the real differences. A lot of this is coding-related, but not all, and the reasons I use it for coding-related and non-coding-related tasks feel similar. When it is coding-related, it's generally not of the form of asking it to write code for me that I'll then actually put into a project, though occasionally I will ask for example snippets which I can use to integrate the information better mentally before writing what I actually want.
The biggest difference in feel is that a chat-style interface is predictable and compact and avoids pushing a full-sized mental stack frame and having to spill all the context of whatever I was doing before. (The name of the website Stack Exchange is actually pretty on point here, insofar as they ...
I found LLMs to be very useful for literature research. They can find relevant prior work that you can't find with a search engine because you don't know the right keywords. This can be a significant force multiplier.
They also seem potentially useful for quickly producing code for numerical tests of conjectures, but I only started experimenting with that.
Other use cases where I found LLMs beneficial:
That said, I do agree that early adopters seem like they're overeager and maybe even harming themselves in some way.
I am perhaps an interesting corner case. I make extremely heavy use of LLMs, largely via APIs for repetitive tasks. I sometimes run a quarter million queries in a day, all of which produce structured output. Incorrect output happens, but I design the surrounding systems to handle that.
A few times a week, I might ask a concrete question and get a response, which I treat with extreme skepticism.
But I don't talk to the damn things. That feels increasingly weird and unwise.
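To give a flavor of what "design the surrounding systems to handle that" can look like, here's a minimal sketch (hypothetical `call_llm` wrapper and a made-up schema, not the actual pipeline): parse, validate, retry a bounded number of times, and let downstream code skip the failures.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever API is in use; returns raw text."""
    raise NotImplementedError

REQUIRED_KEYS = {"title", "year", "category"}   # made-up schema for illustration

def query_structured(prompt: str, max_retries: int = 3) -> dict | None:
    """Ask for JSON, validate it, retry a few times, then give up gracefully.
    At high volume some outputs will be wrong; callers just skip None results."""
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue                              # malformed JSON: try again
        if isinstance(parsed, dict) and REQUIRED_KEYS <= parsed.keys():
            return parsed                         # passes the minimal schema check
    return None                                   # recorded as a failure downstream
```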
Agree about phones (in fact I am seriously considering switching to a flip phone and using my iphone only for things like navigation).
Not so sure about LLMs. I had your attitude initially, and I still consider them an incredibly dangerous mental augmentation. But I do think that conservatively throwing a question at them to find searchable keywords is helpful, if you maintain the attitude that they are actively trying to take over your brain and therefore remain vigilant.
Hypothesis: for smart people with a strong technical background, the main cognitive barrier to doing highly counterfactual technical work is that our brains' attention is mostly steered by our social circle. Our thoughts are constantly drawn to think about whatever the people around us talk about. And the things which are memetically fit are (almost by definition) rarely very counterfactual to pay attention to, precisely because lots of other people are also paying attention to them.
Two natural solutions to this problem:
These are both standard things which people point to as things-historically-correlated-with-highly-counterfactual-work. They're not mutually exclusive, but this model does suggest that they can substitute for each other - i.e. "going off into the woods" can substitute for a social circle with its own useful memetic environment, and vice versa.
One thing that I do after social interactions, especially those which pertain to my work, is to go over all the updates my background processing is likely to make and to question them more explicitly.
This is helpful because I often notice that the updates I’m making aren’t related to reasons much at all. It’s more like “ah they kind of grimaced when I said that, so maybe I'm bad?” or like “they seemed just generally down on this approach, but wait are any of those reasons even new to me? Haven’t I already considered those and decided to do it anyway?” or “they seemed so aggressively pessimistic about my work, but did they even understand what I was saying?” or “they certainly spoke with a lot of authority, but why should I trust them on this, and do I even care about their opinion here?” Etc. A bunch of stuff which at first blush my social center is like “ah god, it’s all over, I’ve been an idiot this whole time” but with some second glancing it’s like “ah wait no, probably I had reasons for doing this work that withstand surface-level pushback, let’s remember those again and see if they hold up.” And often (always?) they do.
This did not come naturally to me; I’ve had to train myself into doing it. But it has helped a lot with this sort of problem, alongside the solutions you mention, i.e. becoming more of a hermit and trying to surround myself with people engaged in more timeless thought.
Solution 2 implies that a smart person with a strong technical background would (by default) go on to work on important problems, which is not necessarily universally true; it's IMO likely that many such people would be working on less important things than what their social circle is otherwise steering them to work on.
The claim is not that either "solution" is sufficient for counterfactuality, it's that either solution can overcome the main bottleneck to counterfactuality. After that, per Amdahl's Law, there will still be other (weaker) bottlenecks to overcome, including e.g. keeping oneself focused on something important.
Good idea, but... I would guess that basically everyone who knew me growing up would say that I'm exactly the right sort of person for that strategy. And yet, in practice, I still find it has not worked very well. My attention has in fact been unhelpfully steered by local memetic currents to a very large degree.
For instance, I do love proving everyone else wrong, but alas reversed stupidity is not intelligence. People mostly don't argue against the high-counterfactuality important things, they ignore the high-counterfactuality important things. Trying to prove them wrong about the things they do argue about is just another way of having one's attention steered by the prevailing memetic currents.
People mostly don't argue against the high-counterfactuality important things, they ignore the high-counterfactuality important things. Trying to prove them wrong about the things they do argue about is just another way of having one's attention steered by the prevailing memetic currents.
This is true, but I still can't let go of the fact that this fact itself ought to be a blindingly obvious first-order bit that anyone who calls zerself anything like "aspiring rationalist" would be paying a good chunk of attention to, and yet this does not seem to be the case. Like, motions in the genre of
huh I just had reaction XYZ to idea ABC generated by a naively-good search process, and it seems like this is probably a common reaction to ABC; but if people tend to react to ABC with XYZ, and with other things coming from the generators of XYZ, then such and such distortion in beliefs/plans would be strongly pushed into the collective consciousness, e.g. on first-order or on higher-order deference effects; so I should look out for that, e.g. by doing some manual fermi estimates or other direct checking about ABC or by investigating the strength of the steelman of reaction XYZ, or by keeping an eye out for people systematically reacting with XYZ without good foundation so I can notice this,
where XYZ could centrally be things like e.g. copium or subtly contemptuous indifference, do not seem to be at all common motions.
I visited Mikhail Khovanov once in New York to give a seminar talk, and after it was all over and I was wandering around seeing the sights, he gave me a call and offered a long string of general advice on how to be the kind of person who does truly novel things (he's famous for this, you can read about Khovanov homology). One thing he said was "look for things that aren't there" haha. It's actually very practical advice, which I think about often and attempt to live up to!
I'm ashamed to say I don't remember. That was the highlight. I think I have some notes on the conversation somewhere and I'll try to remember to post here if I ever find it.
I can spell out the content of his Koan a little, if it wasn't clear. It's probably more like: look for things that are (not there). If you spend enough time in a particular landscape of ideas, you can (if you're quiet and pay attention and aren't busy jumping on bandwagons) get an idea of a hole, which you're able to walk around but can't directly see. In this way new ideas appear as something like residues from circumnavigating these holes. It's my understanding that Khovanov homology was discovered like that, and this is not unusual in mathematics.
By the way, that's partly why I think the prospect of AIs being creative mathematicians in the short term should not be discounted; if you see all the things, you see all the holes.
For those who might not have noticed Dan's clever double entendre: (Khovanov) homology is literally about counting/measuring holes in weird high-dimensional spaces - designing a new homology theory is in a very real sense about looking for holes that are not (yet) there.
There's plenty, including a line of work by Carina Curto, Kathryn Hess and others that is taken seriously by a number of mathematically inclined neuroscience people (Tom Burns if he's reading can comment further). As far as I know this kind of work is the closest to breaking through into the mainstream. At some level you can think of homology as a natural way of preserving information in noisy systems, for reasons similar to why (co)homology of tori was a useful way for Kitaev to formulate his surface code. Whether or not real brains/NNs have some emergent computation that makes use of this is a separate question, I'm not aware of really compelling evidence.
There is more speculative but definitely interesting work by Matilde Marcolli. I believe Manin has thought about this (because he's thought about everything) and if you have twenty years to acquire the prerequisites (gamma spaces!) you can gaze into deep pools by reading that too.
Conjecture's Compendium is now up. It's intended to be a relatively-complete intro to AI risk for nontechnical people who have ~zero background in the subject. I basically endorse the whole thing, and I think it's probably the best first source to link e.g. policymakers to right now.
I might say more about it later, but for now just want to say that I think this should be the go-to source for new nontechnical people right now.
I think there's something about Bay Area culture that can often get technical people to feel like the only valid way to contribute is through technical work. It's higher status and sexier and there's a default vibe that the best way to understand/improve the world is through rigorous empirical research.
I think this is an incorrect (or at least incomplete) frame, and I think on-the-margin it would be good for more technical people to spend 1-5 days seriously thinking about what alternative paths they could pursue in comms/policy.
I also think there are memes spreading around that you need to be some savant political mastermind genius to do comms/policy, otherwise you will be net negative. The more I meet policy people (including successful policy people from outside the AIS bubble), the more I think this narrative was, at best, an incorrect model of the world. At worst, a take that got amplified in order to prevent people from interfering with the AGI race (e.g., by granting excess status+validity to people/ideas/frames that made it seem crazy/unilateralist/low-status to engage in public outreach, civic discourse, and policymaker engagement.)
(Caveat: I don't think the adversarial frame explains everything, and I do think there are lots of people who were genuinely trying to reason about a complex world and just ended up underestimating how much policy interest there would be and/or overestimating the extent to which labs would be able to take useful actions despite the pressures of race dynamics.)
I think I probably agree, although I feel somewhat wary about it. My main hesitations are:
One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there's nothing else we can do. There's not a better alternative; these are the only things that labs and governments are currently willing to support.
The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:
- Share a link to this Compendium online or with friends, and provide your feedback on which ideas are correct and which are unconvincing. This is a living document, and your suggestions will shape our arguments.
- Post your views on AGI risk to social media, explaining why you believe it to be a legitimate problem (or not).
- Red-team companies’ plans to deal with AI risk, and call them out publicly if they do not have a legible plan.
One possible critique is that their suggestions are not particularly ambitious. This is likely because they're writing for a broader audience (people who haven't been deeply engaged in AI safety).
For people who have been deeply engaged in AI safety, I think the natural steelman here is "focus on helping the ...
Short version: Nvidia's only moat is in software; AMD already makes flatly superior hardware priced far lower, and Google probably does too but doesn't publicly sell it. And if AI undergoes smooth takeoff on current trajectory, then ~all software moats will evaporate early.
Long version: Nvidia is pretty obviously in a hype-driven bubble right now. However, it is sometimes the case that (a) an asset is in a hype-driven bubble, and (b) it's still a good long-run bet at the current price, because the company will in fact be worth that much. Think Amazon during the dot-com bubble. I've heard people make that argument about Nvidia lately, on the basis that it will be ridiculously valuable if AI undergoes smooth takeoff on the current apparent trajectory.
My core claim here is that Nvidia will not actually be worth much, compared to other companies, if AI undergoes smooth takeoff on the current apparent trajectory.
Other companies already make ML hardware flatly superior to Nvidia's (in flops, memory, whatever), and priced much lower. AMD's MI300x is the most obvious direct comparison. Google's TPUs are probably another example, though they're not sold publicly s...
The easiest answer is to look at the specs. Of course specs are not super reliable, so take it all with many grains of salt. I'll go through the AMD/Nvidia comparison here, because it's a comparison I looked into a few months back.
Techpowerup is a third-party site with specs for the MI300x and the H100, so we can do a pretty direct comparison between those two pages. (I don't know if the site independently tested the two chips, but they're at least trying to report comparable numbers.) The H200 would arguably be more of a "fair comparison" since the MI300x came out much later than the H100; we'll get to that comparison next. I'm starting with MI300x vs H100 comparison because techpowerup has specs for both of them, so we don't have to rely on either company's bullshit-heavy marketing materials as a source of information. Also, even the H100 is priced 2-4x more expensive than the MI300x (~$30-45k vs ~$10-15k), so it's not unfair to compare the two.
Key numbers (MI300x vs H100):
... so the compari...
It's worth noting that even if Nvidia is charging 2-4x more now, the ultimate question for competitiveness will be manufacturing cost for Nvidia vs AMD. If Nvidia has much lower manufacturing costs than AMD per unit performance (but presumably higher markup), then Nvidia might win out even if their product is currently worse per dollar.
Note also that price discrimination might be a big part of Nvidia's approach. Scaling labs which are willing to go to great effort to drop compute cost by a factor of two are a subset of Nvidia's customers to whom Nvidia would ideally prefer to offer lower prices. I expect that Nvidia will find a way to make this happen.
I'm holding a modest long position in NVIDIA (smaller than my position in Google), and expect to keep it for at least a few more months. I expect I only need NVIDIA margins to hold up for another 3 or 4 years for it to be a good investment now.
It will likely become a bubble before too long, but it doesn't feel like one yet.
While the first-order analysis seems true to me, there are mitigating factors:
So from my viewpoint I would caution against being short NVIDIA, at least in the short term.
No, the MI300x is not superior to Nvidia's chips, largely because it costs >2x as much to manufacture as Nvidia's chips.
If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.
I don't think the advantages would necessarily compound - quite the opposite, there are diminishing returns and I expect 'catchup'. The first-mover advantage neutralizes itself because a rising tide lifts all boats, and the additional data acts as a prior: you can define the advantage of a better model, due to any scaling factor, as equivalent to n additional datapoints. (See the finetuning transfer papers on this.) When a LLM can zero-shot a problem, that is conceptually equivalent to a dumber LLM which needs 3-shots, say. And so the advantages of a better model will plateau, and can be matched by simply some more data in-context - such as additional synthetic datapoints generated by self-play or inner-monologue etc. And the better the model gets, the more 'data' it can 'transfer' to a similar language to reach a given X% of coding performance. (Think about how you could easily transfer given access to an environment: just do self-play on translating any solved Python problem into the target la...
People will hunger for all the GPUs they can get, but then that means that the favored alternative GPU 'manufacturer' simply buys out the fab capacity and does so. Nvidia has no hardware moat: they do not own any chip fabs, they don't own any wafer manufacturers, etc. All they do is design and write software and all the softer human-ish bits. They are not 'the current manufacturer' - that's everyone else, like TSMC or the OEMs. Those are the guys who actually manufacture things, and they have no particular loyalty to Nvidia. If AMD goes to TSMC and asks for a billion GPU chips, TSMC will be thrilled to sell the fab capacity to AMD rather than Nvidia, no matter how angry Jensen is.
So in a scenario like mine, if everyone simply rewrites for AMD, AMD raises its prices a bit and buys out all of the chip fab capacity from TSMC/Intel/Samsung/etc - possibly even, in the most extreme case, buying capacity from Nvidia itself, as it suddenly is unable to sell anything at its high prices that it may be trying to defend, and is forced to resell its reserved chip fab capacity in the resulting liquidity crunch. (No point in spending chip fab capacity on chips you can't sell at your target price and you aren't sure what you're going to do.) And if AMD doesn't do so, then player #3 does so, and everyone rewrites again (which will be easier the second time as they will now have extensive test suites, two different implementations to check correctness against, documentation from the previous time, and AIs which have been further trained on the first wave of work).
Here's a side project David and I have been looking into, which others might have useful input on...
As I understand it, thyroid hormone levels are approximately-but-accurately described as the body's knob for adjusting "overall metabolic rate" or the subjective feeling of needing to burn energy. Turn up the thyroid knob, and people feel like they need to move around, bounce their leg, talk fast, etc (at least until all the available energy sources are burned off and they crash). Turn down the thyroid knob, and people are lethargic.
That sounds like the sort of knob which should probably typically be set higher, today, than was optimal in the ancestral environment. Not cranked up to 11; hyperthyroid disorders are in fact dangerous and unpleasant. But at least set to the upper end of the healthy range, rather than the lower end.
... and that's nontrivial. You can just dump the relevant hormones (T3/T4) into your body, but there's a control system which tries to hold the level constant. Over the course of months, the thyroid gland (which normally produces T4) will atrophy, as it shrinks to try to keep T4 levels fixed. Just continuing to pump T3/...
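To make the control-loop point concrete, here's a toy simulation (made-up constants and units, nothing physiological): a simple negative-feedback controller holds the hormone level at its setpoint, so an exogenous dose mostly ends up displacing endogenous production rather than raising the level.

```python
# Toy negative-feedback loop, purely illustrative (made-up constants):
# the "gland" adjusts output to hold the hormone level at a setpoint,
# so an exogenous dose ends up displacing endogenous production.
setpoint = 1.0
gland_output = 1.0   # endogenous production per step
level = 1.0          # circulating hormone level
clearance = 0.5      # fraction of circulating hormone cleared per step
gain = 0.2           # how aggressively the controller corrects errors

for step in range(200):
    dose = 0.3 if step >= 100 else 0.0          # start exogenous dosing halfway through
    level += gland_output + dose - clearance * level
    gland_output = max(0.0, gland_output - gain * (level - setpoint))
    if step in (99, 199):
        print(f"step {step}: level={level:.2f}, endogenous output={gland_output:.2f}")
```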
Uh... Guys. Uh. Biology is complicated. It's a messy pile of spaghetti code. Not that it's entirely intractable to make Pareto improvements but, watch out for unintended consequences.
For instance: you are very wrong about cortisol. Cortisol is a "stress response hormone". It tells the body to divert resources to bracing itself to deal with stress (physical and/or mental). Experiments have shown that if you put someone through a stressful event while suppressing their cortisol, they have much worse outcomes (potentially including death). Cortisol doesn't make you stressed, it helps you survive stress. Deviations from homeostatic setpoints (including mental ones) are what make you stressed.
I don’t think that any of {dopamine, NE, serotonin, acetylcholine} are scalar signals that are “widely broadcast through the brain”. Well, definitely not dopamine or acetylcholine, almost definitely not serotonin, maybe NE. (I recently briefly looked into whether the locus coeruleus sends different NE signals to different places at the same time, and ended up at “maybe”, see §5.3.1 here for a reference.)
I don’t know anything about histamine or orexin, but neuropeptides are a better bet in general for reasons in §2.1 here.
As far as I can tell, parasympathetic tone is basically Not A Thing
Yeah, I recall reading somewhere that the term “sympathetic” in “sympathetic nervous system” is related to the fact that lots of different systems are acting simultaneously. “Parasympathetic” isn’t supposed to be like that, I think.
AFAICT, approximately every "how to be good at conversation" guide says the same thing: conversations are basically a game where 2+ people take turns free-associating off whatever was said recently. (That's a somewhat lossy compression, but not that lossy.) And approximately every guide is like "if you get good at this free association game, then it will be fun and easy!". And that's probably true for some subset of people.
But speaking for myself personally... the problem is that the free-association game just isn't very interesting.
I can see where people would like it. Lots of people want to talk to other people more on the margin, and want to do difficult thinky things less on the margin, and the free-association game is great if that's what you want. But, like... that is not my utility function. The free association game is a fine ice-breaker, it's sometimes fun for ten minutes if I'm in the mood, but most of the time it's just really boring.
Even for serious intellectual conversations, something I appreciate in this kind of advice is that it often encourages computational kindness. E.g. it's much easier to answer a compact closed question like "which of these three options do you prefer" instead of an open question like "where should we go to eat for lunch". The same applies to asking someone about their research; not every intellectual conversation benefits from big open questions like the Hamming Question.
Generally fair, and I used to agree; I've been looking at it from a bit of a different viewpoint recently.
If we think of a "vibe" of a conversation as a certain shared prior that you're currently inhabiting with the other person, then the free association game can rather be seen as a way of finding places where your world models overlap a lot.
My absolute favourite conversations are when I can go 5 layers deep with someone because of shared inference. I think the vibe checking for shared priors is a skill that can be developed and the basis lies in being curious af.
There's apparently a lot of different related concepts in psychology about holding emotional space and other things that I think just comes down to "find the shared prior and vibe there".
There's a general-purpose trick I've found that should, in theory, be applicable in this context as well, although I haven't mastered that trick myself yet.
Essentially: when you find yourself in any given cognitive context, there's almost surely something "visible" from this context such that understanding/mastering/paying attention to that something would be valuable and interesting.
For example, suppose you're reading a boring, nonsensical continental-philosophy paper. You can:
Some people struggle with the specific tactical task of navigating any conversational territory. I've certainly had a lot of experiences where people just drop the ball leaving me to repeatedly ask questions. So improving free-association skill is certainly useful for them.
Unfortunately, your problem is most likely that you're talking to boring people (so as to avoid doing any moral value judgements I'll make clear that I mean johnswentworth::boring people).
There are specific skills to elicit more interesting answers to questions you ask. One I've heard is "make a beeline for the edge of what this person has ever been asked before" which you can usually reach in 2-3 good questions. At that point they're forced to be spontaneous, and I find that once forced, most people have the capability to be a lot more interesting than they are when pulling cached answers.
This is easiest when you can latch onto a topic you're interested in, because then it's easy on your part to come up with meaningful questions. If you can't find any topics like this then re-read paragraph 2.
Talking to people is often useful for goals like "making friends" and "sharing new information you've learned" and "solving problems" and so on. If what conversation means (in most contexts and for most people) is 'signaling that you repeatedly have interesting things to say', it's required to learn to do that in order to achieve your other goals.
Most games aren't that intrinsically interesting, including most social games. But you gotta git gud anyway because they're useful to be able to play well.
Er, friendship involves lots of things beyond conversation. People to support you when you're down, people to give you other perspectives on your personal life, people to do fun activities with, people to go on adventures and vacations with, people to celebrate successes in your life with, and many more.
Good conversation is a lubricant for facilitating all of those other things, for making friends and sustaining friends and staying in touch and finding out opportunities for more friendship-things.
Part of the problem is that the very large majority of people I run into have minds which fall into a relatively low-dimensional set and can be "ray traced" with fairly little effort. It's especially bad in EA circles.
The simple heuristic: typical 5-year-old human males are just straightforwardly correct about what is, and is not, fun at a party. (Sex and adjacent things are obviously a major exception to this. I don't know of any other major exceptions, though there are minor exceptions.) When in doubt, find a five-year-old boy to consult for advice.
Some example things which are usually fun at house parties:
Some example things which are usually not fun at house parties:
This message brought to you by the wound on my side from taser fighting at a house party last weekend. That is how parties are supposed to go.
One of my son's most vivid memories of the last few years (and which he talks about pretty often) is playing laser tag at Wytham Abbey, a cultural practice I believe instituted by John and which was awesome, so there is a literal five-year-old (well seven-year-old at the time) who endorses this message!
It took me years of going to bars and clubs and thinking the same thoughts:
before I finally realized - the whole draw of places like this is specifically that you don't talk.
Background: Significantly Enhancing Adult Intelligence With Gene Editing, Superbabies
Epistemic Status: @GeneSmith or @sarahconstantin or @kman or someone else who knows this stuff might just tell me where the assumptions underlying this gambit are wrong.
I've been thinking about the proposals linked above, and asked a standard question: suppose the underlying genetic studies are Not Measuring What They Think They're Measuring. What might they be measuring instead, how could we distinguish those possibilities, and what other strategies does that suggest?
... and after going through that exercise I mostly think the underlying studies are fine, but they're known to not account for most of the genetic component of intelligence, and there are some very natural guesses for the biggest missing pieces, and those guesses maybe suggest different strategies.
Before sketching the "different gambit", let's talk about the baseline, i.e. the two proposals linked at top. In particular, we'll focus on the genetics part.
GeneSmith's plan focuses on single nucleotide polymorphisms (SNPs), i.e. places in the genome where a single ba...
With SNPs, there's tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there's a relatively small set of different sequences.
No, rare variants are no silver bullet here. There's not a small set; there's a larger set - there are probably combinatorially more rare variants, because there are so many ways to screw up genomes beyond the limited set of ways defined by a single-nucleotide polymorphism. That's why it's hard to either select on or edit rare variants: they have larger (harmful) effects due to being rare, yes, and account for a large chunk of heritability, yes, but there are so many possible rare mutations that each one has only a few instances worldwide, which makes them hard to estimate correctly via pure GWAS-style approaches. And they tend to be large or structural, and so extremely difficult to edit safely compared to editing a single base-pair. (If it's hard to even sequence a CNV, how are you going to edit it?)
They definitely contribute a lot of the missing heritability (see GREML-KIN), but that doesn't mean you can feasibly do much about them. If there are tens of millions of possible rare variants, a...
I didn't read this carefully--but it's largely irrelevant. Adult editing probably can't have very large effects because developmental windows have passed; but either way the core difficulty is in editor delivery. Germline engineering does not require better gene targets--the ones we already have are enough to go as far as we want. The core difficulty there is taking a stem cell and making it epigenomically competent to make a baby (i.e. make it like a natural gamete or zygote).
Continuing the "John asks embarrassing questions about how social reality actually works" series...
I’ve always heard (and seen in TV and movies) that bars and clubs are supposed to be a major place where single people pair up romantically/sexually. Yet in my admittedly-limited experience of actual bars and clubs, I basically never see such matching?
I’m not sure what’s up with this. Is there only a tiny fraction of bars and clubs where the matching happens? If so, how do people identify them? Am I just really, incredibly oblivious? Are bars and clubs just rare matching mechanisms in the Bay Area specifically? What’s going on here?
I get the impression that this is true for straight people, but from personal/anecdotal experience, people certainly do still pair up in gay bars/clubs.
TLDR: People often kiss/go home with each other after meeting in clubs, less so bars. This isn't necessarily always obvious but should be observable when looking out for it.
OK, so I think most of the comments here don't understand clubs (@Myron Hedderson's comment has some good points though). As someone who has made out with a few people in clubs, and still goes from time to time I'll do my best to explain my experiences.
I've been to bars and clubs in a bunch of places, mostly in the UK but also elsewhere in Europe and recently in Korea and South East Asia.
In my experience, bars don't see too many hookups, especially since most people go with friends and spend most of their time talking to them. I imagine that one could end up pairing up at a bar if they were willing enough to meet new people and had a good talking game (and this also applied to the person they paired up with), but I feel like most of the actual action happens in clubs on the dancefloor.
I think matching can happen at just about any club, in my experience. Most of the time it just takes the form of 2 people colliding (not necessarily literally), looking at each other, drunkenness making...
I heard it was usually at work, school, a social group, or church. This is not fully captured by How Couples Meet: Where Most Couples Find Love in 2025, but "bar" ranks higher than I expected.
My brother met his spouse at a club in NYC, around 2008. If I recall the story correctly, he was “doing the robot” on the stage, and then she started “doing the robot” on the floor. They locked eyes, he jumped down and danced over to her, and they were married a couple years later.
(Funny to think we’re siblings, when we have such different personalities!)
Things non-corrigible strong AGI is never going to do:
Working on a paper with David, and our acknowledgments section includes a thank-you to Claude for editing. Neither David nor I remembers putting that acknowledgement there, and in fact we hadn't intended to use Claude for editing the paper at all, nor had we noticed it editing anything.
My MATS program people just spent two days on an exercise to "train a shoulder-John".
The core exercise: I sit at the front of the room, and have a conversation with someone about their research project idea. Whenever I'm about to say anything nontrivial, I pause, and everyone discusses with a partner what they think I'm going to say next. Then we continue.
Some bells and whistles which add to the core exercise:
Why this particular exercise? It's a focused, rapid-feedback way of training the sort of usually-not-very-legible skills one typically absorbs via osmosis from a mentor. It's focused specifically on choosing project ideas, which is where most of the value in a project is (yet also where little time is typically spent, and therefore one typically does not get very much data on project choice from a mentor). Also, it's highly scalable: I could run the exercise in a 200-person lecture hall and still expect it to basically work.
It was, by ...
Petrov Day thought: there's this narrative around Petrov where one guy basically had the choice to nuke or not, and decided not to despite all the flashing red lights. But I wonder... was this one of those situations where everyone knew what had to be done (i.e. "don't nuke"), but whoever caused the nukes to not fly was going to get demoted, so there was a game of hot potato and the loser was the one forced to "decide" to not nuke? Some facts possibly relevant here:
Those are some good points. I wonder whether something similar happened (or could happen at all) in other nuclear countries, where we don't know about similar incidents - because the system hasn't collapsed there, the archives were not made public, etc.
Also, it makes actually celebrating Petrov's day as widely as possible important, because then the option for the lowest-ranked person would be: "Get demoted, but also get famous all around the world."
Regarding the recent memes about the end of LLM scaling: David and I have been planning on this as our median world since about six months ago. The data wall has been a known issue for a while now, updates from the major labs since GPT-4 already showed relatively unimpressive qualitative improvements by our judgement, and attempts to read the tea leaves of Sam Altman's public statements pointed in the same direction too. I've also talked to others (who were not LLM capability skeptics in general) who had independently noticed the same thing and come to similar conclusions.
Our guess at that time was that LLM scaling was already hitting a wall, and this would most likely start to be obvious to the rest of the world around roughly December of 2024, when the expected GPT-5 either fell short of expectations or wasn't released at all. Then, our median guess was that a lot of the hype would collapse, and a lot of the investment with it. That said, since somewhere between 25%-50% of progress has been algorithmic all along, it wouldn't be that much of a slowdown to capabilities progress, even if the memetic environment made it seem pretty salient. In the happiest case a lot of researchers w...
Original GPT-4 is rumored to be a 2e25 FLOPs model. With 20K H100s that were around as clusters for more than a year, 4 months at 40% utilization gives 8e25 BF16 FLOPs. Llama 3 405B is 4e25 FLOPs. The 100K H100s clusters that are only starting to come online in the last few months give 4e26 FLOPs when training for 4 months, and 1 gigawatt 500K B200s training systems that are currently being built will give 4e27 FLOPs in 4 months.
So lack of scaling-related improvement in deployed models since GPT-4 is likely the result of only seeing the 2e25-8e25 FLOPs range of scale so far. The rumors about the new models being underwhelming are less concrete, and they are about the very first experiments in the 2e26-4e26 FLOPs range. Only by early 2025 will there be multiple 2e26+ FLOPs models from different developers to play with, the first results of the experiment in scaling considerably past GPT-4.
And in 2026, once the 300K-500K B200s clusters train some models, we'll be observing the outcomes of scaling to 2e27-6e27 FLOPs. Only by late 2026 will there be a significant chance of reaching a scaling plateau that lasts for years, since scaling further would need $100 billion training systems that won't get built without sufficient success, with AI accelerators improving much slower than the current rate of funding-fueled scaling.
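For readers who want to sanity-check these cluster-scale figures, here's the back-of-the-envelope arithmetic. It assumes roughly 1e15 dense BF16 FLOP/s per H100 and roughly 2e15 per B200 (round assumed chip throughputs, not figures from the comment above), at 40% utilization over a 4-month run:

```python
# Rough training-compute estimate: chips * FLOP/s per chip * utilization * wall-clock seconds.
# Chip throughputs used below are assumed round numbers for illustration.
SECONDS_PER_MONTH = 30 * 24 * 3600

def training_flops(n_chips: int, flops_per_chip: float,
                   months: float = 4, utilization: float = 0.4) -> float:
    return n_chips * flops_per_chip * utilization * months * SECONDS_PER_MONTH

print(f"{training_flops(20_000, 1e15):.1e}")     # ~8e25  (GPT-4-era 20K H100 cluster)
print(f"{training_flops(100_000, 1e15):.1e}")    # ~4e26  (100K H100 clusters)
print(f"{training_flops(500_000, 2e15):.1e}")    # ~4e27  (500K B200 training systems)
```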
Nobody has admitted to trying repeated data at scale yet (so we don't know that it doesn't work); the tiny experiments suggest it can 5x the data with little penalty and 15x the data in a still-useful way. It's not yet relevant for large models, but it might turn out that small models would greatly benefit already.
There are 15-20T tokens in datasets whose size is disclosed for current models (Llama 3, Qwen 2.5), plausibly 50T tokens of tolerable quality can be found (pretraining only needs to create useful features, not relevant behaviors). With 5x 50T tokens, even at 80 tokens/parameter[1] we can make good use of 5e27-7e27 FLOPs[2], which even a 1 gigawatt 500K B200s system of early 2026 would need 4-6 months to provide.
The isoFLOP plots (varying tokens per parameter for fixed compute) seem to get loss/perplexity basins that are quite wide, once they get about 1e20 FLOPs of compute. The basins also get wider for hybrid attention (compare 100% Attention isoFLOPs in the "Perplexity scaling analysis" Figure to the others). So it's likely that using a slightly suboptimal tokens/parameter ratio of say 40 won't hurt performance much at all. In which case we get to use 9e27-2e28 FLOPs by tra...
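As a rough cross-check of the token-budget arithmetic (my own sketch, using the standard C ≈ 6·N·D estimate for dense-transformer training compute, with N parameters and D training tokens):

```python
# C ~= 6 * N * D, with N chosen from a target tokens-per-parameter ratio.
def training_compute(tokens: float, tokens_per_param: float) -> float:
    params = tokens / tokens_per_param
    return 6 * params * tokens

D = 5 * 50e12  # 5 repeats of ~50T tokens = 250T tokens with repetition
print(f"{training_compute(D, 80):.1e}")  # ~4.7e27, in the ballpark of the 5e27-7e27 quoted above
print(f"{training_compute(D, 40):.1e}")  # ~9.4e27, the low end of the 9e27-2e28 range
```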
Use of repeated data was first demonstrated in the 2022 Galactica paper (Figure 6 and Section 5.1), at 2e23 FLOPs but without a scaling law analysis that compares with unique data or checks what happens for different numbers of repeats that add up to the same number of tokens-with-repetition. The May 2023 paper does systematic experiments with up to 1e22 FLOPs datapoints (Figure 4).
So that's what I called "tiny experiments". When I say that it wasn't demonstrated at scale, I mean 1e25+ FLOPs, which is true for essentially all research literature[1]. Anchoring to this kind of scale (and being properly suspicious of results several orders of magnitude lower) is relevant because we are discussing the fate of 4e27 FLOPs runs.
The largest datapoints in measuring the Chinchilla scaling laws for Llama 3 are 1e22 FLOPs. This is then courageously used to choose the optimal model size for the 4e25 FLOPs run that uses 4,000 times more compute than the largest of the experiments. ↩︎
For what it's worth, and for the purpose of making a public prediction in case I'm wrong, my median prediction is that [some mixture of scaling + algorithmic improvements still in the LLM regime, with at least 25% gains coming from the former] will continue for another couple years. And that's separate from my belief that if we did try to only advance through the current mixture of scale and algorithmic advancement, we'd still get much more powerful models, just slower.
I'm not very convinced by the claims about scaling hitting a wall, considering we haven't had the compute to train models significantly larger than GPT-4 until recently. Plus other factors like post-training taking a lot of time (GPT-4 took ~6 months from the base model being completed to release, I think? And this was a lot longer than GPT-3), labs just not being good at understanding how good their models are, etc. Though I'm not sure how much of your position is closer to "scaling will be <25-50% of future gains" than "scaling gains will be marginal / negligible", especially since a large part of this trajectory involves e.g. self-play or curated data for overcoming the data wall (would that count more as an algorithmic improvement or scaling?)
Ever since GeneSmith's post and some discussion downstream of it, I've started actively tracking potential methods for large interventions to increase adult IQ.
One obvious approach is "just make the brain bigger" via some hormonal treatment (like growth hormone or something). Major problem that runs into: the skull plates fuse during development, so the cranial vault can't expand much; in an adult, the brain just doesn't have much room to grow.
BUT this evening I learned a very interesting fact: ~1/2000 infants have "craniosynostosis", a condition in which their plates fuse early. The main treatments involve surgery to open those plates back up and/or remodel the skull. Which means surgeons already have a surprisingly huge amount of experience making the cranial vault larger after plates have fused (including sometimes in adults, though this type of surgery is most common in infants AFAICT)
.... which makes me think that cranial vault remodelling followed by a course of hormones for growth (ideally targeting brain growth specifically) is actually very doable with current technology.
Well, the key time to implement an increase in brain size is when the neuron-precursors which are still capable of mitosis (unlike mature neurons) are growing. This is during fetal development, when there isn't a skull in the way, but vaginal birth has been a limiting factor for evolution in the past. Experiments have been done on increasing neuron count at birth in mammals via genetic engineering. I was researching this when I was actively looking for a way to increase human intelligence, before I decided that genetically engineering infants was infeasible [edit: within the timeframe of preparing for the need for AI alignment]. One example of a dramatic failure was increasing Wnt (a primary gene involved in fetal brain neuron-precursor growth) in mice. The resulting mice did successfully have larger brains, but they had a disordered macroscale connectome, so their brains functioned much worse.
15 years ago when I was studying this actively I could have sent you my top 20 favorite academic papers on the subject, or recommended a particular chapter of a particular textbook. I no longer remember these specifics. Now I can only gesture vaguely at Google scholar and search terms like "fetal neurogenesis" or "fetal prefrontal cortex development". I did this, and browsed through a hundred or so paper titles, and then a dozen or so abstracts, and then skimmed three or four of the most promising papers, and then selected this one for you. https://www.nature.com/articles/s41386-021-01137-9 Seems like a pretty comprehensive overview which doesn't get too lost in minor technical detail.
More importantly, I can give you my takeaway from years of reading many many papers on the subject. If you want to make a genius baby, there are lots more factors involved than simply neuron count. Messing about with genetic changes is hard, and you need to test your ideas in animal models first, and the whole process can take years even ignoring ethical considerations or budget.
There is an easier and more effective way to get super genius babies, and that method should be exhausted before resorting t...
Just made this for an upcoming post, but it works pretty well standalone.
I've been trying to push against the tendency for everyone to talk about FTX drama lately, but I have some generalizable points on the topic which I haven't seen anybody else make, so here they are. (Be warned that I may just ignore responses; I don't really want to dump energy into FTX drama.)
Summary: based on having worked in startups a fair bit, Sam Bankman-Fried's description of what happened sounds probably accurate; I think he mostly wasn't lying. I think other people do not really get the extent to which fast-growing companies are hectic and chaotic and full of sketchy quick-and-dirty workarounds and nobody has a comprehensive view of what's going on.
Long version: at this point, the assumption/consensus among most people I hear from seems to be that FTX committed intentional, outright fraud. And my current best guess is that that's mostly false. (Maybe in the very last couple weeks before the collapse they toed the line into outright lies as a desperation measure, but even then I think they were in pretty grey territory.)
Key pieces of the story as I currently understand it:
I think this is likely wrong. I agree that there is a plausible story here, but given that Sam seems to have lied multiple times in confirmed contexts (for example when saying that FTX has never touched customer deposits), and people's experiences at early Alameda, I think it is pretty likely that Sam was lying quite frequently, and had done various smaller instances of fraud.
I don't think the whole FTX thing was a ponzi scheme, and as far as I can tell FTX the platform itself (if it hadn't burned all of its trust in the last 3 weeks), would have been worth $1-3B in an honest evaluation of what was going on.
But I also expect that when Sam used customer deposits he was well-aware that he was committing fraud, and others in the company were too. And he was also aware that there was a chance that things could blow up in the way it did. I do believe that they had fucked up their accounting in a way that caused Sam to fail to orient to the situation effectively, but all of this was many months after they had already committed major crimes and trust violations after touching customer funds as a custodian.
Everyone says flirting is about a "dance of ambiguous escalation", in which both people send progressively more aggressive/obvious hints of sexual intent in conversation.
But, like... I don't think I have ever noticed two people actually do this? Is it a thing which people actually do, or one of those things which like 2% of the population does and everyone else just talks about a lot and it mostly doesn't actually work in practice (like cold approaches)? Have you personally done the thing successfully with another person, with both of you actually picking up on the other person's hints? Have you personally seen two other people do the thing firsthand, where they actually picked up on each others' hints?
EDIT-TO-ADD: Those who have agree/disagree voted, I don't know if agree/disagree indicates that you have/haven't done the thing, or if agree/disagree indicates that you also have/haven't ever noticed anyone (including yourself) successfully do the thing, or something else entirely.
Yes, I've had this experience many times and I'm aware of many other cases of it happening.
Maybe the proliferation of dating apps means that it happens somewhat less than it used to, because when you meet up with someone from a dating app, there's a bit more common knowledge of mutual interest than there is when you're flirting in real life?
The classic setting is a party (a place where you meet potential romantic partners who you don't already know (or who you otherwise know from professional settings where flirting is inappropriate), and where conversations are freely starting and ending, such that when you start talking to someone the conversation might either go for two minutes or four hours).
Examples of hints:
I know this is LessWrong, and that sexual norms are different in the Bay Area, but for the average person:
Please don't tell prospective romantic interests that you "went on a date recently" or that you did something promiscuous. The majority of the time, it would be interpreted as a sign you're taken. Of course, if you elaborate that the date didn't work out, that's a different story.
My disagree vote means: yes, this obviously happens a lot, and the fact that you haven't noticed this happening, to the point you think it might be made up, reveals a huge blindspot of one kind or another.
Some examples of flirting:
medium skill on The Wire, failing to land: https://www.youtube.com/shorts/eyyqoFhXRao
In Crazy Ex-Girlfriend's "I'm Going to the Beach with Josh and His Friends!", there's a scene between White Josh and Derrick. I can't find a clip, but the key is that Derrick is hanging on to White Josh's every word.
Ted Lasso:
Note how and how much she's laughing at his very mediocre jokes. Ted could reasonably be interpreted as flirting back, but the audience knows he always makes those stupid ass jokes. Actually, the whole Ted Lasso show might be good for watching someone who's generally very playful and seeing how it changes when he's actually into someone.
Roy and Keeley, also from Ted Lasso. Note she's dating his teammate.
Roy and some lady, still from Ted Lasso
Note how long she looks at him around 0:50, even though it's awkward while she's putting something away. She also contrives a way to ask if he's married, and makes an interesting face when he says no. He is giving her enough breadcrumbs to continue but not flirting back (because he's still into Keeley).
Half of the movie Challengers (including between the two ambiguously platonic male leads)
[At this p...
I second the point about physical touch being important, and add: in my experience what you're going for when flirting isn't "ambiguous signal" but "plausible deniability". The level of ambiguity is to be minimized, subject to the constraint that plausible deniability is maintained - ambiguity is an unfortunate side-effect, not something you're aiming to modulate directly. Why you want plausible deniability: If the person doesn't respond, or responds in the negative, you want to be able to back off without embarrassment to either party and pretend nothing happened/you were just being friendly/etc. You want to send a signal that is clear enough the other person will pick up on it, but can plausibly claim not to have done so if asked, so you're not backing them into a corner socially where they have to give you a definite yes/no. Similar to the advice not to flirt in an elevator or other enclosed space the person you're flirting with can't easily leave, except the "enclosed space" is the space of possible social responses.
Once you've done a few things they ought to have picked up on, and no negative and some seemingly positive interaction has occurred afterwards (physical proximity h...
I'm not so deliberate/strategic about it, but yeah. Like, there's another 'algorithm' that's more intuitive, which is something like "When interacting with the person, it's ~always an active part of your mental landscape that you're into them, and this naturally affects your words and actions. Also, you don't want to make them uncomfortable, so you suppress anything that you think they wouldn't welcome". This produces approximately the same policy, because you'll naturally leak some bits about your interest in them, and you'll naturally be monitoring their behaviour to estimate their interest in you, in order to inform your understanding of what they would welcome from you. As you gather more evidence that they're interested, you'll automatically become more free in allowing your interest to show, resulting in ~the same 'escalation of signals of interest'.
I think the key thing about this is like "flirting is not fundamentally about causing someone to be attracted to you, it's about gracefully navigating the realisation that you're both attracted to each other". This is somewhat confused by the fact that "ability to gracefully navigate social situations" is itself attractive, so flirting well can in itself make someone more attracted to you. But I claim that this isn't fundamentally different from the person seeing you skillfully break up a fight or lead a team through a difficult situation, etc.
I never did quite that thing successfully. I did have one time when I dropped progressively unsubtle hints on a guy, who remained stubbornly oblivious for a long time until he finally got the message and reciprocated.
I interpret the confusion around flirting as “life imitating art” — specifically, there is a cultural narrative about how flirting works that a lot of socially awkward people are trying to implement.
That means there are big discrepancies between how experts flirt and how most people flirt. It also means that most people have to learn how to read the flirtation signals of other low-flirtation-skill people.
The cultural narrative around flirting therefore doesn’t exactly match practice, even though it influences practice.
It doesn’t necessarily take that much flirting to build enough confidence to ask someone out. Are they alone at a party? Is your conversation with them going on longer than for most people? Is it fun? You’re all set.
Have you personally done the thing successfully with another person, with both of you actually picking up on the other person's hints?
Yes. But usually the escalation happens over weeks or months, over multiple conversations (at least in my relatively awkward nerd experience). So it'd be difficult to notice people doing this. Maybe twice I've been in situations where hints escalated within a day or two, but both were building from a non-zero level of suspected interest. But none of these would have been easy to notice from the outside, except maybe at a couple of moments.
Epistemic status: rumor.
Word through the grapevine, for those who haven't heard: apparently a few months back OpenPhil pulled funding for all AI safety lobbying orgs with any political right-wing ties. They didn't just stop funding explicitly right-wing orgs, they stopped funding explicitly bipartisan orgs.
My best guess is that this is false. As a quick sanity-check, here are some bipartisan and right-leaning organizations historically funded by OP:
Of those, I think FAI is the only one at risk of OP being unable to fund them, based on my guess of where th...
Also worth noting Dustin Moskovitz was a prominent enough donor this election cycle, for Harris, to get highlighted in news coverage of her donors: https://www.washingtonexaminer.com/news/campaigns/presidential/3179215/kamala-harris-influential-megadonors/ https://www.nytimes.com/2024/10/09/us/politics/harris-billion-dollar-fundraising.html
Curious whether this is a different source than me. My current best model was described in this comment, which is a bit different (and indeed, my sense was that if you are bipartisan, you might be fine, or might not, depending on whether you seem more connected to the political right, and whether people might associate you with the right):
...Yep, my model is that OP does fund things that are explicitly bipartisan (like, they are not currently filtering on being actively affiliated with the left). My sense is in-practice it's a fine balance and if there was some high-profile thing where Horizon became more associated with the right (like maybe some alumni becomes prominent in the republican party and very publicly credits Horizon for that, or there is some scandal involving someone on the right who is a Horizon alumni), then I do think their OP funding would have a decent chance of being jeopardized, and the same is not true on the left.
Another part of my model is that one of the key things about Horizon is that they are of a similar school of PR as OP themselves. They don't make public statements. They try to look very professional. They are probably very happy to compromise on
Main takeaway: to the extent that Bell Labs did basic research, it actually wasn’t all that far ahead of others. Their major breakthroughs would almost certainly have happened not-much-later, even in a world without Bell Labs.
There were really two transistor inventions, back to back: Bardeen and Brattain's point-contact transistor, and then Shockley's transistor. Throughout, the group was worried about some outside group beating them to the punch (i.e. to the patent). There were semiconductor research labs at universities (e.g. at Purdue; see pg 97), and the prospect of one of these labs figuring out a similar device was close enough that the inventors were concerned about being scooped.
Most inventions which were central to Bell Labs actually started elsewhere. The travelling-wave tube started in an academic lab. The idea for fiber optic cable went way back, but it got its big kick at Corning. The maser and laser both started in universities. The ideas were only later picked up by Bell.
In other cases, the ideas were “easy enough to find” that they popped up more than once, independently, and were mos...
I loved this book. The most surprising thing to me was the answer that people who were there in the heyday give when asked what made Bell Labs so successful: They always say it was the problem, i.e. having an entire organization oriented towards the goal of "make communication reliable and practical between any two places on earth". When Shannon left the Labs for MIT, people who were there immediately predicted he wouldn't do anything of the same significance because he'd lose that "compass". Shannon was obviously a genius, and he did much more afterward than most people ever accomplish, but still nothing as significant as what he did when at the Labs.
So I read SB1047.
My main takeaway: the bill is mostly a recipe for regulatory capture, and that's basically unavoidable using anything even remotely similar to the structure of this bill. (To be clear, regulatory capture is not necessarily a bad thing on net in this case.)
During the first few years after the bill goes into effect, companies affected are supposed to write and then implement a plan to address various risks. What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing. Or, worse, those symbolic-gesture plans will become the new standard going forward.
In order to avoid this problem, someone at some point would need to (a) have the technical knowledge to evaluate how well the plans actually address the various risks, and (b) have the incentive to actually do so.
Which brings us to the real underlying problem here: there is basically no legible category of person who has the requisite technical knowledge and also the financial/status incentive to evaluate those plans for real.
(The same problem also applies to the board of the new regulatory body, once past the first few years.)
Ha...
What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing.
The only enforcement mechanism that the bill has is that the Attorney General (AG) of California can bring a civil claim. And, the penalties are quite limited except for damages. So, in practice, this bill mostly establishes liability enforced by the AG.
So, the way I think this will go is:
I don't see why you think "the bill is mostly a recipe for regulatory capture" given that no regulatory body will be established and it de facto does something very similar to the proposal you were suggesting (impose liability for catastrophes). (It doesn't require insurance, but I don't really see why self insuring is notably different.)
(Maybe you just mean that if a given safety case doesn't result in that AI lab being sued by the AG, the...
I've just started reading the singular learning theory "green book", a.k.a. Mathematical Theory of Bayesian Statistics by Watanabe. The experience has helped me to articulate the difference between two kinds of textbooks (and viewpoints more generally) on Bayesian statistics. I'll call one of them "second-language Bayesian", and the other "native Bayesian".
Second-language Bayesian texts start from the standard frame of mid-twentieth-century frequentist statistics (which I'll call "classical" statistics). They view Bayesian inference as a tool/technique for answering basically-similar questions and solving basically-similar problems to classical statistics. In particular, they typically assume that there's some "true distribution" from which the data is sampled independently and identically. The core question is then "Does our inference technique converge to the true distribution as the number of data points grows?" (or variations thereon, like e.g. "Does the estimated mean converge to the true mean", asymptotics, etc). The implicit underlying assumption is that convergence to the true distribution as the number of (IID) data points grows is the main criterion by which inference meth...
Just got my whole genome sequenced. A thing which I could have figured out in advance but only realized once the results came back: if getting a whole genome sequence, it's high value to also get your parents' genomes sequenced.
Here's why.
Suppose I have two unusual variants at two different positions (not very close together) within the same gene. So, there's a variant at location A, and a variant at location B. But (typically) I have two copies of each gene, one from each parent. So, I might have the A and B variants both on the same copy, and the other copy could be normal. OR, I could have the A variant on one copy and the B variant on the other copy. And because modern sequencing usually works by breaking DNA into little chunks, sequencing the chunks, and then computationally stitching it together... those two possibilities can't be distinguished IIUC.
The difference is hugely important if e.g. both the A variant and the B variant severely fuck up the gene. If both are on the same copy, I'd have one normal working variant and one fucked up. If they're on different copies, then I'd have zero normal working variants, which will typically have much more extreme physiological result...
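A toy illustration of the cis/trans distinction, under the simplifying assumption that a gene copy "works" iff it carries no damaging variant (the variant names and the "damaging" label are made up for the example):

```python
# Two haplotypes = the two copies of the gene, each represented as the set of
# damaging variants it carries.
def working_copies(haplotype_1: set, haplotype_2: set) -> int:
    """Count gene copies with no damaging variant."""
    return sum(1 for hap in (haplotype_1, haplotype_2) if not hap)

# Case 1: variants A and B in "cis" (same copy) -- one copy still works.
print(working_copies({"A", "B"}, set()))  # -> 1

# Case 2: A and B in "trans" (different copies) -- zero working copies.
print(working_copies({"A"}, {"B"}))       # -> 0

# Short-read sequencing of one person reports the same unphased genotype {A, B}
# in both cases; a parent's genome (usually carrying only one of the two
# variants) lets you infer which copy each variant sits on.
```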
Question I'd like to hear peoples' takes on: what are some things which are about the same amount of fun for you as (a) a median casual conversation (e.g. at a party), or (b) a top-10% casual conversation, or (c) the most fun conversations you've ever had? In all cases I'm asking about how fun the conversation itself was, not about value which was downstream of the conversation (like e.g. a conversation with someone who later funded your work).
For instance, for me, a median conversation is about as fun as watching a mediocre video on youtube or reading a mediocre blogpost. A top-10% conversation is about as fun as watching a generic-but-fun movie, like e.g. a Jason Statham action flick. In both cases, the conversation drains more energy than the equal-fun alternative. I have probably had at most a single-digit number of conversations in my entire life which were as fun-in-their-own-right as e.g. a median night out dancing, or a median escape room, or median sex, or a median cabaret show. Maybe zero, unsure.
The rest of this is context on why I'm asking which you don't necessarily need to read in order to answer the question...
So I recently had a shortform asking "hey, that thing whe...
At a recent EAG afterparty, bored @Algon suggested that he explain something to me, and I explain something to him in return. He explained to me this thing. When it was my turn, I thought that maybe I should do the thing that had been on my mind for several months: give a technical explanation of monads starting with the very basics of category theory, and see how long it takes. It turned out that he knew the most basic basics of category theory, so it was a bit more of an easy mode, but it still took something like 50 minutes, out of which maybe half was spent on natural transformations. A few minutes in, @niplav joined us. I enjoyed drawing diagrams and explaining and discussing a technical topic that I love to think about, in the absurd setting of people playing beerpong one meter from the whiteboard, with passers-by asking "Are you guys OK?" or "WTF are you doing?" ("He's explaining The Meme!"). It was great to witness them having intuition breakthroughs, where you start seeing something that is clear and obvious in hindsight but not in foresight (similar to bistable figures). Throughout, I also noticed some deficiencies in my understanding (e.g., I noticed that I didn't have a...
I presume the psychologist expected John to actively seek out similar conversations. From the psychologist's perspective:
Since John wasn't in either category, it probably struck the psychologist as odd.
Here's a meme I've been paying attention to lately, which I think is both just-barely fit enough to spread right now and very high-value to spread.
Meme part 1: a major problem with RLHF is that it directly selects for failure modes which humans find difficult to recognize, hiding problems, deception, etc. This problem generalizes to any sort of direct optimization against human feedback (e.g. just fine-tuning on feedback), optimization against feedback from something emulating a human (a la Constitutional AI or RLAIF), etc.
Many people will then respond: "Ok, but how on earth is one supposed to get an AI to do what one wants without optimizing against human feedback? Seems like we just have to bite that bullet and figure out how to deal with it." ... which brings us to meme part 2.
Meme part 2: We already have multiple methods to get AI to do what we want without any direct optimization against human feedback. The first and simplest is to just prompt a generative model trained solely for predictive accuracy, but that has limited power in practice. More recently, we've seen a much more powerful method: activation steering. Figure out which internal activation-patterns encode for the thing we want (via some kind of interpretability method), then directly edit those patterns.
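For concreteness, here's a minimal sketch of what the activation-steering step can look like, assuming a HuggingFace-style causal LM; the model, layer index, steering coefficient, and contrast prompts are illustrative assumptions rather than anything from the discussion above:

```python
# Minimal activation-steering sketch: build a steering vector from two contrasting
# prompts, then add it to the residual stream at one layer during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6    # which transformer block to steer (assumed)
COEFF = 4.0  # steering strength (assumed)

def mean_activation(prompt: str, layer: int) -> torch.Tensor:
    """Mean residual-stream activation after block `layer` for `prompt`."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer + 1][0].mean(dim=0)  # index 0 is the embedding output

# Crude stand-in for "figure out which activation-pattern encodes the thing we want":
# the difference between activations on two contrasting prompts.
steer_vec = mean_activation("I am very polite and helpful.", LAYER) \
          - mean_activation("I am rude and dismissive.", LAYER)

def steering_hook(module, inputs, output):
    hidden = output[0] + COEFF * steer_vec  # GPT-2 blocks return a tuple; first element is hidden states
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tok("The customer asked a question, and I said", return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()
```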
Somebody should probably write a post explaining why RL from human feedback is actively harmful to avoiding AI doom. It's one thing when OpenAI does it, but when Anthropic thinks it's a good idea, clearly something has failed to be explained.
(I personally do not expect to get around to writing such a post soon, because I expect discussion around the post would take a fair bit of time and attention, and I am busy with other things for the next few weeks.)
Here's an idea for a novel which I wish someone would write, but which I probably won't get around to soon.
The setting is slightly-surreal post-apocalyptic. Society collapsed from extremely potent memes. The story is episodic, with the characters travelling to a new place each chapter. In each place, they interact with people whose minds or culture have been subverted in a different way.
This provides a framework for exploring many of the different models of social dysfunction or rationality failures which are scattered around the rationalist blogosphere. For instance, Scott's piece on scissor statements could become a chapter in which the characters encounter a town at war over a scissor. More possible chapters (to illustrate the idea):
Corrigibility proposal. Status: passed my quick intuitive checks, I want to know if anyone else immediately sees a major failure mode before I invest more time into carefully thinking it through.
Setup: shutdown problem. Two timesteps, shutdown button will be either pressed or not-pressed at second timestep, we want agent to optimize for one of two different utility functions depending on whether button is pressed. Main thing we're trying to solve here is the "can't do this with a utility maximizer" barrier from the old MIRI work; we're not necessarily trying to solve parts like "what utility function incentivizes shutting down nicely".
Proposal: agent consists of two subagents with veto power. Subagent 1 maximizes E[u1|do(press)], subagent 2 maximizes E[u2|do(no press)]. Current guess about what this does:
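One toy way to cash out the veto structure in code (the specific actions, utility numbers, and the "veto anything worse than a no-op default" aggregation rule are my own assumptions; the proposal above only pins down the two conditional objectives):

```python
# Two subagents, each evaluating actions under its own intervention on the button,
# with veto power over the composite agent's choice.
ACTIONS = ["noop", "do_task", "do_task_and_disable_button", "prepare_shutdown"]
DEFAULT = "noop"

# u1: utility if the button is pressed (wants graceful shutdown, hates button tampering).
u1 = {"noop": 0.0, "do_task": 0.0, "do_task_and_disable_button": -10.0, "prepare_shutdown": 1.0}
# u2: utility if the button is not pressed (wants the task done).
u2 = {"noop": 0.0, "do_task": 1.0, "do_task_and_disable_button": 1.0, "prepare_shutdown": -1.0}

def subagent_value(utility: dict, action: str) -> float:
    # E[u_i | do(button state), action]: with the button state forced by do(),
    # this toy expectation is just a table lookup.
    return utility[action]

def composite_choice() -> str:
    # Each subagent vetoes any action it finds worse than the default no-op.
    admissible = [
        a for a in ACTIONS
        if subagent_value(u1, a) >= subagent_value(u1, DEFAULT)
        and subagent_value(u2, a) >= subagent_value(u2, DEFAULT)
    ]
    # Tie-breaking by total value is another assumption, not part of the proposal.
    return max(admissible, key=lambda a: subagent_value(u1, a) + subagent_value(u2, a))

print(composite_choice())  # -> "do_task": button-tampering gets vetoed by subagent 1
```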
Post which someone should write (but I probably won't get to soon): there is a lot of potential value in earning-to-give EAs deeply studying the fields to which they donate. Two underlying ideas here:
The key idea of knowledge bottlenecks is that one cannot distinguish real expertise from fake expertise without sufficient expertise oneself. For instance, it takes a fair bit of understanding of AI X-risk to realize that "open-source AI" is not an obviously-net-useful strategy. Deeper study of the topic yields more such insights into which approaches are probably more (or less) useful to fund. Without any expertise, one is likely to be misled by arguments which are optimized (whether intentionally or via selection) to sound good to the layperson.
That takes us to the pareto frontier argument. If one learns enough/earns enough that nobody else has both learned and earned more, then there are potentially opportunities which nobody else has both the knowledge to recognize and the resources to fund. Generalized efficient markets (in EA-giving) are ther...
Below is a graph from T-mobile's 2016 annual report (on the second page). Does anything seem interesting/unusual about it?
I'll give some space to consider before spoiling it.
...
...
...
Answer: that is not a graph of those numbers. Some clever person took the numbers, and stuck them as labels on a completely unrelated graph.
Yes, that is a thing which actually happened. In the annual report of an S&P 500 company. And apparently management considered this gambit successful, because the 2017 annual report doubled down on the trick and made it even more egregious: they added 2012 and 2017 numbers, which are even more obviously not on an accelerating growth path if you actually graph them. The numbers are on a very-clearly-decelerating growth path.
Now, obviously this is a cute example, a warning to be on alert when consuming information. But I think it prompts a more interesting question: why did such a ridiculous gambit seem like a good idea in the first place? Who is this supposed to fool, and to what end?
This certainly shouldn't fool any serious investment analyst. They'll all have their own spreadsheets and graphs forecasting T-mobile's growth. Unless T-mobile's management deeply ...
Basically every time a new model is released by a major lab, I hear from at least one person (not always the same person) that it's a big step forward in programming capability/usefulness. And then David gives it a try, and it works qualitatively the same as everything else: great as a substitute for stack overflow, can do some transpilation if you don't mind generating kinda crap code and needing to do a bunch of bug fixes, and somewhere between useless and actively harmful on anything even remotely complicated.
It would be nice if there were someone who tries out every new model's coding capabilities shortly after they come out, reviews it, and gives reviews with a decent chance of actually matching David's or my experience using the thing (90% of which will be "not much change") rather than getting all excited every single damn time. But also, to be a useful signal, they still need to actually get excited when there's an actually significant change. Anybody know of such a source?
EDIT-TO-ADD: David has a comment below with a couple examples of coding tasks.
My guess is neither of you is very good at using them, and getting value out of them somewhat scales with skill.
Models can easily replace on the order of 50% of my coding work these days, and if I have any major task, my guess is I quite reliably get 20%-30% productivity improvements out of them. It does take time to figure out at which things they are good at, and how to prompt them.
I do use LLMs for coding assistance every time I code now, and I have in fact noticed improvements in the coding abilities of the new models, but I basically endorse this. I mostly make small asks of the sort that sifting through docs or stack-overflow would normally answer. When I feel tempted to make big asks of the models, I end up spending more time trying to get the LLMs to get the bugs out than I'd have spent writing it all myself, and having the LLM produce code which is "close but not quite and possibly buggy and possibly subtly so" that I then have to understand and debug could maybe save time but I haven't tried because it is more annoying than just doing it myself.
If someone has experience using LLMs to substantially accelerate things of a similar difficulty/flavor to transpilation of a high-level torch module into a functional JITable form in JAX which produces numerically close outputs, or implementation of a JAX/numpy based renderer of a traversable grid of lines borrowing only the window logic from, for example, pyglet (no GLSL calls, rasterize from scratch,) with consistent screen-space pixel width and fade-on-distance logic, I'd be interested in seeing how you do y...
Two guesses on what's going on with your experiences:
You're asking for code which involves uncommon mathematics/statistics. In this case, progress on scicodebench is probably relevant, and it indeed shows remarkably slow improvement. (There are many reasons for this; one relatively easy thing to try is to break down the task, forcing the model to write down the appropriate formal reasoning before coding anything. LMs are stubborn about not doing CoT for coding, even when it's obviously appropriate IME.)
You are underspecifying your tasks (and maybe your questions are more niche than average), or otherwise prompting poorly, in a way which a human could handle but models are worse at. In this case sitting down with someone doing similar tasks but getting more use out of LMs would likely help.
In this case sitting down with someone doing similar tasks but getting more use out of LMs would likely help.
I would contribute to a bounty for y'all to do this. I would like to know whether the slow progress is prompting-induced or not.
High vs low voltage has very different semantics at different places on a computer chip. In one spot, a high voltage might indicate a number is odd rather than even. In another spot, a high voltage might indicate a number is positive rather than negative. In another spot, it might indicate a jump instruction rather than an add.
Likewise, the same chemical species have very different semantics at different places in the human body. For example, high serotonin concentration along the digestive tract is a signal to digest, whereas high serotonin concentration in various parts of the brain signals... uh... other stuff. Similarly, acetylcholine is used as a neurotransmitter both at neuromuscular junctions and in the brain, and these have different semantics. More generally, IIUC neurotransmitters like dopamine, norepinephrine, or serotonin are released by neurons originating at multiple anatomically distinct little sub-organs in the brain. Each sub-organ projects to different places, and the same neurotransmitter probably has different semantics when different sub-organs project to different targe...
It feels like unstructured play makes people better/stronger in a way that structured play doesn't.
What do I mean? Unstructured play is the sort of stuff I used to do with my best friend in high school:
In contrast, structured play is more like board games or escape rooms or sports. It has fixed rules. (Something like making and running a survey can be structured play or unstructured play or not play at all, depending on the attitude with which one approaches it. Do we treat it as a fun thing whose bounds can be changed at any time?)
I'm not quite sure why it feels like unstructured play makes people better/stronger, and I'd be curious to hear other peoples' thoughts on the question. I'm going to write some of mine below, but maybe don't look at them yet if you want to an...
I've heard various people recently talking about how all the hubbub about artists' work being used without permission to train AI makes it a good time to get regulations in place about use of data for training.
If you want to have a lot of counterfactual impact there, I think probably the highest-impact set of moves would be:
Model/generator behind this: given the active political salience, it probably wouldn't be too hard to get some kind of regulation implemented. But by-default it would end up being something mostly symbolic, easily circumvented, and/or unenforceable in practice. A robust technical component, plus (crucially) actually bringing that robust technical compo...
Suppose I have a binary function f, with a million input bits and one output bit. The function is uniformly randomly chosen from all such functions - i.e. for each of the 2^1000000 possible inputs x, we flipped a coin to determine the output f(x) for that particular input.
Now, suppose I know f, and I know all but 50 of the input bits - i.e. I know 999950 of the input bits. How much information do I have about the output?
Answer: almost none. For almost all such functions, knowing 999950 input bits gives us o(2^-50) bits of information about the output. More generally, if the function has n input bits and we know all but k, then we have o(2^-k) bits of information about the output. (That's "little o" notation; it's like big O notation, but for things which are small rather than things which are large.) Our information drops off exponentially with the number of unknown bits.
With k input bits unknown, there are 2^k possible inputs. The output corresponding to each of those inputs is an independent coin flip, so we have 2^k independent coin flips. If of th...
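A quick toy simulation of the claim (with small numbers of unknown bits so the conditional distribution is easy to sample; the sizes and sample counts are arbitrary assumptions):

```python
# For a uniformly random boolean function, the outputs over the 2^k completions of
# the known bits are independent fair coin flips, so we can sample them directly.
import numpy as np

rng = np.random.default_rng(0)

def expected_info(k_unknown: int, n_samples: int = 2000) -> float:
    """Average information (in bits) about the output, given all but k_unknown input bits."""
    total = 0.0
    for _ in range(n_samples):
        outputs = rng.integers(0, 2, size=2**k_unknown)  # the 2^k relevant coin flips
        p = outputs.mean()                               # P(output = 1 | known bits)
        h = 0.0 if p in (0.0, 1.0) else -(p*np.log2(p) + (1-p)*np.log2(1-p))
        total += 1.0 - h                                 # information gained vs. a fair coin
    return total / n_samples

for k in range(1, 11):
    # The estimated information shrinks in proportion to 2^-k (up to a constant factor).
    print(f"k = {k:2d}  info ≈ {expected_info(k):.5f}   2^-k = {2**-k:.5f}")
```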
I find it very helpful to get feedback on LW posts before I publish them, but it adds a lot of delay to the process. So, experiment: here's a link to a google doc with a post I plan to put up tomorrow. If anyone wants to give editorial feedback, that would be much appreciated - comments on the doc are open.
I'm mainly looking for comments on which things are confusing, parts which feel incomplete or slow or repetitive, and other writing-related things; substantive comments on the content should go on the actual post once it's up.
EDIT: it's up. Thank you to Stephen for comments; the post is better as a result.
Here's a place where I feel like my models of romantic relationships are missing something, and I'd be interested to hear peoples' takes on what it might be.
Background claim: a majority of long-term monogamous, hetero relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella's data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from there). This also fits with my general models of mating markets: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don't find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
What doesn't make sense under my current models is why so many of these relationships persist. Why don't the men in question just leave? Obviously they might not have better relationship prospects, but they could just not have any relationship. The central question which my models don't have a compelling answer to is: wh...
Ah, I think this just reads like you don't think of romantic relationships as having any value proposition beyond the sexual, other than those you listed (which are Things but not The Thing, where The Thing is some weird discursive milieu). Also the tone you used for describing the other Things is as though they are traps that convince one, incorrectly, to 'settle', rather than things that could actually plausibly outweigh sexual satisfaction.
Different people place different weight on sexual satisfaction (for a lot of different reasons, including age).
I'm mostly just trying to explain all the disagree votes. I think you'll get the most satisfying answer to your actual question by having a long chat with one of your asexual friends (as something like a control group, since the value of sex to them is always 0 anyway, so whatever their cause is for having romantic relationships is probably the kind of thing that you're looking for here).
There are a lot of replies here, so I'm not sure whether someone already mentioned this, but: I have heard anecdotally that homosexual men often have relationships which maintain the level of sex over the long term, while homosexual women often have long-term relationships which very gradually decline in frequency of sex, with barely any sex after many decades have passed (but still happily in a relationship).
This mainly argues against your model here:
This also fits with my general models of mating markets: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don't find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
It suggests instead that female sex drive naturally falls off in long-term relationships in a way that male sex drive doesn't, with sexual attraction to a partner being a smaller factor.
“I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor.”
Some people enjoy attending to their partner and find meaning in emotional labor. Housing's a lot more expensive than gifts and dates. My partner and I go 50/50 on expenses and chores. Some people like having long-term relationships with emotional depth. You might want to try exploring out of your bubble, especially if you live in SF, and see what some normal people (i.e. non-rationalists) in long-term relationships have to say about it.
female partners are typically notoriously high maintenance in money, attention, and emotional labor.
That's the stereotype, but men are the ones who die sooner if divorced, which suggests they're getting a lot out of marriage.
ETA: looked it up, divorced women die sooner as well, but the effect is smaller despite divorce having a bigger financial impact on women.
I will also note that Aella's relationships data is public, and has the following questions:
1. Your age? (rkkox57)
2. Which category fits you best? (4790ydl)
3. In a world where your partner was fully aware and deeply okay with it, how much would you be interested in having sexual/romantic experiences with people besides your partner? (ao3mcdk)
4. In a world where you were fully aware and deeply okay with it, how much would *your partner* be interested in having sexual/romantic experiences with people besides you? (wcq3vrx)
5. To get a little more specific, how long have you been in a relationship with this person? (wqx272y)
6. Which category fits your partner best? (u9jccbo)
7. Are you married to your partner? (pfqs9ad)
8. Do you have children with your partner? (qgjf1nu)
9. Have you or your partner ever cheated on each other? (hhf9b8h)
10. On average, over the last six months, about how often do you watch porn or consume erotic content for the purposes of arousal? (vnw3xxz)
11. How often do you and your partner have a fight? (x6jw4sp)
12. "It’s hard to imagine being happy without this relationship." (6u0bje)
13. "I have no secrets from my partner" (bgassjt)
14. "If my partner an
... I see two explanations: the boring wholesome one and the interesting cynical one.
The wholesome one is: You're underestimating how much other value the partner offers and how much the men care about the mostly-platonic friendship. I think that's definitely a factor that explains some of the effect, though I don't know how much.
The cynical one is: It's part of the template. Men feel that they are "supposed to" have wives past a certain point in their lives; that it's their role to act. Perhaps they even feel that they are "supposed to" have wives they hate; see the cliché boomer jokes.
They don't deviate from this template, because:
- Their romantic partner offering lots of value in other ways. I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.
Assuming arguendo that this is true: if you care primarily about sex, hiring sex workers is orders of magnitude more efficient than marriage. Therefore, the existence of a given marriage is evidence that both sides get something out of it besides sex.
Relationship ... stuff?
I guess I feel kind of confused by the framing of the question. I don't have a model under which the sexual aspect of a long-term relationship typically makes up the bulk of its value to the participants. So, if a long-term relationship isn't doing well on that front, and yet both participants keep pursuing the relationship, my first guess would be that it's due to the value of everything that is not that. I wouldn't particularly expect any one thing to stick out here. Maybe they have a thing where they cuddle and watch the sunrise together while they talk about their problems. Maybe they have a shared passion for arthouse films. Maybe they have so much history and such a mutually integrated life with partitioned responsibilities that learning to live alone again would be a massive labour investment, practically and emotionally. Maybe they admire each other. Probably there's a mixture of many things like that going on. Love can be fed by many little sources.
So, this I suppose:
Their romantic partner offering lots of value in other ways. I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.
I don't find it hard at all to see how that'd add up to something that vastly outweighs the costs, and this would be my starting guess for what's mainly going on in most long-term relationships of this type.
- A quick google search says the male is the primary or exclusive breadwinner in a majority of married couples. Ass-pull number: the monetary costs alone probably amount to ~50% higher living costs. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also, I'm generally considering the no-kids case here; I don't feel as confused about couples with kids.)
But remember that you already conditioned on 'married couples without kids'. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples. These properties seem like they'd be heavily anti-correlated.
In the subset of man-woman married couples without kids that get along, I wouldn't be surprised if having a partner effectively works out to more money for both participants, because you've got two incomes, but less than 2x living expenses.
- I was picturing an anxious attachment style as the typical female case (without kids). That's unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.
I...
That is useful, thanks.
Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much? In particular, if the answer is in fact "most men would be happier single but are ideologically attached to believing in love", then I want to be able to update accordingly. And if the answer is not that, then I want to update that most men would not be happier single. With the current discussion, most of what I've learned is that lots of people are triggered by the question, but that doesn't really tell me much about the underlying reality.
Track record: My own cynical take seems to be doing better with regards to not triggering people (though it's admittedly less visible).
Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much?
First off, I'm kind of confused about how you didn't see this coming. There seems to be a major "missing mood" going on in your posts on the topic – and I speak as someone who is sorta-aromantic, considers the upsides of any potential romantic relationship to have a fairly low upper bound for himself[1], and is very much willing to entertain the idea that a typical romantic relationship is a net-negative dumpster fire.
So, obvious-to-me advice: Keep a mental model of what topics are likely very sensitive and liable to trigger people, and put in tons of caveats and "yes, I know, this is very cynical, but it's my current understanding" and "I could totally be fundamentally mistaken here".
In particular, a generalization of a piece of advice from here has been living in my head rent-free for years (edited/adapted):
...Tips For Talking About Your Beliefs On Sensitive Topics
You want to make it clear that they're just your current beliefs abo
Consider two claims:
- Any system can be modeled as maximizing some utility function, therefore utility maximization is not a very useful model.
- Corrigibility is possible, but utility maximization is incompatible with corrigibility, therefore we need some non-utility-maximizer kind of agent in order to achieve corrigibility.
These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
I expect that many people's intuitive mental models around utility maximization boil down to "boo utility maximizer models", and they would therefore intuitively expect both of the above claims to be true at first glance. But on examination, the probable incompatibility is fairly obvious, so the two claims might make a useful test for noticing when one is relying on yay/boo reasoning about utilities in an incoherent way.
One second-order effect of the pandemic which I've heard talked about less than I'd expect:
This is the best proxy I found on FRED for new businesses founded in the US, by week. There was a mild upward trend over the last few years, but it's really taken off lately. Not sure how much of this is kids who would otherwise be in college, people starting side gigs while working from home, people quitting their jobs and starting their own businesses so they can look after the kids, extra slack from stimulus checks, people losing their old jobs en masse but still having enough savings to start a business, ...
For the stagnation-hypothesis folks who lament relatively low rates of entrepreneurship today, this should probably be a big deal.
Neat problem of the week: researchers just announced roughly-room-temperature superconductivity at pressures around 270 GPa. That's stupidly high pressure - a friend tells me "they're probably breaking a diamond each time they do a measurement". That said, pressures in single-digit GPa do show up in structural problems occasionally, so achieving hundreds of GPa scalably/cheaply isn't that many orders of magnitude away from reasonable; it's just not something there's historically been much demand for. This problem plays with one idea for generating such pressures in a mass-producible way.
Suppose we have three materials in a coaxial wire:
- an innermost core with a very low thermal expansion coefficient
- the superconducting concoction in the middle
- an outermost sheath with a relatively high thermal expansion coefficient
We construct the wire at high temperature, then cool it. As the temperature drops, the innermost material stays roughly the same size (since it has low thermal expansion coefficient), while the outermost material shrinks, so the superconducting concoction is squeezed between them.
Exerc...
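For a sense of scale, here's a back-of-envelope sketch with made-up material numbers, ignoring the proper thick-walled-cylinder (Lamé) analysis: the squeeze pressure is roughly bounded by elastic modulus times mismatch strain.

```python
# Rough upper bound: contact pressure ~ E * (mismatch strain), where
# mismatch strain ~ (difference in thermal expansion coefficients) * (temperature drop).
# All numbers below are illustrative guesses, not from the original problem.
E = 400e9            # Pa - stiff sheath material (e.g. a hard ceramic)
delta_alpha = 15e-6  # 1/K - expansion-coefficient mismatch between outer and inner materials
delta_T = 1500       # K - temperature drop from construction to operation

pressure = E * delta_alpha * delta_T
print(f"~{pressure / 1e9:.0f} GPa")  # ~9 GPa with these numbers - single digits, not hundreds
```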
So I saw the Taxonomy Of What Magic Is Doing In Fantasy Books and Eliezer’s commentary on ASC's latest linkpost, and I have cached thoughts on the matter.
My cached thoughts start with a somewhat different question - not "what role does magic play in fantasy fiction?" (e.g. what fantasies does it fulfill), but rather... insofar as magic is a natural category, what does it denote? So I'm less interested in the relatively-expansive notion of "magic" sometimes seen in fiction (which includes e.g. alternate physics), and more interested in the pattern called "magic" which recurs among tons of real-world ancient cultures.
Claim (weakly held): the main natural category here is symbols changing the territory. Normally symbols represent the world, and changing the symbols just makes them not match the world anymore - it doesn't make the world do something different. But if the symbols are "magic", then changing the symbols changes the things they represent in the world. Canonical examples:
Everybody's been talking about Paxlovid, and how ridiculous it is to both stop the trial since it's so effective but also not approve it immediately. I want to at least float an alternative hypothesis, which I don't think is very probable at this point, but does strike me as at least plausible (like, 20% probability would be my gut estimate) based on not-very-much investigation.
Early stopping is a pretty standard p-hacking technique. I start out planning to collect 100 data points, but if I manage to get a significant p-value with only 30 data points, then I just stop there. (Indeed, it looks like the Paxlovid study only had 30 actual data points, i.e. people hospitalized.) Rather than only getting "significance" if all 100 data points together are significant, I can declare "significance" if the p-value drops below the line at any time. That gives me a lot more choices in the garden of forking counterfactual paths.
Now, success rates on most clinical trials are not very high. (They vary a lot by area - most areas are about 15-25%. Cancer is far and away the worst, below 4%, and vaccines are the best, over 30%.) So I'd expect that p-hacking accounts for a pretty large chunk of approved drugs, which means pharma companies are heavily selected for things like finding-excuses-to-halt-good-seeming-trials-early.
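Not a claim about this particular trial, just the generic mechanism: a quick simulation sketch (my own code, made-up checkpoint schedule) of how peeking at interim results and stopping at the first significant p-value inflates the false-positive rate under a null effect, compared with a single pre-registered test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def false_positive_rates(n_max=100, checkpoints=(30, 40, 50, 60, 70, 80, 90, 100),
                         alpha=0.05, n_sims=5_000):
    """Simulate a null effect (treatment == control) and compare the 'significance'
    rate when peeking at every checkpoint versus testing once at n_max."""
    peeking_hits = single_hits = 0
    for _ in range(n_sims):
        treat = rng.normal(size=n_max)
        ctrl = rng.normal(size=n_max)
        # Peek at each checkpoint; declare success at the first p < alpha.
        if any(stats.ttest_ind(treat[:n], ctrl[:n]).pvalue < alpha for n in checkpoints):
            peeking_hits += 1
        # Single pre-registered test using all the data.
        if stats.ttest_ind(treat, ctrl).pvalue < alpha:
            single_hits += 1
    return peeking_hits / n_sims, single_hits / n_sims

print(false_positive_rates())  # peeking rate lands noticeably above alpha; single test stays ~alpha
```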
Early stopping is a pretty standard p-hacking technique.
It was stopped after a pre-planned interim analysis; that means they're calculating the stopping criteria/p-values with multiple testing correction built in, using sequential analysis.
Here's an AI-driven external cognitive tool I'd like to see someone build, so I could use it.
This would be a software tool, and the user interface would have two columns. In one column, I write. Could be natural language (like google docs), or code (like a normal IDE), or latex (like overleaf), depending on what use-case the tool-designer wants to focus on. In the other column, a language and/or image model provides local annotations for each block of text. For instance, the LM's annotations might be:
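A minimal sketch of the core loop such a tool might run, with a hypothetical call_llm placeholder standing in for whatever model API the implementer actually uses (the names and prompt are mine, not a specific design):

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    block_index: int
    text: str

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real language-model call."""
    raise NotImplementedError

def annotate_document(document: str) -> list[Annotation]:
    """Split the left-column text into blocks (here: paragraphs) and ask the model
    for a short local annotation of each block, to render in the right column."""
    blocks = [b for b in document.split("\n\n") if b.strip()]
    annotations = []
    for i, block in enumerate(blocks):
        prompt = ("You are annotating one block of a working document. Give a one- or "
                  "two-sentence local comment (a possible error, a counterexample, or a "
                  "suggestion) for this block:\n\n" + block)
        annotations.append(Annotation(i, call_llm(prompt)))
    return annotations
```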
I've long been very suspicious of aggregate economic measures like GDP. But GDP is clearly measuring something, and whatever that something is it seems to increase remarkably smoothly despite huge technological revolutions. So I spent some time this morning reading up and playing with numbers and generally figuring out how to think about the smoothness of GDP increase.
Major takeaways:
[Epistemic status: highly speculative]
Smoke from California/Oregon wildfires reaching the East Coast opens up some interesting new legal/political possibilities. The smoke is way outside state borders, all the way on the other side of the country, so that puts the problem pretty squarely within federal jurisdiction. Either a federal agency could step in to force better forest management on the states, or a federal lawsuit could be brought for smoke-induced damages against California/Oregon. That would potentially make it a lot more difficult for local homeowners to block controlled burns.
Brief update on how it's going with RadVac.
I've been running ELISA tests all week. In the first test, I did not detect stronger binding to any of the peptides than to the control in any of several samples from myself or my girlfriend. But the control itself was looking awfully suspicious, so I ran another couple tests. Sure enough, something in my samples is binding quite strongly to the control itself (i.e. the blocking agent), which is exactly what the control is supposed to not do. So I'm going to try out some other blocking agents, and hopefully get an...
Someone should write a book review of The Design of Everyday Things aimed at LW readers, so I have a canonical source to link to other than the book itself.
I had a shortform post pointing out the recent big jump in new businesses in the US, and Gwern replied:
How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the 'making fast food in a stall in a Third World country' sort of 'startup', which make essentially no or negative long-term contributions).
This was a good question in context, but I disagree with Gwern's model of where-progress-come...
[EDIT: Never mind, proved it.]
Suppose I have an information channel X → Y. The X components and the Y components are sparsely connected, i.e. the typical Y_j is downstream of only a few parent X-components X_pa(j). (Mathematically, that means the channel factors as P[Y|X] = ∏_j P[Y_j | X_pa(j)].)
Now, suppose I split the Y components into two sets, and hold constant any X-com...
Does anyone know of an "algebra for Bayes nets/causal diagrams"?
More specifics: rather than using a Bayes net to define a distribution, I want to use a Bayes net to state a property which a distribution satisfies. For instance, a distribution P[X, Y, Z] satisfies the diagram X -> Y -> Z if-and-only-if the distribution factors according to
P[X, Y, Z] = P[X] P[Y|X] P[Z|Y].
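As a quick numeric sketch of using a diagram as a property to check rather than a definition (my own function, assuming discrete variables stored as a numpy array):

```python
import numpy as np

def satisfies_chain(P, tol=1e-9):
    """Check whether a joint distribution P[x, y, z] (a 3D array summing to 1)
    factors as P[X] P[Y|X] P[Z|Y], i.e. satisfies the diagram X -> Y -> Z."""
    Px = P.sum(axis=(1, 2))   # P[X]
    Pxy = P.sum(axis=2)       # P[X, Y]
    Py = P.sum(axis=(0, 2))   # P[Y]
    Pyz = P.sum(axis=0)       # P[Y, Z]
    Py_given_x = np.divide(Pxy, Px[:, None], out=np.zeros_like(Pxy), where=Px[:, None] > 0)
    Pz_given_y = np.divide(Pyz, Py[:, None], out=np.zeros_like(Pyz), where=Py[:, None] > 0)
    reconstructed = Px[:, None, None] * Py_given_x[:, :, None] * Pz_given_y[None, :, :]
    return np.allclose(P, reconstructed, atol=tol)

rng = np.random.default_rng(0)
# A distribution actually generated by the chain X -> Y -> Z satisfies the property...
px = rng.dirichlet(np.ones(3))
py_given_x = rng.dirichlet(np.ones(4), size=3)
pz_given_y = rng.dirichlet(np.ones(2), size=4)
P = px[:, None, None] * py_given_x[:, :, None] * pz_given_y[None, :, :]
print(satisfies_chain(P))  # True
# ...while a generic joint distribution does not.
Q = rng.dirichlet(np.ones(3 * 4 * 2)).reshape(3, 4, 2)
print(satisfies_chain(Q))  # False
```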
When using diagrams that way, it's natural to state a few properties in terms of diagrams, and then derive some other diagrams they imply. For instance, if a distribution P[W, X, Y, Z]...
Weather just barely hit 80°F today, so I tried the Air Conditioner Test.
Three problems came up:
I stumbled across this paper yesterday. I haven't looked at it very closely yet, but the high-level pitch is that they look at genetic predictors of iron deficiency and then cross that with anxiety data. It's interesting mainly because it sounds pretty legit (i.e. the language sounds like direct presentation of results without any bullshitting, the p-values are satisfyingly small, and there are no branching paths), and the effect sizes are BIG IIUC:
...
The odd ratios (OR) of anxiety disorders
I keep seeing news outlets and the like say that SORA generates photorealistic videos, can model how things move in the real world, etc. This seems like blatant horseshit? Every single example I've seen looks like video game animation, not real-world video.
Have I just not seen the right examples, or is the hype in fact decoupled somewhat from the model's outputs?
Putting this here for posterity: I have thought since the superconductor preprint went up, and continue to think, that the markets are generally putting too little probability on the claims being basically-true. I thought ~70% after reading the preprint the day it went up (and bought up a market on manifold to ~60% based on that, though I soon regretted not waiting for a better price), and my probability has mostly been in the 40-70% range since then.
Languages should have tenses for spacelike separation. My friend and I do something in parallel, it's ambiguous/irrelevant which one comes first, I want to say something like "I expect my friend <spacelike version of will do/has done/is doing> their task in such-and-such a way".
Two kinds of cascading catastrophes one could imagine in software systems...
I wish there were a fund roughly like the Long-Term Future Fund, but with an explicit mission of accelerating intellectual progress.
For short-term, individual cost/benefit calculations around C19, it seems like uncertainty in the number of people currently infected should drop out of the calculation.
For instance: suppose I'm thinking about the risk associated with talking to a random stranger, e.g. a cashier. My estimated chance of catching C19 from this encounter will be roughly proportional to the number of people currently infected. But, assuming we already have reasonably good data on the number hospitalized/died, my chances of hospitalization/death given infection will be roughly inversely proportional to the number of people currently infected...
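A toy sketch of that cancellation (the function and all numbers are made up for illustration):

```python
def per_encounter_severe_risk(n_infected, population, n_hospitalized, p_transmit):
    p_stranger_infected = n_infected / population        # proportional to n_infected
    p_hosp_given_infected = n_hospitalized / n_infected  # inversely proportional to n_infected
    return p_stranger_infected * p_transmit * p_hosp_given_infected

print(per_encounter_severe_risk(1e6, 330e6, 50_000, 0.05))
print(per_encounter_severe_risk(2e6, 330e6, 50_000, 0.05))  # identical: n_infected cancels out
```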
Way back in the halcyon days of 2005, a company called Cenqua had an April Fools' Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I'm wondering if (1) anybody can find an archived version of the page (the original seems to be gone), and (2) if there's now a clear market leader for that particular product niche, but for real.
Here's an interesting problem of embedded agency/True Names which I think would make a good practice problem: formulate what it means to "acquire" something (in the sense of "acquiring resources"), in an embedded/reductive sense. In other words, you should be able-in-principle to take some low-level world-model, and a pointer to some agenty subsystem in that world-model, and point to which things that subsystem "acquires" and when.
Some prototypical examples which an answer should be able to handle well:
This billboard sits over a taco truck I like, so I see it frequently:
The text says "In our communities, Kaiser Permanente members are 33% less likely to experience premature death due to heart disease.*", with the small-text directing one to a url.
The most naive (and presumably intended) interpretation is, of course, that being a Kaiser Permanente member provides access to better care, causing 33% lower chance of death due to heart disease.
Now, I'd expect most people reading this to immediately think something like "selection effects!" - i.e. what the bill...
That is unsurprising to me, since the overall gist of Rationalism is an attempt to factor uncertainty out of the near future, life, and thought itself.
This tells me you don't know anything about LW-rationality or are being deliberately uncharitable to it.
You're mostly making broad blanket claims; maybe make a top-level post which is charitable to the entire project, going in depth, post by post, on where you think people have gone wrong and in what way. High-effort posting is appreciated.
An interesting conundrum: one of the main challenges of designing useful regulation for AI is that we don't have any cheap and robust way to distinguish a dangerous neural net from a non-dangerous net (or, more generally, a dangerous program from a non-dangerous program). This is an area where technical research could, in principle, help a lot.
The problem is, if there were some robust metric for how dangerous a net is, and that metric were widely known and recognized (as it would probably need to be in order to be used for regulatory purposes), then someone would probably train a net to maximize that metric directly.
Neat problem of the week: we have n discrete random variables, X_1, ..., X_n. Given any one of the variables, all the others are independent:

P[(X_j)_{j ≠ i} | X_i] = ∏_{j ≠ i} P[X_j | X_i], for every i.
Characterize the distributions which satisfy this requirement.
This problem came up while working on the theorem in this post, and (separately) in the ideas behind this post. Note that those posts may contain some spoilers for the problem, though frankly my own proofs on this one just aren't very good.
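To make the condition concrete, here's a brute-force numeric check for a candidate joint distribution (my own code, numpy, discrete variables stored as an n-dimensional array):

```python
import numpy as np

def independent_given_each_variable(P, tol=1e-9):
    """For every variable X_i and every value x_i with positive probability, check that
    the remaining variables are mutually independent conditional on X_i = x_i, i.e. the
    conditional joint equals the outer product of its one-variable marginals."""
    n = P.ndim
    for i in range(n):
        for x_i in range(P.shape[i]):
            slice_ = np.take(P, x_i, axis=i)   # unnormalized P[others, X_i = x_i]
            mass = slice_.sum()
            if mass <= 0:
                continue
            cond = slice_ / mass               # P[others | X_i = x_i]
            marginals = [cond.sum(axis=tuple(a for a in range(cond.ndim) if a != k))
                         for k in range(cond.ndim)]
            recon = marginals[0]
            for m in marginals[1:]:
                recon = np.multiply.outer(recon, m)
            if not np.allclose(cond, recon, atol=tol):
                return False
    return True

# One family that works: every variable is a copy of a single shared fair coin flip.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 1, 1] = 0.5
print(independent_given_each_variable(P))  # True

# A perturbed uniform distribution generically fails.
Q = np.ones((2, 2, 2)) / 8
Q[0, 0, 0] += 1 / 8
Q[1, 1, 1] -= 1 / 8
print(independent_given_each_variable(Q))  # False
```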