Quick Takes

habryka6515

Reputation is lazily evaluated

When evaluating the reputation of your organization, community, or project, many people flock to surveys in which you ask randomly selected people what they think of your thing, or what their attitudes towards your organization, community or project are. 

If you do this, you will very reliably get back data that looks like people are indifferent to you and your projects, and your results will probably be dominated by extremely shallow things like "do the words in your name invoke positive or negative associations".

People l... (read more)

practically all metrics of the EA community's health and growth have sharply declined, and the extremely large and negative reputational effects have become clear.

I want more evidence on your claim that FTX had a major effect on EA reputation. Or: why do you believe it?

8Guive
This is good. Please consider making it a top level post. 
sapphire5723

Don't induce psychosis intentionally. Don't take psychedelics while someone probes your beliefs. Don't let anyone associated with Michael Vassar anywhere near you during an altered state.

Edit: here is a different report from three years ago with the same person administering the methods: 

Mike Vassar followers practice intentionally inducing psychosis via psychedelic drugs. Inducing psychosis is a verbatim self-report of what they are doing. I would say they practice drug-induced brainwashing. TBC they would dispute the term brainwashing and probably... (read more)

Showing 3 of 40 replies

Thanks for answering; good to hear that you don't think you've had any severe or long-lasting consequences (though it sounds like one time LSD was a contributor to your episode of bad mental health).

I guess here's another question that seems natural: it's been said that some people take LSD either on the personal advice of Michael Vassar, or otherwise as a result of reading/discussing his ideas. Is either of those true for you?

5habryka
I think "psychosis is underrated" and/or "psychosis is often the sign of a good kind of cognitive processing" are things I have heard from at least people very close to Michael (I think @jessicata made some arguments in this direction):  (To be clear, I don't think "jessicata is in favor of psychosis" is at all a reasonable gloss here, but I do think there is an attitude towards things like psychosis that I disagree with that is common in the relevant circles)
1AprilSR
...Yeah I'm well aware but probably useful context
O O10

o1's release has made me think Yann LeCun's AGI timelines are probably more correct than shorter ones

I've been thinking of writing up a piece on the implications of very short timelines, in light of various people recently suggesting them (eg Dario Amodei, "2026 or 2027...there could be a mild delay")

Here's a thought experiment: suppose that this week it turns out that OAI has found a modified sampling technique for o1 that puts it at the level of the median OAI capabilities researcher, in a fairly across-the-board way (ie it's just straightforwardly able to do the job of a researcher). Suppose further that it's not a significant additional compute expens... (read more)

Showing 3 of 4 replies
4plex
Depends what they do with it. If they use it to do the natural and obvious capabilities research, like they currently are (mixed with a little hodgepodge alignment to keep it roughly on track), I think we just basically for sure die. If they pivot hard to solving alignment in a very different paradigm and... no, this hypothetical doesn't imply the AI can discover or switch to other paradigms. I think doom is almost certain in this scenario.

If we could trust OpenAI to handle this scenario responsibly, our odds would definitely seem better to me.

2Noosphere89
I'd say that we'd have a 70-80% chance of going through the next decade without causing a billion deaths if powerful AI comes.
leogao344

it's quite plausible (40% if I had to make up a number, but I stress this is completely made up) that someday there will be an AI winter or other slowdown, and the general vibe will snap from "AGI in 3 years" to "AGI in 50 years". when this happens it will become deeply unfashionable to continue believing that AGI is probably happening soonish (10-15 years), in the same way that suggesting that there might be a winter/slowdown is unfashionable today. however, I believe in these timelines roughly because I expect the road to AGI to involve both fast periods and slow bumpy periods. so unless there is some super surprising new evidence, I will probably only update moderately on timelines if/when this winter happens

Showing 3 of 6 replies

What do you think would be the cause(s) of the slowdown?

4Noosphere89
My guess is that for now, I'd give around a 10-30% chance to "AI winter happens for a short period/AI progress slows down" by 2027. Also, what would you consider super surprising new evidence?
6TsviBT
If you keep updating such that you always "think AGI is <10 years away" then you will never work on things that take longer than 15 years to help. This is absolutely a mistake, and it should at least be corrected after the first round of "let's not work on things that take too long because AGI is coming in the next 10 years". I will definitely be collecting my Bayes points https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce
4niplav
Thank you for collecting those links :-) I've listened to two or three of the interviews (and ~three other talks from a long time ago), and I still have no clue what the central claims are, what the reasoning supporting them is &c. (I understand it most for Zvi Mowshowitz and Sarah Constantin, less for Jessica Taylor, and least for Benjamin Hoffman & Vassar). I also don't know of anyone who became convinced of or even understood any of Michael Vassar's views/stances through his writing/podcasts alone—it appears to almost always happen through in-person interaction.

I want to say I have to an extent (for all three), though I guess there have been second-hand in-person interactions, which maybe count. I dunno if there's any sort of central thesis I could summarize, but if you pointed me at any more specific topics I could take a shot at translating. (Though I'd maybe prefer to avoid the topic for a little while.)

In general, I think an actual analysis of the ideas involved and their merits/drawbacks would've been a lot more helpful for me than just... people having a spooky reputation.

leogao158

people often say that limitations of an artistic medium breed creativity. part of this could be the fact that when it is costly to do things, the only things done will be higher effort

Showing 3 of 4 replies
2Noosphere89
This seems like the likely explanation for any claim that constraints breed creativity/good things in a field, when the expectation is that the opposite outcome would occur.
1StartAtTheEnd
My own expectation is that limitations result in creativity. Writer's block is usually a result of having too many possibilities/choices. If I tell you "You can write a story about anything", it's likely harder for you to think of anything than if I tell you "Write a story about an orange cat". In the latter situation, you're more limited, but you also have something to work with. I'm not sure if it's as true for computers as it is for humans (that would imply information-theoretic factors), but there are plenty of factors in humans, like analysis paralysis and the "See also" section of that page

My other explanation probably has to do with the fact that it's way easier to work with an already almost-executed object than a specification, because we are constrained to only think about a subset of possibilities for a reasonable time.

In other words, constraints are useful given that you are already severely constrained, to limit the space of possibilities.

Risk is a great case study in why selfish egoism fails.

I took an ethics class at university, and mostly came to the opinion that morality was utilitarianism with an added deontological rule to not impose negative externalities on others. I.e. "Help others, but if you don't, at least don't hurt them." Both of these are tricky, because anytime you try to "sum over everyone" or have any sort of "universal rule", logic breaks down (due to Descartes' evil demon and Russell's vicious circle). Really, selfish egoism seemed to make more logical sense, but it doesn't h... (read more)

leogao179

a take I've expressed a bunch irl but haven't written up yet: feature sparsity might be fundamentally the wrong thing for disentangling superposition; circuit sparsity might be more correct to optimize for. in particular, circuit sparsity doesn't have problems with feature splitting/absorption
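To make the distinction concrete, here is a minimal, hypothetical sketch contrasting the standard L1 penalty on sparse-autoencoder feature activations ("feature sparsity") with one naive stand-in for circuit sparsity, an L1 penalty on the connections into and out of each feature. The dimensions and names are made up, and this is only an illustration of the contrast, not a claim about how leogao would operationalize circuit sparsity.

```python
import torch

# Illustrative contrast only (not leogao's proposal): "feature sparsity" penalizes how many
# features ACTIVATE on a given input, while a naive stand-in for "circuit sparsity" penalizes
# how many CONNECTIONS (weights/edges) exist at all. Dimensions and names are made up.
d_model, d_feat = 512, 4096
enc = torch.nn.Linear(d_model, d_feat)   # toy SAE-style encoder
dec = torch.nn.Linear(d_feat, d_model)   # toy SAE-style decoder

def losses(x, lam_feat=1e-3, lam_edge=1e-5):
    f = torch.relu(enc(x))                                    # feature activations
    recon_loss = (dec(f) - x).pow(2).mean()                   # reconstruction term
    feature_sparsity = lam_feat * f.abs().sum(dim=-1).mean()  # data-dependent: few features fire per input
    circuit_sparsity = lam_edge * (enc.weight.abs().sum()     # structural: few nonzero edges
                                   + dec.weight.abs().sum())  # into/out of each feature
    return recon_loss, feature_sparsity, circuit_sparsity

x = torch.randn(8, d_model)  # toy batch
print([v.item() for v in losses(x)])
```

The intended contrast: the feature-sparsity term depends on which features fire on each input, while the edge-penalty term is a global, input-independent property of the learned connections.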

Sodium20

Yeah my view is that as long as our features/intermediate variables form human understandable circuits, it doesn't matter how "atomic" they are. 

Basically every time a new model is released by a major lab, I hear from at least one person (not always the same person) that it's a big step forward in programming capability/usefulness. And then David gives it a try, and it works qualitatively the same as everything else: great as a substitute for stack overflow, can do some transpilation if you don't mind generating kinda crap code and needing to do a bunch of bug fixes, and somewhere between useless and actively harmful on anything even remotely complicated.

It would be nice if there were someone who t... (read more)

Showing 3 of 11 replies

I find them quite useful despite being buggy. I spend about 40% of my time debugging model code, 50% writing my own code, and 10% prompting. Having a planning discussion first with s3.6, and asking it to write code only after 5 or more exchanges works a lot better.

Also helpful is asking for lots of unit tests along the way to confirm things are working as you expect.

6David Lorell
Sounds plausible. Is that 50% of coding work that the LLMs replace of a particular sort, and the other 50% a distinctly different sort?
6David Lorell
My impression is that they are getting consistently better at coding tasks of a kind that would show up in the curriculum of an undergrad CS class, but improving much more slowly at nonstandard or technical tasks.
... (read more)
Viliam40

they lacked the conviction to push any other distinctive brand (and in some cases the situation made alternatives infeasible).

I guess it is difficult to promote the brand of Tough No-Nonsense Prosecutor in the age of Defund The Police.

Which is really unfortunate, because it seems like "defund the police" was actually what woke white people wanted. Black people were probably horrified by the idea of giving up and letting crime grow exponentially in the places they live. Unfortunately, the woke do not care about the actual opinions of the people they spe... (read more)

If Biden pardons people like Fauci for crimes like perjury, that would set a bad precedent.

There's a reason why perjury is forbidden, and if you just give pardons to any government official who committed crimes at the end of an administration, that sets a very bad precedent.

One way out of that would be to find a different way to punish government criminals when they are pardoned. One aspect of a pardon is that it removes the Fifth Amendment defense.

You can subpoena pardoned people in front of Congress and ask them under oath to speak about all the crimes... (read more)

Misgivings about Category Theory

[No category theory is required to read and understand this screed]

A week does not go by without somebody asking me what the best way to learn category theory is. Despite it being set to mark its 80th anniversary, Category Theory has the evergreen reputation for being the Hot New Thing, a way to radically expand the braincase of the user through an injection of abstract mathematics. Its promise is alluring, intoxicating for any young person desperate to prove they are the smartest kid on the block.

Recently, there has been sig... (read more)

Showing 3 of 11 replies
Quinn40

I was at an ARIA meeting with a bunch of category theorists working on safeguarded AI and many of them didn't know what the work had to do with AI.

epistemic status: short version of post because I never got around to doing the proper effort post I wanted to make.

3Quinn
my dude, top level post: this does not read like a shortform
3StartAtTheEnd
Great post! It's a habit of mine to think in very high levels of abstraction (I haven't looked much into category theory though, admittedly), and while it's fun, it's rarely very useful. I think it's because of a width-depth trade-off. Concrete real-world problems have a lot of information specific to that problem, you might even say that the unique information is the problem. An abstract idea which applies to all of mathematics is way too general to help much with a specific problem, it can just help a tiny bit with a million different problems. I also doubt the need for things which are so complicated that you need a team of people to make sense of them. I think it's likely a result of bad design. If a beginner programmer made a slot machine game, the code would likely be convoluted and unintuitive, but you could probably design the program in a way that all of it fits in your working memory at once. Something like "A slot machine is a function from the cartesian product of wheels to a set of rewards". An understanding which would simplify the problem so that you could write it much shorter and simpler than the beginner. What I mean is that there may exist simple designs for most problems in the world, with complicated designs being due to a lack of understanding. The real world values the practical way more than the theoretical, and the practical is often quite sloppy and imperfect, and made to fit with other sloppy and imperfect things. The best things in society are obscure by statistical necessity, and it's painful to see people at the tail ends doubt themselves at the inevitable lack of recognition and reward.

What is the current popular (or ideally wise) wisdom wrt publishing demos of scary/spooky AI capabilities? I've heard the argument that moderately scary demos drive capability development into secrecy. Maybe it's just all in the details of who you show what when and what you say. But has someone written a good post about this question?

The way it is now, when one lab has an insight, the insight will probably spread quickly to all the other labs. If we could somehow "drive capability development into secrecy," that would drastically slow down capability development.

I kinda feel like I literally have more subjective experience after experiencing ego death/rebirth. I suspect that humans vary quite a lot in how often they are conscious, and to what degree. And if you believe, as I do, that consciousness is ultimately algorithmic in nature (like, in the "surfing uncertainty" predictive processing view, that it is a human-modeling thing which models itself to transmit prediction-actions) it would not be crazy for it to be a kind of mental motion which sometimes we do more or less of, and which some people lack entirely.

I ... (read more)

Showing 3 of 5 replies

interesting. what if she has her memories and some abstract theory of what she is, and that theory is about as accurate as anyone else's theory, but her experiences are not very vivid at all. she's just going through the motions running on autopilot all the time - like when people get in a kind of trance while driving.

5Seth Herd
I think you are probably attending more often to sensory experiences, and thereby both creating and remembering more detailed representations of physical reality. You are probably doing less abstract thought, since the number of seconds in a day hasn't changed. Which do you want to spend more time on? And which sorts? It's a pretty personal question. I like to try to make my abstract thought productive (relative to my values), freeing up some time to enjoy sensory experiences. I'm not sure there's a difference in representational density in doing sensory experience vs. abstract thought. Maybe there is. One factor in making it seem like you're having more sensory experience is how much you can remember after a set amount of time; another is whether each moment seems more intense by having strong emotional experience attached to it. Or maybe you mean something different by more subjective experience.
3Sinclair Chen
You are definitely right about tradeoff of my direct sensory experience vs other things my brain could be doing like calculation or imagination. I hope with practice or clever tool use I will get better at something like doing multiple modes at once, task switching faster between modes, or having a more accurate yet more compressed integrated gestalt self.

Putting down a prediction I have had for quite some time.
The current LLM/Transformer architecture will stagnate before AGI/TAI (that is, the ability to do any cognitive task as effectively as, and more cheaply than, a human).

From what I have seen, Tesla Autopilot learns >10,000× more slowly than a human, data-wise (see the rough illustration below).

We will get AGI by copying nature, at the scale of a simple mammal brain, then scaling up, like this kind of project:

https://x.com/Andrew_C_Payne/status/1863957226010144791
https://e11.bio/news/roadmap
I expect AGI to be 0-2 years after a mammal brain is mapped. In... (read more)
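To give a feel for the scale of that data-efficiency claim, here is a rough back-of-envelope sketch; every number in it is a loose assumption for illustration, not a figure from this take or from Tesla.

```python
# Rough illustration of what ">10,000x less data-efficient" would mean in miles driven.
# Both inputs are loose assumptions for illustration, not measured figures.
human_practice_miles = 3_000      # assumed miles of practice before a human drives competently
fleet_training_miles = 1e9        # assumed order of magnitude of fleet miles used for training

ratio = fleet_training_miles / human_practice_miles
print(f"~{ratio:,.0f}x the data")  # ~333,333x, i.e. well above the >10,000x claim
```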

Showing 3 of 6 replies
3RussellThor
Perhaps LLMs will help with that. The reasons I think that is less likely:
1. DeepMind etc. are already heavily across biology, from what I gather from interviews with Demis. If the knowledge was already there, there's a good chance they would have found it.
2. It's something specific we are after, not many small improvements, i.e. the neural code. Specifically, backpropagation is not how neurons learn, and I'm pretty sure how they actually do learn is not in the literature. Attempts have been made, such as the forward-forward algorithm by Hinton (see the sketch below), but that didn't come to anything as far as I can tell. I haven't seen any suggestion that even with a lot of detail on biology we know what it is, i.e. can a very detailed neural sim with extreme processing power learn as data-efficiently as biology?
3. If progress must come from a large jump rather than small steps, then LLMs have quite a long way to go, i.e. LLMs need to speed up coming up with ideas as novel as the forward-forward algo to help much. If they are still below that threshold in 2026, then those possible insights are still almost entirely done by people.
4. Even the smartest minds in the past have been beaten by copying biology in AI. The idea for neural nets came from copying biology. (Though the transformer arch and backprop didn't.)
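For readers unfamiliar with the forward-forward reference in point 2, here is a minimal single-layer sketch of the idea (an illustrative paraphrase, not Hinton's reference implementation): each layer is trained locally, with no backpropagation between layers, to give high "goodness" (sum of squared activations) on positive/real data and low goodness on negative/corrupted data.

```python
import torch

# Minimal one-layer sketch of the forward-forward idea (illustrative paraphrase,
# not Hinton's reference implementation). The layer is trained with a purely local
# objective; no gradients flow between layers.
layer = torch.nn.Linear(784, 512)
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
theta = 2.0  # goodness threshold

def goodness(x):
    # "Goodness" = sum of squared activations of this layer.
    return torch.relu(layer(x)).pow(2).sum(dim=1)

def ff_step(x_pos, x_neg):
    # Push goodness above theta on positive (real) data, below theta on negative (corrupted) data.
    logits = torch.cat([goodness(x_pos) - theta, theta - goodness(x_neg)])
    loss = torch.nn.functional.softplus(-logits).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage: random stand-ins for positive and negative batches.
print(ff_step(torch.rand(16, 784), torch.rand(16, 784)))
```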
3Nathan Helm-Burger
I've heard this viewpoint expressed before, and find it extremely confusing. I've been studying neuroscience and its implications for AI for twenty years now. I've read thousands of papers, including most of what DeepMind has produced. There are still so many untested ideas, because biology and the brain are so complex, and also because people tend to flock to popular paradigms, rehashing old ideas rather than testing new ones. I'm not saying I know where the good ideas are, just that I perceive the explored portions of the Pareto frontier of plausible experiments to be extremely ragged. There are tons of places covered by "Fog of War" where good ideas could be hiding. DeepMind is a tiny fraction of the scientists in the world who have been working on understanding and emulating the brain. Not all the scientists in the world have managed to test all the reasonable ideas, much less DeepMind alone. Saying DeepMind has explored the implications of biology for AI is like saying that the Opportunity Rover has explored Mars. Yes, this is absolutely true, but the unexplored area vastly outweighs the explored area. If you think the statement implies "explored ALL of Mars", then you have a very inaccurate picture in mind.

OK fair point. If we are going to use analogies, then my point #2 about a specific neural code shows our different positions I think.

Let's say we are trying to get a simple aircraft off the ground and we have detailed instructions for a large passenger jet. Our problem is that the metal is too weak and cannot be used to make wings, engines, etc. In that case detailed plans for aircraft are no use; a single-minded focus on getting better metal is what it's all about. To me the neural code is like the metal and all the neuroscience is like the plane schematics. ... (read more)

1a3orn94

Lighthaven clearly needs to get an actual Gerver's sofa, particularly if the proof that it's optimal comes through.

It does look uncomfortable I'll admit, maybe it should go next to the sand table.

I was just thinking of adding some kind of donation tier where if you donate $20k to us we will custom-build a Gerver sofa, and dedicate it to you.

We still don't know if this will be guaranteed to happen, but it seems that OpenAI is considering removing its "regain full control of Microsoft shares once AGI is reached" clause. It seems they want to be able to keep their partnership with Microsoft (and just go full for-profit (?)).

Here's the Financial Times article:

OpenAI seeks to unlock investment by ditching ‘AGI’ clause with Microsoft

OpenAI is in discussions to ditch a provision that shuts Microsoft out of its most advanced models when the start-up achieves “artificial general intelligence”, as

... (read more)

At long last, I'm delurking here. Hi!

Showing 3 of 6 replies
Neil 10

need any help on post drafts? whatever we can do to reduce those trivial inconveniences 

1Karl Krueger
Since LW2.0 went up, on and off. Been meaning to delurk since at least Less Online earlier this year. There's more interesting stuff going on of late!
1Fernando
Thanks for the karma. Post published!

New AWS Trainium 2 cluster offers compute equivalent to 250K H100s[1], and under this assumption Anthropic implied[2] their previous compute was 50K H100s (possibly what was used to train Claude 3.5 Opus).

So their current or imminent models are probably 1e26-2e26 FLOPs (2-4 months on 50K H100s at 40% compute utilization in BF16)[3], and the upcoming models in mid to late 2025 will be 5e26-1e27 FLOPs, ahead of what 100K H100s clusters of other players (possibly except Google) can deliver by that time.


  1. SemiAnalysis gives an estimate of 24-27 kilowatts per

... (read more)
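As a sanity check on the 1e26-2e26 FLOPs range above, here is a back-of-envelope sketch. The GPU throughput is a commonly cited H100 BF16 dense peak, and the other inputs are the assumptions stated in the take; treat the result as an illustration rather than an official estimate.

```python
# Back-of-envelope check of the 1e26-2e26 FLOPs range quoted above.
H100_BF16_PEAK = 0.989e15         # FLOP/s per GPU, dense BF16 (~989 TFLOPS, commonly cited spec)
GPUS = 50_000                     # implied prior Anthropic compute, per the take above
UTILIZATION = 0.40                # assumed compute utilization
SECONDS_PER_MONTH = 30 * 24 * 3600

for months in (2, 4):
    flops = H100_BF16_PEAK * GPUS * UTILIZATION * months * SECONDS_PER_MONTH
    print(f"{months} months: {flops:.1e} FLOPs")
# -> about 1.0e26 and 2.1e26 FLOPs, consistent with the stated 1e26-2e26 range.
```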
Showing 3 of 8 replies
4Vladimir_Nesov
Training as it's currently done needs to happen within a single cluster (though this might change soon). The size of the cluster constrains how good a model can be trained within a few months. Everything that isn't training of a frontier model can happen using many smaller clusters, something like 16 to 4096 accelerators each. You can use a lot of these smaller clusters, but they can be sourced from anywhere and built piecemeal at multiple sites with smaller power allocations, while the big training cluster needs to be a single purposefully built system. So I expect the big expenses are inference and many training experiments with smaller models. What I'm discussing here is the big cluster for training frontier models rather than the aggregate of the small clusters for other purposes. See also this comment. Patel's claim is 100K H100s at 150 megawatts.
5Aaron_Scher
I think that's probably wrong, or at least effectively wrong. Gemini 1.0, trained a year ago, has the following info in the technical report: [...] As you note, public distributed training methods have advanced beyond basic data parallelism (though they have not been publicly shown at large model scales, because nobody has really tried yet).

This might require bandwidth of about 300 Tbps for 500K B200s systems (connecting their geographically distributed parts), based on the below estimate. It gets worse with scale.

The "cluster" label applied in this context might be a bit of a stretch, for example the Llama 3 24K H100s cluster is organized in pods of 3072 GPUs, and the pods themselves are unambiguously clusters, but at the top level they are connected with 1:7 oversubscription (Section 3.3.1).

Only averaged gradients need to be exchanged at the top level, once at each optimizer step (minibatch... (read more)
