It might be that some elements of human intelligence (at least at the civilizational level) are culturally/memetically transmitted. All well and good in theory. Except the social hypercompetition between people and the intense selection pressure on ideas online might be eroding our world's intelligence. Eliezer wonders if he's only who he is because he grew up reading old science fiction from before the current era's memes.

Raemon
This is a first-pass review that's mostly organizing my thinking about this post. This post makes a few different types of claims:

* Hyperselected memes may be worse (generally) than weakly selected ones
* Hyperselected memes may specifically be damaging our intelligence/social memetic software
* People today are worse at negotiating complex conflicts from different filter bubbles
* There's a particular set of memes (well represented in 1950s sci-fi) that was particularly important, and which is not as common nowadays.

It has a question which is listed although not focused on too explicitly on its own terms:

* What do you do if you want to have good ideas? (i.e. "drop out of college? read 1950s sci-fi in your formative years?")

It prompts me to separately consider the questions:

* What actually is the internet doing to us? It's surely doing something.
* What sorts of cultures are valuable? What sorts of cultures can be stably maintained? What sorts of cultures cause good intellectual development?

...

Re: the specific claim of "hypercompetition is destroying things", I think the situation is complicated by the "precambrian explosion" of stuff going on right now. Pop music is defeating classical music in relative terms, but, like, in absolute terms there's still a lot more classical music now than in 1400 [citation needed?]. I'd guess this is also true for tribal FB comments vs letter-to-the-editor-type writings.

* [claim by me] The absolute amount of thoughtful discourse is probably still increasing

My guess is that "listens carefully to arguments" has just always been rare, and that people have generally been dismissive of the outgroup, and now that's just more prominent. I'd also guess that there's more 1950s-style sci-fi today than in 1950. But it might not be, say, driving national projects that required a critical mass of it. (And it might or might not be appearing on bestseller lists?) If so, the question is less "are things being destroyed...
Annapurna
The U.S. 30-year Treasury rate has reached 5.13%, a level last seen in October 2023. Before that, the last time this rate was at this level was 2007, when the U.S. federal debt was about $9 trillion. Today, that debt is nearing $37 trillion. I believe bond market participants are signaling a lack of confidence that the fiscal situation in the United States will improve during President Trump’s second administration. Like many financial professionals, I had high hopes that President Trump’s election would bring the fiscal situation under control. Unfortunately, the "Department of Government Efficiency" has not been as efficient as many had hoped, and the U.S. Congress seems completely uninterested in reducing federal spending in a meaningful way. The tax cut bill currently moving through Congress, fully backed by the White House, will exacerbate the fiscal situation. If this trend of rising long-term Treasury rates continues, the United States will soon face very tough decisions that neither Wall Street nor Main Street is prepared for.
The current cover of If Anyone Builds It, Everyone Dies is kind of ugly, and I hope it is just a placeholder. At least one of my friends agrees. Book covers matter a lot! I'm not a book cover designer, but here are some thoughts: AI is popular right now, so you'd probably want to indicate that from a distance, yet the current cover has "AI" half-faded in the tagline. Generally the cover is not very nice to look at. Why de-emphasize "Kill Us All" by hiding it behind that red glow? I do like the font choice, though: no-nonsense and straightforward. @Eliezer Yudkowsky @So8res
Thomas Kwa
Cross-domain time horizon: We know AI time horizons (the human time-to-complete at which a model has a 50% success rate) on software tasks are currently ~1.5 hr and doubling every 4-7 months, but what about other domains? Here's a preliminary result comparing METR's task suite (orange line) to benchmarks in other domains, all of which have some kind of grounding in human data:

Observations:

* The time horizon for agentic computer use (OSWorld) is ~100x shorter than for other domains. Tesla self-driving (tesla_fsd), scientific knowledge (gpqa), math contests (aime), video understanding (video_mme), and software (hcast_r_s) all have roughly similar horizons.
* My guess is this means models are good at taking in information from a long context but bad at acting coherently. Most work requires agency like OSWorld, which may be why AIs can't do the average real-world 1-hour task yet.
* There are likely other domains that fall outside this cluster; these are just the five I examined.
* Note the original version had a unit conversion error that gave 60x too high horizons for video_mme; this has been fixed (thanks @ryan_greenblatt).
* Rate of improvement varies significantly; math contests have improved ~50x in the last year, but Tesla self-driving only ~6x in 3 years.
* HCAST is middle of the pack in both.

Note this is preliminary and uses a new methodology, so there might be data issues. I'm currently writing up a full post! Is this graph believable? What do you want to see analyzed?

edit: fixed Video-MME numbers
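For readers unfamiliar with the metric, here is a minimal sketch of how a "50% time horizon" can be computed. This is my own illustration with invented task data, not METR's code; as I understand their write-ups, the idea is to fit a curve of success probability against (log) human task length and read off where it crosses 50%:

```python
# Sketch only: invented data, and a plain logistic fit standing in for the
# real methodology. horizon = human task length at which P(success) = 0.5.
import numpy as np
from sklearn.linear_model import LogisticRegression

# (human minutes to complete the task, did the model succeed? 1/0) -- made up
tasks = [(2, 1), (5, 1), (10, 1), (30, 1), (45, 0), (90, 1),
         (120, 0), (240, 0), (480, 0), (960, 0)]
x = np.log2([[minutes] for minutes, _ in tasks])   # log2(minutes), shape (n, 1)
y = np.array([success for _, success in tasks])

clf = LogisticRegression().fit(x, y)
w, b = clf.coef_[0][0], clf.intercept_[0]

# P(success) = 0.5 where w * log2(minutes) + b = 0
horizon_minutes = 2 ** (-b / w)
print(f"50% time horizon ~= {horizon_minutes:.0f} human-minutes")

# Extrapolation under an assumed doubling time of ~5 months
# (within the 4-7 month range quoted above).
months_ahead = 12
print(f"In {months_ahead} months: ~= {horizon_minutes * 2 ** (months_ahead / 5):.0f} minutes")
```

The cross-domain comparison in the quick take is then just this number computed per benchmark, which is why grounding each benchmark in human completion times matters.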
Many believe that one hope for our future is that the AI labs will make some mistake that kills many people, but not all of us, with the result that the survivors finally realize how dangerous AI is. I wish people would refer to that as a "near miss", not a "warning shot". A warning shot is when the danger (originally, a warship) actually cares about you but cares about its mission more, with the result that it complicates its plans and policies to try to keep you alive.
Jack Morris has posted this thread https://x.com/jxmnop/status/1925224612872233081 about his paper "Harnessing the Universal Geometry of Embeddings". Have others thought through what this means for the notion of fundamentally alien internal ontologies? Would love any ideas! Sorry if I missed a post on it.

Popular Comments

> The trends reflect the increasingly intense tastes of the highest spending, most engaged consumers.

https://logicmag.io/play/my-stepdad's-huge-data-set/

> While a lot of people (most likely you and everyone you know) are consumers of internet porn (i.e., they watch it but don’t pay for it), a tiny fraction of those people are customers. Customers pay for porn, typically by clicking an ad on a tube site, going to a specific content site (often owned by MindGeek), and entering their credit card information.
>
> This “consumer” vs. “customer” division is key to understanding the use of data to perpetuate categories that seem peculiar to many people both inside and outside the industry. “We started partitioning this idea of consumers and customers a few years ago,” Adam Grayson, CFO of the legacy studio Evil Angel, told AVN. “It used to be a perfect one-to-one in our business, right? If somebody consumed your stuff, they paid for it. But now it’s probably 10,000 to one, or something.”
>
> There’s an analogy to be made with US politics: political analysts refer to “what the people want,” when in fact a fraction of “the people” are registered voters, and of those, only a percentage show up and vote. Candidates often try to cater to that subset of “likely voters”— regardless of what the majority of the people want. In porn, it’s similar. You have the people (the consumers), the registered voters (the customers), and the actual people who vote (the customers who result in a conversion—a specific payment for a website subscription, a movie, or a scene). Porn companies, when trying to figure out what people want, focus on the customers who convert. It’s their tastes that set the tone for professionally produced content and the industry as a whole.
>
> By 2018, we are now over a decade into the tube era. That means that most LA-area studios are getting their marching orders from out-of-town business people armed with up-to-the-minute customer data. Porn performers tend to roll their eyes at some of these orders, but they don’t have much choice. I have been on sets where performers crack up at some of the messages that are coming “from above,” particularly concerning a repetitive obsession with scenes of “family roleplay” (incest-themed material that uses words like “stepmother,” “stepfather,” and “stepdaughter”) or what the industry calls “IR” (which stands for “interracial” and invariably means a larger, dark-skinned black man and a smaller light-skinned white woman, playing up supposed taboos via dialogue and scenarios).
>
> These particular “taboo” genres have existed since the early days of commercial American porn. For instance, see the stellar performance by black actor Johnnie Keyes as Marilyn Chambers’ orgy partner in 1972’s cinematic Behind the Green Door, or the VHS-era incest-focused sensation Taboo from 1980. But backed by online data of paid customers seemingly obsessed with these topics, the twenty-first-century porn industry—which this year, to much fanfare, was for the first time legally allowed to film performers born in this millennium—has seen a spike in titles devoted to these (frankly old-fashioned) fantasies.
>
> Most performers take any jobs their agents send them out for. The competition is fierce—the ever-replenishing supply of wannabe performers far outweighs the demand for roles—and they don’t want to be seen as “difficult” (particularly the women). Most of the time, the actors don’t see the scripts or know any specific details until they get to set.
>
> To the actors rolling their eyes at yet another prompt to declaim, “But you’re my stepdad!” or, “Show me your big black dick,” the directors shrug, point at the emailed instructions and say, “That’s what they want…”

So my interpretation here is that it's not that there's suddenly a huge spike in people discovering they love incest in 2017 where they were clueless in 2016, or that they were all brainwashed to no longer enjoy vanilla that year; it's that that is when the hidden oligopoly turned on various analytics and started deliberately targeting those fetishes as a fleet-wide business decision. And this was because they had so thoroughly commodified regular porn to a price point of $0 that the only paying customers left are the ones with extreme fetishes who cannot be supplied by regular amateur or pro supply. They may or may not have increased in absolute number compared to pre-2017, but it doesn't matter, because everyone else vanished, and their relative importance skyrocketed: "If somebody consumed your stuff, they paid for it. But now it’s probably 10,000 to one, or something.”

(For younger readers who may be confused by how a ratio like 10000:1 is even hypothetically possible because 'where did that 10k come from when no one pays for porn?', it's worth recalling that renting porn videos used to be a big business patronized by a lot of men; it kept many non-Blockbuster video rental stores afloat, and it was an ordinary thing for your local store to have a 'back room' that the kiddies were strictly forbidden from. While it would certainly stock a lot of fetish stuff like interracial porn, it also rented out tons of normal stuff. If you have no idea what this was like, you may enjoy reading "True Porn Clerk Stories", Ali Davis 2002.)

I think there is a similar effect with foot fetishes & furries: they are oddly well-heeled and pay a ton of money for new stuff, because they are under-supplied and demand new material. There is not much 'organic' supply of women photographing their feet in various lascivious ways; it's not that it's hard, they just don't do it, but they can be incentivized to do so. (I recall reading an article on WikiFeet where IIRC they interviewed a contributor who said he got some photos by simply politely emailing or DMing the woman to ask her to take some foot photos, and she would oblige. "send foots kthnxbai" apparently works.) And probably it's fairly easy to pay for or commission feet images/videos: almost everyone has two feet already, and you can work feet into regular porn easily by simply choosing different angles or postures, and a closeup of a foot won't turn off regular porn consumers either, so you can have your cake & eat it too. Similarly for incest: saying "But you're my stepdad!" is cheap and easy and anyone can do it if the Powers That Be tell them to, in case a few 'customers' will pay actual $$$ for it, while those 'consumers' not into that plot roll their eyes and ignore it as so much silly 'porn movie plot' framing as they get on with business.
Text diffusion models are still LLMs, just not autoregressive.
The key question is whether you can find improvements which work at large scale using mostly small experiments, not whether the improvements work just as well at small scale. The 3 largest algorithmic advances discussed here (Transformer, MoE, and MQA) were all originally found at tiny scale (~1 hr on an H100 or ~1e19 FLOP[1], which is ~7 orders of magnitude smaller than current frontier training runs).[2] This paper looks at how improvements vary with scale, and finds the best improvements have returns which increase with scale. But we care about predictability given careful analysis and scaling laws, which aren't really examined.

> We found that, historically, the largest algorithmic advances couldn't just be scaled up from smaller versions. They needed to have large amounts of compute to develop and validate

This is false: the largest 3 advances they identify were all first developed at tiny scale. To be clear, the exact versions of these advances used in modern AIs are likely based on higher-compute experiments. But the returns from these more modern adaptations are unclear (and plausibly these adaptations could be found with small experiments using careful scaling analysis).

----------------------------------------

Separately, as far as I can tell, the experimental results in the paper shed no light on whether gains are compute-dependent (let alone predictable from small scale). Of the advances they experimentally test, only one (MQA) is identified as compute-dependent. They find that MQA doesn't improve loss (at small scale). But this isn't how MQA is supposed to help: it is supposed to improve inference efficiency, which they don't test! So these results only confirm that a bunch of innovations (RoPE, FA, LN) are in fact compute-independent.

Ok, so does MQA improve inference at small scale? The paper says:

> At the time of its introduction in 2019, MQA was tested primarily on small models where memory constraints were not a major concern. As a result, its benefits were not immediately apparent. However, as model sizes grew, memory efficiency became increasingly important, making MQA a crucial optimization in modern LLMs

Memory constraints not being a major concern at small scale doesn't mean it didn't help then (at the time, I think people didn't care as much about inference efficiency, especially decoder inference efficiency). Separately, the inference performance improvements at large scale are easily predictable with first-principles analysis! The post misses all of this by saying:

> MQA, then, by providing minimal benefit at small scale, but much larger benefit at larger scales —is a great example of the more-general class of a compute-dependent innovation.

I think it's actually unclear if there was minimal benefit at small scale (maybe people just didn't care much about decoder inference efficiency at the time), and further, the inference efficiency gain at large scale is easily predictable, as I noted! The post says:

> compute-dependent improvements showed minimal benefit or actually hurt performance.

But, as I've noted, they only empirically tested MQA, and those results are unclear! The transformer is well known to be a huge improvement even at very small scale. (I'm not sure about MoE.)

----------------------------------------

FAQ:

Q: Ok, but surely the fact that returns often vary with scale makes small-scale experiments less useful?

A: Yes, returns varying with scale would reduce predictability (all else equal), but by how much? If returns improve in a predictable way, that would be totally fine. Careful science could (in principle) predict big gains at large scale despite minimal or negative gains at small scale.

Q: Ok, sure, but if you actually look at modern algorithmic secrets, they are probably much less predictable from small to large scale. (Of course, we don't know that much with public knowledge.)

A: Seems quite plausible! In this case, we're left with the quantitative questions of how predictable things are, whether we can identify if something will be predictable, and whether there are enough areas of progress which are predictable.

----------------------------------------

Everyone agrees compute is a key input; the question is just how far massively accelerated, much more capable, and vastly more prolific labor can push things.

----------------------------------------

This was also posted as a (poorly edited) tweet thread here.

----------------------------------------

1. While 1e19 FLOP is around the scale of the final runs they included in each of these papers, these advances are pretty likely to have been initially found at (slightly) smaller scale, like maybe 5-100x lower FLOP. The larger runs were presumably helpful for verifying the improvement, though I don't think they were clearly essential; probably you could have instead done a bunch of careful scaling analysis. ↩︎

2. Also, it's worth noting that Transformer, MoE, and MQA are selected for being large single advances, making them unrepresentative. Large individual advances are probably typically easier to identify, making them more likely to be found earlier (and at smaller scale). We'd also expect large single improvements to be more likely to exhibit returns over a large range of different scales. But I didn't pick these examples; they were just the main examples used in the paper! ↩︎
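To make the "first-principles analysis" point concrete: MQA shrinks the decoder KV cache by a factor of the number of KV heads, and whether that matters at a given scale is just arithmetic on model shape, context length, and batch size. A minimal sketch, with entirely hypothetical model configurations of my own choosing:

```python
# Back-of-envelope sketch (hypothetical configs, fp16 cache) of why MQA's
# inference benefit is predictable without large-scale experiments: the KV
# cache shrinks by ~n_heads, but the absolute savings only matter once
# models, contexts, and batches are large.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per=2):
    # 2x for keys and values; bytes_per=2 assumes fp16 storage
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per

# Hypothetical small model: cache is tiny either way, so nobody notices.
small_mha = kv_cache_bytes(n_layers=12, n_kv_heads=12, head_dim=64, seq_len=512, batch=1)
small_mqa = kv_cache_bytes(n_layers=12, n_kv_heads=1,  head_dim=64, seq_len=512, batch=1)

# Hypothetical large serving setup: long context, batched decoding.
large_mha = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=8192, batch=8)
large_mqa = kv_cache_bytes(n_layers=80, n_kv_heads=1,  head_dim=128, seq_len=8192, batch=8)

for name, mha, mqa in [("small", small_mha, small_mqa), ("large", large_mha, large_mqa)]:
    print(f"{name}: MHA {mha/1e9:.3f} GB vs MQA {mqa/1e9:.3f} GB ({mha/mqa:.0f}x smaller)")
```

The ratio is fixed by the head count, so the large-scale benefit (and the bandwidth saved per decoded token) follows from the model shape alone; no frontier-scale training run is needed to see it coming.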

Recent Discussion

Some competitions have a clear win condition: In a race, be the first to cross a finish line.

The US-China AI competition isn’t like this. It’s not enough to be the first to get a powerful AI system.

So, what is necessary for a good outcome from the US-China AI competition?

I thought about this all the time as a researcher on OpenAI’s AGI Readiness team: If the US races to develop powerful AI before China - and even succeeds at doing so safely - what happens next? The endgame is still pretty complicated, even if we’ve “won” the race by getting to AGI[1] first.

I suggest two reframes on the US-China AI race:

  • US-China AI competition is a containment game, not a race.
  • The competition only ends when China has verifiably yielded on
...
Ustice
I don’t think China is willing to accept yielding. I can’t think of any reason that they would. This is totally a shower thought, and I don’t trust it, but what about a strategy of semi-cooperation? China has been contributing to open source models. Those models have been keeping up with, and catching up to, the capabilities of the closed source models. I wonder if mutual containment could come through having similar capabilities as we both learn from each other’s research. Then neither side has a gross advantage. Maybe it doesn’t have to be zero-sum.
Mitchell_Porter
My take on this: AGI has existed since 2022 (ChatGPT). There are now multiple companies in America and in China which have AGI-level agents. (It would be good to have a list of countries which are next in line to develop indigenous AGI capability.) Given that we are already in a state of coexistence between humans and AGI, the next major transition is ASI, and that means, if not necessarily the end of humanity, the end of human control over human affairs. This most likely means that the dominant intelligence in the world is entirely nonhuman AI; it could also refer to some human-AI symbiosis, but again, it won't be natural humanity in charge.

A world in which the development of AGI is allowed, let alone a world in which there is a race to create ever more powerful AI, is a world which by default is headed towards ASI and the dethronement of natural humanity. That is our world already, even if our tech and government leadership manage to not see it that way.

So the most consequential thing that humans can care about right now is the transition to ASI. You can try to stop it from ever happening, or you can try to shape it in advance. Once it happens, it's over: humans per se have no further say; if they still exist, they are at the mercy of the transhuman or posthuman agents now in charge.

Now let me try to analyze your essay from this point of view. Your essay is meant as a critique of the paradigm according to which there is a race between America and China to create powerful AI, as if all that either side needs to care about is getting there first. Your message is that even if America gets to powerful AI (or safe powerful AI) first, the possibility of China (and in fact anyone else) developing the same capability would still remain. I see two main suggestions: a "mutual containment" treaty, in which both countries place bounds on their development of AI along with means of verifying that the bounds are being obeyed; and spreading around defensive measures, which...

> ASI governed by a value system which, if placed in complete control of the Earth, would still be something we could live with,

That's exactly my point. However, once the value system is defined, it will either lock mankind in or be corrigible. The former case contains options like my take, where the AI only provides everyone with access to education and enforces only universally agreed political opinions, or[1] the situations where the AI builds the Deep Utopia or governs the world, criminalising social parasitism in the whole world. The race... (read more)

What a week, huh? America signed a truly gigantic chip sales agreement with UAE and KSA that could be anything from reasonable to civilizational suicide depending on security arrangements and implementation details, Google announced all the things, OpenAI dropped Codex and also bought Jony Ive’s device company for $6.5 billion, Vance talked about reading AI 2027 (surprise, in a good way!) and all that other stuff.

Lemon, it’s Thursday, you’ve got movie tickets for Mission Impossible: Final Reckoning (19th and Broadway AMC, 3pm), an evening concert tonight from Light Sweet Crude and there’s a livestream from Anthropic coming up at 12:30pm eastern, the non-AI links are piling up and LessOnline is coming in a few weeks. Can’t go backwards and there’s no time to spin anything else out...

Kajus

If you go to Amazon, most of the books in that section look similar.

TsviBT
My version: Probably too understated, but it's the sort of thing I like. GoogleDraw link if anyone wants to copy and modify: https://docs.google.com/drawings/d/10nB-1GC_LWAZRhvFBJnAAzhTNJueDCtwHXprVUZChB0/edit

BPC-157, a peptide frequently marketed as a breakthrough for healing and tissue repair, has attracted substantial attention in wellness and performance communities. It’s discussed in forums, recommended by biohackers, and even offered in clinics—despite lacking FDA approval or broader clinical recognition.

The challenge is personal: we all want healing when we’re hurting—but how do we evaluate bold health claims like those surrounding BPC-157 when strong evidence is absent, the origin story is murky, and anecdotes sound convincing? And more pointedly—what does the evidence say about BPC-157’s effectiveness? It’s not just about one compound—it’s a test of how we decide which health treatments to trust when the choice is ours to make. Whether it’s a supplement, a therapy, or advice from a friend, what guides our choices is our...

For some reason, all current benchmarks, with the sole exception of OSWorld[1], now seem to have time horizons that differ by a factor of less than 3. Does this imply that progress on every benchmark is likely to slow down?

  1. ^

    OSWorld resembles a physical task, which LLMs tend to fail. However, the article about LLMs failing basic physical tasks was written on April 14, before the pre-release of Gemini Diffusion. Mankind has yet to determine how well diffusion-based LLMs deal with physical tasks.

Jozdien
One reason why they might be worse is that chain of thought might make less sense for diffusion models than for autoregressive models. If you look at an example of when different tokens are predicted during sampling (from the linked LLaDA paper), the answer tokens are predicted about halfway through instead of at the end. This doesn't mean intermediate tokens can't help, though; they very likely do. But this kind of structure might lend itself more toward getting to less legible reasoning faster than autoregressive models do.

It does seem likely that this is less legible by default, although we'd need to look at complete examples of how the sequence changes across time to get a clear sense. Unfortunately I can't see any in the paper.
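For readers who haven't looked at masked-diffusion sampling, here is a toy sketch of the kind of bookkeeping being discussed above. This is not LLaDA's actual sampler; a random number stands in for the model's per-position confidence, and the point is only to show how one would log the step at which each position gets committed, which is what reveals answer tokens being fixed midway through rather than at the end:

```python
# Toy illustration (not a real model): unmask a few positions per step,
# choosing the "most confident" masked positions, and record when each
# position was committed.
import random

def toy_diffusion_decode(seq_len, tokens_per_step=2, seed=0):
    rng = random.Random(seed)
    masked = set(range(seq_len))
    decode_step = {}                 # position -> step at which it was unmasked
    step = 0
    while masked:
        step += 1
        # Stand-in for the model: assign each masked position a fake confidence.
        scores = {pos: rng.random() for pos in masked}
        chosen = sorted(scores, key=scores.get, reverse=True)[:tokens_per_step]
        for pos in chosen:
            decode_step[pos] = step
            masked.discard(pos)
    return decode_step

order = toy_diffusion_decode(seq_len=12)
print("position -> decode step:", dict(sorted(order.items())))
```

With a real model's confidences in place of the random scores, plotting decode_step against position over a full sample is the kind of "complete example of how the sequence changes across time" the comment above is asking for.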

Yes, this topic has been discussed multiple times. But the last time was long enough ago that my choices for joining the conversation are [Necro] or [New post], so here we are. (You can jump to the 2nd-to-last paragraph if you want to just see my conclusion.)

Background: Why am I asking this question?

After reading some comments I realize I should give some background on what I mean and why I'm asking this question.  

Winning here refers to gaining utility, aka achieving your goals, whatever those goals may be, and is in reference to this post.

The reason for asking the question at all is that I personally expected a large improvement across domains from learning rationality.  Plenty of articles here hint at how strong of a power

...
lsusr
"Why Aren't Rationalists Winning" is quite a broad question. I prefer to ask myself "Why Aren't Rationalists Winning at Chess". I believe it has to do with insufficient education in openings and endgames.
k64

Good point that it's broad; maybe the reasons are domain-specific. You might be right about chess in particular, but I lean against concluding that insufficient education is the reason rationalists aren't winning in most domains. Rationalists, on average, are significantly more educated than the general population, and I'd imagine that gap grows when you take into account self-directed education.

danielechlin
OK, the thesis makes sense. Like, you should be able to compare "people generally following rationalist improvement methods" and "people doing some other thing" and find an effect. It might have a really small effect size across rationalism as a whole. And rationalism might have just converged to other self-improvement systems. (Honestly, if your self-improvement system is just "results that have shown up in 3 unrelated belief systems," you would do okay.) It might also be hard to improve, or accelerate, winningness in all of life by type 2 thinking. Then what are we doing when we're type 2 thinking and believe we're improving? Idk. Good questions, I guess.
k64
Thank you! I'm not sure if the first-paragraph questions are intended for you to answer, for me to answer, or as purely rhetorical. The only one I feel like I have an answer for is what my criteria for winning are: broadly, I mean achieving my goals. Specifically for relationships, my goals are: be comfortable with and good at meeting new people and forming relationships, have a close group of friends that I spend a significant amount of time with and share a significant portion of my thoughts and feelings with, feel generally connected to other people, have a romantic relationship that is good enough that I don't wonder if I could do better with someone else (or that question doesn't seem important), and have positive relationships with my family. The goals are all necessarily subjective, so it's always a possibility that I just have too high expectations, but that would also seem to me to be a form of not winning. Of the goals I listed, I have succeeded at "good at meeting new people" (but not comfortable, which matters since I don't do it often as a result) and "have positive relationships with family".

I agree with you about improving system 1. I get better at pickleball by hitting a pickleball a lot with a goal of where the ball should go and noticing immediately if it did or didn't go there. I believe there are 3 reasons I haven't found that for relationships:

1. Moral restrictions - People don't like to be experimented on, and the bounds of monogamy prevent me from practicing with other people. It feels wrong to "play games" with my significant other and do/say things just to see how they will react. Perhaps there's a way around this with some meta-permission and boundaries.
2. Complex goals - In pickleball I know exactly what the goal is and can instantly tell whether it was or was not achieved with a lot of objectivity. In relationships, there are multiple competing goals, many poorly defined, and almost none are objective.
3. Slow feedback - Certain behavi...

Have the Accelerationists won?

Last November Kevin Roose announced that those in favor of going fast on AI had now won against those favoring caution, with the reinstatement of Sam Altman at OpenAI. Let’s ignore whether Kevin’s was a good description of the world, and deal with a more basic question: if it were so—i.e. if Team Acceleration would control the acceleration from here on out—what kind of win was it they won?

It seems to me that they would have probably won in the same sense that your dog has won if she escapes onto the road. She won the power contest with you and is probably feeling good at this moment, but if she does actually like being alive, and just has different ideas about how safe...

Logan Riggs
Correct! I did mean to communicate that in the first footnote. I agree valuing the unborn would drastically lower the amount of acceptable risk reduction.
Yair Halberstadt
That would imply that if you could flip a switch with a 90% chance of killing everyone and a 10% chance of granting immortality, then (assuming there weren't any alternative paths to immortality) you would take it. Is that correct?

Gut reaction is “nope!”.

Could you spell out the implication?

ryan_greenblatt
I'm using the word "theoretical" more narrowly than you and not including conceptual/AI-futurism research. I agree the word "theoretical" is underdefined, and there is a reasonable category that includes Redwood and AI 2027 which you could call theoretical research; I'd just typically use a different term for this, and I don't think Matthew was including this. I was trying to discuss what I thought Matthew was pointing at; I could be wrong about this, of course. (Similarly, I'd guess that Matthew wouldn't have counted Epoch's work on takeoff speeds and what takeoff looks like as an example of "LW-style theoretical research", but I think this work is very structurally/methodologically similar to stuff like AI 2027.) If Matthew said "LW-style conceptual/non-empirical research" I would have interpreted this pretty differently.