It might be that some elements of human intelligence (at least at the civilizational level) are culturally/memetically transmitted. All fine and good in theory. Except that the social hypercompetition between people and the intense selection pressure on ideas online might be eroding our world's intelligence. Eliezer wonders if he's only who he is because he grew up reading old science fiction from before the current era's memes.

Raemon
This is a first-pass review that's mostly organizing my thinking about this post. The post makes a few different types of claims:

* Hyperselected memes may be worse (generally) than weakly selected ones.
* Hyperselected memes may specifically be damaging our intelligence / social memetic software.
* People today are worse at negotiating complex conflicts from different filter bubbles.
* There's a particular set of memes (well represented in 1950s sci-fi) that was particularly important, and which is not as common nowadays.

It also raises a question which is listed but not focused on too explicitly on its own terms:

* What do you do if you want to have good ideas? (i.e. "drop out of college? read 1950s sci-fi in your formative years?")

It prompts me to separately consider the questions:

* What actually is the internet doing to us? It's surely doing something.
* What sorts of cultures are valuable? What sorts of cultures can be stably maintained? What sorts of cultures cause good intellectual development?

Re: the specific claim of "hypercompetition is destroying things", I think the situation is complicated by the "precambrian explosion" of stuff going on right now. Pop music is defeating classical music in relative terms, but, like, in absolute terms there's still a lot more classical music now than in 1400 [citation needed?]. I'd guess this is also true for tribal FB comments vs letter-to-the-editor-type writing.

* [claim by me] The absolute amount of thoughtful discourse is probably still increasing.

My guess is that "listens carefully to arguments" has just always been rare, and that people have generally been dismissive of the outgroup; now that's just more prominent. I'd also guess that there's more 1950s-style sci-fi today than in 1950. But it might not be, say, driving national projects that required a critical mass of it. (And it might or might not be appearing on bestseller lists?) If so, the question is less "are things being destro
Cross-domain time horizon: We know AI time horizons on software tasks are currently ~1.5 hr and doubling every 4-7 months, but what about other domains? Here's a preliminary result comparing METR's task suite (orange line) to benchmarks in other domains, all of which have some kind of grounding in human data.

Observations:

* Time horizon in different domains varies by >3 orders of magnitude, with the hardest tasks for AIs being agentic computer use (OSWorld) and the easiest being video understanding (video_mme). In the middle are Tesla self-driving (tesla_fsd), scientific knowledge (gpqa), software (hcast_r_s), and math contests (aime).
* My guess is this means models are good at taking in information from a long context but bad at acting coherently. Most work requires agency, like OSWorld, which may be why AIs can't do the average real-world 1-hour task yet.
* The rate of improvement also varies significantly; math contests have improved ~50x in the last year, but Tesla self-driving only 6x in 3 years.
* HCAST is middle of the pack in both.

Note this is preliminary and uses a new methodology, so there might be data issues. I'm currently writing up a full post! Is this graph believable? What do you want to see analyzed?
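For intuition about what those doubling times imply, here is a minimal back-of-the-envelope sketch, assuming a clean exponential trend and using only the ~1.5 hr / 4-7 month figures quoted above (purely illustrative, not part of the analysis):

```python
# Back-of-the-envelope extrapolation of the software time horizon,
# assuming a clean exponential trend (illustrative only).

current_horizon_hours = 1.5  # ~1.5 hr on software tasks, per the numbers above

for months_ahead in (12, 24, 36):
    low = current_horizon_hours * 2 ** (months_ahead / 7)   # slow end: 7-month doubling
    high = current_horizon_hours * 2 ** (months_ahead / 4)  # fast end: 4-month doubling
    print(f"{months_ahead} months out: ~{low:.0f}-{high:.0f} hours")
```

Whether the other domains follow anything like this clean a trend is exactly the kind of thing the full post would need to examine.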
Elizabeth
Last week I got nerdsniped by the question of why established evangelical leaders had a habit of taking charismatic narcissists and giving them support to found their own churches[1]. I expected this to be a whole saga that would teach lessons on how selecting for one set of good things secretly trades off against others. Then I found this checklist on churchplanting.com. It's basically "tell me you're a charismatic narcissist who will prioritize growth above virtue without telling me you're a…". And not charismatic in the sense of asking reasonable object-level questions that are assessed by a 3rd party and thus vulnerable to halo effects[2].

The first and presumably most important item on the checklist is "Visioning capacity", which includes both the ability to dream that you are very important and the ability to convince others to follow that dream. Commitment to growth has its own section (7), but it's also embedded in section 4 (skill at attracting converts). Section 12 is Resilience, but the only specific setback mentioned is ups and downs in attendance. The very last item on the 13-point list is "Faith". "Displaying Godly love and compassion to people" is a subheading under "6. Effectively builds relationships".

There are other checklists that at least ask about character, so this isn't all of church planting. But it looks like the answer to "why do some evangelicals support charismatic narcissists that prioritize growth above all else..." is "because that's what they want, presumably for the same reason lots of people value charm and growth."

1. ^ This is church planting, where the sponsoring churches may advise or fund the new church, but don't have authority over it like they might in mainline denominations.

2. ^ Nor in the Christian sense of Charismatic.
If you don't believe in your work, consider looking for other options

I spent 15 months working for ARC Theory. I recently wrote up why I don't believe in their research. If one reads my posts, I think it should become very clear that either ARC's research direction is fundamentally unsound, or I'm still misunderstanding some of the very basics after more than a year of trying to grasp it. In either case, I think it's pretty clear that it was not productive for me to work there.

Throughout writing my posts, I felt an intense shame imagining readers asking the very fair question: "If you think the agenda is so doomed, why did you keep working on it?"[1] In my first post, I write: "Unfortunately, by the time I left ARC, I became very skeptical of the viability of their agenda." This is not quite true. I was very skeptical from the beginning, for largely the same reasons I expressed in my posts. But first I told myself that I should stay a little longer: either they manage to convince me that the agenda is sound, or I demonstrate that it doesn't work, in which case I free up the labor of the group of smart people working on the agenda.

I think this was initially a somewhat reasonable position, though it was already in large part motivated reasoning. But half a year after joining, I don't think this theory of change was very tenable anymore. It was becoming clear that our arguments were going in circles. I couldn't convince Paul and Mark (the two people thinking the most about the big-picture questions), nor could they convince me. Eight months in, two friends visited me in California, and they noticed that I always derailed the conversation when they asked me about my research. That should have been an important thing to notice: I was ashamed to talk about my research with my friends, because I was afraid they would see how crazy it was. I should have quit then, but I stayed for another seven months. I think this was largely due to cowardice.
Who predicted that AI will have a multi-year "everything works" period, where the prerequisite pieces come together and suddenly every technique works on every problem? Like how, before electricity, you had to use the right drill bit or saw blade for a given material, but now you can cut anything with anything if you're only slightly patient.
ryan_greenblatt
Sometimes people talk about how AIs will be very superhuman at a bunch of (narrow) domains. A key question related to this is how much this generalizes. Here are two different possible extremes for how this could go:

1. It's effectively like an attached narrow weak AI: The AI is superhuman at things like writing ultra-fast CUDA kernels, but from the AI's perspective, this is sort of like having a weak AI tool attached to it (in a well-integrated way) which is superhuman at this skill. The part which is writing these CUDA kernels (or otherwise doing the task) is effectively weak and can't draw in a deep way on the AI's overall skills or knowledge to generalize (likely it can shallowly draw on these in a way which is similar to the overall AI providing input to the weak tool AI). Further, you could actually break out these capabilities into a separate weak model that humans can use. Humans would use this somewhat less fluently, as they can't use it as quickly and smoothly due to being unable to instantaneously translate their thoughts and not being absurdly practiced at using the tool (like AIs would be), but the difference is ultimately mostly convenience and practice.

2. Integrated superhumanness: The AI is superhuman at things like writing ultra-fast CUDA kernels via a mix of applying relatively general (and actually smart) abilities, having internalized a bunch of clever cognitive strategies which are applicable to CUDA kernels and sometimes to other domains, as well as domain-specific knowledge and heuristics. (Similar to how humans learn.) The AI can access and flexibly apply all of the things it learned from being superhuman at CUDA kernels (or whatever skill), and with a tiny amount of training/practice it can basically transfer all these things to some other domain even if the domain is very different. The AI is at least as good at understanding and flexibly applying what it has learned as humans would be if they learned the (superhuman) skill to the same ex

Popular Comments

The key question is whether you can find improvements which work at large scale using mostly small experiments, not whether the improvements work just as well at small scale. The 3 largest algorithmic advances discussed here (Transformer, MoE, and MQA) were all originally found at tiny scale (~1 hr on an H100, or ~1e19 FLOP[1], which is ~7 orders of magnitude smaller than current frontier training runs).[2]

This paper looks at how improvements vary with scale, and finds the best improvements have returns which increase with scale. But we care about predictability given careful analysis and scaling laws, which isn't really examined.

> We found that, historically, the largest algorithmic advances couldn't just be scaled up from smaller versions. They needed to have large amounts of compute to develop and validate

This is false: the largest 3 advances they identify were all first developed at tiny scale. To be clear, the exact versions of these advances used in modern AIs are likely based on higher-compute experiments. But the returns from these more modern adaptations are unclear (and plausibly these adaptations could be found with small experiments using careful scaling analysis).

Separately, as far as I can tell, the experimental results in the paper shed no light on whether gains are compute-dependent (let alone predictable from small scale). Of the advances they experimentally test, only one (MQA) is identified as compute-dependent. They find that MQA doesn't improve loss (at small scale). But this isn't how MQA is supposed to help: it is supposed to improve inference efficiency, which they don't test! So these results only confirm that a bunch of innovations (RoPE, FA, LN) are in fact compute-independent.

Ok, so does MQA improve inference at small scale? The paper says:

> At the time of its introduction in 2019, MQA was tested primarily on small models where memory constraints were not a major concern. As a result, its benefits were not immediately apparent. However, as model sizes grew, memory efficiency became increasingly important, making MQA a crucial optimization in modern LLMs

Memory constraints not being a major concern at small scale doesn't mean it didn't help then (at the time, I think people didn't care as much about inference efficiency, especially decoder inference efficiency). Separately, the inference performance improvements at large scale are easily predictable with first-principles analysis! The post misses all of this by saying:

> MQA, then, by providing minimal benefit at small scale, but much larger benefit at larger scales, is a great example of the more-general class of a compute-dependent innovation.

I think it's actually unclear if there was minimal benefit at small scale (maybe people just didn't care much about decoder inference efficiency at the time), and further, the inference efficiency gain at large scale is easily predictable, as I noted! The post says:

> compute-dependent improvements showed minimal benefit or actually hurt performance.

But, as I've noted, they only empirically tested MQA, and those results are unclear! The transformer is well known to be a huge improvement even at very small scale. (I'm not sure about MoE.)

FAQ:

Q: Ok, but surely the fact that returns often vary with scale makes small-scale experiments less useful?

A: Yes, returns varying with scale would reduce predictability (all else equal), but by how much? If returns improve in a predictable way, that would be totally fine. Careful science could (in principle) predict big gains at large scale despite minimal or negative gains at small scale.

Q: Ok, sure, but if you actually look at modern algorithmic secrets, they are probably much less predictable from small to large scale. (Of course, we don't know that much with public knowledge.)

A: Seems quite plausible! In this case, we're left with a quantitative question of how predictable things are, whether we can identify if something will be predictable, and if there are enough areas of progress which are predictable.

Everyone agrees compute is a key input; the question is just how far massively accelerated, much more capable, and vastly more prolific labor can push things.

This was also posted as a (poorly edited) tweet thread here.

1. While 1e19 FLOP is around the scale of the final runs they included in each of these papers, these advances are pretty likely to have been initially found at (slightly) smaller scale, like maybe 5-100x lower FLOP. The larger runs were presumably helpful for verifying the improvement, though I don't think they were clearly essential; probably you could have instead done a bunch of careful scaling analysis. ↩︎

2. Also, it's worth noting that Transformer, MoE, and MQA are selected for being large single advances, making them unrepresentative. Large individual advances are probably typically easier to identify, making them more likely to be found earlier (and at smaller scale). We'd also expect large single improvements to be more likely to exhibit returns over a large range of different scales. But I didn't pick these examples, they were just the main examples used in the paper! ↩︎
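To make the "easily predictable with first-principles analysis" point concrete, here is a minimal sketch of the KV-cache arithmetic behind MQA's decoder inference benefit. The model dimensions below are hypothetical placeholders, not taken from the paper or the comment:

```python
# First-principles KV-cache sizing: multi-head attention (MHA) vs. multi-query attention (MQA).
# Hyperparameters are hypothetical placeholders, not taken from the paper.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_value=2):
    # K and V each store n_kv_heads * head_dim values per token, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_value

n_layers, n_heads, head_dim = 80, 64, 128   # hypothetical large decoder
seq_len, batch = 4096, 32

mha = kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, batch)  # one KV head per query head
mqa = kv_cache_bytes(n_layers, 1, head_dim, seq_len, batch)        # a single shared KV head

print(f"MHA KV cache: {mha / 2**30:.0f} GiB")
print(f"MQA KV cache: {mqa / 2**30:.0f} GiB ({n_heads}x smaller)")
```

The ratio is the same at small scale; the absolute cache just only becomes a bottleneck with large models, long contexts, and big decode batches, which is why the large-scale gain was predictable rather than something requiring frontier-scale experiments to discover.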
As I've pointed out before, people saying "insects suffer X% as much as humans" or even "there's a Y% chance that insects are able to suffer" tells you more about the kind of numbers people pick when they're picking small numbers than it tells you about insect suffering. Most people are not good enough at picking appropriately small numbers and just pick something smaller than the numbers they usually see every day. Which isn't small enough. If they actually picked appropriately sized numbers instead of saying "if there's even a 1% chance", you could do the calculations in this article and figure out that insect suffering should be ignored.
I see modeling vs. implementation as a spectrum more than a dichotomy. Something like:

1. On the "implementation" extreme, you prove theorems about the exact algorithm you implement in your AI, s.t. you can even use formal verification to prove these theorems about the actual code you wrote.
2. Marginally closer to "modeling", you prove (or at least conjecture) theorems about some algorithm which is vaguely feasible in theory. Some civilization might have used that exact algorithm to build AI, but in our world it's impractical, e.g. because it's uncompetitive with other AI designs. However, your actual code is conceptually very close to the idealized algorithm, and you have good arguments why the differences don't invalidate the safety properties of the idealized model.
3. Further along the spectrum, your actual algorithm is about as similar to the idealized algorithm as DQN is similar to vanilla Q-learning. Which is to say, it was "inspired" by the idealized algorithm but there's a lot of heavy lifting done by heuristics. Nevertheless, there is some reason to hope the heuristic aspects don't change the safety properties.
4. On the "modeling" extreme, your idealized model is something like AIXI: completely infeasible and bearing little direct resemblance to the actual algorithm in your AI. However, there is still some reason to believe real AIs will have similar properties to the idealized model.

More precisely, rather than a 1-dimensional spectrum, there are at least two parameters involved:

* How close the object you make formal statements about is to the actual code of your AI, where "closeness" is measured by the strength of the arguments you have for the analogy, on a scale from "they are literally the same", through solid theoretical and/or empirical evidence, to pure hand-waving/intuition.
* How much evidence you have for the formal statements, on a scale from "I proved it within some widely accepted mathematical foundation (e.g. PA)" to "I proved vaguely related things, tried very hard but failed to disprove the thing, and/or accumulated some empirical evidence".

[EDIT: A 3rd parameter is how justified/testable the assumptions of your model are. Ideally, you want these assumptions to be grounded in science. Some will likely be philosophical assumptions which cannot be tested empirically, but at least they should fit into a coherent holistic philosophical view. At the very least, you want to make sure you're not assuming away the core parts of the problem.]

For the purposes of safety, you want to be as close to the implementation end of the spectrum as you can get. However, the modeling side of the spectrum is still useful as:

* A backup plan which is better than nothing, more so if there is some combination of theoretical and empirical justification for the analogizing.
* A way to demonstrate threat models, as the OP suggests.
* An intermediate product that helps with checking that your theory is heading in the right direction, comparing different research agendas, and maybe even making empirical tests.

Recent Discussion

Computers get smarter. People don't. Some bots will be greedy and some will not. The greedy ones will take everything.

(h/t Otis Reid)

I think this post captures a lot of important features of the US policymaking system. Pulling out a few especially relevant/broadly applicable sections:

1. There's No Efficient Market For Policy

There can be a huge problem that nobody is working on; that is not evidence that it's not a huge problem. Conversely, there can be a marginal problem swamped with policy work; that's not evidence it's really all that big of a deal.

On the upside, this means there are never-ending arbitrage opportunities in policy. Pick your workstreams wisely.

2. Personnel Really Is The Most Important Thing

The quality of staffers varies dramatically and can make or break policy efforts. Some Hill staffers are just awesome; if they like your idea, they'll take it and run with it, try to

...
BryceStansfield
Is there a name for the general principle that doing boring things is more effective than doing interesting ones? It seems generally true in a lot of situations.
khafra

I think it's especially true for the type of human that likes LessWrong. Using Scott's distinction between metis and techne, we are drawn to techne. When a techne-leaning person does a deep dive into metis, that can generate a lot of value.

More speculatively, I feel like there often isn't a straightforward way to capture any of the created value (as in the case of lobbying for good government policy), so it is under-incentivized.

cousin_it
"Where there's muck, there's brass" comes to mind.
kaime
Most intriguing is that this hints at a testable hypothesis: policy markets should show greater inefficiency in domains where feedback loops are longest and most distorted. I would bet that FOIAs on national security consistently outperform economic-policy FOIAs in information yield, despite lower attention.

I want to show a philosophical principle which, I believe, has implications for many alignment subproblems. If the principle is valid, it might allow to

This post clarifies and expands on ideas from here and here. Reading the previous posts is not required.

The Principle

The principle and its most important consequences:

  1. By default, humans only care about variables they could (in principle) easily optimize or comprehend.[1] While the true laws of physics can be arbitrarily complicated, the behavior of variables humans care about can't be arbitrarily complicated.
  2. Easiness of optimization/comprehension can be captured by a few relatively
...
TristanTrim
Thanks for responding : ) A is amusing, definitely not what I was thinking. B seems like it is probably what I was thinking, but I'm not sure, and I don't really understand how having a different metric of simplicity changes things. I think this is the part that prompted my question.

I may be pretty far off from understanding what you are trying to say, but my thinking is basically that I am not content with the capabilities of my current mind, so I would like to improve it; but in doing so I would become capable of having more articulate preferences, and my current preferences would define a function from the set of possible preferences to an approval rating, such that I would be trying to improve my mind in such a way that my new, more articulate preferences are the ones I most approve of or find sufficiently acceptable. If this process is iterated, it defines some path or cone from my current preferences through the space of possible preferences, moving from less to more articulate. It might be that other people would not seek such a thing, though I suspect many would, but with less conscientiousness about what they are doing. It is also possible there are convergent states where my preferences and capabilities would determine a desire to remain as I am. (I am mildly hopeful that that is the case.)

It is my understanding that the Mandelbrot set is not smooth at any scale (not sure if anyone has proven this), but that is the feature I was trying to point out. If people iteratively modified themselves, would their preferences become ever more exacting? If so, then it is true that the "variables humans care about can't be arbitrarily complicated", but the variables humans care about could define a desire to become a system capable of caring about arbitrarily complicated variables.
Q Home

I think I understand you now. Your question seems much simpler than I expected. You're basically just asking "but what if we'll want infinitely complicated / detailed values in the future?"

If people iteratively modified themselves, would their preferences become ever more exacting? If so, then it is true that the "variables humans care about can't be arbitrarily complicated", but the variables humans care about could define a desire to become a system capable of caring about arbitrarily complicated variables.

It's OK if the principle won't be true for hu...

Eliezer and I wrote a book. It’s titled If Anyone Builds It, Everyone Dies. Unlike a lot of other writing either of us have done, it’s being professionally published. It’s hitting shelves on September 16th.

It’s a concise (~60k word) book aimed at a broad audience. It’s been well-received by people who received advance copies, with some endorsements including:

The most important book I’ve read for years: I want to bring it to every political and corporate leader in the world and stand over them until they’ve read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster. Their brilliant gift for analogy, metaphor and parable clarifies for the general

...

I think they are delaying so people can pre-order early, which affects how many books the publisher prints and distributes, which in turn affects how many people ultimately read it and how much it breaks into the Overton window. Getting this conversation mainstream is an important instrumental goal.

If you are looking for info in the meantime, you could look at PauseAI:

https://pauseai.info/

Or if you want fewer facts and quotes and more discussion, I recall that Yudkowsky’s Coming of Age is what changed my view from "orthogonality kinda makes sense" to "orthogonality i...

sanxiyn
Preordered the ebook version on Amazon. I am also interested in doing a Korean translation.
AnthonyC
Not sure what he's done on AI since, but Tim Urban's 2015 AI blog post series mentions how he was new to AI or AI risk and spent a little under a month studying and writing those posts. I re-read them a few months ago and immediately recommended them to some other people with no prior AI knowledge, because they have held up remarkably well.
Mikhail Samin
Insider trading by anyone who can help on the Yes side is welcome :)

Preface: I am not suicidal or anywhere near at risk, this is not about me. Further, this is not infohazardous content. There will be discussions of death, suicide, and other sensitive topics so please use discretion, but I’m not saying anything dangerous and reading this will hopefully inoculate you against an existing but unseen mental hazard.

There is a hole at the bottom of functional decision theory, a dangerous edge case which can and has led multiple highly intelligent and agentic rationalists to self-destructively spiral and kill themselves or get themselves killed. This hole can be seen as a symmetrical edge case to Newcomb’s Problem in CDT, and to Solomon’s Problem in EDT: a point where an agent naively executing on a pure version of the decision theory will consistently underperform in a...

Lorec

Predictably avoiding death at all costs, even the cost of your mortal soul, eternal fealty, etc., is unfortunately a bigger security flaw than a willingness to follow through on implied local kamikaze threats.

If you follow decision-theoretic loss-aversion to its natural conclusion, both of us should be closeted and making a good YouTube grift as Republicans. We're making less money this way.

(Work done at Convergence Analysis. Mateusz wrote the post and is responsible for most of the ideas, with Justin helping to think them through. Thanks to Olga Babeeva for feedback on this post.)

1. Motivation

Suppose the prospects both of pausing or significantly slowing down AI progress, and of solving the technical problems necessary to ensure that arbitrarily strong AI has good effects on humanity (in time, before we get such systems), look gloomy.[1] What options do we have left?

Adam Shimi presents a useful frame on the alignment problem in Abstracting The Hardness of Alignment: Unbounded Atomic Optimization:

alignment [is] the problem of dealing with impact on the world (optimization) that is both of unknown magnitude (unbounded) and non-interruptible (atomic).

If the problem is about some system (or a collection of systems) having an unbounded, non-interruptible impact,[2] can we handle it by ensuring that...

Regarding getting coherent corrigibility: my and Joar's post on Updating Utility Functions makes some progress on a soft form of corrigibility.


Our government, having withdrawn the new diffusion rules, has now announced an agreement to sell massive numbers of highly advanced AI chips to UAE and Saudi Arabia (KSA). This post analyzes that deal and that decision.

It is possible, given sufficiently strong agreement details (which are not yet public and may not be finalized) and private unvoiced considerations, that this deal contains sufficient safeguards and justifications that, absent the ability to fix other American policy failures, this decision is superior to the available alternatives. Perhaps these are good deals, with sufficiently strong security arrangements that will actually stick.

Perhaps UAE and KSA are more important markets and general partners than we realize, and the rest of the world really is unable to deploy capital and electrical power the way they...

It always seemed outlandish that in The Animatrix, the first AI city (01) was located in the Middle East... 

If we had limitless time, it would be interesting to know how this happened. I guess the prehistory of it involved Saudi Vision 2030 (e.g. the desert city Neom) and the general hypermodernization of Dubai. You can see precursors in the robot Sophia getting Saudi citizenship in 2017, and the UAE's "Falcon" LLM in 2023.

But the initiative must have come from the American side - some intersection of the geopolitical brain trust around Trump, ...

O O
https://x.com/rwang07/status/1924658336600854632

Other countries adopting Chinese hardware may mean this was basically the US being forced to sell its GPUs to prevent the Chinese from taking advantage of economies of scale.
Stephen Martin
I have seen commentary floating around that the Trump administration is, to some extent, looking to build the Arab world into stronger allies for the US. The implication being that there was a possibility that they might fall into China's sphere of influence. I could see a 'two birds one stone' motive where in doing this the US not only gets to 'friendshore' some of its capacity without having to deal with NIMBYs, but also strengthens the alliance and keeps the Arab power players firmly in the US camp.

Epistemic status: Argument from behavioural analogy. I’m not claiming that AI systems feel care, only that human users interpret their behaviour as if it carries social meaning. When AI systems simulate care but fail to change behaviour in response to feedback, trust erodes. This essay reframes interface-level alignment not as a momentary performance or affective mimicry, but as longitudinal behavioural coherence with the capacity to track, adapt to, and reflect what users value over time.

--

A few weeks ago, I corrected an LLM for misreading numerical data in a table I’d provided. Again. Same mistake. Same correction. Same nonchalant apology.

“You're absolutely right! I should’ve been more careful. Here’s the corrected version blah blah blah.”

It wasn't the error that irked me, it was the polite but emotionally hollow apology....

In a recent post, Zvi described what he calls "The Most Forbidden Technique":

An AI produces a final output [X] via some method [M]. You can analyze [M] using technique [T], to learn what the AI is up to. You could train on that. Never do that.

You train on [X]. Only [X]. Never [M], never [T].

Why? Because [T] is how you figure out when the model is misbehaving.

If you train on [T], you are training the AI to obfuscate its thinking, and defeat [T]. You will rapidly lose your ability to know what is going on, in exactly the ways you most need to know what is going on.
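In training-loop terms, the rule looks roughly like the following minimal sketch (all names here are hypothetical placeholders, not a real training API):

```python
from typing import Callable, Tuple

# Minimal sketch of "train on [X] only, never [M], never [T]".
# All names are hypothetical placeholders, not a real training API.

def training_step(
    generate_with_cot: Callable[[str], Tuple[str, str]],  # prompt -> (chain of thought [M], final output [X])
    output_loss: Callable[[str, str], float],             # loss computed on the final output only
    apply_update: Callable[[float], None],                # optimizer step on that loss
    prompt: str,
    target: str,
) -> None:
    chain_of_thought, final_output = generate_with_cot(prompt)

    # Allowed: the training signal depends only on the final output [X].
    loss = output_loss(final_output, target)

    # Forbidden: folding a monitor's [T] verdict on the chain of thought [M]
    # into the loss would train the model to obfuscate its thinking.
    # loss += cot_monitor_penalty(chain_of_thought)

    apply_update(loss)
```

The point is purely structural: the monitor may read [M] out-of-band to check what the model is up to, but its verdict never enters the gradient.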

The article specifically discusses this in relation to reasoning models and Chain of Thought (CoT): if we train a model...

Do we know that the examples of Gemini thinking in kaomoji and Claude speaking in Spanish, etc., are real?

I say that because ChatGPT doesn't actually display its chain of thought to the user, so it's possible that Gemini and Claude don't either. ChatGPT's chain of thought is obfuscated into something more approachable for the user, as I understand it.