LESSWRONG
TsviBT · Comments (sorted by newest)
New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence
TsviBT · 2h

Nice. I've only read the quoted paragraphs from the main body:

Despite prohibitions on large-scale training, AIs could continue getting more capable via improvements to the algorithms and software used to train them. Therefore, the coalition adopts appropriate restrictions on research that contributes to frontier AI development or research that could endanger the verification methods in the agreement. These restrictions would cover certain machine learning-related research and may eventually expand to include other AI paradigms if those paradigms seem likely to lead to ASI. Coalition members draw on their experience restricting research in other dangerous fields such as nuclear and chemical weapons. These research restrictions aim to be as narrow as possible, preventing the creation of more capable general AI models while still allowing safe and beneficial narrow AI models to be created. The coalition encourages and makes explicit carve-outs for safe, application-specific AI activities, such as self-driving cars and other uses that provide benefits to society.

Verification that members are adhering to research restrictions is aided by the fact that relatively few people have the skills to contribute to this research—likely only thousands or tens of thousands, see Appendix A, Article IX. Nations verify compliance by way of intelligence gathering, interviews with the researchers, whistleblower programs, and more. This verification aims to use non-invasive measures to ensure that researchers are not working on restricted topics. Additionally, inspectors verify that the medium-scale training allowed by the agreement uses only approved methods (e.g., from a Whitelist) and does not make use of any novel AI methods or algorithms; these would be evidence that restricted research had taken place.

And then skimmed the appendix "ARTICLE VIII — Restricted Research: AI Algorithms and Hardware". I'm very happy to see any kind of proposal that includes a serious attempt to ban AGI research. I want to also mention "The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take)".

My low-effort attempted very short summary of this aspect of the treaty is:

Part of the treaty sets up a committee that decides what counts as AGI research. That stuff is banned.

Is that basically right? (And there's more discussion of how you'd enforce that, a few examples of non-AGI research, and some comparisons with nuclear bombs and Asilomar.)

I don't have a better idea, so this seems as promising as I can think of. I'm curious though for ideas about

  • Some sort of category that could actually work here.
    • You need a category that's politically viable and also captures most or all of AGI research.
    • It's ok but sad and also maybe politically more difficult to exclude a bunch of stuff that isn't AGI research.
    • For example (some bad ideas to spur better ideas from others):
      • You could say "you have to declare a specific task you're trying to solve, like protein interactions or self-driving cars; the task can't be super broad / heterogeneous, like predicting text".
      • You could say "no running searches over very algorithmically rich spaces, or strongly enriching or speeding up existing searches".
      • You could say "your system has to demonstrate by lesion experiments that it requires domain-specific knowledge".
      • You could say "if other researchers can copy-paste your system into another domain, and it works well, it's banned".
      • You could say "if our red team is able to use your system to do X and Y and Z, then it's banned".
      • You could say "you have to state reasons why your research is interesting/promising, and those reasons have to sound like the promisingness comes from something domain-specific".
    • It might be possible to targetedly make some boundaries more coordinatable-on. Cf. https://tsvibt.blogspot.com/2025/11/constructing-and-coordinating-around.html
  • And/or, some way of having really good governance.
    • It seems like if the committee goes awry, it's very easy to pretend you've drawn a good boundary, but you haven't.
    • It seems likely for the committee to go awry, given how much pressure there could be for less restriction.
    • Curious for any thoughts on how to do that, e.g. examples of governance with significant responsibility to develop unclear policies that went well despite strong pressures.
Buck's Shortform
TsviBT · 6h

IANAE, but what I use is log_10 in increments of .1. The starting point is that 10^.1 is roughly 5/4. (It's actually ~1.259 rather than 1.25.) You get, basically,

1, 1.25, 1.57, 2, 2.5, 3.1, 4, 5, 6.3, 8, 10

and you can pretty closely regenerate this with 10^.1 ≈ 5/4, accelerated by remembering that 10^.3 is 2 (which pins down 4 and 8).
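The regeneration trick above can be sketched in a few lines (my reconstruction of the comment's arithmetic, not code from the author): multiply by 5/4 at each step, snapping to an exact power of 2 every three steps since 10^.3 ≈ 2.

```python
# Mental log_10 in increments of 0.1: true decade multipliers vs. the
# "5/4 per step" regeneration, with 10**0.3 ≈ 2 pinning down 2, 4, 8.
true_vals = [10 ** (k / 10) for k in range(11)]

approx = [1.0]
for k in range(1, 11):
    v = approx[-1] * 5 / 4
    if k in (3, 6, 9):
        v = float(2 ** (k // 3))  # snap to 2, 4, 8
    approx.append(v)

print([round(v, 2) for v in true_vals])
print([round(v, 2) for v in approx])
# The regenerated values stay within about 1.5% of the true ones.
```

Running this shows the true sequence (1.0, 1.26, 1.58, 2.0, ..., 10.0) against the regenerated one (1.0, 1.25, 1.56, 2.0, ..., 10.0), which matches the digits quoted in the comment.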

Wei Dai's Shortform
TsviBT · 17h

Right. I realized later that I framed this as something the commenter decides; it would also be possible to have this sort of thing replace authors deleting comments or banning users. The author could press the "boot" button, and then this boots the comment out of the comment section. But it doesn't delete it; it just moves all discussion to wherever the comment was booted to (e.g. an open thread or quick take or something). Maybe it also hides most of the comment, and shows a single response from the author. (Not especially advocating for this.)

Wei Dai's Shortform
TsviBT · 1d

[ belabor -> bemoan? ]

Wei Dai's Shortform
TsviBT · 1d

The free space for responses and rebuttals isn't supposed to be the comments of the post, but the ability to write a different post in reply.

I want to just note, for the sake of the hypothesis space, a probably-useless idea: There could somehow be more affordance for a middle ground of "offshoot" posting. In other words, structurally formalize / enable the pattern that Anna exhibited in her comment here:

https://www.lesswrong.com/posts/AZwgfgmW8QvnbEisc/cfar-update-and-new-cfar-workshops?commentId=N2r5xTerxfxtfeLCJ

on her post, where she asked for a topic to be budded off to another venue. Adele then did so here:

https://www.lesswrong.com/posts/n299hFwqBxqwJfZyN/adele-lopez-s-shortform?commentId=k326Yx3vYBzQntS4j

And the ensuing discussion seemed productive. This is kinda like quote-tweeting as opposed to replying. The difference from just making your own shortform post would be that it's a shortform post, but also paired with a comment on the original post. This would be useful if, as in the above example, the OP author asked for a topic to be discussed in a different venue; or if a commenter wants to discuss something, and also notify the author, and also make their comment visible to other people reading the comments on the OP, but wants to have their own venue or wants to avoid taking up attention in the OP because of off-topicness or whatever other reason.

Alex_Altair's Shortform
TsviBT · 2d

My guess would be that we actually want to view there as being multiple basic/intuitive cognitive starting points, and they'd correspond to different formal models. As an example, consider steps / walking. It's pretty intuitive that if you're on a straight path, facing in one fixed direction, there's two types of actions--walk forward a step, walk backward a step--and that these cancel out. This corresponds to addition and subtraction, or addition of positive numbers and addition of negative numbers. In this case, I would say that it's a bit closer to the intuitive picture if we say that "take 3 steps backward" is an action, and doing actions one after the other is addition, and so that action would be the object "-3"; and then you get the integers.

I think there just are multiple overlapping ways to think of this, including multiple basic intuitive ones. This is a strange phenomenon, one which Sam has pointed out.

I would say it's kinda similar to how sometimes you can refactor a codebase infinitely, or rather, there's several different systemic ways to factor it, and they are each individually coherent and useful for some niche, but there's not necessarily a clear way to just get one system that has all the goodnesses of all of them and is also a single coherent system. (Or maybe there is, IDK. Or maybe there's some elegant way to have it all.)
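The steps picture can be made concrete in a tiny sketch (my illustration, not from the comment): represent the action "take n steps" by the integer n, and "do one action after the other" by composition, which turns out to be ordinary addition, with forward and backward steps as inverses.

```python
# Illustrative model: the action "take n steps along the path" is the
# integer n (negative = backward). Composing actions -- doing one and
# then the other -- is integer addition, so the actions form (Z, +).
def compose(a: int, b: int) -> int:
    return a + b  # do action a, then action b

step_forward = 1
step_backward = -1

print(compose(step_forward, step_backward))  # 0: the two actions cancel
print(compose(-3, 5))  # "3 steps back, then 5 forward" is the action 2
```

The point of the sketch is just that "take 3 steps backward" is itself a single object (-3), rather than "subtract the positive quantity 3", matching the comment's preferred framing.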

Another example might be "addition as combining two continuous quantities" (e.g. adding some liquid to some other liquid, or concatenating two lengths). In this case, the unit is NOT basic, and the basic intuition is of pure quantity; so we really start with R.

Overview of strong human intelligence amplification methods
TsviBT · 2d

Huh. They have the observed association with autistic traits being negative? That's different from what I'd previously heard. (I'm a layman re/ this stuff, but just noticing.)

The Charge of the Hobby Horse
TsviBT · 3d

FYI an aspect of my experience is briefly thinking "Huh, I wonder what are norms around user-to-user-banning?", and then not trying to find anything about that, and instead just assuming that it's kinda like a twitter block (in particular, mostly it's up to the banner's discretion in order to make their experience work for them), and doing the ban. It sounds like user-bans are considered much more weighty than twitter blocks. This makes some sense, since the structures of the forums are pretty different; not saying I was justified in not processing the difference; I'm just noting descriptively that my (angry) awareness didn't include much awareness of this difference. I'd suggest having a link to some description of the meaning / intended use of a user-ban near the interface element for that; possibly even with a confirmation warning dialogue thing like "are you sure? have you read the thing?".

The Charge of the Hobby Horse
TsviBT · 3d

I try to keep my contributions relevant, and I think I'm applying a significantly higher standard than mere "words that [I] can then respond to."

AFAIK, that's accurate.

I would say the off-topicness of the hobby horse is not actually the central problem with a charging hobby horse. The central problem would be simply persistently misunderstanding the author.

This is made worse by being framed as a correction, because it's a kind of challenge--a high-salience push for engagement. Which is perfectly fine to make, but IMO you should then be trying to correct the author's actual opinions. In the Tree thread, you respond to written words, which makes sense; but then you are pretty slow to question the background philosophical stance that you've imagined the author to have, which to me seems rude. Like, you can imagine a hostile TV interviewer having a guest on and being like "So why do you think it's ok to kill babies?" and the guest is like "Of course I don't think that; what I'm saying is that a 3-week fetus has no neurons, so it's not sentient, so it's not a moral cost to abort" and the interviewer is like "I see. So why don't you value human life?". I mean, your behavior is not very bad IMO, not like the TV interview, but it's structurally similar. Does this make any sense?

The off-topicness is relevant for two reasons. First of all, it's an additional annoyance, in that the author "is being dragged in" (whatever this should mean in context) to a topic that's less of interest to them. Second of all, it's relevant because hobby horses are a thing, and they tend to produce [persistent misunderstanding, perhaps framed as corrections], which then tend to also be off-topic since hobby horses tend to be off-topic.

I think "Explicitly confirm authorial intent before criticizing" is not a good practice for an async public forum, because it adds way too much friction to the publication of valuable criticisms. (Confirming intent seems good in synchronous conversations, where it's not as costly for the discussion process to block waiting for the author's Yes/No/IDK.)

That's fair. Let me clarify: I'm not especially saying "ask and then wait". Rather I'm suggesting for example that you state the position you're arguing against; ask "Is that roughly what you think?"; and then continue with your response, maybe saying "If so:" or "Anyway, I'll respond to that position I stated:". You could add "I think so, because you wrote X" if you want. (I think this argues in favor of synchronous convos generally. Personally, I am much nicer in synchronous convos.) I'd also recommend

  1. Keeping in mind that you might have misunderstood the author's intent/positions, e.g. by imagining a position they don't hold;
  2. Updating quickly about their positions (of course I don't mean be overcredulous, but.).

I would guess that you in particular put a lot of effort into this, so IDK if there's an update you're supposed to make. But I do think the specific example I used in the OP exhibits the pattern I describe.

Anyway:

Maybe I'd like to zoom out a bit. I'd probably leave this thread to rest, but it's of course fair for you to want to hash out what if anything was objectionable about your comments in the Tree thread.

The Charge of the Hobby Horse
TsviBT · 3d

Thank you. IDK if it's closer; it's still quite incorrect, but I greatly appreciate the effort.

[Edit: Just to state this: I think Dai was coming from good intentions, e.g. to think about important and interesting things. On reflection: I stand by something being unhelpful about the behavior pattern that I'm claiming I have engaged in, that you have engaged in, and that I claim Dai has engaged in; but also, I don't think it's some egregious transgression; I think I overreacted and I would like to figure out how to handle discussion more gracefully and non-antagonistically; I value open thoughtful critical discussion and am grateful when people are interested in discussing things related to my writing. I also sorta meant this OP as being somewhat playful, but also I was annoyed, so it might have been more antagonizing than ideal.]

[Edit 2: I'm going to go through some of the points in more detail, because it seems like I haven't successfully explained the pattern clearly enough yet, so I want to try to make things clearer to Zack. But, I don't want this detailed wall of text to imply that the situation is some big deal; the behavior I'm discussing is not that bad even if you completely buy my perspective; I don't mean to harp on it, I just want to answer Zack's questions.]

Dai is wrong to implicitly ask "Why not just not defer in this case?"

I do not think Dai is wrong to ask that question. I think it's a good and important question.

I don't think the question was super inexplicit. In his first comment, the last sentence was "In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.". I followed up with a response to that.

I would say that the fact that his first comment quoted my statement about "Yudkowsky = best AGI X-derisk strategist", and most of the words were arguing against that, confused the topic. I thought that literal statement was what he wanted to argue against, because he quoted it. Subsequently he did not seem to engage with that specific question, instead starting to develop a different question about different aspects of being a strategic thinker (which is also interesting, but even more off-topic, which is fine, but also confusing).

you think that the reason Dai is wrong to implicitly ask "Why not just not defer in this case?" is because you think that's not relevantly on-topic

I do think it's off-topic, but again, being off-topic is fine. I think it's polite to acknowledge this, e.g. sometimes if I'm responding to something other than one of the main threads of a post I will (or think I ought to even if I don't) say "Nitpick:" or "Off-topic, but:" or "I didn't read the post carefully, just responding to [quote]:". But being maybe mildly impolite in this way is not by itself the issue I have with that thread (though it is one element of the pattern I describe in the OP).

because you think that the implied question falsely presupposes that the post author is not aware of why deference is harmful

Let me venture a description of the events from Dai's perspective. I will write Imaginary!Dai to emphasize that this is not Dai's perspective, but my attempt to guess a plausible-to-me way his experience might have been. [I should maybe have similarly used Straw!Dai in the dialogue in the OP--though I did try to make that somewhat accurate, albeit abstracted and only a subset of the original thread, so I also don't want to imply that I'm not asserting it.]

From Imaginary!Dai's perspective, Imaginary!Dai has a question-blob (perhaps a hobby horse--which is fine/good to have), which is about strategy, and deference, and Yudkowsky being deferred to and over-deferred to, and why did that happen, and what bad effects it had, and how to do strategy well individually and as groups, and so on. So then there's Tsvi's post, and Imaginary!Dai starts some lines of discussion, relating to his question-blob. Tsvi is replying in the thread, and Imaginary!Dai is continuing the discussion. Tsvi's sorta going off on slight-tangents, not quite focusing on the interesting/important things, so Imaginary!Dai is somewhat continuing the interesting lines of discussion, while responding to some of Tsvi's comments in ways that further the interesting parts. Then Tsvi freaks out and bans him.

Ok. So, that would be among my mainline guesses of experiences that Dai was having. I think this possible perspective is empathizable-with, and is mostly fine. It's also unfortunately perfectly compatible with my description of a charging hobby horse. I don't defend and endorse all of my behavior in response, but on the question of whether Dai's behavior had a significant inappropriate pattern, I continue to think yes.

(Whereas I think, and I think that Dai thinks, that the question is relevantly on-topic, because even if everyone in the conversation agrees on the broad outlines of why deference is harmful, they might disagree on the nitty-gritty details of exactly how harmful and exactly why, and litigating the nitty-gritty details of an example that was brought up in the post might help in evaluating the post's thesis.)

I definitely agree that we might (and in fact do) disagree about various details like that, and that this is in general an important topic.

I'm not sure I understand how it's on topic though. The topic of the post is "If you're going to defer, how can you alleviate the problems with that?".

I think it would be on topic to say "We should have been using a different procedure to choose who to defer to" or "We should have been deferring on different questions" or "We should have deferred in a different pattern, e.g. to a larger group of people or with different of us deferring to a wider range of single people". It could also be well on-topic to say "We should have been deferring to someone else", e.g. because Yudkowsky was visibly not the best strategic AGI X-derisk thinker; and so it could be on-topic to discuss "Was Yudkowsky the best AGI X-derisk strategic thinker?" to discuss whether there was a mistaken choice in deferee.

But note that, on that thread, Dai was not AFAICT arguing "We should have deferred to someone else". He was arguing that we should have deferred less overall. (Which IMO is technically off-topic, though of course quite adjacent. Which is fine/good to discuss. In my first reply to him, I did engage on that question.) He sort of kept discussing "Was Yudkowsky the best AGI X-derisk strategic thinker?"--except AFAICT he completely ignored my initial response "But it seems strange to be counting down..." on that topic, so it was unclear to me whether he was trying to discuss that topic with me.

Anyway:

You wrote earlier:

But it really seems like you do have a significant disagreement with Dai about the extent to which deference to Yudkowsky was justified.

Plausibly! I think we both think he was over-deferred to, probably by "a lot". But plausibly we have different "a lot"s.

You even seem to ridicule Dai for this ("And then you're like 'Ha. Why not just not defer?'"). This seems like a real and substantive disagreement, not a hallucination on Dai's part.

On my interpretation, my statements make it pretty clear that I think:

  1. People in general defer a lot, and this is quite bad (and implicitly therefore they should defer less in total).
  2. But, people have to defer a lot due to cognitive costs and the complexity of the world (so the solution can't just be "don't defer", or even "don't defer in your field", though it can and IMO should be "strive to un-defer on as many key questions as feasible in your field").

So, genuine question, what did Dai mean by

By saying that he was the best strategic thinker, it seems like you're trying to justify deferring to him on strategy (why not do that if he is actually the best), while also trying to figure out how to defer "gracefully", whereas I'm questioning whether it made sense to defer to him at all,

The two natural interpretations I can think of are:

  1. You, Tsvi, by saying he's the best strategist, are justifying that people, instead of un-deferring-to Yudkowsky when they can, should continue deferring to him.
  2. Insofar as people were deferring, they should have deferred not to him at all, but (by implication) to someone else instead.

I believe I would have ruled out 2, since at that point it did not seem to me that he was putting forth some other candidate as an alternate deferee, preferable to Yudkowsky.

So I interpreted 1, which is why I flew off the handle (which I'm sorry about), given that I'd repeatedly stated that I was not arguing in favor of deferring more than you have to.

(In fact, my actual intention in bringing up Yudkowsky was pretty off-hand; right now I think my original intention in even adding the phrase "being the best..." was just to descriptively explain why he was deferred to so much, though I state here that it was for emphasis, which is also plausible.)

Separately, as I mentioned above, the question of "how, and how much, are we actually forced to defer" is important and interesting, and maybe technically off-topic or something, but a good thread and something I engaged with in my first response.

Posts

- TsviBT's Shortform (1y)
- The Charge of the Hobby Horse (5d)
- Tools for deferring gracefully (6d)
- The problem of graceful deference (8d)
- Escalation and perception (11d)
- Meta-agentic Prisoner's Dilemmas (14d)
- A prayer for engaging in conflict (15d)
- LLM-generated text is not testimony (16d)
- Do confident short timelines make sense? (4mo)
- A regime-change power-vacuum conjecture about group belief (5mo)
- Genomic emancipation (5mo)
Wikitag Contributions

- Sinclair's Razor (11 days ago, +18/-30)
- Sinclair's Razor (11 days ago, +836)
- Tracking (9 months ago, +191)
- Tracking (9 months ago, +2/-2)
- Tracking (9 months ago, +1571)
- Joint probability distribution (9 years ago, +850)
- Square visualization of probabilities on two events (9 years ago, +72)