LESSWRONG
yams

MIRI, formerly MATS, sometimes Palisade

Comments

Heroic Responsibility
yams5d40

Didn’t disagree vote myself, but I think there’s a linguistic pattern of ‘just asking questions’ that is used to signal disagreement while also evading interrogation yourself. At first glance, your comment may be reading that way to others, who then hastily smash the disagree button to signal disagreement with the position they think you’re implying (even though you were really genuinely just asking questions).

I see this happen a lot, where folks mismodel someone’s epistemic state or tacking when, really, the person is just confused and trying to explicate the conditions of their confusion. In the broader world, claiming to be confused about something is a common tactic for trying to covertly convince someone of your position.

Noah Birnbaum's Shortform
yams16d60

Duncan Sabien once ran the inverse experiment. He made a separate account to see how his posts would do without his reputation. The account only has one post still up, but iirc there used to be many more (tens). They performed similarly well to posts under his own name. Cool idea!

[plausibly I'm getting parts of the story wrong and someone who was around then will correct me]

Guys I might be an e/acc
yams16d30

I think that I’d do this math by net QALYs and not net deaths. My guess is that doing it that way may actually change your result.

I’m not trying to avoid dying; I’m trying to steer toward living.

Which side of the AI safety community are you in?
yams18d412

Yup! I just think there’s an unbounded way that a reader could view his comment: “oh! There are no current or future consequences at OAI for those who sign this statement!”

…and I wanted to make the bound explicit: real protections, into the future, can’t plausibly be offered, by anyone. Surely most OAI researchers are thinking ahead enough to feel the pressure of this bound (whether or not it keeps them from signing).

I’m still glad he made this comment, but the Strong Version is obviously beyond his reach to assure.

Which side of the AI safety community are you in?
yams19d32

This is good!

My guess is that their hesitance is also linked to potential future climates, though, and not just the current climate, so I don’t expect additional signees to come forward in response to your assurances.

The IABIED statement is not literally true
yams22d40

I think my crux is ‘how much does David’s plan resemble the plans labs actually plan to pursue?’

I read Nate and Eliezer as baking in ‘if the labs do what they say they plan to do, and update as they will predictably update based on their past behavior and declared beliefs’ to all their language about ‘the current trajectory’ etc etc.

I don’t think this resolves ‘is the title literally true’ in a different direction if it’s the only crux. I agree that, from a pure epistemic standpoint, this should have been spelled out more explicitly in the book (e.g. ‘in detail, why are the authors pessimistic about current safety plans’), in various Headline Sentences throughout the book, and in The Problem (although I think it was reasonable to omit from a rhetorical standpoint, given the target audience).

One generous way to read Nate and Eliezer here is to say ‘current techniques’ is itself intended to bake in ‘plans the labs currently plan to pursue’. I was definitely reading it this way, but think it’s reasonable for others not to. If we read it that way, and take David’s plan above to be sufficiently dissimilar from real lab plans, then I think the title’s literal interpretation goes through.

[your post has updated me from ‘the title is literally true’ to ‘the title is basically reasonable but may not be literally true depending on how broadly we construe various things’, which is a significantly less comfortable position!]

If Anyone Builds It Everyone Dies, a semi-outsider review
yams23d20

I want to vouch for Eli as a great person to talk with about this. He has been around a long time, has done great work on a few different sides of the space, and is a terrific communicator with a deep understanding of the issues.

He’s run dozens of focus-group style talks with people outside the space, and is perhaps the most practiced interlocutor for those with relatively low context.

[in case OP might think of him as some low-authority rando or something and not accept the offer on that basis]

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
yams1mo33

You’re disagreeing with a claim I didn’t intend to make.

I was unclear in my language and shouldn’t have used ‘contains’. Sorry! Maybe ‘relaying’ would have avoided this confusion.

I think you’re not objecting to the broader point other than by saying ‘neuralese requires very high bandwidth’, but LLMs have a lot of potential associations that can be made in processing a single token (which is, potentially, an absolute ton of bandwidth).

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
yams1mo40

@StanislavKrym can you explain your disagree vote?

Strings of numbers are shown to transmit a fondness for owls. Numbers have no semantic content related to owls. This seems to point to ‘tokens containing much more information than their semantic content’, doesn’t it?

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
yams1mo*30

Doesn't this have implications for the feasibility of neuralese? I've heard some claims that tokens are too low-bandwidth for neuralese to work for now, but this seems to point at tokens containing (edit: I should have said something like ‘relaying’ or ‘invoking’ rather than ‘containing’) much more information than their semantic content.

Posts

2 · yams's Shortform · 1y · 49
316 · The Problem · 3mo · 218
112 · If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials) · 4mo · 12
85 · If Anyone Builds It, Everyone Dies: Advertisement design competition · 4mo · 37
46 · Existing Safety Frameworks Imply Unreasonable Confidence · 7mo · 3
10 · [Job Ad] MATS is hiring! · 1y · 0
62 · MATS Alumni Impact Analysis · 1y · 7
124 · Talent Needs of Technical AI Safety Teams · 1y · 65