I’m not sure what the consensus is on how to vote on these posts, but I’m sad that this post’s poor reception might be why its author deactivated their account.

Feedly Breaks MathML

Kenny10mo40

I just reported this to Feedly.

Lies Told To Children

Kenny1y20

Thanks for the info! And no worries about the (very) late response – I like that people fairly often reply at all (beyond same-day or within a few days) on this site; makes the discussions feel more 'timeless' to me.

The second "question" wasn't a question, but it was due to not knowing that Conservative Judaism is distinct from Orthodox Judaism. (Sadly, capitalization is only relatively weak evidence of 'proper-nounitude'.)

How could AIs 'see' each other's source code?

Kenny1y40

Some of my own intuitions about this:

Yes, this would be 'probabilistic' and thus this is an issue of evidence that AIs would share with each other.
Why or how would one system trust another that the state (code+data) shared is honest?
Sandboxing is (currently) imperfect, tho perhaps sufficiently advanced AIs could actually achieve it? (On the other hand, there are security vulnerabilities that exploit the 'computational substrate', e.g. Spectre, so I would guess that would remain as a potential vulnerability even for AIs that designed and built their own substrates.) This also seems like it would only help if the sandboxed version could be 'sped up' and if the AI running the sandboxed AI can 'convince' the sandboxed AI that it's not sandboxed.
The 'prototypical' AI I'm imagining seems like it would be too 'big' and too 'diffuse' (e.g. distributed) for it to be able to share (all of) itself with another AI. Another commenter mentioned an AI 'folding itself up' for sharing, but I can't understand concretely how that would help (or how it would work either).

How could AIs 'see' each other's source code?

Kenny1y20

I think my question is different, tho that does seem like a promising avenue to investigate – thanks!

How could AIs 'see' each other's source code?

Kenny1y20

That's an interesting idea!

How could AIs 'see' each other's source code?

Kenny1y20

An oscilloscope

I guessed that's what you meant but was curious whether I was right!

If the AI isn't willing or able to fold itself up into something that can be run entirely on a single, human-inspectable CPU in an airgapped box, running code that is amenable to easily proving things about its behavior, you can just not cooperate with it, or not do whatever else you were planning to do by proving something about it, and just shut it off instead.

Any idea how a 'folded-up' AI would imply anything in particular about the 'expanded' AI?

If an AI 'folded itself up' and provably/probably 'deleted' its 'expanded' form (and all instances of that), as well as any other AIs or non-AI agents under its control, that does seem like it would be nearly "alignment-complete" (especially relative to our current AIs), even if, e.g., the AI expected to be able to escape that 'confinement'.

But that doesn't seem like it would work as a general procedure for AIs cooperating or even negotiating with each other.

How could AIs 'see' each other's source code?

Kenny1y41

What source code and what machine code is actually being executed on some particular substrate is an empirical fact about the world, so in general, an AI (or a human) might learn it the way we learn any other fact - by making inferences from observations of the world.

This is a good point.

But I'm trying to develop some detailed intuitions about how this would or could work, in particular what practical difficulties there are and how they could be overcome.

For example, maybe you hook up a debugger or a waveform reader to the AI's CPU to get a memory dump, reverse engineer the running code from the memory dump, and then prove some properties you care about follow inevitably from running the code you reverse engineered.

In general though, this is a pretty hard, unsolved problem - you probably run into a bunch of issues related to embedded agency pretty quickly.

(What do you mean by "waveform reader"?)

Some practical difficulties with your first paragraph:

1. How can AIs credibly claim that any particular CPU is running their code, or that a debugger connected to it isn't being subverted via, e.g., a MITM attack?
2. How can AIs credibly claim that whatever the contents of a 'CPU's' memory are at some point, they won't be replaced at some later point?
3. How could one AI safely execute code given to it by another (e.g. via "memory dump")?
4. How could one AI feasibly run another's code 'fast enough' to be able to determine that it could (probably) trust it now (even assuming [1], [2], and [3] are solved)?

[1] points to what I see as a big difficulty, i.e. AIs will probably (or could) be very distributed computing systems and there might not be any practical way to 'fit into a single box' for, e.g. careful inspection by others.

"Corrigibility at some small length" by dath ilan

Kenny1y10

This is a nice summary!

fictional role-playing server

As opposed to all of the non-fictional role playing servers (e.g. this one)?

I don't think most/many (or maybe any) of the stories/posts/threads on the Glowfic site are 'RPG stories', let alone some kind of 'play by forum post' histories; there are just a few that use the same settings as RPGs.

AI #5: Level One Bard

Kenny1y10

I suspect a lot of people, like myself, learn "content-based writing" by trying to communicate, e.g. in their 'personal life' or at work. I don't think I learned anything significant by writing in my own "higher forms of ['official'] education".
