I have never done cryptography, but the way I imagine it is that you work in a context of extremely resourceful adversarial agents, and so you have to give up a kind of casual, barely-noticed neglect of weird, artificial-sounding edge cases and seemingly unlikely scenarios, because that is where the danger lives: your adversaries may force those weird edge cases to happen, and they are exactly the part of the system's behavior you haven't sufficiently thought through.
Maybe one possible analogy with AI alignment, at least, is that we're also talking about potentially extremely resourceful agents that are adversarial until we've actually solved alignment, so we're not allowed to treat weird hypothetical scenarios as unlikely edge cases and say "Come on, that's way too far-fetched, how would it even do that?" That would be like pointing to a hole in a ship's hull and saying "What are the odds the water molecules would decide to go through this hole? The ship is so big!"
> Council of Europe ... (and Russia is in, believe it or not).
It's not. Russia got in under Yeltsin in the nineties, and was then excluded in 2022.
What is the connection between the concepts of intelligence and optimization?
I see that optimization implies intelligence (optimizing a sufficiently hard task sufficiently well requires sufficient intelligence). But it feels like the case for existential risk from superintelligence depends on the idea that intelligence is optimization, or implies optimization, or something like that. (If I remember correctly, when people suggest creating "non-agentic AI", or "AI with no goals/utility", EY says they are trying to invent non-wet water, or something like that?)
It makes sense if we describe intelligence as a general problem-solving ability. But intuitively, intelligence is also about making good models of the world, which sounds like something that could be done in a non-agentic / non-optimizing way. One example that throws me off is Solomonoff induction, which feels like a superintelligence, and indeed contains good models of the world, but doesn't seem to be pushing toward any specific state of the world.
I know there's the concept of AIXI, basically an agent armed with Solomonoff induction as its epistemology, but it feels like the agency is added separately. Like, there's the intelligence part (Solomonoff induction) and the agency part, and they are clearly different, rather than agency automatically popping out of the superintelligence.
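As far as I understand Hutter's definition (this is my rough reconstruction, so the notation may be slightly off), AIXI picks its action at step $k$ by an expectimax over a Solomonoff-style mixture of environments:

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_k + \cdots + r_m\big) \sum_{q \,:\, U(q,\, a_{1:m}) = o_{1:m} r_{1:m}} 2^{-\ell(q)}$$

where $q$ ranges over programs for a universal machine $U$, $\ell(q)$ is program length, $a$/$o$/$r$ are actions/observations/rewards, and $m$ is the horizon. The inner sum over programs is the Solomonoff-induction part; the outer maximization over action sequences weighted by rewards is the agency part, bolted on top, which is exactly why the two ingredients look separable to me.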
Is there currently any place for possibly stupid or naive questions about alignment? I don't wish to bother people with questions that have probably been addressed, but I don't always know where to look for existing approaches to a question I have.
> The OpenBSD project to build a secure operating system has also, in passing, built an extremely robust operating system, because from their perspective any bug that potentially crashes the system is considered a critical security hole. An ordinary paranoid sees an input that crashes the system and thinks, “A crash isn't as bad as somebody stealing my data. Until you demonstrate to me that this bug can be used by the adversary to steal data, it's not extremely critical.” Somebody with security mindset thinks, “Nothing inside this subsystem is supposed to behave in a way that crashes the OS. Some section of code is behaving in a way that does not work like my model of that code. Who knows what it might do? The system isn't supposed to crash, so by making it crash, you have demonstrated that my beliefs about how this system works are false.”
Hey there,
I was showing this post to a friend who's into OpenBSD. He felt that this is not a good description, and wanted me to post his comment. I'm curious about what you guys think about this specific case and what it does to the point of the post as a whole. Here's his comment:
This isn't an accurate description of what OpenBSD does and how it differs from other systems.
> any bug that potentially crashes the system is considered a critical security hole
For the kernel, this is not true: OpenBSD, just like many other systems, has a concept of crashing in a controlled manner when it's the right thing to do; see e.g. [here](https://man.openbsd.org/crash). As far as I understand [KARL](https://why-openbsd.rocks/fact/karl), avoiding crashes at any cost would actually make the system less secure:

* crash on a bad guess: attacker guesses incorrectly => the system crashes => the system boots a new randomized kernel => attacker is back at square one
* tolerate the bad guess: attacker guesses incorrectly => the system continues working as usual => attacker guesses again with new knowledge
For the other parts of the system, the opposite is true: OpenBSD consistently introduces new, interesting restrictions, and if a program violates them, it crashes immediately.
Example 1: printf and %n
Printf manual page for OpenBSD: http://man.openbsd.org/printf.3
"The %n conversion specifier has serious security implications, so it was changed to no longer store the number of bytes written so far into the variable indicated by the pointer argument. Instead a syslog(3) message will be generated, after which the program is aborted with SIGABRT."
Printf manual page for Linux: https://man7.org/linux/man-pages/man3/printf.3.html
"Code such as printf(foo); often indicates a bug, since foo may contain a % character. If foo comes from untrusted user input, it may contain %n, causing the printf() call to write to memory and creating a security hole."
Printf manual page for macOS: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/printf.3.html
"%n can be used to write arbitrary data to potentially carefully-selected addresses. Programmers are therefore strongly advised to never pass untrusted strings as the format argument, as an attacker can put format specifiers in the string to mangle your stack, leading to a possible security hole."
As we see, on Linux and macOS the potential security issue is well-known and documented, but a program that uses %n is still supposed to work. On OpenBSD, it's supposed to crash.
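To make the pattern concrete, here's a minimal sketch (my own toy example, not from any real codebase) of the difference between passing untrusted data as the format string and passing it as an argument:

```c
#include <stdio.h>

int main(int argc, char *argv[])
{
    const char *foo = argc > 1 ? argv[1] : "hello";

    /* Dangerous: foo itself is the format string.  If it contains "%n",
     * printf will try to write through a pointer it picks up from the
     * varargs.  On OpenBSD this logs a message and aborts the process
     * with SIGABRT; on Linux and macOS it is merely documented as a
     * security hole, and the call is still expected to "work". */
    printf(foo);

    /* Safe: foo is passed as data, never interpreted as a format. */
    printf("%s\n", foo);
    return 0;
}
```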
Example 2: [pledge](http://man.openbsd.org/pledge.2)
This system call allows a program to sandbox itself, basically saying "I only need this particular system functionality to operate properly; if I ever attempt to use anything else, may I crash immediately".
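Here's a minimal sketch of how a program uses it (OpenBSD-specific, so it only compiles there; "stdio" and "rpath" are real promise categories from the man page, chosen just for illustration):

```c
#include <stdio.h>
#include <unistd.h>
#include <err.h>

int main(void)
{
    /* Promise that from now on we only need stdio and read-only
     * filesystem access.  Anything outside those groups is a
     * violation. */
    if (pledge("stdio rpath", NULL) == -1)
        err(1, "pledge");

    puts("still inside the pledge");   /* fine: covered by "stdio" */

    /* A later call to, say, socket(2) would kill the process with
     * SIGABRT, because network access was never pledged. */
    return 0;
}
```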
Example 3: [KERN_WXABORT](http://man.openbsd.org/sysctl.2#KERN_WXABORT)
Like many other systems, OpenBSD doesn't allow memory that is both writable and executable. By default, however, this is an error the program can recover from. By setting a kernel parameter, you can make the error unrecoverable: a program that attempts to use memory like that will crash.
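A sketch of the recoverable case (again OpenBSD-specific; by default the request simply fails, while with the kern.wxabort sysctl set the process would be killed instead):

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>

int main(void)
{
    /* Ask for a page that is writable and executable at the same time. */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_ANON | MAP_PRIVATE, -1, 0);

    if (p == MAP_FAILED) {
        /* Default behaviour: the mapping is refused and the program
         * can recover, e.g. fall back to a non-executable buffer. */
        printf("W|X mapping refused: %s\n", strerror(errno));
        return 0;
    }

    /* With kern.wxabort set, the kernel kills the process instead of
     * returning an error, so a violating program never gets this far. */
    printf("got a W|X page at %p\n", p);
    return 0;
}
```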
I hope I've made my case clear.
Sorry, I know this is tangential, but I'm curious: is that based on it being less psychosis-inducing in this investigation, or are there more data points? Is it known to be otherwise more aligned as well?