Tetraspace

Drew the shoggoth and named notkilleveryoneism.

https://twitter.com/TetraspaceWest

Posts

10Tetraspace Grouping's Shortform

4Is there a "coherent decisions imply consistent utilities"-style argument for non-lexicographic preferences?

31Recently I bought a new laptop

60You Can Do Futarchy Yourself

Wiki Contributions

Shut Up and Multiply

(+6/-5)

Aumann's Agreement Theorem

Comments

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Tetraspace4d54

Obeying it would only be natural if the AI thinks that the humans are more correct than the AI would ever be, after gathering all available evidence, where "correct" is given by the standards of the definition of the goal that the AI actually has, which arguendo is not what the humans are eventually going to pursue (otherwise you have reduced the shutdown problem to solving outer alignment, and the shutdown problem is only being considered under the theory that we won't solve outer alignment).

An agent that believes it will, even given all available information, still want to do something other than the action it thinks is best is anti-natural; a utility maximiser would want to take the action it thinks is best.

This is discussed on Arbital as the problem of fully updated deference.

Conditional prediction markets are evidential, not causal

Tetraspace6mo30

This ends up being pretty important in practice for decision markets ("if I choose to do X, will Y?"). By default you might, for example, only make a decision if it's a good idea (as evaluated by the market), so all traders will condition on the market having a high probability, which is obviously quite distortionary.
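A minimal sketch of the distortion, under assumptions that are mine rather than from the original comment: a hidden "good world" flag decides how likely X is to work (0.8 versus 0.2), and the decision-maker only does X in good worlds, which stands in for "only decide if the market says it's a good idea". The function name `simulate` and all the numbers are hypothetical.

```python
import random


def simulate(trials=100_000, seed=0):
    """Compare P(Y | X was chosen) with P(Y | X is forced in every world)."""
    rng = random.Random(seed)
    chosen_outcomes = []  # outcomes only in worlds where X was actually chosen
    forced_outcomes = []  # outcomes if X were done regardless of the signal

    for _ in range(trials):
        # Assumed hidden state: X succeeds with p=0.8 in good worlds, p=0.2 in bad ones.
        good_world = rng.random() < 0.5
        success = rng.random() < (0.8 if good_world else 0.2)

        # Causal question: what happens if we just do X?
        forced_outcomes.append(success)

        # Evidential question: what happens in the worlds where X gets chosen?
        # Here X is only chosen when things already look good.
        if good_world:
            chosen_outcomes.append(success)

    evidential = sum(chosen_outcomes) / len(chosen_outcomes)
    causal = sum(forced_outcomes) / len(forced_outcomes)
    return evidential, causal


evidential, causal = simulate()
print(f"P(Y | X chosen) ~ {evidential:.2f}")
print(f"P(Y | do X)     ~ {causal:.2f}")
```

A market that only resolves when X is chosen gets priced near the first number, while the decision-maker wants the second one.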

Tamsin Leake's Shortform

Tetraspace7mo20

I replied on discord that I feel there's maybe something more formalisable that's like:

reality runs on math because, and is the same thing as, there's a generalised-state-transition function
because reality has a notion of what happens next, realityfluid has to give you a notion of what happens next, i.e. it normalises
the idea of a realityfluid that doesn't normalise only comes to mind at all because you learned about R^n first in elementary school instead of S^n

which I do not claim confidently because I haven't actually generated that formalisation, and am posting here because maybe there will be another Lesswronger's eyes on it that's like "ah, but...".

Shutdown-Seeking AI

Tetraspace1y10

Not unexpected! I think we should want AGI to, at least until it has some nice coherent CEV target, explain at each self-improvement step exactly what it's doing, to ask for permission for each part of it, to avoid doing anything in the process that's weird, to stop when asked, and to preserve these properties.

Tetraspace Grouping's Shortform

Tetraspace1y30

Even more recently I bought a new laptop. This time, I made the same sheet, multiplied the score from the hard drive by because 512 GB is enough for anyone and that seemed intuitively the amount I prioritised extra hard drive space compared to RAM and processor speed, and then looked at the best laptop before sharply diminishing returns set in; this happened to be the HP ENVY 15-ep1503na 15.6" Laptop - Intel® Core™ i7, 512 GB SSD, Silver. This is because I have more money now, so I was aiming to maximise consumer surplus rather than minimise the amount I was spending.^[1]

Surprisingly, it came with a touch screen! That's just the kind of nice thing that laptops do nowadays, because as I concluded in my post, everything nice about laptops correlates with everything else so high/low end is an axis it makes sense to sort things on. Less surprisingly, it came with a graphics card, because ditto.

Unfortunately this high-end laptop is somewhat loud; probably my next one will be less loud, up to including an explicit penalty for noise.

^[1] It would have been predictable, however, at the time that I bought that new laptop, that I would have had that much money at a later date. Which means that I should have just skipped straight to consumer surplus maxxing.

Is the fact that we don't observe any obvious glitch evidence that we're not in a simulation?

Answer by TetraspaceApr 26, 202355

It would be at least some evidence. Simple explanation: if we did observe a glitch, that would pretty clearly be evidence we were in a simulation. So by conservation of expected evidence, not observing glitches is evidence against being in one.
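The argument can be checked numerically. This is a sketch with made-up numbers: the prior and the two glitch likelihoods are assumptions chosen only for illustration, and `posterior` is a hypothetical helper that applies Bayes' rule.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes' rule for a binary hypothesis H after one observation."""
    joint_h = prior * p_obs_given_h
    joint_not_h = (1 - prior) * p_obs_given_not_h
    return joint_h / (joint_h + joint_not_h)


# Assumed numbers, purely illustrative.
prior_sim = 0.5               # P(simulation)
p_glitch_given_sim = 0.01     # P(observe a glitch | simulation)
p_glitch_given_real = 0.0001  # P(observe an apparent glitch | not a simulation)

# Total probability of observing a glitch.
p_glitch = prior_sim * p_glitch_given_sim + (1 - prior_sim) * p_glitch_given_real

post_glitch = posterior(prior_sim, p_glitch_given_sim, p_glitch_given_real)
post_no_glitch = posterior(prior_sim, 1 - p_glitch_given_sim, 1 - p_glitch_given_real)

# Conservation of expected evidence: the expected posterior equals the prior.
expected_posterior = p_glitch * post_glitch + (1 - p_glitch) * post_no_glitch

print(post_glitch, post_no_glitch, expected_posterior)
```

Seeing a glitch moves the probability up a lot; seeing none moves it down a little. The two shifts, weighted by how likely each observation is, cancel out exactly.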

Pausing AI Developments Isn't Enough. We Need to Shut it All Down

Tetraspace1y20

I don't think it's quite that; a more central example I think would be something like a post about extrapolating demographic trends to 2070 under the UN's assumptions, where then justifying whether or not 2070 is a real year is kind of a different field.

Tetraspace Grouping's Shortform

Tetraspace1y70

$\arg\max$, as a mathematical structure, is smarter than god and perfectly aligned to $U$; the value of $\arg\max U$ will never actually be $\arg\max V$ because $V$ is more objectively rational, or because you made a typo and it knows you meant to say $\arg\max V$; and no matter how complicated the mapping is from $a$ to $U(a)$ it will never fall short of giving the $a$ that gives the highest value of $U$.

Which is why in principle you can align a superior being, like $\arg\max$, or maybe like a superintelligence.
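A toy version of the point, assuming a small finite action set so that the maximisation can be done by brute force. The helper `arg_max` and the two utility functions `U` and `V` are hypothetical and chosen only for illustration.

```python
def arg_max(utility, actions):
    """Return the action with the highest utility (first one on ties)."""
    return max(actions, key=utility)


actions = range(-10, 11)


def U(a):
    # The utility function that was actually written down; peaks at a = 3.
    return -(a - 3) ** 2


def V(a):
    # A different utility function you might have "meant"; peaks at a = -4.
    return -(a + 4) ** 2


best_for_U = arg_max(U, actions)
best_for_V = arg_max(V, actions)
print(best_for_U, best_for_V)
```

Whatever you pass in as the first argument is what gets maximised; `arg_max(U, actions)` never drifts towards the maximiser of `V`.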

Tetraspace Grouping's Shortform

Tetraspace1y40

"The AI does our alignment homework" doesn't seem so bad. I don't have much hope for it, but that's because it's a prosaic alignment scheme, so someone trying to implement it can't constrain where Murphy shows up, rather than because it's an "incoherent path description".

A concrete way this might be implemented is:

A language model is trained on a giant text corpus to learn a bunch of adaptations that make it good at math, and then fine-tuned for honesty. It's still being trained at a safe and low level of intelligence where honesty can be checked, so this gets a policy that produces things that are mostly honest on easy questions and sometimes wrong and sometimes gibberish and never superhumanly deceptive.^[1]
It's set to work producing conceptually crisp pieces of alignment math, things like expected utility theory or logical inductors, slowly on inspectable scratchpads and so on, with the dumbest model that can actually factor scientific research^[1], with human research assistants to hold their hand if that lets you make the model dumber. It does this, rather than engineering, because this kind of crisp alignment math is fairly uniquely pinned down so it can be verified, and it's easier to generate compared to any strong pivotal engineering task where you're competing against humans on their own ground so you need to be smarter than humans, so while it's operating in a more dangerous domain it's using a safer level of intelligence.^[1]
The human programmers then use this alignment math to make a corrigible thingy that has dangerous levels of intelligence, that does difficult engineering and doesn't know about humans, while this time knowing what they're doing. Getting the crisp alignment math from parallelisable language models helps a lot and gives them a large lead time, because a lot of it's the alignment version of backprop, where it would have taken a surprising amount of time to discover otherwise.

This all happens at safe-ish low-ish levels of intelligence (such a model would probably be able to autonomously self-replicate on the internet, but probably not reverse protein folding, which means that all the ways it could be dangerous are "well don't do that"s as long as you keep the code secret^[1]), with the actual dangerous levels of optimisation being done by something made by the humans using pieces of alignment math which are constrained down to a tiny number of possibilities.

EDIT 2023-07-25: A longer debate between Holden Karnofsky (pro) and Nate Soares (against), about the model that leads to it being an incoherent path description, is here; I think it's worth reading, and I hadn't read it as of writing this.

^[1] Unless it isn't; it's a giant pile of tensors, how would you know? But this isn't special to this use case.

"Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman

Tetraspace1y210

The solanine poisoning example was originally posted to Reddit here; the picture of Sydney Bing from a text description was posted on Twitter here.