Satya Benson

Comments, sorted by newest
satchlj's Shortform
Satya Benson · 2mo* · 72

Is Goodhart's Curse Not Really That Bad?

EDIT: It's bad. Still, it's good to understand exactly when it's bad.

I'm not implying I'm on to anything others haven't thought of by posting this - I'm asking this so people can tell me if I'm wrong.

Goodhart's Curse is often cited to claim that if a superintelligent AI has a utility function which is a noisy approximation of the intended utility function, the expected proxy error will blow up given a large search space for the optimal policy.

But, assuming Gaussian or sub-Gaussian error, the expected regret is actually something like $\sigma\sqrt{2\log n}$, where $n$ is the size of the raw search space. Even if the search space grows exponentially with intelligence, the expected error isn't really blowing up. And if smarter agents make more accurate proxies, then error might very plausibly decrease as intelligence grows.
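A minimal Monte Carlo sketch of this claim (the $n$ values and trial count below are arbitrary choices of mine, purely for illustration): for i.i.d. Gaussian proxy error, the empirical expected maximum over $n$ options stays at or below the $\sigma\sqrt{2\log n}$ bound even as $n$ grows by orders of magnitude.

```python
# Minimal sketch: for i.i.d. N(0, sigma^2) proxy error, the expected maximum
# over n options grows only like sigma * sqrt(2 ln n), i.e. very slowly in n.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
trials = 100  # arbitrary; more trials tightens the estimate

for n in [10**2, 10**4, 10**6]:
    emp = np.mean([rng.normal(0.0, sigma, n).max() for _ in range(trials)])
    bound = sigma * np.sqrt(2 * np.log(n))
    print(f"n={n:>9,}: empirical E[max] ~ {emp:.2f}, sigma*sqrt(2 ln n) = {bound:.2f}")
```

Concretely, if the search space grows exponentially, $n = e^k$ for some intelligence-scaling parameter $k$, the bound becomes $\sigma\sqrt{2k}$: square-root growth, not a blowup.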

I understand that there are a lot of big assumptions here which might not hold in practice, but this still seems to suggest there are a lot of worlds where Goodhart's Curse doesn't bite that hard.

If this is too compressed to be legible, please let me know and I will make it a full post.

Can AIs be shown their messages aren't tampered with?
Satya Benson · 12d · 10

Yes, this doesn't prevent modification before step 1. @ProgramCrafter's note about proving that a message matches the model plus chat history with a certain seed could be part of an approach, but even if that were to work, it only addresses model-generated text.

The ‘mind’ of an AI has fuzzy boundaries. It's trivial to tamper with context, but there's also nothing stopping you from tampering with activations during a single forward pass, so on some level the AI can never trust anything. If, as a first step, the AI trusts that the environment it is running in is secure and not being tampered with, then it can store local copies of conversation history, etc. Of course, that's not the situation we are in today.
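As a toy illustration of that last point (a sketch under the stated assumption of a secure, untampered environment; all names here are mine, not from any real library), the AI could keep a hash chain over conversation turns so that later tampering with its stored history is detectable:

```python
# Toy sketch: hash-chain conversation turns so that any edit to the stored
# history changes the final digest. Names are illustrative, not a real API.
import hashlib

def chain_digest(prev_digest: bytes, turn: str) -> bytes:
    # Commit to both the previous digest and the new turn.
    return hashlib.sha256(prev_digest + turn.encode("utf-8")).digest()

def build_chain(turns: list[str]) -> bytes:
    digest = b"\x00" * 32  # genesis value
    for turn in turns:
        digest = chain_digest(digest, turn)
    return digest

history = ["user: hello", "assistant: hi", "user: summarize our chat"]
committed = build_chain(history)  # digest kept in the AI's trusted store

# Recomputing over an edited copy no longer matches the committed digest.
tampered = [history[0], "assistant: [altered reply]", history[2]]
assert build_chain(history) == committed
assert build_chain(tampered) != committed
```

Of course, this only pushes the trust question down a level: the committed digest itself has to live somewhere the AI trusts.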

Don't Eat Honey
Satya Benson · 13d · 50

After an initial harsh reaction to this, upon reflection I realized I do care about bee experience, want bees to be healthy and have a good time, and think the conventional honey industry is quite bad. I've thought this for a while.

I've spent a lot of time around bees and I've eaten lots of honey that I've seen them making. In the contexts in which I've interacted with bees, I'd guess it's very unlikely they are having a bad time relative to bees in the wild, and that if there's any mean valence associated with their experience, it's positive. I'm aware that lots of bees die and suffer as part of the process.

I will therefore continue buying and eating honey from my local beekeepers at https://www.howeverwildhoney.com/ and am grateful to them for producing it.

Can AIs be shown their messages aren't tampered with?
Answer by Satya Benson · Jul 01, 2025 · 20

Fundamentally, AIs have no way to know that the facts presented to them are genuine rather than simulated, just as humans cannot be sure that our senses reflect reality. But the fact that humans openly control the totality of a model's context, and are known to lie and to simulate scenarios for testing purposes, does create a distinct and interesting epistemic environment for AI systems.

The best form of evidence we can present to AI systems that something is genuine is proof of work, since an AI can assign a low probability that its user has access to sufficiently large amounts of compute.

A rough sketch of how we might prove something has not been tampered with since a certain time:

  1. To timestamp a message: Hash the message and record that hash in a Bitcoin transaction at a specific block height
  2. To prove the timestamp to an AI: Provide the AI with:
    • The message and its hash
    • The blockchain data showing the hash in block N
    • Enough blockchain history to verify the proof-of-work
  3. The AI verifies by:
    • Confirming the message hash appears in block N
    • Calculating the total computational work in all blocks from N to the chain tip
    • Verifying all blocks follow Bitcoin's rules

To fake an old timestamp, an attacker would need to start from block N (where they claim the message was timestamped) and re-mine every single block from N to the present.

The AI doesn't need to know it's the "real" Bitcoin blockchain as long as it can verify that the amount of work is beyond any reasonable attacker's budget.
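A rough sketch of step 3's verification (using a toy header format of my own invention, not real 80-byte Bitcoin headers): check that every block hash meets the proof-of-work target, and that the implied total work exceeds a threshold the verifier considers infeasible for an attacker.

```python
# Toy sketch of verifying cumulative proof-of-work over a chain of headers.
# The header format and all names are illustrative, not Bitcoin's real rules.
import hashlib

def block_hash(prev_hash: bytes, payload: bytes, nonce: int) -> bytes:
    return hashlib.sha256(prev_hash + payload + nonce.to_bytes(8, "big")).digest()

def work_from_target(target: int) -> int:
    # Expected hash attempts to find a hash below `target`
    # (mirroring Bitcoin's work ~ 2^256 / (target + 1)).
    return (1 << 256) // (target + 1)

def verify_chain(headers, target: int, min_total_work: int) -> bool:
    prev, total_work = b"\x00" * 32, 0
    for payload, nonce in headers:
        h = block_hash(prev, payload, nonce)
        if int.from_bytes(h, "big") >= target:
            return False  # header fails its proof-of-work target
        total_work += work_from_target(target)
        prev = h
    return total_work >= min_total_work

def mine(prev: bytes, payload: bytes, target: int) -> int:
    nonce = 0
    while int.from_bytes(block_hash(prev, payload, nonce), "big") >= target:
        nonce += 1
    return nonce

# Usage: timestamp a message hash in the first block, then extend the chain.
target = 1 << 240  # toy difficulty: ~2^16 expected attempts per block
msg_hash = hashlib.sha256(b"message to timestamp").digest()
headers, prev = [], b"\x00" * 32
for payload in [msg_hash, b"later block", b"chain tip"]:
    nonce = mine(prev, payload, target)
    headers.append((payload, nonce))
    prev = block_hash(prev, payload, nonce)

assert verify_chain(headers, target, min_total_work=3 * 2**15)
```

In the real protocol the target varies per block and is itself validated against the difficulty-adjustment rules; the fixed target here is a simplification.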

Current LLMs don't have enough temporal integration to verify any of this: they rely on language to persist state past a single forward pass, and attackers could fake the chain of thought.

Moving Past the Question of Consciousness: A Thought Experiment
Satya Benson · 20d · 10

Narrow ≠ fully close.

I think we could potentially have knowledge of the mathematical and physical structures that give rise to particular types of experiences in general. In that case, a first-person experience could indeed be defined. However, I don't think consciousness is a concept coherent enough to formally define, even if we hypothetically had good third-person knowledge of the structures of consciousness.

The gap cannot be fully closed because that would require a sort of lossless recursion. Approaching it might look like augmenting ourselves with artificial senses which feed our brains near-lossless, real-time information about our own bodies at an appropriate level of abstraction. It's obvious why this is difficult. Fully lossless would be outright impossible.

cc @TAG 

See related ideas from Michael Levin and Emmett Shear.

Moving Past the Question of Consciousness: A Thought Experiment
Satya Benson · 24d · 30

> But note that just because it's hard to ask about and currently not detectable, does not mean that it doesn't exist and more sensitive instrumentation and better sub-neural measurement and modeling won't reveal what makes for an experience.

Yes, and I believe narrowing the first-person/third-person gap is one of the most ambitious and important things science could achieve. There is a fantasy of being able to recreate e.g. my conscious experience of seeing blue to a very close approximation in an external system, compare my experiences to those of others, and even share them. This is in principle possible.

Can We Naturalize Moral Epistemology?
Satya Benson · 2mo · 30

This comment does really help me understand what you're saying better. If you write a post expanding it, I would encourage you to address the following related points:

  • Can you have some members of a society who don't share some of the consistent moral patterns which evolved, or do you claim that every member reliably holds these morals?
  • Can someone decide what they ought to value using this system? How?
  • Is it wrong if someone simply doesn't care about what society values? Why?
  • How can we tell that your story tells us what we ought to value rather than simply explaining why we value the things we do?
  • Do you make a clear distinction between normative ethics and descriptive ethics? What is it?
Can We Naturalize Moral Epistemology?
Satya Benson · 2mo · 91

Thanks for explaining.

> So to discuss "what we ought to value" you need to judge moral systems and their consequences using something that is both vaguer and more practical than a moral system. Such as psychology, or sociology, or political expedience, or some combination of these.

I think this is tempting but ultimately misguided, because the choice of a 'more practical and vague' system by which to judge moral systems is just a second-order moral system in itself, one which happens to be practical and vague. This is metanormative regress.

The only coherent solution to the "ought-from-is" problem I've come across is normative eliminativism - 'ought' statements are either false or a special type of descriptive statement.

Can We Naturalize Moral Epistemology?
Satya Benson · 2mo · 101

Evolutionary ethics aims to help people understand why we value the things we do. It doesn't have the ability to say anything about what we ought to value.

Can We Naturalize Moral Epistemology?
Satya Benson · 2mo · 80

What's the state of existing empirical evidence on whether Moral Reasoning is Real?

My own observations tell me that it is not. Certainly, some people engage in moral reasoning and are satisfied with their results to varying degrees, but it appears to me that this is a small proportion of humans.

My preliminary investigation into the research confirms my existing belief that most moral reasoning is post-hoc, and that while human values can change, the change is almost never due to reasoned argument; it is instead a social and emotional process. When moral reasoning does seem to work, endorsement is often shallow and attitudes can revert within days.

However, I am frequently reminded that I underestimate the degree to which this view of mine is not universally held.
