niplav
I operate by Crocker's rules. All LLM output is explicitly designated as such. I have made no self-hiding agreements.


Comments (sorted by newest)
RSPs are pauses done right
niplav · 3h

I'm revisiting this post after listening to this section of this recent podcast with Holden Karnofsky.

Seems like this post was overly optimistic about what RSPs would be able to enforce, and not quite clear on the different scenarios for what "RSP" could refer to. Specifically, the post was equivocating between "RSP as a regulation that gets put into place" and "RSP as a voluntary commitment"—we got the latter, but not really the former (except maybe in the form of the EU Codes of Practice).

Even at Anthropic, the way the RSP is now put into practice basically excludes a scaling pause from the picture entirely:

RSPs are pauses done right: if you are advocating for a pause, then presumably you have some resumption condition in mind that determines when the pause would end. In that case, just advocate for that condition being baked into RSPs!

Interview:

That was never the intent. That was never what RSPs were supposed to be; it was never the theory of change and it was never what they were supposed to be... So the idea of RSPs all along was less about saying, 'We promise to do this, to pause our AI development no matter what everyone else is doing'

and

But we do need to get rid of some of this unilateral pause stuff.

Furthermore, what apparently happens now is that really difficult commitments either don't get made or get walked back:

Since the strictest conditions of the RSPs only come into effect for future, more powerful models, it's easier to get people to commit to them now. Labs and governments are generally much more willing to sacrifice potential future value than realized present value.

Interview:

So I think we are somewhat in a situation where we have commitments that don't quite make sense... And in many cases it's just actually, I would think it would be the wrong call. In a situation where others were going ahead, I think it'd be the wrong call for Anthropic to sacrifice its status as a frontier company

and

Another lesson learned for me here is I think people didn't necessarily think all this through. So in some ways you have companies that made commitments that maybe they thought at the time they would adhere to, but they wouldn't actually adhere to. And that's not a particularly productive thing to have done.

I guess the unwillingness of the government to turn RSPs into regulation is what ultimately blocked this. (Though maybe today even a US-centric RSP-like regulation would be considered "not that useful" because of geopolitical competition.) We got RSP-like voluntary commitments from a surprising number of AI companies (so good job on predicting the future on this one), but those didn't get turned into regulation.

shortplav
niplav · 1d

It's a bit of a travesty there's no canonical formal write-up of UDASSA, given all the talk about it. Ugh, TODO for working on this I guess.

shortplav
niplav · 1d

My understanding is that UDASSA doesn't give you unbounded utility, by virtue of directly assigning $U(\mathrm{eval}(p)) \propto 2^{-|p|}$, and the sum of utilities is proportional to $\sum_{i=0}^{\infty} 2^{-i} = 2$. The whole dance I did was in order to be able to have unbounded utilities. (Maybe you don't care about unbounded utilities, in which case UDASSA seems like a fine choice.)

(I think that the other horn of de Blanc's proof is satisfied by UDASSA, unless the proportion of non-halting programs bucketed by simplicity declines faster than any computable function. Do we know this? "Claude!…")

Edit: Claude made up plausible nonsense, but GPT-5, upon request, was correct: the proportion of halting programs declines more slowly than some computable functions.

Edit 2: Upon some further searching (and soul-searching), I think UDASSA is currently underspecified wrt whether its utility is bounded or unbounded. For example, the canonical explanation doesn't mention utility at all, and none of the other posts about it mention how exactly utility is defined.

shortplav
niplav · 1d

Makes sense, but in that case, why penalize by time? Why not just directly penalize by utility? Like the leverage prior.

Huh. I find the post confusingly presented, but if I understand correctly, 15 logical inductor points to Yudkowsky₂₀₁₃—I think I invented the same concept from second principles.

Let me summarize to check my understanding: My speed prior on both the hypotheses and the utility functions is trying to emulate just discounting utility directly (because in the case of binary tapes and integers, penalizing both exponentially in speed gets you exactly an upper bound for the utility), and a cleaner way is to set the prior to $2^{-|p|} \cdot \frac{1}{U(\mathrm{eval}(p))}$. That avoids the "how do we encode numbers" question that naturally arises.

Does that sound right?

(The fact that I reinvented this looks like a good thing, since that indicates it's a natural way out of the dilemma.)
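To make the comparison concrete, here's a toy numerical sketch (the program lengths and utilities are made up, and Python values stand in for actual programs): under the plain complexity prior the per-hypothesis contributions to expected utility explode as utilities grow, while under the utility-penalized prior $2^{-|p|} \cdot \frac{1}{U(\mathrm{eval}(p))}$ each contribution collapses to exactly $2^{-|p|}$, so the sum stays bounded.

```python
from fractions import Fraction

# Toy sketch (purely illustrative): each "hypothesis" is just a
# (program length, utility of its output) pair, with utilities growing
# very fast to stand in for busy-beaver-like payoffs.
hypotheses = [(1, 3), (2, 10**6), (3, 10**100), (4, 10**1000)]

def complexity_prior(length):
    # plain complexity prior, 2^-|p|
    return Fraction(1, 2**length)

def utility_penalized_prior(length, utility):
    # prior proportional to 2^-|p| * 1/U(eval(p)), as in the summary above
    return complexity_prior(length) / utility

def show(x):
    return float(x) if x < 10**300 else "astronomically large"

# per-hypothesis contribution to the (unnormalized) expected utility
print([show(complexity_prior(l) * u) for l, u in hypotheses])
# grows without bound as the utilities grow
print([show(utility_penalized_prior(l, u) * u) for l, u in hypotheses])
# each term is exactly 2^-|p| (0.5, 0.25, 0.125, 0.0625), so the sum stays <= 1
```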

shortplav
niplav · 1d

I think the upper bound here is set by a program "walking" along the tape as far as possible while setting the tape to 1 and then setting a last bit before halting (thus creating the binary number $\underbrace{1\dots1}_{n}$ where $n \le \mathrm{BB}(|p|)$[1]). If we interpret that number as a utility, the utility is exponential in the number of steps taken, which is why we need to penalize by $2^{-\mathrm{steps}(p)}$ instead of just $1/\mathrm{steps}(p)$[2]. If you want to write $3\uparrow\uparrow\uparrow 3$ on the tape, you have to make at least $\log_2(3\uparrow\uparrow\uparrow 3)$ steps on a binary tape (and $\log_n(3\uparrow\uparrow\uparrow 3)$ on an $n$-ary tape).


  1. Technically the upper bound is Σ(|p|), the score function. ↩︎

  2. Thanks to GPT-5 for this point. ↩︎
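A toy numeric version of this point (purely illustrative; program-length penalties are ignored and the step counts are simplified): a program that takes about $n$ steps writing ones produces a utility of roughly $2^n$, so a $1/\mathrm{steps}(p)$ penalty leaves the contribution growing without bound, while a $2^{-\mathrm{steps}(p)}$ penalty keeps it bounded.

```python
from fractions import Fraction

# Toy version of the footnoted point: a program that takes ~n steps writing
# ones onto a binary tape outputs the number 1...1 (n ones) = 2^n - 1, so its
# "utility" is exponential in its runtime. (Program-length penalties ignored.)
def show(x):
    return float(x) if x < 10**300 else "astronomically large"

for n in [10, 50, 200, 2000]:
    utility = 2**n - 1                            # value written on the tape
    by_linear = Fraction(1, n) * utility          # penalized by 1/steps(p)
    by_exponential = Fraction(1, 2**n) * utility  # penalized by 2^-steps(p)
    print(n, show(by_linear), show(by_exponential))

# the 1/steps column grows without bound; the 2^-steps column stays bounded by 1
```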

shortplav
niplav · 2d*

epistemic status: Going out on a limb and claiming to have solved an open problem in decision theory[1] by making some strange moves. Trying to leverage Cunningham's law. Hastily written.

P(the following is a solution to Pascal's mugging in the relevant sense) ≈ 25%[2].

Okay, setting (also here in more detail): You have a Solomonoff inductor with some universal semimeasure as a prior. The issue is that the utility of programs can grow faster than your universal semimeasure can penalize them, e.g. a complexity prior has busy-beaver-like programs $p$ that produce $\mathrm{BB}(|p|)$ amounts of utility while only being penalized by $2^{-|p|}$. The more general results are de Blanc 2007, de Blanc 2009 (LW discussion on the papers from 2007). We get this kind of divergence of expected utility on the prior (sketched briefly after the list below) if

  1. the prior is bounded from below by a computable function and
  2. the utility function is computable and unbounded
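As a quick sketch of that divergence (hand-waving about exactly which programs get summed over, and using only that $\mathrm{BB}$ eventually dominates every computable function, in particular $\mathrm{BB}(n) \ge 2^n$ for all sufficiently large $n$):

```latex
% Hand-wavy: one busy-beaver-like program per length n, with N chosen so that
% BB(n) >= 2^n for all n >= N.
\sum_{n} 2^{-n}\,\mathrm{BB}(n)
\;\ge\; \sum_{n \ge N} 2^{-n} \cdot 2^{n}
\;=\; \sum_{n \ge N} 1
\;=\; \infty
```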

The next line of attack is to use the speed prior, $2^{-|p|-\log(t(p))}$, as the prior. That prior is not bounded from below by a computable function (because it grows slower than $1/\mathrm{BB}(n)$ for programs of length $n$), so we escape through one of de Blanc's horns. (I don't think having a computable lower bound is that important, because K-complexity was never computable in the first place.)

But there's an issue: What if our hypotheses output strings that are short, but are evaluated by our utility function as being high-value anyway? That is, the utility function takes in some short string of length n and outputs as its utility BB(n). This is the case if the utility function itself is a program of some computational power, in the most extreme case the utility function is Turing-complete, and our hypotheses "parasitize" on this computational power of our utility function to be a Pascal's mugging. So what we have to do is to also consider the computation of our utility function as being part of what's penalized by the prior. That is,

$2^{-|p|-t(p)-t_u(\mathrm{eval}(p))}$

where $t_u(\mathrm{eval}(p))$ is the time it takes to run the utility function on the output of $p$. I'll call this the "reflective speed prior". Note that if you don't have an insane utility function which is Turing-complete, the speed penalty for evaluating the output of $p$ should be fairly low most of the time.
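For concreteness, a minimal sketch of how these weights could be computed in a toy setting (step-counted Python callables stand in for Turing machines, the program lengths are made up, and the utility function here just returns its input, charging one step per bit):

```python
from fractions import Fraction

# Minimal toy sketch of the "reflective speed prior": weight each hypothesis by
# 2^-(|p| + t(p) + t_u(eval(p))), where t(p) is the hypothesis's own runtime and
# t_u is the runtime of the utility function on its output.

def run_counted(f, *args):
    """Run f, a generator function that yields once per 'step' and finally
    returns a value. Returns (output, number_of_steps)."""
    steps = 0
    gen = f(*args)
    while True:
        try:
            next(gen)
            steps += 1
        except StopIteration as e:
            return e.value, steps

def hypothesis_small():          # outputs 3 after 3 steps
    for _ in range(3):
        yield
    return 3

def hypothesis_greedy():         # outputs a large number, but needs many steps
    n = 1
    for _ in range(40):
        n *= 2
        yield
    return n - 1                 # 2^40 - 1

def utility(x):                  # toy utility function, one step per output bit
    for _ in range(x.bit_length()):
        yield
    return x                     # U(output) = output itself

def reflective_speed_weight(program, length):
    out, t_p = run_counted(program)
    u, t_u = run_counted(utility, out)
    return Fraction(1, 2 ** (length + t_p + t_u)), u

for name, prog, length in [("small", hypothesis_small, 5),
                           ("greedy", hypothesis_greedy, 7)]:
    weight, u = reflective_speed_weight(prog, length)
    print(name, float(weight * u))   # each contribution stays bounded (< 1)
```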

Pascal's mugging can be thought of in two parts:

  1. My expected utility diverges on priors; that is, without having observed any information or done any Bayesian updating, my expected utility can get arbitrarily big. I think this one is a problem.
  2. My expected utility can diverge after updating on adversarially selected information. I think this case should be handled by improving your epistemology.

I claim that the reflective speed prior solves 1., but not 2. Furthermore, and this is the important thing, if you use the reflective speed prior, the expected utility is bounded on priors, but you can have arbitrarily high maximal expected utilities after performing Bayesian updating. So you get all the good aspects of having unbounded utility without having to worry about actually getting mugged (well, unless you have something controlling the evidence you observe, which is its own issue).
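Here's a rough version of the boundedness-on-priors claim, under the assumption (in the spirit of the binary-tape point from the earlier comment) that a utility evaluation running for $t_u(\mathrm{eval}(p))$ steps on the output of a program that ran for $t(p)$ steps can only name a number of size at most about $2^{t(p)+t_u(\mathrm{eval}(p))}$; the last step also assumes a prefix-free program encoding so that Kraft's inequality applies:

```latex
% Assumption: U(eval(p)) <= 2^{t(p) + t_u(eval(p))} for every program p.
\mathbb{E}[U]
\;\propto\; \sum_{p} 2^{-|p| - t(p) - t_u(\mathrm{eval}(p))}\, U(\mathrm{eval}(p))
\;\le\; \sum_{p} 2^{-|p| - t(p) - t_u(\mathrm{eval}(p))}\, 2^{\,t(p) + t_u(\mathrm{eval}(p))}
\;=\; \sum_{p} 2^{-|p|}
\;\le\; 1
```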

(Next steps: Reading the two de Blanc papers carefully, trying to suss out a proof, writing up the argument in more detail. Think/argue about what it means to update your prior in this strange way, and specifically about penalizing hypotheses by how long it takes your utility function to evaluate them. Figure out which of these principles are violated. (On an initial read: Definitely Anti-Timidity.) Changing one's prior in a "superupdate" has been discussed here and here.)

Edit: Changed from penalizing the logarithm of runtime and utility-runtime to penalizing it linearly, after feedback from GPT-5.


  1. I like to live dangerously. ↩︎

  2. Updated down from 40% after getting some feedback by GPT-5 on what the costs of this approach are. ↩︎

FWIW: What I noticed at a (Goenka) Vipassana retreat
niplav · 5d

Best I can tell, the risk of psychosis is much higher with Goenka-style retreats, although I don't have hard numbers, only anecdotal evidence and theory that suggests it should be more common.

My experience has been that anything except long intensive retreats doesn't move my mind out of its default attractor state, and that I probably waited too long to attend long retreats, while all the advice goes in the opposite direction.

I mention this because all the talk of the downsides of meditation has me thinking of a tweet that goes roughly like "why do both republicans and democrats pretend HRT does anything". Goenka retreats have medium-strength effects on me; an intensive one-month retreat at home had a decent effect. I may be doing something wrong.

Humanity Learned Almost Nothing From COVID-19
niplav · 10d

Yup, that's correct if I remember the sources correctly. I guess the tone surrounding it doesn't match that particular bit of content. I should also turn the pledged/received numbers into a table for easier reading.

Humanity Learned Almost Nothing From COVID-19
niplav · 12d

Yup, it's a regionalism that I mis-/over-generalized. I'll avoid it from now on.

leogao's Shortform
niplav · 12d

It is, and it's the thing I'd most like Smil to read if I could recommend something to him.

Posts
162 · Humanity Learned Almost Nothing From COVID-19 · 12d · 35
14 · Ontological Cluelessness · 1mo · 12
21 · Anti-Superpersuasion Interventions · 3mo · 1
36 · Meditation and Reduced Sleep Need · 7mo · 8
24 · Logical Correlation · 9mo · 7
39 · Resolving von Neumann-Morgenstern Inconsistent Preferences · 1y · 5
38 · 0.836 Bits of Evidence In Favor of Futarchy · 1y · 0
14 · Pomodoro Method Randomized Self Experiment · 1y · 2
21 · How Often Does Taking Away Options Help? · 1y · 7
47 · Michael Dickens' Caffeine Tolerance Research · 1y · 5
Wikitag Contributions
Comp-In-Sup · 25 days ago · (+13/-13)
AI-Assisted Alignment · 5 months ago · (+54)
AI-Assisted Alignment · 5 months ago · (+127/-8)
Recursive Self-Improvement · 5 months ago · (+68)
Alief · 6 months ago · (+11/-11)
Old Less Wrong About Page · 8 months ago
Successor alignment · 9 months ago · (+26/-3)
Cooking · a year ago · (+26/-163)
Future of Humanity Institute (FHI) · 2 years ago · (+11)
Future of Humanity Institute (FHI) · 2 years ago · (+121/-49)