Ah, that makes more sense, thanks!
Also I agree with using loss/time as the measure of performance, since it's fairly straightforward to interpret (loss recovered per unit time). If I were reviewing this, I'd look for that.
For efficiency in practice, I think most ML papers report total FLOPs, since the operation count (unlike FLOP/s throughput) is hardware agnostic. Maybe a good measure of efficiency here would be loss recovered per FLOP? I haven't seen that used, but it might reflect how performance scales with available compute.
Edit: Actually thinking about it, the test-time efficiency might be a better comparison, assuming the two scale within roughly the same complexity class. I think from a product perspective, speed for users is super (maybe the most) valuable.
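To make the distinction concrete, here's a toy sketch (all numbers invented, just for illustration) of how the two measures can disagree: one run can recover loss faster in wall-clock time while the other is more compute-efficient:

```python
# Hypothetical training runs; every number here is made up for illustration.
runs = {
    "mlp":  {"loss_recovered": 1.2, "wall_time_s": 3600.0, "flops": 4.0e15},
    "dual": {"loss_recovered": 1.2, "wall_time_s": 5400.0, "flops": 2.5e15},
}

for name, r in runs.items():
    loss_per_second = r["loss_recovered"] / r["wall_time_s"]  # speed measure
    loss_per_flop = r["loss_recovered"] / r["flops"]          # compute-efficiency measure
    print(f"{name}: {loss_per_second:.2e} loss/s, {loss_per_flop:.2e} loss/FLOP")
```

With these made-up numbers, the MLP run wins on loss per unit time while the dual encoder wins on loss per FLOP, so which model looks "more efficient" depends entirely on which denominator you pick.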
Disclaimer: I'm not very familiar with the ins and outs of tensor networks (so thanks for the reading list :D).
I would think that a composable dual encoder architecture would be able to hold more information than an MLP one, so it seems counterintuitive that the dual encoder requires more steps to achieve the same cross-entropy. I'm sure this is in part due to the more complex loss function, so maybe there is some threshold on dataset size or model size above which the tensor variant achieves lower CE?
It seems that your goal is essentially to find compassion for those with a different value set than yours, and that the confounding element is that other value structures (e.g., truth vs. utility vs. tradition, etc.) often don't support each other. Is that on target?
It's worth recognizing that any set of guiding principles is essentially arbitrary if you inspect them deeply enough. What Schopenhauer calls apathy and hedonism, another might call "the human experience." While I value the ability to introspect and think abstractly, I take issue with Schopenhauer's disdain for 'dumb' entertainment: if my longing for higher understanding leaves me, and only me, miserable, is that really a moral victory? Depends... (read more)
Say a CoT answer is "person A was born in 1900. Tungsten is the 74th element. The Oscar movie of the year in 1974 was [movie]".
Am I correct in understanding that a successful n-hop answer would be just "[movie]"?
Would "The winner of movie of the year in 1974 was [movie]" fulfill the success criteria?
I'm also curious whether the model's latent values update to similar vectors for the CoT response vs. filler tokens (and I suppose the n-hop response as well). Is this something you explored?
Always glad to see any attempt to balance the bad vibes with hope. Happy New Year :)
Regarding the "OK" debate, I would put forth that a sentiment worth valuing is that, either way, we will continue to "be", which I think (and hope) many will agree is likely.
Very true! Actually, the best fix for nihilism (in my experience) has been acceptance of whatever existential threat is causing it, followed by revolt (i.e., absurdism). The 747 will always outrun me, so I will be content just running for the sake of it.
In the pursuit of AI safety, I think AGI apocalypse and AGI happening at all are equally unpredictable. I personally see both as feasible within our lifetimes, but I can't narrow my certainty much beyond that. That uncertainty makes it feel strange to build a career around it, yet the existential dread does not go away. So, I choose to find things within the... (read more)
I actually view art as the opposite: a vessel for social connection and culture, a behavior largely unique to, and ever-important for, humans. Of course, the constraint is that the art must be shared externally, so perhaps the crisis is more a lack of sharing than of creation.
Rationalism vs the Platonic Form: thoughts? As I understand it, assigning probabilities to world outcomes is a (maybe implicit) step towards understanding the Platonic ideal form of something.
Does this resonate with how anyone approaches things?
On the 1% vs 0.001% note, a framework of measurement I prefer over absolute impact is relative impact, which is more intuitive. For example, considering AI safety, how is 1% measured empirically? Without a unit of measure, numbers don't reveal much. But an inequality does. I can tell you with certainty that Nanda has done more than me (so far). Or that p(flourishing) is greater than zero.
All that to say, in a world that seems so overwhelming, a good fix for nihilism can be found in relative measurement. In the grand scheme of things, individual impact is minuscule and thus often demoralizing to try and measure. However, if I do better than I did yesterday/last month/last year, and many others try as well, I can keep the motivation high to keep on.
Regarding the problem of unknown-unknowns, it looks like there's a pretty heavy emphasis on the correctness and completeness of the judge. Is the aggregate judge reward binary per component, or can a component earn partial credit for something like "the model confessed to half of its errors"?
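Purely to illustrate the distinction I'm asking about (I'm guessing at the component structure; the paper's actual aggregation may differ):

```python
# Toy contrast between binary and partial-credit judge rewards per component.
# "confessed" / "total_errors" are my own hypothetical framing, not the paper's.
def binary_reward(confessed: int, total_errors: int) -> float:
    """All-or-nothing: credit only if every error is confessed."""
    return 1.0 if confessed == total_errors else 0.0

def fractional_reward(confessed: int, total_errors: int) -> float:
    """Partial credit: confessing half of the errors earns 0.5."""
    return confessed / total_errors if total_errors else 1.0

print(binary_reward(2, 4))      # 0.0
print(fractional_reward(2, 4))  # 0.5
```

Under the binary scheme a half-confession is indistinguishable from total silence, which seems like it would matter for the unknown-unknowns problem.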
Also, I'm curious why CoT is excluded from the judge's input, and whether you have conducted (or plan to conduct) ablations that include it. I would intuit that including it might resolve some of the judge's unknown-unknowns.
Edit:
I'm guessing one of the reasons you excluded CoT was that, by including it, the judge might see something like a reiterated-but-ignored rule and return a false compliance score. If that is the case, could you compare $R(y_c \mid x, y, x_c)$ to $R(y_c \mid x, y, x_c, z)$?