Say a CoT answer is "Person A was born in 1900. Tungsten is the 74th element. The Oscar for movie of the year in 1974 was [movie]."
Am I correct in understanding that a successful n-hop answer would be just "[movie]"?
Would "The winner of movie of the year in 1974 was [movie]" fulfill the success criteria?
I'm also curious whether the model's latent representations update to similar vectors for the CoT response vs. the filler tokens (and, I suppose, the n-hop response as well). Is this something you explored?
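To be concrete about what I mean by "similar vectors": a rough sketch of the comparison I'm imagining, using cosine similarity between hidden-state vectors. The arrays here are stand-ins (in practice they'd be e.g. final-layer residual-stream vectors at the answer token, extracted via forward hooks); all names are hypothetical, not anything from your setup.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two hidden-state vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for the hidden state at the answer position under each
# prompting condition. In a real experiment these would come from the
# model; here they're toy vectors just to illustrate the comparison.
rng = np.random.default_rng(0)
h_cot = rng.normal(size=768)                    # CoT response
h_filler = h_cot + 0.1 * rng.normal(size=768)   # filler-token response (toy perturbation)
h_nhop = rng.normal(size=768)                   # direct n-hop response

print(cosine_sim(h_cot, h_filler))  # near 1 would suggest convergent representations
print(cosine_sim(h_cot, h_nhop))
```

If the filler-token run really does the same latent computation, I'd expect the first similarity to be high relative to a random-baseline pair.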
Always glad to see any attempt to balance the bad vibes with hope. Happy New Year :)
Regarding the "OK" debate, I would put forth that a sentiment worth valuing is that, either way, we will continue to "be", which I think (and hope) many will agree is likely.
Very true! Actually, the best fix for nihilism (in my experience) has been acceptance of whatever existential threat is causing it, followed by revolt against it (i.e., absurdism). The 747 will always outrun me, so I will be content just running for the sake of it.
In the pursuit of AI safety, I think an AGI apocalypse and AGI happening at all are about equally unpredictable. I personally see both as feasible within our lifetimes, but I can't narrow my certainty much beyond that. That uncertainty makes it feel strange to build a career around it, yet the existential dread does not go away. So I choose to find things within the space that I enjoy learning about, working on, and applying myself to, and accept that it may well be unfruitful in the end.
It's cliché to say that the journey matters more than the destination, and it's not always true, but I do think one can choose to find intrinsic value in the act of doing. I chose to start thinking this way, and it's going pretty well so far :)
I actually view art as the opposite: a vessel for social connection and culture, a behavior largely unique to, and ever-important for, humans. Of course, the constraint is that the art must be shared externally, so perhaps the crisis is more a lack of sharing than of creation.
On the 1% vs 0.001% note, a framework of measurement I prefer over absolute impact is relative impact, which is more intuitive. For example, considering AI safety, how is 1% measured empirically? Without a unit of measure, numbers don't reveal much. But an inequality does. I can tell you with certainty that Nanda has done more than me (so far). Or that p(flourishing) is greater than zero.
All that to say: in a world that seems so overwhelming, a good fix for nihilism can be found in relative measurement. In the grand scheme of things, individual impact is minuscule and thus often demoralizing to try to measure. But if I do better than I did yesterday/last month/last year, and many others try as well, I can keep my motivation high enough to keep going.
It seems that your goal is essentially to find compassion for those with a different value set than yours, and that the confounding element is that other value structures (e.g., truth vs. utility vs. tradition, etc.) often don't support each other. Is that on target?
It's worth recognizing that any set of guiding principles is essentially arbitrary if you inspect it deeply enough. What Schopenhauer calls apathy and hedonism, another might call "the human experience." While I value the ability to introspect and think abstractly, I take issue with Schopenhauer's disdain for 'dumb' entertainment: if my longing for higher understanding leaves me, and only me, miserable, is that really a moral victory? It depends on what your morals are. This is reflected in your observation that people "simply don't [intrinsically] value holding true beliefs." I would argue that this is because many truths are existentially painful, so much so that it takes considerable active cognitive effort to overcome our psychological disposition and place value on them.
Your writing suggests that your own disdain for others makes you uncomfortable. If I were in your place, I would try to figure out why the discomfort occurs, why the disdain occurs (beyond 'they don't think hard enough', and into 'why do I value this over that'), and see if there's an internally consistent framework that squares the two.
I am trying to write an anecdote as an example, but am struggling to make it coherent. So, let me know if this resonates and I'll try a bit harder :^ )