## LESSWRONG

junk heap homotopy

Lurker since 2008. Second-generation diasporan. Rationalist out of the woodwork.


Set theory is the prototypical example I usually hear about. From Wikipedia:

> Mathematical topics typically emerge and evolve through interactions among many researchers. Set theory, however, was founded by a single paper in 1874 by Georg Cantor: "On a Property of the Collection of All Real Algebraic Numbers".

It'd be cool if a second group also worked towards "rationality skill assessment."

This was my project at last year's Epistea, but I sort of had to pause it to work full-time on my interp upskilling experiment.

I only got as far as implementing ~85% of an app to facilitate this (as described here), but maybe a quick chat about this would still be valuable?

I wouldn’t say that. "Signalling", the way you seem to have used it, implies deception on their part, but each of these instances could just as well be a skill issue: an inability to construct the right causal graph at sufficient resolution.

For what it’s worth, whatever this pattern is pointing at also applies to how wrong most of us were about the AI box problem, i.e., that some humans by default would just let the damn thing out without needing to be persuaded.

There seem to be two major counter-claims to your project:

• Feedback loops can't work for nebulous domains, so this whole thing is misguided.
• Transfer learning is impossible and you can't get better at rationality by grinding LeetCode equivalents.

(There's also the third major counter-claim that this can't work for alignment research, but I assume that's actually irrelevant since your main point seems to be about rationality training.)

My take is that these two claims stem from inappropriately applying an outcome-oriented mindset to a process-oriented problem. That is, the model seems to be: "we wanted to learn X and applied Feedback Loops™️ but it didn't work, so there!" instead of "feedback-loopiness seems like an important property of a learning approach we can explicitly optimise for".

In fact, we can probably factor out several senses of 'feedback loops' (henceforth just floops) that seem to be leading a lot of people to talk past each other in this thread:

• Floops as reality pushing back against movement, e.g. the result of swinging a bat, the change in an animation when you change a slider in an Explorable Explanation
• Floops where the feedback is quick but nebulous (e.g. persuasion, flirting)
• Floops where the feedback is clear but slow (e.g. stock market)
• Floops as reinforcement, i.e. the cycle Goal → Attempt → Result
• Floops as OODA loops (less legible, more improvisational than previous item)
• Floops where you don't necessarily choose the Goal (e.g. competitive multiplayer games, dealing with the death of a loved one)
• Floops which are not actually loops, but a single Goal → Attempt → Result run (e.g., getting into your target uni)
• Floops which are about getting your environment to support you doing one thing over and over again (e.g. writing habits, deliberate practice)
• Floops which are cumulative (e.g. math)
• Floops where it's impossible to get sufficiently fine-grained feedback without the right paradigm (e.g. chemistry before Robert Boyle)
• Floops where you don't necessarily know the Goal going in (e.g. doing 1-on-1s at EA Global)
• Floops where failure is ruinous and knocks you out of the game (e.g. high-rise parkour)
• Anti-floops where the absence of an action is the thing that moves you towards the Goal
• Floops that are too complex to update on a single result (e.g. planning, designing a system)

When someone says "you can't possibly apply floops to research", I imagine they're coming from a place where they interpret goal-orientedness as an inherent requirement of floopiness. There are many bounded, closed-ended things one can use floops for that clearly help the research process: stuff like grinding the prerequisites and becoming fluent with certain techniques (cf. Feynman's toolbox approach to physics), writing papers quickly, developing one's nose (e.g. by trying to forecast the number of citations of a new paper), etc.

This claim is independent of whether or not the person utilising floops is good enough to get better quickly. I think it's not controversial to claim that you can never get a person with profound mental disabilities who is not a savant at technical subjects to discover a new result in quantum field theory, but this is also irrelevant when talking about people who are baseline capable enough to worry about these things on LessWrong dot com in the first place.

On the other end of the spectrum, the reductio version of being against floops (that everyone was literally born with all the capabilities they would ever need in life and Learning Is Actually a Myth) seems blatantly false too. Optimising for floopiness, to me, is merely trying to find a happy medium between the two.

On an unrelated note, I wrote about how to package and scalably transfer floops a while back: https://www.lesswrong.com/posts/3CsynkTxNEdHDexTT/how-i-learned-to-stop-worrying-and-love-skill-trees

All modern games have two floops built in: a core game loop that completes in under a minute, and a larger game loop that makes you come back for more. Or in the context of my project, Blackbelt:

*(image omitted)*

The idea is you can design bespoke tests-of-skill to serve as your core game loop (e.g., a text box with a word counter underneath, the outputs of your Peloton bike, literally just checking a box like with a TODO list) and have the deliberately status-oriented admission to a private channel be the larger, overarching hook. I think this approach generalises well to things that are not just alignment, because floops can be found in both calculating determinants and doing on-the-fly Fermi calculations for setting your base rates, and who wouldn't want to be in the company of people who obsess endlessly about numbers between 0 and 1?
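To make the two-loop idea concrete, here is a minimal sketch of how a bespoke test-of-skill (the word-counter example above) could feed a streak that gates the larger, status-oriented hook. All names here are illustrative assumptions, not Blackbelt's actual API:

```python
# Hypothetical sketch of a two-loop design: a core game loop (one
# test-of-skill, here a daily word-count check) feeding a larger loop
# (a streak of consecutive successes that could gate channel admission).

def passes_core_loop(text: str, goal_words: int = 500) -> bool:
    """One iteration of the core loop: did this attempt meet the goal?"""
    return len(text.split()) >= goal_words

def update_streak(streak: int, passed: bool) -> int:
    """The larger loop: consecutive successes accumulate, a miss resets."""
    return streak + 1 if passed else 0

# Simulate three days of writing attempts: hit, miss, hit.
attempts = ["word " * 600, "word " * 100, "word " * 501]
streak = 0
for attempt in attempts:
    streak = update_streak(streak, passes_core_loop(attempt))
```

The design choice doing the work is that the core loop is a pure pass/fail check, so any measurable test-of-skill (Peloton output, a checked box) can be swapped in without touching the larger loop.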

There's a tension between what I know from the literature (i.e. transfer learning is basically impossible) and my lived experience: I, and a handful of people I know in real life and have examined in depth, can quickly apply e.g. thermodynamics concepts to designing software systems, and consuming political fiction has increased my capacity to model equilibrium strategies in social situations. Hell, this entire website was built on the back of HPMoR, an explicit attempt to teach rationality through reading about it.

The point other people have made about alignment research being highly nebulous is important but irrelevant. You simply cannot advance the frontiers of a field without mastery of some technique or skill (or a combination thereof) that puts you in a spot where you can do things that were impossible before, like how Rosalind Franklin needed some mastery of X-ray crystallography to be able to image DNA.

Research also seems to be a skill that's trainable, or at least has trainable parts. If, for example, the bottleneck is sheer research output, I can imagine a game where you output as many shitty papers as possible in a bounded period of time letting people write more papers afterwards, ceteris paribus. Or even at the level of paragraphs: one could play a game of "Here are 10 random papers outside your field with the titles, authors, and publication years removed. Guess how many citations they got." to develop one's nose for what makes a paper impactful, or "Write the abstract of this paper." to get better at distillation.

The parts in That Alien Message and the Beisutsukai shorts that are about independently regenerating known science from scratch. Or zetetic explanations, whichever feels more representative of the idea cluster.

In particular, how does one go about making observations that are useful for building models? How does one select the initial axioms in the first place?

I don't know. It seems to me that we have to make the curves of progress in alignment vs. capabilities meet somewhere, and part of that would probably involve really thinking about which parts of which bottlenecks are true blockers vs. mere epiphenomena that tag along but can be optimised away. For instance, in your statement:

> If research would be bad for other people to know about, you should mainly just not do it

maybe doing the research but not having the wrong people know about it is the right intervention, rather than just straight-up not doing it at all?

Yup, that's definitely on the roadmap. Sometimes you need facts to advance a particular skill (e.g., if you're figuring out scaling laws, you have got to know a bunch of important numbers) or even just to use the highly specialised language of your field, and there's no better way to do that than to use SRS.

We're probably going to offer Anki import some time in the future just so we can take advantage of the massive amount of material other people have already made, but also I can't promise one-to-one parity since I'd like to aim more towards gwern's programmable flashcards which directly inspired this whole project in the first place.
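For context on what an SRS scheduler actually does under the hood: Anki and its relatives are built on variants of the SM-2 algorithm, which can be sketched in a few lines (this is the textbook SM-2 update, not Blackbelt's or Anki's exact implementation):

```python
# Minimal sketch of an SM-2-style spaced-repetition scheduler.
# q is the user's self-graded recall quality (0-5); the returned
# interval is the number of days until the card's next review.

def sm2_step(interval: float, ease: float, repetitions: int, q: int):
    """One review: returns (next_interval_days, new_ease, new_repetitions)."""
    if q < 3:  # failed recall: restart the repetition sequence
        return 1.0, ease, 0
    if repetitions == 0:
        interval = 1.0          # first successful review: see it tomorrow
    elif repetitions == 1:
        interval = 6.0          # second: see it again in six days
    else:
        interval = interval * ease  # afterwards: intervals grow geometrically
    # Nudge the ease factor up or down based on q; SM-2 floors it at 1.3.
    ease = max(1.3, ease + 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    return interval, ease, repetitions + 1
```

The relevant point for import parity: a card's scheduling state is just this (interval, ease, repetitions) triple, which is why mapping Anki decks onto a different system is mostly tractable, while programmable flashcards add logic on top that has no Anki equivalent.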

Have you scaled up the moderation accordingly? I have noticed fewer comments that are on the level of what gets posted on r/slatestarcodex these days but I'm not sure if it's just a selection effect.