I operate by Crocker's rules. All LLM output is explicitly designated as such. I have made no self-hiding agreements.
It's too bad the BALROG benchmark isn't being updated with the newest models. NetHack is really hard, gives a floating-point score, and is text-based, so if a model is vision-impaired (like the Claudes) there's less contamination from "the model just can't see where it is".
Will reveal 2030-01-01.
Hashsum used: SHA-256
303de030331f8e546d015ee69ab9fa91e6339b0560c51ab978f1ef6d8b6906bc
8b21114d4e46bf6871a1e4e9812c53e81a946f04b650e94615d6132855e247e8
To be revealed: 2024-12-31
Revealed content (@Zach Stein-Perlman didn't shame me into revealing this):
Manifold dating will fail:
Most men want to date cis women, and there are too few cis women on Manifold. Additionally, the people participating are low-status nerds, and the scheme is not 100× better than swipe-based matchmaking (which is the factor I'd put on how much better it needs to be to sway good-looking cis women to participate).
Also, having 𝒪(n²) markets for matchmaking requires too many participants (n participants means n(n−1)/2 pairwise markets, so even 1,000 participants would need ~500,000 markets), and these markets can't get deep & liquid; traders get annoyed by repetitive questions fast. Sure, you can just bet among your friend circle, but then the value-add is small: one-gender-heavy friend circles can't solve the scarcity problem that swipe-based online dating attempts to solve. So you either get way too many markets that nobody bets on, or sparse markets that are limited by social connections.
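For anyone who wants to check a reveal against the commitment, here's a minimal verification sketch. It assumes the revealed text is saved byte-for-byte as it was when hashed; the file name and command-line interface are made up for illustration.

```python
import hashlib
import sys

# Usage: python verify_commitment.py revealed.txt <committed_sha256_hex>
# The plaintext has to match byte-for-byte (same whitespace, newlines, and
# encoding as when the commitment was posted), otherwise the digest differs.
path, committed = sys.argv[1], sys.argv[2].lower()

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("match" if digest == committed else "mismatch")
```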
I fortunately know of TAPs :-) (I don't feel much apocalypse panic so I don't need this post.)
I guess I was hoping there'd be some more teaching from on high about this agent foundations problem that's been bugging me for so long, but I guess I'll have to think for myself. Fine.
I got Claude to read this text and explain the proposed solution to me[1], which doesn't actually sound like a clean technical solution to issues regarding self-prediction. Did Claude misexplain it, or is this an idiosyncratic mental technique and not a technical solution to that agent foundations problem?
Cf. Steam (Abram Demski, 2022), Proper scoring rules don't guarantee predicting fixed points (Caspar Oesterheld/Johannes Treutlein/Rubi J. Hudson, 2022) and the follow-up paper, Fixed-Point Solutions to the Regress Problem in Normative Uncertainty (Philip Trammell, 2018), and active inference, which simply bundles the prediction and the utility goal together into one. (I find this ugly; I didn't read these two comments before writing this one, so the distaste for active inference was developed independently.)
I guess this was also talked about in Embedded Agency (Abram Demski/Scott Garrabrant, 2020) under the terms "action counterfactuals" and "observation counterfactuals"?
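To sketch why the fixed-point problem bites (a toy derivation of my own, not taken from any of the papers above): suppose a predictor reports a probability $p$ for a binary event whose actual probability depends on the report, $q = f(p)$, and gets paid with the log scoring rule. The expected score and its derivative are

$$R(p) = f(p)\log p + \bigl(1 - f(p)\bigr)\log(1 - p), \qquad R'(p) = f'(p)\log\frac{p}{1-p} + \frac{f(p)}{p} - \frac{1 - f(p)}{1 - p}.$$

At a self-consistent fixed point $p^* = f(p^*)$ the last two terms cancel, leaving $R'(p^*) = f'(p^*)\log\frac{p^*}{1-p^*}$, which is generally nonzero. So once the prediction influences the outcome, the honest fixed point isn't even a local optimum of a proper scoring rule, and the predictor is incentivized to nudge its report to steer the world instead.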
Your brain has a system that generates things that feel like predictions but actually function as action plans/motor output. These pseudo-predictions are a muddled type in the brain's type system.
You can directly edit them without lying to yourself because they're not epistemic beliefs — they're controllers. Looking at the place in your mind where your action plan is stored and loading a new image there feels like predicting/expecting, but treating it as a plan you're altering (not a belief you're adopting) lets you bypass the self-prediction problem entirely.
So: "I will stay sane" isn't an epistemic prediction that would create a self-fulfilling prophecy loop or violate the belief-action firewall. It's writing a different script into the pseudo-model that connects to motor output — recognizing that the thing-that-feels-like-a-prediction is actually the controller, and you get to edit controllers.
[1] I didn't want to read a bunch of unrelated text from Yudkowsky about a problem I don't really have.
I immensely enjoyed the detailed discussion this post generated. Tons of knowledgeable people hashing out object-level beliefs in the replies.
I'm a big fan of this post[1].
For a long time, the simplicity inductive bias/prior ideas were insufficiently motivated in my eyes; neural networks appeared to me like they would have some more specific behavior than that. But what kinds of programs do neural networks tend to learn, and which algorithms do they tend to struggle to learn?
People had been telling me that neural networks have a simplicity bias, but when I pressed "on which UTM, though?" they'd scuttle away with excuses like "oh, don't worry, they only differ by a constant factor" or "it's about volume in parameter-space".
But this post, as far as I was able to understand it, gave me a neat handle: The training process for deep neural networks tends to select programs which are robust to errors in their inputs. That's a very neat handle, and I've been using it a bit when thinking about whether schemers would be selected for during the training process.
This view also connects to other research, especially on superposition and especially especially on computation in superposition[2]. My current understanding is this: training of large over-parameterized neural networks selects for running a ton of error-correcting short programs in superposition (the programs are shorter the shallower the network is, and more numerous the wider it is), because training selects for packing many programs into superposition, and those programs then have to constantly "fix" the data they operate on due to interference. Towards the end of a forward pass, the computations of these programs are assembled into a final output[3].
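As a toy numerical illustration of that picture (my own sketch, not from the post or the linked papers): pack many sparse features into fewer dimensions, read them out with interference noise, and let a threshold nonlinearity do the "error correction".

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 512, 2048, 5        # 512 dimensions, 2048 features, 5 active at once

# Random nearly-orthogonal feature directions: unit-ish norm, pairwise
# overlaps of order 1/sqrt(d) -- this is the superposition part.
V = rng.standard_normal((n, d)) / np.sqrt(d)

active = rng.choice(n, size=k, replace=False)
x = V[active].sum(axis=0)     # a residual-stream-like vector carrying k features

# Linear readout: ~1 for active features, plus interference noise of
# order sqrt(k/d) everywhere else, from features sharing the same space.
readout = V @ x
interference = np.delete(readout, active)
print("max interference:", np.abs(interference).max())   # well below 1

# "Error correction": a simple threshold nonlinearity suppresses the
# interference and (with high probability at these sizes) recovers the
# active set exactly.
recovered = np.flatnonzero(readout > 0.5)
print("recovered == active:", set(recovered) == set(active))
```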
I'll be giving this post a +4, and hope people find more of these cool concept handles to compress what is going on in big neural networks.
35-40%
ʜᴇʟᴘ[Right-to-left mark]ꜰ[Left-to-right mark]ᴜʟ, ɦɑr[Word joiner]𝚖𝓁ₑss, h[Zero-width space][Zero-width space]o[Zero-width space]n[Cyrillic е]st dæmons.
Relevant Manifold market.