I edited the post to reflect this! (pun intended)
I went to the kitchen and tried to fill a bowl with water. I think you are right: I underestimated how easy it is to see a reflection in water. I believe it is unlikely for someone to spend a lifetime without seeing their own face (blind people aside); maybe it's still possible in arid desert areas, or for people living in the Arctic?
Here is an option: you could buy an alarm clock (I personally like this one) and make your bedroom phone-free.
The Balkan house analogy has been almost literally applied to the architecture of the seat of the European Parliament in Strasbourg. It is an unfinished amphitheater symbolizing the ongoing construction of the Union.
Nope, I didn't know PaCMAP! Thanks for the pointer, I'll have a look.
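(From a quick skim, the API looks UMAP-like. A minimal sketch, assuming the `pacmap` PyPI package, with made-up data and the library's documented default parameters:)

```python
# Minimal PaCMAP sketch (assuming the `pacmap` PyPI package); the data is a
# placeholder, and the parameters are the library's documented defaults.
import numpy as np
import pacmap

X = np.random.rand(1000, 50).astype(np.float32)  # placeholder high-dim data

# n_neighbors, MN_ratio, and FP_ratio control the balance of near, mid-near,
# and far pairs used to shape the embedding.
reducer = pacmap.PaCMAP(n_components=2, n_neighbors=10, MN_ratio=0.5, FP_ratio=2.0)
X_2d = reducer.fit_transform(X, init="pca")  # -> array of shape (1000, 2)
```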
In section 5, I explain how CoEm is an agenda with relaxed constraints. It does not try to reduce the alignment tax to make the safety solution competitive for labs to use. Instead, it assumes there's enough progress in international governance that you have full control over how your AI gets built, and that there are enforcement mechanisms to ensure no competitive but unsafe AI can be built somewhere else.
That's what the bifurcation of narratives is about: not letting labs implement only the solutions that have a low alignment tax, because this could simply not be enough.
My steelman of Conjecture's position here would be:
My opinion is:
I really appreciate the naturalistic experimentation approach – the fact that it tries to poke at the unknown unknowns, discovering new capabilities or failure modes of Large Language Models (LLMs).
I'm particularly excited by the idea of developing a framework to understand hidden variables and create a phenomenological model of LLM behavior. This seems like a promising way to "carve LLM abilities at their joints," moving closer to enumeration rather than the current approach of 1) coming up with an idea, 2) asking, "Can the LLM do this?" and 3) testing it. We lack access to a comprehensive list of what LLMs can do inherently. I'm very interested in anything that moves us closer to this, where human creativity is no longer the bottleneck in understanding LLMs. A constrained psychological framework could help uncover non-obvious areas to explore. It also offers a way to evaluate the frameworks we build: do they merely describe known data, or do they suggest experiments and point toward phenomena we wouldn't have discovered on our own?
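To make the contrast concrete, here is a toy sketch of that idea-then-test loop; `query_model` and `check` are hypothetical stand-ins for an LLM API call and a task-specific grader:

```python
# Toy sketch of the "idea -> ask -> test" loop. All names are hypothetical:
# query_model stands in for any LLM API call, check for a task-specific test.
# The point is that TASK_IDEAS is hand-enumerated, so human creativity bounds
# what we ever probe; an enumerative framework would replace this list.
def query_model(prompt: str) -> str:
    return "..."  # placeholder for a real LLM call

def check(output: str) -> bool:
    return len(output) > 0  # placeholder for a real task-specific grader

TASK_IDEAS = [
    "Translate this sentence into Latin.",
    "Play a legal chess move from this position.",
]

for task in TASK_IDEAS:
    print(task, "->", check(query_model(task)))
```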
However, I believe there are unique challenges in LLM psychology that make it more complex:
I really like the concept of species-specific experiments. However, you should be careful not to project too much of your prior models into these experiments. The ideas of latent patterns and shadows may already embed implicit assumptions and constrain what we might imagine as experiments. I think this field requires epistemology on steroids, because i) experiments are cheap, so most of our time is spent digesting data, which makes it easy to go off track and keep studying our pet theories, and ii) our human priors are probably ill-suited to understanding LLMs.
What I really like about ancient languages is that there's no online community the model could exploit. Even low-resource modern languages have online forums an AI could use as an entry point.
But this consideration might be eclipsed by the fact that a rogue AI would have access to a translator before trying online manipulation, or by another scenario I'm not considering.
I agree that the lack of direct access to the CoT is one of the major drawbacks. Though we could have a slightly smarter reporter that could also answer questions about CoT interpretation.
Yup indeed! See the other comment thread below