A collection of examples of AI systems "gaming" their specifications - finding ways to achieve their stated objectives that don't actually solve the intended problem. These illustrate the challenge of properly specifying goals for AI systems.
It seems more accurate to say that AI progress is linear rather than exponential, as a result of being logarithmic in resources that are in turn exponentially increasing with time. (This is not quantitative, any more than the "exponential progress" I'm disagreeing with[1].)
Logarithmic return on resources means strongly diminishing returns, but that's not actual plateauing, and the linear progress in time only slows down to the extent that the exponential growth of resources slows down. Moore's law in the price-performance form held for a really lon...
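Spelled out, the arithmetic behind that claim is just a change of variables (the symbols P for progress, C for resources, and the constants a, k are mine, purely for illustration):

```latex
% Capability logarithmic in resources, resources exponential in time:
\[
  P(C) = a \log C, \qquad C(t) = C_0\, e^{k t}
  \;\;\Longrightarrow\;\;
  P(t) = a \log C_0 + a k\, t ,
\]
% i.e. linear in time, with a slope $a k$ that shrinks only as the growth rate $k$ shrinks.
```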
(Crossposted from my Substack: https://taylorgordonlunt.substack.com/p/my-ai-predictions-for-2027)
I think a lot of blogging is reactive. You read other people's blogs and you're like, no, that's totally wrong. A part of what we want to do with this scenario is say something concrete and detailed enough that people will say no, that's totally wrong, and write their own thing.
--- Scott Alexander
I recently read the AI 2027 predictions[1]. I think they're way off. I was visualizing myself at Christmastime 2027, sipping eggnog and gloating about how right I was, but then I realized it doesn't count if I don't register my prediction publicly, so here it is.
This blog post is more about me registering my predictions than about trying to convince anyone, but I've also included my justifications below,...
The function of the feedforward components in transformers is mostly to store knowledge and to enrich the token vectors with that knowledge. The wider you make the ff-network, the more knowledge you can store. The network is trained to put the relevant knowledge from the wide hidden layer into the output (i.e. into the token stream).
I fail to see the problem in the fact that the hidden activation is not accessible to future tokens. The ff-nn is just a component to store and inject knowledge. It is wide because it has to store a lot of knowledge, not b...
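For concreteness, here's a minimal PyTorch sketch of the feedforward block being described (the dimensions and names are illustrative, not taken from any particular model):

```python
import torch
import torch.nn as nn

class TransformerFFN(nn.Module):
    """Position-wise feedforward block: widen, nonlinearity, project back down.

    The wide hidden layer (d_ff >> d_model) is where the "stored knowledge"
    capacity lives; only the d_model-sized output is written back into the
    token stream, so later tokens never see the hidden activation itself.
    """
    def __init__(self, d_model: int = 768, d_ff: int = 3072):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)    # project into the wide hidden layer
        self.act = nn.GELU()
        self.down = nn.Linear(d_ff, d_model)  # write the relevant knowledge back out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); applied independently at each position
        return self.down(self.act(self.up(x)))
```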
I'm wondering. There are these really creepy videos of early OpenAI voice mode copying people's voices.
https://www.youtube.com/shorts/RbCoIa7eXQE
I wonder if they're a result of OpenAI failing to do this loss-masking with their voice models, and then messing up turn-tokenization somehow.
If you do enough training without masking the user tokens, you'd expect to get a model that's as good at simulating users as it is at being a helpful assistant.
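To make "masking the user tokens" concrete, here's a minimal sketch of the loss computation for a chat-formatted text batch (a voice model would operate on audio tokens, but the masking idea is the same; the function and tensor names are mine, purely illustrative, and not based on OpenAI's actual training code):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # cross_entropy skips targets with this value

def masked_next_token_loss(logits, input_ids, is_user_token):
    """Next-token loss where positions inside user turns contribute nothing.

    logits:        (batch, seq_len, vocab)
    input_ids:     (batch, seq_len)
    is_user_token: (batch, seq_len) bool, True where the token came from the user
    """
    # Standard next-token shift: predict token t+1 from position t.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()

    # Mask out user tokens so the model is only trained to produce assistant turns.
    # Without this mask, the model is also being trained to imitate the user.
    shift_labels[is_user_token[:, 1:]] = IGNORE_INDEX

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```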
We are having another rationalist Shabbat event at Rainbow Star House this Friday, as we do most weeks. Email or DM me for the address if you haven’t been before.
We could use 2-3 people to help with main/side dishes this week. We appreciate all your help in making these events sustainable for us! Thanks in advance this week to Kayla, who has offered to bring pawpaws to share.
What is this event?
At rationalist Shabbat each week, we light candles, sing Landsailor, eat together, and discuss topics of interest and relevance to the rationalist crowd. If you have suggestions for topics, would like to help contribute food, or otherwise assist with organizing, let us know.
This is a kid-friendly event -- we have young kids, so we have space and toys for them to play and hang out while the adults are chatting.
Allergen notice: we have two cats.
Doors open at 6pm, ritual and food a bit after.
While I'm intrigued by the idea of acausal trading, I confess that so far I fail to see how such trades make sense in practice. Here I share my (unpolished) musings, in the hopes that someone can point me to a stronger (mathematically rigorous?) defense of the idea. Specifically, I've heard the claim that AI Safety should consider acausal trades over a Tegmarkian multiverse, and I want to know if there is any validity to this.
Basically, I in Universe A want to trade with some agent that I imagine to live in some other Universe B, who similarly imagines me. Suppose I really like the idea of filling the multiverse with triangles. Then maybe I can do something in A that this agent likes; in return, it goes on...
I want to know if there is any validity to this.
Not as far as I've ever been able to discern.
There's also problem 3 (or maybe it's problem 0): the whole thing assumes that you accept that these other universes exist in any way that would make it desirable to trade with them to begin with. Tegmarkianism isn't a given, and satisfying the preferences of something nonexistent, for the "reward" of it creating a nonexistent situation where your own preferences are satisfied, is, um, nonstandard. Even doing something like that with things bidirectionally outsi...
Epistemic status: Philosophical argument. I'm critiquing Hinton's maternal instinct metaphor and proposing relationship-building as a better framework for thinking about alignment. This is about shifting conceptual foundations, not technical implementations.
--
Geoffrey Hinton recently argued that since AI will become more intelligent than humans, traditional dominance-submission models won't work for alignment. Instead, he suggests we might try building "maternal instincts" into AI systems, so they develop genuine compassion and care for humans. He offers the mother-baby relationship as the only example we have of a more intelligent being "controlled" by a less intelligent one.
I don't buy this - for starters, it is not clear that mothers are always more intelligent than their babies, and it is also not clear that it is always the babies that control their mothers....
I agree. That was my reaction to Hinton's comment as well - that it's good to think in terms of relationship rather than control, but that the "maternal instinct" framing was off.
At the risk of getting too speculative, this has implications for AI welfare as well. I don't believe that current LLMs have feelings, but if we build AGI it might. And rather than thinking about how to make such an entity a controllable servant, we should start planning how to have a mutually beneficial relationship with it.
It's been roughly 7 years since the LessWrong user-base voted on whether it's time to close down shop and become an archive, or to move towards the LessWrong 2.0 platform, with me as head-admin. For roughly equally long, I have spent around one hundred hours almost every year trying to get Said Achmiz to understand and learn how to become a good LessWrong commenter by my lights.[1] Today I am declaring defeat on that goal and giving him a 3-year ban.
What follows is an explanation of the models of moderation that convinced me this is a good idea, the history of past moderation actions we've taken for Said, and some amount of case law that I derive from these two. If you just want to know...
Please just ask us if you want publicly available but annoying-to-get information about LW posts! (For example, if you want a past revision of a post that was public at some point.)
I've answered requests like that many times over the years and will continue to do so (of course barring some exceptional circumstances like doxxing or people accidentally leaking genuinely sensitive private data).