A new paper proposes an unsupervised way to extract knowledge from language models. The authors argue this could be a key part of aligning superintelligent AIs, by letting us figure out what the AI "really believes" rather than what it thinks humans want to hear. But there are still some challenges to overcome before this could work on future superhuman AIs.
Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.
Audio version here (may not be up yet).
Please note that while I work at DeepMind, this newsletter represents my personal views and not those of my employer.
Explaining Neural Scaling Laws and A Neural Scaling Law from the Dimension of the Data Manifold (Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, and Utkarsh Sharma) (summarized by Rohin): We’ve seen lots of empirical work on scaling laws (AN #87), but can we understand theoretically why these arise? This paper suggests two different models for how power-law scaling laws could arise,...
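For context on the form these laws take: a power-law scaling law says the loss falls as a power of the resource being scaled, e.g. $$L(N) \approx c \, N^{-\alpha}$$ for dataset or model size $N$ and constants $c, \alpha > 0$. The data-manifold paper ties the exponent to the intrinsic dimension $d$ of the data, finding roughly $\alpha \approx 4/d$ in the settings it studies.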
Hi Rohin Shah, did you get my private message? Please just read the first few sentences. Thanks!
Thanks for posting this. I've been confused about the connection between shard theory and activation vectors for a long time!
AIXI is not a shard theoretic agent because it does not have two motivational circuits which can be activated independently of each other
This confuses me.
I can imagine an AIXI program where the utility function is compositional even if the optimisation is unitary. And I guess this isn't two full motivational circuits, but it kind of is two motivational circuits.
I think that most people underestimate how many scientific mysteries remain, even on questions that sound basic.
My favourite candidate for "the most basic thing that is still unknown" is the momentum carried by light, when it is in a medium (for example, a flash of light in glass or water).
If a block of glass has a refractive index of $n$, then the light inside that block travels $n$ times slower than the light would in vacuum. But what is the momentum of that light wave in the glass relative to the momentum it would have in vacuum?
In 1908 Abraham proposed that the light's momentum would be reduced by a factor of $n$. This makes sense on the surface: $n$ times slower means $n$ times less momentum. This gives a single photon a momentum of $\hbar\omega/(nc)$. For the...
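To make the comparison explicit (a restatement of the standard quantities, with $\omega$ the photon's angular frequency): $$p_{\text{vacuum}} = \frac{\hbar\omega}{c}, \qquad p_{\text{Abraham}} = \frac{p_{\text{vacuum}}}{n} = \frac{\hbar\omega}{nc}.$$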
Very nice post, thanks for writing it.
Your options are numbered when you refer to them in the text, but are listed as bullet points originally. Probably they should also be numbered there!
Now we can get down to the actual physics discussion. I have a bag of fairly unrelated statements to make.
The "center of mass moves at constant velocity" thing is actually just as solid as, say, conservation of angular momentum. It's just less famous. Both are consequences of Noether's theorem, angular momentum conservation arising from symmetry under rotations and the
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
What we'd ask depends on the context. In general, not all rationalist teachings are in the form of a question, but many could probably be phrased that way.
"Do I desire to believe X if X is the case and not-X if X is not the case?" (For whatever X in question.) This is the fundamental lesson of epistemic rationality. If you don't want to lie to yourself, the rest will help you get better at that. But if you do, you'll lie to yourself anyway and all your acquired cleverness will be used to defeat itself.
"Am I winning?" This is the fundamental lesson of instr...
I’ve held the prediction for some time that we are likely to see a fall from grace for Sam Altman, akin to Sam Bankman-Fried’s.
This is a difficult intuition to pin down, so I thought I should write it out. I believe it’s also pertinent to the Rationalist community: it is a warning to heed a lesson from SBF that, I believe, has not yet been taken to heart.
It’s one of many riffs on ‘the road to hell is paved with good intentions’.
Recently, Joe Carlsmith captured what 'deep atheism' is in relation to AI risk. In particular, he unpacks Yudkowsky’s particular flavour of deep atheism -- how and why it came to be. I think it paints an extraordinarily clear picture of the psychology of individuals like Yudkowsky,...
This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative.
The core idea of Bayesian epistemology: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. (Note that this is different from Bayesianism as a set of statistical techniques, or Bayesianism as an approach to machine learning, which I don’t discuss here.)
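For reference, credences in this picture are updated by conditioning on evidence via Bayes' theorem (the standard statement, not anything specific to this post): $$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},$$ where $P(H)$ is the prior credence in hypothesis $H$ and $P(H \mid E)$ is the credence after observing evidence $E$.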
If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those...
Grammatically, the most obvious interpretation is a universal quantification (i.e. "All men are taller than all women"), which I think is a major reason why such statements so often lead to objections of "But here's an exception!" Maybe you can tell the audience that they should figure out when to mentally insert "... on average" or "tend to be". Though there are also circumstances where one might validly believe that the speaker really means all. I think it's best to put such qualified language into your statements from the start.
Damon Sasi (@DaystarEld) is a mentor of mine, and he is also a therapist who claims to be very psychologically healthy. What’s his inner experience like?
...[My childhood] wasn't terrible. And I say this and then maybe I'll give some examples and people will be like, maybe it was terrible. […] my parents divorced when I was pretty young. Both my parents, I think, were very absent. […]
He would get drunk often. He once hit me in the back with a two by four. My older
I appreciate the acknowledgement against psychoanalyzing people in public, and I agree that trying to cargo-cult any of this is unlikely to go well, but I'd be curious to know what specific things you think can also fall under "being very unwell"? I just reread the excerpts Chris highlighted and the only thing I can think of is the "letting go of anger" thing, which is only a sign of unwellness, imo, if it leads to being exploited/abused/etc.
I interact with journalists quite a lot and I have specific preferences. Not just for articles, but for behaviour. And journalists do behave pretty strangely at times.
This account comes from talking to journalists on ~10 occasions, including being quoted in ~5 articles.
I do not trust journalists to abide by norms of privacy. If I talked to a friend and then, without asking, shared what they said with their name attached, I'd expect them to be upset. But journalists regularly act as if their profession sets up the opposite norm: that everything is publishable unless explicitly agreed otherwise. This is bizarre to me. It's as if they have taken a public oath to be untrustworthy.
Perhaps they would argue that it’s a few bad journalists who behave like this, but how...
This article fails to account for the fact that abiding by the suggested rules would mostly kill journalists' ability to share the most valuable information they bring to the public.
You don't get to reveal things about the world's most powerful organizations if you double-check the quotes with them.
I think journalism is one of the professions where the trade-offs between consequentialist and deontological ethics are toughest. It's just really hard to abide by very high privacy standards and still break highly important news.
As one illustrative example, your standard would have prevented Kelsey Piper from sharing her conversation with SBF. Is that a desirable outcome? Not sure.