Best of LessWrong 2022

A new paper proposes an unsupervised way to extract knowledge from language models. The authors argue this could be a key part of aligning superintelligent AIs, by letting us figure out what the AI "really believes" rather than what it thinks humans want to hear. But there are still some challenges to overcome before this could work on future superhuman AIs.

Mark Xu
AI safety researchers might be allocated too heavily to Anthropic compared to Google DeepMind. Some considerations:

* Safety researchers should want Google DeepMind (GDM) to have a robust and flourishing safety department. It seems plausible that GDM will be able to create "the smartest" models: they have lots of talent, and own lots of computers. (See e.g. https://epochai.org/data/notable-ai-models#computing-capacity)
* Anthropic (ANT) might run into trouble in the future due to not owning their own computers, e.g. if Amazon (or wherever they're renting their computers from) starts their own internal scaling competitor and decides to stop renting out most of their compute.
* ANT has a stronger safety culture, so it is a more pleasant experience to work at ANT for the average safety researcher. This suggests there might be a systematic bias towards ANT that pulls away from the "optimal allocation".
* GDM only recently started a Bay Area based safety research team/lab (with members like Alex Turner). So if people had previously decided to work for ANT based on location, they now have the opportunity to work for GDM without relocating.
* I've heard that many safety researchers join ANT without considering working for GDM, which seems like an error, although I don't have first-hand evidence of this being true.
* ANT vs. GDM is probably a less important consideration than "scaling lab" (ANT, OAI, GDM, xAI, etc.) vs. "non scaling lab" (USAISI, UKAISI, Redwood, ARC, Palisade, METR, MATS, etc. (so many...)). I would advise people to think hard about how joining a scaling lab might inhibit their future careers by e.g. creating a perception that they are "corrupted" (in addition to strengthening those careers, which I expect people to spend more time thinking about by default).
* Because ANT has a stronger safety culture, doing safety at GDM involves more politics and navigating around bureaucracy, and thus might be less productive. This consideration applies most if you
Nathan Young
It is disappointing/confusing to me that of the two articles I recently wrote, the one that was much closer to reality got a lot less karma.

* A new process for mapping discussions is a summary of months of work that I and my team did on mapping discourse around AI. We built new tools and employed new methodologies. It got 19 karma.
* Advice for journalists is a piece that I wrote in about 5 hours after perhaps 5 hours of experiences. It has 73 karma and counting.

I think this isn't much evidence, given it's just two pieces. But I do feel a pull towards coming up with theories rather than building and testing things in the real world. To the extent this pull is real, it seems bad. If true, I would recommend both that more people build things in the real world and talk about them, and that we find ways to reward these posts more, regardless of how alive they feel to us at the time. (Aliveness being my hypothesis: many of us understand, or have more live feelings about, dealing with journalists than a sort of dry post about mapping discourse.)
* Psychotic "delusions" are more about holding certain genres of idea with a socially inappropriate amount of intensity and obsession than about holding a false idea. Lots of non-psychotic people hold false beliefs (e.g. religious people). And, interestingly, it is absolutely possible to hold a true belief in a psychotic way.
* I have observed people during psychotic episodes get obsessed with the idea that social media was sending them personalized messages (quite true; targeted ads are real) or the idea that the nurses on the psych ward were lying to them (they were).
* Preoccupation with the revelation of secret knowledge, with one's own importance, with mistrust of others' motives, and with influencing others' thoughts or being influenced by others' thoughts, are classic psychotic themes.
* And it can be a symptom of schizophrenia when someone's mind gets disproportionately drawn to those themes. This is called being "paranoid" or "grandiose."
* But sometimes (and I suspect more often with more intelligent/self-aware people) the literal content of their paranoid or grandiose beliefs is true!
  * Sometimes the truth really has been hidden!
  * Sometimes people really are lying to you or trying to manipulate you!
  * Sometimes you really are, in some ways, important! Sometimes influential people really are paying attention to you!
  * Of course people influence each other's thoughts -- not through telepathy but through communication!
* A false psychotic-flavored thought is "they put a chip in my brain that controls my thoughts." A true psychotic-flavored thought is "Hollywood moviemakers are trying to promote progressive values in the public by implanting messages in their movies."
* These thoughts can come from the same emotional drive, they are drawn from dwelling on the same theme of "anxiety that one's own thoughts are externally influenced", they are in a deep sense mere arbitrary verbal representations of a single mental phenomenon...
TurnTrout
Apply to the "Team Shard" mentorship program at MATS. Apply here; applications are due by October 13th! (Footnote: paper available soon.)
I think this post was underrated; I look back at it frequently: AI labs can boost external safety research. (It got some downvotes but no comments — let me know if it's wrong/bad.) [Edit: it was at 15 karma.]


Recent Discussion

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.

Audio version here (may not be up yet).

Please note that while I work at DeepMind, this newsletter represents my personal views and not those of my employer.

HIGHLIGHTS

Explaining Neural Scaling Laws and A Neural Scaling Law from the Dimension of the Data Manifold (Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, and Utkarsh Sharma) (summarized by Rohin): We’ve seen lots of empirical work on scaling laws (AN #87), but can we understand theoretically why these arise? This paper suggests two different models for how power-law scaling laws could arise,...
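As a toy illustration of the empirical pattern the paper is trying to explain (a minimal sketch, not code from the paper; the functional form and all numbers below are assumptions for the example):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical power-law scaling form: loss falls as a power law in model
# size N, approaching an irreducible floor c. (Illustrative only; the
# paper's actual models distinguish richer regimes.)
def power_law(N, a, alpha, c):
    return a * N ** (-alpha) + c

# Synthetic "measured" losses at a few model sizes (made-up numbers).
rng = np.random.default_rng(0)
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
loss = power_law(N, a=50.0, alpha=0.3, c=1.7) + rng.normal(0, 0.01, N.shape)

# Fit the exponent alpha from the data, as empirical scaling-law work does.
(a_fit, alpha_fit, c_fit), _ = curve_fit(power_law, N, loss, p0=[10.0, 0.5, 1.0])
print(f"fitted exponent alpha ≈ {alpha_fit:.2f}")  # ≈ 0.3 by construction
```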

Hi Rohin Shah, did you get my private message? Please just read the first few sentences. Thanks!

Thanks for posting this. I've been confused about the connection between shard theory and activation vectors for a long time!

AIXI is not a shard theoretic agent because it does not have two motivational circuits which can be activated independently of each other

This confuses me.

I can imagine an AIXI program where the utility function is compositional even if the optimisation is unitary. And I guess this isn't two full motivational circuits, but it kind of is two motivational circuits.
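A minimal toy sketch of the distinction I read this comment as drawing (all names and numbers hypothetical, and this is an ordinary argmax over a finite list, not AIXI): the utility function decomposes into two independent terms, but a single unitary optimiser is the only thing that ever acts on it.

```python
# Toy illustration: a compositional utility -- the sum of two independent
# terms -- maximised by one unitary optimisation loop. The two terms are
# not separately activatable "motivational circuits"; they only ever
# influence behaviour through the single argmax.
def utility_food(state: dict) -> float:
    return state.get("food", 0)

def utility_safety(state: dict) -> float:
    return -state.get("danger", 0)

def total_utility(state: dict) -> float:
    return utility_food(state) + utility_safety(state)

candidate_states = [
    {"food": 3, "danger": 2},   # total utility 1
    {"food": 1, "danger": 1},   # total utility 0
    {"food": 4, "danger": 5},   # total utility -1
]

# One optimiser, one objective: the compositional structure is invisible
# to the search itself.
best = max(candidate_states, key=total_utility)
print(best)  # {'food': 3, 'danger': 2}
```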

Eli Tyre
Huh. This was quite helpful / motivating for me. Something about the updatelessness of "if I had to decide when I was still beyond the veil of ignorance, I would obviously think it was worth working as hard as feasible in the tiny sliver of probability that I wake up on the cusp of the hinge of history, regardless of how doomed things seem." It's a reminder that even if things seem super doomed, and it might feel like I have no leverage to fix things, I am actually one of the tiny number of beings that has the most leverage, when I zoom out across time. Thanks for writing this.

I think that most people underestimate how many scientific mysteries remain, even on questions that sound basic.

My favourite candidate for "the most basic thing that is still unknown" is the momentum carried by light, when it is in a medium (for example, a flash of light in glass or water). 

If a block of glass has a refractive index of $n$, then the light inside that block travels $n$ times slower than the light would in vacuum. But what is the momentum of that light wave in the glass, relative to the momentum it would have in vacuum?

In 1908 Abraham proposed that the light's momentum would be reduced by a factor of $n$. This makes sense on the surface: $n$ times slower means $n$ times less momentum. This gives a single photon a momentum of $\hbar \omega / (n c)$. For $n$ the...
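For reference, the standard competing answers here (this is the well-known Abraham–Minkowski controversy; Minkowski's rival proposal is not in the excerpt above but is its usual foil):

```latex
\begin{align}
p_{\text{vac}} &= \frac{\hbar\omega}{c} && \text{(photon momentum in vacuum)} \\
p_{\text{Abraham}} &= \frac{\hbar\omega}{n c} && \text{(Abraham, 1908: reduced by a factor } n \text{)} \\
p_{\text{Minkowski}} &= \frac{n\hbar\omega}{c} && \text{(Minkowski: increased by a factor } n \text{)}
\end{align}
```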

Very nice post, thanks for writing it.

Your options are numbered when you refer to them in the text, but are listed as bullet points originally. Probably they should also be numbered there!

Now we can get down to the actual physics discussion. I have a bag of fairly unrelated statements to make.

  • The "center of mass moves at constant velocity" thing is actually just as solid as, say, conservation of angular momentum. It's just less famous. Both are consequences of Noether's theorem, angular momentum conservation arising from symmetry under rotations and the

...
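For reference, the Noether correspondences the truncated bullet above is invoking (standard classical mechanics, filled in here rather than quoted from the comment): each continuous symmetry yields a conserved quantity, and the boost symmetry's conserved quantity is exactly the center-of-mass theorem.

```latex
\begin{align}
\text{rotational symmetry} &\;\Rightarrow\; \frac{d\vec{L}}{dt} = 0 \\
\text{translational symmetry} &\;\Rightarrow\; \frac{d\vec{p}}{dt} = 0 \\
\text{boost symmetry} &\;\Rightarrow\; \frac{d}{dt}\!\left(M\vec{R}_{\mathrm{cm}} - \vec{p}\,t\right) = 0
\;\;\Leftrightarrow\;\; \dot{\vec{R}}_{\mathrm{cm}} = \frac{\vec{p}}{M} = \text{const.}
\end{align}
```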
Eccentricity
I have such a strong intuitive opposition to the Internal Reaction Drive that I agree with your conclusion that we should update away from any theory which allows it.  Then again, perhaps it is impossible to build such a drive for the merely practical reason that any material with a positive or negative index of refraction will absorb enough light to turn the drive into an expensive radiator. Especially given the recent Nobel prize announcement, I think the most concerning piece of information is that there are cultural forces from within the physics community discouraging people from trying to answer the question at all.
RussellThor
Another analogy is a ball rolling across the boundary between two surfaces: the first with very little friction, the second with a bit more. From AI: This is similar to a light ray entering water. So is the physics the same? (On second reading it's not so clear: if you putt a golf ball from a smooth surface onto a rough one, what happens to its angle at the boundary?) Well, in this case the momentum of the ball clearly won't increase; instead it will be constantly losing momentum, and if the second surface were floating it would be pushed so as to conserve momentum. Unlike for light, however, if the ball then re-enters the smooth surface it will be going slower. It seems the ball would lose momentum at both transition boundaries. (However, if the rough surface were perfectly floating, then perhaps it would regain it.) Anyway, for a rough surface that is perfectly floating, it seems the ball gives some momentum to the rough surface when it enters it (making the surface have velocity), then recovers it and returns the rough surface to zero velocity when it exits. In that case the momentum of the ball decreases while travelling over the rough surface. Not trying to give answers here, just adding to the confusion, lol.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

gilch

What we'd ask depends on the context. In general, not all rationalist teachings are in the form of a question, but many could probably be phrased that way.

"Do I desire to believe X if X is the case and not-X if X is not the case?" (For whatever X in question.) This is the fundamental lesson of epistemic rationality. If you don't want to lie to yourself, the rest will help you get better at that. But if you do, you'll lie to yourself anyway and all your acquired cleverness will be used to defeat itself.

"Am I winning?" This is the fundamental lesson of instr... (read more)

I’ve held the prediction for some time that we are likely to see a fall from grace in Sam Altman, akin to Sam Bankman-Fried’s.

This is a difficult intuition to pin down, so I thought I should write it out. I believe it's also pertinent to the Rationalist community: it is a warning to heed a lesson from SBF that I believe has not yet been taken to heart.

It’s one of many riffs on ‘the road to hell is paved with good intentions’.

Recently, Joe Carlsmith captured what 'deep atheism' is in relation to AI risk. In particular, he unpacks Yudkowsky’s particular flavour of deep atheism -- how and why it came to be. I think it paints an extraordinarily clear picture of the psychology of individuals like Yudkowsky,...

This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative.

Degrees of belief

The core idea of Bayesian epistemology: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. (Note that this is different from Bayesianism as a set of statistical techniques, or Bayesianism as an approach to machine learning, which I don’t discuss here.)
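For concreteness, the update rule at the heart of this picture is just Bayes' theorem (standard, though not written out in the post): a credence in hypothesis $H$ is revised after observing evidence $E$ by reweighting the prior by how strongly $H$ predicted $E$.

```latex
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)}
```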

If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those...

localdeity
Are you not familiar with the term "vacuously true"? I find this very surprising. People who study math tend to make jokes with it.

The idea is that, if we were to render a statement like "Colorless green ideas sleep furiously" into formal logic, we'd probably take it to mean the universal statement "For all X such that X is a colorless green idea, X sleeps furiously". A universal statement is logically equivalent to "There don't exist any counterexamples", i.e. "There does not exist X such that X is a colorless green idea and X does not sleep furiously". Which is clearly true, and therefore the universal is equally true.

There is, of course, some ambiguity when rendering English into formal logic. It's not rare for English speakers to say "if" when they mean "if and only if", or "or" when they mean "exclusive or". (And sometimes "Tell me which one", as in "Did you do A, or B?" "Yes." "Goddammit.") Often this doesn't cause problems, but sometimes it does. (In which case, as I've said, the solution is not to give their statement an ambiguous truth value, but rather to ask them to restate it less ambiguously.)

"Dragons are attacking Paris" seems most naturally interpreted as the definite statement "There's some unspecified number—but since I used the plural, it's at least 2—of dragons that are attacking Paris", which would be false. One could also imagine interpreting it as a universal statement "All dragons are currently attacking Paris", which, as you say, would be vacuously true since there are no dragons. However, in English, the preferred way to say that would be "Dragons attack Paris", as CBiddulph says. "Dragons are attacking Paris" uses the present progressive tense, while "Dragons attack Paris" uses what is called the "simple present"/"present indefinite" tense. Wiki says:

English grammar rules aren't necessarily universal and unchanging, but they do give at least medium-strength priors on how to interpret a sentence.
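In symbols, the rendering and equivalence described above (standard first-order logic; the predicate names are just transliterations of the English):

```latex
\forall x\,\bigl(\mathrm{ColorlessGreenIdea}(x) \rightarrow \mathrm{SleepsFuriously}(x)\bigr)
\;\equiv\;
\neg\exists x\,\bigl(\mathrm{ColorlessGreenIdea}(x) \land \neg\mathrm{SleepsFuriously}(x)\bigr)
```

Since no $x$ satisfies $\mathrm{ColorlessGreenIdea}(x)$, the right-hand side holds trivially, so the universal is vacuously true.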
cubefox
I don't think so. "Smoking causes cancer" doesn't express a universal (or existential) quantification either. Or "Canadians are polite", "Men are taller than women" etc.

Grammatically, the most obvious interpretation is a universal quantification (i.e. "All men are taller than all women"), which I think is a major reason why such statements so often lead to objections of "But here's an exception!"  Maybe you can tell the audience that they should figure out when to mentally insert "... on average" or "tend to be".  Though there are also circumstances where one might validly believe that the speaker really means all.  I think it's best to put such qualified language into your statements from the start.

CBiddulph
Your example wouldn't be true, but "Dragons attack Paris" would be, interpreted as a statement about actual dragons' habits.

Damon Sasi (@DaystarEld) is a mentor of mine, and he is also a therapist who claims to be very psychologically healthy. What’s his inner experience like?

  • “Probably for an average week, maybe I'll suffer for like five minutes.”
  • “I don't think I have any prolonged internal conflict.”
  • “Self-loathing to me was a weird concept […] Like, what does it mean to like self-loathe?”

It’s not that he had a great childhood, either.

[My childhood] wasn't terrible. And I say this and then maybe I'll give some examples and people will be like, maybe it was terrible. […] my parents divorced when I was pretty young. Both my parents, I think, were very absent. […]

He would get drunk often. He once hit me in the back with a two by four. My older

...

I appreciate the acknowledgement against psychoanalyzing people in public, and I agree that trying to cargo-cult any of this is unlikely to go well, but I'd be curious to know what specific things you think could also fall under "being very unwell". I just reread the excerpts Chris highlighted, and the only thing I can think of is the "letting go of anger" thing, which is only a sign of unwellness, imo, if it leads to being exploited/abused/etc.

I interact with journalists quite a lot and I have specific preferences. Not just for articles, but for behaviour. And journalists do behave pretty strangely at times. 

This account comes from talking to journalists on ~10 occasions, including being quoted in ~5 articles.

Privacy

I do not trust journalists to abide by norms of privacy. If I talked to a friend and then, without asking, shared what they said with their name attached, I expect they'd be upset. But journalists regularly act as if their profession sets up the opposite norm: that everything is publishable unless explicitly agreed otherwise. This is bizarre to me. It's like they have taken a public oath to be untrustworthy.

Perhaps they would argue that it’s a few bad journalists who behave like this, but how...

This article fails to account for the fact that abiding by the rules suggested would mostly kill the ability of journalists to share the most valuable information they share with the public.

You don't get to reveal stuff from the world's most powerful organizations if you double-check the quotes with them.

I think journalism is one of the professions where consequentialist vs. deontological ethics have the toughest trade-offs. It's just really hard to abide by very high privacy standards and break highly important news.

As one illustrative example, your standard would have prevented Kelsey Piper from sharing her conversation with SBF. Is that a desirable outcome? Not sure.

StartAtTheEnd
If the journalist acts in good faith, I think you will be alright. If not, there's nothing you can do whatsoever. Coming up with reasons is almost too easy:

1. The journalist can write an article about you even if you've never talked to them.
2. A journalist can start out trustworthy and then change for the worse (most untrustworthy authorities today grew powerful by being trustworthy. Now that they've created their public image of impartiality and fairness, they can burn it for years. Examples include Google and Wikipedia).
3. If you record me saying "I wouldn't say I'm very interested in cars", you can just cut out the first part of the video, and now you have me saying "I'm very interested in cars". If I quote another person, "X said that Y people are bad", you could cut out the part of me saying "Y people are bad". The deeper and more complex a subject you can get me to talk about, the easier it is to take me out of context. Making Jordan Peterson look bad is trivial, for instance.
4. Even if you have evidence that your words were twisted, you'll lose if your evidence can't reach other people. So if your values don't align with the average journalist's, or if your reputation is bad, you might find yourself relying on getting the word out by having a social media post go viral or something.

Personally, if I see a journalist or website treating anyone unfairly, I make a mental note that they're inherently untrustworthy. I'd contact such people only if they had a stake in releasing my story (so that our goals align). As you may imagine, my standards result in me not bothering with about 90% of society. I rarely attempt to solve problems like this, because I have solved them in the past and realized that the solution is actually unwanted (that many things are flawed on purpose, and not because they lack intelligent people to help fix them). Things would be better if society as a whole valued truthfulness, and if winning directly (rather than with underhanded t
MichaelDickens
If your goal is to influence journalists to write better headlines, then it matters whether the journalist has the ability to take responsibility over headlines. If your goal is to stop journalists from misrepresenting you, then it doesn't actually matter whether the journalist has the ability to take responsibility, all that matters is whether they do take responsibility.
ChristianKl
If the journalist accurately represents my position in the text of the article, I would already see that as a win in most media interviews (I have given a bunch, but that was a decade ago).