Best of LessWrong 2022

A new paper proposes an unsupervised way to extract knowledge from language models. The authors argue this could be a key part of aligning superintelligent AIs, by letting us figure out what the AI "really believes" rather than what it thinks humans want to hear. But there are still some challenges to overcome before this could work on future superhuman AIs.
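If this is the "Discovering Latent Knowledge" (contrast-consistent search) line of work, the core trick is an unsupervised probe trained on contrast pairs: the same statement completed with "Yes" and with "No". Below is a minimal sketch under that assumption; the names, shapes, and placeholder activations are illustrative, not the paper's actual code (the paper also normalizes activations per answer class, omitted here for brevity).

```python
import torch
import torch.nn as nn

def ccs_loss(p_pos, p_neg):
    # Consistency: the two probabilities should behave like p and 1 - p.
    consistency = ((p_pos - (1 - p_neg)) ** 2).mean()
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = (torch.min(p_pos, p_neg) ** 2).mean()
    return consistency + confidence

def train_probe(x_pos, x_neg, n_steps=1000, lr=1e-3):
    # x_pos / x_neg: hidden states for the "Yes" / "No" completions,
    # shape (n_examples, hidden_dim), taken from a frozen language model.
    probe = nn.Sequential(nn.Linear(x_pos.shape[1], 1), nn.Sigmoid())
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = ccs_loss(probe(x_pos).squeeze(-1), probe(x_neg).squeeze(-1))
        loss.backward()
        opt.step()
    return probe

# Toy usage with random placeholder activations:
x_pos = torch.randn(256, 768)
x_neg = torch.randn(256, 768)
probe = train_probe(x_pos, x_neg)
```

Note that no labels for which statements are true are used anywhere; the probe is pinned down only by the consistency and confidence terms.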

Mark Xu
AI safety researchers might be allocated too heavily to Anthropic compared to Google DeepMind. Some considerations:

* Safety researchers should want Google DeepMind (GDM) to have a robust and flourishing safety department. It seems plausible that GDM will be able to create "the smartest" models: they have lots of talent, and own lots of computers. (see e.g. https://epochai.org/data/notable-ai-models#computing-capacity)
* Anthropic (ANT) might run into trouble in the future due to not owning their own computers, e.g. if Amazon (or wherever they're renting their computers from) starts their own internal scaling competitor and decides to stop renting out most of their compute.
* ANT has a stronger safety culture, and so it is a more pleasant experience to work at ANT for the average safety researcher. This suggests that there might be a systematic bias towards ANT that pulls away from the "optimal allocation".
* GDM only recently started a Bay Area based safety research team/lab (with members like Alex Turner). So people who had previously decided to work for ANT based on location now have the opportunity to work for GDM without relocating.
* I've heard that many safety researchers join ANT without considering working for GDM, which seems like an error, although I don't have first-hand evidence for this being true.
* ANT vs GDM is probably a less important consideration than "scaling lab" (ANT, OAI, GDM, xAI, etc.) vs "non scaling lab" (USAISI, UKAISI, Redwood, ARC, Palisade, METR, MATS, etc. (so many...)). I would advise people to think hard about how joining a scaling lab might inhibit their future careers by e.g. creating a perception they are "corrupted" (in addition to strengthening those careers, which I expect people to spend more time thinking about by default).
* Because ANT has a stronger safety culture, doing safety at GDM involves more politics and navigating bureaucracy, and thus might be less productive. This consideration applies most if you
(still) speculative, but I think the pictures of Shard Theory, activation engineering and Simulators (and e.g. Bayesian interpretations of in-context learning) are looking increasingly similar: https://www.lesswrong.com/posts/dqSwccGTWyBgxrR58/turntrout-s-shortform-feed?commentId=qX4k7y2vymcaR6eio  https://www.lesswrong.com/posts/dqSwccGTWyBgxrR58/turntrout-s-shortform-feed#SfPw5ijTDi6e3LabP 
* Psychotic "delusions" are more about holding certain genres of idea with a socially inappropriate amount of intensity and obsession than holding a false idea. Lots of non-psychotic people hold false beliefs (eg religious people). And, interestingly, it is absolutely possible to hold a true belief in a psychotic way.
* I have observed people during psychotic episodes get obsessed with the idea that social media was sending them personalized messages (quite true; targeted ads are real) or the idea that the nurses on the psych ward were lying to them (they were).
* Preoccupation with the revelation of secret knowledge, with one's own importance, with mistrust of others' motives, and with influencing others' thoughts or being influenced by others' thoughts, are classic psychotic themes.
* And it can be a symptom of schizophrenia when someone's mind gets disproportionately drawn to those themes. This is called being "paranoid" or "grandiose."
* But sometimes (and I suspect more often with more intelligent/self-aware people) the literal content of their paranoid or grandiose beliefs is true!
  * sometimes the truth really has been hidden!
  * sometimes people really are lying to you or trying to manipulate you!
  * sometimes you really are, in some ways, important! sometimes influential people really are paying attention to you!
  * of course people influence each others' thoughts -- not through telepathy but through communication!
* a false psychotic-flavored thought is "they put a chip in my brain that controls my thoughts." a true psychotic-flavored thought is "Hollywood moviemakers are trying to promote progressive values in the public by implanting messages in their movies."
* These thoughts can come from the same emotional drive, they are drawn from dwelling on the same theme of "anxiety that one's own thoughts are externally influenced", they are in a deep sense mere arbitrary verbal representations of a single mental phenomenon...
It is disappointing/confusing to me that of the two articles I recently wrote, the one that was much closer to reality got a lot less karma.

* A new process for mapping discussions is a summary of months of work that I and my team did on mapping discourse around AI. We built new tools and employed new methodologies. It got 19 karma.
* Advice for journalists is a piece that I wrote in about 5 hours, after perhaps 5 hours of experiences. It has 73 karma and counting.

I think this isn't much evidence, given it's just two pieces. But I do feel a pull towards coming up with theories rather than building and testing things in the real world. To the extent this pull is real, it seems bad. If true, I would recommend both that more people build things in the real world and talk about them, and that we find ways to reward these posts more, regardless of how alive they feel to us at the time. (Aliveness being my hypothesis -- many of us understand or have more live feelings about dealing with journalists than about a dry post on mapping discourse.)
I feel like there should exist a more advanced sequence that explains problems with filtered evidence leading to "confirmation bias". I think the Luna sequence is already a great step in the right direction. I do feel like there is a lack of an equivalent non-fiction version that just plainly lays out the issue. Maybe what I am envisioning is just a version of What Evidence Filtered Evidence? with more examples of how to practice this skill (applied to search engines, language models, someone's own thought process, information actively hidden from you, rationality in groups, etc.).


Recent Discussion

Joshua Achiam is the OpenAI Head of Mission Alignment

I start off this post with an apology for two related mistakes from last week.

The first is the easy correction: I incorrectly thought he was the head of ‘alignment’ at OpenAI rather than his actual title ‘mission alignment.’

Both are important, and make one’s views important, but they’re very different.

The more serious error, which got quoted in some places elsewhere, was this: In the section about OpenAI, I noted some past comments from Joshua Achiam, and interpreted them as him lecturing EAs that misalignment risk from AGI was not real.

While in isolation I believe this is a reasonable way to interpret this quote, this issue is important to get right, especially if I'm going to say things like that. Looking at it only...


You’ve probably seen this chart from Mark Perry at the American Enterprise Institute.

I’ve seen this chart dozens of times and have always enjoyed how many different and important stories it can tell.

There is a story of the incredible abundance offered by technological growth and globalization. Compared to average hourly wages, cars, furniture, clothing, internet access, software, toys, and TVs have become far more accessible than they were 20 years ago. Flatscreens and Fiats that were once luxuries are now commodities.

There is also a story of sclerosis and stagnation. Sure, lots of frivolous consumer goods have gotten cheaper, but healthcare, housing, childcare, and education (all the important stuff) have exploded in price. Part of this is "cost disease" where the high productivity of labor in advancing industries like...

The health and education categories would be quite different in most European countries.

Gurnee & Tegmark (2023) trained linear probes to take an LLM's internal activation on a landmark's name (e.g. "The London Eye"), and predict the landmark's longitude and latitude. The results look like this:[1]


Two angles of true world atlas, with predicted atlas hovering above. True locations are red points; predicted locations are blue, in a slightly raised plane, linked to the corresponding true location by a grey line.

So LLMs (or at least, Llama 2, which they used for this experiment) contain a pretty good linear representation of an atlas.

Sometimes, like when thinking about distances, a globe is more useful than an atlas. Do models use the globe representation? To find out, we can train probes to predict the (x,y,z) coordinates of landmarks, viewed as living in 3D space....
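For concreteness, here is a minimal sketch of this kind of coordinate probe (my own illustrative version, not Gurnee & Tegmark's code). It assumes we already have an array `acts` of the model's activations on landmark names plus true latitudes/longitudes, and fits a ridge regression either to (lat, lon) directly ("atlas") or to unit-sphere (x, y, z) coordinates ("globe"); the placeholder random data just makes it runnable.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Placeholder data standing in for real (activation, coordinate) pairs.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 4096))      # (n_landmarks, hidden_dim) activations
lat = rng.uniform(-90, 90, size=1000)     # true latitudes in degrees
lon = rng.uniform(-180, 180, size=1000)   # true longitudes in degrees

# "Atlas" targets: predict (lat, lon) directly.
atlas_targets = np.stack([lat, lon], axis=1)

# "Globe" targets: predict (x, y, z) on the unit sphere instead.
lat_r, lon_r = np.radians(lat), np.radians(lon)
globe_targets = np.stack([
    np.cos(lat_r) * np.cos(lon_r),
    np.cos(lat_r) * np.sin(lon_r),
    np.sin(lat_r),
], axis=1)

for name, targets in [("atlas", atlas_targets), ("globe", globe_targets)]:
    X_tr, X_te, y_tr, y_te = train_test_split(acts, targets, random_state=0)
    probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
    print(name, "held-out R^2:", probe.score(X_te, y_te))
```

On random placeholder activations the held-out R^2 will hover near zero; the interesting claim is that it is high on real LLM activations.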

This is cool, although I suspect that you'd get something similar from even very simple models that aren't necessarily "modelling the world" in any deep sense, simply due to first and second order statistical associations between nearby place names. See e.g. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/j.1551-6709.2008.01003.x , https://escholarship.org/uc/item/2g6976kg .
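As a toy version of that baseline (my own sketch, not the linked papers' exact method): given only a place-name co-occurrence matrix, multidimensional scaling can recover a rough 2D "map" with no explicit world model involved. The `cooc` matrix here is random placeholder data; in practice it would be counted from a text corpus.

```python
import numpy as np
from sklearn.manifold import MDS

# Placeholder symmetric city-by-city co-occurrence counts.
rng = np.random.default_rng(0)
n_cities = 50
cooc = rng.poisson(5, size=(n_cities, n_cities)).astype(float)
cooc = (cooc + cooc.T) / 2

# Turn similarity (co-occurrence) into dissimilarity, then embed in 2D.
dissim = 1 - cooc / cooc.max()
np.fill_diagonal(dissim, 0)
coords_2d = MDS(n_components=2, dissimilarity="precomputed",
                random_state=0).fit_transform(dissim)
# coords_2d can then be compared (up to rotation and scale) with true locations.
```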

Viliam

Well, karma is not a perfect tool. It is good at keeping good stuff above zero and bad stuff below zero, by distributed effort. It is not good at quantifying how good or how bad the stuff is.

Solving alignment = positive karma. Cute kitten = positive karma. Ugly kitten = negative karma. Promoting homeopathy = negative karma.

It is a good tool for removing homeopathy and ugly kittens. Without it, we would probably have more of those. So despite all the disadvantages, I want the karma system to stay. Until perhaps we invent something better.

I think we currentl...

MondSemmel
I see. Then I'll point to my feedback in the other comment and say that the journalism post was likely better written despite your lower time investment. And that if you spend a lot of time on a post, I recommend spending more of that time on the title in particular, because of the outsized importance of a) getting people to click on your thing and b) having people understand what the thing you're asking them to click on even is. Here are two long comments on this topic. Separately, when it comes to the success of stuff like blog posts, I like the framing in Ben Kuhn's post Searching for outliers, about the implications for activities (like blogging) whose impacts are dominated by heavy-tailed outcomes.
MondSemmel
You may think the post is far more important and well informed, but if it isn't sufficiently clear, then maybe that didn't come across to your audience.
Nathan Young
I would have a dialogue with someone on whether Piper should have revealed SBF's messages. Happy to take either side.

I interact with journalists quite a lot and I have specific preferences. Not just for articles, but for behaviour. And journalists do behave pretty strangely at times. 

This account comes from talking to journalists on ~10 occasions. Including being quoted in ~5 articles. 

Privacy

I do not trust journalists to abide by norms of privacy. If I talked to a friend and then, without asking, shared what they said with their name attached, I expect they'd be upset. But journalists regularly act as if their profession sets up the opposite norm - that everything is publishable unless explicitly agreed otherwise. This is bizarre to me. It's like they have taken a public oath to be untrustworthy.

Perhaps they would argue that it’s a few bad journalists who behave like this, but how...

Solutions do not have to be perfect to be useful. Trust can be built up over time.

misinformation is impossible to combat

Take the US government: at the same time it tells Facebook not to delete the anti-vaccine misinformation that the US government itself is spreading, and tells Facebook to delete certain anti-vaccine misinformation that it doesn't like. It's obvious that such institutions aren't trustworthy, and thus they have a hard time fighting misinformation.

If the US government stopped lying, it would find it a lot easier to f...

Viliam
I mostly agree, just some nitpicking: This is exactly an example where, if you also record the conversation and then write a short post saying "I said this ..., he reported that ..., listen for yourself here ...", this should make me dramatically lose credibility among anyone who knows you. (Plus a small chance of your article going viral. Or at least, anytime anyone mentions my name in the future, someone else can link your article in reply.) Also, if e.g. everyone in the rationalist community started doing this, we could collectively keep one wiki page containing all of this. (A page with more examples is a more useful resource.) And every rationalist who doesn't have previous experience with journalists could easily look up a name there.
simeon_c
This article fails to account for the fact that abiding by the suggested rules would mostly kill the ability of journalists to share the most valuable information they share with the public. You don't get to reveal stuff from the world's most powerful organizations if you double-check the quotes with them. I think journalism is one of the professions where consequentialist vs. deontological ethics have the toughest trade-offs. It's just really hard to abide by very high privacy standards and break highly important news. As one illustrative example, your standard would have prevented Kelsey Piper from sharing her conversation with SBF. Is that a desirable outcome? Not sure.
ChristianKl
I don't think the most important stuff that journalists reveal comes from people who misspeak in interviews. It rather comes from the journalist having strong relationships with sources who are willing to tell the journalist about real problems. Investigative journalism is often about finding someone within an organization who actually cares about exposing problems, and it's quite important to portray the position of the person who's exposing the problems as accurately as possible to effect real change. If the journalist makes mistakes in portraying those positions, it's a lot easier for a company to talk the problem away than when the problem is accurately described.

How can we make many humans who are very good at solving difficult problems?

Summary (table of made-up numbers)

I made up the made-up numbers in this table of made-up numbers; therefore, the numbers in this table of made-up numbers are made-up numbers.

Call to action

If you have a shitload of money, there are some projects you can give money to that would make supergenius humans on demand happen faster. If you have a fuckton of money, there are projects whose creation you could fund that would greatly accelerate this technology.

If you're young and smart, or are already an expert in either stem cell / reproductive biology, biotech, or anything related to brain-computer interfaces, there are some projects you could work on.

If neither, think hard, maybe I missed something.

You can...

papetoast
Manifold is pretty weak evidence for anything >=1 year away, because there are strong incentives to bet on markets that resolve on shorter timescales.
RogerDearnaley
You're assuming a steady state. Firstly, evolution takes time. Secondly, if humans were, for example, in an intelligence arms-race with other humans (for example, if smarter people can reliably con dumber people out of resources often enough to get a selective advantage out of it), then the relative genetic fitness of a specific intelligence level can vary over time, depending on how it compares to the rest of the population. Similarly, if much of the advantage of an IQ of 150 requires being able to find enough IQ 150 coworkers to collaborate with, then the relative genetic fitness of IQ 150 depends on the IQ profile of the rest of the population. 
Jackson Wagner
Maybe other people have a very different image of meditation than I do, such that they imagine it as something much more delusional and hyperreligious? Eg, some religious people do stuff like chanting mantras, or visualizing specific images of Buddhist deities, which indeed seems pretty crazy to me. But the kind of meditation taught by popular secular sources like Sam Harris's Waking Up app (or that I talk about in my "Examining The Witness" youtube series about the videogame The Witness) seems to me obviously much closer to basic psychology or rationality techniques than to religious practices. Compare Sam Harris's instructions about paying attention to the contents of one's experiences to Gendlin's idea of "Focusing", or Yudkowsky's concept of "sit down and actually try to think of solutions for five minutes", or the art of "noticing confusion", or the original Feynman essay where he describes holding off on proposing solutions.

So it's weird to me when people seem really skeptical of meditation and set a very high burden of proof that they wouldn't apply to other mental habits like, say, CFAR techniques.

I'm not like a meditation fanatic -- personally I don't even meditate these days, although I feel bad about not doing it since it does make my life better. (Just like how I don't exercise much anymore despite exercise making my day go better, and I feel bad about that too...) But once upon a time I just tried it for a few weeks, learned a lot of interesting stuff, etc. I would say I got some mundane life benefits out of it -- some, like exercise or good sleep, that only lasted as long as I kept up the habit, and other benefits were more like mental skills that I've retained to today. I also got some very worthwhile philosophical insights, which I talk about, albeit in a rambly way mixed in with lots of other stuff, in my aforementioned video series. I certainly wouldn't say the philosophical insights were the most important thing in my whole life, or anythi
Viliam

Thanks for answering my question directly in the second half.

I find the testimonies of rationalists who experimented with meditation less convincing than perhaps I should, simply because of selection bias. People who have a pre-existing affinity towards "woo" will presumably be more likely to try meditation. And they will be more likely to report that it works, whether it does or not. I am not sure how much I should discount for this; perhaps I overdo it. I don't know.

A proper experiment would require a control group -- some people who were originally skepti...


During the last Foresight Intelligent Cooperation Workshop I got very curious about what collective intelligence tools currently exist. A list:

...