Best of LessWrong 2022

Nate Soares reviews a dozen plans and proposals for making AI go well. He finds that almost none of them grapple with what he considers the core problem - capabilities will suddenly generalize way past training, but alignment won't.

evhub
COI: I work at Anthropic and I ran this by Anthropic before posting, but all views are exclusively my own.

I got a question about Anthropic's partnership with Palantir using Claude for U.S. government intelligence analysis and whether I support it and think it's reasonable, so I figured I would just write a shortform here with my thoughts.

First, I can say that Anthropic has been extremely forthright about this internally, and it didn't come as a surprise to me at all.

Second, my personal take would be that I think it's actually good that Anthropic is doing this. If you take catastrophic risks from AI seriously, the U.S. government is an extremely important actor to engage with, and trying to just block the U.S. government out of using AI is not a viable strategy. I do think there are some lines that you'd want to think about very carefully before considering crossing, but using Claude for intelligence analysis seems definitely fine to me. Ezra Klein has a great article on "The Problem With Everything-Bagel Liberalism" and I sometimes worry about Everything-Bagel AI Safety, where e.g. it's not enough to just focus on catastrophic risks, you also have to prevent any way that the government could possibly misuse your models. I think it's important to keep your eye on the ball and not become too susceptible to an Everything-Bagel failure mode.
quila
nothing short of death can stop me from trying to do good. the world could destroy or corrupt EA, but i'd remain an altruist. it could imprison me, but i'd stay focused on alignment, as long as i could communicate to at least one on the outside. even if it tried to kill me, i'd continue in the paths through time where i survived.
It seems the pro-Trump Polymarket whale may have had a real edge after all. The Wall Street Journal reports (paywalled link, screenshot) that he's a former professional trader who commissioned his own polls from a major polling firm using an alternative methodology (the "neighbor method", i.e. asking respondents who they expect their neighbors to vote for) that he thought would be less biased by preference falsification. I didn't bet against him, though I strongly considered it; I'm feeling glad this morning that I didn't.
Epistemic state: thoughts off the top of my head, not an economist at all, talked with Claude about it.

Why is there almost nowhere a small (something like 1%) universal tax on digital money transfers? It looks like a good idea to me:

* it's very predictable
* no one except banks has to do any paperwork
* it's kinda progressive: if you are poor, you can use cash

I see probable negative effects... but don't VAT and individual income tax already have the same effects, so that if this tax replaced [parts of] those, not much would change? Also, as I understand it, it would discourage high-frequency trading. I'm not sure whether that would be a feature or a bug, but my current very superficial understanding leans towards the former. Why is it a bad idea?
Eli Tyre
That no one rebuilt old OkCupid updates me a lot about how much the startup world actually makes the world better.

The prevailing ideology of San Francisco, Silicon Valley, and the broader tech world is that startups are an engine (maybe even the engine) that drives progress towards a future that's better than the past, by creating new products that add value to people's lives.

I now think this is true in a limited way. Software is eating the world, and lots of bureaucracy is being replaced by automation which is generally cheaper, faster, and a better UX. But I now think that this narrative is largely propaganda. That it's been 8 years since Match bought and ruined OkCupid, and no one in the whole tech ecosystem has stepped up to make a dating app even as good as old OkC, is a huge black mark against the whole SV ideology of technology changing the world for the better.

Finding a partner is such a huge, real pain point for millions of people. The existing solutions are so bad and extractive. A good solution has already been demonstrated. And yet not a single competent founder wanted to solve that problem for planet earth, instead of doing something else that (arguably) would have been more profitable. At minimum, someone could have forgone venture funding and built this as a cashflow business.

It's true that this is a market that depends on economies of scale, because the quality of your product is proportional to the size of your matching pool. But I don't buy that this is insurmountable. Just like with any startup, you start by serving a niche market really well, and then expand outward from there. (The first niche I would try for is building an amazing match-making experience for female grad students at a particular top university. If you create a great experience for the women, the men will come, and I'd rather build an initial product for relatively smart customers. But there are dozens of niches one could try for.)

But it seems like no one tried to recreate

Popular Comments

Recent Discussion

Pablo

Why do you focus on this particular guy? Tens of thousands of traders were cumulatively betting billions of dollars in this market. All of these traders faced the same incentives.

Note that it is not enough to assume that willingness to bet more money makes a trader worth paying more attention to. You need the stronger assumption that willingness to bet n times more than each of n traders makes the single trader worth paying more attention to than all the other traders combined. I haven’t thought much about this, but the assumption seems false to me. 

Summary

Four months after my post 'LLM Generality is a Timeline Crux', new research on o1-preview should update us significantly toward LLMs being capable of general reasoning, and hence of scaling straight to AGI, and shorten our timeline estimates.

Summary of previous post

In June of 2024, I wrote a post, 'LLM Generality is a Timeline Crux', in which I argue that 

  1. LLMs seem on their face to be improving rapidly at reasoning.
  2. But there are some interesting exceptions, having to do with general reasoning, where they still fail much more badly than one would expect given the rest of their capabilities. Some argue based on these exceptions that much of their apparent reasoning capability is much shallower than it appears, and that we're being fooled by having trouble internalizing just how
...

Nice post!

Regarding o1-like models: I am still unsure how to draw the boundary between tasks that see a significant improvement with o1-style reasoning and tasks that do not. This paper sheds some light on the kinds of tasks that benefit from regular CoT. However, even for mathematical tasks, which should benefit the most from CoT, o1-preview does not seem that much better than other models on extraordinarily difficult (and therefore OOD?) problems. I would love to see comparisons of o1 performance against other models in games like chess and Go.

Also... (read more)

eggsyntax
For sure! At the same time, a) we've continued to see new ways of eliciting greater capability from the models we have, and b) o1 could (AFAIK) involve enough additional training compute to no longer be best thought of as 'the same model' (one possibility, although I haven't spent much time looking into what we know about o1: they may have started with a snapshot of the 4o base model, put it through additional pre-training, then done an arbitrary amount of RL on CoT). So I'm hesitant to think that 'based on 4o' sets a very strong limit on o1's capabilities.
eggsyntax
That all seems pretty right to me. It continues to be difficult to fully define 'general reasoning', and my mental model of it continues to evolve, but I think of 'system 2 reasoning' as at least a partial synonym. Agreed; not only are we very limited at it, but we often aren't doing it at all. I agree that it may be possible to achieve it with scaffolding even if LLMs don't get there on their own; I'm just less certain of it.
eggsyntax
In the medium-to-long term I'm inclined to taboo the word and talk about what I understand as its component parts, which I currently (off the top of my head) think of as something like:

* The ability to do deduction, induction, and abduction.
* The ability to do those in a careful, step by step way, with almost no errors (other than the errors that are inherent to induction and abduction on limited data).
* The ability to do all of that in a domain-independent way.
* The ability to use all of that to build a self-consistent internal model of the domain under consideration.

Don't hold me to that, though, it's still very much evolving. I may do a short-form post with just the above to invite critique.

Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. We’ll call this the “median researcher problem”.

Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
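
To make the mechanism in that prototypical example concrete, here is a toy simulation sketch (my own illustration with invented parameters, not something from the original post): a flawed-but-catchy paper spreads whenever most of the researchers who encounter it can't spot the flaw, no matter how clearly the most competent researchers see it.

```python
import random

# Toy model of the "median researcher problem". All numbers are invented for illustration.
random.seed(0)

N_RESEARCHERS = 1_000
# Statistical competence of each researcher, uniform on [0, 1]; the median is ~0.5.
competence = [random.random() for _ in range(N_RESEARCHERS)]
FLAW_SUBTLETY = 0.7  # only researchers above this competence level notice the p-hacking

def spreads(flaw_subtlety: float) -> bool:
    """A flawed paper spreads if a majority of a random sample of readers miss the flaw."""
    readers = random.sample(competence, 50)
    fooled = sum(1 for c in readers if c < flaw_subtlety)
    return fooled > len(readers) / 2

share_spreading = sum(spreads(FLAW_SUBTLETY) for _ in range(1_000)) / 1_000
print(f"Flawed papers that spread: {share_spreading:.0%}")
# Because the median researcher (~0.5) sits below the flaw's subtlety (0.7),
# flawed work spreads in essentially every trial, regardless of what the top decile sees.
```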

(Defending that claim isn’t really the main focus of this post, but a couple pieces of legible evidence which are weakly in favor:

...

I agree that lab leaders are not in a much better position; I just think that lab leaders causally screen off the influence of subordinates, while incentives in the system causally screen off lab leaders.

danwil
If an outsider's objective is to be taken seriously, they should write papers and submit them to peer review (e.g. conferences and journals). Yann LeCun has gone so far as to say that independent work only counts as "science" if submitted to peer review: "Without peer review and reproducibility, chances are your methodology was flawed and you fooled yourself into thinking you did something great." - https://x.com/ylecun/status/1795589846771147018?s=19

In my experience, professors are very open to discussing ideas and their work with anyone who seems serious, interested, and knowledgeable. Even someone inside academia will face skepticism if their work uses completely different methods. They will have to prove very convincingly that the methods are valid.
jimmy
Yes. More emphasis on concrete useful results, less emphasis on trying to find simple correlations in complex situations.

For example, "Do power poses work?". They did studies like this one where they tell people to hold a pose for five minutes while preparing for a fake job interview, and then found that the pretend employers pretended to hire them more often in the "power pose" condition. Even assuming there's a real effect where those students from that university actually impress those judges more when they pose powerfully ahead of time... does that really imply that power posing will help other people get real jobs and keep them past the first day?

That's like studying "Are car brakes really necessary?" by setting up a short track and seeing if the people who run the "red light" progress towards their destination quicker. Contrast that with studying the cars and driving behaviors that win races, coming up with your theories, and testing them by trying to actually win races. You'll find out very quickly if your "brakes aren't needed" hypothesis is a scientific breakthrough or foolishly naive.

Instead of studying "Does CBT work?", study the results of individual therapists, see if you can figure out what the more successful ones are doing differently than the less successful ones, and see if you can use what you learn to increase the effectiveness of your own therapy or the therapy of your students. If the answer turns out to be "The successful therapists all power pose pre-session, then perform textbook CBT" and that allows you to make better therapists, great. If it's something else, then you get to focus on the things that actually show up in the data.

The results should speak for themselves. If they don't, and you aren't keeping in very close contact with real world results, then it's super easy to go astray with internal feedback loops because the loop that matters isn't closed.

The cleanest argument that current-day AI models will not cause a catastrophe is probably that they lack the capability to do so.  However, as capabilities improve, we’ll need new tools for ensuring that AI models won’t cause a catastrophe even if we can’t rule out the capability. Anthropic’s Responsible Scaling Policy (RSP) categorizes levels of risk of AI systems into different AI Safety Levels (ASL), and each level has associated commitments aimed at mitigating the risks. Some of these commitments take the form of affirmative safety cases, which are structured arguments that the system is safe to deploy in a given environment. Unfortunately, it is not yet obvious how to make a safety case to rule out certain threats that arise once AIs have sophisticated strategic abilities. The goal

...
Buck

If we can construct fake honeypots that are indistinguishable - to the policy - from real opportunities, and do this at scale, then that could suffice: we could drive the policy's P(deployment | great opportunity) very low. (This relies on how many high-stakes opportunities we can create and how many the model will see in deployment.)

Yes totally. (This is impractical if you think that the model has a very large number of opportunities to attack, e.g. if every time it's programming is an opportunity to attack.)

If we can construct fake attacks that are indistinguishable -

... (read more)
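
A toy way to make the probability point in the exchange above concrete (my own illustrative numbers, not from the post): if synthetic honeypots really are indistinguishable from real high-stakes opportunities, the policy's best estimate that any given opportunity is real is just the base rate of real opportunities among everything it sees.

```python
# Toy model of the honeypot argument; the counts are made up for illustration.
def p_real_given_opportunity(n_real: int, n_fake: int) -> float:
    """If fake and real opportunities are indistinguishable to the policy,
    its posterior that a given opportunity is real is the base rate."""
    return n_real / (n_real + n_fake)

# E.g. 100 real high-stakes opportunities in deployment vs. 10,000 synthetic honeypots:
print(p_real_given_opportunity(100, 10_000))  # ~0.0099, so attacking almost always hits a honeypot
```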

Here are several examples; I found these captchas via the web rather than generating them anew, but none of them came attached to solutions so I'm not sure their presence in the training data would affect things in any case. (That said, it's possible that the lower resolution of the latter two degraded the adversarial perturbation; I would appreciate a source of higher-resolution captchas if anyone happens to know one.)

It clearly couldn't see all the objects, but the owl was in fact the correct answer
Entertaining failure at basic numerals while nonetheless answering correctly here
This one I was surprised by; I expected the image to be too low-resolution to be comprehensible, but 8/9 are correct here (the middle left image is a chair with an unusually low back)
jbash

CAPTCHAs have "adversarial perturbations"? Is that in the sense of "things not visible to humans, but specifically adversarial to deep learning networks"? I thought they just had a bunch of random noise and weird ad hoc patterns thrown over them.

Anyway, CAPTCHAs can't die soon enough. Although the fact that they persist in the face of multiple commercial services offering to solve 1000 for a dollar doesn't give me much hope...
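
For what it's worth, in the deep-learning literature "adversarial perturbation" does usually mean the former: a pixel-level change crafted against a model's gradients rather than random noise. Here is a minimal sketch of the classic FGSM construction, assuming some differentiable PyTorch classifier `model` and a labeled image batch `x`, `y`; this is purely illustrative and not a claim about how any CAPTCHA vendor actually builds its images.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 8 / 255) -> torch.Tensor:
    """One-step FGSM: nudge each pixel slightly in the direction that most increases
    the classifier's loss. The change looks like faint noise to a human, but unlike
    random noise it is aimed directly at the model's decision boundary."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```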

There are two nuclear options for treating depression: ketamine and TMS. This post is about the latter.

TMS stands for Transcranial Magnetic Stimulation. Basically, it fixes depression via magnets, which is about the second or third most magical thing that magnets can do.

I don’t know a whole lot about the neuroscience - this post isn’t about the how or the why. It’s from the perspective of a patient, and it’s about the what.

What is it like to get TMS?

TMS

The Gatekeeping

For Reasons™, doctors like to gatekeep access to treatments, and TMS is no different. To be eligible, you generally have to have tried multiple antidepressants for several years and had them not work or stop working. Keep in mind that, while safe, most antidepressants involve altering your brain chemistry...

This isn't directly related to TMS, but I've been trying to get an answer to this question for years, and maybe you have one.

When doing TMS, or any depression treatment, or any supplementation experiment, etc., it would make sense to track the effects objectively (in addition to, not as a replacement for, subjective monitoring). I haven't found any particularly good option for this, especially if I want to self-administer it most days. Quantified Mind comes close, but it's really hard to use their interface to construct a custom battery and an indefinite experiment.

Do you know of anything?


In my bioinformatics work I often stream files between linux hosts and Amazon S3. This could look like:

$ scp host:/path/to/file /dev/stdout | \
    aws s3 cp - s3://bucket/path/to/file

This recently stopped working after upgrading:

ftruncate "/dev/stdout": Invalid argument
Couldn't write to "/dev/stdout": Illegal seek

I think I figured out why this is happening: newer versions of scp use the SFTP protocol for transfers by default [1], and the SFTP code path tries to ftruncate and seek on the destination, which fails when the destination is a non-seekable file like /dev/stdout.

With scp I can give the -O flag:

Use the legacy SCP protocol for file transfers instead of the SFTP protocol. Forcing the use of the SCP protocol may be necessary for servers that do not implement SFTP, for backwards-compatibility for particular filename wildcard patterns and for expanding paths with a '~' prefix for older SFTP servers.

This does work, but it doesn't seem ideal: probably servers will drop support for the SCP protocol at some point? I've filed a bug with OpenSSH.
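
Concretely, the workaround is just the original pipeline with -O added (same placeholder host, paths, and bucket as above):

$ scp -O host:/path/to/file /dev/stdout | \
    aws s3 cp - s3://bucket/path/to/file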


[1] "man scp" gives me: "Since OpenSSH 8.8 (8.7 in Red Hat/Fedora builds), scp has used the SFTP protocol for transfers by default."

Comment via: facebook, mastodon

jbash

Using scp to stdout looks weird to me no matter what. Why not

ssh -n host cat /path/to/file | weird-aws-stuff

... but do you really want to copy everything twice? Why not run weird-aws-stuff on the remote host itself?

Phib
A quick OpenAI o1-preview BOTEC for additional emissions from a sort of Leopold scenario ~2030, assuming energy is mostly provided by natural gas, since I was kinda curious. Not much time spent on this, and I took the results at face value. I (of course?) buy that emissions don't matter in the short term, in a world where R&D is increasingly automated and scaled.

Phib: Say an additional 20% of US electricity was added to our power usage (e.g. for AI) over the next 6 years, and it was mostly natural gas. Also, that AI inference is used at an increasing rate, say 100 million H100s equivalent, increasing from today's level to 100M over the next 6 years, and it's more or less the same as today's energy breakdown, although with more natural gas over time. What do the emissions from such a build-out of natural gas look like?

[Reasoning available here: https://chatgpt.com/share/672ed183-7c3c-800d-ab0e-d08020d072c7]

**Answer:** The emissions from this build-out of natural gas over the next six years would be significant. Specifically:

- **Total Additional Emissions Over Six Years:** Approximately **1.136 billion metric tons of CO₂**.
- **Average Annual Emissions Increase:** About **189 million metric tons of CO₂ per year**.
- **Percentage of Current U.S. Emissions:** This represents an approximate **2.9% increase** over the current annual U.S. greenhouse gas emissions.

These emissions result from the increased natural gas usage required to meet the additional electricity demand—both from the general 20% increase in electricity consumption and the substantial growth in AI inference workloads powered by the equivalent of 100 million NVIDIA H100 GPUs.

Phib: Imagine there is a similar buildout in China, if not larger, how do the emissions from all of the above (within some confidence interval) place us environmentally within 2 years after 2030 of such a buildout and increase in emissions? Within 10 years? Considering a more or less constant rate of emissions thereafter fo
Vladimir_Nesov
Emissions don't matter in the long term, ASI can reshape the climate (if Earth is not disassembled outright). They might matter before ASI, especially if there is an AI Pause. Which I think is still a non-negligible possibility if there is a recoverable scare at some point; probably not otherwise. Might be enforceable by international treaty through hobbling semiconductor manufacturing, if AI of that time still needs significant compute to adapt and advance.
Phib

Yeah oops, meant long
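
As a rough cross-check of the BOTEC quoted above, here's a minimal sketch in Python. Every input is an assumption I'm supplying for illustration (roughly 4,000 TWh/yr of current US electricity use, ~0.4 tCO₂ per MWh for combined-cycle gas, ~6,300 MtCO₂e/yr of current US emissions, and a linear six-year ramp); it lands in the same ballpark as the o1 numbers rather than reproducing them exactly.

```python
# Toy cross-check of the emissions BOTEC. All constants are rough assumed figures.
US_ELECTRICITY_TWH_PER_YR = 4_000   # assumed current US electricity consumption
EXTRA_FRACTION = 0.20               # hypothetical AI-driven increase by ~2030
GAS_TCO2_PER_MWH = 0.40             # assumed combined-cycle natural gas emission factor
US_GHG_MT_PER_YR = 6_300            # assumed current US greenhouse gas emissions (MtCO2e)
YEARS = 6

extra_twh_full = US_ELECTRICITY_TWH_PER_YR * EXTRA_FRACTION   # 800 TWh/yr at full build-out
cumulative_twh = extra_twh_full * YEARS / 2                   # linear ramp from zero to full

# 1 TWh = 1e6 MWh and 1 Mt = 1e6 t, so MtCO2 = TWh * (tCO2/MWh) numerically.
cumulative_mt = cumulative_twh * GAS_TCO2_PER_MWH
annual_mt = cumulative_mt / YEARS
pct_of_current = 100 * annual_mt / US_GHG_MT_PER_YR

print(f"Cumulative over {YEARS} years: ~{cumulative_mt:,.0f} MtCO2")   # ~960 Mt
print(f"Average per year: ~{annual_mt:,.0f} MtCO2")                    # ~160 Mt
print(f"About {pct_of_current:.1f}% of current US emissions")          # ~2.5%
```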

Buck

Another potential benefit of this is that Anthropic might get more experience deploying their models in high-security environments.

Adam Scholl
I would guess it somewhat exacerbates risk. I think it's unlikely (~15%) that alignment is easy enough that prosaic techniques even could suffice, but in those worlds I expect things go well mostly because the behavior of powerful models is non-trivially influenced/constrained by their training. In which case I do expect there's more room for things to go wrong, the more that training is for lethality/adversariality. Given the present state of atheoretical confusion about alignment, I feel wary of confidently dismissing these sorts of basic, obvious-at-first-glance arguments about risk—like e.g., "all else equal, probably we should expect more killing people-type problems from models trained to kill people"—without decently detailed/strong countervailing models.
Ben Pace
Personally, I think that overall it's good on the margin for staff at companies risking human extinction to be sharing their perspectives on criticisms and moving towards having dialogue at all, so I think (what I read as) your implicit demand for Evan Hubinger to do more work here is marginally unhelpful; I weakly think quick takes like this are marginally good.

I will add: It's odd to me, Stephen, that this is your line for (what I read as) disgust at Anthropic staff espousing extremely convenient positions while doing things that seem to you to be causing massive harm. To my knowledge the Anthropic leadership has ~never engaged in public dialogue about why they're getting rich building potentially-omnicidal-minds with worthy critics like Hinton, Bengio, Russell, Yudkowsky, etc, so I wouldn't expect them or their employees to have high standards for public defenses of far less risky behavior like working with the US military.[1]

1. ^ As an example of the low standards for Anthropic's public discourse, notice how a recent essay about what's required for Anthropic to succeed at AI Safety by Sam Bowman (a senior safety researcher at Anthropic) flatly states "Our ability to do our safety work depends in large part on our access to frontier technology... staying close to the frontier is perhaps our top priority in Chapter 1" with ~no defense of this claim or engagement with the sorts of reasons that I consider adding a marginal competitor to the suicide race is an atrocity, or acknowledgement that this makes him personally very wealthy (i.e. he and most other engineers at Anthropic will make millions of dollars due to Anthropic acting on this claim).
Stephen Fowler
No disagreement. The community seems to be quite receptive to the opinion; it doesn't seem unreasonable to voice an objection. If you're saying it is primarily the way I've written it that makes it unhelpful, that seems fair.

I originally felt that either question I asked would be reasonably easy to answer, if time was given to evaluating the potential for harm. However, given that Hubinger might have to run any reply by Anthropic staff, I understand that it might be negative to demand further work. This is pretty obvious, but didn't occur to me earlier.

Ultimately, the original quicktake was only justifying one facet of Anthropic's work, so that's all I've engaged with. It would seem less helpful to bring up my wider objections. I don't expect them to have a high standard for defending Anthropic's behavior, but I do expect the LessWrong community to have a high standard for arguments.